Offline, Unfiltered ChatGPT on a Laptop

Zero-to-Hero Tutorial to run an Unfiltered LLM (like ChatGPT) on a Laptop with fast text generation. No prior knowledge (or GPU) required.

Introduction

The abbreviation 'LLM' stands for Large Language Model: a Machine Learning model trained on huge stores of data so that it can converse in a human-like manner. ChatGPT is one such LLM, and this tutorial will enable you to quickly run an Unfiltered, Offline LLM similar to it.


I started at Zero, so I can tell you that you don't need to know anything about Programming or AI to get started. I've made sure this article is as straightforward and easy to follow as possible. All I expect is that you know how to open a Terminal and navigate directories.

If you still have issues (unlikely), my contact information is here.


Why?

The good reasons for running an LLM on your Laptop extend beyond 'because you can'. (Though that was my reason to try 🙂)

  • Data Privacy. Your conversations don't leave your computer. Online services (primarily OpenAI) read your conversations and use them to train their models; similar concerns over data privacy caused Italy to temporarily ban ChatGPT.

  • Run anywhere, Anytime. You don't need an internet connection, nor do you need to depend on a server. It's just you.

  • Unfiltered, Unbiased. The models you'll be using from today onwards are completely unbiased and unfiltered. They'll give you what you ask for without any qualms.


I'll get straight to it. I tested multiple LLMs to see which would satisfy the following conditions:

  • Run on my Laptop (12 CPU Cores, 16 GB RAM; specs comparable to many current computers)

  • Write Unfiltered Content (Ever see the "Content Guidelines" response on ChatGPT? You won't have that issue here)

I tested multiple models; the noteworthy ones were LLaMa 7B/13B/30B, GPT4All-Lora, WizardLM-7B, and Wizard-Vicuna-7B/13B. The three best are linked in this article.


Download the Model

Pick the most suitable one for you below. I'd recommend WizardLM-7B (the first link) to get started; if you have the storage, go with Wizard-Vicuna-13B (the second link).

Format: Model Name: URL (size) (speed)

(Advanced Readers: Find more unfiltered models here)

(The simplest way to download these files is via Git and Git-LFS. Download both of them first, then pick one of the models above and press the link beside it. Once the page loads, press the three dots to the left of the 'Train' button, and click on 'Clone Repository'. Copy the first three lines and paste them into your terminal.)
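
For reference, the commands you copy from that dialog typically look like the sketch below (the URL here is a placeholder; use the exact one shown on the model page you picked):

    # One-time setup: install the Git LFS hooks so large model files download fully
    git lfs install

    # Clone the model repository (placeholder URL; substitute your chosen model's)
    git clone https://huggingface.co/<user>/<model_name>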

The Wizard-Vicuna 13B and WizardLM 7B models outperformed all the other contenders in speed and prompt adherence: Wizard-Vicuna 13B is the smartest, while WizardLM 7B is still considerably smart and roughly twice as fast as Wizard-Vicuna 13B.

While the model(s) download, follow the steps below.

Run the Model

(I'm working on a Dockerfile to make this easier, but I've explained every step, and a consolidated command sketch follows the list. You'll be fine; follow the instructions exactly.)

  • Clone https://github.com/ggerganov/llama.cpp (Command: git clone https://github.com/ggerganov/llama.cpp)

  • Enter the directory (llama.cpp) and compile (Instructions here)

  • Create Python Virtual Environment (I used Python3.10) (Command: python -m venv venv) (Install Python first)

  • Install Python Requirements (Command: pip install -r requirements.txt)

  • Once your model is downloaded, move the downloaded folder to the ./models folder. Remember the name of this folder, whenever you see <model_name>, replace it with the name of this folder.

  • Convert each model to the GGML Format: (Command: python3 convert.py ./models/<model_name>). Once the command finishes, copy the filename of the created file (it'll be on the last line). Whenever you see <ggml_file>, replace it with that filename.

  • Quantize each model: ./quantize ./models/<model_name>/<ggml_file> ./models/<model_name>/ggml-model-q4_0.bin q4_0

  • Run the model: ./main -m ./models/<model_name>/ggml-model-q4_0.bin --color -r "User:" --in-prefix " " --in-suffix "Assistant:" --keep -1 --repeat_last_n -1 --interactive-first --n-predict 512 --temp 0.8 (Use this command to run the model in the future.)

    (--n-predict is the number of tokens to generate, --temp is temperature, --color makes your input colored green, --interactive-first allows you to type first before the model generates, --repeat_last_n 'last n tokens to consider for penalize', -1 means the same number of tokens as ctx_size. View all options with ./main --help)
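
As promised, here's a consolidated sketch of the setup steps above, assuming a Unix-like system with a C/C++ toolchain, make, and Python 3.10 installed (on other platforms, follow the linked build instructions instead):

    # Fetch and build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # Create and activate a Python virtual environment, then install the requirements
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt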
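
And the model-preparation steps, with <model_name> and <ggml_file> standing in for your model folder and the filename convert.py reports:

    # Convert the downloaded weights to the GGML format
    # (the output filename is printed on the last line)
    python3 convert.py ./models/<model_name>

    # Quantize to 4-bit (q4_0): a smaller, faster file, at a small cost in quality
    ./quantize ./models/<model_name>/<ggml_file> ./models/<model_name>/ggml-model-q4_0.bin q4_0

    # Start an interactive chat session
    ./main -m ./models/<model_name>/ggml-model-q4_0.bin --color -r "User:" --in-prefix " " --in-suffix "Assistant:" --keep -1 --repeat_last_n -1 --interactive-first --n-predict 512 --temp 0.8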


Aand, that's it! You now have a private, unbiased, and unfiltered ChatGPT-like Chatbot running on your device :)

My Sincere Appreciation to the Open-Source AI Community

Wholly due to the Open-Source Community's contributions towards democratizing the wave of Artificial Intelligence developments, we're able to run huge, billion-parameter models comparable to OpenAI's ChatGPT on our laptops. We've gone from running these models on expensive data-center hardware to running them on consumer-grade devices at similar speeds, without a big loss in quality. It's all due to the pioneers spearheading the Open-Source AI Revolution.

This project makes heavy use of llama.cpp, a project that lets you standardize, quantize (make a model easier to run, for a small loss in quality), and run resource-intensive models entirely on absolutely normal hardware.

Thank you, to all the developers that made this possible.
