I want to create a self hosted LLM model that will be able to have a context of my own custom data (Slack conversations for that matter).
I've heard Vicuna is a great alternative to ChatGPT and so I made the below code:
from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex, \
GPTSimpleVectorIndex, PromptHelper, LLMPredictor, Document, ServiceContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import torch
from langchain.llms.base import LLM
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
!export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
class CustomLLM(LLM):
model_name = "eachadea/vicuna-13b-1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0,
model_kwargs={"torch_dtype":torch.bfloat16})
def _call(self, prompt, stop=None):
return self.pipeline(prompt, max_length=9999)[0]["generated_text"]
def _identifying_params(self):
return {"name_of_model": self.model_name}
def _llm_type(self):
return "custom"
llm_predictor = LLMPredictor(llm=CustomLLM())
But sadly I'm hitting the below error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB (GPU 0; 22.03 GiB total capacity; 21.65 GiB
already allocated; 94.88 MiB free; 21.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated
memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF
Here's the output of !nvidia-smi
(before running anything):
Thu Apr 20 18:04:00 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A10G Off| 00000000:00:1E.0 Off | 0 |
| 0% 23C P0 52W / 300W| 0MiB / 23028MiB | 18% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Any idea how to modify my code to make it work?