CUDA memory issue when running a LangChain Q&A bot with a Python Dash app: how to fix 'torch.cuda.OutOfMemoryError'?

I'm building a LangChain Q&A bot and serving it with a Python Dash app.

Error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 3.44 GiB already allocated; 0 bytes free; 3.44 GiB reserved in total by PyTorch)

If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The app runs fine on CPU; I'm attempting to get CUDA working for scalability.
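For reference, a minimal sketch of how the device can be chosen with a CPU fallback (the variable name is illustrative; the resulting string is what gets passed as model_kwargs={"device": device} in the snippet further down):

import torch

# Use the GPU when available; otherwise fall back to the CPU path that works.
device = "cuda" if torch.cuda.is_available() else "cpu"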

What I tried:

  1. Setting PYTORCH_CUDA_ALLOC_CONF to max_split_size_mb:512 (see the sketch after this list).
  2. Passing batch_size=1 to the LLM.
  3. Switching chain_type between 'stuff' and 'map_reduce'.

None of the above solved the issue.
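For completeness, this is roughly how I set the allocator option from attempt 1; setting it before the first torch import is the important part, since the variable is read when CUDA is initialized:

import os

# Must be set before CUDA is initialized, i.e. before torch is imported.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch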

from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import AzureOpenAI
from langchain.vectorstores import Chroma

# Embeddings run on the GPU via the instructor-xl model.
vector_db = Chroma(
    persist_directory="",
    embedding_function=HuggingFaceInstructEmbeddings(
        model_name="hkunlp/instructor-xl",
        model_kwargs={"device": "cuda"},
    ),
)

llm = AzureOpenAI("", batch_size=1)

# Retrieve a single document per query (k=1).
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=vector_db.as_retriever(search_kwargs={"k": 1}),
    return_source_documents=True,
)
Dishabille asked 27/5, 2023 at 12:48

Comments (2):
How are you calling the qa_chain? I assume you're doing it async and this is causing the issue. If you do it sequentially, does it work? Have you tried renting a GPU with larger memory? – Digitiform
import torch
import gc

torch.cuda.empty_cache()
gc.collect()

– Shipman
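Putting both suggestions together, a minimal sketch of a sequential call that frees cached GPU memory between queries might look like this (the query list and loop are illustrative; qa_chain is the chain defined in the question):

import gc

import torch

# Ask questions one at a time rather than concurrently.
for question in ["What does the report conclude?"]:  # illustrative queries
    result = qa_chain({"query": question})
    print(result["result"])

    # Release cached CUDA blocks between calls, as Shipman suggests.
    torch.cuda.empty_cache()
    gc.collect()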
