How do I add memory to RetrievalQA.from_chain_type? Or, how do I add a custom prompt to ConversationalRetrievalChain?

For the past two weeks I've been trying to make a chatbot that can chat over documents (so not just semantic search/QA, but with memory) and also with a custom prompt. I've tried every combination of all the chains, and so far the closest I've gotten is ConversationalRetrievalChain, but without custom prompts, and RetrievalQA.from_chain_type, but without memory.

Mar answered 13/5, 2023 at 2:43 Comment(0)

Here's a solution with ConversationalRetrievalChain, with memory and custom prompts, using the default 'stuff' chain type.

There are two prompts that can be customized here. First, the prompt that condenses conversation history plus current user input (condense_question_prompt), and second, the prompt that instructs the Chain on how to return a final response to the user (which happens in the combine_docs_chain).

from langchain import PromptTemplate

# note that the input variables ('question', etc) are defaults, and can be changed

condense_prompt = PromptTemplate.from_template(
    ('Do X with user input ({question}), and do Y with chat history ({chat_history}).')
)

combine_docs_custom_prompt = PromptTemplate.from_template(
    ('Write a haiku about a dolphin.\n\n'
     'Completely ignore any context, such as {context}, or the question ({question}).')
)

Now we can initialize the ConversationalRetrievalChain with the custom prompts.

from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0), 
    vectorstore.as_retriever(), # see below for vectorstore definition
    memory=memory,
    condense_question_prompt=condense_prompt,
    combine_docs_chain_kwargs=dict(prompt=combine_docs_custom_prompt)
)

Note that this calls _load_stuff_chain() under the hood, which allows for an optional prompt kwarg (that's what we can customize). This prompt is used to set up the LLMChain, which in turn initializes the StuffDocumentsChain.
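
As a rough sketch (my own illustration, not the library's code verbatim), the combine-docs side is approximately equivalent to building the document chain explicitly via load_qa_chain, which accepts the same prompt kwarg for the 'stuff' chain type:

# Sketch only: roughly what ConversationalRetrievalChain.from_llm builds internally
# when you pass combine_docs_chain_kwargs=dict(prompt=...).
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

doc_chain = load_qa_chain(
    OpenAI(temperature=0),
    chain_type="stuff",
    prompt=combine_docs_custom_prompt,  # same prompt we passed via combine_docs_chain_kwargs
)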

We can test the setup with a simple query to the vectorstore (see below for example vectorstore data) - you can see how the output is determined completely by the custom prompt:

chain("What color is mentioned in the document about cats?")['answer']
#'\n\nDolphin leaps in sea\nGraceful and playful in blue\nJoyful in the waves'

And memory is working correctly:

chain.memory
#ConversationBufferMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content='What color is mentioned in the document about cats?', additional_kwargs={}), AIMessage(content='\n\nDolphin leaps in sea\nGraceful and playful in blue\nJoyful in the waves', additional_kwargs={})]), output_key=None, input_key=None, return_messages=True, human_prefix='Human', ai_prefix='AI', memory_key='chat_history')

Example vectorstore dataset with ephemeral ChromaDB instance:

import pandas as pd

from langchain.vectorstores import Chroma
from langchain.document_loaders import DataFrameLoader
from langchain.embeddings.openai import OpenAIEmbeddings

data = {
    'index': ['001', '002', '003'], 
    'text': [
        'title: cat friend\ni like cats and the color blue.', 
        'title: dog friend\ni like dogs and the smell of rain.', 
        'title: bird friend\ni like birds and the feel of sunshine.'
    ]
}

df = pd.DataFrame(data)
loader = DataFrameLoader(df, page_content_column="text")
docs = loader.load()

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embeddings)
Nozzle answered 15/5, 2023 at 20:31 Comment(2)
Condense question is the prompt that processes user input and chat history. Combine docs is how the output/response back to the user is handled after the retrieval happens. Definitely a mix of art and science to arrive on the best way to engineer those prompts in the chain for your specific use case - I am also still learning. Does this solution answer your original question, how to include memory and custom prompts in a conversational chain?Nozzle
what is the difference between RetrievalQA, RetrievalQAWithSources and ConversationalRetrievalChain?Amourpropre

Update: This post answers the first part of OP's question:

how do i add memory to RetrievalQA.from_chain_type?

For the second part, see @andrew_reece's answer

or, how do I add a custom prompt to ConversationalRetrievalChain?

Original:

Have you tried passing in chain_type_kwargs (at the bottom is a screenshot from the source code for quick reference)?

The documentation doesn't make it very easy to understand what's under the hood, but here is something that could achieve your goal.

You can find the notebook at this GitHub link.

Setup:

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferMemory
from langchain import PromptTemplate
from langchain.retrievers import TFIDFRetriever


retriever = TFIDFRetriever.from_texts(
    ["Our client, a gentleman named Jason, has a dog whose name is Dobby",
     "Jason has a good friend called Emma",
     "Emma has a cat whose name is Sullivan"])

Then define your customized prompt:

template = """
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}
Answer:
"""
prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=template,
)

Take note of what you used for your input variables, especially 'history' and 'question', since you will need to match these when setting up the memory:

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type='stuff',
    retriever=retriever,
    verbose=True,
    chain_type_kwargs={
        "verbose": True,
        "prompt": prompt,
        "memory": ConversationBufferMemory(
            memory_key="history",
            input_key="question"),
    }
)

Now you can call qa.run({"query": "who's the client's friend?"})

"The client's friend is Emma."

and then qa.run("and her pet's name is?")

"Emma's pet's name is Sullivan."

To check and verify the memory/chat history: qa.combine_documents_chain.memory

ConversationBufferMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content="who's the client's friend?", additional_kwargs={}), AIMessage(content="The client's friend is Emma.", additional_kwargs={}), HumanMessage(content="and her pet's name is?", additional_kwargs={}), AIMessage(content="Emma's pet's name is Sullivan.", additional_kwargs={})]), output_key=None, input_key='question', return_messages=False, human_prefix='Human', ai_prefix='AI', memory_key='history')

[screenshots: verbose chain run from the Jupyter notebook, and the RetrievalQA source showing chain_type_kwargs]

Litha answered 14/5, 2023 at 7:35 Comment(6)
Wow, thank you for the detailed answer, unfortunately I have already tried that and that's how I've been getting the custom prompt to work with RetrievalQA.from_chain_type. It's the memory that is the issue. What is the qa.combine_documents_chain.memory? does that give the llm memory of the conversation to be able to chat and not just answer one off questions?Mar
Could you elaborate on the memory issue? From the logging with verbose=True, I can see the chat history has already been appended to qa.combine_documents_chain.memory, so to your question: yes, it's not just answering one-off questions, but rather is capable of understanding the conversation. I have added a screenshot from the GitHub Jupyter notebook for your reference.Litha
for the answer to the second part of OP's question, please checkout @andrew_reece 's responseLitha
@Litha - Can you check this #76698181? I tried a variation of RetrievalQA + multiple-input prompt + memory. However, it didn't run successfully.Macrospore
Somehow the content in the "<hs> {history} </hs> " stays empty when chatting. Any ideas why this happens? When printing "print(dbqa.combine_documents_chain.memory)" I only see the question I asked in the ChatMessageHistory and no previous questionAmourpropre
Great answer. I'm wondering though, is this memory for the running instance? What if this app would serve [n] number of visitors on a website, would this memory be per session? How would we prevent mixing up chat history, as it's one python instance running?Messere

This is the function:

from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma


def qasystem(query):
    # Load and split the source document, then build a persistent Chroma index
    loader = TextLoader("details.txt")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    documents = text_splitter.split_documents(documents)

    vectordb = Chroma.from_documents(
        documents,
        embedding=OpenAIEmbeddings(),
        persist_directory='./data'
    )
    vectordb.persist()

    # Prompt that condenses the chat history + follow-up into a standalone question
    _template = """Given the following conversation and a follow up question, rephrase the follow up question to be a
standalone question without changing the content in given question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
    condense_question_prompt_template = PromptTemplate.from_template(_template)

    # Prompt that answers the standalone question from the retrieved context
    prompt_template = """You are a helpful, information-giving QA system; make sure you don't answer anything
not related to the following context. You always provide useful information and details available in the given context. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""

    qa_prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    llm = ChatOpenAI(temperature=0.1)
    question_generator = LLMChain(llm=llm, prompt=condense_question_prompt_template, memory=memory)
    doc_chain = load_qa_chain(llm, chain_type="stuff", prompt=qa_prompt)
    qa_chain = ConversationalRetrievalChain(
        retriever=vectordb.as_retriever(search_kwargs={'k': 6}),
        question_generator=question_generator,
        combine_docs_chain=doc_chain,
        memory=memory,
    )

    chat_history = []
    result = qa_chain({'question': query, 'chat_history': chat_history})
    response = result['answer']
    chat_history.append((query, response))
    return response
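
A quick usage sketch (the question here is just a placeholder, and details.txt must exist in the working directory with your own content):

# Hypothetical usage example
answer = qasystem("What does the document say about the client?")
print(answer)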
Virescent answered 11/7, 2023 at 4:56 Comment(2)
This worked for me.Virescent
you are not supposed to use both ConversationBufferMemory and explicit manual memory. "In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object." Reference: python.langchain.com/docs/use_cases/question_answering/how_to/…Saidel

When using ConversationBufferMemory, I use a very simple test to confirm whether memory is working on my chatbot: asking it "What was the first question I asked?"

I always seem to get the same incorrect answer:

I'm sorry, but I don't have access to your initial question as I am an AI language model and I don't have the capability to track previous interactions. Could you please repeat your question?

Apart from this, though, memory does seem to be working to an extent; for example, if I ask non-contextual questions, it does seem able to answer correctly.

Has anyone else encountered this anomaly?

I'm not sure if I need to work on the prompting, or why else this anomaly is arising.
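
In case it helps anyone hitting the same thing, a minimal sanity check (my own sketch, assuming a chain and ConversationBufferMemory set up like the ConversationalRetrievalChain example above) is to print the memory buffer after a couple of turns and confirm the earlier question was actually recorded, before assuming the prompt is at fault:

# Hypothetical debugging snippet -- assumes `chain` and `memory` are the ones
# defined in the ConversationalRetrievalChain example earlier in this thread.
chain("What color is mentioned in the document about cats?")
chain("What was the first question I asked?")

# If memory is being populated, both HumanMessages (and the AI replies)
# should show up here.
print(memory.chat_memory.messages)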

Ani answered 11/6, 2023 at 6:14 Comment(0)
