I have not found documentation on question answering over multiple text files in which each file can still be referenced individually.
Example:
I have file1.txt through file20.txt. file1.txt is from April 2023 and file5.txt is from March 2023.
Given two files, I would like ChatGPT to read both files and answer comparison questions such as:
"How has sentiment regarding ___ changed/progressed from the March file up to the April file?"
"What are the differences between the two files with respect to the discussion of ___?"
"How many times is ___ being mentioned in each file?"
Here is non-working code that illustrates what I would like to achieve:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
query = "How has sentiment regarding medical devices changed/progressed from March to April?"
docs = [March_file, April_file]  # placeholders for the March and April files (not defined here)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
Problems I've run into:
Chunking - QA with langchain assumes that all of your files are split into many separate, though still contiguous, chunks/documents. However, my task requires keeping the files distinct, because I need to reference them by their different dates (and it still requires chunking, because the files are large).
Reference to specific files - The files are large (~12,000 tokens) and therefore require chunking, yet after chunking I still need to be able to refer to specific dated documents. How do I do this once a file has been split into many 1,000-token chunks?
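To make the requirement concrete, here is a minimal sketch of the behaviour I am after, written in plain Python without langchain. Everything in it is hypothetical: the `Chunk` class, the `chunk_file` and `chunks_for` helpers, the date strings, and the use of whitespace-separated words as a stand-in for tokens are all my own assumptions for illustration. The idea is that every chunk keeps the name and date of the file it came from, so chunks can be selected per file after splitting. In langchain this would presumably correspond to the `metadata` field of a `Document`, but I have not confirmed how the QA chains make use of it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str    # the chunk's content
    source: str  # name of the file the chunk came from
    date: str    # date label of that file, e.g. "2023-03"
    index: int   # position of the chunk within its file


def chunk_file(text, source, date, chunk_size=1000):
    """Split text into chunks of at most chunk_size words, tagging each chunk.

    Words are only a rough stand-in for tokens (an assumption of this sketch).
    """
    words = text.split()
    return [
        Chunk(" ".join(words[start:start + chunk_size]), source, date, n)
        for n, start in enumerate(range(0, len(words), chunk_size))
    ]


def chunks_for(chunks, date):
    """Return only the chunks that belong to the file with the given date."""
    return [c for c in chunks if c.date == date]


# Dummy stand-ins for the real file contents.
files = {
    "file5.txt": ("2023-03", "word " * 2500),
    "file1.txt": ("2023-04", "word " * 1200),
}

all_chunks = []
for name, (date, text) in files.items():
    all_chunks.extend(chunk_file(text, name, date))

march = chunks_for(all_chunks, "2023-03")
april = chunks_for(all_chunks, "2023-04")
```

With the chunks grouped like this, the March and April groups could be handed to the model separately (or labelled by date inside one prompt), so a comparison question can refer to each file by its date.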
What is the current best method to solve this problem?