Langchain - Can't solve the dynamic filtering problem from vectorstore
I am using LangChain version 0.218, and I was wondering whether anyone has been able to filter a seeded vectorstore dynamically at runtime, such as when it is run by an Agent.

My goal is to use this dynamic filter in a Conversational Retrieval QA chain: I filter a retriever by a filename extracted from the conversation inputs and retrieve all of that file's chunks (with k in search_kwargs set to the number of chunks belonging to that filename, looked up from a mapper file).

I am able to filter a seeded vectorstore (like Chroma) manually, for example:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import Chroma

# init a vectorstore and seed documents
vectorstore = Chroma.from_documents(..)

# 'somehow' I get hold of the filename from user input or chat history
found_filename = "report.pdf"

# filter on the 'filename' field provided in the metadata of all chunks;
# LangChain's Chroma wrapper expects the key "filter" in search_kwargs
file_chunk_mapper = {"report.pdf": ["chunk1", "chunk2", ...]}
one_doc_retriever = vectorstore.as_retriever(
    search_kwargs={
        "filter": {"filename": found_filename},
        "k": len(file_chunk_mapper[found_filename]),  # retrieve every chunk of the file
    }
)

# QA chain which will be used as a Tool by Agents
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
QA_chain = ConversationalRetrievalChain.from_llm(.., retriever=one_doc_retriever, memory=memory)

# this would be run by an Agent
QA_chain.run("all person names in file report")

## ANSWER
## I found all the names like: ...

I have tried using no filters, as well as other methods such as Self-Query Retrieval and Contextual Compression Retrieval, but none of them worked as well as this approach, where the model has a specific, definite set of chunks to look at.

From what I have read in the documentation, creating a custom chain with two sub-chains, where the first extracts the filename and filters a retriever, and the second then runs against that new retriever, seems to be the only option.
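That two-step idea can be sketched without any LangChain internals: step one turns the question into search_kwargs for the retriever, and step two would pass them to vectorstore.as_retriever(..). Note that extract_filename and build_search_kwargs are hypothetical helper names, and the regex is a stand-in for what would really be an LLM extraction call:

```python
import re

# Hypothetical helper: pull a filename mentioned in the user's question.
# In the real chain this step would be an LLM call (e.g. an extraction
# prompt); a regex stands in for it here so the sketch is runnable.
def extract_filename(question, file_chunk_mapper):
    for filename in file_chunk_mapper:
        stem = filename.rsplit(".", 1)[0]
        # match either the full filename or its stem anywhere in the question
        if filename in question or re.search(rf"\b{re.escape(stem)}\b", question):
            return filename
    return None

def build_search_kwargs(question, file_chunk_mapper):
    """Step 1 of the custom chain: derive the retriever's search_kwargs."""
    filename = extract_filename(question, file_chunk_mapper)
    if filename is None:
        return {}  # fall back to an unfiltered retriever
    return {
        "filter": {"filename": filename},        # Chroma metadata filter
        "k": len(file_chunk_mapper[filename]),   # fetch every chunk of that file
    }

file_chunk_mapper = {"report.pdf": ["chunk1", "chunk2", "chunk3"]}
kwargs = build_search_kwargs("all person names in file report", file_chunk_mapper)
# kwargs -> {"filter": {"filename": "report.pdf"}, "k": 3}
```

Step two would then be a fresh retriever built per question, e.g. vectorstore.as_retriever(search_kwargs=kwargs), handed to the QA chain before it runs.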

Am I missing something here? Is there a simpler or smarter way about this?

But how do I use that in an Agent execution, where the chains are run automatically? This has been boggling my mind for the past two days.

Mouthwatering answered 30/6/2023 at 7:00 — Comments (2)
Did you figure it out? I have the same problem and it seems like something that should be possible to do... (Bifid)
I tried using the configurable field on the retriever to modify the search_kwargs at runtime and it worked, but my problem is that MLflow does not recognize this after I log the model... is anyone having the same problem? (Exenterate)
