Query existing Pinecone index without re-loading the context data
I'm learning LangChain and vector databases.

Following the original documentation, I can load some documents, upsert them into the database, and then run a query.

https://python.langchain.com/en/harrison-docs-refactor-3-24/modules/indexes/vectorstores/examples/pinecone.html

I want to access the same index and query it again, but without re-computing the embeddings and adding the vectors to the database again.

How can I generate the same docsearch object without creating new vectors?

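import os

import pinecone
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import UnstructuredWordDocumentLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone

# Load source Word doc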
loader = UnstructuredWordDocumentLoader("C:/Users/ELECTROPC/utilities/openai/data_test.docx", mode="elements")
data = loader.load()

# Text splitting
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

# Upsert vectors to Pinecone Index
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV
)
index_name = "mlqai"
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])

docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)


# Query
llm = OpenAI(temperature=0, openai_api_key=os.environ['OPENAI_API_KEY'])
chain = load_qa_chain(llm, chain_type="stuff")

query = "que sabes de los patinetes?"  # Spanish: "what do you know about scooters?"
docs = docsearch.similarity_search(query)
answer = chain.run(input_documents=docs, question=query)
print(answer)
Roy asked 19/5, 2023 at 21:25

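You need to access the existing index instead of creating it again. To do this, you must know the name of the index and which embeddings were used to create it, because the query has to be embedded with the same model as the stored vectors.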

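# Assumes pinecone.init(...) has already been called, as in the question
index_name = "mlqai"
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
# Attach to the existing index; no documents are loaded, embedded, or upserted
docsearch = Pinecone.from_existing_index(index_name, embeddings)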

Documentation.
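
To show how this fits with the query code from the question, here is a minimal sketch that reconnects to the index and runs the same question-answering step. The helper names connect_to_existing_index and answer_from_index, and the vectorstore_cls parameter, are hypothetical and only for illustration. The sketch assumes the 2023-era LangChain import path used in the question and that pinecone.init(...) has already been called with the same credentials.

```python
def connect_to_existing_index(index_name, embeddings, vectorstore_cls=None):
    """Return a vector store bound to an index that already holds vectors.

    Nothing is loaded, embedded, or upserted here. The embeddings object is
    only used later, to embed the query text.
    """
    if vectorstore_cls is None:
        # Lazy import (assumption: 2023-era LangChain package layout).
        from langchain.vectorstores import Pinecone
        vectorstore_cls = Pinecone
    return vectorstore_cls.from_existing_index(index_name, embeddings)


def answer_from_index(docsearch, chain, query, k=4):
    """Retrieve the k most similar stored chunks and pass them to the QA chain."""
    docs = docsearch.similarity_search(query, k=k)
    return chain.run(input_documents=docs, question=query)
```

In a new session this could be used as docsearch = connect_to_existing_index("mlqai", embeddings), followed by answer_from_index(docsearch, chain, query), where embeddings and chain are built exactly as in the question. The loader, text splitter, and Pinecone.from_texts steps are only needed the first time, when the index is populated.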

Extensive answered 19/5, 2023 at 22:36

Comment: Thanks @Nick. I noticed I was using the documentation from harrison-docs-refactor-3-24, not the latest. - Roy
