I'm using langchain to process a large number of documents that are stored in a MongoDB database.
I can load all the documents into a Chroma vector store using langchain without any problems. Nothing fancy is being done here. This is my code:
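from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# docs: the list of langchain Document objects built from the MongoDB records
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(docs, embeddings, persist_directory='db')
db.persist()  # write the collection to the 'db' directory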
Now, after storing the data, I want to get a list of all the documents and their embeddings, together with their IDs.
This is so I can store them back into MongoDB.
I also want to run them through BERTopic to get the topic categories.
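To make the goal concrete, here is a rough sketch of the shape I'm after: one record per document, ready to write back to MongoDB. It assumes I can somehow get four parallel lists (IDs, texts, metadata, embeddings) out of the store, which is exactly the part I don't know how to do. The helper name `to_records`, the field names, and the sample values are all made up for illustration.

```python
def to_records(ids, documents, metadatas, embeddings):
    """Zip four parallel lists into one dict per document."""
    # Guard against silently dropping items if the lists are misaligned.
    if not (len(ids) == len(documents) == len(metadatas) == len(embeddings)):
        raise ValueError("all four lists must have the same length")
    return [
        {"_id": doc_id, "text": text, "metadata": meta, "embedding": list(vector)}
        for doc_id, text, meta, vector in zip(ids, documents, metadatas, embeddings)
    ]


# Made-up sample data standing in for whatever the vector store returns.
records = to_records(
    ids=["a1", "b2"],
    documents=["first text", "second text"],
    metadatas=[{"source": "mongo"}, {"source": "mongo"}],
    embeddings=[[0.1, 0.2], [0.3, 0.4]],
)
```

With records like these, the texts and embeddings could also be pulled back out as two lists to pass on to BERTopic.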
Question 1 is: how do I get all documents I've just stored in the Chroma database? I want the documents, and all the metadata.
Many thanks for your help!