How to load a folder of JSON files in LangChain?

I am trying to load a folder of JSON files in LangChain as follows:

from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader(r'C:...')
documents = loader.load()

But I get this error message:

ValueError: Json schema does not match the Unstructured schema

Can anyone tell me how to solve this problem?

I tried passing glob='**/*.json', but that did not help. The documentation on the LangChain website is also limited.

Bard answered 17/5, 2023 at 15:31

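By default, DirectoryLoader hands every file to UnstructuredFileLoader, which is most likely where the schema error comes from. If you want to read each file in full as plain text, pass TextLoader through the loader_cls parameter: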

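from langchain.document_loaders import DirectoryLoader, TextLoader

# DRIVE_FOLDER is the path to the folder that holds the JSON files
# TextLoader reads each matching file as plain text, one document per file
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=TextLoader)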
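To make it clearer what "reading the whole file" gives you, here is a rough standard-library sketch that mimics the result: one record per file, with the raw file text as page_content and the file path as the source. It does not use LangChain at all; load_json_folder_as_text and the demo files are hypothetical names made up for this illustration.

```python
import json
import tempfile
from pathlib import Path


def load_json_folder_as_text(folder):
    """Return one record per *.json file, holding the raw file text."""
    records = []
    # rglob walks subfolders too, similar in spirit to glob='**/*.json'
    for path in sorted(Path(folder).rglob("*.json")):
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": str(path)},
        })
    return records


# Build a small throwaway folder so the sketch runs on its own
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.json").write_text(json.dumps({"content": "hello"}), encoding="utf-8")
(demo_dir / "sub").mkdir()
(demo_dir / "sub" / "b.json").write_text(json.dumps({"content": "world"}), encoding="utf-8")

records = load_json_folder_as_text(demo_dir)
print(f"record count: {len(records)}")

# The text is still raw JSON, so parse it when a specific field is needed
first = json.loads(records[0]["page_content"])
print(first["content"])
```

The point is that with this route the JSON is not parsed for you, so any field extraction has to happen afterwards.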

Alternatively, you can use JSONLoader and pass it a jq_schema through loader_kwargs:

from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.json_loader import JSONLoader

DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"
# loader_kwargs is forwarded to JSONLoader for each file; jq_schema selects the text to load
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=JSONLoader, loader_kwargs={'jq_schema': '.content'})

documents = loader.load()

print(f'document count: {len(documents)}')
print(documents[0] if len(documents) > 0 else None)

For the jq_schema parameter, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/json_loader.py#L10
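If jq syntax is unfamiliar, this small sketch shows plain-Python equivalents of two common jq expressions. The sample record and its keys are made up for illustration, and the code only mirrors what the expressions select; it does not call jq or LangChain.

```python
# Hypothetical sample record; the keys are made up for illustration
data = {
    "content": "top-level text",
    "messages": [{"content": "first"}, {"content": "second"}],
}

# jq '.content' selects the value of the top-level "content" key
top_level = data.get("content")

# jq '.messages[].content' yields the "content" of every item in "messages"
per_message = [m.get("content") for m in data["messages"]]

# jq returns null for a missing key; dict.get returns None in the same way
missing = data.get("text")

print(top_level, per_message, missing)
```

So the jq_schema has to match the actual shape of your files; '.content' only works when each file is an object with a top-level content key.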

More on DirectoryLoader usage: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/directory.py

Coxcomb answered 23/5, 2023 at 11:16
Comment (Coulson): Got the following error. Any solution for this? ValueError: Expected page_content is string, got <class 'NoneType'> instead.
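Regarding the NoneType error in the comment: one plausible cause, assuming jq_schema='.content' was used as in the answer, is that at least one file has no top-level content key, so the schema selects nothing. The sketch below is a standard-library check for that, not part of LangChain; find_files_without_string_key and the demo files are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def find_files_without_string_key(folder, key="content"):
    """List *.json files whose top-level `key` is missing or not a string."""
    problems = []
    for path in sorted(Path(folder).rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Only a JSON object can have a top-level key
        value = data.get(key) if isinstance(data, dict) else None
        if not isinstance(value, str):
            problems.append((path.name, type(value).__name__))
    return problems


# Throwaway demo folder: one good file and two problematic ones
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "good.json").write_text(json.dumps({"content": "ok"}), encoding="utf-8")
(demo_dir / "bad.json").write_text(json.dumps({"text": "no content key"}), encoding="utf-8")
(demo_dir / "num.json").write_text(json.dumps({"content": 3}), encoding="utf-8")

problems = find_files_without_string_key(demo_dir)
for name, found_type in problems:
    print(f"{name}: expected str under 'content', found {found_type}")
```

If this flags files in your folder, either adjust the jq_schema to match their structure or exclude those files with a narrower glob.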
