llama_index get the document referenced from node_sources

I'm getting good results from llama_index after indexing PDFs, but I can't tell which PDF an answer was based on. result.node_sources uses a doc ID that appears to be generated internally. How can I get a reference back to the source document?

Rover answered 22/5, 2023 at 16:13

I got this answer directly from the LlamaIndex team:

Thanks for the questions and for your support of LlamaIndex. There are a few general approaches you can take:

  • Inject metadata, such as the file name or a link, into the extra_info of each Document. Many LlamaHub loaders already add metadata to extra_info automatically, but you can add or remove entries yourself if you'd like. This extra_info is propagated to each Node. When you get a response from a query engine, read response.source_nodes to fetch the relevant sources.

These sources will contain both the original text as well as the metadata. Take a look at this doc: https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/documents_and_nodes/usage_documents.html

  • Assuming you add the appropriate metadata to the extra_info field, you can modify either the query string or the QA/refine prompts to say something like "Please cite sources along with your answer".

You can simply append that sentence to the query string. For customizing prompts, take a look at https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html
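
As a rough sketch of both approaches (my own illustration, not part of the team's reply): the helper below walks response.source_nodes and pulls the file name out of each node's metadata, and a second helper appends the citation request to a query string. The names describe_sources and with_citation_request are made up, and the attribute layout (a wrapper with .node and .score, and a node with .metadata or the older .extra_info) is an assumption that may not match your installed version. The fake objects only stand in for a real query-engine response so the snippet runs without llama_index installed.

```python
from types import SimpleNamespace

CITATION_REQUEST = "Please cite sources along with your answer."


def with_citation_request(question):
    """Append the citation request to a query string."""
    return f"{question.rstrip()}\n{CITATION_REQUEST}"


def describe_sources(response):
    """Return one dict per source node: file name, score, and a text preview."""
    sources = []
    for source in getattr(response, "source_nodes", None) or []:
        # Assumed layout: a wrapper exposing .node and .score; fall back to
        # treating the item itself as the node if there is no wrapper.
        node = getattr(source, "node", source)
        # Newer releases call the field `metadata`, older ones `extra_info`.
        meta = getattr(node, "metadata", None) or getattr(node, "extra_info", None) or {}
        text = getattr(node, "text", "") or ""
        sources.append({
            "file_name": meta.get("file_name"),
            "score": getattr(source, "score", None),
            "preview": text[:80],
        })
    return sources


# Stand-in objects that mimic the assumed shape of a query-engine response.
fake_response = SimpleNamespace(source_nodes=[
    SimpleNamespace(
        node=SimpleNamespace(metadata={"file_name": "report.pdf"}, text="Revenue grew in Q2."),
        score=0.87,
    ),
    SimpleNamespace(
        node=SimpleNamespace(extra_info={"file_name": "notes.pdf"}, text="Meeting notes."),
        score=0.42,
    ),
])

for entry in describe_sources(fake_response):
    print(entry)

print(with_citation_request("What was the Q2 revenue?"))
```

With a real index, you would pass the object returned by your query engine to describe_sources instead of fake_response.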

Sternwheeler answered 1/6, 2023 at 15:15

It seems that they changed 'extra_info' to 'metadata'.

I used this code and it works perfectly:

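    import re

    # `response` is the object returned by the query engine
    if hasattr(response, 'metadata'):
        # Stringify the metadata so the regex can scan it
        document_info = str(response.metadata)
        # Extract each "page_label ... file_name" pair from the string
        find = re.findall(r"'page_label': '[^']*', 'file_name': '[^']*'", document_info)

        print('\n' + '=' * 60 + '\n')
        print('Context Information')
        print(str(find))
        print('\n' + '=' * 60 + '\n')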
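
A less brittle variant is to read the metadata directly instead of matching against its string form, since the regex depends on the exact key order and quoting. This is only a sketch: it assumes response.metadata is a dict mapping node IDs to per-node metadata dicts with 'file_name' and 'page_label' keys, which is what the regex implies but which may differ by version and loader. The helper name extract_file_pages is made up for illustration.

```python
def extract_file_pages(metadata):
    """Collect unique (file_name, page_label) pairs from a metadata-like dict."""
    pairs = []
    for node_meta in (metadata or {}).values():
        # Skip entries that are not per-node metadata dicts.
        if not isinstance(node_meta, dict):
            continue
        file_name = node_meta.get("file_name")
        if file_name is None:
            continue
        pair = (file_name, node_meta.get("page_label"))
        # Keep first-seen order and drop duplicates.
        if pair not in pairs:
            pairs.append(pair)
    return pairs


# Hypothetical metadata in the assumed shape: {node_id: {key: value, ...}}
example_metadata = {
    "node-1": {"page_label": "3", "file_name": "a.pdf"},
    "node-2": {"page_label": "3", "file_name": "a.pdf"},
    "node-3": {"page_label": "7", "file_name": "b.pdf"},
}

for file_name, page_label in extract_file_pages(example_metadata):
    print(f"{file_name} (page {page_label})")
```

In the answer's setting you would call extract_file_pages(response.metadata) after the hasattr check.
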
Supersensible answered 17/8, 2023 at 8:35
