Solr: retrieve the name of the document where a word is found

I am using queries (in the Solr Admin UI) to search for words in two text documents stored in my HDFS. How can I retrieve the name of the document in which the word is found? I am using this project: https://github.com/lucidworks/hadoop-solr

I create the collection with bin/solr -e cloud, using the "data_driven_schema_configs" configset from the server/solr/configsets/ directory.

I tried adding <field name="fileName" type="string" indexed="true" stored="true" /> to managed-schema in ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and also renaming that file to schema.xml. However, this directory has no dataConfig file where I could add <field column="file" name="fileName"/>, as suggested in other posts with similar questions (none of them about SolrCloud), so I don't know whether what I am trying is correct. What changes do I have to make, and in which directories, to make this work?
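
As far as I know, in SolrCloud the configset is uploaded to ZooKeeper when the collection is created, so editing files under server/solr/configsets afterwards does not change the running collection. Because data_driven_schema_configs uses a managed schema, one alternative is the Schema API's add-field command. A minimal Python sketch that only builds such a request, assuming the default port 8983 and the example collection name gettingstarted (both assumptions, adjust to your setup):

```python
import json
from urllib.request import Request

# Assumed defaults: bin/solr -e cloud listens on port 8983 and suggests the
# collection name "gettingstarted"; adjust both to your setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"


def add_field_request(collection: str, name: str) -> Request:
    """Build (but do not send) a Schema API request adding a stored string field."""
    body = {
        "add-field": {
            "name": name,
            "type": "string",  # exact value, not tokenized
            "indexed": True,
            "stored": True,    # stored so the value is returned in search results
        }
    }
    return Request(
        f"{SOLR_BASE}/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = add_field_request(COLLECTION, "fileName")
# Against a running Solr, urllib.request.urlopen(req) would send it.
```

Adding the field only defines it; existing documents still have no value in it until they are reindexed.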

Example: I am searching for the word "greatest", which can be found in both documents. How can I see which document each result comes from, sample1.txt or sample2.txt?
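
Once a stored fileName field exists and is populated, the file name can be requested per result through the fl (field list) parameter. A small Python sketch that builds such a /select URL; the host, port, and collection name are assumptions:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed default host and port


def select_url(collection: str, word: str) -> str:
    """Build a /select URL that searches for `word` and returns id and fileName."""
    params = {
        "q": word,            # matched against the request handler's default field (df)
        "fl": "id,fileName",  # field list: return only these stored fields
        "wt": "json",
    }
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


url = select_url("gettingstarted", "greatest")
print(url)
```

Each hit in the response would then carry its fileName, e.g. sample1.txt or sample2.txt, provided that value was sent at index time.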

Cubature answered 10/9, 2016 at 14:45 Comment(5)
If those are the only fields in your index that describe the documents, you can't. How did you generate the index files? Those id values seem to be actual text from the documents, and are not suitable unique ids. - Outlier
@Outlier I am using this project: github.com/LucidWorks/hadoop-solr - Cubature
You should read up on Solr basics before asking. As @Outlier said, the first thing is to provide suitable unique ids for the id field. The actual text of the documents should be indexed in an appropriate text field; see Solr Field Types. Also, if you want the names of the matched documents, why not index and store the document names? - Upton
@Cubature please provide a sample of the data you send to Solr, along with the update request. Are you running Solr in schemaless mode? - Upton
@n0tting I forgot to mention that I am using SolrCloud. The data I am using is some books in .txt format from gutenberg.org. - Cubature

Same thing I said when you mentioned this question on IRC:

Your Solr schema must contain a field where you put the name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.
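
As an illustration of "include that field, with a relevant value, in every document", here is a minimal Python sketch that turns text files into Solr JSON update documents, one per file, with a unique id and a fileName. It is not how hadoop-solr builds its documents; the field name content is hypothetical and should map to a tokenized text type (for example text_general), not a string field:

```python
import json
from pathlib import Path


def make_doc(path: str, text: str) -> dict:
    """One Solr document per file: unique id, stored file name, full text."""
    return {
        "id": path,                   # the full path is unique per file
        "fileName": Path(path).name,  # e.g. "sample1.txt"
        "content": text,              # hypothetical field; should be a text type
    }


def docs_from_files(paths) -> list:
    """Read each file (UTF-8 assumed) and build its document."""
    return [make_doc(str(p), Path(p).read_text(encoding="utf-8")) for p in paths]


def update_body(docs) -> bytes:
    """JSON array of documents, as accepted by Solr's /update handler."""
    return json.dumps(docs).encode("utf-8")
```

The resulting body can be POSTed to /solr/<collection>/update?commit=true with Content-Type: application/json.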

https://wiki.apache.org/solr/HowToReindex

Crider answered 13/9, 2016 at 15:40 Comment(4)
I have added this line to managed-schema: <field name="fileName" type="string" indexed="true" stored="true" /> in this directory: /solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf. Is that what you mean? - Cubature
And did you make sure that this field is not only present, but also filled during the indexing process? How would old documents in your index get a value in that field? Someone needs to write it there. Also, did you reindex after extending the schema? - Bereave
@Crider what do you mean by "and you must include that field, with a relevant value, in every document when you index"? - Cubature
In order for a field to actually be useful, it must be populated. For your specific example (seeing a filename when indexing files), it's particularly important that every document actually includes a filename field, and that the filename field has something useful in it. - Crider
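
A quick pre-indexing check along those lines, as a hedged Python sketch: it lists the documents whose fileName is missing or blank. The document dicts and the sample3.txt entry are hypothetical illustrations:

```python
def docs_without_file_name(docs) -> list:
    """Return the ids of documents whose fileName is missing or blank."""
    return [
        d.get("id")
        for d in docs
        if not str(d.get("fileName") or "").strip()
    ]


batch = [
    {"id": "/data/sample1.txt", "fileName": "sample1.txt"},
    {"id": "/data/sample2.txt"},                    # field missing
    {"id": "/data/sample3.txt", "fileName": "  "},  # present but useless
]
print(docs_without_file_name(batch))
```

Only the first document would show a usable file name in search results.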

© 2022 - 2024 — McMap. All rights reserved.