Solr-Retrieve name of document where the word is found - McMap


Asked 10/9, 2016 at 14:45 Answered 13/9, 2016 at 15:40

solr lucene apache-zookeeper solrcloud


I am using queries (Solr Admin) to search for words in two text documents that are stored in my HDFS. How can I retrieve the name of the document in which a word is found? I am using this project: https://github.com/lucidworks/hadoop-solr

I am creating a collection using bin/solr -e cloud, and I am using "data_driven_schema_configs" from the server/solr/configsets/ directory.

I tried adding <field name="fileName" type="string" indexed="true" stored="true" /> to managed-schema in ~/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf, and I also tried renaming that file to schema.xml. However, this directory does not contain any dataConfig file where I could add <field column="file" name="fileName"/>, as I have seen in other posts with similar questions. Those posts are not about SolrCloud, so I don't know whether what I am trying is correct. What changes do I have to make, and in which directories, to make this work?

Example: I am searching for the word "greatest", which can be found in both documents. How can I see which document each result comes from, sample1.txt or sample2.txt?

Cubature answered 10/9, 2016 at 14:45 Comment(5)

If those are the only fields in your index that describe the documents, you can't. How did you generate the index files? Those id values seem to be actual text from the documents, and are not suitable unique ids. – Outlier 10/9, 2016 at 22:15

I am using this project github.com/LucidWorks/hadoop-solr @Outlier – Cubature 11/9, 2016 at 12:59

You should start by reading the Solr basics before asking. As @Outlier said, the first thing is that you should provide suitable unique ids for the id field. The actual text from the documents should be indexed in an appropriate text field, see Solr Field Types. Also, if you want the names of the matched documents, why not index and store the names of the documents? – Upton 15/9, 2016 at 9:0

@Cubature please provide a sample of the data you send to Solr, along with the update request. Are you running Solr in schemaless mode? – Upton 17/9, 2016 at 19:47

@n0tting I forgot to mention that I am using SolrCloud. The data that I am using is some books in .txt format from gutenberg.org – Cubature 17/9, 2016 at 21:1


Same thing I said when you mentioned this question on IRC:

Your Solr schema must contain a field where you put the name, set to stored="true", and you must include that field, with a relevant value, in every document when you index. Most schema changes require a full reindex.

https://wiki.apache.org/solr/HowToReindex
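As a rough illustration of the point above, here is a minimal Python sketch of what "a stored field, filled in for every document" can look like when talking to Solr over HTTP. It is not how hadoop-solr does it; it assumes you index directly through the JSON update handler. The field name fileName comes from the question, while the body field, the helper names, and the collection URL in the comments are hypothetical. The add-field command follows the Schema API, which in SolrCloud updates the schema of the live collection (the collection's config lives in ZooKeeper, not in the configsets directory on disk).

```python
import json
import urllib.request


def add_field_command(name="fileName"):
    """Schema API command that adds a stored, indexed string field."""
    return {
        "add-field": {
            "name": name,
            "type": "string",
            "indexed": True,
            "stored": True,  # stored, so the value can be returned in results
        }
    }


def build_docs(texts_by_file, name_field="fileName", text_field="body"):
    """Turn {file name: file text} into Solr documents.

    Every document gets a unique id and the file name; 'body' is an
    assumed field name for the text, adjust it to your schema.
    """
    return [
        {"id": file_name, name_field: file_name, text_field: text}
        for file_name, text in sorted(texts_by_file.items())
    ]


def post_json(url, payload):
    """POST a JSON payload to Solr (defined here, not called in the sketch)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage against a local collection named "books":
#   base = "http://localhost:8983/solr/books"
#   post_json(base + "/schema", add_field_command())
#   post_json(base + "/update?commit=true", build_docs({...}))
```

After the field has been added, documents that were indexed earlier still have no fileName value, which is why the reindex is needed.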

Crider answered 13/9, 2016 at 15:40 Comment(4)

I have added this line to managed-schema: <field name="fileName" type="string" indexed="true" stored="true" /> in this directory: /solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf. Is that what you mean? – Cubature 14/9, 2016 at 21:28

And did you ensure that this field is not only present, but also filled during the indexing process? And how should the old documents in your index get a value in that field? Someone needs to write it in there. So, did you re-index after the schema extension? – Bereave 19/9, 2016 at 11:5

@Crider what do you mean by "and you must include that field, with a relevant value, in every document when you index"? – Cubature 19/9, 2016 at 11:41

In order for a field to actually be useful, it must be populated. For the specific example (seeing a filename when indexing files) it's particularly important that every document actually include a filename field, and that the filename field has something useful in it. – Crider 27/9, 2016 at 16:49
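Once every document carries a populated, stored fileName, the name can be requested with the fl parameter of a normal select query. The following Python sketch only builds the query URL and reads file names out of a response; the base URL, the collection name books, the body field, and the helper names are assumptions for illustration, not something taken from the setup in the question.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, word,
                     text_field="body", name_field="fileName"):
    """Build a /select URL that searches one word and returns id and file name."""
    params = {
        "q": f"{text_field}:{word}",  # assumed text field, adjust to your schema
        "fl": f"id,{name_field}",     # only stored fields can be returned here
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def file_names(solr_response, name_field="fileName"):
    """Collect the distinct file names from a parsed Solr JSON response."""
    docs = solr_response["response"]["docs"]
    return sorted({doc[name_field] for doc in docs if name_field in doc})


# Hypothetical usage:
#   url = build_select_url("http://localhost:8983/solr", "books", "greatest")
#   ... fetch url, parse the JSON, then call file_names(parsed)
```

Documents that were indexed without a fileName value simply lack the field in the response, which is why file_names skips them.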
