Word frequency in Solr

I am trying to get word frequencies using Solr. When I send this query:

localSolr/solr/select?q=someQuery&rows=0&facet=true&facet.field=content&wt=xml

Solr returns frequencies like this:

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="content">
<int name="word1">24</int>
<int name="word2">12</int>
<int name="word3">8</int>

But when I count the words myself, I find that the actual count for word2 is 13. Solr counts repeated occurrences of the same word within a field only once.

For example, if the field text is "word2 word5 word7 word9 word2", Solr returns a count of 1 for word2 instead of 2. It also returns 1 as the count of word2 for each of the two sentences below:

word2 word10 word11 word12
word2 word9 word7 word2 word23

So the returned frequencies are wrong. I have looked through the facet parameters but didn't find a suitable one. How can I make Solr count repeated occurrences of the same word within a sentence?
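To make the mismatch concrete, here is a small Python sketch (plain Python, not Solr code) that computes both numbers for the two example sentences. Whitespace splitting is an assumption standing in for the real `text_tr` analyzer, and the function names are hypothetical. The counts described above are consistent with a per-document count, where each document contributes at most 1 per word.

```python
from collections import Counter

# The two example field values from the question.
docs = [
    "word2 word10 word11 word12",
    "word2 word9 word7 word2 word23",
]

def document_frequency(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for text in docs:
        # set() collapses repeated words, so each document adds at most 1.
        df.update(set(text.split()))
    return df

def total_term_frequency(docs):
    """Count every occurrence of each word across all documents."""
    tf = Counter()
    for text in docs:
        tf.update(text.split())
    return tf

df = document_frequency(docs)
tf = total_term_frequency(docs)
print("per-document count of word2:", df["word2"])
print("total occurrences of word2:", tf["word2"])
```

The first number matches what the facet output appears to report; the second is the number being asked for.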

Edit: relevant part of schema.xml:

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
    <field name="content" type="text_tr" stored="true" indexed="true" multiValued="true"/>
    <copyField source="content" dest="text"/>
    <field name="text" type="text_tr" stored="false" indexed="true" multiValued="true"/>
Mariselamarish answered 23/10, 2012 at 13:28

If the field you're faceting on is multivalued, then each word in the facet gets the proper count.

I forgot to mention one thing: the Term Vector Component will get you what you need.

In the query, tv.tf returns the term frequency for each term, while tv.fl tells Solr which fields the frequencies should be calculated for.

NB: this makes indexing slower than it is now, so you will have to try it and see.
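As a rough sketch of what such a request could look like, the Python snippet below builds a URL with the tv, tv.tf and tv.fl parameters. It assumes a request handler that includes the Term Vector Component is registered under the path /tvrh in solrconfig.xml, that the field is indexed with termVectors="true", and that Solr runs at http://localhost:8983/solr. None of these are confirmed by the question, and `term_vector_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def term_vector_url(base_url, query, field, handler="/tvrh"):
    """Build a Term Vector Component request URL.

    Assumptions: `handler` is a request handler wired to the
    TermVectorComponent, and `field` stores term vectors.
    """
    params = {
        "q": query,
        "rows": 10,         # term vectors are returned per matching document
        "tv": "true",       # enable the Term Vector Component
        "tv.tf": "true",    # include the term frequency of each term
        "tv.fl": field,     # field(s) to return term vectors for
        "wt": "xml",
    }
    return f"{base_url}{handler}?{urlencode(params)}"

url = term_vector_url("http://localhost:8983/solr", "someQuery", "content")
print(url)
```

The term frequencies come back per document, so a total across the result set would have to be summed on the client side.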

Harmattan answered 23/10, 2012 at 13:30 Comment(5)
Thank you for your answer. I changed the field to set the multiValued parameter to true, but it still returns the wrong counts. - Mariselamarish
Can you post your schema.xml, so maybe I can give you more info? - Harmattan
Sorry, I cannot post the whole schema.xml, but I edited the question and added the relevant part. I hope it helps. - Mariselamarish
Nice explanation there. - Impropriate
@Samuele and yns: I know it's been a while since you asked/answered this question, but I have a similar problem. I followed the TermVectorComponent guidelines, but I can't figure out what to change in the HTTP request shown in yns's question after setting up TermVectorComponent for the 'text' field. - Moorer

Use the Luke request handler:

http://localhost:8983/solr/admin/luke?fl=YOUR_TEXT_FIELD&numTerms=500

More info: http://wiki.apache.org/solr/LukeRequestHandler

Debarath answered 23/10, 2012 at 16:20 Comment(0)