SOLR - Grouping results with group.limit returns wrong numFound

When I run a grouped search with group.limit set, numFound is the same as when I don't set the limit.

It looks like Solr first runs the search and computes numFound, and only then limits the results.

I can't use pagination or similar approaches. Is there a workaround, or did I miss something?


Example:

======================================
| id |  publisher | book_title      |
======================================
| 1  | A1         | Title Book      |
| 2  | A1         | Book title 123  |
| 3  | A1         | My book         |
| 4  | B2         | Hi book title   |
| 5  | B2         | Another Book    |

If I run this query:

q=book_title:book
&group=true 
&group.field=publisher 
&group.limit=1
&group.main=true 

I get numFound 5, but only 2 documents in the results:

"response": {
    "numFound": 5,
    "docs": [
        {
            "book_title": "My book",
            "publisher":  "A1"
        },
        {
            "book_title": "Another Book",
            "publisher":  "B2"
        }
    ]
}
Equivocate answered 25/12, 2013 at 18:30 Comment(0)
F
4

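Set group.ngroups to true. The grouped response then reports the number of groups (ngroups) alongside the number of matching documents (matches):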

"grouped": {
"bl_version_id": {
  "matches": 53,
  "ngroups": 18,
  "groups": [
    {
...
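
A minimal Python sketch of this approach, applied to the question's publisher example. The base URL, the core name `books`, and the hand-written sample response are assumptions for illustration only. group.main=true is left out on purpose, since it flattens the response and drops the grouped section that carries ngroups.

```python
from urllib.parse import urlencode


def build_group_query(base_url, query, group_field, group_limit=1):
    """Build a select URL that asks Solr to report the number of groups."""
    params = [
        ("q", query),
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", str(group_limit)),
        ("group.ngroups", "true"),  # adds "ngroups" to the grouped response
        ("wt", "json"),
    ]
    return f"{base_url}?{urlencode(params)}"


def group_counts(response, group_field):
    """Return (matches, ngroups) from a parsed grouped JSON response."""
    grouped = response["grouped"][group_field]
    return grouped["matches"], grouped["ngroups"]


# Hypothetical endpoint; adjust host and core name to your setup.
url = build_group_query(
    "http://localhost:8983/solr/books/select", "book_title:book", "publisher"
)

# Hand-written sample shaped like the response shown above (not real output).
sample_response = {
    "grouped": {"publisher": {"matches": 5, "ngroups": 2, "groups": []}}
}
matches, ngroups = group_counts(sample_response, "publisher")
print(url)
print(matches, ngroups)
```

Here matches plays the role of numFound (all matching documents), while ngroups is the count the question was after.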
Foggy answered 24/6, 2014 at 11:1 Comment(2)
It's important not to use group.main=true, which discards this information. - Spatola
@Spatola well said; the result obtained without it has enough information to sort the groups manually. - Enshroud
A
1

I had the same problem and couldn't find a way to fix the root cause, but I'll share my solution as a workaround.

What I did:

  1. Facet by the field I'm grouping on.
  2. Count the number of distinct facet values with a non-zero count. This will match the number of groups (2 in your case).

Add these faceting parameters to your query:

&facet=true
&facet.limit=-1
&facet.field=publisher

Notes:

  • This is a bit expensive, but it's the only way that worked for me (so far).
  • This will only work if publisher is not multi-valued
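
A small Python sketch of the counting step, assuming the default JSON facet layout, where each field is a flat list of alternating values and counts. The sample response is hand-written for illustration. Values with a count of 0 are skipped, because with the default facet.mincount of 0 Solr also lists terms that no matching document contains.

```python
def count_groups_from_facets(response, field):
    """Count distinct facet values that occur in at least one matching document."""
    flat = response["facet_counts"]["facet_fields"][field]
    counts = flat[1::2]  # layout is [value, count, value, count, ...]
    return sum(1 for count in counts if count > 0)


# Hand-written sample (not real output); "C3" stands for a publisher
# that exists in the index but has no document matching the query.
sample_response = {
    "facet_counts": {
        "facet_fields": {"publisher": ["A1", 3, "B2", 2, "C3", 0]}
    }
}
print(count_groups_from_facets(sample_response, "publisher"))
```

Adding &facet.mincount=1 to the request should make Solr omit the zero-count values itself, so the list length divided by two gives the same number.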
Among answered 27/12, 2013 at 21:54 Comment(0)
E
1

numFound indicates the total number of documents matched by the current query, so 5 is correct in your case. With group.limit=1, Solr returns at most 1 document per group, even if the group contains more documents. I suggest using group.limit=-1 in your query; it will return all 5 documents in the result.

For more information, see the links below.

solr fieldcollapsing and maximum group.limit

http://wiki.apache.org/solr/FieldCollapsing

Empathic answered 31/12, 2013 at 5:37 Comment(0)
E
1

group.limit isn't a real limit; it only sets the number of rows to return per group.

There is no easy solution implemented in Solr for my problem.

You may find an answer here: Solr User Group

Equivocate answered 19/12, 2014 at 8:41 Comment(0)
B
-1

numFound refers to the total number of documents found by Solr after executing your query, which is also what you need in order to paginate over that query.

Pagination in Solr works much like it does in a regular RDBMS: you use the start and rows parameters. For instance, the following query fetches 10 documents starting from offset 20:

?q=you_key_word&start=20&rows=10

This query fetches the content for the target page (page 3 in this case, assuming 10 documents per page). Instead of executing another query to get the total number of documents and work out the number of pages, you get this information for free in the value of "numFound".
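
The page arithmetic can be sketched in Python as follows. The helper name `page_info` is made up for illustration, and the numFound value of 53 is just an example figure.

```python
import math


def page_info(start, rows, num_found):
    """Return (current_page, total_pages) for a zero-based start offset."""
    current_page = start // rows + 1
    total_pages = math.ceil(num_found / rows)
    return current_page, total_pages


# start=20 with rows=10 is page 3; 53 matching documents need 6 pages.
print(page_info(20, 10, 53))
```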

Hope this helps

Backhand answered 25/12, 2013 at 18:47 Comment(2)
But the value Solr reports isn't correct, because I have a limit on every group. I am not showing all results, only a limited set, so numFound is higher than expected. - Equivocate
In any case, numFound refers to the TOTAL number of documents; you have to keep that in mind. - Hibernate
