How to return all documents for each bucket after ElasticSearch term aggregation?

I use the following simple query to search across documents in my Elastic index:

{
    "query": { "query_string": { "query": "*test*" } },
    "aggregations": {
        "myaggregation": {
            "terms": { "field": "myField.raw", "size": 0 }
        }
    }
}

This returns the number of documents per distinct value of myField.raw.
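For reference, here is a minimal Python sketch of reading those per-value counts out of the response. It assumes the standard terms aggregation response shape (`aggregations.<name>.buckets[]`, each bucket with `key` and `doc_count`). The helper name `bucket_counts` and the sample response are made up for illustration.

```python
def bucket_counts(response: dict, agg_name: str = "myaggregation") -> dict:
    """Map each distinct term (bucket key) to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Hypothetical, trimmed-down response for illustration only.
sample = {
    "aggregations": {
        "myaggregation": {
            "buckets": [
                {"key": "alpha", "doc_count": 3},
                {"key": "beta", "doc_count": 1},
            ]
        }
    }
}

print(bucket_counts(sample))  # {'alpha': 3, 'beta': 1}
```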

Since I'm interested in the actual documents rather than just their count, I tried adding the following top_hits sub-aggregation:

{
    "query": { "query_string": { "query": "*test*" } },
    "aggregations": {
        "myaggregation": {
            "terms": { "field": "myField.raw", "size": 0 },
            "aggregations": {
                "hits": {
                    "top_hits": { "size": 2000000 }
                }
            }
        }
    }
}

This ugly usage of top_hits works, but is slow as hell.
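With the sub-aggregation named `hits` as above, each bucket carries its documents under `bucket["hits"]["hits"]["hits"][]._source`. The following is a minimal Python sketch of collecting them per bucket key. The helper name `docs_per_bucket` and the sample response are hypothetical. Also note that newer Elasticsearch versions cap `from + size` for top_hits with the `index.max_inner_result_window` index setting (default 100), so a size like 2000000 will not work there unless that setting is raised.

```python
def docs_per_bucket(response: dict, agg_name: str = "myaggregation",
                    hits_name: str = "hits") -> dict:
    """Map each bucket key to the _source docs returned by its top_hits sub-aggregation."""
    result = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # top_hits nests a regular hits object: <hits_name>.hits.hits[]
        top = bucket[hits_name]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"] for hit in top]
    return result


# Hypothetical, trimmed-down response for illustration only.
sample = {
    "aggregations": {
        "myaggregation": {
            "buckets": [
                {
                    "key": "alpha",
                    "doc_count": 2,
                    "hits": {"hits": {"total": 2, "hits": [
                        {"_id": "1", "_source": {"title": "test one"}},
                        {"_id": "2", "_source": {"title": "a test"}},
                    ]}},
                },
                {
                    "key": "beta",
                    "doc_count": 1,
                    "hits": {"hits": {"total": 1, "hits": [
                        {"_id": "3", "_source": {"title": "testing"}},
                    ]}},
                },
            ]
        }
    }
}

print(docs_per_bucket(sample)["beta"])  # [{'title': 'testing'}]
```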

Is there any proper way to fetch the actual documents for each bucket after doing the terms aggregation?

Mandola asked 24/6, 2015 at 14:59 Comment(3)
No, there is no other way. Aggregations are not meant to return all documents. Also, returning all documents from Elasticsearch is not a good idea for any use case: it would be a very memory-intensive operation and also slow. - Assizes
Too bad. I now use the terms aggregation without any sub-aggregation and build up my specific result on the client. Thanks anyway! :) - Mandola
Suppose I can wait for top_hits, but it returns 100 hits per bucket and I have 1 million buckets. Suppose also that my from + size limit is the default 10000. Can I then get all the hits for all the buckets? - Kubis
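One possible reading of the client-side approach mentioned in the comments is to run the plain query, page through all matches (for example with the scroll API), and group the hits by field value yourself. Below is a minimal Python sketch of just the grouping step. It assumes that each hit's `_source` contains `myField` and that this value equals the `myField.raw` term. The helper `group_by_field` and the sample hits are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def group_by_field(hits: Iterable[dict], field: str = "myField") -> dict:
    """Group hit _source documents by the value of `field` (client-side 'buckets')."""
    groups = defaultdict(list)
    for hit in hits:
        source = hit["_source"]
        # Assumption: the raw (not_analyzed) value equals the value stored in _source.
        # Documents missing the field end up under the None key.
        groups[source.get(field)].append(source)
    return dict(groups)


# Hypothetical hits, e.g. collected page by page from a scroll.
hits = [
    {"_id": "1", "_source": {"myField": "alpha", "title": "test one"}},
    {"_id": "2", "_source": {"myField": "beta", "title": "testing"}},
    {"_id": "3", "_source": {"myField": "alpha", "title": "a test"}},
]

grouped = group_by_field(hits)
print({k: len(v) for k, v in grouped.items()})  # {'alpha': 2, 'beta': 1}
```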

Have you considered using collapse on the field?

It returns the documents grouped under inner_hits (hits.hits[].inner_hits.<collapse-group-name>.hits.hits[]._source).

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-collapse.html
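Here is a minimal Python sketch of what such a collapse request body could look like for the query in the question, plus a helper that walks the inner_hits path given above. In a collapse request, the top-level `size` is the number of groups returned per page. The inner_hits `size` is the number of documents per group, which is also capped by `index.max_inner_result_window`. The collapse field must be a single-valued keyword or numeric field with doc_values, hence `myField.raw`. The group name `docs`, the helper names, and the sample response are hypothetical. The code also assumes the collapse value is echoed back under each top hit's `fields`, as in the collapse examples in the docs.

```python
def collapse_body(query: str, field: str = "myField.raw",
                  group_name: str = "docs", per_group: int = 100,
                  groups: int = 10) -> dict:
    """Build a search body that collapses hits on `field` and expands each group."""
    return {
        "query": {"query_string": {"query": query}},
        "size": groups,  # number of groups (collapsed top hits) per page
        "collapse": {
            "field": field,  # single-valued keyword/numeric field with doc_values
            "inner_hits": {"name": group_name, "size": per_group},
        },
    }


def docs_per_group(response: dict, field: str = "myField.raw",
                   group_name: str = "docs") -> dict:
    """Map each collapse key to the _source docs in its inner_hits group."""
    result = {}
    for top in response["hits"]["hits"]:
        key = top["fields"][field][0]  # collapse value echoed back per group
        inner = top["inner_hits"][group_name]["hits"]["hits"]
        result[key] = [hit["_source"] for hit in inner]
    return result


body = collapse_body("*test*")

# Hypothetical, trimmed-down response for illustration only.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "fields": {"myField.raw": ["alpha"]},
                "inner_hits": {"docs": {"hits": {"total": 2, "hits": [
                    {"_id": "1", "_source": {"title": "test one"}},
                    {"_id": "3", "_source": {"title": "a test"}},
                ]}}},
            },
            {
                "_id": "2",
                "fields": {"myField.raw": ["beta"]},
                "inner_hits": {"docs": {"hits": {"total": 1, "hits": [
                    {"_id": "2", "_source": {"title": "testing"}},
                ]}}},
            },
        ]
    }
}

print(docs_per_group(sample))
```

Paging through the groups then works with the usual top-level `from`/`size`, rather than growing a single huge top_hits size.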

Deflagrate answered 30/4, 2021 at 7:48 Comment(0)
