How to group results in Elasticsearch?

I am storing book titles in Elasticsearch, and each title can be sold by several stores, like this:

{
    "books": [
        {
            "id": 1,
            "title": "Title 1",
            "store": "store1" 
        },
        {             
            "id": 2,
            "title": "Title 1",
            "store": "store2" 
        },
        {             
            "id": 3,
            "title": "Title 1",
            "store": "store3" 
        },
        {             
            "id": 4,
            "title": "Title 2",
            "store": "store2" 
        },
        {             
            "id": 5,
            "title": "Title 2",
            "store": "store3" 
        }
    ]
}

How can I get all the books grouped by title, with one result per group (one row per title, so that I can get all the ids and stores for that title)?

Based on the data above, I want to get two results, each containing all the ids and stores for its title.

Expected results:

{
"hits":{
    "total" : 2,
    "hits" : [
        {                
            "0" : {
                "title" : "Title 1",
                "group": [
                     {
                         "id": 1,
                         "store": "store1"
                     },
                     {
                         "id": 2,
                         "store": "store2"
                     },
                     {
                         "id": 3,
                         "store": "store3"
                     }
                ]
            }
        },
        {                
            "1" : {
                "title" : "Title 2",
                "group": [
                     {
                         "id": 4,
                         "store": "store2"
                     },
                     {
                         "id": 5,
                         "store": "store3"
                     }
                ]
            }
        }
    ]
}
}
Warp answered 15/4, 2014 at 14:51 Comment(2)
I've been looking for this kind of thing all day! ES is fast moving. Take a look at elastic.co/guide/en/elasticsearch/reference/current/…Devilfish
elastic.co/guide/en/elasticsearch/reference/current/… ?Secularity
S
10

What you are looking for is not possible in Elasticsearch, at least not with the current version (1.1).

There is a long-outstanding issue for this feature with a lot of +1's and demand behind it.

As for official statements: Simon says it requires a lot of refactoring and, although it is planned, there is no way of saying when it will be implemented or shipped.

Clinton Gormley made a similar statement in his webinar: field grouping needs a lot of effort to be done right, especially since Elasticsearch is sharded and distributed by nature. It would not be that big a deal if you ignored sharding, but Elasticsearch wants to ship only features that scale with the complete system and work as well on hundreds of machines as they do on a single box.

If you're not tied to Elasticsearch, Solr offers such a feature.

Otherwise, probably the best solution at the moment is to do this client side: query for some documents, do the grouping on your client and, if needed, fetch more results to satisfy your desired group size (as far as I know, this is what Solr does under the hood).
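As a rough illustration of the client-side approach, here is a minimal Python sketch that groups already-fetched hits by title. The function name `group_hits_by_title` and the hand-written `sample_hits` list are assumptions made for this example; in practice the list would come from `response["hits"]["hits"]` of a search request.

```python
def group_hits_by_title(hits):
    """Group search hits by the title in their _source, keeping first-seen order."""
    groups = {}
    for hit in hits:
        source = hit["_source"]
        # Collect the id and store of every hit that shares this title.
        groups.setdefault(source["title"], []).append(
            {"id": source["id"], "store": source["store"]}
        )
    return [{"title": title, "group": members} for title, members in groups.items()]


# Hand-written stand-in for response["hits"]["hits"] (assumption for this sketch).
sample_hits = [
    {"_source": {"id": 1, "title": "Title 1", "store": "store1"}},
    {"_source": {"id": 2, "title": "Title 1", "store": "store2"}},
    {"_source": {"id": 3, "title": "Title 1", "store": "store3"}},
    {"_source": {"id": 4, "title": "Title 2", "store": "store2"}},
    {"_source": {"id": 5, "title": "Title 2", "store": "store3"}},
]

grouped = group_hits_by_title(sample_hits)
```

If a page of hits does not yield enough groups, the client would need to fetch further pages and merge them into the same dictionary, which is the extra round-trip cost of this approach.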

Not exactly what you wanted, but you could also use aggregations: create one bucket per title with a terms aggregation and run a sub-aggregation on the id field. You won't get the store values this way, but you could retrieve them from your datastore once you have the ids.

{
    "aggs" : {
        "titles" : {
            "terms" : { "field" : "title" },
            "aggs": {
                "ids": {
                    "terms": { "field" : "id" }
                }
            }
        }
    }
}

Edit: It seems that result grouping could soon be implemented with the top_hits aggregation.

Simonetta answered 24/4, 2014 at 12:38 Comment(1)
The specified issue is already closed. The top hits aggregation functionality has been added. Lyricism
S
3

You can get the desired result by nesting aggregations and adding top_hits sub-aggregations. For example:

{
    "aggs": {
        "set": {
            "terms": {
                "field": "id"
            },
            "aggs": {
                "color": {
                    "terms": {
                        "field": "color"
                    },
                    "aggs": {
                        "products": {
                            "top_hits": {
                                "_source": {
                                    "include": ["size"]
                                }
                            }
                        }
                    }
                },
                "product": {
                    "top_hits": {
                        "_source": {
                            "include": ["productDetails"]
                        },
                        "size": 1
                    }
                }
            }
        }
    }
}
Shag answered 7/4, 2015 at 4:33 Comment(0)
C
0

Along similar lines to SQL's GROUP BY, Elasticsearch provides aggregations.

In response to an aggregation query, Elasticsearch returns buckets.

One bucket corresponds to one category (group).
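To make the bucket idea concrete, here is a small Python sketch that reads the buckets of a terms aggregation from a search response. The helper name `bucket_counts`, the aggregation name `titles`, and the trimmed `sample_response` are assumptions for illustration; the sketch also assumes the title field is indexed as a single unanalyzed value, since otherwise the buckets would be individual tokens rather than whole titles.

```python
def bucket_counts(response, agg_name):
    """Return {bucket key: doc_count} for one terms aggregation in a response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed, hand-written response body (assumption): one bucket per distinct title.
sample_response = {
    "aggregations": {
        "titles": {
            "buckets": [
                {"key": "Title 1", "doc_count": 3},
                {"key": "Title 2", "doc_count": 2},
            ]
        }
    }
}

counts = bucket_counts(sample_response, "titles")
```

Each bucket plays the role of one GROUP BY row: `key` is the grouped value and `doc_count` is the number of documents in that group.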

Congruency answered 10/1, 2015 at 18:49 Comment(0)
C
0

I had the same problem, and the best solution I have found is to change the mapping. You can change the mapping so that the field "store" is of type nested, because you have a many-to-many relationship. That way you can apply sorting and pagination. I hope this helps.
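One possible reading of this suggestion is to index a single document per title and keep its stores as nested objects. The Python sketch below shows a hypothetical mapping and a helper that reshapes the flat book records into that form. The field name `stores`, the helper `to_nested_docs`, and the typeless mapping layout are assumptions for this example; older Elasticsearch versions place the properties under a mapping type name.

```python
# Hypothetical mapping: one document per title, stores held as nested objects.
nested_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "stores": {
                "type": "nested",
                "properties": {
                    "id": {"type": "integer"},
                    "store": {"type": "keyword"},
                },
            },
        }
    }
}


def to_nested_docs(books):
    """Reshape flat book records into one document per title with nested stores."""
    docs = {}
    for book in books:
        doc = docs.setdefault(book["title"], {"title": book["title"], "stores": []})
        doc["stores"].append({"id": book["id"], "store": book["store"]})
    return list(docs.values())


books = [
    {"id": 1, "title": "Title 1", "store": "store1"},
    {"id": 2, "title": "Title 1", "store": "store2"},
    {"id": 3, "title": "Title 1", "store": "store3"},
    {"id": 4, "title": "Title 2", "store": "store2"},
    {"id": 5, "title": "Title 2", "store": "store3"},
]

nested_docs = to_nested_docs(books)
```

With this layout each search hit is already one group, so ordinary sorting and pagination operate on titles rather than on individual store entries.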

Chichi answered 13/12, 2018 at 19:49 Comment(0)
P
0

This is untested, but with a top_hits aggregation the query would look like this:

{
    "aggs" : {
        "titles" : {
            "terms" : { "field" : "title" },
            "aggs": {
                "books": {
                    "top_hits": {
                        "size": 100,
                        "_source": {
                            "includes": [
                                "*"
                            ]
                        }
                    }
                }
            }
        }
    }
}

The limitation here is that the top_hits aggregation allows a size of at most 100 by default (the limit is governed by the index.max_inner_result_window index setting).
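To get from such a response to the shape asked for in the question, the buckets still need to be unpacked on the client. The Python sketch below assumes that the top_hits aggregation (named `books`) is a sub-aggregation of the `titles` terms aggregation, so that each title bucket carries its own hits; the helper name `groups_from_top_hits` and the trimmed `sample_response` are made up for illustration.

```python
def groups_from_top_hits(response, terms_agg="titles", hits_agg="books"):
    """Turn terms buckets with a top_hits sub-aggregation into title groups."""
    groups = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        # Each bucket holds the hits of its top_hits sub-aggregation.
        members = [
            {"id": hit["_source"]["id"], "store": hit["_source"]["store"]}
            for hit in bucket[hits_agg]["hits"]["hits"]
        ]
        groups.append({"title": bucket["key"], "group": members})
    return groups


# Trimmed, hand-written response body (assumption for this sketch).
sample_response = {
    "aggregations": {
        "titles": {
            "buckets": [
                {
                    "key": "Title 1",
                    "doc_count": 3,
                    "books": {"hits": {"hits": [
                        {"_source": {"id": 1, "title": "Title 1", "store": "store1"}},
                        {"_source": {"id": 2, "title": "Title 1", "store": "store2"}},
                        {"_source": {"id": 3, "title": "Title 1", "store": "store3"}},
                    ]}},
                },
                {
                    "key": "Title 2",
                    "doc_count": 2,
                    "books": {"hits": {"hits": [
                        {"_source": {"id": 4, "title": "Title 2", "store": "store2"}},
                        {"_source": {"id": 5, "title": "Title 2", "store": "store3"}},
                    ]}},
                },
            ]
        }
    }
}

title_groups = groups_from_top_hits(sample_response)
```

Comparing each bucket's `doc_count` with the number of hits returned is a simple way to detect groups that were cut off by the top_hits size limit.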

Panegyric answered 20/12, 2023 at 10:32 Comment(0)
