Filtering, sorting and paginating by sub-aggregations in ElasticSearch 6
Asked Answered
B

1

7

I have a collection of documents, where each document indicates the available rooms for a given hotel and day, and their cost for that day:

{
    "hotel_id": 2016021519381313,
    "day": "20200530",
    "rooms": [
        {
            "room_id": "00d70230ca0142a6874358919336e53f",
            "rate": 87
        },
        {
            "room_id": "675a5ec187274a45ae7a5fdc20f72201",
            "rate": 53
        }
    ]
}

Being the mapping:

{
    "properties": {
        "day": {
            "type": "keyword"
        },
        "hotel_id": {
            "type": "long"
        },
        "rooms": {
            "type": "nested",
            "properties": {
                "rate": {
                    "type": "long"
                },
                "room_id": {
                    "type": "keyword"
                }
            }
        }
    }
}

I am trying to figure out, how to do a query where I can get the available rooms for a set of days which total cost is less than a given amount, ordered by total cost in ascending order and paginated.

So far I came up with the way of getting rooms available for the set of days and their total cost. Basically filtering by the days, and grouping per hotel and room IDs, requiring that the minimum count in the aggregation is the number of days I am looking for.

{
    "size" : 0,
    "query": {
        "bool": { 
            "must": [
                {
                    "terms" : {
                        "day" : ["20200423", "20200424", "20200425"]
                    }
                }
            ]
        } 
    } ,
    "aggs" : {
        "hotel" : {
            "terms" : { 
                "field" : "hotel_id"
            },
            "aggs" : {
                "rooms" : {
                    "nested" : {
                        "path" : "rooms"
                    },
                    "aggs" : {
                        "rooms" : {
                            "terms" : {
                                "field" : "rooms.room_id",
                                "min_doc_count" : 3
                            },
                            "aggs" : {
                                "sum_price" : { 
                                    "sum" : { "field" : "rooms.rate" } }
                            }
                        }

                    }
                }
            }
        }
    }
}

So now I am interesting in ordering the result buckets in descending order at the "hotel" level based on the value of the sub-aggregation with "rooms", and also filtering the buckets that do not contains enough documents or which "sum_price" is bigger than a given budget. But I cannot manage how to do it.

I have been taking a look at "bucket_sort", but I cannot find the way to sort in base a subaggregation. I have been also taking a look to "bucket_selector", but it gives me empty buckets when they do not fit the predicate. I am probably not using them correctly in my case.

Which would be the right way of accomplish it?

Beautiful answered 24/7, 2019 at 17:13 Comment(0)
G
2

Here is the query without pagination:

{
   "size":0,
   "query":{
      "bool":{
         "must":[
            {
               "terms":{
                  "day":[
                     "20200530",
                     "20200531",
                     "20200532"
                  ]
               }
            }
         ]
      }
   },
   "aggs":{
      "rooms":{
         "nested":{
            "path":"rooms"
         },
         "aggs":{
            "rooms":{
               "terms":{
                  "field":"rooms.room_id",
                  "min_doc_count":3,
                  "order":{
                     "sum_price":"asc"
                  }
               },
               "aggs":{
                  "sum_price":{
                     "sum":{
                        "field":"rooms.rate"
                     }
                  },
                  "max_price":{
                     "bucket_selector":{
                        "buckets_path":{
                           "var1":"sum_price"
                        },
                        "script":"params.var1 < 100"
                     }
                  }
               }
            }
         }
      }
   }
}

Please note that the following variables should be changed for the desired results:

  • day
  • min_doc_count
  • script in max_price
Geranial answered 2/8, 2019 at 16:34 Comment(6)
wow thanks! I am pretty new to ES, could you explain a little what I got wrong in my query?Beautiful
First, you had extra bucket creation based on hotel ids.Geranial
Second, you needed filtering based on the total cost. For this purpose, bucket selectors are added.Geranial
Third, for sorting based on the total cost, order is used.Geranial
For further reading, you can search for pipeline aggregations.Geranial
You can also check this question: #46908860 your case can translate to SQL as: SELECT ... GROUP BY ... HAVINGGeranial

© 2022 - 2024 — McMap. All rights reserved.