I am learning Elasticsearch and would like to count distinct values. So far I can count values, but not distinct ones.
Here is the sample data:
curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 2,
  "RestaurantName": "Restaurant Brian",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
And what I tried so far:
curl -XPOST "http://localhost:9200/store/item/_search" -d '{
  "size": 0,
  "aggs": {
    "item": {
      "terms": {
        "field": "RestaurantName"
      }
    }
  }
}'
Output:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "item": {
      "buckets": [
        {
          "key": "restaurant",
          "doc_count": 3
        },
        {
          "key": "cecil",
          "doc_count": 2
        },
        {
          "key": "brian",
          "doc_count": 1
        }
      ]
    }
  }
}
How can I get the count for cecil to be 1 instead of 2?
cardinality serves the purpose here, but I would like to point out a few things: 1. The cardinality aggregation is an "approximate" algorithm based on the HyperLogLog++ (HLL) algorithm (static.googleusercontent.com/media/research.google.com/en//pubs/…). Quoting from the documentation: "HLL works by hashing your input and using the bits from the hash to make probabilistic estimations on the cardinality." There is a trade-off between "precision" and "memory". – Theis
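For illustration, here is a rough sketch of what a cardinality query against the sample data might look like. The index, type, and field names come from the question; the aggregation name distinct_restaurants, the helper function count_distinct, and the precision_threshold value of 100 are my own assumptions. The terms output above shows that RestaurantName was split into separate tokens (restaurant, cecil, brian), so this sketch counts distinct RestaurantId values instead; counting distinct full names would presumably need a non-analyzed version of that field, which depends on a mapping the question does not show.

```shell
# Sketch only: index, type, and field names are taken from the question.
# The aggregation name and the precision_threshold value are assumptions.
# precision_threshold is the knob for the precision/memory trade-off:
# a higher value gives more accurate counts at the cost of more memory.
QUERY='{
  "size": 0,
  "aggs": {
    "distinct_restaurants": {
      "cardinality": {
        "field": "RestaurantId",
        "precision_threshold": 100
      }
    }
  }
}'

# Hypothetical helper: sends the query to the local node used in the question.
# The Content-Type header is required by newer Elasticsearch releases and is
# ignored by older ones.
count_distinct() {
  curl -s -XPOST "http://localhost:9200/store/item/_search" \
    -H 'Content-Type: application/json' \
    -d "$QUERY"
}
```

Running count_distinct against the node from the question should return a response whose aggregations.distinct_restaurants.value field holds the estimated number of distinct values. For the three sample documents I would expect that to be 2, since they only contain RestaurantId 1 and 2.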