Count distinct values using elasticsearch

I am learning Elasticsearch and would like to count distinct values. So far I can count values, but not distinct ones.

Here is the sample data:

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 2,
  "RestaurantName": "Restaurant Brian",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

And what I tried so far:

curl -XPOST "http://localhost:9200/store/item/_search" -d '{
  "size": 0,
  "aggs": {
    "item": {
      "terms": {
        "field": "RestaurantName"
      }
    }
  }
}'

Output:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "item": {
      "buckets": [
        {
          "key": "restaurant",
          "doc_count": 3
        },
        {
          "key": "cecil",
          "doc_count": 2
        },
        {
          "key": "brian",
          "doc_count": 1
        }
      ]
    }
  }
}

How can I get the count of cecil as 1 instead of 2?

Congenial answered 9/7, 2014 at 16:15 Comment(0)

You have to use the cardinality aggregation, as mentioned by @coder. It is described in the docs.

$ curl -XGET "http://localhost:9200/store/item/_search" -d '
{
  "aggs": {
    "restaurant_count": {
      "cardinality": {
        "field": "RestaurantName",
        "precision_threshold": 100,
        "rehash": false
      }
    }
  }
}'

This worked for me ...
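
For anyone sending this from a script instead of curl, here is a minimal Python sketch of the same request using only the standard library. The host, index, and type are taken from the question; the helper names `build_cardinality_query`, `extract_count`, and `distinct_count` are hypothetical. The `rehash` option is left out because a comment on this answer reports an error with it, and the `Content-Type` header is added on the assumption that a newer Elasticsearch version may require it.

```python
import json
import urllib.request


def build_cardinality_query(field, precision_threshold=100):
    """Build the request body for a cardinality (approximate distinct count) aggregation."""
    return {
        "size": 0,  # skip the hits, only the aggregation result is needed
        "aggs": {
            "restaurant_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def extract_count(response, agg_name="restaurant_count"):
    """Read the estimated distinct count out of a parsed search response."""
    return response["aggregations"][agg_name]["value"]


def distinct_count(field, url="http://localhost:9200/store/item/_search"):
    """Send the query (assumes an Elasticsearch node is reachable at `url`)."""
    body = json.dumps(build_cardinality_query(field)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_count(json.load(reply))
```

Calling `distinct_count("RestaurantName")` would return the estimate for that field. The result is read from `aggregations.restaurant_count.value` in the response.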

Babylonia answered 4/10, 2014 at 9:54 Comment(3)
As @c24b correctly pointed out, cardinality serves the purpose here, but I would like to point out a few things: 1. The cardinality aggregation is an "approximate" algorithm based on the HyperLogLog++ (HLL) algorithm (static.googleusercontent.com/media/research.google.com/en//pubs/…). Quoting from the documentation: HLL works by hashing your input and using the bits from the hash to make probabilistic estimations on the cardinality. There is a trade-off between "precision" and "memory". - Theis
For more details read here: elastic.co/guide/en/elasticsearch/guide/current/… My apologies for only citing the link; I was not able to explain more due to space constraints. - Theis
Got this error with the rehash option: [7:23] [cardinality] rehash doesn't support values of type: VALUE_BOOLEAN - Rosetta
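
To make the "hash the input and use the bits" idea from the comments above more concrete, here is a small Python sketch of plain HyperLogLog. It is not the HyperLogLog++ variant that Elasticsearch uses, and the hash choice (SHA-1 truncated to 64 bits) and the function name `hll_estimate` are assumptions made for illustration only. The parameter `p` plays the role of the precision/memory trade-off: `2**p` registers are kept, and the typical relative error of plain HyperLogLog is roughly `1.04 / sqrt(2**p)`.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with 2**p registers (plain HyperLogLog)."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Derive a 64-bit hash of the value
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)                # the first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)     # the remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias correction, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting
        return m * math.log(m / zeros)
    return raw


print(hll_estimate(range(10000)))       # close to 10000, but not exact
print(hll_estimate(["a", "a", "b"]))    # close to 2
```

A larger `p` uses more memory and gives a tighter estimate, which is the same kind of trade-off that `precision_threshold` controls in the cardinality aggregation.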

You could use cardinality here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Goatsbeard answered 10/7, 2014 at 3:4 Comment(3)
I even tried: curl -XPOST "localhost:9200/store/item/_search" -d' {"size":0,"aggs":{"item":{"cardinality":{"field":"RestaurantName" } } } }' but I am not getting distinct counts. - Congenial
Should be curl -XGET? - Babylonia
And is it OK to rename the aggregation to item? - Babylonia
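
A possible reason for the unexpected counts: the buckets in the question (`restaurant`, `cecil`, `brian`) suggest that `RestaurantName` is an analyzed string field, so aggregations see lowercased word tokens rather than whole names. This is an assumption, since the mapping is not shown. The Python sketch below imitates that behaviour on the three sample documents without contacting Elasticsearch; `tokenize` is a rough stand-in for an analyzer, not its actual implementation.

```python
import re
from collections import Counter

# RestaurantName values of the three sample documents from the question
names = ["Restaurant Brian", "Restaurant Cecil", "Restaurant Cecil"]


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase and split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


# Analyzed field: each document is counted once per distinct token it contains
token_doc_counts = Counter()
for name in names:
    token_doc_counts.update(set(tokenize(name)))

# Not-analyzed field: the whole name is a single term
whole_value_doc_counts = Counter(names)

print(dict(token_doc_counts))        # per-token document counts, as in the question
print(len(token_doc_counts))         # number of distinct tokens
print(len(whole_value_doc_counts))   # number of distinct restaurant names
```

If this is the cause, aggregating on a not-analyzed version of the field (for example a `.keyword` sub-field like the one used in another answer on this page) should give one term per restaurant name.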

It is too late for me to answer this question for the original author, but my answer might help anybody who is facing the same issue and ended up here.

ES does provide the cardinality aggregation to get a distinct count, but the result is approximate, not exact. For an exact count, a different approach is needed. I have written an article on this which might help: Accurate Distinct Count and Values from Elasticsearch.

Rubetta answered 20/1, 2021 at 16:7 Comment(0)

There's no support for exact distinct counting in Elasticsearch, although approximate counting exists. Use a "terms" aggregation and count the buckets in the result. See the "Count distinct on elastic search" question.
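
A minimal Python sketch of the bucket-counting approach, applied to an already parsed search response. The function name `count_distinct_from_terms` and the sample response are hypothetical; the sample assumes `RestaurantName` is indexed as a single term per document. Because a terms aggregation only returns a limited number of buckets (controlled by its `size` setting), the sketch checks `sum_other_doc_count`, which newer Elasticsearch versions include in the response, and refuses to return a count when buckets were cut off.

```python
def count_distinct_from_terms(response, agg_name):
    """Count the buckets of a terms aggregation; each bucket is one distinct value."""
    agg = response["aggregations"][agg_name]
    # A non-zero sum_other_doc_count means some documents fell outside the
    # returned buckets, so the bucket list is incomplete.
    if agg.get("sum_other_doc_count", 0) > 0:
        raise ValueError(
            "bucket list was truncated; raise the 'size' of the terms aggregation"
        )
    return len(agg["buckets"])


# Hypothetical response, assuming RestaurantName is indexed as a single term
sample_response = {
    "aggregations": {
        "item": {
            "buckets": [
                {"key": "Restaurant Cecil", "doc_count": 2},
                {"key": "Restaurant Brian", "doc_count": 1},
            ]
        }
    }
}

print(count_distinct_from_terms(sample_response, "item"))  # prints 2
```

This is exact only while every distinct value fits into the returned buckets, so it is better suited to fields with a modest number of distinct values.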

Harkness answered 12/6, 2017 at 14:2 Comment(0)

Use the cardinality aggregation. Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Example:

"aggs": {
  "uniqueValues": {
    "cardinality": {
      "field": "ourUniqueId.keyword",
      "precision_threshold": 100
    }
  }
}
Thaumatology answered 17/8, 2023 at 12:33 Comment(0)
