How to write Azure Cosmos COUNT DISTINCT query
Asked Answered
S

1

6

Goal:

To return the count of distinct values of a particular document field within a partition of a Cosmos collection.

Steps:

If I run the following query on my Azure Cosmos database,

SELECT DISTINCT c.field
FROM c
WHERE c.field = 'abc' AND c.partitionKeyField = '123'

I get one row as expected, e.g. the following response

[
    {
        "field": "abc"
    }
]

However, if I then run the following query in an attempt to count the number of distinct documents in the response, via the following query

SELECT VALUE COUNT(1)
FROM (
    SELECT DISTINCT c.field
    FROM c
    WHERE c.field = 'abc' AND c.partitionKeyField = '123'
)

It returns

[
    6
]

This is the total number of documents with c.field set to "abc" rather than the number of distinct values of c.field.

Question:

Please could you help me understand why the query returns the number of documents rather the number of distinct values for c.field, and if there is a query which will return the number of distinct values of c.field, i.e. 1?

Edit - PS. I know this a contrived example as by definition the number of unique values of c.field is always 1 - I have deliberately simplified this from the real case.

Singles answered 16/5, 2020 at 22:36 Comment(4)
Interestingly GROUP BY returns what I'd like, e.g. sql SELECT d.field, COUNT(1) FROM ( SELECT DISTINCT c.field FROM c WHERE c.field = 'abc' AND c.partitionKeyField = '123' ) AS d GROUP BY d.field Returns json [ { "field": "abc", "$1": 1 } ] Singles
This type of query is not reliable in my experience. I reported something similar here and got a ludicrous response that totally missed the point but raised it again through other channels and think it is on the radar to be fixed at some point...Localism
Thanks. Good to know I should avoid using this query. Your link was one of the few pages I found with someone with a similar issue. Did you solve it with a GROUP BY like in the comment above, or something else?Singles
I think I just concluded that I was out of luck trying to run this on CosmosDb and didn't come up with any workaround. Hopefully the group by works for you!Localism
S
4

At the moment of writing this (22/01/2021) this query provides correct number of distinct values:

SELECT COUNT(UniqueFields) AS UniqueCount
FROM (SELECT DISTINCT c.field
    FROM c
    WHERE c.field = 'abc' AND c.partitionKeyField = '123') as UniqueFields
Schulman answered 22/1, 2021 at 12:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.