How to do count(distinct) for multiple columns

B

6

5

This does not work:

select count(distinct colA, colB) from mytable

I know I can simply solve this by making a double select.

select count(*) from (
    select distinct colA, colB from mytable
)

Is there anyway I can do this without having to do the sub-select?

Boswall answered 4/11, 2021 at 8:39 Comment(2)

Keep the subquery solution. – Concupiscent 4/11, 2021 at 8:46

What is wrong with a subquery? – Retiarius 4/11, 2021 at 8:58

B

1

[TL;DR] Just use a sub-query.

If you are trying to use concatenation then you need to ensure that you delimit the terms with a string that is never going to appear in the values otherwise you will find non-distinct terms grouped together.

For example: if you have a two numeric column then using COUNT(DISTINCT col1 || col2) will group together 1||23 and 12||3 and count them as one group.

You could use COUNT(DISTINCT col1 || '-' || col2) but if the columns are string values and you have 'ab-'||'-'||'c' and 'ab'||'-'||'-c' then, once again, they would be identical once concatenated.

The simplest method is to use a sub-query.

If you can't do that then you can combine columns via string-concatenation but you need to analyse the contents of the column and pick a delimiter that does not appear in your strings otherwise your results might be erroneous. Even better is to ensure that the delimiter character will never be in the sub-string with check constraints.

ALTER TABLE mytable ADD CONSTRAINT mytable__col1__chk CHECK (col1 NOT LIKE '%¬%');
ALTER TABLE mytable ADD CONSTRAINT mytable__col2__chk CHECK (col2 NOT LIKE '%¬%');

Then:

SELECT COUNT(DISTINCT col1 || '¬' || col2)
FROM   mytable;

Blake answered 4/11, 2021 at 8:59 Comment(1)

Also, I suppose the subquery performance will be much better than the concatenation. – Concupiscent 4/11, 2021 at 9:3

C

2

Subquery is standard solution which I recommend too. Concatenation-based solutions, except they are error-prone if dangerous character occurs, might be also worse in performance.

Note: in case you collected obscure solutions how to avoid subquery, window function usage is also possible here (Not to be used in production - your code reviewers won't praise you for it):

select distinct count(*) over ()
from my_table
group by colA, colB

Crazyweed answered 4/11, 2021 at 9:10 Comment(0)

T

2

Just for fun, you can (ab)use window functions and limit clause. These are evaluated after grouping. So:

SELECT COUNT(*) OVER()
FROM t
GROUP BY col_a, col_b
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY

Toxicology answered 4/11, 2021 at 9:25 Comment(0)

T

1

If you're trying to avoid sub-selects at all costs, one variant would be to concatenate them as such:

SELECT count(DISTINCT concat(colA, colB)) FROM mytable;

Thou answered 4/11, 2021 at 8:42 Comment(1)

Will count ('AB', 'C') and ('A', 'BC') as one distinct combination. – Concupiscent 4/11, 2021 at 8:45

B

1

[TL;DR] Just use a sub-query.

If you are trying to use concatenation then you need to ensure that you delimit the terms with a string that is never going to appear in the values otherwise you will find non-distinct terms grouped together.

For example: if you have a two numeric column then using COUNT(DISTINCT col1 || col2) will group together 1||23 and 12||3 and count them as one group.

You could use COUNT(DISTINCT col1 || '-' || col2) but if the columns are string values and you have 'ab-'||'-'||'c' and 'ab'||'-'||'-c' then, once again, they would be identical once concatenated.

The simplest method is to use a sub-query.

If you can't do that then you can combine columns via string-concatenation but you need to analyse the contents of the column and pick a delimiter that does not appear in your strings otherwise your results might be erroneous. Even better is to ensure that the delimiter character will never be in the sub-string with check constraints.

ALTER TABLE mytable ADD CONSTRAINT mytable__col1__chk CHECK (col1 NOT LIKE '%¬%');
ALTER TABLE mytable ADD CONSTRAINT mytable__col2__chk CHECK (col2 NOT LIKE '%¬%');

Then:

SELECT COUNT(DISTINCT col1 || '¬' || col2)
FROM   mytable;

Blake answered 4/11, 2021 at 8:59 Comment(1)

Also, I suppose the subquery performance will be much better than the concatenation. – Concupiscent 4/11, 2021 at 9:3

M

0

Concatenate them.

Select count(distinct colA ||'-'|| colB) from mytable;

Michelinamicheline answered 4/11, 2021 at 8:41 Comment(5)

well I know the Concatenate method, so now I am ashames I did not think of it myself :D Thank! – Boswall 4/11, 2021 at 8:45

Will count ('AB', 'C') and ('A', 'BC') as one distinct combination. – Concupiscent 4/11, 2021 at 8:45

Now it won't; thank you, @jarlh. – Michelinamicheline 4/11, 2021 at 8:50

Now ('A-', 'B') and ('A', '-B') are both counted as ('A--B'). – Concupiscent 4/11, 2021 at 8:56

Oh well, if there's really such a data that nothing of suggested simple options helps, then subquery it is. – Michelinamicheline 4/11, 2021 at 9:0

Q

0

NB: The answer is for SQLite

There's another option using JSON:

SELECT count(DISTINCT json_object('colA', colA, 'colB', colB))
FROM mytable;

Quaternary answered 3/9, 2023 at 23:21 Comment(2)

In PostgreSQL you could simply use row values for this, e.g., SELECT count(DISTINCT (colA, colB)) FROM mytable;, but SQLite limits the usage of row values for some reason -- sqlite.org/rowvalue.html. json_object() can be used as a substitution where you'd normally use a row value. – Quaternary 3/9, 2023 at 23:37

Sorry, I didn't notice the tags for this question -- I thought it was for SQLite as I've been searching for SQLite options. – Quaternary 3/9, 2023 at 23:50

Recommended topics

Hot tags