Top N rows by group in ClickHouse
Asked Answered
T

2

5

What is the proper way to query top N rows by group in ClickHouse?
Lets take an example of tbl having id2, id4, v3 columns and N=2. I tried the following

SELECT                                                                          
    id2,                                                                        
    id4,                 
    v3 AS v3        
FROM tbl
GROUP BY                 
    id2,                 
    id4                  
ORDER BY v3 DESC                                                                
LIMIT 2 BY                       
    id2,                 
    id4      

but getting error

Received exception from server (version 19.3.4):
Code: 215. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception
: Column v3 is not under aggregate function and not in GROUP BY..

I could put v3 into GROUP BY and it does seems to work, but it is not efficient to group by a metric.

There is any aggregate function, but we actually want all values (limited to 2 by LIMIT BY clause) not any value, so it doesn't sound like to be proper solution here.

SELECT                                                                          
    id2,                                                                        
    id4,                 
    any(v3) AS v3        
FROM tbl
GROUP BY                 
    id2,                 
    id4                  
ORDER BY v3 DESC                                                                
LIMIT 2 BY                       
    id2,                 
    id4      
Tetrabrach answered 26/2, 2019 at 14:34 Comment(0)
S
7

It can be used aggregate functions like this:

SELECT
    id2,
    id4,
    arrayJoin(arraySlice(arrayReverseSort(groupArray(v3)), 1, 2)) v3
FROM tbl
GROUP BY
    id2,
    id4
Stereoscopic answered 26/2, 2019 at 19:39 Comment(3)
thanks for solution but it does not return result in expected format, multiple rows are collapsed into array, which is not an expected format. I don't want to have nested data structures on output.Tetrabrach
the query fixed ;)Stereoscopic
This does the job. Does anybody have a notion about performance of this solution?Handkerchief
H
1

You can also do it the way you would do it in "normal" SQL as described in this thread

While vladimir's solutions works for many cases, it didn't work for my case. I have a table, that looks like this:

column    | group by    
++++++++++++++++++++++
A         | Yes
B         | Yes
C         | No

Now, imagine column A identifies the user and column B stands for whatever action a user could do e. g. on your website or your online game. Column C is the sum of how often the user has done this particular action. Vladimir's solution would allow me to get column A and C, but not the action the user has done (column B), meaning I would know how often a user has done something, but not what.

The reason for this is that it doesn't make sense to group by both A and B. Every row would be a unique group and you aren't able to find the top K rows since every group has only 1 member. The result is the same table you query against. Instead, if you group only by A, you can apply vladimir's solution but would get only columns A and C. You can't output column B because it's not part of the Group By statement as explained.

If you would like to get the top 2 (or top 5, or top 100) actions a user has done, you might look for a solution that this:

SELECT rs.id2, rs.id4, rs.v3
    FROM (
        SELECT id2, id4, v3, row_number()
        OVER (PARTITION BY id2, id4 ORDER BY v3 DESC) AS Rank
        FROM tbl
    ) rs WHERE Rank <= 2

Note: To use this, you have to set allow_experimental_window_functions = 1.

Handkerchief answered 8/5, 2021 at 10:35 Comment(2)
Any idea about performance? How it compares to the other?Tetrabrach
Interesting question. Haven't benchmarked it yet. I'm about to work with ClickHouse in the next months and test it for our needs, maybe I'll compare these methods to each other and let you know, if nobody else does in the meantime.Handkerchief

© 2022 - 2024 — McMap. All rights reserved.