ROW_NUMBER( ) OVER in impala
Asked Answered
S

4

7

I have a use case where I need to use ROW_NUMBER() over PARTITION: Something like:

SELECT
  Column1 , Column 2
  ROW_NUMBER() OVER (
    PARTITION BY ACCOUNT_NUM
    ORDER BY FREQ, MAN, MODEL) as LEVEL
FROM
  TEST_TABLE

I need a workaround for this in Impala. Unfortunately Impala does not support sub queries and also does not support ROW_NUMBER() OVER functionality. Thank you for your help.

Stephie answered 6/10, 2014 at 19:20 Comment(1)
Impala will support both analytic window functions (including ROW_NUMBER()) as well as correlated subqueries in the upcoming 2.0 release.Tompion
T
4

Impala is rather limited for this type of query. With some assumptions, this query is possible:

  • The four columns in the partitioning clause are never NULL
  • The four columns in the partitioning clause uniquely identify a row

The query is rather ugly and expensive:

select tt.column1, tt.column2, count(*) as level
from test_table tt join
     test_table tt2
     on tt.account_num = tt2.account_num and
        (tt2.freq < tt.freq or
         tt2.freq = tt.freq and tt2.man < t.man or
         tt2.freq = tt.freq and tt2.man = t.man and tt2.model <= t.model
        )
group by tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model;
Tonsillitis answered 6/10, 2014 at 19:35 Comment(0)
T
10

ROW_NUMBER() OVER PARTITION was added in CDH 5.2:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_analytic_functions.html#row_number

ROW_NUMBER() OVER([partition_by_clause] order_by_clause)
Triumvir answered 8/11, 2016 at 20:39 Comment(0)
T
4

Impala is rather limited for this type of query. With some assumptions, this query is possible:

  • The four columns in the partitioning clause are never NULL
  • The four columns in the partitioning clause uniquely identify a row

The query is rather ugly and expensive:

select tt.column1, tt.column2, count(*) as level
from test_table tt join
     test_table tt2
     on tt.account_num = tt2.account_num and
        (tt2.freq < tt.freq or
         tt2.freq = tt.freq and tt2.man < t.man or
         tt2.freq = tt.freq and tt2.man = t.man and tt2.model <= t.model
        )
group by tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model;
Tonsillitis answered 6/10, 2014 at 19:35 Comment(0)
V
2

Impala supports now the over clause. Syntax is the same as in the question.

SELECT
  Column1 , Column 2
  ROW_NUMBER() OVER (
    PARTITION BY ACCOUNT_NUM
    ORDER BY FREQ, MAN, MODEL) as LEVEL
FROM
  TEST_TABLE

Impala documentation: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_analytic_functions.html#over

Venetis answered 17/4, 2017 at 3:44 Comment(0)
U
0

Impala supports sub-queries. Both in parentheses and using the with function.

Underhung answered 27/3, 2019 at 19:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.