Using Hive ntile results in where clause
Asked Answered
L

2

8

I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile:

SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views)
FROM view_data
GROUP BY quartile
ORDER BY quartile;

And this query is to get the names of all the people that are in the first quartile:

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
WHERE quartile = 1

I get this error for both queries:

Invalid table alias or column reference 'quartile'

How can I reference the ntile results in the where clause or group by clause?

Luxury answered 21/7, 2015 at 13:28 Comment(0)
N
7

You can't put a windowing function in a where clause because it would create ambiguity if there are compound predicates. So use a subquery.

select quartile, max(total_views) from
(SELECT total_views, NTILE(4) OVER (ORDER BY total_views) AS quartile,
FROM view_data) t
GROUP BY quartile
ORDER BY quartile
;

and

select * from 
(SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data) t
WHERE quartile = 1
;
Nevernever answered 21/7, 2015 at 14:59 Comment(2)
I don't really understand what you mean by ambiguity caused by compound predicates, could you give an example please?Luxury
SELECT col1 FROM T1 WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3 AND col1 > '100' What would the order of operation be here? Should the col1 > '100' part of the predicate be run first or should the row numbering.Nevernever
F
-1

The WHERE statement in SQL can only select on an existing column in a table schema. In order to perform that functionality on a calculated column, use HAVING instead of WHERE.

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
HAVING quartile = 1
Funnyman answered 1/6, 2018 at 20:12 Comment(1)
This answer also results in an error. Please see the accepted answer.Luxury

© 2022 - 2024 — McMap. All rights reserved.