In my CENSUS table, I'd like to group by State, and for each State get the median county population and the number of counties.
In psql, redshift, and snowflake, I can do this:
psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state;
state | count | median
----------------------+-------+----------
Alabama | 67 | 36583
Alaska | 24 | 7296.5
Arizona | 15 | 116320
Arkansas | 75 | 20229
...
I'm trying to find a nice way to do this in standard BigQuery. I've noticed that there's undocumented percentile_cont
analytical function available, but I have to do some major hacks to get it to do what I want.
I'd like to be able to do the same sort thing with what I've gathered are the correct arguments:
SELECT
state,
COUNT(county),
PERCENTILE_CONT(population2000,
0.5) OVER () AS `medPop`
FROM
CENSUS
GROUP BY
state;
but this query yields the error
SELECT list expression references column population2000 which is neither grouped nor aggregated at
I can get the answer I want, but I'd be very disappointed if this is the recommended way to do what I want to do:
SELECT
MAX(nCounties) AS nCounties,
state,
MAX(medPop) AS medPop
FROM (
SELECT
nCounties,
T1.state,
(PERCENTILE_CONT(population2000,
0.5) OVER (PARTITION BY T1.state)) AS `medPop`
FROM
census T1
LEFT OUTER JOIN (
SELECT
COUNT(county) AS `nCounties`,
state
FROM
census
GROUP BY
state) T2
ON
T1.state = T2.state) T3
GROUP BY
state
Is there a better way to do what I want to do? Also, is the PERCENTILE_CONT
function ever going to be documented?
Thanks for reading!