Find the mode in BigQuery

Asked 27/10, 2015 at 14:32 Answered 4/9, 2024 at 16:25

The mode is the value that appears most often in a set.

I would like something like:

SELECT
    t.id as t_id,
    GROUP_CONCAT(t.value) as value_list,
    MODE(t.value) AS value_mode
FROM dataset.table as t
GROUP BY t_id

such that, for example:

t_id    value_list     value_mode
1       2,2,2,3,6,6    2

How is that done?

EDIT: The value_list is just there for illustration purpose. Only need the mode

Particle answered 27/10, 2015 at 14:32 Comment(0)

select id, value as value_list, v as value_mode
from (
  select 
    id, value, v, 
    count(1) as c, 
    row_number() over(partition by id order by c desc) as top
  from (
    select id, value, split(value) as v 
    from dataset.table 
  )
  group by id, value, v
)
where top = 1

Townshend answered 27/10, 2015 at 20:25 Comment(0)

I often have to find the mode of prices for respective groups (e.g. length and amps) to filter out sale prices and the like. I typically use two methods both with creating an array and un-nesting it in order of frequency. One method I use is by a LIMIT another with an [OFFSET(0)] in case you want to get Nth values.

Both are included below:

WITH t AS (SELECT 18 AS length, 
'HIGH' as amps, 
99.95 price UNION ALL
SELECT 18,  "HIGH", 99.95 UNION ALL
SELECT 18,  "HIGH", 5.95 UNION ALL
SELECT 18,  "LOW", 33.95 UNION ALL
SELECT 18,  "LOW", 33.95 UNION ALL
SELECT 18,  "LOW", 4.5 UNION ALL
SELECT 3,  "HIGH", 77.95 UNION ALL
SELECT 3,  "HIGH", 77.95 UNION ALL
SELECT 3,  "HIGH", 9.99 UNION ALL
SELECT 3,  "LOW", 44.95 UNION ALL
SELECT 3,  "LOW", 44.95 UNION ALL
SELECT 3,  "LOW", 5.65 
)

SELECT
  length,
  amps,

  -- By Limit
  (SELECT x FROM UNNEST(price_array) x 
    GROUP BY x ORDER BY COUNT(*) DESC LIMIT 1 ) most_freq_price,
  (SELECT x FROM UNNEST(price_array) x 
    GROUP BY x ORDER BY COUNT(*) ASC  LIMIT 1 ) least_freq_price,

  -- By Offset
  ARRAY((SELECT x FROM UNNEST(price_array) x 
    GROUP BY x ORDER BY COUNT(*) DESC))[OFFSET(0)] most_freq_price_offset,
  ARRAY((SELECT x FROM UNNEST(price_array) x 
    GROUP BY x ORDER BY COUNT(*) ASC))[OFFSET(0)] least_freq_price_offset

FROM (
  SELECT 
    length,
    amps,
    ARRAY_AGG(price) price_array
  FROM t
  GROUP BY 1,2
 )

Haileyhailfellowwellmet answered 6/12, 2019 at 11:11 Comment(0)

There is a direct function available now

approx_top_count()

Here is an example of its usage https://cloud.google.com/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_top_count

Drought answered 11/11, 2022 at 8:39 Comment(1)

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Lenette 15/11, 2022 at 12:24

For your example, this is how I would solve it:

SELECT x, w mode
FROM (
  SELECT COUNT(*) c, w, ROW_NUMBER() OVER(ORDER BY c DESC) rn, FIRST(x) x
  FROM (
    SELECT SPLIT(x) w, x FROM (SELECT "2,2,2,3,6,6" x)
  )
  GROUP BY 2
)
WHERE rn=1

And with the GROUP_CONCAT within query:

SELECT gc, w mode
FROM (
  SELECT COUNT(*) c, w, ROW_NUMBER() OVER(ORDER BY c DESC) rn, FIRST(gc) gc
  FROM (
    SELECT GROUP_CONCAT(w) OVER() gc, w
    FROM (FLATTEN((
      SELECT SPLIT(x) w, x FROM (SELECT "2,2,2,3,6,6" x)), w)
    )
  )
  GROUP BY 2
)
WHERE rn=1

And handling partitions:

SELECT tid, gc value_list, w value_mode
FROM (
  SELECT tid, COUNT(*) c, w, ROW_NUMBER() OVER(PARTITION BY tid ORDER BY c DESC) rn, FIRST(gc) gc
  FROM (
    SELECT tid, GROUP_CONCAT(w) OVER(PARTITION BY tid) gc, w
    FROM (FLATTEN((
      SELECT 1 tid, SPLIT(x) w, x FROM (SELECT "2,2,2,3,6,6" x)), w)
    )
  )
  GROUP BY tid, w
)
WHERE rn=1

Newel answered 27/10, 2015 at 17:44 Comment(2)

How would this work with GROUP BY t_id ? Do you need to add PARTITION BY to your query ? – Swanner 27/10, 2015 at 17:51

I would have done that if the example in the question would have been more complex :). (My answers are way more specific when the questions include a table I can work with) – Newel 27/10, 2015 at 18:51

To expand on the idea of using APPROX_TOP_COUNT given here, you can emulate (approximative) mode like this:

SELECT (APPROX_TOP_COUNT(x, 1))[0].value as mode
FROM UNNEST(["apple", "apple", "pear", "pear", "pear", "banana"]) as x;

Hisakohisbe answered 4/9, 2024 at 16:25 Comment(0)

Recommended topics

Hot tags