Quick and dirty:
SELECT group_name, color, count(*) AS ct
FROM (
SELECT group_name, unnest(favorite_colors) AS color
FROM tbl
) sub
GROUP BY 1,2
ORDER BY 1,3 DESC;
In Postgres 9.3 or later this is the cleaner form:
SELECT group_name, color, count(*) AS ct
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
ORDER BY 1,3 DESC;
The above is shorthand for
...
FROM tbl t
JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
...
And like with any other INNER JOIN
, it would exclude rows without color (favorite_colors IS NULL
) - as did the first query.
To include such rows in the result, use instead:
SELECT group_name, color, count(*) AS ct
FROM tbl t
LEFT JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
GROUP BY 1,2
ORDER BY 1,3 DESC;
You can easily aggregate the "most common" colors per group in the next step, but you'd need to define "most common colors" first ...
Most common colors
As per comment, pick colors with > 3 occurrences.
SELECT t.group_name, color, count(*) AS ct
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
HAVING count(*) > 3
ORDER BY 1,3 DESC;
To aggregate the top colors in an array (in descending order):
SELECT group_name, array_agg(color) AS top_colors
FROM (
SELECT group_name, color
FROM tbl t, unnest(t.favorite_colors) AS color
GROUP BY 1,2
HAVING count(*) > 3
ORDER BY 1, count(*) DESC
) sub
GROUP BY 1;
-> SQLfiddle demonstrating all.
HAVING
so I could ask for only the colors that have more than 3 occurrences in a group, where would I put theHAVING
. I keep getting syntax errors. – Pollster