must appear in the GROUP BY clause or be used in an aggregate function

Asked 26/10, 2013 at 1:50 Answered 8/1, 2019 at 17:9

Solved sql group-by aggregate-functions postgresql-9.1

471

I have a table that looks like this caller 'makerar'

cname	wmname	avg
canada	zoro	2.0000000000000000
spain	luffy	1.00000000000000000000
spain	usopp	5.0000000000000000

And I want to select the maximum avg for each cname.

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

but I will get an error,

ERROR:  column "makerar.wmname" must appear in the GROUP BY clause or be used in an   aggregate function 
LINE 1: SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname;

so i do this

SELECT cname, wmname, MAX(avg)  FROM makerar GROUP BY cname, wmname;

however this will not give the intented results, and the incorrect output below is shown.

cname	wmname	max
canada	zoro	2.0000000000000000
spain	luffy	1.00000000000000000000
spain	usopp	5.0000000000000000

Actual Results should be

cname	wmname	max
canada	zoro	2.0000000000000000
spain	usopp	5.0000000000000000

How can I go about fixing this issue?

Note: This table is a VIEW created from a previous operation.

Nepali answered 26/10, 2013 at 1:50 Comment(4)

I don't understand. Why is wmname="usopp" expected and not for example wmname="luffy"? – Aidoneus 7/4, 2020 at 10:14

@Aidoneus Because the max of (1, 5) is 5, and 5 is associated with "usopp", not with "luffy". That is the intended/desired result. – Oys 16/8, 2021 at 22:52

If the avg for (spain, luffy) is the same as for (spain, usopp), then you need to specify what is the intended behavior... Choose one randomly? Use other columns as "tie-breakers"? – Arni 22/6, 2023 at 18:43

383

Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields must appear in the GROUP BY clause[*].

To workaround this issue, you must calculate the aggregate in a sub-query and then join it with itself to get the additional columns you'd need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
    SELECT cname, MAX(avg) AS mx
    FROM makerar
    GROUP BY cname
    ) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

But you may also use window functions, which looks simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only thing with this method is that it will show all records (window functions do not group). But it will show the correct (i.e. maxed at cname level) MAX for the country in each row, so it's up to you:

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  |     5.0000000000000000
 spain  | usopp  |     5.0000000000000000

The solution, arguably less elegant, to show the only (cname, wmname) tuples matching the max value, is:

SELECT DISTINCT /* distinct here matters, because maybe there are various tuples for the same max value */
    m.cname, m.wmname, t.avg AS mx
FROM (
    SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY avg DESC) AS rn 
    FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;


 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

[*]: Interestingly enough, even though the spec sort of allows to select non-grouped fields, major engines seem to not really like it. Oracle and SQLServer just don't allow this at all. Mysql used to allow it by default, but now since 5.7 the administrator needs to enable this option (ONLY_FULL_GROUP_BY) manually in the server configuration for this feature to be supported...

Goldarned answered 26/10, 2013 at 2:2 Comment(9)

Thanks syntax is corect, but, you must compare values of mx and avg when joining – Nepali 26/10, 2013 at 2:9

Yes your syntax is correct and eliminates duplicates however you need m.avg=t.mx in the end (after you wrote JOING) to get intended results – Nepali 31/10, 2013 at 1:58

@Goldarned It can be done without joining on MAX (see answer by @ypercube, there is also another solution in my answer) but not the way you do it. Check expected output. – Brothel 1/11, 2013 at 9:8

@Goldarned Your solution only adds a column (the MAX avg per cname) but it doesn't restrict the rows of the result (as the OP wants). See the Actual Results should be paragraph in the question. – Twentytwo 1/11, 2013 at 13:4

@ypercube, the join by max value does give a result compatible with the expected results in the question. However, if 2 rows in a group have the same value for avg, both of them will be retrieved. I didn't see anything specifically discarding this use case though. – Goldarned 12/1, 2015 at 0:39

Turning off ONLY_FULL_GROUP_BY in MySQL 5.7 does not activate the way the SQL standard specifies when columns can be omitted from the group by (or makes MySQL behave like Postgres). It just reverts to the old behavior where MySQL returns random (= "indeterminate") results instead. – Rock 17/11, 2015 at 7:40

Is there any other solutions now? I am using aws athena – Tito 10/7, 2020 at 2:25

Note about the "self" JOIN sub-query: If the avg for (spain, luffy) is the same as for (spain, usopp), then there is a "tie" and the final results would include both cases: (canada, zoro 2.000), (spain, usopp, 5.000), (spain, luffy, 5.000) – Arni 23/6, 2023 at 1:0

If you use the window function solution, there is no need for join. Simplified:

SELECT  t.cname, t.wmname, t.avg AS mx FROM (     SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY avg DESC) AS rn      FROM makerar ) AS t WHERE t.rn = 1 ;

– Twentytwo 25/1 at 19:1

182

In Postgres, you can also use the special DISTINCT ON (expression) syntax:

SELECT DISTINCT ON (cname) 
    cname, wmname, avg
FROM 
    makerar 
ORDER BY 
    cname, avg DESC ;

Maier answered 1/11, 2013 at 9:5 Comment(9)

It won't work as it's expected if one wants to sort columns like avg – Woodward 26/11, 2014 at 13:14

@Woodward What do you mean? If one wants to have the result set sorted with a different order than BY cname ? – Twentytwo 26/11, 2014 at 13:19

@ypercube, Actually psql sorts first and then applies DISTINCT. In case of sorting by avg we'll get different results for every row minimal and maximal values depending on sort direction – Woodward 26/11, 2014 at 13:42

Of course. If you don't run the query I posted, you'll get different results! That's not the same as "it won't work as expected"... – Twentytwo 26/11, 2014 at 14:32

@Batfan thnx. Note that while this is quite cool, compact and easy to write, it's not often the most efficient way for this kind of queries. – Twentytwo 11/8, 2015 at 19:18

Does it support in Redshift? – Livvie 12/2, 2016 at 3:9

Although I really like the syntax, I've had trouble optimizing these queries, and it's easy to write suboptimal ones. Perhaps I'm just missing something. – Frontogenesis 16/8, 2016 at 20:19

@EricWalker no you are not. While the syntax is nice and compact, it's very often not the most efficient way to write such queries. Perhaps I should add a note in my answer. – Twentytwo 12/4, 2018 at 12:46

@Livvie no, it's not yet supported – Bankbook 26/2, 2020 at 3:31

The problem with specifying non-grouped and non-aggregate fields in group by selects is that engine has no way of knowing which record's field it should return in this case. Is it first? Is it last? There is usually no record that naturally corresponds to aggregated result (min and max are exceptions).

However, there is a workaround: make the required field aggregated as well. In postgres, this should work:

SELECT cname, (array_agg(wmname ORDER BY avg DESC))[1], MAX(avg)
FROM makerar GROUP BY cname;

Note that this creates an array of all wnames, ordered by avg, and returns the first element (arrays in postgres are 1-based).

Lavoie answered 20/11, 2015 at 15:51 Comment(5)

Good point. Though it seems possible that the DB could do an outer join to link the non-aggregate fields from each row to the aggregated result to which the row contributed. I've often been curious why they don't have an option for that. Though I could simply be ignorant of this option :) – Brooklyn 19/1, 2016 at 22:18

it's nice solution for pgsql, but any mysql equivalent? I don't like multi select way. – Creator 21/1, 2021 at 17:53

GREAT use of array_agg I didn't know you can sort right in the params – Epithet 5/5, 2021 at 19:5

array_agg is what I was looking for to return all the items that the GROUP BY matches – Stoughton 25/3, 2022 at 16:33

In this case, MAX(avg) can be replaced with just avgbecause they may no different after having them sorted out by the aggregation before, right? – Acton 1/1, 2023 at 14:2

For me, it is not about a "common aggregation problem", but just about an incorrect SQL query. The single correct answer for "select the maximum avg for each cname..." is

SELECT cname, MAX(avg) FROM makerar GROUP BY cname;

The result will be:

 cname  |      MAX(avg)
--------+---------------------
 canada | 2.0000000000000000
 spain  | 5.0000000000000000

This result in general answers the question "What is the best result for each group?". We see that the best result for spain is 5 and for canada the best result is 2. It is true, and there is no error. If we need to display wmname also, we have to answer the question: "What is the RULE to choose wmname from resulting set?" Let's change the input data a bit to clarify the mistake:

  cname | wmname |        avg           
--------+--------+-----------------------
 spain  | zoro   |  1.0000000000000000
 spain  | luffy  |  5.0000000000000000
 spain  | usopp  |  5.0000000000000000

Which result do you expect on runnig this query: SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;? Should it be spain+luffy or spain+usopp? Why? It is not determined in the query how to choose "better" wmname if several are suitable, so the result is also not determined. That's why SQL interpreter returns an error - the query is not correct.

In the other word, there is no correct answer to the question "Who is the best in spain group?". Luffy is not better than usopp, because usopp has the same "score".

Volnay answered 8/1, 2019 at 17:9 Comment(1)

This solution worked for me too. I had query problems because my ORM also included the associated primary key, resulting in the following incorrect query: SELECT cname, id, MAX(avg) FROM makerar GROUP BY cname;, which did give this misleading error. – Oleta 2/3, 2020 at 15:21

SELECT t1.cname, t1.wmname, t2.max
FROM makerar t1 JOIN (
    SELECT cname, MAX(avg) max
    FROM makerar
    GROUP BY cname ) t2
ON t1.cname = t2.cname AND t1.avg = t2.max;

Using rank() window function:

SELECT cname, wmname, avg
FROM (
    SELECT cname, wmname, avg, rank() 
    OVER (PARTITION BY cname ORDER BY avg DESC)
    FROM makerar) t
WHERE rank = 1;

Note

Either one will preserve multiple max values per group. If you want only single record per group even if there is more than one record with avg equal to max you should check @ypercube's answer.

Brothel answered 26/10, 2013 at 2:6 Comment(0)

This seems to work as well

SELECT *
FROM makerar m1
WHERE m1.avg = (SELECT MAX(avg)
                FROM makerar m2
                WHERE m1.cname = m2.cname
               )

Northeast answered 30/1, 2017 at 21:52 Comment(0)

-1

I recently run into this problem, when trying to count using case when, and found that changing the order of the which and count statements fixes the problem:

SELECT date(dateday) as pick_day,
COUNT(CASE WHEN (apples = 'TRUE' OR oranges 'TRUE') THEN fruit END)  AS fruit_counter

FROM pickings

GROUP BY 1

Instead of using - in the latter, where I got errors that apples and oranges should appear in aggregate functions

CASE WHEN ((apples = 'TRUE' OR oranges 'TRUE') THEN COUNT(*) END) END AS fruit_counter

Bernicebernie answered 9/11, 2016 at 14:7 Comment(1)

The which statement? – Notable 20/3, 2018 at 0:0

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags