There is one more reason for why does SQL requires that I specify on which attributes to group.
Lets sat we have two simple tables: friend
and car
, where we store info about our friends and their cars.
And lets say we want to show all our friends's data (from table friend
) and for everyone of our friends, how many cars they own now, have sold, have crashed and the total number. Oh, and we want the elders first, younger last.
We'd do something like:
SELECT f.id
, f.firstname
, f.lastname
, f.birthdate
, COUNT(NOT c.sold AND NOT c.crashed) AS owned
, COUNT(c.sold) AS sold
, COUNT(c.crashed) AS crashed
, COUNT(c.friendid) AS totalcars
FROM friend f
LEFT JOIN car c <--to catch (shame!) those friends who have never had a car
ON f.id = c.friendid
GROUP BY f.id
, f.firstname
, f.lastname
, f.birthdate
ORDER BY f.birthdate DESC
But do we really need all those fields in the GROUP BY
? Isn't every friend uniquely determined by his id
? In other words, aren't the firstname, lastname and birthdate
functionally dependend on the f.id
? Why not just do (as we can in MySQL):
SELECT f.id
, f.firstname
, f.lastname
, f.birthdate
, COUNT(NOT c.sold AND NOT c.crashed) AS owned
, COUNT(c.sold) AS sold
, COUNT(c.crashed) AS crashed
, COUNT(c.friendid) AS totalcars
FROM friend f
LEFT JOIN car c <--to catch (shame!) those friends who have never had a car
ON f.id = c.friendid
GROUP BY f.id
ORDER BY f.birthdate
And what if we had 20 fields in the SELECT
(plus ORDER BY
) parts? Isn't the second query shorter, clearer and probably faster (in the RDBMS that accept it)?
I say yes. So, do the SQL 1999 and 2003 specs say, if this article is correct: Debunking group by myths