Let's say you have a query like this:
SELECT g, v
FROM t
GROUP BY g;
In this case, for each possible value for g
, MySQL picks one of the corresponding values of v
.
However, which one is chosen, depends on some circumstances.
I read somewhere that for each group of g, the first value of v
is kept, in the order how the records were inserted into the table t
.
This is quite ugly, because the records in a table should be treated as a set where the order of the elements should not matter. This is so "mysql-ish"...
If you want to determine which value for v
to keep, you need to apply a subselect for t
like this:
SELECT g, v
FROM (
SELECT *
FROM t
ORDER BY g, v DESC
) q
GROUP BY g;
This way you define which order the records of the subquery are processed by the external query, thus you can trust which value of v
it will pick for the individual values of g
.
However, if you need some WHERE conditions then be very careful. If you add the WHERE
condition to the subquery then it will keep the behaviour, it will always return the value you expect:
SELECT g, v
FROM (
SELECT *
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
GROUP BY g;
This is what you expect, the subselect filters and orders the table. It keeps the records where g
has the given value and the external query returns that g
and the first value for v
.
However, if you add the same WHERE
condition to the outer query then you get a non-deterministic result:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
Surprisingly, you may get different values for v
when executing the same query again and again which is... strange. The expected behaviour is to get all the records in the appropriate order from the subquery, filtering them in the outer query and then picking the same as it picked in the previous example. But it does not.
It picks a value for v
seemingly randomly. The same query returned different values for v
if I executed more (~20) times, but the distribution was not uniform.
If instead of adding an outer WHERE
, you specify a HAVING
condition like this:
SELECT g, v
FROM (
SELECT *
FROM t1
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';
Then you get a consistent behaviour again.
CONCLUSION
I would suggest not to rely on this technique at all. If you really want/need to then avoid WHERE
conditions in the outer query. Use it in the inner query if you can or a HAVING
clause in the outer query.
I tested it with this data:
CREATE TABLE t1 (
v INT,
g VARCHAR(36)
);
INSERT INTO t1 VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t1 VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
in MySQL 5.6.41.
Maybe it is just a bug that gets/got fixed in newer versions, please give feedback if you have experience with newer versions.