Using user variables to number rows
I often find answers here on SO suggesting the use of user variables to number something or other. Perhaps the clearest example would be a query to select every second row from a given result set. (This question and query are similar to this answer, but it was this answer which actually triggered this question here.)
SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1
This approach does seem to usually work.
Reasons to be careful
On the other hand, the MySQL docs state:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
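To make the warning concrete, here is a minimal sketch of the kind of statement it covers, modeled on the example in the MySQL manual. The column aliases before_value and after_value are names I made up for illustration, and I'm assuming a session where @a is not used for anything else.

```sql
-- Start from a known value (assumption: @a is otherwise unused in this session).
SET @a := 1;

-- Reads @a and assigns to it within the same statement.
-- before_value and after_value are hypothetical aliases for illustration.
SELECT @a AS before_value, (@a := @a + 1) AS after_value;
```

Whether before_value shows the old or the new value depends on the order of evaluation, which is exactly what the documentation declines to promise.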
Core question
So my question is not how to achieve such an ordering using current servers, but instead whether the suggested solution using user variables is guaranteed to work under all (reasonable) circumstances and for all future versions of MySQL.
By “guarantees” I mean authoritative sources like the MySQL documentation or some standard MySQL claims conformance with. Lacking such authoritative answers, other sources like often-used tutorials or parts of the MySQL source code might be quoted instead. By “works” I mean the fact that the assignments will be executed sequentially, once per row of the result, and in the order induced by the ORDER BY line.
Example of a breaking query
To give you an example of how easily things fail:
SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      HAVING rownum > 0
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1
will produce an empty result on the MySQL 5.5.27 currently installed on SQL Fiddle. The reason appears to be that the HAVING condition causes the rownum expression to get evaluated twice, so the final result will only have even numbers. I have an idea of what's going on behind the scenes, and I'm not claiming that the query with the HAVING makes much sense. I just want to demonstrate that there is a fine line between code which works and code which looks very similar but breaks.
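One way to look at this more directly is to run the inner query on its own and inspect the rownum column. The sketch below assumes a throwaway table with three rows; the table contents are made up, and tablename and ordercol are just the placeholder names from the queries above.

```sql
-- Hypothetical sample data; only the names tablename and ordercol come from the question.
CREATE TEMPORARY TABLE tablename (ordercol INT);
INSERT INTO tablename (ordercol) VALUES (10), (20), (30);

-- Inner query only: compare the rownum values with and without the HAVING line.
SELECT *, (@row := @row + 1) AS rownum
FROM (SELECT @row := 0) AS init, tablename
HAVING rownum > 0
ORDER BY tablename.ordercol;
```

If the double evaluation described above is what happens, rownum should advance by two per row here, but since this is precisely the behavior that isn't guaranteed, other server versions may well show something else.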
ROW_NUMBER feature request. Dems recently told me about this function. Seems quite powerful, and well-defined to boot. Too bad there seems to be little progress on that request since 2008. – Coverup
ROW_NUMBER was already mentioned in comments and answers to your earlier question :) – Coptic
MySQL doesn’t evaluate expressions containing user variables until they are sent to the client, – Transship
@row outside of the derived query? This avoids a bug^W^Wunpleasant behavior that sometimes occurs. – Interjacent