Guarantees when using user variables to number rows

Using user variables to number rows

I often find answers here on SO suggesting the use of user variables to number something or other. Perhaps the clearest example would be a query to select every second row from a given result set. (This question and query are similar to this answer, but it was this answer which actually triggered the question here.)

SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1

This approach does seem to usually work.

Reasons to be careful

On the other hand, the MySQL docs state:

As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.

Core question

So my question is not how to achieve such an ordering using current servers, but instead whether the suggested solution using user variables is guaranteed to work under all (reasonable) circumstances and for all future versions of MySQL.

By “guarantees” I mean authoritative sources like the MySQL documentation or a standard that MySQL claims to conform to. Lacking such authoritative answers, other sources like often-used tutorials or parts of the MySQL source code might be quoted instead. By “works” I mean that the assignments will be executed sequentially, once per row of the result, and in the order induced by the ORDER BY line.

Example of a breaking query

To give you an example of how easily things fail:

SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      HAVING rownum > 0
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1

will produce an empty result on the MySQL 5.5.27 currently installed on SQL Fiddle. The reason appears to be that the HAVING condition causes the rownum expression to be evaluated twice per row, so every row ends up with an even number and the outer WHERE rownum % 2 = 1 discards all of them. I have an idea of what's going on behind the scenes, and I'm not claiming that the query with the HAVING makes much sense. I just want to demonstrate that there is a fine line between code which works and code which looks very similar but breaks.
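A minimal sketch of how the breaking example can be restructured, following the advice in the answer and comments below: the row number is assigned only inside the derived table, and every filter, including the former HAVING rownum > 0, moves to the outer query. The tablename/ordercol sample rows are hypothetical, and the numbering itself still relies on the same undocumented evaluation order the question asks about, so this only avoids the double evaluation; it is not a guarantee.

```sql
-- Hypothetical sample data standing in for the question's tablename/ordercol.
DROP TEMPORARY TABLE IF EXISTS tablename;
CREATE TEMPORARY TABLE tablename (id INT PRIMARY KEY, ordercol INT NOT NULL);
INSERT INTO tablename (id, ordercol) VALUES (1, 50), (2, 10), (3, 40), (4, 20), (5, 30);

-- The counter is assigned inside the derived table and nowhere else.
-- Both conditions (the former HAVING rownum > 0 and the odd-row test)
-- are applied by the outer query to the rownum column it receives.
SELECT sub.*
FROM (SELECT tablename.*, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) AS sub
WHERE sub.rownum > 0
  AND sub.rownum % 2 = 1;
```

Using tablename.* instead of a bare * also keeps the init column out of the result.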

Coverup asked 4/10, 2012 at 13:58. Comments:
@hvd, thanks for the pointer to the ROW_NUMBER feature request. Dems recently told me about this function. Seems quite powerful, and well-defined to boot. Too bad there seems to be little progress on that request since 2008. - Coverup
I deleted my comment when I saw that ROW_NUMBER was already mentioned in comments and answers to your earlier question :) - Coptic
Have you read this article already? Particularly the part “MySQL doesn’t evaluate expressions containing user variables until they are sent to the client”. - Transship
@MartinSmith: No, I hadn't read that article yet. The amount of experimenting this guy invested shows that there is too little definite knowledge about which things will work for sure. I'm not sure how this “until they are sent to the client” stuff agrees with the double execution I noticed in the subquery from the breaking example above. - Coverup
Well, it's only guaranteed for future versions of MySQL if the developers say that it is guaranteed. And from the quote in your question they clearly don't, and they explicitly warn against assuming that. Seems a bit similar to the quirky update debate in SQL Server land. - Transship
You're not running in ANSI mode (for shame!). Can you turn on ONLY_FULL_GROUP_BY and come up with a breaking query? Also, can you do the same after moving initialization of @row outside of the derived query? This avoids a bug^W^Wunpleasant behavior that sometimes occurs. - Interjacent
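The last comment's suggestion of initializing @row outside the derived query might look like the sketch below. The sample rows in tablename are hypothetical; the counter is reset by a separate SET statement, so the derived table no longer needs the joined init subquery. The assignment inside the SELECT list still falls under the manual's warning quoted in the question.

```sql
-- Hypothetical sample data standing in for the question's tablename/ordercol.
DROP TEMPORARY TABLE IF EXISTS tablename;
CREATE TEMPORARY TABLE tablename (id INT PRIMARY KEY, ordercol INT NOT NULL);
INSERT INTO tablename (id, ordercol) VALUES (1, 50), (2, 10), (3, 40), (4, 20), (5, 30);

-- Reset the counter in its own statement instead of a joined derived table.
SET @row := 0;

-- The variable is still assigned and read within one statement here,
-- so the evaluation-order caveat from the manual still applies.
SELECT sub.*
FROM (SELECT tablename.*, (@row := @row + 1) AS rownum
      FROM tablename
      ORDER BY tablename.ordercol
     ) AS sub
WHERE sub.rownum % 2 = 1;
```

Because the SET is a separate statement, the counter must be reset again before each run in the same session.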
Answer

You misread the statement. It relates to the order of expressions in the SELECT list, when using multiple variables.
As presented, the ORDER BY on this single-variable statement has a guaranteed order up to the current version of MySQL and nothing in that text suggests it will change.

But guarantee the future? Who knows.
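On MySQL 8.0 and later the question can be sidestepped: 8.0 added window functions, including ROW_NUMBER(), whose numbering order is specified by the function's own OVER (ORDER BY ...) clause rather than by expression evaluation order. A minimal sketch with hypothetical sample data; id is added as a tie-breaker because rows with equal ordercol values would otherwise be numbered in an unspecified order. This does not help on 5.x servers such as the 5.5.27 instance mentioned in the question.

```sql
-- Hypothetical sample data standing in for the question's tablename/ordercol.
DROP TEMPORARY TABLE IF EXISTS tablename;
CREATE TEMPORARY TABLE tablename (id INT PRIMARY KEY, ordercol INT NOT NULL);
INSERT INTO tablename (id, ordercol) VALUES (1, 50), (2, 10), (3, 40), (4, 20), (5, 30);

-- Requires MySQL 8.0+. ROW_NUMBER() numbers rows 1, 2, 3, ... in the order
-- given by its OVER clause; no user variables are involved.
-- Window functions cannot appear in WHERE, hence the derived table.
SELECT sub.*
FROM (SELECT tablename.*,
             ROW_NUMBER() OVER (ORDER BY tablename.ordercol, tablename.id) AS rownum
      FROM tablename
     ) AS sub
WHERE sub.rownum % 2 = 1  -- keep every second row, starting with the first
ORDER BY sub.rownum;      -- output order needs its own ORDER BY
```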


Regarding the breaking query, you've again misunderstood how MySQL works. Let's break down your query. Take note of this statement in the manual:

In a SELECT statement, each select expression is evaluated only when sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY clause, referring to a variable that is assigned a value in the select expression list does not work as expected.

The order of processing of queries is roughly:

FROM / JOIN
WHERE / ON
GROUP BY / ROLLUP
HAVING
UNION
SELECT
ORDER BY
@variable resolution

Your "broken" query attempts to use the variable WITHIN the same level, which is just about as sinful as using a WHERE/HAVING clause against a column alias. That's why you'll never see MySQL variable-based row-numbering solutions using the variable on the same query level; it is always in a subquery. The outer query can be considered the client of the inner query, at which stage the variable/placeholder expression has been rendered. By your argument, you can just as easily break it using a WHERE clause involving @row directly (yes, it will run!).

Maureenmaureene answered 4/10, 2012 at 14:05. Comments:
I'm not convinced, but even if I misread the statement, that still leaves us without any explicit guarantees, and my question remains. When you write that current versions do guarantee order, do you have any references to back this up, or is it just personal experience? - Coverup
Aside from personal experience, consider the fact that the solution is well known and oft-quoted. Find me any reference or feedback that it hasn't worked at all times. That is good enough for me. - Maureenmaureene
This answer didn't work for the OP of that question, according to comments and a follow-up question. So far I personally consider this problem here to be the most likely cause. - Coverup
You're putting a square peg into a round hole. This sqlfiddle.com/#!2/3b7b4/14/0 is the form we are discussing. The link you sent is about the ORDER BY clause in the GROUP_CONCAT function, and that itself is a known bug in some MySQL versions. If you want to extend your argument to bugs, then I concede that nothing is guaranteed. - Maureenmaureene
You are right, it's not exactly the same thing. I should have been more verbose. The point I was trying to make is that in some cases, things which seem obvious and work in one setup may fail in another. Trying to find out what I can rely on and what I cannot, I started looking at the docs for guarantees, without success so far. I couldn't get the form discussed here to break yet. - Coverup
Finally found a variation of my original query which does break: introducing a HAVING on the row number column did the trick. I added this to my question, and also added tutorials and source code as other possible sources of answers, in case you want to quote any of these. - Coverup
@Coverup Using HAVING or ORDER BY ... DESC in the subquery in which you set the rank will fail. You can wrap that subquery in another query and safely use those clauses there, but not in the same place where you set the rank: anything that is evaluated after WHERE will break the ranking unless you use the results in a second query. Hope that clears up any doubts you might still have. - Trypanosomiasis
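The wrapping described in the last comment might look like this sketch: the inner query assigns the rank in ascending ordercol order and does nothing else, while the filtering and the DESC ordering are left to the outer query. The sample rows are hypothetical, and the inner numbering still depends on the same undocumented behavior discussed above.

```sql
-- Hypothetical sample data standing in for the question's tablename/ordercol.
DROP TEMPORARY TABLE IF EXISTS tablename;
CREATE TEMPORARY TABLE tablename (id INT PRIMARY KEY, ordercol INT NOT NULL);
INSERT INTO tablename (id, ordercol) VALUES (1, 50), (2, 10), (3, 40), (4, 20), (5, 30);

-- Inner query: assign the rank, in ascending ordercol order, and nothing else.
-- Outer query: filter and reorder using the rank column it receives.
SELECT ranked.*
FROM (SELECT tablename.*, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) AS ranked
WHERE ranked.rownum % 2 = 1
ORDER BY ranked.rownum DESC;
```

Putting the DESC in the inner ORDER BY instead would change which rows receive which numbers; keeping it outside changes only the output order.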
