Why does SELECT results differ between mysql and sqlite?
Asked Answered
O

3

20

I'm re-asking this question in a simplified and expanded manner.

Consider these sql statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1      
----------  ----------
106         4.5       
107         4.0       

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to [email protected] about the issue.

Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

enter image description here

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that the newer versions (some times after the version I'm using) supports inserting multiple records as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);  
Overfly answered 16/4, 2012 at 9:11 Comment(12)
Just for kicks I ran this to SQL what SQL Server would produce and it complained avg2 and avg1 didn't exist. I replaced them with MAX(T2.score) and MAX(T1.score) and it gave the SQLite result. When I created the table with score REAL it gave the MySQL result. Perhaps your MySQL schema is different to sqlites?Sidwohl
@ta.speot.is: can you try adding as as in avg(T2.score) as avg2 (two occurences)?Overfly
Doesn't work. Pretty sure SQL Server doesn't play dice when it comes to using aliases in WHERE, GROUP BY or HAVING.Sidwohl
trying parts of the statement under sqlite I have this SQL error: no such function: exists. Keywords that are working under mysql may not in other databases systems. I would personally use the IN keyword for testing the results of your subqueryBurns
@Bathz: what version of sqlite are you using?Overfly
The documentation doesn't help. dev.mysql.com/doc/refman/5.0/en/group-by-functions.html - The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL). sqlite.org/lang_aggfunc.html - The result of avg() is always a floating point value ... even if all inputs are integers. And while we're at it: sqlfiddle.com/#!5/1123f/1Sidwohl
@ta.speot.is: yes I think mysql's resulte are indeed correct; and even sqlite seems to say that avg returns a real: select typeof(avg(score)) from foo; returns real.Overfly
It looks like you have discovered a genuine bug in SQLite (which does not surprise me a slightest bit). It is definitely not the precision, because it does not work even when you replace avg with sum, which does not require division.Batish
@sixfeetsix I run sqlite v.2.8.17 so yes maybe my advice does not help :) . However in my short dev life I never used the keyword EXISTS except for table creation and so on and when reading #25429 I may think it is not the best keyword for what you are trying to do.Burns
@Bathz: yes sqlite 2.X doesn't support that select statement (perhaps because EXISTS wasn't supported then...); in any case, I have nothing for or against EXISTS.Overfly
Ok, I took a look again with the two keywords versions, and indeed results are the same. Sqlite seems to incorrectly handle whether the subquery or the avg1 variable. Because when I hardcode the subquery result in a IN clause or the avg1 var in a EXISTS clause it returns the correct result...Burns
Seems like sqlite team needs a bug report.Matos
M
1

I tried to mess with some variants of query.

It seems, like sqlite has errors in using of previous declared fields in a nested HAVING expressions.

In your example avg1 under second having is always equal to 5.0

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

...
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I can not find any similar bug at sqlite tickets list.

Matos answered 17/4, 2012 at 12:36 Comment(1)
Yes I'm seeing something very similar with when tracing with VDBE. I already sent [email protected] an email about the issue.Overfly
W
1

Lets look at this two ways, i'll use postgres 9.0 as my reference database

(1)

-- select rows from foo 

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having  not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for 
-- any row in T1
having avg2 > avg1);

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
(1 row)

then let's move some of the logic inside the subquery, getting rid of the 'not' : (2)

-- select rows from foo 
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having  exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal than the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join 
-- with the 'having' only we don't display the cartesian row set

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
 107 | 4.0000000000000000
(2 rows)

so you have got to ask yourself -- what do you actually mean when you do this correlated subquery inside a having clause, if it evaluates every row against every row from the primary query we are making a cartesian join and I don't think we should be pointing fingers at the SQL engine.

if you want every row that is less than the maximum average What you should be saying is:

select T1.id, avg(T1.score) avg1 
from foo T1 group by T1.id
having avg1 not in 
(select max(avg1) from (select id,avg(score) avg1 from foo group by id)) 
Wroughtup answered 18/4, 2012 at 6:32 Comment(3)
That select statement in my question is really not that difficult to understand; I simply want the correct result ;-)Overfly
I contend that it is ambiguous and will provide an indeterminate result across different technologiesWroughtup
It seems it's not ambiguous to MySql, MS Sql, and now Postgres; and it also seems that the main developer of sqlite is making changes in reaction to the email I sent to [email protected].Overfly
T
0

Have you tried this version? :

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg(T1.score));

Also this one (which should be giving same results):

select T1.*
from
  ( select id, avg(score) avg1
    from foo 
    group by id
  ) T1
where not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg1);

The query can also be handled with derived tables, instead of subquery in HAVING clause:

select ta.id, ta.avg1
from 
  ( select id, avg(score) avg1
    from foo
    group by id
  ) ta
  JOIN
  ( select avg(score) avg1
    from foo 
    group by id
    order by avg1 DESC
    LIMIT 1
  ) tmp
  ON tmp.avg1 = ta.avg1 
Touchy answered 17/4, 2012 at 13:3 Comment(5)
I just did; same 2 records in sqlite, same 1 record in mysql.Overfly
Yes your second option I have already served as an answer; please see the question I linked at the beginning of this question.Overfly
As to why the original query doesn't work as expected, I suppose it's a bug, related to how subqueries are handled.Purine
@sixfeetsix: Does the JOIN version shows correct (1-row) result?Purine
@sixfeetsix: I've added one more version with derived subquery, turning HAVING into WHERE.Purine

© 2022 - 2024 — McMap. All rights reserved.