Why does SELECT results differ between mysql and sqlite?

Asked 16/4, 2012 at 9:11 Answered 18/4, 2012 at 6:32

I'm re-asking this question in a simplified and expanded manner.

Consider these sql statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1      
----------  ----------
106         4.5       
107         4.0

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to [email protected] about the issue.

Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

enter image description here

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that the newer versions (some times after the version I'm using) supports inserting multiple records as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);

Overfly answered 16/4, 2012 at 9:11 Comment(12)

Just for kicks I ran this to SQL what SQL Server would produce and it complained avg2 and avg1 didn't exist. I replaced them with MAX(T2.score) and MAX(T1.score) and it gave the SQLite result. When I created the table with score REAL it gave the MySQL result. Perhaps your MySQL schema is different to sqlites? – Sidwohl 16/4, 2012 at 9:42

@ta.speot.is: can you try adding as as in avg(T2.score) as avg2 (two occurences)? – Overfly 16/4, 2012 at 9:44

Doesn't work. Pretty sure SQL Server doesn't play dice when it comes to using aliases in WHERE, GROUP BY or HAVING. – Sidwohl 16/4, 2012 at 9:45

trying parts of the statement under sqlite I have this SQL error: no such function: exists. Keywords that are working under mysql may not in other databases systems. I would personally use the IN keyword for testing the results of your subquery – Burns 16/4, 2012 at 9:45

@Bathz: what version of sqlite are you using? – Overfly 16/4, 2012 at 9:46

The documentation doesn't help. dev.mysql.com/doc/refman/5.0/en/group-by-functions.html - The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL). sqlite.org/lang_aggfunc.html - The result of avg() is always a floating point value ... even if all inputs are integers. And while we're at it: sqlfiddle.com/#!5/1123f/1 – Sidwohl 16/4, 2012 at 9:47

@ta.speot.is: yes I think mysql's resulte are indeed correct; and even sqlite seems to say that avg returns a real: select typeof(avg(score)) from foo; returns real. – Overfly 16/4, 2012 at 9:51

It looks like you have discovered a genuine bug in SQLite (which does not surprise me a slightest bit). It is definitely not the precision, because it does not work even when you replace avg with sum, which does not require division. – Batish 16/4, 2012 at 9:51

@sixfeetsix I run sqlite v.2.8.17 so yes maybe my advice does not help :) . However in my short dev life I never used the keyword EXISTS except for table creation and so on and when reading #25429 I may think it is not the best keyword for what you are trying to do. – Burns 16/4, 2012 at 9:51

@Bathz: yes sqlite 2.X doesn't support that select statement (perhaps because EXISTS wasn't supported then...); in any case, I have nothing for or against EXISTS. – Overfly 16/4, 2012 at 9:54

Ok, I took a look again with the two keywords versions, and indeed results are the same. Sqlite seems to incorrectly handle whether the subquery or the avg1 variable. Because when I hardcode the subquery result in a IN clause or the avg1 var in a EXISTS clause it returns the correct result... – Burns 16/4, 2012 at 11:18

Seems like sqlite team needs a bug report. – Matos 17/4, 2012 at 12:4

I tried to mess with some variants of query.

It seems, like sqlite has errors in using of previous declared fields in a nested HAVING expressions.

In your example avg1 under second having is always equal to 5.0

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

...
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I can not find any similar bug at sqlite tickets list.

Matos answered 17/4, 2012 at 12:36 Comment(1)

Yes I'm seeing something very similar with when tracing with VDBE. I already sent [email protected] an email about the issue. – Overfly 17/4, 2012 at 12:49

Lets look at this two ways, i'll use postgres 9.0 as my reference database

(1)

-- select rows from foo 

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having  not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for 
-- any row in T1
having avg2 > avg1);

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
(1 row)

then let's move some of the logic inside the subquery, getting rid of the 'not' : (2)

-- select rows from foo 
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having  exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal than the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join 
-- with the 'having' only we don't display the cartesian row set

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
 107 | 4.0000000000000000
(2 rows)

so you have got to ask yourself -- what do you actually mean when you do this correlated subquery inside a having clause, if it evaluates every row against every row from the primary query we are making a cartesian join and I don't think we should be pointing fingers at the SQL engine.

if you want every row that is less than the maximum average What you should be saying is:

select T1.id, avg(T1.score) avg1 
from foo T1 group by T1.id
having avg1 not in 
(select max(avg1) from (select id,avg(score) avg1 from foo group by id))

Wroughtup answered 18/4, 2012 at 6:32 Comment(3)

That select statement in my question is really not that difficult to understand; I simply want the correct result ;-) – Overfly 18/4, 2012 at 9:57

I contend that it is ambiguous and will provide an indeterminate result across different technologies – Wroughtup 18/4, 2012 at 10:21

It seems it's not ambiguous to MySql, MS Sql, and now Postgres; and it also seems that the main developer of sqlite is making changes in reaction to the email I sent to [email protected]. – Overfly 18/4, 2012 at 10:25

Have you tried this version? :

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg(T1.score));

Also this one (which should be giving same results):

select T1.*
from
  ( select id, avg(score) avg1
    from foo 
    group by id
  ) T1
where not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg1);

The query can also be handled with derived tables, instead of subquery in HAVING clause:

select ta.id, ta.avg1
from 
  ( select id, avg(score) avg1
    from foo
    group by id
  ) ta
  JOIN
  ( select avg(score) avg1
    from foo 
    group by id
    order by avg1 DESC
    LIMIT 1
  ) tmp
  ON tmp.avg1 = ta.avg1

Touchy answered 17/4, 2012 at 13:3 Comment(5)

I just did; same 2 records in sqlite, same 1 record in mysql. – Overfly 17/4, 2012 at 13:6

Yes your second option I have already served as an answer; please see the question I linked at the beginning of this question. – Overfly 17/4, 2012 at 13:8

As to why the original query doesn't work as expected, I suppose it's a bug, related to how subqueries are handled. – Purine 17/4, 2012 at 13:16

@sixfeetsix: Does the JOIN version shows correct (1-row) result? – Purine 17/4, 2012 at 13:53

@sixfeetsix: I've added one more version with derived subquery, turning HAVING into WHERE. – Purine 17/4, 2012 at 14:16

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags