SQLite Exists keyword : How to query the highest average?
Asked Answered
J

2

4

In an SQLite database table with two columns 'mID', and 'stars', I have to return 'mID's with highest average values of 'stars'.

Having the following data:

  Rating
mID  stars
101     2
101     4
106     4
103     2
108     4
108     2
101     3
103     3
104     2
108     4
107     3
106     5
107     5
104     3

I would first take average of 'stars' of each 'mID' by grouping it by 'mID', such as

select mID, avg(stars) theAvg
from Rating
group by mID;

As a result, I would get the table of average 'stars' values for each 'mID'.

mID  avg(stars)
101     3.0
103     2.5
104     2.5
106     4.5
107     4.0
108     3.33333333333

If I were to just return the highest average value of 'stars',
then I could have just taken something like select max(theAvg) followed by what I just calculated.
But then, to get the highest average 'stars' associated with its 'mID', I needed something else.

So I used 'not exists' keyword followed by a subquery that generates another table of 'mID' and 'stars'. This subquery compares with the original table to verify that for some average 'stars' value from the original table R1, there exists no new table R2's average 'stars' value that is greater than R1's averaged 'stars' value

select mID, theAvg
from (select mID, avg(stars) theAvg
from Rating
group by mID) as R1
where not exists(select * from 
(select mID, avg(stars) theAvg
from Rating
group by mID) as R2
where R2.theAvg > R1.theAvg);

I thought as a result of this query, I would get the highest average stars and it's mID, but instead what I get is two tuples ('mID':106, 'theAvg':4.5) and ('mID':107, 'theAvg':4.0), when the desired answer is only one tuple ('mID':106, 'theAvg':4.5), since we are looking for the highest average of all averages of 'stars'.

The result of my query(Wrong):
mID  theAvg
106    4.5
107    4.0

The desired Result:
mID  theAvg
106    4.5

What steps do you think I got wrong? Any suggestion how you'd do it?

Jawbone answered 15/4, 2012 at 10:18 Comment(3)
yes that is indeed strange; do you mind if I ask the question with a simplified version of your sql query?Blague
Not at all, you can go ahead... thank you.Jawbone
#10171903Blague
B
1

You can order by desc on the average, and add a limit clause as shown here:

select mID, avg(stars) theAvg
from Rating
group by mID
order by theAvg desc limit 1;

Should give you this:

sqlite> create table Rating (mID INT, stars INT);
sqlite> 
sqlite> insert into Rating values (101, 2);
sqlite> insert into Rating values (101, 4);
sqlite> insert into Rating values (106, 4);
sqlite> insert into Rating values (103, 2);
sqlite> insert into Rating values (108, 4);
sqlite> insert into Rating values (108, 2);
sqlite> insert into Rating values (101, 3);
sqlite> insert into Rating values (103, 3);
sqlite> insert into Rating values (104, 2);
sqlite> insert into Rating values (108, 4);
sqlite> insert into Rating values (107, 3);
sqlite> insert into Rating values (106, 5);
sqlite> insert into Rating values (107, 5);
sqlite> insert into Rating values (104, 3);
sqlite> 
sqlite> select mID, avg(stars) theAvg
   ...> from Rating
   ...> group by mID
   ...> order by theAvg DESC LIMIT 1;
106|4.5

Documentation this way: http://www.sqlite.org/lang_select.html#orderby

Blague answered 15/4, 2012 at 14:9 Comment(1)
Thank you. your answer was really useful. But I still don't understand how come my query returns two tuples, when I thought logically such a 'Not exists' query should only return one tuple...Jawbone
A
1

Sorry I'm a somewhat new to SQL and SO, but I found a solution that works when there is a tie for highest avg(stars) (or more specially an unknown amount of ties, in which case you can't easily set a limit for the ordered output). Like I said, I'm kinda nooby so its a little messy:

select title, avg(stars)
from movie join rating using(mID)
where mID not in (select R1.mID  
from (select avg(stars) theAvg, mID, ratingDate from Rating group by mID) 
as R1
join (select avg(stars) theAvg, mID, ratingDate from Rating group by mID)
as R2 
where R1.theAvg < R2.theAvg)
group by mID;

The subquery returns the mID of any movie that has average stars less than any other movie's average stars, and the main query's where statement takes any mID the subquery didn't return. The logic is pretty similar to what you had originally.

As for the two tuple problem: I don't really understand where you went wrong, but I'll let you know if I figure it out.

Acrimony answered 21/8, 2014 at 0:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.