Joining tables based on the maximum value
Asked Answered
R

7

9

Here's a simplified example of what I'm talking about:

Table: students      exam_results
_____________       ____________________________________
| id | name |       | id | student_id | score |   date |
|----+------|       |----+------------+-------+--------|
|  1 | Jim  |       |  1 |          1 |    73 | 8/1/09 | 
|  2 | Joe  |       |  2 |          1 |    67 | 9/2/09 |
|  3 | Jay  |       |  3 |          1 |    93 | 1/3/09 |
|____|______|       |  4 |          2 |    27 | 4/9/09 |
                    |  5 |          2 |    17 | 8/9/09 |
                    |  6 |          3 |   100 | 1/6/09 |
                    |____|____________|_______|________|

Assume, for the sake of this question, that every student has at least one exam result recorded.

How would you select each student along with their highest score? Edit: ...AND the other fields in that record?

Expected output:

_________________________
| name | score |   date |
|------+-------|--------|
|  Jim |    93 | 1/3/09 |
|  Joe |    27 | 4/9/09 |
|  Jay |   100 | 1/6/09 |
|______|_______|________|

Answers using all types of DBMS are welcome.

Romanticist answered 7/12, 2009 at 23:22 Comment(2)
How would you resolve ties? In your example, which record should be selected in Jim scored 93 twice?Ela
in my own tables, (which are nothing to do with students and exams) this doesn't happen. Either one should be fine?Romanticist
R
13

Answering the EDITED question (i.e. to get associated columns as well).

In Sql Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:

with exam_data as
(
    select  r.student_id, r.score, r.date,
            row_number() over(partition by r.student_id order by r.score desc) as rn
    from    exam_results r
)
select  s.name, d.score, d.date, d.student_id
from    students s
join    exam_data d
on      s.id = d.student_id
where   d.rn = 1;

For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:

select  s.name, r.student_id, r.score, r.date
from    (
            select  r.student_id, max(r.score) as max_score
            from    exam_results r
            group by r.student_id
        ) d
join    exam_results r
on      r.student_id = d.student_id
and     r.score = d.max_score
join    students s
on      s.id = r.student_id;

This last one assumes there aren't duplicate student_id/max_score combinations, if there are and/or you want to plan to de-duplicate them, you'll need to use another subquery to join to with something deterministic to decide which record to pull. For example, assuming you can't have multiple records for a given student with the same date, if you wanted to break a tie based on the most recent max_score, you'd do something like the following:

select  s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from    (
            select  r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
            from    (
                        select  r1.student_id, max(r1.score) as max_score
                        from    exam_results r1
                        group by r1.student_id
                    ) d
            join    exam_results r2
            on      r2.student_id = d.student_id
            and     r2.score = d.max_score
            group by r2.student_id, r2.score
        ) r
join    exam_results r3
on      r3.student_id = r.student_id
and     r3.score = r.max_score
and     r3.date = r.max_score_max_date
join    students s
on      s.id = r3.student_id;

EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments

Russon answered 7/12, 2009 at 23:44 Comment(4)
I don't think the distinct works to de-duplicate ties if the dates are different.Untune
Good point Mark - would need to use something deterministic in another subquery to de-dupe correctly in the ANSI query. I'll edit to reflect...Russon
Invalid column name 'score': I think you got a couple of table names mixed up.Untune
Yeah, that's what you get when you write a query in a text editor...I just adjusted it, had the filters backwards on the final join conditionsRusson
E
4
SELECT s.name,
    COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
    LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name
Eringo answered 7/12, 2009 at 23:25 Comment(3)
This assumes that there could be students without exams associated.Eringo
ahhh ok, it seems i've asked the question poorly. I'll reword.Romanticist
have to add a group by clause for this to work, i.e. "group by s.name"Russon
C
2

Try this,

Select student.name, max(result.score) As Score from Student
        INNER JOIN
    result
        ON student.ID = result.student_id
GROUP BY
    student.name
Casavant answered 7/12, 2009 at 23:32 Comment(2)
thanks Zinx, however I stuffed up the original wording of the question. I actually need to know more than just the high score: I need to know all other fields in the record which holds their high score, too.Romanticist
Zinx, highlight your code and hit Ctrl+K to format the query syntax, easier readingRusson
O
2

With Oracle's analytic functions this is easy:

SELECT DISTINCT
       students.name
      ,FIRST_VALUE(exam_results.score)
       OVER (PARTITION BY students.id
             ORDER BY exam_results.score DESC) AS score
      ,FIRST_VALUE(exam_results.date)
       OVER (PARTITION BY students.id
             ORDER BY exam_results.score DESC) AS date
FROM   students, exam_results
WHERE  students.id = exam_results.student_id;
Olive answered 8/12, 2009 at 5:8 Comment(1)
I know PostgreSQL isn't in the question list, but in case anyone stumbles across this, it has window functions to do this, too.Passus
C
1
Select Name, T.Score, er. date 
from Students S inner join
          (Select Student_ID,Max(Score) as Score from Exam_Results
           Group by Student_ID) T 
On S.id=T.Student_ID inner join Exam_Result er
On er.Student_ID = T.Student_ID And er.Score=T.Score
Counterclockwise answered 8/12, 2009 at 5:9 Comment(0)
U
0

Using MS SQL Server:

SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
    SELECT TOP(1) id FROM exam_results T2
    WHERE exam_results.student_id = T2.student_id
    ORDER BY score DESC, date ASC)

If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).

Output:

Jim 93  2009-01-03 00:00:00.000
Joe 27  2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000

Test bed:

CREATE TABLE students(id int , name nvarchar(20) );

CREATE TABLE exam_results(id int , student_id int , score int, date datetime);

INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')

INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'), 
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')

SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
    SELECT TOP(1) id FROM exam_results T2
    WHERE exam_results.student_id = T2.student_id
    ORDER BY score DESC, date ASC)

On MySQL, I think you can change the TOP(1) to a LIMIT 1 at the end of the statement. I have not tested this though.

Untune answered 8/12, 2009 at 0:10 Comment(0)
B
0
select  s.id, s.name, e.score, e.date
from    students s
join    exam_result e      on      s.id = e.student_id
join    exam_result e1     on      s.id = e1.student_id
group by s.id, s.name, e.date
having e.score = max(e1.score);

you can try this. It suits me

Bowlds answered 21/8 at 2:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.