Retrieving last record in each group from database - SQL Server 2005/2008

I have done some searching but can't seem to get the results I am looking for. Basically, we have four different management systems in place throughout our company, and I am in the process of combining the data from each system on a regular basis. My goal is to load the data into a central database every hour. Here is a sample of the data set I am working with:

COMPUTERNAME | SERIALNUMBER | USERNAME | LASTIP | LASTUPDATE | SOURCE
TEST1 | 1111 | BOB | 1.1.1.1 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST1 | 1111 | BOB | 1.1.1.1 | 1/18/2011 01:00:00 | MGMT_SYSTEM_2
TEST1 | 1111 | PETER | 1.1.1.11 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST2 | 2222 | GEORGE | 1.1.1.2 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST3 | 3333 | TOM | 1.1.1.3 | 1/19/2011 01:00:00 | MGMT_SYSTEM_2
TEST4 | 4444 | MIKE   | 1.1.1.4 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST4 | 4444 | MIKE   | 1.1.1.41 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST5 | 5555 | SUSIE  | 1.1.1.5 | 1/19/2011 01:00:00 | MGMT_SYSTEM_1

So I want to query this master table and retrieve only the latest record for each computer (based on LASTUPDATE), so that I get the latest info about that system. The problem is that one system may be in each database, but of course the records will never have the exact same update time.

I would expect to get something like this:

TEST1 | 1111 | PETER | 1.1.1.11 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST2 | 2222 | GEORGE | 1.1.1.2 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST3 | 3333 | TOM | 1.1.1.3 | 1/19/2011 01:00:00 | MGMT_SYSTEM_2
TEST4 | 4444 | MIKE   | 1.1.1.41 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST5 | 5555 | SUSIE  | 1.1.1.5 | 1/19/2011 01:00:00 | MGMT_SYSTEM_1

I have tried using the MAX function, but with that I can only retrieve one column. And I can't use that in a subquery because I don't have a unique ID field that would give me the last updated record. One of the systems is a MySQL database and the MAX function in MySQL will actually work the way I need it to only returning one record per GROUP BY, but it doesn't work in SQL Server.

I'm thinking I need to use MAX and a LEFT JOIN, but my attempts so far have failed.

Your help would be greatly appreciated. I have been racking my brain for the past 3-4 hours trying to get a working query. This master table is located on a SQL Server 2005 server.

Thanks!

Scission answered 20/1, 2011 at 19:58 Comment(0)
-- Number each computer's rows from newest to oldest, then keep only the newest one.
;with cteRowNumber as (
    select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE,
           row_number() over (partition by COMPUTERNAME
                              order by LASTUPDATE desc) as RowNum
    from YourTable
)
select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE
from cteRowNumber
where RowNum = 1;
Transpadane answered 20/1, 2011 at 20:5 Comment(3)
Joe, that worked great. I never knew about the WITH clause and have never used OVER or PARTITION before. Can you briefly tell me what these are doing? I looked them up, but I'm not sure I am finding the correct info.Scission
@RyanF: The WITH clause defines a Common Table Expression or CTE. row_number is a window function. Both are features that were introduced in SQL Server 2005. Hopefully these links can get you started in the right direction.Transpadane
This is a very slow query when your table has many thousands of rowsDerr
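To see what the WITH clause and row_number() discussed in these comments are doing, it can help to run the window function without the final filter. The following is only a sketch: the table variable @Sample is a hypothetical stand-in that holds a few rows of the question's sample data, and the date literals are written in the unseparated yyyymmdd form so they do not depend on language settings.

```sql
-- Hypothetical table variable holding part of the question's sample data.
declare @Sample table (
    COMPUTERNAME varchar(50),
    USERNAME     varchar(50),
    LASTUPDATE   datetime,
    SOURCE       varchar(50)
);

-- insert ... select ... union all is used so this also runs on SQL Server 2005.
insert into @Sample (COMPUTERNAME, USERNAME, LASTUPDATE, SOURCE)
select 'TEST1', 'BOB',    '20110117 01:00:00', 'MGMT_SYSTEM_1' union all
select 'TEST1', 'BOB',    '20110118 01:00:00', 'MGMT_SYSTEM_2' union all
select 'TEST1', 'PETER',  '20110119 01:00:00', 'MGMT_SYSTEM_3' union all
select 'TEST2', 'GEORGE', '20110117 01:00:00', 'MGMT_SYSTEM_1';

-- No filter here: every row is returned together with the number it receives.
-- Numbering restarts at 1 for each COMPUTERNAME, newest LASTUPDATE first.
select COMPUTERNAME, USERNAME, LASTUPDATE, SOURCE,
       row_number() over (partition by COMPUTERNAME
                          order by LASTUPDATE desc) as RowNum
from @Sample
order by COMPUTERNAME, RowNum;
```

For TEST1, the PETER row (1/19) gets RowNum 1 and the two BOB rows get 2 and 3; the single TEST2 row gets 1. The where RowNum = 1 filter in the answer then keeps exactly one row per computer.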

In SQL Server, the most performant solution is often a correlated subquery:

select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
                      from t t2
                      where t2.computername = t.computername
                     );

In particular, this can take advantage of an index on (computername, lastupdate). Conceptually, this is faster than row_number() because the query simply filters out the rows that don't match. The row_number() version needs to attach a row number to every row before it filters, which is more data processing.
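As a sketch of the index mentioned above, assuming the table is named YourTable as in the other answer (the index name is hypothetical):

```sql
-- Hypothetical index name; YourTable is a placeholder table name.
-- With this index, the max(lastupdate) per computername can be found
-- by seeking to the end of each computername's range.
create index IX_YourTable_ComputerName_LastUpdate
    on YourTable (COMPUTERNAME, LASTUPDATE);

-- Optional check: list computers whose newest LASTUPDATE is shared by
-- more than one row. For those, the correlated subquery returns every
-- tied row, whereas row_number() ... where RowNum = 1 returns only one.
select t.COMPUTERNAME, t.LASTUPDATE, count(*) as TiedRows
from YourTable t
where t.LASTUPDATE = (select max(t2.LASTUPDATE)
                      from YourTable t2
                      where t2.COMPUTERNAME = t.COMPUTERNAME)
group by t.COMPUTERNAME, t.LASTUPDATE
having count(*) > 1;
```

If the check returns no rows, both approaches give the same result on the current data. Whether the index actually makes this query faster than the row_number() version depends on the data and is worth confirming against the execution plan.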

Red answered 18/5, 2018 at 14:46 Comment(4)
Could you comment on what I should do if the "inner" select needs to pick rows by the max value of another column, "id"? In my case t2.lastupdate is not unique and there is a separate ID column, so I also need to select by max(ID) (with performance in mind, of course).Apophyge
@AskarIbragimov . . . You should ask a question, not a comment.Red
Here you go #50467568Apophyge
Up-voted both answers. I'm used to the old guy sub-query approach and understand the performance benefits - but for the use case that brought me here (less data, more complex relationship), I appreciate the clarity of not having to repeat WHERE clauses, etc.Brander
