SQL Get max date in dataset for each month
Asked Answered
S

5

5

I have a table with INT id and DATETIME date, amongst other fields. Rows are inserted into this table each weekday (not guaranteed), and several other tables use this id as a foreign key.

My question is, how can I get the id for the max date of each month, which I can then use to join to other data tables? For example, if the process ran today, I would want to see data for Jan 31, Feb 28, ... , Oct 31, Nov 23.

I am using SQL Server 2005.

Stuck answered 23/11, 2009 at 22:38 Comment(0)
B
7
CREATE TABLE #foo (id INT, d DATETIME);

INSERT #foo(id,d) SELECT 1, '20091101'
UNION ALL SELECT 2, '20091102'
UNION ALL SELECT 3, '20091006'
UNION ALL SELECT 4, '20091001'
UNION ALL SELECT 5, '20091002'
UNION ALL SELECT 6, '20090904';

SELECT d, id
FROM 
( 
  SELECT d, id, rn = ROW_NUMBER() OVER 
  (PARTITION BY DATEDIFF(MONTH, '20000101', d)
   ORDER BY d DESC)
  FROM #foo
) AS x
WHERE x.rn = 1
ORDER BY x.d;

DROP TABLE #foo;
Boarfish answered 23/11, 2009 at 23:14 Comment(3)
Is that more or less efficient than the following? SELECT * FROM ( SELECT MONTH(d), MAX(d) FROM #foo GROUP BY MONTH(d) ) a JOIN #foo b ON a.d = b.dStuck
That isn't enough, because now you're combining this November with last November, and the November before that, etc. MONTH() just returns an integer (November = 11).Boarfish
Your query also doesn't work, because of two reasons: (a) there is no column "d" in the derived table "a"... and (b) using the MAX() assumes that the id is always incremental. What if you repair the table and regenerate stats in the wrong order? My query looks at the highest date, not the highest surrogate key.Boarfish
O
7

I'm assuming you only want the last day of the month that you have a record for (eg. Jan 30 if you've got nothing for Jan 31).

SELECT
  id,
  date
FROM (
  SELECT
    id,
    date,
    ROW_NUMBER() OVER (PARTITION BY YEAR(date), MONTH(date) ORDER BY DAY(date) DESC) AS rowNum
  FROM sometable
) z
WHERE rowNum = 1;
Orv answered 23/11, 2009 at 23:4 Comment(1)
This one did the trick for me. OP, could you explain the usage of RowNumber?Frauenfeld
B
7
CREATE TABLE #foo (id INT, d DATETIME);

INSERT #foo(id,d) SELECT 1, '20091101'
UNION ALL SELECT 2, '20091102'
UNION ALL SELECT 3, '20091006'
UNION ALL SELECT 4, '20091001'
UNION ALL SELECT 5, '20091002'
UNION ALL SELECT 6, '20090904';

SELECT d, id
FROM 
( 
  SELECT d, id, rn = ROW_NUMBER() OVER 
  (PARTITION BY DATEDIFF(MONTH, '20000101', d)
   ORDER BY d DESC)
  FROM #foo
) AS x
WHERE x.rn = 1
ORDER BY x.d;

DROP TABLE #foo;
Boarfish answered 23/11, 2009 at 23:14 Comment(3)
Is that more or less efficient than the following? SELECT * FROM ( SELECT MONTH(d), MAX(d) FROM #foo GROUP BY MONTH(d) ) a JOIN #foo b ON a.d = b.dStuck
That isn't enough, because now you're combining this November with last November, and the November before that, etc. MONTH() just returns an integer (November = 11).Boarfish
Your query also doesn't work, because of two reasons: (a) there is no column "d" in the derived table "a"... and (b) using the MAX() assumes that the id is always incremental. What if you repair the table and regenerate stats in the wrong order? My query looks at the highest date, not the highest surrogate key.Boarfish
L
3

This will pull the month and year and last id:

SELECT month(date), year(date), max(id)
FROM mytable
GROUP BY month(date), year(date)

And here's a test script

create table #mytable (
date datetime,
id int
)

insert into #mytable (date, id) values ('11/7/2009', 1)
insert into #mytable (date, id) values ('11/8/2009', 2)
insert into #mytable (date, id) values ('12/21/2009', 3)
insert into #mytable (date, id) values ('12/30/2009', 4)
insert into #mytable (date, id) values ('10/7/2009', 5)
insert into #mytable (date, id) values ('10/12/2009', 6)

SELECT month(date), year(date), max(id)
FROM #mytable
GROUP BY month(date), year(date)

drop table #mytable
Lying answered 23/11, 2009 at 22:46 Comment(0)
E
3

I'd select back the maximum date using a group by query, like this:

Select Year(datetimefield) as MyYear, month(datetimefield) as MyMonth max(day(datetimefield)) as MaxDate
from table1
group by Year(datetimefield), month(datetimefield)

The query above will give you back the maximum transaction date for each month. To get the maximum Id associated with that date for each month, join the results of this back to the source table to get the max id for that day.

So, your full query would look like this:

select  year(datetimefield) as MyYear, Month(datetimefield) as MyMonth, day(datetimefield), max(IdFieldName) as MaxID
from someTable inner join 
    (select  year(datetimefield) as MyYear, Month(datetimefield) as MyMonth,     max(day(datetimefield)) as MaxDate
    from someTable 
    group by year(datetimefield), Month(datetimefield)) as innerSelect
on innerselect.MyYear = year(datetimefield) and 
    innerselect.MyMonth = Month(datetimefield) and
    innerselect.MaxDate = day(datetimefield)
group by year(datetimefield), Month(datetimefield), day(datetimefield)
Erenow answered 23/11, 2009 at 22:48 Comment(3)
The self join here would make this more expensive than using the window functions in the other responses.Tompkins
True, but this solution is also more portable because it does not depend on features that only exist in SQL2005 or later.Erenow
The user has stated they are on SQL Server 2005, and I would rather take advantage of available features than write poorly-performing, pure vanilla SQL in the odd event that we will someday switch to Oracle or DB2. Not going to happen, and is a very rare scenario these days. In any case, at that point, fixing queries like this will be the least of your worries.Boarfish
H
0

It may be useful.

SELECT id, process_date FROM table_x WHERE process_date IN ( SELECT MAX (process_date) FROM table_x GROUP BY TRUNC (process_date, 'MONTH') ) ORDER BY process_date -- or whatever you want ;

Hoodwink answered 15/2 at 13:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.