How to find missing rows (dates) in a mysql table?
Asked Answered
A

4

4

I have tried several topics like this one: How to find missing data rows using SQL? here, but I couldn't make it work in my situation.

I have a table named posts in MySQL which I save user diaries in it every day. Sometimes users forget to write a post for a day and I want to make it possible for them to submit it later. So the db structures like this:

date           userid
2011-10-01     1
2011-10-02     1
(missing)
2011-10-04     1
2011-10-05     1
(missing)
2011-10-07     1

So I want to show a dropdown list of missing dates in this table of missing rows to user, so he can select the date he wants to submit the post for.

How can I do this? Thanks.

Ayannaaycock answered 2/4, 2011 at 10:53 Comment(0)
A
3

These types of queries are easiest to solve if you have a date table. In your DB, run this batch as a one-off to create a filled date table.

DROP PROCEDURE IF EXISTS FillDateTable;

delimiter //
CREATE PROCEDURE FillDateTable()
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY DEFINER
    COMMENT ''
BEGIN
  drop table if exists datetable;
  create table datetable (thedate datetime primary key, isweekday smallint);

  SET @x := date('2000-01-01');
  REPEAT 
    insert into datetable (thedate, isweekday) SELECT @x, case when dayofweek(@x) in (1,7) then 0 else 1 end;
    SET @x := date_add(@x, interval 1 day);
    UNTIL @x >= '2030-12-31' END REPEAT;
END//
delimiter ;

CALL FillDateTable;

Then you can just use a regular LEFT JOIN

SELECT thedate
FROM datetable
LEFT JOIN posts on posts.date = datetable.thedate
WHERE posts.date IS NULL

Of course you don't want all "missing" dates from 2000 to 2030. Limit it to the MIN and MAX dates in the posts table (for the user), i.e.

SELECT thedate
FROM datetable
INNER JOIN (select min(date) postStart, max(date) postEnd
            FROM posts
            where userid=123) p on datetable.thedate BETWEEN p.postStart and p.postEnd
LEFT JOIN posts on posts.date = datetable.thedate
WHERE posts.date IS NULL
Airless answered 2/4, 2011 at 11:8 Comment(2)
@Ayannaaycock / Performance is pretty good. Space for 10,000 dates .. negligible. About 100kb? 1MB? (wild guesses), but certainly not large at all.Airless
generated from 2000 to 2101 and takes roughly 1.5MbDirector
B
5

The simplest way to find missing dates is to use a calendar table. I've posted code to create and populate a calendar table for PostgreSQL; you should be able to adapt it without any trouble.

With the calendar table in place, your query is pretty simple, and easy to understand. To find the missing dates for October, 2011, you'd use something along these lines. (Guessing at your "posts" table.)

select c.cal_date
from calendar c
left join posts p on (c.cal_date = p.date)
where p.date is null
  and c.cal_date between '2011-10-01' and '2011-10-31'
  and p.userid = 1
order by c.cal_date
Boozy answered 2/4, 2011 at 11:5 Comment(0)
A
3

These types of queries are easiest to solve if you have a date table. In your DB, run this batch as a one-off to create a filled date table.

DROP PROCEDURE IF EXISTS FillDateTable;

delimiter //
CREATE PROCEDURE FillDateTable()
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY DEFINER
    COMMENT ''
BEGIN
  drop table if exists datetable;
  create table datetable (thedate datetime primary key, isweekday smallint);

  SET @x := date('2000-01-01');
  REPEAT 
    insert into datetable (thedate, isweekday) SELECT @x, case when dayofweek(@x) in (1,7) then 0 else 1 end;
    SET @x := date_add(@x, interval 1 day);
    UNTIL @x >= '2030-12-31' END REPEAT;
END//
delimiter ;

CALL FillDateTable;

Then you can just use a regular LEFT JOIN

SELECT thedate
FROM datetable
LEFT JOIN posts on posts.date = datetable.thedate
WHERE posts.date IS NULL

Of course you don't want all "missing" dates from 2000 to 2030. Limit it to the MIN and MAX dates in the posts table (for the user), i.e.

SELECT thedate
FROM datetable
INNER JOIN (select min(date) postStart, max(date) postEnd
            FROM posts
            where userid=123) p on datetable.thedate BETWEEN p.postStart and p.postEnd
LEFT JOIN posts on posts.date = datetable.thedate
WHERE posts.date IS NULL
Airless answered 2/4, 2011 at 11:8 Comment(2)
@Ayannaaycock / Performance is pretty good. Space for 10,000 dates .. negligible. About 100kb? 1MB? (wild guesses), but certainly not large at all.Airless
generated from 2000 to 2101 and takes roughly 1.5MbDirector
B
1

You could automatically enter an empty post each time (end of day) with null titles, null contents but actual date. Then, if the user wants to add a post for a previous day, display all the posts with null titles and contents and update the one he selects.

This shouldn't be a space problem, not if they write more than they miss. For instance, if they write for 4 days and miss 1.

Also, you would run a script and delete entries with null titles, null contents AND date older than X days. If they haven't added the missing post for X days, they probably will never do.

I apologize if my solution is trivial / too abstract.

Baudoin answered 2/4, 2011 at 11:8 Comment(1)
Thanks for your reply. I have thought of that, but there are more than 2000 users and they probably have a lot of missed items. I don't like that much of empty rows in my table :(Ayannaaycock
H
0
select  
    t0.date,
    t1.date1
    if(t1.date1 is null, date_add(t0.date, interval 1 day), '') missing_date
from
(select date from posts group by date) t0
left join
(select date_add(date, interval 1 day) date1 from posts group by date) t1
on t0.date = t1.date1
order by t0.date asc

You can find the missing date this way. Be careful you need to remove the last row, you can choose a way that suits you to achieve that.

Hydrocortisone answered 31/8, 2021 at 9:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.