MySQL how to fill missing dates in range?
Asked Answered
I

7

84

I have a table with 2 columns, date and score. It has at most 30 entries, for each of the last 30 days one.

date      score
-----------------
1.8.2010  19
2.8.2010  21
4.8.2010  14
7.8.2010  10
10.8.2010 14

My problem is that some dates are missing - I want to see:

date      score
-----------------
1.8.2010  19
2.8.2010  21
3.8.2010  0
4.8.2010  14
5.8.2010  0
6.8.2010  0
7.8.2010  10
...

What I need from the single query is to get: 19,21,9,14,0,0,10,0,0,14... That means that the missing dates are filled with 0.

I know how to get all the values and in server side language iterating through dates and missing the blanks. But is this possible to do in mysql, so that I sort the result by date and get the missing pieces.

EDIT: In this table there is another column named UserID, so I have 30.000 users and some of them have the score in this table. I delete the dates every day if date < 30 days ago because I need last 30 days score for each user. The reason is I am making a graph of the user activity over the last 30 days and to plot a chart I need the 30 values separated by comma. So I can say in query get me the USERID=10203 activity and the query would get me the 30 scores, one for each of the last 30 days. I hope I am more clear now.

Icaria answered 21/8, 2010 at 20:30 Comment(4)
Yes, it is possible, but why would you do it?Harve
I still don't get it. Do not fetch unnecessary data from the database if you can fill those gaps with whatever is plotting the graph and you'll save yourself some overhead.Harve
but then I have to SELECT the data for USERID, I get for example 20 rows of date and score out and then I have to loop in my server side language (ASP) to check if there is date 30 days ago, if it is not make 0 else make the database value... Isn't this more consuming that fetching from database 30 values and just construct the string?Icaria
possible duplicate of MySQL: Select All Dates In a Range Even If No Records PresentSubcontract
D
60

MySQL doesn't have recursive functionality, so you're left with using the NUMBERS table trick -

  1. Create a table that only holds incrementing numbers - easy to do using an auto_increment:

    DROP TABLE IF EXISTS `example`.`numbers`;
    CREATE TABLE  `example`.`numbers` (
      `id` int(10) unsigned NOT NULL auto_increment,
       PRIMARY KEY  (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
  2. Populate the table using:

    INSERT INTO `example`.`numbers`
      ( `id` )
    VALUES
      ( NULL )
    

    ...for as many values as you need.

  3. Use DATE_ADD to construct a list of dates, increasing the days based on the NUMBERS.id value. Replace "2010-06-06" and "2010-06-14" with your respective start and end dates (but use the same format, YYYY-MM-DD) -

    SELECT `x`.*
      FROM (SELECT DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY)
              FROM `numbers` `n`
             WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` -1 DAY) <= '2010-06-14' ) x
    
  4. LEFT JOIN onto your table of data based on the time portion:

       SELECT `x`.`ts` AS `timestamp`,
              COALESCE(`y`.`score`, 0) AS `cnt`
         FROM (SELECT DATE_FORMAT(DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY), '%m/%d/%Y') AS `ts`
                 FROM `numbers` `n`
                WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) <= '2010-06-14') x
    LEFT JOIN TABLE `y` ON STR_TO_DATE(`y`.`date`, '%d.%m.%Y') = `x`.`ts`
    

If you want to maintain the date format, use the DATE_FORMAT function:

DATE_FORMAT(`x`.`ts`, '%d.%m.%Y') AS `timestamp`
Deaton answered 21/8, 2010 at 20:49 Comment(5)
Thank you. Is this a fast operation of would you advise against using such approach and going server side calculation?Icaria
@Jerry2: My preference is to do as much data processing in the database, short of really involved presentation stuff. I don't envy doing this in application code, just as long as it's one trip to the database...Deaton
In order to use indexes the conditions (WHERE and ON clauses) can be rewritten to WHERE n.id < DATEDIFF('2010-06-14', '2010-06-06') and LEFT JOIN TABLE y ON y.date = DATE_FORMAT(x.ts, '%d.%m.%Y')Ingrowing
As soon as I add a WHERE clause for example WHERE 'y'.'score' = 2, the all the filled dates dont show up anymoreDibucaine
@SebaM : That's because WHERE clauses are applied after joins. So, you start with the full range, left join on your data, then strip out a bunch of rows. What you really want to do is filter the y table as part of the join... SELECT * FROM x LEFT JOIN y ON y.foo = x.bah AND y.score = 2Aron
N
31

Time went by since this question was asked. MySQL 8.0 was released in 2018 and added support for recursive common table expressions, which provide an elegant, state-of-the-art solution to this question.

The following query can be used to generate a list of dates, say for the first 15 days of August 2010:

with recursive all_dates(dt) as (
    -- anchor
    select '2010-08-01' dt
    union all 
    -- recursion with stop condition
    select dt + interval 1 day from all_dates where dt < '2010-08-15'
)
select * from all_dates order by dt

You can then left join this resultset with your table to generate the expected output:

with recursive all_dates(dt) as (
    select '2010-08-01' dt
    union all 
    select dt + interval 1 day from all_dates where dt < '2010-08-15'
)
select d.dt date, coalesce(t.score, 0) score
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt

Demo on DB Fiddle:

date       | score
:--------- | ----:
2010-08-01 |    19
2010-08-02 |    21
2010-08-03 |     0
2010-08-04 |    14
2010-08-05 |     0
2010-08-06 |     0
2010-08-07 |    10
2010-08-08 |     0
2010-08-09 |     0
2010-08-10 |    14
2010-08-11 |     0
2010-08-12 |     0
2010-08-13 |     0
2010-08-14 |     0
2010-08-15 |     0

Note that it is very easy to adapt the recursive CTE for other intervals or periods. As an example, say we want a row every 15 minutes from 4 AM to 8 AM on August 1st, 2010 ; we can do :

with recursive all_dates(dt) as (
    select '2010-08-01 04:00:00' dt
    union all 
    select dt + interval 15 minute from all_dates where dt < '2010-08-01 08:00:00'
)
...
Nunn answered 20/12, 2019 at 23:24 Comment(0)
C
27

I'm not a fan of the other answers, requiring tables to be created and such. This query does it efficiently without helper tables.

SELECT 
    IF(score IS NULL, 0, score) AS score,
    b.Days AS date
FROM 
    (SELECT a.Days 
    FROM (
        SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY AS Days
        FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
        CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
        CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
    ) a
    WHERE a.Days >= curdate() - INTERVAL 30 DAY) b
LEFT JOIN your_table
    ON date = b.Days
ORDER BY b.Days;

So lets dissect this.

SELECT 
    IF(score IS NULL, 0, score) AS score,
    b.Days AS date

The if will detect days that had no score and set them to 0. b.Days is the configured amount of days you chose to get from the current date, up to 1000.

    (SELECT a.Days 
    FROM (
        SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY AS Days
        FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
        CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
        CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
    ) a
    WHERE a.Days >= curdate() - INTERVAL 30 DAY) b

This subquery is something I saw on stackoverflow. It efficiently generates a list of the past 1000 days from the current date. The interval (currently 30) in the WHERE clause at the end determines which days are returned; the maximum is 1000. This query could be easily modified to return 100s of years worth of dates, but 1000 should be good for most things.

LEFT JOIN your_table
    ON date = b.Days
ORDER BY b.Days;

This is the part that brings your table that contains the score into it. You compare to the selected date range from the date generator query to be able to fill in 0s where needed (the score will be set to NULL initially, because it is a LEFT JOIN; this is fixed in the select statement). I also order it by the dates, just because. This is preference, you could also order by score.

Before the ORDER BY you could easily join with your table about user info you mentioned with your edit, to add that last requirement.

I hope this version of the query helps someone. Thanks for reading.

Camiecamila answered 18/6, 2018 at 22:2 Comment(1)
The downsides being that this requires the dates to be generated each and every time you run a query, but also that you need to copy and paste this code to all queries where it's needed. I can see an argument for saying that generating the sequence in memory is faster than reading it from disk (though it would likely be cached in memory anyway), but I'd still want to create a function or view to create the sequence, so that I can re-use the code. (meaning that the argument against creating an object in the database becomes mute.)Aron
O
16

You can accomplish this by using a Calendar Table. That's a table which you create once and fill with a date range (e.g. one dataset for each day 2000-2050; that depends on your data). Then you can make an outer join of your table against the calendar table. If a date is missing in your table, you return 0 for the score.

Octamerous answered 21/8, 2010 at 20:58 Comment(5)
True, but a numbers table is more flexible - see my answer for an example. IE: what if now you need sequential numbers too? Do you want a table per data type?Deaton
Needing sequential numbers would be another use case ;-) If you have to target different DBMS (i.e. Oracle, MySQL, SQL-Server) your approach would need a slightly modified statement, and I suspect that the DATE_ADD approach is slower than a calendar table (but I think that's not relevant here)Octamerous
On SQL Server 2005, a recursive CTE was barely faster than the NUMBERS table trickDeaton
There is a nice and easy procedure for creating a calendar table at http://www.media-division.com/using-mysql-generate-daily-sales-reports-filled-gaps/. And although, as @omg-ponies mentioned above, the numbers trick is almost as fast as having the calendar table, sometimes using the funky tricks may be misleading. Especially if you expect other developers to maintain your code in the future.Peyote
Compared to the numbers-table-solution a calendar table will let you write simple queries like SELECT c.date, COALESCE(y.score, 0) AS cnt FROM calendar c LEFT JOIN y ON y.date = c.date WHERE c.date BETWEEN '2010-06-06' AND '2010-06-14'Ingrowing
P
4

Michael Conard answer is great but I needed intervals of 15 minutes where the time must always start at the top of every 15th minute:

SELECT a.Days 
FROM (
    SELECT FROM_UNIXTIME( FLOOR( UNIX_TIMESTAMP() / (15 * 60) ) * (15 * 60)) - INTERVAL 15 * (a.a + (10 * b.a) + (100 * c.a)) MINUTE AS Days
    FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
    CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
    CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
) a
WHERE a.Days >= curdate() - INTERVAL 30 DAY

This will set the current time to the previous round 15th minute:

FROM_UNIXTIME( FLOOR( UNIX_TIMESTAMP() / (15 * 60) ) * (15 * 60))

And this will remove time with a 15 minute step:

- INTERVAL 15 * (a.a + (10 * b.a) + (100 * c.a)) MINUTE

If there's a simpler way to do it, please let me know.

Pettitoes answered 28/5, 2019 at 2:50 Comment(0)
F
0

you can user direct from start date up to today with insertion

        with recursive all_dates(dt) as (
        -- anchor
        select '2021-01-01' dt
            union all 
        -- recursion with stop condition
        INSERT IGNORE  INTO mytable (date,score) VALUES (dt + interval 1 day ,0 )  where dt + interval 1 day <= curdate()
    )
    select * from all_dates
Factual answered 6/2, 2021 at 11:57 Comment(0)
P
0

If you are joining this with calendar table then you can probably use this. This may help you generate all the missing dates between min and max date of your table along with other columns in your table.

select c.calendar_date,fact.column3,fact.column4,fact.column5 
from calendar c 
join (SELECT min(your_date) as mindt, max(your_date) as maxdt,column3,column4,column5
from your_table         
group by 3,4,5
) fact
on c.calendar_date between fact.mindt and fact.maxdt
Psychasthenia answered 22/6, 2023 at 4:28 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Crotty

© 2022 - 2024 — McMap. All rights reserved.