Oracle: select missing dates
Asked Answered
C

5

5

I have a table with (among other things) dates in a field.

I need to get a list of all dates that are more recent than the oldest date, older than the most recent date, and are completely missing from the table.

So, if the table contained:

2012-01-02
2012-01-02
2012-01-03
2012-01-05
2012-01-05
2012-01-07
2012-01-08

I want a query that returns:

2012-01-04
2012-01-06
Comedown answered 6/3, 2012 at 22:28 Comment(3)
I think you will have to use a cursor to do that (in conjunction with a SQL that groups by date).Tabor
Does it need to be a single query or can it be a series of pl/sql instructions?Parkinson
I would prefer a single query, but it could be a script if that is all that's possible.Comedown
H
16

Something like this (assuming your table is named your_table and the date column is named the_date):

with date_range as (
      select min(the_date) as oldest, 
             max(the_date) as recent, 
             max(the_date) - min(the_date) as total_days
      from your_table
),
all_dates as (
   select oldest + level - 1 as a_date
   from date_range
   connect by level <= (select total_days from date_range)
)
select ad.a_date
from all_dates ad
  left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;  

Edit:
the WITH clause is called a "common table expression" and is equivalent to a derived table ("inline view").

It's similar to

select * 
from ( 
     ..... 
) all_dates
join your_table ...

The second CTE simply creates a list of dates "on-the-fly" using a undocumented feature of Oracle's connect by implementation.

Re-using a select (like I did with calculating the first and last date) is a bit easier (and IMHO more readable) than using derived tables.

Edit 2:

This can be done with a recursive CTE as well:

with date_range as (
      select min(the_date) as oldest, 
             max(the_date) as recent, 
             max(the_date) - min(the_date) as total_days
      from your_table
),
all_dates (a_date, lvl) as (
   select oldest as a_date, 1 as lvl
   from date_range 
   union all
   select (select oldest from date_range) + lvl, lvl + 1
   from all_dates 
   where lvl < (select total_days from date_range)
)
select ad.a_date, lvl
from all_dates ad    
  left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;  

Which should work in all DBMS supporting recursive CTEs (PostgreSQL and Firebird - being more standard compliant - do need the recursive keyword though).

Note the hack select (select oldest from date_range) + lvl, lvl + 1 in the recursive part. This should not be necessary, but Oracle still has some bugs with regards to DATEs in a recursive CTE. In PostgreSQL the following works without problems:

....
all_dates (a_date, lvl) as (
   select oldest as a_date, 0 as lvl
   from date_range 
   union all
   select a_date + 1, lvl + 1
   from all_dates 
   where lvl < (select total_days from date_range)
)
....
Hinz answered 6/3, 2012 at 22:38 Comment(7)
Thanks! If you don't mind editing more generic details about how that query works (what does 'with' do? etc) it would certainly be helpful for my learning as well as anyone else who comes along. But I can google it myself now that I know what to look for, too. Again, thanks!Comedown
@a_horse I hadn't seen the dual table, and I thought the common table expression was a SQL Server thing, not standard SQL.Parkinson
@AlejoBrz: CTEs are standard SQL (Oracle has had it long before SQL Server) and are supported by a wide range of DBMS (I think MySQL and SQLite are the only one not to support it nowadays).Hinz
@a_horse_with_no_name This is cool! Good reminder of how to play tricks with 1-record tables (dual, in this case). Can this be done with standard SQL (without "connect by")? I read in Wikipedia that you can create hierarchical queries using recursive common table expressions, but i don't know what this would look like.Tabor
Hmmm... i found an example of recursive CTEs.Tabor
@theglauber: it should be possible with a recursive CTE as well. But this requires Oracle 11.2 and I think the recursvie implementation still is buggy when it comes to date handlingHinz
The closer you look, CTEs are really just a "dynamic" table in the query. When I have a really complex query to create, I will often create CTEs to help me organize my thinking and to visualize the "sets" of data that I am defining. They are a great tool to add to your 'toolkit' of knowledge.Trichotomy
I
2

I'd opt for this variant, because it's more efficient:

with all_dates_wo_boundary_values as
( select oldest + level the_date
    from ( select min(the_date) oldest
                , max(the_date) recent
             from your_table
         )
 connect by level <= recent - oldest - 1
)
select the_date
  from all_dates_wo_boundary_values
 minus
select the_date
  from your_table

And here is some proof.
First the setup:

SQL> create table your_table (the_date)
  2  as
  3  select date '2012-01-02' from dual union all
  4  select date '2012-01-02' from dual union all
  5  select date '2012-01-03' from dual union all
  6  select date '2012-01-05' from dual union all
  7  select date '2012-01-05' from dual union all
  8  select date '2012-01-07' from dual union all
  9  select date '2012-01-08' from dual
 10  /

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'your_table')

PL/SQL procedure successfully completed.

SQL> alter session set statistics_level = all
  2  /

Session altered.

Horse's query:

SQL> with date_range as
  2  ( select min(the_date) as oldest
  3         , max(the_date) as recent
  4         , max(the_date) - min(the_date) as total_days
  5      from your_table
  6  )
  7  , all_dates as
  8  ( select ( select oldest from date_range) + level as a_date
  9      from dual
 10   connect by level <= (select total_days from date_range)
 11  )
 12  select ad.a_date
 13    from all_dates ad
 14         left join your_table yt on ad.a_date = yt.the_date
 15   where yt.the_date is null
 16   order by ad.a_date
 17  /

A_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00

2 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gaqx49vb9gz9k, child number 0
-------------------------------------
with date_range as ( select min(the_date) as oldest        , max(the_date) as recent        , max(the_date) - min(the_date) as total_d
ays     from your_table )

, all_dates as ( select ( select oldest from date_range) + level as a_date     from dual  connect by level <= (select total_days from
date_range) ) select

ad.a_date   from all_dates ad        left join your_table yt on ad.a_date = yt.the_date  where yt.the_date is null  order by ad.a_date

Plan hash value: 1419150012

------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
| Id  | Operation                         | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   1 |  TEMP TABLE TRANSFORMATION        |                             |      1 |        |      2 |00:00:00.01 |      22 |      1 |    1 |       |       |          |
|   2 |   LOAD AS SELECT                  |                             |      1 |        |      1 |00:00:00.01 |       7 |      0 |    1 |   262K|   262K|  262K (0)|
|   3 |    SORT AGGREGATE                 |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|   4 |     TABLE ACCESS FULL             | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|   5 |   SORT ORDER BY                   |                             |      1 |      1 |      2 |00:00:00.01 |      12 |      1 |    0 |  2048 |  2048 | 2048  (0)|
|*  6 |    FILTER                         |                             |      1 |        |      2 |00:00:00.01 |      12 |      1 |    0 |       |       |          |
|*  7 |     HASH JOIN OUTER               |                             |      1 |      1 |      7 |00:00:00.01 |      12 |      1 |    0 |  1048K|  1048K|  707K (0)|
|   8 |      VIEW                         |                             |      1 |      1 |      6 |00:00:00.01 |       9 |      1 |    0 |       |       |          |
|   9 |       CONNECT BY WITHOUT FILTERING|                             |      1 |        |      6 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  10 |        FAST DUAL                  |                             |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |    0 |       |       |          |
|  11 |        VIEW                       |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  12 |         TABLE ACCESS FULL         | SYS_TEMP_0FD9D660C_81240964 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  13 |      TABLE ACCESS FULL            | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("YT"."THE_DATE" IS NULL)
   7 - access("YT"."THE_DATE"=INTERNAL_FUNCTION("AD"."A_DATE"))


32 rows selected.

And my suggestion:

SQL> with all_dates_wo_boundary_values as
  2  ( select oldest + level the_date
  3      from ( select min(the_date) oldest
  4                  , max(the_date) recent
  5               from your_table
  6           )
  7   connect by level <= recent - oldest - 1
  8  )
  9  select the_date
 10    from all_dates_wo_boundary_values
 11   minus
 12  select the_date
 13    from your_table
 14  /

THE_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00

2 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7aavxmzkj7zq7, child number 0
-------------------------------------
with all_dates_wo_boundary_values as ( select oldest + level the_date     from ( select min(the_date) oldest
  , max(the_date) recent              from your_table          )  connect by level <= recent - oldest - 1 ) select
the_date   from all_dates_wo_boundary_values  minus select the_date   from your_table

Plan hash value: 2293301832

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   1 |  MINUS                          |            |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   SORT UNIQUE                   |            |      1 |      1 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
|   3 |    VIEW                         |            |      1 |      1 |      5 |00:00:00.01 |       3 |       |       |          |
|   4 |     CONNECT BY WITHOUT FILTERING|            |      1 |        |      5 |00:00:00.01 |       3 |       |       |          |
|   5 |      VIEW                       |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   6 |       SORT AGGREGATE            |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   7 |        TABLE ACCESS FULL        | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
|   8 |   SORT UNIQUE                   |            |      1 |      7 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
|   9 |    TABLE ACCESS FULL            | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------


22 rows selected.

Regards,
Rob.

Immunity answered 7/3, 2012 at 9:2 Comment(0)
F
1

We can use simple hierarchical query as shown below:

WITH CTE AS
(SELECT (SELECT MIN(COL1) FROM T)+LEVEL-1 AS OUT FROM DUAL
CONNECT BY (LEVEL-1) <= (SELECT MAX(COL1) - MIN(COL1) FROM T))
SELECT OUT FROM CTE WHERE OUT NOT IN (SELECT COL1 FROM T);
Flush answered 6/2, 2014 at 12:27 Comment(0)
D
0

You need a Calendar table (either permanent or created on the fly). Then you could do a simple:

SELECT c.my_date
FROM 
        calendar c
    JOIN
        ( SELECT MIN(date_column) AS min_date 
               , MAX(date_column) AS max_date 
          FROM tableX
        ) mm
      ON c.mydate BETWEEN min_date AND max_date
WHERE
    c.my_date NOT IN
    ( SELECT date_column
      FROM tableX
    )
Deathlike answered 6/3, 2012 at 22:44 Comment(0)
B
0

You do not need to generate all the dates and then use MINUS (or an anti-join) to remove the existing rows (which will be slow).

You can use the LEAD analytic function to find the next date and then use CROSS JOIN LATERAL (available from Oracle 12) to join a row-generator to generate only the missing dates:

SELECT m.missing
FROM   (
         SELECT dt,
                LEAD(dt) OVER (ORDER BY dt) AS next_dt
         FROM   table_name
       ) t
       CROSS JOIN LATERAL (
         SELECT dt + LEVEL AS missing
         FROM   DUAL
         WHERE  dt + 1 < next_dt
         CONNECT BY dt + LEVEL < next_dt
       ) m

Which, for the sample data:

CREATE TABLE table_name (dt) AS
  SELECT DATE '2012-01-02' FROM DUAL UNION ALL
  SELECT DATE '2012-01-02' FROM DUAL UNION ALL
  SELECT DATE '2012-01-03' FROM DUAL UNION ALL
  SELECT DATE '2012-01-05' FROM DUAL UNION ALL
  SELECT DATE '2012-01-05' FROM DUAL UNION ALL
  SELECT DATE '2012-01-07' FROM DUAL UNION ALL
  SELECT DATE '2012-01-08' FROM DUAL;

Outputs:

MISSING
2012-01-04 00:00:00
2012-01-06 00:00:00

db<>fiddle here

Braga answered 6/9, 2022 at 9:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.