check for any missing dates in the index
Is there any way to check for missing dates in a dataframe directly? I want to find out whether any dates are missing between 2013-01-19 and 2018-01-29.

            GWA_BTC      GWA_ETH    GWA_LTC  GWA_XLM  GWA_XRP
   Date                 
2013-01-19  11,826.36   1,068.45    195.00    0.51    1.82
2013-01-20  13,062.68   1,158.71    207.58    0.52    1.75
   ...
2018-01-28  12,326.23   1,108.90    197.36    0.48    1.55
2018-01-29  11,397.52   1,038.21    184.92    0.47    1.43

I tried to check it manually, but it took a lot of time.

Phrase answered 27/8, 2018 at 17:34 Comment(7)
What do you mean by missing? Are they business days? – Hysterotomy
@Hysterotomy There are some dates missing at random in the data. For example, about 7 days are missing from 2014-07-13 to 2014-07-19, and there are similar gaps elsewhere. Is there any way to find out which dates are missing rather than checking manually? – Phrase
Yes, but do you have 365 days per year or 252? – Hysterotomy
@Hysterotomy 365 days. – Phrase
Do you want to know only whether dates are missing, or what the missing dates are? – Hysterotomy
@Hysterotomy I want to know the missing dates. – Phrase
@user456 Maybe edit your question; as it is written, it suggests that you want a boolean. – Heck
You can use DatetimeIndex.difference(other)

pd.date_range(start = '2013-01-19', end = '2018-01-29' ).difference(df.index)

It returns the elements not present in the other index.
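A minimal, self-contained sketch of this (using a hypothetical one-column frame with made-up values, where two dates are deliberately absent):

```python
import pandas as pd

# Hypothetical miniature frame; 2013-01-21 and 2013-01-22 are missing
df = pd.DataFrame(
    {"GWA_BTC": [11826.36, 13062.68, 12326.23]},
    index=pd.to_datetime(["2013-01-19", "2013-01-20", "2013-01-23"]),
)

# All calendar dates in the range, minus the dates actually present
missing = pd.date_range(start="2013-01-19", end="2013-01-23").difference(df.index)
print(missing)  # DatetimeIndex(['2013-01-21', '2013-01-22'], ...)
```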

Leigh answered 27/8, 2018 at 17:47 Comment(3)
Just put this as an edit to my answer (will delete). But clearly this is the way to go! – Tolmach
@sacul, oh cool. Your answer is quite comprehensive, you should keep it. – Leigh
This is the first thing I thought of: pd.date_range(df.index.min(), df.index.max()).difference(df.index) – Sneakbox
Example:

As a minimal example, take this:

>>> df
              GWA_BTC   GWA_ETH  GWA_LTC  GWA_XLM  GWA_XRP
Date                                                      
2013-01-19  11,826.36  1,068.45   195.00     0.51     1.82
2013-01-20  13,062.68  1,158.71   207.58     0.52     1.75
2013-01-28  12,326.23  1,108.90   197.36     0.48     1.55
2013-01-29  11,397.52  1,038.21   184.92     0.47     1.43

And we can find the missing dates between 2013-01-19 and 2013-01-29.

Method 1:

See @Vaishali's answer

Use .difference to find the difference between your datetime index and the set of all dates within your range:

pd.date_range('2013-01-19', '2013-01-29').difference(df.index)

Which returns:

DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
               '2013-01-25', '2013-01-26', '2013-01-27'],
              dtype='datetime64[ns]', freq=None)

Method 2:

You can reindex your dataframe using all dates within your desired date range, then find where reindex has inserted NaNs.

And to find missing dates between 2013-01-19 and 2013-01-29:

>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(1)

2013-01-19    False
2013-01-20    False
2013-01-21     True
2013-01-22     True
2013-01-23     True
2013-01-24     True
2013-01-25     True
2013-01-26     True
2013-01-27     True
2013-01-28    False
2013-01-29    False
Freq: D, dtype: bool

The rows with True are the missing dates in your original dataframe.
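To turn that boolean mask into the actual missing dates, you can index the mask with itself. A sketch on a toy frame (hypothetical values, with two dates deliberately missing):

```python
import pandas as pd

# Toy frame; 2013-01-20 and 2013-01-21 are missing
df = pd.DataFrame({"GWA_BTC": [11826.36, 12326.23]},
                  index=pd.to_datetime(["2013-01-19", "2013-01-22"]))

# Rows inserted by reindex are all-NaN, so the mask is True there
mask = df.reindex(pd.date_range("2013-01-19", "2013-01-22")).isnull().all(axis=1)
missing = mask[mask].index  # keep only the True rows
print(missing)  # DatetimeIndex(['2013-01-20', '2013-01-21'], ...)
```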

Tolmach answered 27/8, 2018 at 17:40 Comment(5)
Do I have to convert the index to a datetime object? At the moment the dtype of the index is object. – Phrase
Yes, you should: df.index = pd.to_datetime(df.index) – Tolmach
Method 1 does not work? It returns all the dates between start and end rather than displaying the missing dates? – Phrase
No, it works, but I crossed it out because @Leigh had already answered it using that solution. – Tolmach
That replaces the original column(s) with True/False values. How can I keep the original columns and add the True/False as a new column? – Prefiguration
You can use DatetimeIndex.difference and add the freq parameter, so you can check for missing days, hours, or minutes, depending on the frequency of your data:

pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)
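A small sketch of the minute-frequency case (the index and values are made up, with one minute deliberately dropped):

```python
import pandas as pd

# Hypothetical minute-level index; 04:52 is missing
idx = pd.to_datetime(["2022-09-03 04:50", "2022-09-03 04:51", "2022-09-03 04:53"])
df = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=idx)

# Every minute in the observed span, minus the minutes actually present
gaps = pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)
print(gaps)  # the missing 04:52 minute
```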
Thoma answered 3/9, 2022 at 4:50 Comment(0)
Assuming the data is daily, including non-business dates:

df.index.to_series().diff().dt.days > 1
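This expression flags the rows that come immediately after a gap rather than listing the gap itself. A sketch of how to pull those rows out, on a hypothetical frame:

```python
import pandas as pd

# Toy daily index; 2013-01-21 and 2013-01-22 are missing
df = pd.DataFrame({"v": [1, 2, 3]},
                  index=pd.to_datetime(["2013-01-19", "2013-01-20", "2013-01-23"]))

# diff() gives the day count since the previous row; > 1 means a gap precedes it
s = df.index.to_series()
rows_after_gap = s[s.diff().dt.days > 1]
print(rows_after_gap)  # 2013-01-23, the first date after the gap
```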
Hysterotomy answered 27/8, 2018 at 17:41 Comment(0)
I can't post a comment, but you can probably traverse each value and add 24 hours to the previous value to see if the dates match:

import pandas as pd

# Toy frame indexed by date; 2013-01-21 is deliberately missing
dates = pd.to_datetime(['2013-01-19', '2013-01-20', '2013-01-22'])
df = pd.DataFrame({'b': [1, 0.4, 0.3]}, index=dates)

for i in range(len(df)):
    if i == 0:
        continue
    prev = df.index[i - 1]
    # Add 1 day to the previous date and compare with the current one
    if df.index[i] != prev + pd.Timedelta(days=1):
        print('missing date(s) after', prev.date())
Hypo answered 27/8, 2018 at 17:38 Comment(0)
G
0
pd.date_range(df.index.min(), df.index.max()).difference(df.index)
Genovese answered 18/5, 2024 at 6:10 Comment(1)
Thank you for your interest in contributing to the Stack Overflow community. This question already has a few answers—including one that has been extensively validated by the community. Are you certain your approach hasn’t been given previously? If so, it would be useful to explain how your approach is different, under what circumstances your approach might be preferred, and/or why you think the previous answers aren’t sufficient. Can you kindly edit your answer to offer an explanation?Frieder
