How to get last day of each month in Pandas DataFrame index (using TimeGrouper)

I have a DataFrame with incomplete dates and I only need the date/row of the last day available of each month. I tried using TimeGrouper and take .last() of each group.

import pandas as pd
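# pd.datetime (an alias of datetime.datetime) was removed in pandas 2.0; pd.Timestamp builds the same dates
idx = [pd.Timestamp(2016,2,1),pd.Timestamp(2017,1,20),pd.Timestamp(2017,2,1),pd.Timestamp(2017,2,27)]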
df = pd.DataFrame([1,2,3,4],index=idx)
df
        0
2016-02-01  1
2017-01-20  2
2017-02-01  3
2017-02-27  4

Expecting:

df_eom
        0
2016-02-01  1
2017-01-20  2
2017-02-27  4

However I got this:

df_eom = df.groupby(pd.TimeGrouper(freq='1M')).last()  # pd.TimeGrouper was removed in pandas 1.0; newer versions use pd.Grouper(freq=...)
df_eom
              0
2016-02-29  1.0
2016-03-31  NaN
2016-04-30  NaN
2016-05-31  NaN
2016-06-30  NaN
2016-07-31  NaN
2016-08-31  NaN
2016-09-30  NaN
2016-10-31  NaN
2016-11-30  NaN
2016-12-31  NaN
2017-01-31  2.0
2017-02-28  4.0

Not only does it create dates that weren't in df, it also changed the index of the first and last rows of df to month-end dates. Am I using TimeGrouper wrong?
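
For reference, a minimal sketch of what is happening, assuming pandas 1.x or later (where pd.TimeGrouper no longer exists and pd.Grouper with a frequency takes its place): time-based grouping bins rows by calendar month exactly like resample(), labels each bin with its month-end date, and creates a bin for every month in the range, whether or not it has rows. A MonthEnd offset object is used here instead of a frequency string only to sidestep the "M" vs "ME" alias change across pandas versions.

```python
import pandas as pd

idx = pd.to_datetime(["2016-02-01", "2017-01-20", "2017-02-01", "2017-02-27"])
df = pd.DataFrame([1, 2, 3, 4], index=idx)

# Time-based grouping: one bin per calendar month, labelled by its month end.
binned = df.groupby(pd.Grouper(freq=pd.offsets.MonthEnd())).last()

print(len(binned))             # 13 bins, Feb 2016 through Feb 2017
print(binned.index[0].date())  # 2016-02-29, the bin label, not the row's date

# Dropping the empty bins removes the filler months, but the labels are still
# month ends rather than the dates that actually occur in df.
print(binned.dropna().index.strftime("%Y-%m-%d").tolist())
```

So the grouper is doing what it is designed to do; it just returns bin labels rather than the original dates.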

Relume answered 16/1, 2018 at 18:56
df.groupby([df.index.year, df.index.month]).last() gets close, but one loses the day information... - Dossal
@Dossal how can I reindex with the last day itself after that? - Cookhouse
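
One way to do what the last comment asks, sketched under the assumption that df is the frame from the question: instead of aggregating the values per (year, month), aggregate the dates themselves, then select the rows by those dates so the real day stays in the index.

```python
import pandas as pd

idx = pd.to_datetime(["2016-02-01", "2017-01-20", "2017-02-01", "2017-02-27"])
df = pd.DataFrame([1, 2, 3, 4], index=idx)

# Turn the index into a Series whose values are the dates, then take the
# latest date in each (year, month) group; these are dates that exist in df.
last_dates = df.index.to_series().groupby([df.index.year, df.index.month]).max()

# Select those rows by label, so the original day is kept in the index.
df_eom = df.loc[last_dates.to_numpy()]
print(df_eom)
```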

Here's one way:

In [795]: df.iloc[df.reset_index().groupby(df.index.to_period('M'))['index'].idxmax()]
Out[795]:
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4

Or

In [802]: df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]
Out[802]:
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4
Icterus answered 22/2, 2018 at 5:20
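
A step-by-step sketch of the first approach above on the question's data, to show why it avoids the empty months: to_period('M') maps each date to its calendar month, and grouping by those labels only creates groups for months that occur. One caveat: reset_index() names the new column "index" only when the index itself is unnamed; with a named index the column takes that name instead.

```python
import pandas as pd

idx = pd.to_datetime(["2016-02-01", "2017-01-20", "2017-02-01", "2017-02-27"])
df = pd.DataFrame([1, 2, 3, 4], index=idx)

# Each date becomes its calendar month, e.g. 2017-02-27 -> Period('2017-02').
months = df.index.to_period("M")

# reset_index() moves the unnamed index into a column called "index" and gives
# the frame a 0..n-1 RangeIndex, so idxmax returns row positions.
flat = df.reset_index()
positions = flat.groupby(months)["index"].idxmax()

# One position per month that actually occurs; iloc turns them back into rows.
df_eom = df.iloc[positions.to_numpy()]
print(positions)
print(df_eom)
```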

You could group by the year and month and iterate through your groups to find the last date, like so:

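groups = df.groupby([df.index.year, df.index.month])
# DataFrame.append was removed in pandas 2.0, so collect the last row of each group instead;
# each row is a Series named by its date, and those names become the new index
df_eom = pd.DataFrame([group.iloc[-1] for _, group in groups])
df_eom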
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4

I don't really like this because of the looping, but since you can't have an outrageous number of years and each year has at most 12 month groups, it shouldn't be too awful.

Calder answered 16/1, 2018 at 19:24
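
If the loop is a concern, one loop-free alternative (a different technique from the answer above, not the author's) is to broadcast each month's latest date back onto its rows with transform("max") and keep the rows whose date matches. A minimal sketch on the question's data:

```python
import pandas as pd

idx = pd.to_datetime(["2016-02-01", "2017-01-20", "2017-02-01", "2017-02-27"])
df = pd.DataFrame([1, 2, 3, 4], index=idx)

dates = df.index.to_series()

# transform("max") returns, for every row, the latest date of that row's
# (year, month) group, aligned with the original rows.
month_max = dates.groupby([df.index.year, df.index.month]).transform("max")

# Keep only the rows whose date is the latest in their month.
df_eom = df[(dates == month_max).to_numpy()]
print(df_eom)
```

Unlike iloc[-1] inside the loop, this picks the latest date even if the rows within a month are not in chronological order.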

I believe this solution fits more use cases. The previous approaches only work if the date is exactly a month end. If you deal with financial data, for example, the last day of the month may or may not be a calendar month end. This solution accounts for that; it assumes the dates are in a column named as_of_date, sorted in ascending order, since each row is compared with the next one:

df[df['as_of_date'].dt.month.shift(-1)!=df['as_of_date'].dt.month].reset_index(drop=True)

Rubbico answered 19/4, 2021 at 19:27
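
A caveat worth checking with the shift-based approach: .dt.month compares only the month number, so when two consecutive rows fall in the same month of different years (with no rows in between), the earlier one is dropped. A small sketch with made-up data, comparing it with a year-and-month comparison via .dt.to_period('M'):

```python
import pandas as pd

# Hypothetical data, sorted by date: consecutive rows from Feb 2016 and Feb 2017.
df = pd.DataFrame({
    "as_of_date": pd.to_datetime(["2016-02-10", "2017-02-05", "2017-02-20"]),
    "value": [1, 2, 3],
})

# Month number only: Feb 2016 looks like Feb 2017, so the Feb 2016 row is lost.
month_no = df["as_of_date"].dt.month
month_only = df[month_no.shift(-1) != month_no].reset_index(drop=True)

# Year-and-month periods keep the two Februaries apart. After the last row,
# shift(-1) yields NaT, which compares unequal, so the final row is kept.
period = df["as_of_date"].dt.to_period("M")
by_period = df[period.shift(-1) != period].reset_index(drop=True)

print(month_only)
print(by_period)
```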

Following Grr's example, this solves it without looping:

df_eom = df.groupby([df.index.year, df.index.month]).tail(1)
Mistakable answered 18/7 at 15:9
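
tail(1), like iloc[-1] in the loop version, keeps the last row of each group in the frame's current order, and it keeps the original index, so the real dates survive. That also means it relies on the index being sorted. A minimal sketch using the question's data deliberately shuffled (the shuffled order is made up for illustration):

```python
import pandas as pd

# The question's data, out of chronological order.
idx = pd.to_datetime(["2017-02-27", "2016-02-01", "2017-02-01", "2017-01-20"])
df = pd.DataFrame([4, 1, 3, 2], index=idx)

# On unsorted data, "last row" is not "latest date": Feb 2017 yields 2017-02-01.
unsorted_result = df.groupby([df.index.year, df.index.month]).tail(1)

# Sorting the index first makes the last row of each month its latest date.
df_sorted = df.sort_index()
sorted_result = df_sorted.groupby([df_sorted.index.year, df_sorted.index.month]).tail(1)

print(unsorted_result)
print(sorted_result)
```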
