Get last date in each month of a time series pandas
C

6

11

Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.

My goal is to get the last date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, since they give the last calendar day of each month rather than the last date actually present in the index for that month.

As an example, if this is my time series I would want to select '2015-05-29' while the last day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']

Clevelandclevenger answered 9/6, 2015 at 22:21 Comment(3)
Sorry, won't df.groupby([df.index.year,df.index.month]).last() give what you want? – Giustino
No, that doesn't unfortunately. Take a look at my post for an example. – Clevelandclevenger
I'm not sure why Ed's comment won't work. Your post first converts to a period, whereas Ed's comment just looks at the last trading day in any given month for any given year. – Andress
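For reference, the grouping idea from the comments can be sketched directly on the sample dates from the question. This is only an illustration: the variable names are made up here, and the index is converted with to_series() so that the grouped values are the dates themselves.

```python
import pandas as pd

# Sample dates from the question
idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# to_series() gives a Series whose values are the dates themselves,
# so max() per (year, month) group is the last date present in that month
s = idx.to_series()
last_per_month = s.groupby([idx.year, idx.month]).max()
last_dates = pd.DatetimeIndex(last_per_month.values)
print(last_dates)
```

On this input the result contains 2015-05-29 and 2015-06-01, i.e. the last date actually present in each month.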
C
6

Condla's answer came closest to what I needed, except that my time index stretched over more than a year, so I needed to group by both year and month and then select the maximum date in each group. Below is the code I ended up with.

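import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
# Group the dates by year; the result maps each year to its dates
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    # Within one year, group the dates by month
    tempYear = pd.DatetimeIndex(dictYears[yr]).groupby(pd.DatetimeIndex(dictYears[yr]).month)
    for m in tempYear.keys():
        dateRange.append(max(tempYear[m]))
# .order() was removed from pandas; sort_values() is its replacement
dateRange = pd.DatetimeIndex(dateRange).sort_values()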
Clevelandclevenger answered 10/6, 2015 at 12:15 Comment(1)
Like it. But maybe there is a built-in pandas solution for this nowadays... – Pedro
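One possible shorter form of the same year-and-month grouping, offered as a sketch rather than as the accepted method: monthly periods are used only as group keys, so the values stay the real dates. The index trade_days below is a hypothetical stand-in for tempTradeDays, built with pd.bdate_range.

```python
import pandas as pd

# Hypothetical stand-in for tempTradeDays: business days spanning two years
trade_days = pd.bdate_range('2014-11-03', '2015-02-13')

# Monthly periods (2014-11, 2014-12, ...) serve only as group labels,
# which keeps the same month of different years in separate groups
month_key = trade_days.to_period('M')

# The largest date in each group is the last date present in that month
last_days = pd.DatetimeIndex(
    trade_days.to_series().groupby(month_key).max().values
)
print(last_days)
```

Because the key already encodes the year, no nested loop over years and months is needed.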
G
6

Suppose your data frame looks like this:

[image: original dataframe]

Then the following code gives you the last row of each month, indexed by its date:

df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')

[image: transformed dataframe]

This one-liner does the job :)

Gantrisin answered 24/5, 2019 at 20:56 Comment(0)
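A small runnable variant of this idea, as a sketch on made-up data: the price column and its values are assumptions, and groupby(...).tail(1) is swapped in for last() because it keeps the original index labels without the reset_index/set_index round trip.

```python
import pandas as pd

# Hypothetical data: a 'price' column indexed by the question's sample dates
df = pd.DataFrame(
    {'price': range(10)},
    index=pd.DatetimeIndex([
        '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
        '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
    ]),
)

# tail(1) returns the final row of each (year, month) group with its
# original index label; this assumes the rows are in chronological order
df_monthly = df.groupby([df.index.year, df.index.month]).tail(1)
print(df_monthly)
```

One difference to keep in mind: last() picks the last non-null value per column, while tail(1) returns the last row as it is.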
U
4

My strategy would be to group by month and then select the "maximum" of each group. If "dt" is your DatetimeIndex object:

last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)

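The list "last_dates_of_the_month" contains the last date that occurs in each month of your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it). Note that grouping by dt.month alone merges the same month from different years, so this only works as-is when the index covers at most one year.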

Untidy answered 9/6, 2015 at 23:5 Comment(0)
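A quick sketch of what happens when the index spans more than one year, and one way to handle it. The dates are invented for illustration, and the combined key year * 100 + month is just one possible choice of group label.

```python
import pandas as pd

# Hypothetical index covering the same calendar month in two different years
dt = pd.DatetimeIndex(['2014-05-29', '2014-05-30', '2015-05-28', '2015-05-29'])

# Grouping by month alone puts May 2014 and May 2015 into a single group
by_month = dt.groupby(dt.month)

# A combined key (201405, 201505) keeps the two years apart
by_year_month = dt.groupby(dt.year * 100 + dt.month)

last_dates = pd.DatetimeIndex(
    [dates.max() for dates in by_year_month.values()]
).sort_values()
print(len(by_month), len(by_year_month))
print(last_dates)
```

With the month-only key, 2014-05-30 would be lost because the single group's maximum is 2015-05-29.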
T
3

This is an old question, but none of the existing answers here is perfect. This is the solution I came up with (assuming that the dates form a sorted index). It can even be written in one line, but I split it up for readability:

month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
mask = (month1 != month2)
apple[mask.values].head(10)

A few notes here:

  • Shifting a datetime series requires another pd.Series instance (see here)
  • Boolean mask indexing requires .values (see here)

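By the way, when the dates are business days, it'd be easier to use resampling: apple.resample('BM').last() (the alias is 'BME' in pandas 2.2 and later). Note that resampling labels each row with the calendar's last business day of the month, which can differ from the last date in the data when that day is a holiday.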

Tallinn answered 21/2, 2018 at 18:17 Comment(0)
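The same "compare each date with the next one" idea can be sketched on a bare DatetimeIndex, which is what the question starts from. This is an adaptation rather than the code above: it compares monthly periods instead of month numbers (so a change of year also counts), and the name idx is made up here.

```python
import numpy as np
import pandas as pd

# Sample dates from the question (must be sorted in ascending order)
idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# Monthly periods act as year-and-month labels for each date
periods = idx.to_period('M')

# A date is the last of its month when the next date falls in another
# month; the final date has no successor, so it is always kept
mask = np.append(periods[:-1] != periods[1:], True)
print(idx[mask])
```

On the sample dates this keeps 2015-05-29 and 2015-06-01.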
M
2

Maybe the answer is not needed anymore, but while searching for an answer to the same question I found what may be a simpler solution:

import pandas as pd 

sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]
Middle answered 21/8, 2015 at 8:4 Comment(1)
OP clearly states "if this is my time series I would want to select '2015-05-29' while the last day of the month is '2015-05-31'." Your solution would miss 2015-05-29 since it is not a month-end. – Purism
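The point made in the comment can be checked on the sample dates from the question. A minimal sketch, with idx as an illustrative name:

```python
import pandas as pd

# Sample dates from the question
idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# is_month_end is True only on the calendar month-end (2015-05-31 here),
# which is not part of the index, so nothing is selected
calendar_month_ends = idx[idx.is_month_end]
print(len(calendar_month_ends))  # 0
```

So is_month_end only helps when the calendar month-end itself is present in the data.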
T
0

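Try this to create a new diff column in which the value 1 marks the row where the month changes, i.e. the first row of each new month (the very first row is also marked, because its difference is NaN):

import numpy as np

df['diff'] = np.where(df['Date'].dt.month.diff() != 0, 1, 0)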
Thorfinn answered 6/8, 2020 at 15:21 Comment(0)
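To flag the last row of each month instead, the comparison can look forward rather than backward. A sketch under assumptions: the frame, its Date column values, and the column name is_month_last are all invented for illustration, and the rows are assumed to be sorted by date.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a sorted 'Date' column
df = pd.DataFrame({'Date': pd.to_datetime([
    '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01', '2015-06-02',
])})

# diff(-1) subtracts the next row's month from the current one; a non-zero
# result (or NaN on the final row) means this row is the last of its month
df['is_month_last'] = np.where(df['Date'].dt.month.diff(-1) != 0, 1, 0)

last_dates = df.loc[df['is_month_last'] == 1, 'Date']
print(last_dates)
```

Here the flagged rows are 2015-05-29 and 2015-06-02, the latter simply because it is the final row of the frame.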