How to floor a date to the first date of that month?

I have a pandas DataFrame with index column = date.

Input:

            value
date    
1986-01-31  22.93
1986-02-28  15.46

I want to floor each date to the first day of its month.

Output:

            value
date    
1986-01-01  22.93
1986-02-01  15.46

What I tried:

df.index.floor('M')
ValueError: <MonthEnd> is a non-fixed frequency

This is probably because df was generated by df = df.resample("M").sum() (the output of that call is the input shown at the beginning of the question).

I also tried df = df.resample("M", convention='start').sum(), but that did not work either.

I know in R, it is easy to just call floor(date, 'M').

Evictee answered 16/2, 2017 at 21:36 Comment(1)
Is performance an issue? I'd consider transforming the values into datetime objects for this, but this might be way too costly if you are trying to process millions of objects.Flowerlike
M
48

There is a pandas issue about the floor problem.

The suggested way is:

import pandas as pd
pd.to_datetime(df.date).dt.to_period('M').dt.to_timestamp()
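
Since the question keeps the dates in the index rather than in a column, the same round trip can be applied to the index directly. The following is only a minimal sketch, assuming the index is already a DatetimeIndex; the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question (month-end dates in the index).
df = pd.DataFrame(
    {"value": [22.93, 15.46]},
    index=pd.to_datetime(["1986-01-31", "1986-02-28"]),
)
df.index.name = "date"

# DatetimeIndex -> monthly PeriodIndex -> DatetimeIndex at the period start.
df.index = df.index.to_period("M").to_timestamp()
print(df)
```

On a DatetimeIndex the methods are called directly, without the .dt accessor that a Series needs.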
Mangonel answered 17/4, 2018 at 3:52 Comment(1)
df.date.dt.to_period('M').dt.to_timestamp() seems to be sufficient, the initial pd.to_datetime is not needed.Sailor
B
19

You can use the time series offset MonthBegin:

import pandas as pd
from pandas.tseries.offsets import MonthBegin
df['date'] = pd.to_datetime(df['date']) - MonthBegin(1)

Edit: The above solution does not handle the dates which are already floored to the beginning of the month. Here is an alternative solution.

Here is a dataframe with additional test cases:

            value
date    
1986-01-31  22.93
1986-02-28  15.46
2018-01-01  20.00
2018-02-02  25.00

With the timedelta method:

df.index = pd.to_datetime(df.index)
df.index = df.index - pd.to_timedelta(df.index.day - 1, unit='d')


            value
date    
1986-01-01  22.93
1986-02-01  15.46
2018-01-01  20.00
2018-02-01  25.00
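
The difference between the two approaches can be checked on the test cases above. This is a small sketch; the variable names idx, shifted and floored are only illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

idx = pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01", "2018-02-02"])

# Subtracting MonthBegin(1) moves a date that is already on the 1st
# back to the 1st of the previous month.
shifted = idx - MonthBegin(1)

# Subtracting (day - 1) days leaves a date on the 1st unchanged.
floored = idx - pd.to_timedelta(idx.day - 1, unit="D")

print(shifted[2])  # first-of-month input, moved to the previous month
print(floored[2])  # first-of-month input, unchanged
```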
Blades answered 16/2, 2017 at 21:51 Comment(3)
this is the only pandonic approach among all the answers (as a bonus this is vectorized)Adhesion
There's a bug with this method: it'd translate any date to the beginning of the following month, except the beginning of the month, which stays the same. i.e. 1-1-2018 -> 1-1-2018, but 2-1-2018 -> 1-2-2018...Autobahn
The timedelta method is right-on, worked perfectly for my use case, with one modification - adding the 'dt', so changing to 'df.index.dt.day' inside the to_timedelta().Flickertail
S
11

This will do the trick, and no extra imports are necessary. NumPy has a datetime64 dtype, which pandas sets to [ns] by default, as you can see by checking the dtype. You can change the unit to months, which places each value on the first of its month, by accessing the underlying NumPy array and changing its type.

df.date = pd.to_datetime(df.date.values.astype('datetime64[M]'))

It would be nice if pandas supported this through its own astype() method, but unfortunately it does not.

The above works for data stored as datetime values or as strings. If your data already has the datetime64[ns] dtype, you can omit the pd.to_datetime() call and just do:

df.date = df.date.values.astype('datetime64[M]')
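
To see what the cast does, the intermediate array can be inspected before it is converted back. This is a sketch with made-up sample data; exactly which dtype pandas stores after the conversion may depend on the pandas version, so the sketch does not rely on it.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["1986-01-31", "1986-02-28", "2018-01-01"])})

# Casting the underlying NumPy array to month resolution drops the day part.
months = df["date"].values.astype("datetime64[M]")
print(months.dtype)

# Converting back gives pandas timestamps on the first of each month.
df["date"] = pd.to_datetime(months)
print(df)
```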
Stapes answered 16/2, 2017 at 21:59 Comment(1)
I don’t know in which version they imported that to the pandas astype, but currently df.date.astype('datetime64[M]') works (version 1.2.2 at least). You could update this answer.Mangan
E
9

You can also use string datetime formatting:

df['month'] = df['date'].dt.strftime('%Y-%m-01')
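
Note that strftime produces strings rather than datetimes. If a datetime column is needed, the strings can be parsed again, as in this sketch (the month_dt column name is just an example):

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2019-01-03", "2019-02-01"])})

# strftime returns strings, not datetimes.
df["month"] = df["date"].dt.strftime("%Y-%m-01")
print(type(df["month"].iloc[0]))

# Parse the strings again if a datetime column is needed.
df["month_dt"] = pd.to_datetime(df["month"])
print(df.dtypes)
```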

Eliathas answered 3/1, 2019 at 21:50 Comment(0)
S
8

Here's another 'pandonic' way to do it:

df.date - pd.Timedelta('1 day') * (df.date.dt.day - 1)
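
The same arithmetic can be tried on a single Timestamp. The sketch below uses an arbitrary example value and also shows that the time of day is kept unless it is dropped explicitly:

```python
import pandas as pd

ts = pd.Timestamp("2018-04-15 15:16:00")

# Same arithmetic on a single Timestamp: step back (day - 1) whole days.
first = ts - pd.Timedelta("1 day") * (ts.day - 1)
print(first)  # the time of day is preserved

# Drop the time component as well if midnight is wanted.
print(first.normalize())
```

For a Series, the equivalent of the last step would be .dt.normalize().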
Spectre answered 15/4, 2018 at 15:16 Comment(2)
This is very nice! Plus, it works with dask! (in contrast to Deo Leung's answer)Sianna
This also works well with variable instances of Timestamps in addition to being vectorized. Just change the df.date with your Timestamp and it works great!Bromine
I
3

Assume that you are dealing with the following data frame:

import pandas as pd

df = pd.DataFrame({'MyDate': ['2021-03-11', '2021-04-26', '2021-01-17']})
df['MyDate'] = pd.to_datetime(df.MyDate)

Which is:

    MyDate
0   2021-03-11
1   2021-04-26
2   2021-01-17

And you want to truncate the date to month:

df['Truncated'] = df['MyDate'] + pd.offsets.MonthBegin(-1)
# OR    
# df['Truncated'] = df['MyDate'] - pd.offsets.MonthBegin(1)
df

And you get:

      MyDate  Truncated
0 2021-03-11 2021-03-01
1 2021-04-26 2021-04-01
2 2021-01-17 2021-01-01

Important note: This approach DOES NOT work when the date is already on the first day of the month, because it is then moved to the first day of the previous month. For that reason, other solutions are provided below.

import pandas as pd

df = pd.DataFrame({'MyDate': ['2021-03-11', '2021-04-26', '2021-01-17', '2021-02-01']})
df['MyDate'] = pd.to_datetime(df.MyDate)

df['Truncated'] = df['MyDate'].dt.to_period('M').dt.to_timestamp()
print(df)

And you get:

      MyDate  Truncated
0 2021-03-11 2021-03-01
1 2021-04-26 2021-04-01
2 2021-01-17 2021-01-01
3 2021-02-01 2021-02-01

Finally, another approach could be the following:

df['Truncated'] = df['MyDate'].dt.strftime('%Y-%m-01')
print(df)

And you get:

      MyDate   Truncated
0 2021-03-11  2021-03-01
1 2021-04-26  2021-04-01
2 2021-01-17  2021-01-01
3 2021-02-01  2021-02-01
Isogonic answered 11/3, 2021 at 16:31 Comment(1)
your code won't work if date is first day of the month - it would become first day of the previous monthIdiotic
S
2
dt_1 = "2016-02-01"
def first_day(dt):
    lt_split = dt.split("-")
    return "-".join([lt_split[0], lt_split[1], "01"])

print(first_day(dt_1))

For a pandas DataFrame whose date column holds strings in this format, you can use df["col_name_date"].apply(first_day).

Steffy answered 16/2, 2017 at 21:58 Comment(0)
D
1

From August 2019:

This should work:

[x.replace(day=1).date() for x in df['date']]

The only requirement is that date holds datetime values, which we can guarantee with a call to pd.to_datetime(df['date']).
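
As a quick check, here is a self-contained sketch with made-up dates. Note that the list comprehension yields plain datetime.date objects rather than pandas timestamps:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"date": ["2019-08-16", "2019-08-01"]})
df["date"] = pd.to_datetime(df["date"])

# replace(day=1) keeps year and month; .date() drops the time part.
first_days = [x.replace(day=1).date() for x in df["date"]]
print(first_days)
```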

Daune answered 16/8, 2019 at 11:50 Comment(0)
O
1

I liked Mikhail Venkov's answer. I added the code below so that the new column holds timestamp values and keeps the timezone information:

    df['month'] = pd.to_datetime(df['timestamp'].dt.strftime('%Y-%m-01')).dt.tz_localize(timezone) 

where timezone = 'America/Los_Angeles' or whatever zone you want
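
A runnable sketch of this, assuming the timestamp column is already timezone-aware in the same zone (the sample values are made up):

```python
import pandas as pd

timezone = "America/Los_Angeles"  # example zone, as in the text above
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-07-12 18:36", "2020-12-01 00:30"]).tz_localize(timezone)
})

# Format the local wall-clock date as the 1st, parse it, then reattach the zone.
df["month"] = pd.to_datetime(df["timestamp"].dt.strftime("%Y-%m-01")).dt.tz_localize(timezone)
print(df)
```

The strings produced by strftime carry no timezone, which is why tz_localize is applied again after parsing.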

Othilie answered 12/7, 2020 at 18:36 Comment(0)
S
0

One-liner:

df.set_index(
    df.index - pd.to_timedelta(pd.to_datetime(df.index).day - 1, unit="D")
)

(There is a warning about vectorization.)

Actually what I would rather do is to first make that index a proper column, do the operations on it and then make it an index again:

In [31]: import datetime
    ...: import pandas as pd

In [32]: df = pd.DataFrame(
    ...:     index=[datetime.date.fromisoformat("1986-01-31"), datetime.date.fromisoformat("1986-02-28")],
    ...:     data={"value": [22.93, 15.46]})

In [33]: df
Out[33]:
            value
1986-01-31  22.93
1986-02-28  15.46

In [34]: df2 = df.reset_index().rename(columns={"index": "date"})

In [35]: df2
Out[35]:
         date  value
0  1986-01-31  22.93
1  1986-02-28  15.46

In [36]: df2.date = pd.to_datetime(df2.date)

In [37]: df2.date
Out[37]:
0   1986-01-31
1   1986-02-28
Name: date, dtype: datetime64[ns]

In [38]: df2.date -= pd.to_timedelta(df2.date.dt.day - 1, unit="D")

In [39]: df2
Out[39]:
        date  value
0 1986-01-01  22.93
1 1986-02-01  15.46

In [40]: df2.set_index("date")
Out[40]: 
            value
date             
1986-01-01  22.93
1986-02-01  15.46

The most important line is:

df2.date -= pd.to_timedelta(df2.date.dt.day - 1, unit="D")

where you actually remove the day offset.

Spirillum answered 15/3, 2022 at 17:48 Comment(0)
Y
0

I needed both floor and ceiling functions. The following worked for me:

import pandas as pd

def monthfloor(ts):
    offset = pd.offsets.Hour(1)

    if offset.is_month_start(ts):
        return ts
    else:
        return ts - pd.offsets.MonthBegin()

def monthceil(ts):
    offset = pd.offsets.Hour(1)

    if offset.is_month_end(ts):
        return ts
    else:
        return ts + pd.offsets.MonthEnd()

begin = pd.Timestamp(2023, 4, 1)
middle = pd.Timestamp(2023, 4, 15)
end = pd.Timestamp(2023, 4, 30)

tss = pd.Series([begin, middle, end])

print(tss.map(monthfloor))
# 0   2023-04-01
# 1   2023-04-01
# 2   2023-04-01
# dtype: datetime64[ns]

print(tss.map(monthceil))
# 0   2023-04-30
# 1   2023-04-30
# 2   2023-04-30
# dtype: datetime64[ns]
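
For larger Series, a vectorized variant may be preferable to map. The following is a sketch of one possible alternative, not the approach used above: the floor goes through a monthly period, and the ceiling uses MonthEnd(0), which rolls forward to the month end and leaves dates already on it unchanged. It assumes the timestamps carry no time-of-day component.

```python
import pandas as pd

tss = pd.Series(pd.to_datetime(["2023-04-01", "2023-04-15", "2023-04-30"]))

# Floor: round-trip through a monthly period (start of period by default).
floored = tss.dt.to_period("M").dt.to_timestamp()

# Ceil: MonthEnd(0) rolls forward to the month end and keeps dates
# that are already on the month end.
ceiled = tss + pd.offsets.MonthEnd(0)

print(floored)
print(ceiled)
```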
Yeld answered 30/4, 2023 at 19:6 Comment(0)
K
0

First add an offset, and then subtract it again. This way the first of the month is also handled correctly (and the same pattern should work for other offsets):

import pandas as pd

df = pd.DataFrame({"value": pd.date_range("2017-01-01", "2017-02-10", freq="W")})

df["floored"] = df["value"] + pd.offsets.MonthBegin() - pd.offsets.MonthBegin()

Result:

>>> print(df)
       value    floored
0 2017-01-01 2017-01-01
1 2017-01-08 2017-01-01
2 2017-01-15 2017-01-01
3 2017-01-22 2017-01-01
4 2017-01-29 2017-01-01
5 2017-02-05 2017-02-01
Kovach answered 27/4 at 20:18 Comment(0)
