Python Timedelta64 convert days to months

3

11

I have a Pandas dataframe with 2 columns representing a start-timestamp and an end-timestamp:

start       end
2016-06-13  2016-07-20

The datatype of these columns is datetime64[ns].

I now want to create a new column showing the difference in months:

start       end         duration
2016-06-13  2016-07-20  1.1

I first tried the following:

df['duration'] = df['end'] - df['start']

The result looks like this:

start       end         duration
2016-06-13  2016-07-20  37 days 00:00:00.000000000

I then tried to do the following:

df['duration'] = (df['end'] - df['start']).dt.months

But this yields the following error:

AttributeError: 'TimedeltaProperties' object has no attribute 'months'

The datatype of the duration column is timedelta64[ns].

How can I achieve the desired result?

Emmittemmons answered 19/8, 2018 at 13:16 Comment(5)
What counts as a month? Is it a fixed number of days or something else? Is 37 days on from January 31st one month or two? - Flapdoodle
Ideally it should use the actual number of days in the month that lies between the start and end dates. - Emmittemmons
That doesn't answer the question in the comment, though. It would help if you explained the exact rules for what a month is or isn't and provided some example inputs and outputs. A simple approach is to say a month is 30 days (or some other fixed number N), and then it's a simple division. Anything more complicated than that needs rules defined for the expected results. - Flapdoodle
The more I think about it, the less trivial it gets :) I guess I don't know for sure what I want. Probably just a simple solution to get the number of months between two dates, e.g. with a fixed month length of 30 days. - Emmittemmons
Just take the number of days and divide by 30 then :) - Flapdoodle
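The fixed-length approach suggested in the comments can be sketched as below. This is a minimal example, assuming a month is defined as exactly 30 days; the sample DataFrame is built here only for illustration, using the dates from the question.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "start": pd.to_datetime(["2016-06-13"]),
    "end": pd.to_datetime(["2016-07-20"]),
})

# Assumption: one "month" is exactly 30 days.
# Dividing a timedelta column by a Timedelta yields plain floats.
df["duration"] = (df["end"] - df["start"]) / pd.Timedelta(days=30)

print(df)  # duration is 37 / 30, roughly 1.23
```

Dividing by a `pd.Timedelta` keeps any sub-day part of the difference, whereas `.dt.days / 30` would drop it.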
13
import numpy as np   # version: 1.16.2
import pandas as pd  # version: 0.25.1

df['duration'] = (df['end'] - df['start']) / np.timedelta64(1, 'M')
Nabokov answered 30/1, 2020 at 14:58 Comment(1)
This is actually a neat answer! - Cindelyn
2

The previous code no longer works in recent versions of NumPy; it raises:

TypeError: Cannot get a common metadata divisor for NumPy datetime metadata [D] and [M] because they have incompatible nonlinear base time units

import numpy as np   # version: 1.18.5
import pandas as pd  # version: 1.1.5
df['duration'] = (df['end'] - df['start']).astype('timedelta64[M]') / np.timedelta64(1, 'M')
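Should the month-unit cast above also be rejected by your pandas/NumPy versions, one version-independent sketch is to avoid the 'M' unit entirely and divide by an explicit average month length. The constant below is an assumption (the mean Gregorian month, 365.2425 / 12 days), not something pandas defines for you.

```python
import pandas as pd

# Assumption: a "month" is the mean Gregorian month length
AVG_DAYS_PER_MONTH = 365.2425 / 12  # 30.436875 days

# Sample data taken from the question
df = pd.DataFrame({
    "start": pd.to_datetime(["2016-06-13"]),
    "end": pd.to_datetime(["2016-07-20"]),
})

# Only fixed-length units are involved, so no 'M' unit conversion is needed
df["duration"] = (df["end"] - df["start"]) / pd.Timedelta(days=AVG_DAYS_PER_MONTH)

print(df)  # duration is 37 / 30.436875, roughly 1.22
```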
Towny answered 4/6, 2021 at 16:48 Comment(0)
0

The months unit 'M' is deprecated in recent versions and raises the error 'Unit M is not supported'. So we can simply take the difference in days ('D') and divide it by 30.

df['duration'] = (df['end'] - df['start'])/np.timedelta64(1, 'D')/30

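Or, for precise calculations over longer periods, use 'to_period' (via the .dt accessor, since the columns are Series). This counts whole calendar months, so the example in the question gives 1:

df['duration'] = (pd.to_datetime(df['end']).dt.to_period('M') -
                  pd.to_datetime(df['start']).dt.to_period('M')
                  ).apply(lambda x: x.n)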
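Subtracting monthly periods only counts month boundaries crossed, so it returns whole numbers. If a fractional, calendar-aware value like the one in the question is wanted, one possible sketch is shown below. The helper `months_between` is hypothetical (it is not part of pandas), and the rule it implements is an assumption: count whole calendar months from the start date, then express the leftover days as a fraction of the calendar month they fall in.

```python
import pandas as pd

def months_between(start: pd.Timestamp, end: pd.Timestamp) -> float:
    """Hypothetical helper: whole calendar months plus a day-based fraction."""
    # First guess at the number of whole months from the year/month fields
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    anchor = start + pd.DateOffset(months=whole)
    if anchor > end:
        # Overshot: the last month is only partially covered
        whole -= 1
        anchor = start + pd.DateOffset(months=whole)
    # Length of the partially covered month, measured from the anchor date
    next_anchor = start + pd.DateOffset(months=whole + 1)
    return whole + (end - anchor) / (next_anchor - anchor)

# Sample data taken from the question
df = pd.DataFrame({
    "start": pd.to_datetime(["2016-06-13"]),
    "end": pd.to_datetime(["2016-07-20"]),
})

df["duration"] = [months_between(s, e) for s, e in zip(df["start"], df["end"])]

print(df)  # 1 whole month plus 7 of 31 days, roughly 1.23
```

The list comprehension runs row by row, so for very large frames the fixed-length division is considerably faster; this variant trades speed for calendar accuracy.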
Lo answered 28/2, 2024 at 15:11 Comment(0)
