Using lambda and strftime on dates when there are null values (Pandas)
Asked Answered
M

3

7

I'm trying to change the format of a datetime column in my Dataframe using lambda and strftime like below

df['Date Column'] = df['Date Column'].map(lambda x: x.strftime('%m/%d/%Y'))

However, since I have null values in some of these fields, this is giving me an error. I cannot drop these null rows because I still need them for the data in the other columns. Is there a way around this error without dropping the nulls.

Perhaps something like

df['Date Column'].map(lambda x: x.strftime('%m/%d/%Y') if x != null else "")

?

The method I've used is to drop the nulls, format the column, then merge it back onto the original dataset, but this seems like a very inefficient method.

Maleficence answered 18/2, 2016 at 17:10 Comment(2)
"Perhaps something like...." That's what I was about to suggest. Did it work?Donner
It does not sadly. I've tried different forms of it. .notnull(), != np.nan, != "NaT", != "NaN, but none of them have worked, so I'm wondering if the method is wrongMaleficence
D
10

You should be not checking for nan/nat (un)equality, but .notnull() should work and it does for me:

s = pd.date_range('2000-01-01', periods=5).to_series().reset_index(drop=True)
s[2] = None
s

0   2000-01-01
1   2000-01-02
2          NaT
3   2000-01-04
4   2000-01-05
dtype: datetime64[ns]

s.map(lambda x: x.strftime('%m/%d/%Y') if pd.notnull(x) else '')

0    01/01/2000
1    01/02/2000
2              
3    01/04/2000
4    01/05/2000
dtype: object

This returns the same that the answers by @Alexander and @Batman but is more explicit. It may also be slightly slower for large series.

Alternatively you can use the .dt accesor. The null values will be formatted as NaT.

s.dt.strftime('%m/%d/%Y')

0    01/01/2000
1    01/02/2000
2           NaT
3    01/04/2000
4    01/05/2000
dtype: object
Diseased answered 18/2, 2016 at 18:1 Comment(3)
s.dt.strftime('%m/%d/%Y') does not seem to work with None values, at least with pandas 1.1.5: AttributeError: Can only use .dt accessor with datetimelike valuesCounterclaim
My code still works for the specific problem stated in the OP: a "datetime column". You must have some different, maybe a series of dtype object with objects of mixed types. My answer does not address such scenario.Diseased
Indeed, this will work if the column is correctly set to a datetime dtype. If all columns are None, the column may be set to object and calling dt will result in a crash.Counterclaim
O
3

Personally I'd just define a small function, and then use that.

def to_string(date):
    if date:
        string = date.strftime('%Y%m%d')
    else:
        string = ""

    return string

Then

df['Date Column'].map(to_string) 

Otherwise

df['Date Column'].map(lambda x: x.strftime('%Y%m%d') if x else "")
Odelet answered 18/2, 2016 at 17:20 Comment(2)
Sorry. You hadn't answered when I started editing my answer. I prefer the function definition because I think it's easier to read. Which is why I prefaced my answer with "personally". It occurred to me afterwards that they may prefer not to do it that way.Odelet
Both methods ended up giving me the same error I was getting about null values. Goyo's solution worked for me, so I will be marking that as answerMaleficence
F
1

You can use a conditional assignment (ternary).

df['Date Column'] = df['Date Column'].map(lambda x: x.strftime('%m/%d/%Y') if x else '')
Fecal answered 18/2, 2016 at 17:22 Comment(3)
For some reason this method is still giving me the same error I was getting about null values. Goyo's solution worked for me, so I will be marking that as answerMaleficence
I guess it depends on what you mean by "null values in some of these fields" (None vs. NaT vs. NaN). I assumed None, but using notnull() includes the other cases. Sample data helps to create better solutions that fit your requirements.Fecal
Ah you're totally correct. It was NaT because it was in datetime format, I apologize about that.Maleficence

© 2022 - 2024 — McMap. All rights reserved.