Error: cannot convert float NaN to integer in pandas

I have the dataframe:

   a            b     c      d
0 nan           Y     nan   nan
1  1.27838e+06  N      3     96
2 nan           N      2    nan
3  284633       Y     nan    44

I am trying to convert the non-null values to integer type to avoid exponential notation (1.27838e+06):

f=lambda x : int(x)
df['a']=np.where(df['a']==None,np.nan,df['a'].apply(f))

But I still get the error, even though I only want to change the dtype of the non-null values. Can anyone point out my mistake? Thanks.

Photoreconnaissance answered 4/7, 2017 at 3:10 Comment(1)
NaN cannot be represented by an integer, so what are you expecting here? If you want ints then you need to state what value you expect here after conversion. - Michelinamicheline

Pandas' default NumPy-backed integer dtypes cannot store NaN values. Strictly speaking, you could have a column with mixed data types, but this can be computationally inefficient. So if you insist, you can do:

df['a'] = df['a'].astype('O')
df.loc[df['a'].notnull(), 'a'] = df.loc[df['a'].notnull(), 'a'].astype(int)
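
For reference, here is a self-contained sketch of the same object-dtype approach applied to column `a` from the question. The frame is rebuilt by hand from the values shown there, and the `mask` variable is only introduced here for readability:

```python
import numpy as np
import pandas as pd

# Column 'a' from the question; NaN marks the missing entries.
df = pd.DataFrame({"a": [np.nan, 1.27838e06, np.nan, 284633.0]})

# Switch to object dtype so each cell can hold its own Python type.
df["a"] = df["a"].astype("O")

# Convert only the non-null cells to int, leaving the NaN cells untouched.
mask = df["a"].notnull()
df.loc[mask, "a"] = df.loc[mask, "a"].astype(int)

print(df["a"].tolist())
```

The column now mixes integers and float NaN, so arithmetic on it falls back to slow Python-level operations.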
Lentil answered 4/7, 2017 at 3:34 Comment(0)

As far as I have read in the pandas documentation, it is not possible to represent an integer NaN:

"In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays."

As explained later in the documentation, this is for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
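
A minimal sketch of what that limitation looks like in practice, using the values of column `a` from the question (the `cast_failed` flag is just an illustrative name): casting a float column that contains NaN to a NumPy integer dtype raises a `ValueError`, while the float column itself is fine.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.27838e06, np.nan, 284633.0])

# A NumPy-backed int64 array has no way to mark a value as missing,
# so the cast fails as soon as it meets a NaN.
try:
    s.astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(type(exc).__name__, exc)

# The column stays float64 because NaN itself is a float.
print(s.dtype)
```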

Airdry answered 4/7, 2017 at 3:31 Comment(2)
If I make it object, I may face problems when the column is involved in calculations later. Does that mean there is no way to have two different data types, float64 and int, in the same column? - Photoreconnaissance
@VivianTio, yeah, no way, at least no way to be efficient. - Airdry

You can use the nullable Int64 dtype (note the capital I) to keep it simple, like:

Working example:

df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('Int64')

Non-working example (raises the same error as yours, because int64 doesn't support NaNs):

df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('int64')

The difference is just that Int64 is more flexible. It represents integers but can also hold missing values (shown as <NA>), as in your case, so you can fill the values later if needed.
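
Applied to column `a` from the question, this could look like the sketch below. It assumes pandas 0.24 or later, where the nullable integer dtype is available; the frame is rebuilt by hand and `filled` is only an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.27838e06, np.nan, 284633.0]})

# Capital-I "Int64" is the nullable integer extension dtype:
# NaN becomes pd.NA while the other values become integers.
df["a"] = df["a"].astype("Int64")

print(df["a"])

# Once the missing values are filled, a plain int64 cast is safe.
filled = df["a"].fillna(0).astype("int64")
print(filled.tolist())
```

The cast from float only succeeds when every non-missing value is a whole number, which is the case for the data shown in the question.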

Reference: https://github.com/pandas-dev/pandas/issues/27731

Deleterious answered 29/8, 2024 at 11:36 Comment(0)
