Pandas: nan->None

pandas.DataFrame.to_dict leaves NaN as nan and converts null to None. As explained in "Python comparison ignoring nan", this is sometimes suboptimal.

Is there a way to convert all nans to None? (either in pandas or later on in Python)

E.g.,

>>> df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]})
>>> df
     a     b
0  1.0  None
1  NaN   foo
>>> df.to_dict()
{'a': {0: 1.0, 1: nan}, 'b': {0: None, 1: 'foo'}}

I want

{'a': {0: 1.0, 1: None}, 'b': {0: None, 1: 'foo'}}

instead.
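
For the "later on in Python" route, one option is to post-process the dictionary that to_dict() returns. The following is a minimal sketch, not a pandas feature: nan_to_none is a hypothetical helper name, and it assumes the missing values are float NaNs nested inside dicts or lists.

```python
import math


def nan_to_none(value):
    """Recursively replace float NaN with None (hypothetical helper, not part of pandas)."""
    # numpy.float64 subclasses float, so it is covered by this check as well.
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value


# Same shape as the df.to_dict() output shown in the question.
raw = {"a": {0: 1.0, 1: float("nan")}, "b": {0: None, 1: "foo"}}
cleaned = nan_to_none(raw)
```

Because it works on the exported dictionary, this leaves the DataFrame and its dtypes untouched.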

Aeriel answered 25/1, 2018 at 22:50 Comment(0)
D
10
import pandas as pd

df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]})
df.where((pd.notnull(df)), None)
Out[850]: 
      a     b
0     1  None
1  None   foo
df.where((pd.notnull(df)), None).to_dict()
Out[851]: {'a': {0: 1.0, 1: None}, 'b': {0: None, 1: 'foo'}}
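
On some newer pandas releases, where(..., None) may leave NaN in float columns because the column keeps its float dtype. That is an assumption about version-dependent behaviour and is worth checking on your own install. A minimal sketch of a variant that casts to object first, so the columns can hold None:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, "foo"]})

# Compute the mask on the original frame, then cast to object so that
# None can be stored as-is instead of being coerced back to NaN.
mask = df.notna()
result = df.astype(object).where(mask, None).to_dict()
```

As the comments on this answer point out, the end result is an object-dtype frame either way, so this is best done right before export.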
Decontaminate answered 25/1, 2018 at 22:55 Comment(5)
I'll note that this does the same thing, converts every column to an object type, just that it does it in two steps. - Misti
@cᴏʟᴅsᴘᴇᴇᴅ yep, you are right, almost the same :-) - Decontaminate
Just mentioning that since OP seems to think this is converting the data to string (which isn't the case!). - Misti
@cᴏʟᴅsᴘᴇᴇᴅ: this is different from what you suggested because it works on the externally generated DataFrame, as opposed to creating a generic DF from scratch. - Aeriel
@Aeriel I am aware of what it does. My point in my previous comment was that the end result is the same (a generic dataframe), not a dataframe of strings like you initially surmised. I was only addressing your misconception, nothing more. - Misti
M
3

Initialise as an object DataFrame (at your peril...):

df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]}, dtype=object)    
df

      a     b
0     1  None
1  None   foo

Without dtype=object, pandas tries to infer the dtype of the first column and guesses float, which turns None into NaN. You can prevent that by forcing every column to remain object, which suppresses any type conversion at all.
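
A small sketch to make the difference visible: it builds the same data with and without dtype=object and compares the resulting dtypes. The variable names are only for illustration.

```python
import pandas as pd

data = {"a": [1, None], "b": [None, "foo"]}

inferred = pd.DataFrame(data)              # pandas infers a dtype per column
forced = pd.DataFrame(data, dtype=object)  # every column stays object

# Column "a" is inferred as a float column, so its None is stored as NaN.
inferred_a_dtype = str(inferred["a"].dtype)

# With dtype=object the values are kept as plain Python objects, None included.
forced_dtypes = [str(dtype) for dtype in forced.dtypes]
as_dict = forced.to_dict()
```

The trade-off is the one raised in the comments: object columns fall back to slower, generic operations.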

Misti answered 25/1, 2018 at 22:53 Comment(9)
This is cheating. I have numeric columns in the DataFrame, and converting it to string loses information. - Aeriel
@Aeriel No, there is no string conversion taking place. - Misti
Each column is initialised as column of python objects. Pandas no longer makes assumptions about what its content is, and falls back to slow methods of operating on it. - Misti
I had a feeling though that df = pd.DataFrame({"a":[1,None],"b":[None,"foo"]}) was an MCVE to give a starting DF to play with. In reality, if you're at the end of a chain of processes, does it make sense to convert your whole resulting DF to object before to_dict()? - Article
@Aeriel object != str - Ornithorhynchus
@cᴏʟᴅsᴘᴇᴇᴅ I've just seen your comment on the other answer so I'm probably wrong here. - Article
@Article It usually doesn't make sense converting any dataframe to object except in the rarest of cases. OP seems to have a good reason for wanting to do so, so I'm not getting in their way here... - Misti
@cᴏʟᴅsᴘᴇᴇᴅ No, what I meant by my very last comment is I missed something. df.where((pd.notnull(df)), None).to_dict() looks the business, but you stated it's converting to object type in two steps. So your answer, on the surface, does look like a cheat to me because you alter the DF at creation but ultimately it doesn't matter. +1 for reshaping my thinking :) - Article
@Article Cheers, as long as you call pd.DataFrame somewhere, this works :D - Misti
F
0

I found that the accepted answer did not work, but this did:

import numpy as np
df.replace([np.nan], [None]).to_dict('records')

I don't know why. I did at least confirm with df.isna() that every field of the df that appeared to hold an NA value really was one.
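
A hedged sketch of the same idea, using the dict form of replace together with an isna() check like the one described above. Whether this or the where approach works may depend on the pandas version; that is an assumption about why the results differed, not something confirmed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, "foo"]})

# Count the cells pandas considers missing before replacing anything.
missing_before = int(df.isna().sum().sum())

# Dict form of replace: map NaN to None, then export one dict per row.
records = df.replace({np.nan: None}).to_dict("records")
```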

I got the solution from here.

Fortnightly answered 19/6 at 10:39 Comment(0)
