Convert pandas.Series from dtype object to float, and errors to nans [duplicate]

Asked 20/9, 2014 at 20:1 Answered 6/11, 2022 at 6:53

Consider the following situation:

In [2]: a = pd.Series([1,2,3,4,'.'])

In [3]: a
Out[3]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

In [8]: a.astype('float64', raise_on_error = False)
Out[8]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

I would have expected an option that allows conversion while turning erroneous values (such as that .) to NaNs. Is there a way to achieve this?

Fag answered 20/9, 2014 at 20:1 Comment(0)

Use `pd.to_numeric` with `errors='coerce'`

# Setup
s = pd.Series(['1', '2', '3', '4', '.'])
s

0    1
1    2
2    3
3    4
4    .
dtype: object

pd.to_numeric(s, errors='coerce')

0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
dtype: float64

If you need the NaNs filled in, use Series.fillna.

pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')

0    1
1    2
2    3
3    4
4    0
dtype: float64

Note, downcast='infer' will attempt to downcast floats to integers where possible. Remove the argument if you don't want that.

From v0.24+, pandas introduces a Nullable Integer type, which allows integers to coexist with NaNs. If you have integers in your column, you can use
pd.__version__
# '0.24.1'

pd.to_numeric(s, errors='coerce').astype('Int32')

0      1
1      2
2      3
3      4
4    NaN
dtype: Int32
There are other options to choose from as well, read the docs for more.

Extension for `DataFrames`

If you need to extend this to DataFrames, you will need to apply it to each row. You can do this using DataFrame.apply.

# Setup.
np.random.seed(0)
df = pd.DataFrame({
    'A' : np.random.choice(10, 5), 
    'C' : np.random.choice(10, 5), 
    'B' : ['1', '###', '...', 50, '234'], 
    'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df

   A    B  C    D
0  5    1  9   23
1  0  ###  3    1
2  3  ...  5  ...
3  3   50  2  268
4  7  234  4   $$

df.dtypes

A     int64
B    object
C     int64
D    object
dtype: object

df2 = df.apply(pd.to_numeric, errors='coerce')
df2

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

df2.dtypes

A      int64
B    float64
C      int64
D    float64
dtype: object

You can also do this with DataFrame.transform; although my tests indicate this is marginally slower:

df.transform(pd.to_numeric, errors='coerce')

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

If you have many columns (numeric; non-numeric), you can make this a little more performant by applying pd.to_numeric on the non-numeric columns only.

df.dtypes.eq(object)

A    False
B     True
C    False
D     True
dtype: bool

cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')

df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
#     df[c] = pd.to_numeric(df[c], errors='coerce')

df

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

Applying pd.to_numeric along the columns (i.e., axis=0, the default) should be slightly faster for long DataFrames.

Radiancy answered 22/12, 2017 at 14:6 Comment(6)

This is awesome :-) , We should keep the thing updated , since left deprecated method on this site is not good for the future visitor :-) , You did it – Lilian 23/12, 2017 at 16:28

Aha, maybe you can add s.str.isalnum() :-) combine with mask – Lilian 23/12, 2017 at 16:31

@Wen You mean s.str.isdigit()? It will only work for integers, not floats. Good idea though. – Radiancy 23/12, 2017 at 16:32

@Wen as per I know to_numeric is gold here. – Milesmilesian 23/12, 2017 at 16:33

@cᴏʟᴅsᴘᴇᴇᴅ as of reference we can take this question https://mcmap.net/q/162886/-how-to-replace-39-any-strings-39-with-nan-in-pandas-dataframe-using-a-boolean-mask/4800652 to consideration. – Milesmilesian 23/12, 2017 at 16:43

Thanks! However, using axis=0 yields an error in my Python 3.6.2: File "C:\Python\lib\site-packages\pandas\core\series.py", line 2294, in apply mapped = lib.map_infer(values, f, convert=convert_dtype) File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124) File "C:\Python\lib\site-packages\pandas\core\series.py", line 2282, in <lambda> f = lambda x: func(x, *args, **kwds) TypeError: to_numeric() got an unexpected keyword argument 'axis' – Christoffer 24/5, 2018 at 10:28

In [30]: pd.Series([1,2,3,4,'.']).convert_objects(convert_numeric=True)
Out[30]: 
0     1
1     2
2     3
3     4
4   NaN
dtype: float64

Asturias answered 20/9, 2014 at 20:6 Comment(5)

I think I took out the function of raise_on_error a while back. Doesn't do anything ATM. – Asturias 20/9, 2014 at 20:9

I opened an issue for enhancement here: github.com/pydata/pandas/issues/8332, feel free to comment on proposed API – Asturias 20/9, 2014 at 20:12

On 0.14.1 it prevents it from throwing an exception. Without specifying it the astype statement would've raise an error. – Fag 20/9, 2014 at 20:12

sorry you are right. That's why never JUST read the code, test it :) – Asturias 20/9, 2014 at 20:13

.convert_objects() method has been deprecated since 0.17, pd.to_numeric is the new way to go. – Mimas 3/11, 2016 at 8:5

-2

Do this:

pd.to_numeric(s, errors='coerce')

Optimum answered 6/11, 2022 at 6:53 Comment(1)

repetitive answer with no additional effort – Intramundane 24/5, 2023 at 20:45

Use `pd.to_numeric` with `errors='coerce'`

Extension for `DataFrames`

Recommended topics

Hot tags

Use pd.to_numeric with errors='coerce'

Extension for DataFrames

Recommended topics

Hot tags

Use `pd.to_numeric` with `errors='coerce'`

Extension for `DataFrames`