How to replace dashes in a python dataframe by NaN?

Asked 10/6, 2017 at 7:21 Answered 16/10, 2022 at 17:47

I have a dataframe (in Python) as follows:

      A     B    C    D     E      F     G    H

0    T-1    2    3    -    M-7     2          2

1    T-2    5    4    2    M-7     4          3

2    T-3    -    -    3    M-7            9   4  

3    T-4    6    -    4    M-7     -      9   5

4    T-5    -    1    5    M-7     8          6

I would like to replace the dashes (excluding those in column A and E) with NaN.

I tried df.replace('-', np.nan).It ended up with replacing the entire cells of columns A and E as well. I guess I can use df[column_name].replace('-',np.nan) but then I would have to apply it for each column separately.

Is there any other way, which is efficient and will work for any number of columns with similar restrictions?

Chiccory answered 10/6, 2017 at 7:21 Comment(3)

df.replace('-', np.nan) should work. Not sure why it's not working for you. – Dioscuri 10/6, 2017 at 7:39

As I have mentioned in the question,df.replace('-', np.nan) works more than I require. I would like columns A and E remain as they are but the dashes of other columns should be replaced. – Chiccory 10/6, 2017 at 7:57

Using df.replace('-', np.nan) shouldn't change anything in column A and E. It should only replace the dashes in the other columns. It should do exactly what you want, so I don't understand why it's not working for you. I can't reproduce this problem. – Dioscuri 10/6, 2017 at 8:18

In [18]: df[df.columns.drop(['A','E'])] = \
             df[df.columns.drop(['A','E'])].replace('-', np.nan)

In [19]: df
Out[19]:
   A    B    C    D    E    F    G    H
0  a  T-1    a  NaN  M-7    a    a    a
1  b  T-2  NaN    b  M-7  NaN    b    b
2  c  T-3    c    c  M-7    c    c  NaN
3  d  T-4    d    d  M-7  NaN    d    d
4  e  T-5  NaN    e  M-7  NaN  NaN  NaN

Ducks answered 10/6, 2017 at 7:45 Comment(0)

This should work.

df = pd.DataFrame({'A': list('abcde'),
                  'B': ['T-1', 'T-2', 'T-3', 'T-4', 'T-5'],
                  'C': ['a', '-', 'c', 'd', '-'],
                  'D': ['-', 'b', 'c', 'd', 'e'],
                  'E': ['M-7', 'M-7', 'M-7', 'M-7', 'M-7'],
                  'F': ['a', '-', 'c', '-', '-'],
                  'G': ['a', 'b', 'c', 'd', '-'],
                  'H': ['a', 'b', '-', 'd', '-']
                  })

df = df.astype(str)

s = df.applymap(lambda x: re.sub(r'^-$', str(np.NaN), x))

Output:

   A    B    C    D    E    F    G    H
0  a  T-1    a  nan  M-7    a    a    a
1  b  T-2  nan    b  M-7  nan    b    b
2  c  T-3    c    c  M-7    c    c  nan
3  d  T-4    d    d  M-7  nan    d    d
4  e  T-5  nan    e  M-7  nan  nan  nan

Unintentional answered 10/6, 2017 at 7:32 Comment(2)

Your method is great. But for the dataframe that I have given above, I am getting an error: TypeError: ('expected string or bytes-like object', 'occurred at index B '). – Chiccory 10/6, 2017 at 8:1

Convert all the columns in your dataframe to string type and it will work for you, like so: df.astype(str) – Unintentional 10/6, 2017 at 8:34

Simply use df=df.replace('-', np.nan)

Brick answered 16/10, 2022 at 17:47 Comment(2)

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Soll 19/10, 2022 at 7:20

I think this does not fully address the question, this method was proposed and rejected – Unassailable 20/10, 2022 at 6:50

Recommended topics

Hot tags