How to replace dashes in a python dataframe by NaN?
Asked Answered
C

3

7

I have a dataframe (in Python) as follows:

      A     B    C    D     E      F     G    H

0    T-1    2    3    -    M-7     2          2

1    T-2    5    4    2    M-7     4          3

2    T-3    -    -    3    M-7            9   4  

3    T-4    6    -    4    M-7     -      9   5

4    T-5    -    1    5    M-7     8          6

I would like to replace the dashes (excluding those in column A and E) with NaN.

I tried df.replace('-', np.nan).It ended up with replacing the entire cells of columns A and E as well. I guess I can use df[column_name].replace('-',np.nan) but then I would have to apply it for each column separately.

Is there any other way, which is efficient and will work for any number of columns with similar restrictions?

Chiccory answered 10/6, 2017 at 7:21 Comment(3)
df.replace('-', np.nan) should work. Not sure why it's not working for you.Dioscuri
As I have mentioned in the question,df.replace('-', np.nan) works more than I require. I would like columns A and E remain as they are but the dashes of other columns should be replaced.Chiccory
Using df.replace('-', np.nan) shouldn't change anything in column A and E. It should only replace the dashes in the other columns. It should do exactly what you want, so I don't understand why it's not working for you. I can't reproduce this problem.Dioscuri
D
2
In [18]: df[df.columns.drop(['A','E'])] = \
             df[df.columns.drop(['A','E'])].replace('-', np.nan)

In [19]: df
Out[19]:
   A    B    C    D    E    F    G    H
0  a  T-1    a  NaN  M-7    a    a    a
1  b  T-2  NaN    b  M-7  NaN    b    b
2  c  T-3    c    c  M-7    c    c  NaN
3  d  T-4    d    d  M-7  NaN    d    d
4  e  T-5  NaN    e  M-7  NaN  NaN  NaN
Ducks answered 10/6, 2017 at 7:45 Comment(0)
U
1

This should work.

df = pd.DataFrame({'A': list('abcde'),
                  'B': ['T-1', 'T-2', 'T-3', 'T-4', 'T-5'],
                  'C': ['a', '-', 'c', 'd', '-'],
                  'D': ['-', 'b', 'c', 'd', 'e'],
                  'E': ['M-7', 'M-7', 'M-7', 'M-7', 'M-7'],
                  'F': ['a', '-', 'c', '-', '-'],
                  'G': ['a', 'b', 'c', 'd', '-'],
                  'H': ['a', 'b', '-', 'd', '-']
                  })

df = df.astype(str)

s = df.applymap(lambda x: re.sub(r'^-$', str(np.NaN), x))

Output:

   A    B    C    D    E    F    G    H
0  a  T-1    a  nan  M-7    a    a    a
1  b  T-2  nan    b  M-7  nan    b    b
2  c  T-3    c    c  M-7    c    c  nan
3  d  T-4    d    d  M-7  nan    d    d
4  e  T-5  nan    e  M-7  nan  nan  nan
Unintentional answered 10/6, 2017 at 7:32 Comment(2)
Your method is great. But for the dataframe that I have given above, I am getting an error: TypeError: ('expected string or bytes-like object', 'occurred at index B ').Chiccory
Convert all the columns in your dataframe to string type and it will work for you, like so: df.astype(str)Unintentional
B
0

Simply use df=df.replace('-', np.nan)

Brick answered 16/10, 2022 at 17:47 Comment(2)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Soll
I think this does not fully address the question, this method was proposed and rejectedUnassailable

© 2022 - 2024 — McMap. All rights reserved.