Replace values in DataFrame column when they start with string using lambda
I have a DataFrame:

import pandas as pd
import numpy as np
x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
df = pd.DataFrame(x)

I want to replace the values starting with XXX with np.nan using lambda.

I have tried many things with replace, apply and map, and the best I have managed to get is the boolean result False, True, True, False.

The line below works, but I would like to know whether there is a better way to do it. I suspect that apply or replace with a lambda would be cleaner.

df.Value.loc[df.Value.str.startswith('XXX', na=False)] = np.nan
Ragwort answered 22/8, 2019 at 17:23 Comment(3)
Does your dataframe have just the one column? Also, apply isn't a preferred way, by the way. - Rarefy
The dataframe has many columns. - Ragwort
And does each column have values starting with XXX that you want to replace with np.nan, or is it just one column? - Rarefy
C
16

Use the apply method:

In [80]: x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
In [81]: df = pd.DataFrame(x)
In [82]: df.Value.apply(lambda x: np.nan if x.startswith('XXX') else x)
Out[82]:
0    Test
1     NaN
2     NaN
3    Test
Name: Value, dtype: object
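
The session above only displays the result, and x.startswith raises AttributeError if the column holds a non-string value such as an existing NaN. A minimal sketch of the same logic written out as a named function, assuming you want to store the result back in the column and leave non-string values untouched (the helper name blank_prefixed and the extra None row are illustrative, not from the question):

```python
import numpy as np
import pandas as pd


def blank_prefixed(value, prefix="XXX"):
    """Return NaN for strings starting with prefix; pass anything else through."""
    if isinstance(value, str) and value.startswith(prefix):
        return np.nan
    return value


# Sample data with one missing value added to exercise the guard.
df = pd.DataFrame({"Value": ["Test", "XXX123", None, "XXX456", "Test"]})

# apply returns a new Series, so assign it back to keep the change.
df["Value"] = df["Value"].apply(blank_prefixed)
print(df)
```

The isinstance check is the only addition to the lambda's logic; without it, the missing value in the third row would raise an error.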

Performance comparison of apply, where and loc: [image not available]

Copeck answered 22/8, 2019 at 17:34 Comment(1)
Excellent. This answer helps me understand lambda better for this sort of thing. - Ragwort
R
5

np.where() performs way better here:

df.Value = np.where(df.Value.str.startswith('XXX'), np.nan, df.Value)

Performance vs apply on larger DataFrames: [image not available]
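
Since the timing chart is not reproduced here, the comparison can be rerun locally. A rough sketch, assuming synthetic data made by repeating the question's four values (the function names with_apply and with_where and the row count are arbitrary choices); actual timings depend on the machine and the pandas version, so none are quoted:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): the question's four values repeated many times.
values = ["Test", "XXX123", "XXX456", "Test"] * 25_000
s = pd.Series(values, name="Value")


def with_apply(series):
    # Element-by-element Python call, as in the apply answer.
    return series.apply(lambda v: np.nan if v.startswith("XXX") else v)


def with_where(series):
    # Vectorised mask plus np.where, wrapped back into a Series.
    return pd.Series(
        np.where(series.str.startswith("XXX"), np.nan, series),
        index=series.index,
        name=series.name,
    )


# Sanity check: both approaches should blank out the same rows.
same_rows = with_apply(s).isna().equals(with_where(s).isna())
print("same rows replaced:", same_rows)

runs = 5
for func in (with_apply, with_where):
    seconds = timeit.timeit(lambda: func(s), number=runs)
    print(f"{func.__name__}: {seconds / runs:.4f} s per run")
```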

Rarefy answered 22/8, 2019 at 17:47 Comment(2)
I like the np.where option you presented. How does the apply lambda test against it? - Ragwort
@Ragwort check this - Rarefy
D
1

Using .loc is not necessary. Just write:

df.Value[df.Value.str.startswith('XXX')] = np.nan

A lambda function would be needed only if you wanted to compute the replacement value from some expression. In this case a plain np.nan is enough.
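
One caveat: assigning through df.Value[...] is chained indexing, which pandas may flag with SettingWithCopyWarning and which does not update df when Copy-on-Write is enabled. A minimal sketch of the single-step df.loc form, alongside a case where a lambda is useful because the replacement is computed from the old value (the 'YYY' rewrite is purely illustrative and not part of the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": ["Test", "XXX123", "XXX456", "Test"]})

# na=False treats missing values as non-matches, so the mask stays boolean.
mask = df["Value"].str.startswith("XXX", na=False)

# Computed replacement: the new value depends on the old one, so a lambda fits.
computed = df["Value"].apply(
    lambda v: "YYY" + v[3:] if v.startswith("XXX") else v
)

# Constant replacement: select rows and column in a single .loc call.
df.loc[mask, "Value"] = np.nan

print(df)
print(computed.tolist())
```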

Debar answered 22/8, 2019 at 17:29 Comment(2)
Thanks very much for your answer. It looks like I kind of fell on the right path anyway? - Ragwort
I was actually thinking of applying a lambda function that returns the value to be substituted. In this case the value to substitute is just np.nan, so there is no need to apply any lambda function. - Debar
