Using lambda conditional and pandas str.contains to lump strings

I'm trying to learn pandas by experimenting with the Global Shark Attack database on Kaggle, and I'm looking for the best way to lump strings together using a lambda function and str.contains.

Basically, anywhere a string in the data['Activity'] column contains the phrase skin diving (e.g. 'skin diving for abalone'), I want to replace the activity with skin diving. There are 92 variations of skin diving, hence the attempt to use a lambda function.

I can return a boolean series using

data['Activity'].str.contains('skin diving')

But I'm unsure how to change the value when this condition is true.

My lambda function is data.apply(lambda x: 'free diving' if x.str.contains('free diving')), but I'm getting a syntax error. I'm not familiar enough with lambda functions and pandas to get it right, so any help would be appreciated.

Meal answered 9/2, 2017 at 19:35 Comment(1)
The if expression must have an else part: x if condition else y. Your lambda does not have the else part. - Hipparch
17

Instead of using a Series.str method, you can use the in operator in your lambda to test for the substring:

data['Activity'] = data['Activity'].apply(lambda x: 'skin diving' if 'skin diving' in x else x)
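
One caveat worth sketching: if the column has missing values, the in test raises a TypeError, because NaN is a float rather than a string. The version below is a rough variant of the same idea on made-up data. The sample frame, the helper name lump_activity, and the case-insensitive match are assumptions for illustration, not part of the original answer or the Kaggle data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real dataset has many more variations.
data = pd.DataFrame(
    {'Activity': ['skin diving for abalone', 'Skin diving', 'surfing', np.nan]}
)

def lump_activity(value, phrase='skin diving'):
    """Return phrase if value is a string containing it, else value unchanged."""
    # Non-strings (e.g. NaN) are passed through untouched.
    if isinstance(value, str) and phrase in value.lower():
        return phrase
    return value

lumped = data['Activity'].apply(lump_activity)
print(lumped.tolist())
```

A named function is used here only because the extra isinstance check makes a one-line lambda hard to read; the logic is otherwise the same conditional.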
Incoordination answered 9/2, 2017 at 19:40 Comment(3)
Thanks mate, very useful. I hadn't thought about the in operator. - Meal
How would I use regex against a string? I want to loop through series cells and, if a cell contains a parenthesis ( or ), do something, else do something else. - Wame
It also seems to be the case that a conditional in a lambda expression requires an else clause. - Donne
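
On the regex question in the comments: str.contains treats its pattern as a regular expression by default, so a literal parenthesis has to be escaped or placed in a character class. The following is a small sketch on made-up values; the series contents and the names has_parens and result are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical cell values.
s = pd.Series(['wading (shallow)', 'surfing', 'diving)'])

# [()] is a character class matching a literal '(' or ')'.
has_parens = s.str.contains(r'[()]', regex=True)

# Pick one value where a parenthesis is present, another where it is not.
result = np.where(has_parens, 'has parenthesis', 'no parenthesis')
print(result.tolist())
```

This works on the whole column at once, so no explicit loop over cells is needed.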
5

You could use the str.contains method with np.where:

In [141]: df
Out[141]:
         activity
0  free diving ok
1              ok

In [142]: df.activity = np.where(df.activity.str.contains('free diving'),
                                 'free diving', df.activity)

In [143]: df
Out[143]:
      activity
0  free diving
1           ok
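
If the column may contain missing values or mixed case, it can help to pass na=False and case=False to str.contains so the condition is a clean boolean mask. The sketch below shows that with np.where and, as an alternative, with a .loc assignment. The sample data and the names mask and via_where are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a mixed-case variant and a missing value.
df = pd.DataFrame({'activity': ['free diving ok', 'Free diving', 'ok', np.nan]})

# na=False counts missing values as "no match"; case=False ignores case.
mask = df['activity'].str.contains('free diving', case=False, na=False)

# Option 1: np.where, as in the answer above.
via_where = pd.Series(np.where(mask, 'free diving', df['activity']), index=df.index)

# Option 2: assign through the boolean mask with .loc.
df.loc[mask, 'activity'] = 'free diving'
print(df)
```

Both options leave non-matching rows, including the missing value, unchanged.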
Maynard answered 9/2, 2017 at 20:8 Comment(0)
