Using a variable within a regular expression in Pandas str.contains()

3

14

I'm trying to select rows from a DataFrame using the pandas str.contains() function with a regular expression built from a variable, as shown below.

import pandas as pd

df = pd.DataFrame(["A test Case", "Another Testing Case"], columns=["A"])
variable = "test"
df[df["A"].str.contains(r'\b' + variable + '\b', regex=True, case=False)]  # Returns nothing

While the above returns nothing, the following returns the appropriate row, as expected:

df[df["A"].str.contains(r'\btest\b', regex=True, case=False)] #Returns values as expected

Any help would be appreciated.

Foam answered 4/12, 2018 at 22:2 Comment(1)
Perhaps your issue is that you are concatenating a raw string with a standard string? Maybe try fr'\b{variable}\b'. - Uglify
22

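Both word-boundary tokens must be inside raw strings. In your code the trailing '\b' is an ordinary string literal, so Python reads it as a backspace character instead of the regex word boundary, and the pattern never matches. Why not use some sort of string formatting instead? String concatenation is generally discouraged for building patterns like this.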

df[df["A"].str.contains(fr'\b{variable}\b', regex=True, case=False)] 
# Or, 
# df[df["A"].str.contains(r'\b{}\b'.format(variable), regex=True, case=False)] 

             A
0  A test Case
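
To make the difference visible, here is a minimal sketch that builds both patterns and compares them. The names broken, fixed, broken_rows and fixed_rows are just illustrative, and the re.escape() call is an optional extra that assumes the variable should be matched as literal text.

```python
import re
import pandas as pd

df = pd.DataFrame({"A": ["A test Case", "Another Testing Case"]})
variable = "test"

# The trailing '\b' is NOT a raw string, so Python turns it into a
# backspace character (ASCII 8) before the regex engine ever sees it.
broken = r'\b' + variable + '\b'

# In a raw f-string both '\b' tokens reach the regex engine unchanged.
# re.escape() is optional (an assumption here): it only matters if the
# variable may contain regex metacharacters such as '.' or '('.
fixed = rf'\b{re.escape(variable)}\b'

broken_rows = df[df["A"].str.contains(broken, regex=True, case=False)]
fixed_rows = df[df["A"].str.contains(fixed, regex=True, case=False)]

print(repr(broken))  # shows the backspace as \x08
print(repr(fixed))
print(fixed_rows)
```

Printing the patterns with repr() is a quick way to check what the regex engine actually receives when a pattern built from pieces does not match.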
Shrewd answered 4/12, 2018 at 22:5 Comment(2)
How would you do this if you had to specify the number of characters, as with [0-9]{3} for a pattern of three digits? I just ran into this problem and used string concatenation, which solved it; the f-string didn't work. - Ku
@Ku the standard method is to escape the curly braces by doubling them. If memory serves, that would be {{3}}. - Shrewd
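
A small sketch of the brace escaping mentioned in the comment above, using only the standard re module. The variable value "id" and the sample strings are made up for illustration.

```python
import re

variable = "id"  # hypothetical prefix, for illustration only

# Inside an f-string, {{ and }} produce literal braces, so {{3}} becomes
# the regex quantifier {3}, while {variable} is still substituted.
pattern = rf'\b{variable}[0-9]{{3}}\b'

# Equivalent concatenation; every piece containing a backslash is raw.
concat_pattern = r'\b' + variable + r'[0-9]{3}\b'

print(pattern)
print(re.search(pattern, "row id123 here") is not None)
print(re.search(pattern, "row id1234 here") is not None)
```

Both spellings give the same pattern string, so which one to use is mostly a matter of readability.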
0

The following command works for me (here text is the column name, and @variable refers to the Python variable):
df.query('text.str.contains(@variable)')

Constant answered 4/5, 2021 at 14:25 Comment(0)
-1

I had the exact same problem when passing a variable to str.contains(variable).

Try using str.contains(variable, regex=False)

It worked for me perfectly.
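
For comparison, here is a rough sketch of what regex=False does with the question's data. The names literal_mask and boundary_mask are illustrative only. With regex=False the variable is treated as a plain substring, so word boundaries are not enforced.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A test Case", "Another Testing Case"]})
variable = "test"

# Plain substring search: "test" is also found inside "Testing".
literal_mask = df["A"].str.contains(variable, regex=False, case=False)

# Whole-word search using the raw f-string pattern from the accepted answer.
boundary_mask = df["A"].str.contains(rf'\b{variable}\b', regex=True, case=False)

print(literal_mask.tolist())
print(boundary_mask.tolist())
```

So regex=False is fine when any substring match is acceptable, but it does not give the whole-word behaviour the question asks for.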

Palembang answered 25/7, 2019 at 16:33 Comment(1)
Clearly the opposite of what the OP requested. - Rout
