str.startswith using Regex
C

3

13

Why does str.startswith() not handle regex? Given this column:

   col1
0  country
1  Country

For example, df.col1.str.startswith('(C|c)ountry')

returns False for every value:

   col1
0  False
1  False
Curvy answered 24/8, 2016 at 16:53 Comment(4)
Are you sure startswith accepts either a string or a regex as a parameter?Wimble
pandas.Series.str.startswith does not accept regex.Disposition
I see! As I'm new to pandas, I thought we could use regex with startswith the way I used it with str.replace(). Thanks.Curvy
Here's a non-regex alternative: df['col1'].str.startswith(('country', 'Country')) (It accepts tuples for either-or)Glaring
P
27

Series.str.startswith does not accept regex because it is intended to behave similarly to str.startswith in vanilla Python, which does not accept regex. The alternative is to use a regex match (as explained in the docs):

df.col1.str.contains('^[Cc]ountry')

The character class [Cc] is probably a better way to match C or c than (C|c), unless of course you need to capture which letter is used. In that case you can use ([Cc]).
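To make the difference concrete, here is a minimal sketch, assuming a reasonably recent pandas. The sample rows "countryside" and "my country" are not from the question; they are added only to show that the anchored pattern matches prefixes and nothing else.

```python
import pandas as pd

# Sample frame mirroring the question, plus two illustrative extra rows.
df = pd.DataFrame({"col1": ["country", "Country", "countryside", "my country"]})

# startswith treats its argument as a literal prefix, so "(C|c)ountry"
# is looked for character by character and nothing matches.
literal = df["col1"].str.startswith("(C|c)ountry")

# contains interprets the pattern as a regex; the ^ anchor restricts the
# match to the start of each string, which emulates startswith.
anchored = df["col1"].str.contains(r"^[Cc]ountry", regex=True)

print(literal.tolist())
print(anchored.tolist())
```

Using the character class instead of a capturing group also avoids the warning that str.contains can emit when the pattern contains match groups.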

Pinckney answered 24/8, 2016 at 17:1 Comment(2)
Thanks for the clarification @Mad Physicist, this is useful.Curvy
The link to the documentation does not jump to the relevant section: pandas.pydata.org/pandas-docs/stable/user_guide/…Berneta
G
8

Series.str.startswith does not accept regexes. Use Series.str.match instead:

df.col1.str.match(r'(C|c)ountry', as_indexer=True)

Output:

0    True
1    True
Name: col1, dtype: bool
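For readers on newer pandas releases, a hedged sketch of the same idea without as_indexer follows. It assumes a version in which str.match returns booleans by default and, as far as I know, no longer accepts the as_indexer keyword. The third sample value is illustrative and not from the question.

```python
import pandas as pd

# Two values from the comment thread plus one that should not match.
s = pd.Series(["countryasdf", "Country", "my country"], name="col1")

# str.match anchors at the start of each string (like re.match),
# so trailing text after "country" does not prevent a match.
starts = s.str.match(r"[Cc]ountry")

print(starts.tolist())
```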
Genisia answered 24/8, 2016 at 16:59 Comment(8)
This won't work if there is text following "country" since the whole expression must match. See my answer for something that is really equivalent to startswith.Pinckney
@MadPhysicist Not true, at least in v0.18.1. Series.str.match relies on re.match, which matches at the beginning of the string.Genisia
Also, match is now a deprecated function: pandas.pydata.org/pandas-docs/stable/generated/…Pinckney
@MadPhysicist Try yourself: pd.Series(['countryasdf', 'Country']).str.match(r'(C|c)ountry', as_indexer=True) gives pd.Series([True, True]).Genisia
Yes, with as_indexer=True this works. But still deprecated.Pinckney
It may be deprecated, but it works. Before claiming something does not work, please make sure it really does not.Genisia
Without as_indexer=True, you get pd.Series([('c',), ('C',)]).Genisia
My mistake. I am using an earlier version of Pandas. You are absolutely correct.Pinckney
D
0

Series.str.startswith can also receive a tuple like this:

df.col1.str.startswith(("Country","country"))

Each element of the tuple is tried as a prefix, so the tuple acts as an OR within Series.str.startswith.
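A short sketch of the tuple form, assuming pandas 1.5 or later (where, to my knowledge, tuple patterns for startswith are documented). The values "COUNTRY" and "county" are illustrative additions.

```python
import pandas as pd

s = pd.Series(["country", "Country", "COUNTRY", "county"])

# Each tuple element is tested as a literal prefix; a row is True
# if it starts with any one of them. No regex is involved.
mask = s.str.startswith(("Country", "country"))

print(mask.tolist())
```

This mirrors Python's built-in str.startswith, which also accepts a tuple of prefixes. Note that matching stays case-sensitive, so "COUNTRY" is not matched.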

Didymium answered 31/1, 2024 at 5:46 Comment(0)
