Match words that don't start with a certain letter using regex

I am learning regex but have not been able to find the right regex in Python for selecting words that do not start with a particular letter.

Example below

text='this is a test'
match=re.findall('(?!t)\w*',text)

# match returns
['his', '', 'is', '', 'a', '', 'est', '']

match=re.findall('[^t]\w+',text)

# match
['his', ' is', ' a', ' test']

Expected: ['is', 'a']

Toxic answered 16/5, 2018 at 15:13 Comment(2)
Try: regex101.com/r/OzUEO9/1 - Chuipek
[i for i in text.split() if i[0] != 't'] - Anthony

With regex

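Use the negated character class [^\Wt] to match any word character (letter, digit, or underscore) other than t. To avoid matching the tail of a word that does start with t (such as 'his' in 'this'), add the word boundary metacharacter, \b, at the beginning of your pattern.

Also, remember to use raw strings for regex patterns: in a regular string literal, '\b' is a backspace character rather than a word boundary.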

import re

text = 'this is a test'
match = re.findall(r'\b[^\Wt]\w*', text)

print(match) # prints: ['is', 'a']

See the demo here.
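
To make the role of each piece visible, here is a small sketch that runs the same pattern three ways on the text from the question: without \b, with \b, and with a non-raw '\b'. The variable names are only illustrative.

```python
import re

text = 'this is a test'

# Without \b the match may begin mid-word, so the tails of
# 'this' and 'test' leak into the result.
no_boundary = re.findall(r'[^\Wt]\w*', text)
print(no_boundary)    # ['his', 'is', 'a', 'est']

# With \b the match can only begin at the start of a word.
with_boundary = re.findall(r'\b[^\Wt]\w*', text)
print(with_boundary)  # ['is', 'a']

# In a non-raw string, '\b' is the backspace character (\x08),
# so this pattern looks for a literal backspace and finds nothing.
backspace_pattern = '\b' + r'[^\Wt]\w*'
not_raw = re.findall(backspace_pattern, text)
print(not_raw)        # []
```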

Without regex

Note that this is also achievable without regex.

text = 'this is a test'
match = [word for word in text.split() if not word.startswith('t')]

print(match) # prints: ['is', 'a']
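
One caveat with the split-based version: str.split() only splits on whitespace, so punctuation stays attached to the words, and startswith is case-sensitive. The sketch below shows the difference on a slightly messier input (the sample text and variable names are made up for illustration) and one possible way to normalize each word before testing it.

```python
import string

text = '(this) is a Test.'

# Plain version: '(this)' starts with '(' and 'Test.' starts with 'T',
# so neither is filtered out.
plain = [word for word in text.split() if not word.startswith('t')]
print(plain)       # ['(this)', 'is', 'a', 'Test.']

# Normalized version: strip surrounding punctuation and lowercase
# the word before checking its first letter.
normalized = [
    word for word in text.split()
    if not word.strip(string.punctuation).lower().startswith('t')
]
print(normalized)  # ['is', 'a']
```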
Gynaecomastia answered 16/5, 2018 at 15:22 Comment(0)

You were almost there. You just forgot the \b (word boundary) token:

\b(?!t)\w+

Live demo
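
For context, the attempt in the question, (?!t)\w*, fails for two reasons: without \b the engine simply retries one character later (giving 'his' and 'est'), and \w* is allowed to match the empty string. A minimal sketch of the fixed pattern wrapped in a helper follows; the function name words_not_starting_with and its ignore_case parameter are hypothetical, not part of the re module.

```python
import re

def words_not_starting_with(text, letter, ignore_case=False):
    """Return the words in text that do not begin with letter (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b anchors the match at the start of a word, (?!...) rejects the
    # unwanted first letter, and \w+ requires at least one word character.
    pattern = rf'\b(?!{re.escape(letter)})\w+'
    return re.findall(pattern, text, flags)

print(words_not_starting_with('this is a test', 't'))
# ['is', 'a']
print(words_not_starting_with('This is a Test', 't'))
# ['This', 'is', 'a', 'Test']
print(words_not_starting_with('This is a Test', 't', ignore_case=True))
# ['is', 'a']
```

Passing the letter through re.escape is only a precaution in case the caller supplies a character that has special meaning in a regex.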

Mungo answered 16/5, 2018 at 15:34 Comment(1)
Thanks. Actually, match=re.findall(r'\b(?!t)\w+',text) worked. It needed a raw string. - Toxic
