How do I check if a string matches the following pattern?
Uppercase letter, number(s), uppercase letter, number(s)...
Example:
- These would match:
A1B2
B10L1
C1N200J1
- These wouldn't ('^' points to problem):
a1B2
^
A10B
   ^
AB400
 ^
import re
pattern = re.compile("^([A-Z][0-9]+)+$")
pattern.match(string)
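As a quick check, the pattern above can be run against the examples from the question. This is only a minimal sketch: the helper name `matches_pattern` and the two sample lists are made up for illustration and are not part of the answer.

```python
import re

# Anchored pattern from the answer above:
# one or more groups of (uppercase letter followed by one or more digits).
pattern = re.compile(r"^([A-Z][0-9]+)+$")

def matches_pattern(s):
    """Hypothetical helper: True if s is made up only of letter+digits groups."""
    return pattern.match(s) is not None

# Examples taken from the question.
should_match = ["A1B2", "B10L1", "C1N200J1"]
should_not_match = ["a1B2", "A10B", "AB400"]

for s in should_match + should_not_match:
    print(s, matches_pattern(s))
```

Comparing with `is not None` makes the result an explicit boolean; the Match object itself is also truthy, so `if pattern.match(s):` works as well.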
re.match: If zero or more characters at the beginning of string match the regular expression pattern. I just spent like 30 minutes trying to understand why I couldn't match something at the end of a string. Seems like it's not possible with match, is it? For that, re.search(pattern, my_string) works though. – Kilocalorie
[...] ^ at the beginning when you use match. I think it's a bit more complicated than that very simple explanation, but I'm not clear. You are correct that it does start from the beginning of the string though. – Teevens
[...] search() in this context. – Murine
[...] search()". It works perfectly fine with match. – Teevens

One-liner: re.match(r"pattern", string) # No need to compile
>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
...
Yes
You can evaluate it as bool if needed:
>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True
[...] re.match in the context of an if, but you have to use bool if you're using it elsewhere? – Trevelyan
[...] re.match. It only matches at the start of a string. Have a look at re.search instead. – Trevelyan
[...] if checks for the match not being None. – Hypothalamus
[...] re is used in more than one place to improve efficiency. In terms of errors, .match would throw the same error that .compile does. It's perfectly safe to use. – Banks
[...] re module compile and cache the patterns. Therefore there is absolutely no efficiency gain in using compile and then match over directly calling re.match. All of these functions call the internal function _compile (including re.compile), which caches the patterns in a Python dictionary. – Carbide
if (match := re.match(r"hello([0-9]+)", string)): print(match[1]) – Nikianikita

Please try the following:
import re
name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi", "1A4V"]
# Match names.
for element in name:
    m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)
    if m:
        print(m.groups())

import re
import sys
prog = re.compile(r'([A-Z]\d+)+')
while True:
    line = sys.stdin.readline()
    if not line:
        break
    if prog.match(line):
        print('matched')
    else:
        print('not matched')
re.match(...) will not work if you want to match the full string.
For example:
re.match("[a-z]+", "abcdef") ✅ will give a match
re.match("[a-z]+", "abcdef 12345") ✅ will also give a match, because the beginning of the string matches (maybe you don't want that when you're checking whether the entire string is valid)
Use re.fullmatch(...). This will only match if the entire string matches the pattern:
if re.fullmatch("[a-z]+", my_string):
    print("Yes")
re.fullmatch("[a-z]+", "abcdef") ✅ Yes
re.fullmatch("[a-z]+", "abcdef 12345") ❌ No
One-liner: bool(re.fullmatch("[a-z]+", my_string))
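Applied to the pattern from the question, the difference looks like this. A small sketch, assuming the unanchored form of the pattern; the input string `candidate` is invented for illustration.

```python
import re

# Letter+digits pattern from the question, written without ^ or $ anchors.
pattern = r"([A-Z][0-9]+)+"

candidate = "A1B2 and more"  # hypothetical input with trailing junk

# re.match only requires the pattern to match at the start of the string.
prefix_ok = re.match(pattern, candidate) is not None
# re.fullmatch requires the pattern to cover the entire string.
whole_ok = re.fullmatch(pattern, candidate) is not None

print(prefix_ok, whole_ok)
```

With `re.fullmatch` the pattern needs no explicit anchors, which keeps it reusable with `re.search` when a match anywhere in the string is wanted.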
As stated in the comments, all the answers using re.match implicitly match only at the start of the string. re.search is needed if you want to find the pattern anywhere in the string.
import re
pattern = re.compile("([A-Z][0-9]+)+")
# finds match anywhere in string
bool(re.search(pattern, 'aA1A1')) # True
# matches on start of string, even though pattern does not have ^ constraint
bool(re.match(pattern, 'aA1A1')) # False
If you need the full string to exactly match the regex, see @Ali Sajjad's answer using re.fullmatch.
Credit: @LondonRob and @conradkleinespel in the comments.
Regular expressions make this easy ...
[A-Z] will match exactly one character between A and Z
\d+ will match one or more digits
() groups things (and also returns things... but for now just think of them as grouping)
+ selects 1 or more
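The pieces listed above can be put together step by step. This is an illustrative sketch; the variable names `letter`, `digits` and `pair_repeated` are made up here, and `re.fullmatch` is used so that the whole string has to match.

```python
import re

letter = r"[A-Z]"  # exactly one uppercase letter
digits = r"\d+"    # one or more digits
# Group the letter+digits pair with () and repeat the whole group with +.
pair_repeated = "(" + letter + digits + ")+"

print(pair_repeated)

m = re.fullmatch(pair_repeated, "C1N200J1")
# group(0) is the whole match; group(1) holds only the last repetition of the group.
print(m.group(0), m.group(1))
```

This also shows the "return things" side of the parentheses: a repeated capturing group keeps only its final repetition.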
import re
ab = re.compile(r"^([A-Z]{1}[0-9]+)+$")  # [0-9]+ allows multi-digit numbers such as the 10 in B10L1
ab.match(string)
I believe that should work for an uppercase letter, number(s) pattern.
Just want to point out that for strings without line breaks (\n), one could also use the anchors \A and \Z for the beginning and end of a string, respectively.
import re
pat = re.compile(r'\A([A-Z][0-9]+)+\Z')
pat.match('A1B2') # match
pat.match('A1B2a') # no match
This makes a difference if the string contains multiple lines and you want to match the pattern in later lines of the string.
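One case where the two pairs of anchors differ even without re.M is a single trailing newline: `$` also matches just before it, while `\Z` matches only at the very end of the string. A minimal sketch, with the input `text` made up for illustration:

```python
import re

with_dollar = re.compile(r"^([A-Z][0-9]+)+$")
with_z = re.compile(r"\A([A-Z][0-9]+)+\Z")

text = "A1B2\n"  # hypothetical input, e.g. a line read from a file

# $ matches at the end of the string or just before a trailing newline.
dollar_hit = with_dollar.match(text) is not None
# \Z matches only at the very end of the string.
z_hit = with_z.match(text) is not None

print(dollar_hit, z_hit)
```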
match vs search vs fullmatch. Which is appropriate?
re.search is the most general of the three. As others have said, re.match() checks for a match only at the beginning of the string. re.search() can mimic that by prepending \A to the pattern. On the other hand, re.fullmatch() checks whether the entire string is a match, which can again be mimicked by re.search() by prepending \A and appending \Z to the pattern. The example below may help illustrate this point.
# prepending \A to pattern makes `search` behave similar to `match`
s1 = '1B10L1'
pattern1 = r'([A-Z][0-9]+)+'
re.match(pattern1, s1) # no match
re.search(pattern1, s1) # match
re.search(fr"\A{pattern1}", s1) # no match <--- with \A prepended (behaves same as re.match)
# `match` only checks at the beginning and doesn't care about the end of the string
s2 = 'B10L1a'
re.match(pattern1, s2) # match
re.match(fr"{pattern1}\Z", s2) # no match <--- with \Z appended (behaves same as re.fullmatch)
re.search(fr"\A{pattern1}\Z", s2) # no match <--- have the pattern between \A and \Z to mimic fullmatch
re.fullmatch(pattern1, s2) # no match
If the string contains multiple lines and the re.M flag is set, then this relation breaks down: fullmatch never scans across lines and match scans only the first line.
# by flagging multiline, `match` and `search` match the pattern in the first line
s3 = 'B10\nL1'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s3, re.M) # match
re.search(pattern2, s3, re.M) # match
re.fullmatch(pattern2, s3, re.M) # no match
# the pattern is in the second line but this will not be matched by `match`
s4 = 'a\nB10'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s4, re.M) # no match
re.search(pattern2, s4, re.M) # match
re.search(r'\A([A-Z][0-9]+)+', s4, re.M) # no match <--- with `\A` instead of `^` it mimics `match`
If you need to search for a pattern in a single string, then there's no need to compile it, since re.search, re.match etc. all call the _compile method anyway. However, if you need to search for a pattern in multiple strings, then compiling it first makes a lot of difference performance-wise. For the example in the OP, pre-compiling the pattern and searching through the list is over 2 times faster than not compiling first.
from timeit import timeit
setup2 = "import re; lst = ['A1B2', 'B10L1', 'C1N200J1K1', 'a1B2', 'A10B', 'AB400']"
setup1 = setup2 + "; pat = re.compile(r'^([A-Z][0-9]+)+$')"
for _ in range(3):
    t1 = timeit("list(map(pat.match, lst))", setup1) # pre-compiled
    t2 = timeit("[re.match(r'^([A-Z][0-9]+)+$', x) for x in lst]", setup2)
    print(t2 / t1)
# 2.083788080189313
# 2.448126223007598
# 2.43617482049811
\A mimics match regardless of whether there are line breaks or not. Neither \A nor match cares about line breaks. If you want to match the pattern in later lines, you need to use ^ together with the re.M option. – Abject
[...] \A and \Z are used together, it can match a pattern in a single line string, so the pattern in that example matches 'A1B2' but doesn't match 'A1\nB2'. – Pl

Ali Sajjad's answer should be the default, i.e. fullmatch, to avoid false positives.
However, it's also important to know that you're always checking that the result is not None to mean "yes, it's a match".
The two possibilities are therefore:
if re.fullmatch("[a-z]+", my_string) is not None:
or, as in Ali's answer:
if bool(re.fullmatch("[a-z]+", my_string)):
To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:
def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL): # or "is_full_match", as desired
    return re.fullmatch(pattern, string, flags) is not None
Those 2 flags are (usually) the most helpful default flags settings in my experience, rather than "0".
In practice, of course, you may need to examine the Match object delivered by re.fullmatch. But for cases where you just need to find whether there's a match...
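One thing to keep in mind with these defaults: for a case-sensitive check such as the one in the question, re.IGNORECASE has to be switched off explicitly. A small sketch, assuming the utility function above and the unanchored pattern from the question; the variable names are illustrative only.

```python
import re

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):
    # Same utility as above: True only if the whole string matches.
    return re.fullmatch(pattern, string, flags) is not None

pattern = r"([A-Z][0-9]+)+"

# With the default flags, IGNORECASE lets the lowercase 'a' pass as [A-Z].
with_defaults = is_match(pattern, "a1B2")
# Passing flags=0 restores the case-sensitive behaviour the question asks for.
case_sensitive = is_match(pattern, "a1B2", flags=0)

print(with_defaults, case_sensitive)
```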
^([A-Z]\d+){1,}$ like this? – Neoplasticism
[...] B and not with A. – Jamie
[...] A and B are small letters right? A10b and aB400? – Kakaaba