Check if string matches pattern
M

10

548

How do I check if a string matches the following pattern?

Uppercase letter, number(s), uppercase letter, number(s)...

Example:

  • These would match:
    A1B2
    B10L1
    C1N200J1
    
  • These wouldn't ('^' points to problem)
    a1B2
    ^
    A10B
       ^
    AB400
    ^
    
Merralee answered 26/9, 2012 at 5:27 Comment(5)
could you please explain more why it is a problem?Kakaaba
^([A-Z]\d+){1,}$ like this?Neoplasticism
In your third example, the problem should be with B and not with A.Jamie
maybe it's a typo error on the problem. both A and B are small letters right? A10b and aB400?Kakaaba
@Burhan, The problem is with A because B has numbers next to it and A doesn'tMerralee
T
702
import re
pattern = re.compile("^([A-Z][0-9]+)+$")
pattern.match(string)
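As a quick check against the examples from the question, here is a minimal sketch. The helper name matches_pattern is hypothetical and only there for illustration; it is not part of the answer above.

```python
import re

# Same pattern as above: one or more (uppercase letter + one or more digits) pairs.
pattern = re.compile(r"^([A-Z][0-9]+)+$")

def matches_pattern(s):
    # Hypothetical helper: True when the string fits the pattern.
    return pattern.match(s) is not None

# The first three should match, the last three should not.
for s in ["A1B2", "B10L1", "C1N200J1", "a1B2", "A10B", "AB400"]:
    print(s, matches_pattern(s))
```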
Teevens answered 26/9, 2012 at 5:30 Comment(5)
From the docs on re.match: If zero or more characters at the beginning of string match the regular expression pattern. I just spent like 30 minutes trying to understand why I couldn't match something at the end of a string. Seems like it's not possible with match, is it? For that, re.search(pattern, my_string) works though.Kilocalorie
@conradk Yes, you're right, I think there's something like an implied ^ at the beginning when you use match. I think it's a bit more complicated than that very simple explanation, but I'm not clear. You are correct that it does start from the beginning of the string though.Teevens
I edited your answer, because it only makes sense with search() in this context.Murine
Yes, but that's what the questioner wants. I'm not sure what you mean by "only makes sense with search()". It works perfectly fine with match.Teevens
To be clear: You probably want to check if pattern.match returns something; luckily None is falsy and a match object is truthy, so you can just do "if pattern.match(string):"Stantonstanway
B
388

One-liner: re.match(r"pattern", string) # No need to compile

>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
... 
Yes

You can evaluate it as bool if needed

>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True
Banks answered 13/7, 2016 at 3:9 Comment(7)
That's weird. Why can you use re.match in the context of an if, but you have to use bool if you're using it elsewhere?Trevelyan
Careful with re.match. It only matches at the start of a string. Have a look at re.search instead.Trevelyan
@Trevelyan probably because if checks for the match not being None.Hypothalamus
There's a big need to compile to make sure there are no errors in the regular expressions like bad character range errorsCarper
@SuhFangmbeng Compilation is useful for efficiency when the same re is used in more than one place. In terms of errors, .match would throw the same error that .compile does. It's perfectly safe to use.Banks
@Banks actually all of the regex functions in re module compile and cache the patterns. Therefore there is absolutely no efficiency gain using compile and then match than just directly calling re.match. All of these functions call the internal function _compile (including re.compile) which does the caching to a python dictionary.Carbide
If you're in python 3.8+ you can use the walrus operator if you need to access the match object: if (match := re.match(r"hello([0-9]+)", string)): print(match[1])Nikianikita
F
58

Please try the following:

import re

name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi","1A4V"]

# Match names.
for element in name:
     m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)  # raw string so \d is not treated as a string escape
     if m:
        print(m.groups())
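Note that a repeated group such as ([A-Z][0-9]+)+ keeps only its last repetition in groups(). If you want every letter/number pair, one option is to validate the whole string with re.fullmatch and then collect the pieces with re.findall. This is only a sketch, and the helper name split_pairs is made up for illustration.

```python
import re

def split_pairs(s):
    # Hypothetical helper: validate the whole string first, then extract each pair.
    if re.fullmatch(r"(?:[A-Z][0-9]+)+", s) is None:
        return None
    return re.findall(r"[A-Z][0-9]+", s)

# A repeated capturing group only remembers its last repetition.
print(re.fullmatch(r"([A-Z][0-9]+)+", "C1N200J1").groups())  # ('J1',)

print(split_pairs("C1N200J1"))  # ['C1', 'N200', 'J1']
print(split_pairs("A10B"))      # None
```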
Fill answered 12/2, 2015 at 10:36 Comment(2)
This is the only case that returns the match which is required for getting groups. Best answer in my opinion.Baur
best answer among other answersLennalennard
M
31
import re
import sys

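# Raw string for \d; the trailing $ rejects leftovers such as the "B" in "A10B"
prog = re.compile(r'([A-Z]\d+)+$')

# Read lines from standard input until EOF
for line in sys.stdin:
    # match() starts at the beginning; $ still matches before the trailing newline
    if prog.match(line):
        print('matched')
    else:
        print('not matched')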
Milliliter answered 26/9, 2012 at 5:31 Comment(0)
G
24

Careful! (Maybe you want to check if FULL string matches)

The re.match(...) will not work if you want to match the full string.

For example;

  • re.match("[a-z]+", "abcdef") ✅ will give a match
  • But! re.match("[a-z]+", "abcdef 12345") ✅ will also give a match because there is a part in string which matches (maybe you don't want that when you're checking if the entire string is valid or not)

Solution

Use re.fullmatch(...). This will only match if the entire string matches the pattern.

if re.fullmatch("[a-z]+", my_string):
    print("Yes")
Example
  • re.fullmatch("[a-z]+", "abcdef") ✅ Yes
  • re.fullmatch("[a-z]+", "abcdef 12345") ❌ No

One liner: bool(re.fullmatch("[a-z]+", my_string))
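Applied to the pattern from the question, a sketch might look like this. The function name is_valid_code is made up for illustration. With fullmatch, no ^ or $ anchors are needed in the pattern.

```python
import re

def is_valid_code(s):
    # Hypothetical helper: the entire string must consist of letter+digits pairs.
    return re.fullmatch(r"([A-Z][0-9]+)+", s) is not None

print(is_valid_code("B10L1"))   # True
print(is_valid_code("A10B"))    # False
print(is_valid_code("A1B2\n"))  # False: the trailing newline is part of the string
```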

Germany answered 18/8, 2022 at 7:19 Comment(1)
Thanks. This is exactly what I wanted to see for "check if string matches"Superelevation
T
22

As stated in the comments, all the answers using re.match implicitly match only at the start of the string. re.search is needed if you want to find a match anywhere in the string.

import re

pattern = re.compile("([A-Z][0-9]+)+")

# finds match anywhere in string
bool(re.search(pattern, 'aA1A1'))  # True

# matches on start of string, even though pattern does not have ^ constraint
bool(re.match(pattern, 'aA1A1'))  # False

If you need the full string to exactly match the regex, see @Ali Sajjad's answer using re.fullmatch

Credit: @LondonRob and @conradkleinespel in the comments.

Tinner answered 21/9, 2021 at 14:46 Comment(0)
M
13

regular expressions make this easy ...

[A-Z] will match exactly one character between A and Z

\d+ will match one or more digits

() group things (and also return things... but for now just think of them grouping)

+ repeats the preceding item one or more times
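Putting those pieces together, here is a small sketch that uses re.VERBOSE so each part of the pattern can carry its own comment. The variable name pattern is just for illustration.

```python
import re

pattern = re.compile(r"""
    (           # start a group
      [A-Z]     # exactly one uppercase letter
      \d+       # one or more digits
    )+          # repeat the whole group one or more times
""", re.VERBOSE)

print(bool(pattern.fullmatch("C1N200J1")))  # True
print(bool(pattern.fullmatch("AB400")))     # False
```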

Mandolin answered 26/9, 2012 at 5:35 Comment(0)
W
12
  
import re

ab = re.compile("^([A-Z][0-9]+)+$")  # [0-9]+ rather than [0-9]{1}, so multi-digit numbers such as B10 match
ab.match(string)
  


I believe that should work for an uppercase, number pattern.

Whatley answered 26/9, 2012 at 6:10 Comment(0)
P
1

Just want to point out that for strings without line breaks (\n), one could also use the anchors \A and \Z for the beginning and end of a string, respectively.

import re
pat = re.compile(r'\A([A-Z][0-9]+)+\Z')
pat.match('A1B2')   # match
pat.match('A1B2a')  # no match

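This makes a difference if the string contains multiple lines: \A and \Z always refer to the start and end of the whole string, whereas ^ and $ can also match at the start and end of each line when re.M is set, which is what you need if you want to match the pattern in later lines of a string.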
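One related detail worth knowing: in Python, $ also matches just before a single trailing newline, whereas \Z matches only at the very end of the string. A short sketch (the names dollar and strict are only for illustration):

```python
import re

dollar = re.compile(r'^([A-Z][0-9]+)+$')
strict = re.compile(r'\A([A-Z][0-9]+)+\Z')

print(bool(dollar.match('A1B2\n')))  # True: $ tolerates one trailing newline
print(bool(strict.match('A1B2\n')))  # False: \Z does not
print(bool(strict.match('A1B2')))    # True
```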

match vs search vs fullmatch. Which is appropriate?

re.search is the most general of the three. As others have said, re.match() checks for a match only at the beginning of the string. re.search() can mimic that by prepending \A to the pattern. re.fullmatch(), on the other hand, checks whether the entire string is a match, which re.search() can also mimic by prepending \A and appending \Z to the pattern (wrap the pattern in a group first if it contains a top-level |). The example below may help illustrate this point.

# prepending \A to pattern makes `search` behave similar to `match`
s1 = '1B10L1'
pattern1 = r'([A-Z][0-9]+)+'

re.match(pattern1, s1)             # no match
re.search(pattern1, s1)            # match
re.search(fr"\A{pattern1}", s1)    # no match     <--- with \A prepended (behaves same as re.match)


# `match` only checks at the beginning and doesn't care about the end of the string
s2 = 'B10L1a'

re.match(pattern1, s2)             # match
re.match(fr"{pattern1}\Z", s2)     # no match     <--- with \Z appended (behaves same as re.fullmatch)
re.search(fr"\A{pattern1}\Z", s2)  # no match     <--- have the pattern between \A and \Z to mimic fullmatch
re.fullmatch(pattern1, s2)         # no match

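If the string contains multiple lines and the re.M (re.MULTILINE) flag is set, then this relation breaks down: ^ and $ now also match at line boundaries, but fullmatch must still consume the entire string (line breaks included) and match still starts only at the very beginning, so it can only match in the first line.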

# by flagging multiline, `match` and `search` match the pattern in the first line
s3 = 'B10\nL1'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s3, re.M)       # match
re.search(pattern2, s3, re.M)      # match
re.fullmatch(pattern2, s3, re.M)   # no match

# the pattern is in the second line but this will not be matched by `match`
s4 = 'a\nB10'
pattern2 = r'^([A-Z][0-9]+)+$'

re.match(pattern2, s4, re.M)              # no match
re.search(pattern2, s4, re.M)             # match
re.search(r'\A([A-Z][0-9]+)+', s4, re.M)  # no match  <--- with `\A` instead of `^` it mimics `match`

To compile or not to compile

If you need to search a pattern in a single string, there's no need to compile it, since re.search, re.match etc. all call the internal _compile function anyway. However, if you need to search a pattern in multiple strings, compiling it first can make a noticeable difference performance-wise, because it skips that per-call lookup. For the example in the OP, the timing below shows pre-compiling the pattern and searching through the list to be over 2 times faster than not compiling first.

from timeit import timeit
setup2 = "import re; lst = ['A1B2', 'B10L1', 'C1N200J1K1', 'a1B2', 'A10B', 'AB400']"
setup1 = setup2 + "; pat = re.compile(r'^([A-Z][0-9]+)+$')"

for _ in range(3):
    t1 = timeit("list(map(pat.match, lst))", setup1)         # pre-compiled
    t2 = timeit("[re.match(r'^([A-Z][0-9]+)+$', x) for x in lst]", setup2)
    print(t2 / t1)
    
# 2.083788080189313
# 2.448126223007598
# 2.43617482049811
Pl answered 3/6, 2023 at 2:47 Comment(2)
I find the beginning of your answer confusing. According to this answer, \A mimics match regardless of whether there are line breaks or not. Neither \A nor match care about line breaks. If you want to match the pattern in later lines, you need to use ^ together with the re.M option.Abject
@DonaldDuck I merely meant to say that when \A and \Z are used together, it can match a pattern in a single line string, so the pattern in that example matches 'A1B2' but doesn't match 'A1\nB2'.Pl
M
0

Ali Sajjad's answer should be the default, i.e. fullmatch to avoid false positives.

However, it's also important to know that "yes, it's a match" always means checking that the result is not None.

The two possibilities are therefore:

if re.fullmatch("[a-z]+", my_string) is not None:

or, as in Ali's answer:

if bool(re.fullmatch("[a-z]+", my_string)):

To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL): # or "is_full_match", as desired
    # Compare with "is not None" rather than "!= None", the idiomatic None check
    return re.fullmatch(pattern, string, flags) is not None

Those 2 flags are (usually) the most helpful default flags settings in my experience, rather than "0".

In practice, of course, you may need to examine the Match object delivered by re.fullmatch. But for cases where you just need to find whether there's a match...

Mallissa answered 9/3, 2023 at 8:19 Comment(0)
