With re.findall, you can convert the output into an iterator with iter() and call next() on it to get the first result. next() is particularly useful for this task because a default value (e.g. '') can be passed to it; the default is returned if the iterator is empty, i.e. if there are no matches.
import re
next(iter(re.findall(r'\d+', 'aa33bbb44')), '') # '33'
next(iter(re.findall(r'\d+', 'aazzzbbb')), '') # ''
Given that, next() can be used with re.finditer for the job as well; re.finditer already returns an iterator, so no iter() call is needed.
next(re.finditer(r'\d+', 'aa33bbb44'), [''])[0] # '33'
next(re.finditer(r'\d+', 'aazzzbbb'), [''])[0] # ''
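The [''] default may look odd at first. It works because a match object supports indexing (m[0] is the same as m.group(0)), so a one-element list can stand in for it when there is no match. A minimal sketch to illustrate; the variable names are chosen here for illustration only:

```python
import re

# re.finditer returns an iterator of match objects.
first = next(re.finditer(r'\d+', 'aa33bbb44'), [''])
print(first[0])        # 33
print(first.group(0))  # 33, the same value as first[0]

# With no match the iterator is empty, so next() returns the fallback list,
# and [0] picks the empty string out of it.
fallback = next(re.finditer(r'\d+', 'aazzzbbb'), [''])
print(fallback == [''])  # True
```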
You can also use the walrus operator (Python 3.8+) with re.search for a one-liner.
m[0] if (m:=re.search(r'\d+', 'aa33bbb44')) else '' # '33'
m[0] if (m:=re.search(r'\d+', 'aazzzbbb')) else '' # ''
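If the one-liner is needed in several places, it could be wrapped in a small function. This is only a sketch: first_match is a hypothetical helper name, not something the re module provides, and the default argument is an assumption about what callers would want.

```python
import re

def first_match(pattern, text, default=''):
    """Return the first match of pattern in text, or default if there is none.

    first_match is a hypothetical helper, not part of the re module.
    """
    m = re.search(pattern, text)
    return m[0] if m else default

print(first_match(r'\d+', 'aa33bbb44'))       # 33
print(first_match(r'\d+', 'aazzzbbb', None))  # None
```

Because it is built on re.search, the scan stops at the first match, just like the walrus version.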
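For this specific task, the argument against re.findall is performance, and for large strings the gap is indeed huge. If there are multiple matches, re.findall is much, much slower than re.search or re.finditer (milliseconds versus microseconds in the timings below), because it builds the full list of matches before the first one can be read (see note 1). However, if there are no matches, re.search with the walrus and re.finditer are among the fastest: they are on par with next() over re.findall and roughly 2.5 to 3 times faster than the '\d+|$' patterns (see note 2).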
1 Timings for a string with 1 million characters and 100k matches.
text = 'aabbbccc11'*100_000
%timeit m[0] if (m:=re.search('\d+', text)) else ''
# 1.94 µs ± 192 ns per loop (mean ± std. dev. of 10 runs, 100,000 loops each)
%timeit next(re.finditer('\d+', text), [''])[0]
# 2.38 µs ± 122 ns per loop (mean ± std. dev. of 10 runs, 100,000 loops each)
%timeit next(iter(re.findall('\d+', text)), '')
# 59 ms ± 8.65 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
%timeit re.search('\d+|$', text)[0]
# 2.32 µs ± 300 ns per loop (mean ± std. dev. of 10 runs, 100,000 loops each)
%timeit re.findall('\d+|$', text)[0]
# 82.7 ms ± 1.64 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
2 Timings for a string with 1 million characters and no matches.
text = 'aabbbcccdd'*100_000
%timeit m[0] if (m:=re.search('\d+', text)) else ''
# 26.3 ms ± 662 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
%timeit next(re.finditer('\d+', text), [''])[0]
# 26 ms ± 195 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
%timeit next(iter(re.findall('\d+', text)), '')
# 26.2 ms ± 615 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
%timeit re.search('\d+|$', text)[0]
# 72.9 ms ± 14.1 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
%timeit re.findall('\d+|$', text)[0]
# 67.8 ms ± 2.38 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
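The %timeit lines above only run in IPython or Jupyter. As a rough sketch, a comparison of the three main approaches could be run from a plain script with the standard timeit module. The function names and the number of repetitions are assumptions made for this example, and the pattern is precompiled here, unlike in the snippets above, so the numbers will not match the ones quoted.

```python
import re
import timeit

pattern = re.compile(r'\d+')  # precompiled for this sketch

def with_search(text):
    return m[0] if (m := pattern.search(text)) else ''

def with_finditer(text):
    return next(pattern.finditer(text), [''])[0]

def with_findall(text):
    return next(iter(pattern.findall(text)), '')

texts = {
    'many matches': 'aabbbccc11' * 100_000,
    'no matches': 'aabbbcccdd' * 100_000,
}

for label, text in texts.items():
    for func in (with_search, with_finditer, with_findall):
        # Average over a few calls; raise `number` for steadier figures.
        seconds = timeit.timeit(lambda: func(text), number=5) / 5
        print(f'{label:13} {func.__name__:14} {seconds * 1e3:10.4f} ms per call')
```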
len(re.findAll)==0
check instead. – Motionless