How to use regular expressions to do a reverse search?

For example:
My string is: 123456789 nn nn oo nn nn mlm nn203.
My target is: nn.

I want to search the string from the end towards the beginning and return the first match and its position.
In this example, the result is nn, starting at [-5] and ending at [-3].
I wrote a simple function to do this, but how can I use regular expressions to do it?
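For reference, the negative positions above are ordinary Python slice indexes counted from the end of the string. In a 34-character string, [-5] and [-3] correspond to 29 and 31. Here is a quick check, assuming the trailing period in the question is sentence punctuation and not part of the string:

```python
s = "123456789 nn nn oo nn nn mlm nn203"

# Negative indexes count back from the end: s[-k] is s[len(s) - k].
start, end = -5, -3
print(s[start:end])                      # nn
print(len(s) + start, len(s) + end)      # 29 31
print(s[len(s) + start:len(s) + end])    # nn
```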

Tenerife answered 12/5, 2013 at 17:13 Comment(1)
Hm, if the search term was 'na', would you like it to match forward or reverse in the string (i.e. match man or name)? – Anjanette

For the string itself, just do a findall and use the last one:

import re

st='123456 nn1 nn2 nn3 nn4 mlm nn5 mlm'
 
print(re.findall(r'(nn\d+)',st)[-1])

Prints nn5

You can also do the same thing using finditer, which makes it easier to get the relevant indexes:

print([(m.group(),m.start(),m.end()) for m in re.finditer(r'(nn\d+)',st)][-1])

Prints ('nn5', 27, 30)
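If you only need the last match, you can avoid building a list of every match. A minimal sketch: iterate over finditer and keep only the most recent match object. This also makes the no-match case explicit.

```python
import re

st = '123456 nn1 nn2 nn3 nn4 mlm nn5 mlm'

# Walk the matches lazily and keep only the most recent one,
# instead of materialising a list of every match.
last = None
for last in re.finditer(r'nn\d+', st):
    pass

if last is None:
    print('no match')
else:
    print(last.group(), last.start(), last.end())   # nn5 27 30
```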


If you have a lot of matches and you only want the last, sometimes it makes sense to simply reverse the string and pattern:

m = re.search(r'(\d+nn)', st[::-1])
# map the span in the reversed string back onto the original string
start, end = len(st) - m.end(1), len(st) - m.start(1)
print(st[start:end])
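The same trick can be wrapped in a small helper that also handles the no-match case. This is a minimal sketch. `last_match_via_reverse` is a hypothetical name, and the helper assumes you can write the pattern in reversed form yourself (for `nn\d+` that is `\d+nn`):

```python
import re

def last_match_via_reverse(reversed_pattern, s):
    r"""Return (text, start, end) for the last match in s, or None.

    Hypothetical helper: reversed_pattern must describe the *reversed*
    text you want, e.g. r'\d+nn' to find the last occurrence of r'nn\d+'.
    """
    m = re.search(reversed_pattern, s[::-1])
    if m is None:
        return None
    # Map the span in the reversed string back onto the original string.
    start, end = len(s) - m.end(), len(s) - m.start()
    return s[start:end], start, end

st = '123456 nn1 nn2 nn3 nn4 mlm nn5 mlm'
print(last_match_via_reverse(r'\d+nn', st))   # ('nn5', 27, 30)
print(last_match_via_reverse(r'\d+zz', st))   # None
```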

Or, modify your pattern into something that only the last match could possibly satisfy:

# since fixed width, you can use a lookbehind:
m=re.search(r'(...(?<=nn\d)(?!.*nn\d))',st)
if m: print(m.group(1))
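The lookbehind above relies on the target having a fixed width (nn plus exactly one digit). A possible variant drops the lookbehind and keeps only the negative lookahead, so it also works for variable-width targets such as `nn\d+`. The sample string here is made up for illustration:

```python
import re

s = 'nn10 x nn203 y'

# No lookbehind needed: match nn followed by digits, then assert
# (negative lookahead) that no other "nn<digit>" appears later.
m = re.search(r'nn\d+(?!.*nn\d)', s)
if m:
    print(m.group(), m.start(), m.end())   # nn203 7 12
```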

Or, take advantage of the greediness of .* which will always return the last of multiple matches:

# .* will skip to the last match of nn\d
m=re.search(r'.*(nn\d)', st)
if m: print(m.group(1))

Each of these prints nn5.
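The greedy approach also gives you the position directly through the group's span. Here is a short sketch on the string from the question. It passes `re.DOTALL` as an added assumption, so that `.*` can also run past newlines in multi-line input:

```python
import re

s = "123456789 nn nn oo nn nn mlm nn203"

# The greedy .* runs to the end of the string and then backtracks,
# so group 1 ends up on the last "nn".
m = re.search(r".*(nn)", s, re.DOTALL)
if m:
    start, end = m.span(1)
    print(start, end)                    # 29 31
    print(start - len(s), end - len(s))  # -5 -3
```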

Lamarckism answered 12/5, 2013 at 17:35 Comment(3)
Here's something similar to your code that many might consider more readable: pastebin.com/J7SsXjsS (Note that search does exist after the loop is finished.) – Queridas
In that link I gave, you'll get an error if you don't get any results, though (so be sure to handle it). – Queridas
reversing the string was the idea I needed – Erythritol

First, if you're not looking for a regular expression, str.rfind is a lot easier to get right.
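For a plain substring like nn, a minimal rfind sketch that also converts the result to the negative positions asked for in the question looks like this:

```python
s = "123456789 nn nn oo nn nn mlm nn203"
target = "nn"

start = s.rfind(target)          # highest index where target starts, or -1
if start == -1:
    print("not found")
else:
    end = start + len(target)
    print(start - len(s), end - len(s))   # -5 -3
```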

You can use a regular expression with a negative lookahead (see the documentation of re):

import re
s = "123456789 nn nn oo nn nn mlm nn203"
# "nn" that is not followed anywhere later by another "nn"
match = re.search(r"(nn)(?!.*nn.*)", s)

# for your negative numbers:
print((match.start() - len(s), match.end() - len(s)))
# (-5, -3)
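One caveat: `.` does not match a newline by default, so on multi-line input the lookahead only checks the rest of the current line. The following sketch, using an illustrative sample string, shows the difference `re.DOTALL` makes:

```python
import re

s = "nn1 foo\nnn2 bar"

# Without DOTALL, ".*" in the lookahead stops at the newline, so the
# first "nn" looks like the last one on its line and is returned.
default_start = re.search(r"(nn)(?!.*nn)", s).start()
print(default_start)   # 0

# With DOTALL, "." also matches "\n" and the lookahead sees the whole string.
dotall_start = re.search(r"(nn)(?!.*nn)", s, re.DOTALL).start()
print(dotall_start)    # 8
```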
Philologian answered 12/5, 2013 at 17:24 Comment(1)
I'd recommend using the re.DOTALL flag here, as . (the dot character) does not match newline characters by default. – Merrygoround

Idea:

  • search for the reversed pattern (irrelevant in your case, since nn reads the same backwards) in the reversed string
  • convert the resulting indexes to negative numbers and swap start and end

Example:

>>> import re
>>> s = "123456789 nn nn oo nn nn mlm nn203"
>>> m = re.search("(nn)", s[::-1])
>>> -m.end(), -m.start()
(-5, -3)
Selle answered 12/5, 2013 at 18:12 Comment(0)

In Python the answer is rfind: it works not only on strings, but on bytes too!
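For example, here is the question's data as a bytes object. The companion method rindex behaves the same as rfind but raises an error instead of returning -1:

```python
data = b"123456789 nn nn oo nn nn mlm nn203"

print(data.rfind(b"nn"))    # 29
print(data.rfind(b"zz"))    # -1 when there is no match

# rindex behaves the same but raises ValueError instead of returning -1.
try:
    data.rindex(b"zz")
except ValueError:
    print("not found")
```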

Regular expressions also have two special anchors: ^ and $.

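import re

tx = "hello .... ok"

# ^ anchors the match at the beginning; the lazy .*? stops at the first "o"
print(re.search(r"^.*?o", tx).group())    # hello

# $ anchors the match at the end; [^o]* rules out any later "o"
print(re.search(r"o[^o]*$", tx).group())  # ok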

Hope it helps!

Mord answered 15/5 at 18:46 Comment(0)
