Regex with lookbehind not working using re.match

The following Python code:

import re

line="http://google.com"
procLine = re.match(r'(?<=http).*', line)
if procLine.group() == "":
    print(line + ": did not match regex")
else:
    print(procLine.group())

does not match successfully, and outputs the following error:

Traceback (most recent call last):
  File "C:/Users/myUser/Documents/myScript.py", line 5, in <module>
    if procLine.group() == "":
AttributeError: 'NoneType' object has no attribute 'group'

When I replace the regex with just .* it works fine, which suggests the problem is in the regex. However, when I test the same regex and string on https://regex101.com/ with the Python flavor, it appears to match fine.

Any ideas?

Kinross answered 30/9, 2017 at 10:20 Comment(1)
You might want to use search instead, check documentation: "Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function" – Veloz

If you convert your lookbehind to a non-capturing group, this should work:

In [7]: re.match(r'(?:http://)(.*)', line)
Out[7]: <_sre.SRE_Match object; span=(0, 17), match='http://google.com'>

In [8]: _.group(1)
Out[8]: 'google.com'

The lookbehind does not work because, as Rawing mentioned, re.match only attempts a match at the start of the string. At position 0 there are no preceding characters, so a lookbehind that requires http before the match can never succeed there.
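
To make the difference concrete, here is a minimal sketch using the pattern and string from the question. The variable names `pattern` and `found` are only illustrative; nothing else is assumed beyond the standard `re` module.

```python
import re

line = "http://google.com"
pattern = r"(?<=http).*"

# re.match only tries position 0. Nothing precedes that position,
# so the lookbehind for "http" cannot succeed and the result is None.
print(re.match(pattern, line))        # None

# re.search tries each position in turn. At index 4 the four preceding
# characters are "http", so the lookbehind succeeds there.
found = re.search(pattern, line)
print(found.span(), found.group())    # (4, 17) ://google.com
```

Note that with the question's original pattern the match still includes `://`, which is why the examples in this answer put `http://` inside the lookbehind.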


If you insist on using a lookbehind, switch to re.search:

In [10]: re.search(r'(?<=http://).*', line)
Out[10]: <_sre.SRE_Match object; span=(7, 17), match='google.com'>

In [11]: _.group()
Out[11]: 'google.com'
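
The AttributeError in the question comes from a separate issue: a failed match returns None rather than a match object with an empty group, so comparing `.group()` to "" never gets a chance to run. A possible way to restructure the original script is sketched below; the helper name `strip_scheme` and the second test string are hypothetical and only there for illustration.

```python
import re

def strip_scheme(line):
    """Return the text after 'http://', or None if that prefix never occurs."""
    found = re.search(r"(?<=http://).*", line)
    # re.search returns None when nothing matches, so test the match
    # object itself before calling .group() on it.
    if found is None:
        return None
    return found.group()

for line in ("http://google.com", "ftp://example.org"):
    result = strip_scheme(line)
    if result is None:
        print(line + ": did not match regex")
    else:
        print(result)
```

The same `is None` check applies equally to re.match; it is independent of whether a lookbehind is used.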
Knotty answered 30/9, 2017 at 10:25 Comment(7)
Thanks, this did work, and I will use this as a workaround, but I will hold off marking it as correct for now to see if someone knows why lookbehinds seem to be failing me. – Kinross
@LostCrotchet It's because match applies the regex at the start of the string. A lookbehind at the start of the string will never work. – Tabret
@Rawing It was my suspicion, but I didn't want to write that without being sure. Let me know if it's okay to add that in, or else you can create an answer. – Knotty
@cᴏʟᴅsᴘᴇᴇᴅ Go ahead. I think whenever someone posts a comment instead of an answer, they're implicitly giving you permission to s̶t̶e̶a̶l̶ use it :) – Tabret
@LostCrotchet Pinging you to see this post. – Knotty
do you know how I can use lookbehinds then? – Kinross
@LostCrotchet Sure, use re.search instead ;-) Edited answer. – Knotty
