I am using ply and have noticed a strange discrepancy between the match object stored in t.lexer.lexmatch and a match object produced by a pattern compiled directly with the re module. The group numbers seem to be off by one.
I have defined a simple lexer to illustrate the behavior I am seeing:
import ply.lex as lex

tokens = ('CHAR',)

def t_CHAR(t):
    r'.'
    t.value = t.lexer.lexmatch
    return t

l = lex.lex()
(I get a warning about a missing t_error rule, but I ignore it for now.) Now I feed some input into the lexer and get a token:
l.input('hello')
l.token()
This returns LexToken(CHAR,<_sre.SRE_Match object at 0x100fb1eb8>,1,0). I want to look at the match object:
m = _.value
So now I look at the groups:
m.group()
=> 'h'
as I expect.
m.group(0)
=> 'h'
as I expect.
m.group(1)
=> 'h'
, yet I would expect no such group to exist.
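One way to investigate where that extra group comes from is to interrogate the match object itself: every match carries its compiled pattern via .re, so the pattern ply actually compiled can be printed directly in the session above. A minimal sketch using a plain re match (the (?P<CHAR>.) pattern here is my stand-in, not necessarily what ply builds):

```python
import re

# Stand-in for the ply match; the named group mimics how a tokenizer
# might tag a rule. Running the same prints on the real ply match
# would reveal the pattern it was produced by.
m = re.compile(r'(?P<CHAR>.)').match('hello')

print(m.re.pattern)   # the regex that produced this match
print(m.re.groups)    # number of capturing groups in that regex
print(m.groupdict())  # any named groups and what they captured
```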
Compare this to creating such a regular expression manually:
import re
p = re.compile(r'.')
m2 = p.match('hello')
This gives different groups:
m2.group()
=> 'h'
as I expect.
m2.group(0)
=> 'h'
as I expect.
m2.group(1)
=> IndexError: no such group
as I expect.
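My current guess (an assumption on my part, not verified against the ply source) is that ply merges every token rule into one master regex and wraps each rule in its own capturing group, so that r'.' effectively becomes something like r'(?P<CHAR>.)'. That wrapping group would account for the off-by-one; it reproduces the exact discrepancy with plain re:

```python
import re

plain = re.compile(r'.')              # the pattern as I wrote it
wrapped = re.compile(r'(?P<CHAR>.)')  # hypothetical master-regex wrapping

# The plain pattern has no capturing groups...
assert plain.match('hello').groups() == ()

# ...while the wrapped one gains a group 1, just like the ply match.
assert wrapped.match('hello').group(1) == 'h'

try:
    plain.match('hello').group(1)
except IndexError:
    print('no group 1 on the plain pattern, as expected')
```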
Does anyone know why this discrepancy exists?