RegEx with variable data in it - ply.lex

I'm using the Python module ply.lex to write a lexer. I've specified some of my tokens with regular expressions, but now I'm stuck. I have a list of keywords that should each become a token. data is a string of about 1000 keywords, all of which should be recognised as one kind of keyword, for example _Function1, _UDFType2 and so on. The words in the list are separated by whitespace, that's it. I just want the lexer to recognise the words in this list and return a token of type KEYWORD for each of them.

data = 'Keyword1 Keyword2 Keyword3 Keyword4'
def t_KEYWORD(t):
    # ... r'\$' + data ??
    return t

text = '''
Some test data


even more

$var = 2231




$[]Test this 2.31 + / &
'''

autoit = lex.lex()
autoit.input(text)
while True:
    tok = autoit.token()
    if not tok: break
    print(tok)

So I was trying to add the variable to that regex, but it didn't work. I always get: No regular expression defined for rule 't_KEYWORD'.

Thank you in advance! John

Cosby answered 31/8, 2012 at 14:50 Comment(8)
What code did you use to add it to the regex? (Show the code that actually raises the error.) – Susurrate
I still don't follow. That line is commented out. Can you show an example that actually throws that error? – Susurrate
Well, just use my code from above or here: data = 'Keyword1 Keyword2 Keyword3 Keyword4' def t_KEYWORD(t): r'\$' + data return t – Cosby
That code doesn't throw an exception. Where is the line that actually uses t_KEYWORD in the regex? – Susurrate
That's all I get: ERROR: /Users/John/Lexer/lexer.py:21: No regular expression defined for rule 't_KEYWORD' Traceback (most recent call last): File "/Users/John/Lexer/lexer.py", line 77, in <module> autoit = lex.lex() File "/Library/Frameworks/Python.framework/Versions/3.0/lib/python3.0/site-packages/ply-3.4-py3.0.egg/ply/lex.py", line 894, in lex raise SyntaxError("Can't build lexer") SyntaxError: Can't build lexer – Cosby
Where is the line autoit = lex.lex() that is throwing the traceback? It's not in the code that's provided. The code you provide just defines a function and never actually does anything with regular expressions or ply.lex. – Susurrate
Okay, let's back up a second. ply already has a decorator, TOKEN, to do some of the docstring magic people are suggesting. See here, for example. But I'm not sure if you want to construct 4 separate tokens and have each of them recognized separately (which this wouldn't do anyway), or if you have one keyword with four variations, or what. Could you edit your post to be a little more specific? – Damico
Okay, edited my post above. Hope that makes it more understandable. – Cosby

As @DSM suggests you can use the TOKEN decorator. The regular expression to find tokens like cat or dog is 'cat|dog' (that is, words separated by '|' rather than a space). So try:

from ply.lex import TOKEN
data = data.split() #make data a list of keywords

@TOKEN('|'.join(data))
def t_KEYWORD(t):
    return t
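
A caveat with a plain '|'.join(data): Python's re alternation takes the first alternative that matches, not the longest, so Keyword1 would win over Keyword10, and any regex metacharacters inside a keyword would be interpreted rather than matched literally. The following is a minimal sketch using only the standard re module. build_keyword_regex is a hypothetical helper name (not part of ply), and Keyword10 is added to the sample data purely to show the prefix problem. The resulting string is assumed to be passed to @TOKEN(...) in place of '|'.join(data).

```python
import re

def build_keyword_regex(keywords):
    """Return one alternation pattern matching any of the given keywords."""
    # Longest keywords first, so 'Keyword10' is tried before 'Keyword1'.
    ordered = sorted(set(keywords), key=len, reverse=True)
    # re.escape neutralises characters such as '$' or '.' inside a keyword.
    return '|'.join(re.escape(word) for word in ordered)

# Sample data; 'Keyword10' is an illustrative addition, not from the question.
data = 'Keyword1 Keyword10 _Function1 _UDFType2'
keyword_regex = build_keyword_regex(data.split())

# Quick check with the re module alone (no ply needed).
# ply compiles token patterns with re.VERBOSE, so the same flag is used here.
match = re.match(keyword_regex, 'Keyword10 = 1', re.VERBOSE)
print(match.group(0))
```

If keywords must not match inside longer identifiers, a common alternative is to match a general identifier pattern and look the matched text up in a set of keywords inside the token function.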
Yurev answered 31/8, 2012 at 17:3 Comment(0)

ply.lex uses the function's docstring as the regular expression. Note that the order in which you define token functions determines their precedence, which is usually important to manage.

The docstring at the top cannot be an expression, so you need to do this token definition by token definition.

We can test this in the interpreter:

def f():
    "this is " + "my help"  # an expression, so not a docstring :(
f.__doc__  # is None
f.__doc__ = "this is " + "my help"  # now it is! (func_doc is the Python 2-only alias)

Hence this ought to work:

def t_KEYWORD(token):
    return token
t_KEYWORD.__doc__ = r'REGULAR EXPRESSION HERE'  # can be any string expression
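
One thing worth checking if this still reports the same error: func_doc is the Python 2 spelling of the attribute, and the traceback in the question shows Python 3.0. On Python 3, assigning to func_doc just creates a new, unrelated attribute and leaves __doc__ untouched, which may explain why the rule still appears to have no regular expression. A small check, assuming Python 3 (t_EXAMPLE is a placeholder name, not from the question):

```python
def t_EXAMPLE(t):
    return t

# On Python 3 this only adds an arbitrary attribute named 'func_doc'.
t_EXAMPLE.func_doc = r'\d+'
doc_after_func_doc = t_EXAMPLE.__doc__   # still None on Python 3

# __doc__ is the spelling that works on both Python 2 and Python 3.
t_EXAMPLE.__doc__ = r'\d+'
doc_after_dunder = t_EXAMPLE.__doc__

print(doc_after_func_doc, doc_after_dunder)
```

The assignment has to happen before lex.lex() is called, since that is when the rules are collected.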
Yurev answered 31/8, 2012 at 14:53 Comment(5)
Could you define an empty function and then set f.__doc__ = 'my' + 'regex'? – Baggs
Hm, I have about 1000 keywords; it's impossible to do this token definition by token definition. – Cosby
@JohnSmith updated with a possible fix! This should work, please give it a try. – Yurev
I've tried: def t_KEYWORD(t): return t t_KEYWORD.func_doc=r'\d+' but still the same error. – Cosby
@JohnSmith it seems ply.lex reads the docstring immediately, possibly even at runtime, but I asked this question. – Yurev

Not sure if this works with ply, but the docstring is the __doc__ attribute of a function so if you write a decorator that takes a string expression and sets that to the __doc__ attribute of the function ply might use that.
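
A minimal sketch of such a decorator in plain Python. with_regex is a made-up name for illustration, and whether ply picks the value up is the assumption stated above. ply's own TOKEN decorator, mentioned in the comments on the question, appears to serve the same purpose, so in practice that is probably the better choice.

```python
def with_regex(pattern):
    """Return a decorator that stores `pattern` as the function's docstring."""
    def decorate(func):
        func.__doc__ = pattern
        return func
    return decorate

keywords = 'Keyword1 Keyword2 Keyword3 Keyword4'.split()

# The decorator argument is an ordinary expression, evaluated at definition time.
@with_regex('|'.join(keywords))
def t_KEYWORD(t):
    return t

print(t_KEYWORD.__doc__)
```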

News answered 31/8, 2012 at 15:16 Comment(0)
