How to prevent table regeneration in PLY
P

5

9

I am using PLY in a command line application that I package as a Python egg to be installed via pip. Every time I run my script from the command line, I see the following message:

"Generating LALR tables"

Additionally, parser.out and parsetab.py files are written to the directory from which the script is invoked. Is there some way to ship these files with the application so that it does not regenerate the tables each and every time?

Polygyny answered 28/9, 2012 at 17:47 Comment(1)
I don't know if it's possible to do this with ply. You can change the filename or directory to use, or choose to always regenerate the tables (which may not be an option, depending on the size of your grammar). But shipping the generated parser tables... I don't know. :/ – Alabama
P
0

What I ultimately wound up doing was turning off optimization. I was going through the PLY 3.4 source and I found this little nugget in the lexer code:

# If in optimize mode, we write the lextab
if lextab and optimize:
    lexobj.writetab(lextab,outputdir)

return lexobj

By changing the code that builds the lexer and parser to:

self.lexer = lex.lex(module=self, optimize=False, debug=False, **kwargs)

and

self.parser = yacc.yacc(module=self, optimize=False, debug=False, **kwargs)

I avoided all file write-outs. Debug mode writes the .out files into the directory, and the Python files are the result of the optimize flag.

While this works for the time being, I cannot say I am entirely happy with this approach. Presumably, some way to keep optimization on while keeping the working directory clean would be a superior solution and would perform better. If someone else has a better approach, I am more than open to it.

Polygyny answered 7/10, 2012 at 3:55 Comment(0)
T
12

Use:

yacc.yacc(debug=0, write_tables=0)
Turnkey answered 13/2, 2013 at 22:28 Comment(1)
Note: you should really use debug=False, write_tables=False. The 0 just happens to work, but True/False is more explicit, and the documentation also uses bool rather than integers. – Periotic
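
For anyone who wants this spelled out, here is a minimal sketch of the settings from this answer, written with booleans as the comment suggests. It assumes a PLY 3.x release where yacc.yacc() accepts debug and write_tables and lex.lex() accepts debug and optimize. The names YACC_NO_FILES, LEX_NO_FILES and build_lexer_and_parser are made up for illustration, and grammar_module stands for whatever module or object holds your tokens, t_* and p_* rules.

```python
# Keyword arguments taken from the answer above, using booleans.
# debug=False skips parser.out; write_tables=False skips parsetab.py.
YACC_NO_FILES = {"debug": False, "write_tables": False}

# Per the lexer excerpt quoted elsewhere in this thread, lextab is only
# written in optimize mode, so leaving optimize off avoids that file.
LEX_NO_FILES = {"debug": False, "optimize": False}


def build_lexer_and_parser(grammar_module):
    """Build a (lexer, parser) pair without writing table files.

    grammar_module is a hypothetical module/object defining the usual
    PLY names (tokens, t_* rules, p_* rules).
    """
    # Imported lazily so this sketch can be read without PLY installed.
    from ply import lex, yacc

    lexer = lex.lex(module=grammar_module, **LEX_NO_FILES)
    parser = yacc.yacc(module=grammar_module, **YACC_NO_FILES)
    return lexer, parser
```

Note that with write_tables=False the tables are still computed on every start; they are just never saved, so this trades startup time for a clean directory.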
I
3

You want to use optimized mode, by calling lex as:

lexer = lex.lex(optimize=1)

It's worth emphasising (from the same link):

On subsequent executions, lextab.py will simply be imported to build the lexer. This approach substantially improves the startup time of the lexer and it works in Python's optimized mode.

When running in optimized mode, it is important to note that lex disables most error checking. Thus, this is really only recommended if you're sure everything is working correctly and you're ready to start releasing production code.

Since this is production code, this sounds like exactly what you want.

In looking into this issue, I came across the miscellaneous Yacc notes:

Since the generation of the LALR tables is relatively expensive, previously generated tables are cached and reused if possible. The decision to regenerate the tables is determined by taking an MD5 checksum of all grammar rules and precedence rules. Only in the event of a mismatch are the tables regenerated.

And looking deeper into the yacc function inside yacc.py, we see that optimize mode ignores this mismatch in the following snippet:

if optimize or (read_signature == signature):
    try:
        lr.bind_callables(pinfo.pdict)
        parser = LRParser(lr,pinfo.error_func)
        parse = parser.parse
        return parser

where signature is compared to the checksum stored in parsetab.py (as _lr_signature).
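
To make the caching rule concrete, here is a small illustration of the idea. It is not PLY's actual code: the function names grammar_signature and can_reuse_tables are invented for this sketch, and the exact bytes PLY feeds into its MD5 checksum are not reproduced. It only shows the decision described above: reuse the cached tables when the signatures match, or unconditionally when optimize is on.

```python
import hashlib


def grammar_signature(rules, precedence=()):
    """Return an MD5 hex digest over precedence and grammar rules.

    Illustrative only: PLY computes its own signature internally.
    """
    digest = hashlib.md5()
    for item in list(precedence) + list(rules):
        digest.update(repr(item).encode("utf-8"))
    return digest.hexdigest()


def can_reuse_tables(stored_signature, current_signature, optimize=False):
    """Mirror the quoted condition: optimize or signatures match."""
    return optimize or stored_signature == current_signature


old = grammar_signature(["expr : NUMBER"])
new = grammar_signature(["expr : NUMBER", "expr : expr PLUS NUMBER"])

print(can_reuse_tables(old, old))                 # unchanged grammar
print(can_reuse_tables(old, new))                 # grammar changed
print(can_reuse_tables(old, new, optimize=True))  # mismatch ignored
```

The last call is the case to watch out for: in optimize mode a stale parsetab.py would be used even after the grammar has changed.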

Iquique answered 4/10, 2012 at 16:45 Comment(1)
Is there something you need to do besides setting optimize=1 in the parser and lexer? When I do that, it still regenerates the files and still prints the message. – Polygyny
D
0

This is an old question, but I ran into a similar problem with PLY when I tried to use the outputdir keyword argument of yacc to place the generated parser tables in specific directories within my project: it would place them there, but regenerate them every time regardless. I found this patch on GitHub which solved the regeneration issue with no noticeable ill effects. Basically, it modifies the read_table method on the yacc class to take an extra parameter, the outputdir, and to search that directory before regenerating. To make that work, the sole call site of read_table (in the yacc function) also needs to be modified to pass the outputdir keyword argument.
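
A possible alternative to patching, offered as an assumption rather than something I tested against a particular PLY release: if the unpatched read_table finds the table module with a plain import by name, then putting the table directory on sys.path should let it pick up a shipped parsetab.py. The helper name shipped_table_kwargs is hypothetical; outputdir, tabmodule and debug are the yacc keyword arguments already discussed in this thread.

```python
import os
import sys


def shipped_table_kwargs(package_file, tabmodule="parsetab"):
    """Return yacc keyword arguments pointing at the package directory.

    package_file is normally __file__ of the module holding the grammar.
    The directory is also appended to sys.path so that an import of
    tabmodule by name can succeed (assumption: that is how the tables
    are looked up when no patch is applied).
    """
    table_dir = os.path.dirname(os.path.abspath(package_file))
    if table_dir not in sys.path:
        sys.path.append(table_dir)
    return {"outputdir": table_dir, "tabmodule": tabmodule, "debug": False}


# Intended use inside the grammar module (requires PLY):
#     parser = yacc.yacc(**shipped_table_kwargs(__file__))
```

Appending rather than prepending keeps the change to the import path as unobtrusive as possible, at the cost that another module named parsetab earlier on the path would win.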

Duda answered 14/4, 2015 at 21:13 Comment(0)
T
-1

Apparently, there are arguments for this in ply.yacc:

def yacc(method='LALR', debug=yaccdebug, module=None, tabmodule=tab_module, start=None, 
     check_recursion=1, optimize=0, write_tables=1, debugfile=debug_file,outputdir='',
     debuglog=None, errorlog = None, picklefile=None):

So, you just pass a different errorlog and debuglog (objects whose debug() and similar methods do not print to stdout/stderr), and you specify a fixed outputdir. That's all you need to do.

UPDATE: I just checked and this is the correct setting:

yacc.yacc(
    debug=False,              # do not create parser.out
    outputdir=r"c:\temp\aaa"  # instruct to place parsetab here
)

Actually, you need to use an outputdir that already contains parsetab.py. This eliminates not just the message: your program will also stop writing out parsetab.py and will simply use the existing one.

Tailpipe answered 3/10, 2012 at 14:25 Comment(4)
The OP is not asking simply to hide the message, but to avoid regenerating the temporary parser-table files every time. – Alabama
I guess that when you use a fixed dir and you already place the files there, then they won't be regenerated. But I'll try and let you know the results. – Tailpipe
debug=False does indeed suppress the error message, but does not suppress the file generation. When I use it in conjunction with the deployment location (where the files already exist) it still attempts to overwrite them. – Polygyny
If you want to disable file generation completely, then override the LRGeneratedTable.write_table method (e.g. make it a method that does nothing). Alternatively, you can use the picklefile parameter (passing a file-like object) and override LRGeneratedTable.pickle_table to do nothing. Unfortunately, the usage of the LRGeneratedTable class is hard-wired into yacc.yacc, so you cannot subclass it. :-( – Tailpipe
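
A rough sketch of the override described in the last comment, assuming the PLY version in use exposes LRGeneratedTable with write_table and pickle_table methods as the comment states. The helper name disable_table_writes is made up, and the yacc module is passed in as an argument so the sketch does not depend on a particular import path.

```python
def disable_table_writes(yacc_module):
    """Replace the table writers on LRGeneratedTable with no-ops.

    yacc_module is expected to be ply.yacc (or anything exposing an
    LRGeneratedTable class). Returns the original methods so the patch
    can be undone.
    """
    table_cls = yacc_module.LRGeneratedTable
    originals = (table_cls.write_table, table_cls.pickle_table)

    def _skip_write(self, *args, **kwargs):
        # Accept any arguments, since the exact signatures may differ
        # between releases, and deliberately write nothing.
        return None

    table_cls.write_table = _skip_write
    table_cls.pickle_table = _skip_write
    return originals


# Intended use, before the parser is built (requires PLY):
#     import ply.yacc as yacc
#     disable_table_writes(yacc)
#     parser = yacc.yacc(debug=False)
```

This is a monkeypatch on library internals, so it can break silently if a later release renames or removes those methods.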
