Slow ANTLR4-generated parser in Python, but fast in Java
I am trying to convert an ANTLR3 grammar to an ANTLR4 grammar, in order to use it with the antlr4-python2-runtime. This grammar is a C/C++ fuzzy parser.

After converting it (basically removing tree operators and semantic/syntactic predicates), I generated the Python2 files using:

java -jar antlr4.5-complete.jar -Dlanguage=Python2 CPPGrammar.g4

The code is generated without any errors, so I imported it into my Python project (I'm using PyCharm) to run some tests:

import sys, time
from antlr4 import *
from parser.CPPGrammarLexer import CPPGrammarLexer
from parser.CPPGrammarParser import CPPGrammarParser

currenttimemillis = lambda: int(round(time.time() * 1000))

def is_string(object):
    return isinstance(object,str)

def parsecommandstringline(argv):
    if(2!=len(argv)):
        raise IndexError("Invalid args size.")
    if(is_string(argv[1])):
        return True
    else:
        raise TypeError("Argument must be str type.")

def doparsing(argv):
    if parsecommandstringline(argv):
        print("Arguments: OK - {0}".format(argv[1]))
        input = FileStream(argv[1])
        lexer = CPPGrammarLexer(input)
        stream = CommonTokenStream(lexer)
        parser = CPPGrammarParser(stream)
        print("*** Parser: START ***")
        start = currenttimemillis()
        tree = parser.code()
        print("*** Parser: END *** - {0} ms.".format(currenttimemillis() - start))
        return tree

def main(argv):
    tree = doparsing(argv)
    pass

if __name__ == '__main__':
    main(sys.argv)

The problem is that parsing is very slow. A file of ~200 lines takes more than 5 minutes to complete, while parsing the same file in ANTLRWorks takes only 1-2 seconds. Looking at the ANTLRWorks parse tree, I noticed that the expr rule and all of its descendants are invoked very often, and I think I need to simplify or change these rules to make the parser faster (expr parse tree image).

Is my assumption correct, or did I make a mistake while converting the grammar? What can be done to make parsing as fast as it is in ANTLRWorks?

UPDATE: I exported the same grammar to Java and it only took 795 ms to complete the parsing. The problem seems more related to the Python implementation than to the grammar itself. Is there anything that can be done to speed up Python parsing?
I've read here that Python can be 20-30 times slower than Java, but in my case Python is ~400 times slower!
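
One mitigation that is not mentioned in the thread, but is commonly suggested for slow ANTLR parsers, is two-stage parsing: try the cheaper SLL prediction mode with a bail-out error strategy first, and fall back to full LL only if that fails. A rough sketch, assuming the antlr4 Python runtime exposes `_interp` and `_errHandler` as in the versions distributed on PyPI (verify against your installed version):

# Sketch only: two-stage parsing with SLL prediction first, full LL as fallback.
from antlr4 import FileStream, CommonTokenStream, PredictionMode, BailErrorStrategy
from parser.CPPGrammarLexer import CPPGrammarLexer
from parser.CPPGrammarParser import CPPGrammarParser

def fast_parse(path):
    lexer = CPPGrammarLexer(FileStream(path))
    parser = CPPGrammarParser(CommonTokenStream(lexer))
    parser._interp.predictionMode = PredictionMode.SLL  # cheaper prediction algorithm
    parser._errHandler = BailErrorStrategy()            # stop at the first syntax error
    try:
        return parser.code()
    except Exception:
        # SLL could not handle this input: rebuild the pipeline and retry with full LL.
        lexer = CPPGrammarLexer(FileStream(path))
        parser = CPPGrammarParser(CommonTokenStream(lexer))
        parser._interp.predictionMode = PredictionMode.LL
        return parser.code()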

Ingeborg answered 16/7, 2015 at 13:24 Comment(3)
You have to profile rule execution times to have any certainty. It could be the heavy use of negated sets, literals in the parser, or something else that appears completely benign. – Dishonorable
@Dishonorable thank you for commenting. I'm not an ANTLR expert, but it doesn't seem to me that my grammar or the original one has many negated sets or literals in the parser. I think it's an issue in antlr4-python2-runtime, because it only takes 1 second to parse the same file in Java. Python can be slower, but 400 times slower is too much to believe the problem is on my side. – Ingeborg
Still, the best way to identify the part of the runtime that is not performant is to profile the individual rules and pinpoint the particular rule aspect that is slow to process. The problem is on your side only in the sense that your grammar is doing something to trigger the slowdown. Almost undoubtedly a change to the runtime will be required; the hard part is figuring out what to fix. The key, luckily, is somewhere in your grammar. Do what you can to isolate the cause and create an issue on the ANTLR GitHub repo. That is the fastest way to get it fixed. – Dishonorable
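
The thread never shows how to do the profiling the commenter suggests. A minimal sketch using only the standard library (where doparsing is the function from the question above) could look like this; the cumulative view usually shows whether the time is spent in the runtime's prediction machinery:

# Minimal profiling sketch, standard library only.
import cProfile, pstats, sys

profiler = cProfile.Profile()
profiler.enable()
doparsing(sys.argv)          # the function defined in the question
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)  # top 30 entries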

I confirm that the Python 2 and Python 3 runtimes have performance issues. With a few patches, I got a 10x speedup on the Python 3 runtime (~5 seconds down to ~400 ms): https://github.com/antlr/antlr4/pull/1010

Triviality answered 5/10, 2015 at 6:29 Comment(2)
The pull request has been accepted: use the latest antlr4 Python runtime, or wait for the 4.5.3 release on PyPI… – Triviality
Please post an answer to my question (I am giving away 50 bounty points): #53653823 – Unskilled

I faced a similar problem, so I decided to bump this old post with a possible solution. My grammar ran instantly with the TestRig but was incredibly slow on Python 3.

In my case the fault was the non-greedy token that I was using to match one-line comments (double slash in C/C++, '%' in my case):

TKCOMM : '%' ~[\r\n]* -> skip ;

This is somewhat backed by a post from sharwell in this discussion: https://github.com/antlr/antlr4/issues/658

When performance is a concern, avoid using non-greedy operators, especially in parser rules.

To test this scenario you may want to remove non-greedy rules/tokens from your grammar.

Foulness answered 2/10, 2015 at 14:1 Comment(2)
And what did you use to get a better comment rule? – Lisk
@Lisk Right now I'm pre-processing and removing comments in a separate routine outside ANTLR. I'm using this as a workaround until the performance patch cited in this thread reaches pip. – Foulness
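
The comment above doesn't show the pre-processing routine itself. A rough sketch of that kind of workaround (the '%' comment pattern and the file name are placeholders, not from the post) might be:

# Hypothetical pre-processing step: strip '%' line comments before ANTLR ever
# sees the text, so the lexer no longer needs the problematic comment rule.
import re
from antlr4 import InputStream

def strip_line_comments(path):
    with open(path) as f:
        text = f.read()
    # Remove everything from '%' to the end of the line, keeping the newline
    # so line numbers in error messages stay correct.
    return re.sub(r"%[^\r\n]*", "", text)

stream = InputStream(strip_line_comments("example.src"))  # placeholder file name
# ...then feed `stream` to the generated lexer instead of a FileStream.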

Posting here since it may be useful to people that find this thread.

Since this was posted, there have been several performance improvements to ANTLR's Python target. That said, the Python interpreter will be intrinsically slower than Java or other compiled targets.

I've put together a Python accelerator code generator for ANTLR's Python3 target. It uses the ANTLR C++ target as a Python extension: lexing and parsing are done exclusively in C++, and then an auto-generated visitor re-builds the resulting parse tree in Python. Initial tests show a 5x-25x speedup depending on the grammar and input, and I have a few ideas on how to improve it further.

Here is the code-generator tool: https://github.com/amykyta3/speedy-antlr-tool

And here is a fully-functional example: https://github.com/amykyta3/speedy-antlr-example

Hope this is useful to those who prefer using Antlr in Python!

Eniwetok answered 9/1, 2020 at 15:46 Comment(2)
I upvoted this post a long time ago but only found time to get it up and running today. I put my hopes in this tool and it paid off big time! Parsing time dropped from 7 minutes to 3 minutes for 8000 parsed files thanks to your tool. I'll see if I can push it even further. I did find a couple of bugs for which I'll be posting pull requests (hopefully soon). Cheers! – Selfish
Thanks!! Glad to hear it is working well for you! – Eniwetok

I use ANTLR with the Python3 target these days, and a file with ~500 lines takes less than 20 seconds to parse. So switching to the Python3 target might help.
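
For reference (not part of the original answer), switching targets only means regenerating with -Dlanguage=Python3 and installing the matching runtime from PyPI; the tool and runtime versions must match, and 4.9.2 below is just an example:

java -jar antlr-4.9.2-complete.jar -Dlanguage=Python3 CPPGrammar.g4
pip install antlr4-python3-runtime==4.9.2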

Cosmo answered 18/12, 2020 at 9:16 Comment(0)
