ANTLR V4 + Java8 Grammar -> OutOfMemoryError

I'm trying to use ANTLR V4 with the publicly available Java 8 grammar - https://github.com/antlr/grammars-v4/blob/master/java8/Java8.g4

I generated the parser classes and tried to parse the Java 8 JRE sources, but at java.text.SimpleDateFormat.java it crashes with:

java.lang.OutOfMemoryError: GC overhead limit exceeded

It also crashes when I parse that single file on its own.

Can this be solved somehow? Obviously ANTLR V4 can't handle files with more than 2000 LOC? Is that a correct assumption?

What I've done so far:

  • Increased the memory assigned to the JVM in several steps from 256 MB up to 4 GB. The error then changes to

    java.lang.OutOfMemoryError: Java heap space

  • Checked that there is no syntactical problem with the input file:
    first I removed the first half of the file -> parsing seems okay,
    then I undid that change and removed the second half of the file instead -> parsing seems okay
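
When changing the heap size, it can help to confirm that the setting actually reaches the JVM that runs the parser (an IDE or build tool may launch it with different options). A minimal sketch, using only the standard java.lang.Runtime API; the class name HeapCheck and its helper methods are made up for illustration:

```java
public class HeapCheck {
    /** Maximum heap the JVM will attempt to use (roughly the -Xmx value), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently occupied by objects (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same options as the parser, e.g. java -Xmx4096m HeapCheck
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

Calling usedHeapMiB() before and after the parse gives a rough idea of how much memory the parser holds on to; the numbers are approximate because they depend on when the garbage collector last ran.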

Ary answered 2/10, 2015 at 14:55 Comment(9)
… and you increased the Java memory size, e.g. using -Xmx? - Killough
Hi, Petesh! I've done that before - I've updated my question... Thx! - Ary
Comment out parts of the file successively to find the problematic code. - Devaluation
Hi Joop Eggen... I've already REMOVED content from that file... There's no problematic code in SimpleDateFormat.java, because with fewer than ~1900-2000 LOC it succeeds. - Ary
"Obviously ANTLR V4 can't handle files with more than 2000 LOC? Is that a correct assumption?" Assumptions are never correct. Did you try on other large source files as well? I'm sure most of the collection classes such as HashMap will be perfectly valid test subjects. - Matos
Hi @Gimby! No, because I don't have a grammar as complex as the Java 8 grammar with a test input file for it at hand... I'll figure it out this weekend -> thanks for your input! - Ary
Well, I just ran it on my machine using java -Xmx4096m -cp .:antlr-4.5.1-complete.jar org.antlr.v4.gui.TestRig Java8 compilationUnit SimpleDateFormat.java and it finished without issue. With 2048 it exceeded the GC limit. It could just be that the harness is a pig. - Killough
@Petesh: How long did the parsing take to finish? - Ary
time of the command yielded 42.01 real 163.79 user 5.18 sys; OS X, Java 1.8.0_51. This is a 3-year-old MacBook Pro, so nothing really special. - Killough

It looks like the grammar in that repository is based on one I wrote. To perform well, the grammar relies on functionality that is only available in my "optimized" fork of ANTLR 4. In addition to using that release, you'll need to do the following two things to maximize performance:

  1. Use the two-stage parsing strategy. Assuming your start rule is called compilationUnit, it might look like the following:

    CompilationUnitContext compilationUnit;
    try {
      // Stage 1: High-speed parsing for correct documents
    
      parser.setErrorHandler(new BailErrorStrategy());
      parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
      parser.getInterpreter().tail_call_preserves_sll = false;
      compilationUnit = parser.compilationUnit();
    } catch (ParseCancellationException e) {
      // Stage 2: High-accuracy fallback parsing for complex and/or erroneous documents
    
      // TODO: reset your input stream
      parser.setErrorHandler(new DefaultErrorStrategy());
      parser.getInterpreter().setPredictionMode(PredictionMode.LL);
      parser.getInterpreter().tail_call_preserves_sll = false;
      parser.getInterpreter().enable_global_context_dfa = true;
      compilationUnit = parser.compilationUnit();
    }
    
  2. Enable the global context DFA (I included this in the previous code block so you can't miss it)

    parser.getInterpreter().enable_global_context_dfa = true;
    
Tyrosine answered 3/10, 2015 at 2:53 Comment(6)
Hi, Sam! Thank you for your answer - it totally hits the nail on the head :) - Ary
Hi @SamHarwell, I also have a file that causes the GC memory bailout above. Is there any chance that the functionality for this fix will make it into the main ANTLR distribution? Or would/do you distribute a .jar for your fork? (As I'm not a native of the Java world, I don't have the toolset to build your fork.) - Bracey
@user138304 is this "optimized" fork integrated into ANTLR 4.6? - Bracey
@Bracey See here for more info: github.com/tunnelvisionlabs/antlr4/blob/master/doc/… - Tyrosine
@Bracey My fork of ANTLR is available through Maven Central using the group ID com.tunnelvisionlabs. - Tyrosine
Hi @SamHarwell, I'm having the same issue, but with JavaScript as the target language instead of Java. Is there any way to do the two-stage parsing in JavaScript? - Hyperion
