Can we define a non context-free grammar with ANTLR?
Asked Answered
S

1

4

I'm pretty new to ANTLR4 and now I'm trying to undertand which kind of grammars we might define with it.

As far as I got, there're two kind of rules in ANTLR: parser rules (lower case words) and lexer rules (upper-case words). Example:

grammar Test;

init: prog(','prog)*;

prog: A
     | prog
     ;

A: [a-z]+;

Form the grammar production rule standpoint I would say that parser rules are NON-TERMINAL symbols which can be replaced with a sequence of tokens defined by a lexer rules.

So, it's perfectly clear that the grammar is context-free by the definition. The alpahbet of the language generated by the grammar consists of all words making up from lower-case latin letter.

Question: Can we define a non context-free grammar using ANTLR4?

Skyjack answered 25/2, 2016 at 14:55 Comment(0)
C
6

YES. (cough).

It is my understanding that you can add code to the rules. Arbitrary code can test arbitrary things, so the answer is "yes". In general, I don't think you can do this well with ANTLR, but this is pretty practical for lots of interesting special cases (e.g., accept all digit strings except those that are prime numbers).

NO.

I think if you stick to the the syntax specification allowed by ANTLR, the answer is "no". In fact, there are context-free grammars that you can "specify" with ANTLR that it cannot process correctly, which is true of most parser generators. (For ANTLR, this includes grammars with indirect left recursion, ambiguity, arbitrary lookahead, etc.) We even call most of these parser generators by the names of their "limitations", e.g., LL(1), LALR(k), etc.

Which ones can do full context free?

A few parser generators can handle full-, context free grammars. Earley and CYK parsers come to mind, but they aren't very fast so people tend to avoid using them. GLR parsers can do this (we use this in our tools because it really aids writing grammars for real languages [see my bio] but there are some grammars that make them pretty slow; you can mostly avoid these. Apparently GLL parsing schemes exist and are also full context free; I'd expect them to have performance troubles with some obtuse grammars too, but also to be pretty usable in practice.

The only parser generator I've heard of which can do a variety of context-sensitive grammars is MetaS. I've never used it, but the theory behind it is pretty impressive. The claim is that it can do arbitrary context-sensitive grammars; it will run into extremely high costs for arbitrarily nasty grammars, but that's actually not an objection in practice.

Coronary answered 25/2, 2016 at 15:18 Comment(3)
I have a gut feeling that no matter what universal parser you invent for CS grammars, it will be possible to construct a grammar that will make it run in exponential time. But it's just a hunch.Agnesagnese
Actually, Earley/CYK/GLR (and I suspect GLL) parsers run O(n^3) worst case, with considerably higher constant factors than a really sleek LALR(1) engine. That's pretty bad with a few hundred tokens. In practice, you can pretty much avoid the n^3 behavior on long stretches; where the local grammar is ugly its an acceptable price because people tend not to write huge sequences which are hard to parse. After all, they have to read it, too :-}Coronary
@biziclop's comment refers to context-sensitive grammars, which cannot be parsed with Earley/GLR/CYK. So the O(n^3) complexity does not apply.Sublimity

© 2022 - 2024 — McMap. All rights reserved.