ANTLR4: Matching all input alternatives exaclty once
Asked Answered
M

2

17

How can I make a rule to match all of its alternatives only once in any order, in ANTLR?

i.e.

rule: ('example\r\n' | 'example2\r\n') nextRule

I would like 'example' and 'example2' to match only once before moving onto the next rule.

Should match inputs of:

example
example2

or

example2
example

but not inputs of:

example
example
example2
Midyear answered 18/2, 2013 at 10:26 Comment(0)
S
20

How can I make a rule to match all of its alternatives only once in any order, in ANTLR?

"All of its alternatives only once" is simply rule: altA altB altC .... That's the easy part. Asking rule to accept all alternatives altA, altB, altC, etc. in any arrangement means accommodating every arrangement. That's easy enough to handle manually for small numbers (rule: (altA altB | altB altA);). But there's no automatic shortcut that I'm aware of to handling all the permutations for you automatically.

Here are some approaches in case there isn't a built-in way and assuming you can't relax your requirements. Caveats: I don't know the full scope of your problem; I don't know your grammar; I don't know why you want what you're asking for; I don't know what class of solution you prefer, other than you'd probably like it easier than any of these options.


First, you could bite the bullet and produce all the permutations of matches yourself, either by hand or by running a permutation generator. Then ANTLR would have what you want in a manner that it understands. It's crude but efficient: it's straight ANTLR syntax so there's no external code involved as there is in the options below.

For example, assume you have a rule field that processes input like "public static final x", with all three modifiers expected but in no particular order. The permutations would look like so:

field : modifiers ID EOF;

modifiers
    : PUBLIC STATIC FINAL //ABC
    | PUBLIC FINAL STATIC //ACB
    | STATIC PUBLIC FINAL //BAC
    | STATIC FINAL PUBLIC //BCA
    | FINAL PUBLIC STATIC //CAB
    | FINAL STATIC PUBLIC //CBA
    ;

That's the end of it. No external code, no predicates, no nothing.


Second, you could use semantic predicates in the grammar to ensure that all matches are provided and matched without duplicates. There are various ways of writing the predicates themselves, but it boils down to tracking what matches have been made (to prevent duplicates) and then testing whether the rule has matched all the parts that it expects. Here is a basic example, following the same requirements as the previous one:

field 
    locals [java.util.HashSet<String> names = new java.util.HashSet<String>();]
    : modifiers ID EOF;

modifiers
    //Ensure that the full number of modifiers have been provided
    : {$field::names.size() < 3}? modifier modifiers
    | {$field::names.size() == 3}? //match nothing once we have (any) three modifiers
    ;

modifier
    //Ensure that no duplicates have been provided
    : {!$field::names.contains("public")}? PUBLIC {$field::names.add("public");}
    | {!$field::names.contains("static")}? STATIC {$field::names.add("static");}
    | {!$field::names.contains("final")}? FINAL {$field::names.add("final");}
    ;

Here rule field tracks the modifier names in local variable names. Rule modifiers calls rule modifier until names contains three values. Rule modifier matches any name that doesn't have a corresponding key in names. Note that the values are manually added to names. They could be any arbitrary value as long as modifier's alternatives add the same value to both sides of the token it's matching.

My implementation is a little crude because the modifiers end up nesting in the produced parse tree (since modifiers contains one modifier and one modifiers that contains one modifier and one modifiers that...), but I hope you get the idea.


Third, you could leave the poor parser alone and test for completeness in the calling code. This can be done during parsing using a parser listener or done after parsing using the ParserRuleContext object produced by the parser. This breaks the problem into two parts: let the parser solve "any X, Y, Z in any order" and let the calling code solve "all and only just X, Y, Z."

Here is an example using the listener approach:

//partial grammar

field : modifier* ID EOF; //accept any modifiers in any order

modifier  
    : PUBLIC
    | STATIC
    | FINAL
    ;

 

//snippet of calling code
//initialize lexer and parser

parser.addParseListener(new MyGrammarBaseListener() {
    @Override
    public void exitField(FieldContext ctx) {
        // The field rule has finished. Let's verify that no modifiers
        // were duplicated.

        //TODO check for duplicates, ensure all modifiers are specified.
        //TODO log errors accordingly.

    }
});

//Call the parser.
parser.field();

The grammar is kept clean. Modifiers can appear in the input arbitrarily before ID, in any number and in any order. The calling code performs the tests by whatever means it chooses, logging whatever errors it wants.


Here is an uberexample that pulls together the three options I mentioned, to give a clearer idea of what I'm talking about.

Modifiers.g

grammar Modifiers;

//Hard-coded version : all the permutations are specified //
permutationField : permutationModifiers ID EOF;

permutationModifiers
    : PUBLIC STATIC FINAL //ABC
    | PUBLIC FINAL STATIC //ACB
    | STATIC PUBLIC FINAL //BAC
    | STATIC FINAL PUBLIC //BCA
    | FINAL PUBLIC STATIC //CAB
    | FINAL STATIC PUBLIC //CBA
    ;

// Predicate version : use semantic predicates to prevent duplicates and ensure all the modifiers are provided //

predicateField 
    locals [java.util.HashSet<String> names = new java.util.HashSet<String>();]
    : predicateModifiers ID EOF;

predicateModifiers
    //Ensure that the full number of modifiers have been provided
    : {$predicateField::names.size() < 3}? predicateModifier predicateModifiers
    | {$predicateField::names.size() == 3}? //match nothing once we have (any) three modifiers
    ;

predicateModifier
    //Ensure that no duplicates have been provided
    : {!$predicateField::names.contains("public")}? PUBLIC {$predicateField::names.add("public");}
    | {!$predicateField::names.contains("static")}? STATIC {$predicateField::names.add("static");}
    | {!$predicateField::names.contains("final")}? FINAL {$predicateField::names.add("final");}
    ;

//Listener version : test everything when the parser calls the listener //

listenerField : listenerModifier* ID EOF;

listenerModifier  
    : PUBLIC
    | STATIC
    | FINAL
    ;


PUBLIC : 'public';
STATIC : 'static';
FINAL  : 'final';
FOO    : 'foo';
ID     : [a-zA-Z]+;
WS     : [ \r\n\t]+ -> skip; 

ModifiersTest.java

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.misc.Nullable;

public class ModifiersTest {

    public static void main(String[] args) {

        run("public static final x", "ok");
        run("final static public x", "ok");
        run("static public static final x", "too many modifiers");
        run("static x", "missing modifiers");
        run("final final x", "missing & duplicated modifiers");
    }

    private static void run(String code, String title) {
        System.out.printf("%n---%n**Input : %s**%n%n\t%s%n%n", title, code);

        System.out.println("**Permutation Output**\n");
        runPermutationTest(code);
        System.out.println();

        System.out.println("**Predicate Output**\n");
        runPredicateTest(code);
        System.out.println();

        System.out.println("**Listener Output**\n");
        runListenerTest(code);
        System.out.println();

    }
    private static void runPermutationTest(String code) {
        ModifiersParser parser = createParser(code);

        parser.permutationField();
        System.out.println("\t(done)");
    }

    private static void runPredicateTest(String code) {
        ModifiersParser parser = createParser(code);

        parser.predicateField();
        System.out.println("\t(done)");
    }

    private static void runListenerTest(String code) {
        ModifiersParser parser = createParser(code);

        parser.addParseListener(new ModifiersBaseListener() {
            @Override
            public void exitListenerField(ModifiersParser.ListenerFieldContext ctx) {
                // The field rule has finished. Let's verify that no modifiers
                // were duplicated.

                HashSet<String> uniqueNames = new HashSet<String>();
                ArrayList<String> allNames = new ArrayList<String>();
                HashSet<String> expectedNames = new HashSet<String>();
                expectedNames.add("public");
                expectedNames.add("static");
                expectedNames.add("final");

                if (ctx.listenerModifier() != null && !ctx.listenerModifier().isEmpty()) {
                    List<ModifiersParser.ListenerModifierContext> modifiers = ctx.listenerModifier();

                    // Collect all the modifier names in a set.
                    for (ModifiersParser.ListenerModifierContext modifier : modifiers) {
                        uniqueNames.add(modifier.getText());
                        allNames.add(modifier.getText());
                    }
                }

                // Is the number of unique modifiers less than the number of
                // all given modifiers? If so, then there must be duplicates.
                if (uniqueNames.size() < allNames.size()) {
                    ArrayList<String> names = new ArrayList<String>(allNames);
                    for (String name : uniqueNames){
                        names.remove(name);
                    }
                    System.out.println("\tDetected duplicate modifiers : " + names);
                } else {
                    System.out.println("\t(No duplicate modifiers detected)");
                }

                //Are we missing any expected modifiers?
                if (!uniqueNames.containsAll(expectedNames)) {
                    ArrayList<String> names = new ArrayList<String>(expectedNames);
                    names.removeAll(uniqueNames);
                    System.out.println("\tDetected missing modifiers : " + names);
                } else {
                    System.out.println("\t(No missing modifiers detected)");
                }
            }
        });

        parser.listenerField();

        System.out.println("\t(done)");

    }

    private static ModifiersParser createParser(String code) {
        ANTLRInputStream input = new ANTLRInputStream(code);

        ModifiersLexer lexer = new ModifiersLexer(input);

        ModifiersParser parser = new ModifiersParser(new CommonTokenStream(lexer));

        BaseErrorListener errorListener = createErrorListener();

        lexer.addErrorListener(errorListener);
        parser.addErrorListener(errorListener);
        return parser;
    }

    private static BaseErrorListener createErrorListener() {
        BaseErrorListener errorListener = new BaseErrorListener() {

            @Override
            public void syntaxError(Recognizer<?, ?> recognizer, @Nullable Object offendingSymbol, int line,
                    int charPositionInLine, String msg, @Nullable RecognitionException e) {
                //Print the syntax error 
                System.out.printf("\t%s at (%d, %d)%n", msg, line, charPositionInLine);
            }
        };
        return errorListener;
    }
}

Test Scenarios (output from the code above)


Input : ok

public static final x

Permutation Output

(done)

Predicate Output

(done)

Listener Output

(No duplicate modifiers detected)
(No missing modifiers detected)
(done)

Input : ok

final static public x

Permutation Output

(done)

Predicate Output

(done)

Listener Output

(No duplicate modifiers detected)
(No missing modifiers detected)
(done)

Input : too many modifiers

static public static final x

Permutation Output

extraneous input 'static' expecting 'final' at (1, 14)
(done)

Predicate Output

no viable alternative at input 'static' at (1, 14)
(done)

Listener Output

Detected duplicate modifiers : [static]
(No missing modifiers detected)
(done)

Input : missing modifiers

static x

Permutation Output

no viable alternative at input 'staticx' at (1, 7)
(done)

Predicate Output

no viable alternative at input 'x' at (1, 7)
(done)

Listener Output

(No duplicate modifiers detected)
Detected missing modifiers : [final, public]
(done)

Input : missing & duplicated modifiers

final final x

Permutation Output

no viable alternative at input 'finalfinal' at (1, 6)
(done)

Predicate Output

no viable alternative at input 'final' at (1, 6)
(done)

Listener Output

Detected duplicate modifiers : [final]
Detected missing modifiers : [static, public]
(done)
Stein answered 18/2, 2013 at 22:2 Comment(3)
Thank you for your very complete and thorough answer! I am attempting to write a custom configuration file parser. My conclusion of the methods above are: 1. Permutation method -- is good if there are small number of permutations it seems to get to debug when there are alot of them. 2. In-parser validation, this method is nice and compact and is very nice for small grammars. 3. Separation of parser with code -- I liked this method the most, since it is clear as to who does what and where the error would be. It is also easy to scale. Great answer, thank you very much!Midyear
forgot to mention, my criteria would be easy to debug and scale.Midyear
@user1932405 You're welcome. I think your summary is very good and I agree that managing the verification outside of the parser is probably the best way to go for various reasons, like being able to provide more meaningful detail in the error messages than you would get from an ANTLR syntax error. There are countless ways to implement the grammar and listener -- my answer and 280Z28's show just two ways -- so you'll have plenty of flexibility for your particular needs.Stein
S
13

With ANTLR 4, I would actually prefer to use something like the following, where the input is expected to contain exactly 1 each of a, b, and c.

items : (a | b | c)*;

Then in a listener, I would use code like the following:

@Override
public void enterItems(ItemsContext ctx) {
    if (ctx.a().size() != 1) {
        // report error
    } else if (ctx.b().size() != 1) {
        // report error
    } else ...
}
Suggestive answered 18/2, 2013 at 22:36 Comment(1)
Thanks for your reply, this seems like a method which I will go forMidyear

© 2022 - 2024 — McMap. All rights reserved.