ANTLR4 lexer rule with @init block

I have this lexer rule defined in my ANTLR v3 grammar file; it matches text in double quotes. I need to convert it to ANTLR v4. The ANTLR compiler throws the error 'syntax error: mismatched input '@' expecting COLON while matching a lexer rule' (on the @init line). Can a lexer rule contain an @init block? How should this be rewritten?

DOUBLE_QUOTED_CHARACTERS
@init 
{
   int doubleQuoteMark = input.mark(); 
   int semiColonPos = -1;
}
: ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
{
    RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
    reportError(re);
}
| '"' (options {greedy=false;}: ~('"'))+ 
  ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
{ 
    if (semiColonPos >= 0)
    {
        input.rewind(doubleQuoteMark);

        RecognitionException re = new RecognitionException("Missing closing double quote!", input);
        reportError(re);
        input.consume();            
    }
    else
    {
        setText(getText().substring(1, getText().length()-1));
    }
}
; 

Sample data:

  1. " " -> throws error "Illegal empty quotes!";
  2. "asd -> throws error "Missing closing double quote!"
  3. "text" -> returns text (valid input, content of "...")
Belldame answered 16/10, 2014 at 13:28 Comment(2)
Looking at your rule, it is not clear to me what you intend to match with DOUBLE_QUOTED_CHARACTERS. Could you give some valid input examples? - Fortune
I edited my question and added some examples. - Belldame

I think the right way to do this is to drop the @init block and put its code in an action at the start of the rule body:

DOUBLE_QUOTED_CHARACTERS
:
{
   int doubleQuoteMark = input.mark();
   int semiColonPos = -1;
}
(
    ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
    {
        RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
        reportError(re);
    }
    | '"' (options {greedy=false;}: ~('"'))+
      ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
    {
        if (semiColonPos >= 0)
        {
            input.rewind(doubleQuoteMark);

            RecognitionException re = new RecognitionException("Missing closing double quote!", input);
            reportError(re);
            input.consume();
        }
        else
        {
            setText(getText().substring(1, getText().length()-1));
        }
    }
)
;

The rule above has other errors as well, such as the WS ... => ... predicate, but I am not correcting them in this answer, to keep things simple. I took the hint from here

In case that link moves or becomes invalid, I am quoting the text as is:

Lexer actions can appear anywhere as of 4.2, not just at the end of the outermost alternative. The lexer executes the actions at the appropriate input position, according to the placement of the action within the rule. To execute a single action for a rule that has multiple alternatives, you can enclose the alts in parentheses and put the action afterwards:

END : ('endif'|'end') {System.out.println("found an end");} ;

The action conforms to the syntax of the target language. ANTLR copies the action’s contents into the generated code verbatim; there is no translation of expressions like $x.y as there is in parser actions.

Only actions within the outermost token rule are executed. In other words, if STRING calls ESC_CHAR and ESC_CHAR has an action, that action is not executed when the lexer starts matching in STRING.
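
As a rough sketch of how the remaining v3-only constructs might be ported: ANTLR 4 has no syntactic predicates (=>), writes channel changes as the -> channel(HIDDEN) lexer command rather than $channel = HIDDEN, and no longer exposes the v3 input.mark()/rewind() calls in the same way. The version below is an assumption-laden simplification, not a drop-in replacement: the grammar name and the EMPTY_QUOTES and UNTERMINATED_QUOTE rule names are made up for illustration, WS is replaced by an inline [ \t]*, the original ';' handling is dropped, and an unterminated string is simply taken to run to the end of the line. Errors are reported through the lexer's registered error listeners instead of constructing a RecognitionException.

```antlr
lexer grammar QuotedSketch;   // hypothetical grammar name

// Case 1: quotes holding nothing but spaces or tabs. Listed first so that it
// wins the equal-length tie against DOUBLE_QUOTED_CHARACTERS on input like " ".
EMPTY_QUOTES
    : '"' [ \t]* '"'
      { getErrorListenerDispatch().syntaxError(this, null, getLine(),
            getCharPositionInLine(), "Illegal empty quotes \"\"!", null); }
      -> channel(HIDDEN)
    ;

// Case 3: a properly closed string; the action strips the surrounding quotes.
DOUBLE_QUOTED_CHARACTERS
    : '"' ~["\r\n]+ '"'
      { setText(getText().substring(1, getText().length() - 1)); }
    ;

// Case 2: an opening quote with no closing quote before the end of the line.
// Assumption: the whole unterminated run is hidden, unlike the v3 rule, which
// rewound the input and consumed a single character.
UNTERMINATED_QUOTE
    : '"' ~["\r\n]*
      { getErrorListenerDispatch().syntaxError(this, null, getLine(),
            getCharPositionInLine(), "Missing closing double quote!", null); }
      -> channel(HIDDEN)
    ;
```

Splitting the v3 alternatives into separate rules relies on the lexer preferring the longest match and, on a tie, the rule defined first, which is why the rule order matters here. The line and column passed to the listeners are the lexer's current position when the action runs, not the start of the token.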
Dalmatia answered 4/1, 2019 at 14:49 Comment(0)

I encountered this problem when my .g4 grammar imported a lexer file. Importing grammar files seems to trigger a lot of undocumented shortcomings in ANTLR4, so ultimately I had to stop using import. In my case, once I merged the lexer grammar into the parser grammar (one single .g4 file), my @init and @after parsing errors vanished. I should submit a test case and a bug report, at least to get this documented; I will update here once I do. I vaguely recall 2-3 issues with importing a lexer grammar into my parser that triggered undocumented behavior. Much of this is covered here on Stack Overflow.

Nell answered 17/4, 2017 at 14:55 Comment(1)
Lexer rules do not take @init and @after blocks. - Dalmatia
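
Since a lexer rule cannot carry its own @init block, one possible workaround (a minimal sketch, not taken from the thread) is to keep any state the rule needs in the grammar-level @members section and update it from an ordinary lexer action. The grammar name, the QUOTED rule, and the quotedTokens field below are hypothetical and only illustrate the mechanism.

```antlr
lexer grammar MembersDemo;   // hypothetical grammar name

@members {
    // Hypothetical field: plays the role of a variable that a v3 grammar
    // would have set up in an @init block.
    private int quotedTokens = 0;
}

// An ordinary lexer action can read and update the field.
QUOTED : '"' ~["\r\n]* '"' { quotedTokens++; } ;

WS     : [ \t\r\n]+ -> skip ;
```

A field declared this way belongs to the generated lexer class, so it is shared by every action in the grammar and keeps its value from one token to the next; it has to be reset explicitly if per-token state is wanted.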
