Extra channels in ANTLR 4.5

I am using ANTLR 4.5 to build a parser for a language with several special comment formats, which I would like to route to different channels.

It seems ANTLR 4.5 has been extended with a new construct for declaring extra lexer channels:

Extract from the documentation (https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Lexer+Rules):

As of 4.5, you can also define channel names like enumerations with the following construct above the lexer rules:

    channels { WSCHANNEL, MYHIDDEN }

My lexing and parsing rules are in a single file, and my code looks like this:

    channels {
       ANNOT_CHANNEL,
       FORMAL_SPEC_CHANNEL,
       DOC_CHANNEL,
       COMMENT_CHANNEL,
       PRAGMAS_CHANNEL
    }

... parsing rules ...

    // expression annotation (sent to a special channel)
    ANNOT: (EOL_ANNOT | LUS_ANNOT | C_ANNOT) -> channel(ANNOT_CHANNEL) ;
    fragment LUS_ANNOT: '(*!' ( COMMENT | . )*? '*)' ;
    fragment C_ANNOT: '/*!' ( COMMENT | . )*? '*/' ;
    fragment EOL_ANNOT: ('--!' | '//!') .*? EOL ;

    // formal specification annotations (sent to a special channel)
    FORMAL_SPEC: (EOL_SPEC | LUS_SPEC | C_SPEC ) -> channel(FORMAL_SPEC_CHANNEL) ;
    fragment LUS_SPEC: '(*@' ( COMMENT | . )*? '*)' ;
    fragment C_SPEC: '/*@' ( COMMENT | . )*? '*/' ;
    fragment EOL_SPEC: ('--@' | '//@' | '--%') .*? EOL;

    // documentation annotation (sent to a special channel)
    DOC: ( EOL_DOC |LUS_DOC | C_DOC ) -> channel(DOC_CHANNEL);
    fragment LUS_DOC: '(**' ( COMMENT | . )*? '*)' ;
    fragment C_DOC: '/**' ( COMMENT | . )*? '*/' ;
    fragment EOL_DOC: ('--*' | '//*') .*? EOL;

    // standard comment (sent to a special channel)
    COMMENT: ( EOL_COMMENT | LUS_COMMENT | C_COMMENT ) -> channel(COMMENT_CHANNEL);
    fragment LUS_COMMENT: '(*' ( COMMENT | . )*? '*)' ;
    fragment C_COMMENT: '/*' ( COMMENT |. )*? '*/' ;
    fragment EOL_COMMENT: ('--' | '//') .*? EOL;

    // pragmas are sent to a special channel
    PRAGMA: '#pragma' CHARACTER* '#end' -> channel(PRAGMAS_CHANNEL);

However, I am still getting these 4.4-like warnings:

warning(155): Scade6.g4:550:52: rule ANNOT contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
warning(155): Scade6.g4:556:56: rule FORMAL_SPEC contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
warning(155): Scade6.g4:562:45: rule DOC contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
warning(155): Scade6.g4:568:62: rule COMMENT contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
warning(155): Scade6.g4:574:47: rule PRAGMA contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output

If I split the lexer and parser into two distinct files and use an import statement to import the lexer into the parser, I still get the same warnings as above.

Using integer constants instead of names with a combined grammar

-> channel(10000)

yields the following error

error(164): Scade6.g4:8:0: custom channels are not supported in combined grammars

If I split the lexer and parser into two files and use integer constants, there is no warning; however, this is not really satisfactory for readability.

Is there anything I can do to have extra channels named properly? (with either combined or separate lexer/parser specs, no preference)

Regards,

Halsy answered 28/1, 2015 at 16:39 Comment(0)

Is there anything I can do to have extra channels named properly?

I am not sure about v4.5 (I have not used it), but in v4.x you could always define channels like so (assuming you are targeting Java):

    grammar MyGrammar;

    @lexer::members {
        public static final int WHITESPACE = 1;
        public static final int COMMENTS = 2;
    }

    ...the rest of your grammar goes here...

    WS  :   [ \t\n\r]+ -> channel(WHITESPACE) ;  // channel(1)

    SL_COMMENT
        :   '//' .*? '\n' -> channel(COMMENTS)   // channel(2)
        ;

If you do not already have the book "The Definitive ANTLR 4 Reference", I recommend getting hold of it; it will save you a lot of time. The example above is from that book.
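
For 4.5 and later, here is a minimal sketch of the named-channel approach. It assumes that `channels { }` is accepted only in a standalone lexer grammar, which is what error 164 in the question suggests, and that the parser picks up the token types through the `tokenVocab` option rather than an `import` statement. The file names `Scade6Lexer.g4` and `Scade6Parser.g4`, the simplified comment rules, and the `program` rule are illustrative assumptions, not taken from the original grammar.

```antlr
// Scade6Lexer.g4 (hypothetical file name)
lexer grammar Scade6Lexer;

// Named channels are declared right after the grammar header, before any rule.
channels { DOC_CHANNEL, COMMENT_CHANNEL }

// Simplified stand-ins for the comment rules in the question.
DOC     : '/**' .*? '*/' -> channel(DOC_CHANNEL) ;
COMMENT : '/*'  .*? '*/' -> channel(COMMENT_CHANNEL) ;

ID      : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS      : [ \t\r\n]+ -> skip ;
```

```antlr
// Scade6Parser.g4 (hypothetical file name)
parser grammar Scade6Parser;

// Reuse the lexer's token types instead of importing the lexer grammar.
options { tokenVocab = Scade6Lexer; }

// Placeholder start rule, only here to make the sketch complete.
program : ID* EOF ;
```

With this layout the lexer grammar should be processed by the ANTLR tool before the parser grammar, so that the generated `.tokens` file is available when `tokenVocab` is resolved.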

Higginson answered 29/1, 2015 at 10:17 Comment(4)
In fact, the book solution you recommend does not work and creates the symptoms described in my post. ANTLR does not know that WHITESPACE and COMMENTS are defined as lexer members, so it issues the warning. The bug was reported for version 4.2 (see link and link) and was never addressed until v4.5, which introduced the channels { } construct precisely for this. Sadly, it does not seem to do the job as expected, at least in my experiments. - Halsy
The book solution does work. I tested it just now on ANTLR 4.4. It does produce the same warning, but according to this the warning applies ONLY to the lexer interpreter. My guess is that your actual application will be using generated code, for which this warning is irrelevant. - Higginson
In ANTLR 4.7.1, using channels generates: error(50): syntax error: 'channels {' came as a complete surprise to me - Phantasm
@Phantasm try putting the channels block at the top, just after lexer grammar blah; - Divert
