How can I get the equivalent of Perl's regular expression anchors ^ and $ in the ANTLR4 lexer, i.e. match the start of a line and the end of a line without consuming any character?
I am trying to use the ANTLR4 lexer to match a # character at the start of a line but not in the middle of a line. For example, I want to isolate and toss out all C++ preprocessor directives, regardless of which directive it is, while disregarding a # inside a string literal. (Normally we could tokenize C++ string literals to eliminate a # appearing in the middle of a line, but assume we're not doing that.) That means I only want to specify # .*? without bothering with #if, #ifndef, #pragma, etc.
Also, the C++ standard allows whitespace and multi-line comments right before and after the #, e.g.
/* helo
world*/ # /* hel
l
o
*/ /*world */ifdef .....
is considered a valid preprocessor directive appearing on a single line. (The CRLFs inside the multi-line comments are tossed.)
This is what I am currently doing:
PPLINE: '\r'? '\n' (ML_COMMENT | '\t' | '\f' |' ')* '#' (ML_COMMENT | ~[\r\n])+ -> channel(PPDIR);
But the problem is that I have to rely on the existence of a CRLF before the # and toss out that CRLF together with the directive. The CRLF that terminates the directive line then has to stand in for the one that was tossed out, so I have to make sure the directive is terminated by a CRLF.
However, that means my grammar cannot handle a directive appearing right at the start of the file (i.e. with no preceding CRLF) or one followed directly by EOF with no terminating CRLF.
If the Perl-style regex ^ and $ syntax were available, I could match the SOL/EOL instead of explicitly matching and consuming the CRLF.
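To make the behavior I'm after concrete, here is a rough sketch using Python's re module rather than ANTLR4. It is only an approximation of the PPLINE rule: the names ML_COMMENT, PPLINE and find_directives are made up for this illustration, and a backtracking regex does not behave exactly like an ANTLR4 lexer rule. The point is that ^ and a lookahead for the end of the line anchor the match without consuming any CR or LF.

```python
import re

# Illustration only: Python regex, not ANTLR4 syntax. All names are hypothetical.
ML_COMMENT = r"/\*.*?\*/"  # with re.DOTALL, a comment may span several lines

PPLINE = re.compile(
    r"^"                                      # start of line, consumes nothing
    r"(?:" + ML_COMMENT + r"|[ \t\f])*"       # comments/whitespace before '#'
    r"#"
    r"(?:" + ML_COMMENT + r"|[^\r\n])*"       # rest of the directive
    r"(?=\r?$)",                              # end of line (LF, CRLF or EOF), not consumed
    re.MULTILINE | re.DOTALL,
)


def find_directives(source):
    """Return each directive's text, without any line terminator."""
    return [m.group(0) for m in PPLINE.finditer(source)]


if __name__ == "__main__":
    sample = (
        "#include <a>\r\n"            # directive at the very start of the file
        "int x; // not # here\r\n"    # '#' in the middle of a line: ignored
        "  /* c */ #define X 1\r\n"   # comment and whitespace before '#'
        'char *s = "#";\r\n'          # '#' inside a string literal: ignored
        "#endif"                      # directive at EOF with no trailing CRLF
    )
    for directive in find_directives(sample):
        print(repr(directive))
```

With this sample, the sketch picks up the directive at the start of the file and the one at EOF without a terminating CRLF, and none of the matches contains a CR or LF outside a comment. That is what I would like to express in the lexer grammar.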