I have an ANTLR4 grammar for a domain-specific language that is embedded in a text template.
There are two modes:
- Text (whitespace should be preserved)
- Code (whitespace should be ignored)
Sample grammar part:
template
: '{' templateBody '}'
;
templateBody
: templateChunk*
;
templateChunk
: code # codeChunk // dsl code, ignore whitespace
| text # textChunk // any text, preserve whitespace
;
The rule for code may contain a nested reference to the template rule. So the parser must support nesting whitespace/non-whitespace sections.
Maybe lexer modes can help - with some drawbacks:
- the code sections must be parsed in another compiler pass
- I doubt that nested sections could be mapped correctly
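A minimal sketch of the lexer-mode idea, to make the nesting concern concrete. It assumes a split lexer/parser grammar and relies on ANTLR4's mode stack (pushMode/popMode). The grammar names, the mode names (TEXT_MODE, CODE_HEAD, CODE_MODE) and the token names (LBRACE, RBRACE, AT, TEXT, LPAREN, RPAREN) are hypothetical. It also assumes that '@', '{' and '}' never occur literally inside text; that would need an escape rule not shown here.

```antlr
// Hypothetical lexer grammar; every name below is an illustrative assumption.
lexer grammar TemplateLexer;

fragment LETTER : [a-zA-Z] ;
fragment DIGIT  : [0-9] ;

// Default mode: only the outermost opening brace is expected here.
LBRACE : '{' -> pushMode(TEXT_MODE) ;

mode TEXT_MODE;
// '}' closes the current template and returns to the mode that opened it.
RBRACE : '}' -> popMode ;
// '@' starts a code section.
AT     : '@' -> pushMode(CODE_HEAD) ;
// Anything else, whitespace included, becomes one text token.
TEXT   : ~[@{}]+ ;

mode CODE_HEAD;
// Function name and opening parenthesis directly after '@'.
HEAD_ID     : LETTER (LETTER | DIGIT)* -> type(Identifier) ;
// mode(...) replaces CODE_HEAD, so the matching ')' pops back to text.
HEAD_LPAREN : '(' -> type(LPAREN), mode(CODE_MODE) ;
HEAD_WS     : [ \t\r\n]+ -> skip ;

mode CODE_MODE;
Identifier  : LETTER (LETTER | DIGIT)* ;
// Each nested '(' pushes, each ')' pops, so parentheses stay balanced.
LPAREN      : '(' -> pushMode(CODE_MODE) ;
RPAREN      : ')' -> popMode ;
// A template nested inside code switches back to whitespace-preserving text.
CODE_LBRACE : '{' -> type(LBRACE), pushMode(TEXT_MODE) ;
// Whitespace is dropped only while in code.
WS          : [ \t\r\n]+ -> skip ;
```

```antlr
// Hypothetical parser grammar using the token vocabulary above.
parser grammar TemplateParser;
options { tokenVocab = TemplateLexer; }

template      : LBRACE templateChunk* RBRACE ;
templateChunk : code # codeChunk
              | TEXT # textChunk
              ;
code          : AT function ;
function      : Identifier LPAREN argument RPAREN ;
argument      : function | template ;
```

Under these assumptions an input such as `{Hello @f({ inner }) bye}` would be tokenized in a single pass: the text tokens keep their whitespace, while whitespace between code tokens is skipped. Whether this still holds once the delimiters become context dependent is exactly the open point.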
Yet the most promising approach seems to be the manipulation of hidden channels.
My question: Is there a best practice for fulfilling these requirements? Is there an example grammar that has already solved similar problems?
Appendix:
The rest of the grammar could look as follows:
code
: '@' function
;
function
: Identifier '(' argument ')'
;
argument
: function
| template
;
text
: Whitespace+
| Identifier
| .+
;
Identifier
: LETTER (LETTER|DIGIT)*
;
Whitespace
: [ \t\n\r] -> channel(HIDDEN)
;
fragment LETTER
: [a-zA-Z]
;
fragment DIGIT
: [0-9]
;
In this example, code has a dummy implementation that shows it can contain nested code/template sections. In practice, code should support:
- multiple arguments
- arguments of primitive types (ints, strings, ...)
- maps and lists
- function evaluation
- ...
code and text rules so we can see if you really need a second pass or not. – Toshiatoshiko
text are context insensitive (i.e. if any occurrence of these delimiters opens/closes a text section). It gets difficult if it depends on the parser state whether the delimiters delimit a text or another language structure. – Glib