Unindented code breaks my grammar

I have a .g4 grammar for a lexer/parser, where the lexer skips line continuation tokens. Not skipping them breaks the parser, so that isn't an option. Here's the lexer rule in question:

LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;

The problem is that whenever a continued line starts at column 1, the parser blows up:

Sub Test()
Debug.Print "Some text " & _
vbNewLine & "Some more text"    
End Sub
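To see why indentation matters here, the lexer's behaviour can be emulated outside ANTLR. The following is a minimal Python sketch, not the real grammar: only LINE_CONTINUATION is taken from the question, while the AMP, STRING and ID rules and the `tokenize` helper are hypothetical stand-ins. It assumes the usual ANTLR lexer behaviour (the longest match wins, ties go to the earlier rule) and that the parser needs a WS token between `&` and its operand, as the error further down suggests.

```python
import re

# Hypothetical, heavily reduced token set; only LINE_CONTINUATION
# mirrors the rule from the question.
RULES = [
    ("LINE_CONTINUATION", re.compile(r" _\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
    ("AMP", re.compile(r"&")),
    ("STRING", re.compile(r'"[^"]*"')),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9_.]*")),
]
SKIPPED = {"LINE_CONTINUATION"}  # emulates "-> skip"


def tokenize(text):
    """Return token names, picking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in RULES:
            match = regex.match(text, pos)
            # Strict ">" means a tie is won by the rule listed first.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append(best_name)
        pos += best_len
    return tokens


indented = '"Some text " & _\n    vbNewLine'
unindented = '"Some text " & _\nvbNewLine'

print(tokenize(indented))    # ['STRING', 'WS', 'AMP', 'WS', 'ID']
print(tokenize(unindented))  # ['STRING', 'WS', 'AMP', 'ID']
```

In the indented case the indentation itself supplies the WS token after `&`. At column 1 the only whitespace after `&` is the space in front of the underscore, and that space is skipped along with the rest of the continuation.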

I thought "Hey I know! I'll just pre-process the string I'm feeding ANTLR to insert an extra whitespace before the underscore, and change the grammar to accept it!"

So I changed the rule like this:

LINE_CONTINUATION : WS? WS '_' NEWLINE -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : [ \t]+;

...and the test code above gave me this parser error:

extraneous input 'vbNewLine' expecting WS
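One plausible reading of that error: because the lexer prefers the longest match, the revised LINE_CONTINUATION swallows the extra space as well, and NEWLINE's trailing `WS?` also eats any indentation on the next line. The Python sketch below emulates just these three rules with regular expressions; the `remainder_after_continuation` helper and the sample strings are made up for illustration.

```python
import re

# Regex equivalents of the three revised lexer rules.
WS = r"[ \t]+"
NEWLINE = rf"(?:{WS})?\r?\n(?:{WS})?"
LINE_CONTINUATION = re.compile(rf"(?:{WS})?{WS}_{NEWLINE}")
WS_ONLY = re.compile(WS)


def remainder_after_continuation(text, start):
    """Return what is left after the skipped LINE_CONTINUATION at `start`."""
    continuation = LINE_CONTINUATION.match(text, start)
    plain_ws = WS_ONLY.match(text, start)
    # Longest match wins, so WS never gets a chance at this position.
    assert continuation.end() > plain_ws.end()
    return text[continuation.end():]


# Pre-processed input: two spaces before the underscore.
print(repr(remainder_after_continuation('&  _\nvbNewLine', 1)))      # 'vbNewLine'
print(repr(remainder_after_continuation('&  _\n    vbNewLine', 1)))  # 'vbNewLine'
```

In both cases nothing is left between `&` and `vbNewLine`, so under this reading the parser sees the identifier where it expected WS, whether or not the continued line is indented.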

For now my only solution is to tell my users to properly indent their code. Is there any way I can fix that grammar rule?

(Full VBA.g4 grammar file on GitHub)

Moustache answered 5/1, 2016 at 21:56 Comment(4)
Why don't you merge LINE_CONTINUATION into WS? - Gyatt
@IraBaxter WS is used in lots of other places. What do you mean? - Moustache
You basically want line continuation to be treated like whitespace. OK, then add the lexical definition of line continuation to the WS token. - Gyatt
Why didn't I think of that?!! Make that an answer, I'll test it out tonight! - Moustache

You basically want line continuation to be treated like whitespace.

OK, then add the lexical definition of line continuation to the WS token. Then WS will pick up the line continuation, and you don't need LINE_CONTINUATION anywhere.

//LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : ([ \t]+)|(' ' '_' '\r'? '\n');
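The effect of the merged rule can be checked with a small Python scanner. This is a sketch under assumptions: the AMP, STRING and ID patterns and the `token_names` helper are hypothetical, and the continuation alternative is listed first inside WS because Python's `re` takes the first alternative that matches rather than the longest one.

```python
import re

# WS now covers both plain horizontal whitespace and " _<newline>".
TOKEN = re.compile(r'''
    (?P<WS>     [ ]_\r?\n | [ \t]+ )
  | (?P<AMP>    & )
  | (?P<STRING> "[^"]*" )
  | (?P<ID>     [A-Za-z][A-Za-z0-9_.]* )
''', re.VERBOSE)


def token_names(text):
    """Return the name of each token found in `text`, in order."""
    return [match.lastgroup for match in TOKEN.finditer(text)]


print(token_names('"Some text " & _\nvbNewLine'))
# ['STRING', 'WS', 'AMP', 'WS', 'ID']
print(token_names('"Some text " & _\n    vbNewLine'))
# ['STRING', 'WS', 'AMP', 'WS', 'WS', 'ID']
```

The column-1 case now yields a WS token between `&` and the identifier. An indented continuation yields two WS tokens in a row, which may matter if the parser expects exactly one.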
Gyatt answered 5/1, 2016 at 23:57 Comment(2)
Spoke too fast. It worked... for the specific case in the OP, so I tried changing the WS rule to WS : [ \t]+ ('_' '\r'? '\n')?;. Now it works and supports weird things like Option Base 1 being split into Option _\r\nBase _\r\n1, which is awesome. But it breaks whenever a continued line has any indentation, and I don't understand why, since the definition as I understand it should also match one or more spaces/tabs. Got a clue? - Moustache
I think I would have defined things differently: HWS = [ \t]+; ENDLINE= \r? \n; NEWLINE= HWS? ENDLINE; WS = HWS (ENDLINE HWS?)? ; This last bit handles your "continued line has indentation" case. The rest is just factoring to make it easier to understand. (HWS == "horizontal white space"). - Gyatt
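A likely explanation for the indentation failure is that the continuation and the indentation after it come out as two separate WS tokens. The Python sketch below counts how many WS tokens each variant needs to consume the same stretch of input. The variant labels and the `count_ws_tokens` helper are made up for illustration, and the "factored" pattern assumes the underscore belongs between HWS and ENDLINE, which the one-line definition in the comment leaves out.

```python
import re

HWS = r"[ \t]+"      # horizontal whitespace
ENDLINE = r"\r?\n"

WS_VARIANTS = {
    # Answer: continuation alternative first, since re is not longest-match.
    "answer": rf" _{ENDLINE}|{HWS}",
    # OP's follow-up: WS : [ \t]+ ('_' '\r'? '\n')?;
    "follow_up": rf"{HWS}(?:_{ENDLINE})?",
    # Factored version, with the underscore added as an assumption.
    "factored": rf"{HWS}(?:_{ENDLINE}(?:{HWS})?)?",
}


def count_ws_tokens(pattern, text):
    """Count the consecutive WS tokens needed to consume all of `text`."""
    regex, pos, count = re.compile(pattern), 0, 0
    while pos < len(text):
        match = regex.match(text, pos)
        if match is None:
            raise ValueError(f"WS does not match at offset {pos}")
        pos, count = match.end(), count + 1
    return count


indented_continuation = " _\n    "
for name, pattern in WS_VARIANTS.items():
    print(name, count_ws_tokens(pattern, indented_continuation))
# answer 2
# follow_up 2
# factored 1
```

Only the variant with the trailing optional HWS folds the indentation into the same token, so the parser still sees a single WS where it expects one.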
