Antlr Extraneous Input

I have a grammar file BoardFile.g4 that has (relevant parts only):

grammar Board;

//Tokens
GADGET : 'squareBumper' | 'circleBumper' | 'triangleBumper' | 'leftFlipper' | 'rightFlipper' | 'absorber' | 'portal' ;
NAME : [A-Za-z_][A-Za-z_0-9]* ;
INT : [0-9]+ ;
FLOAT : '-'?[0-9]+('.'[0-9]+)? ;
COMMENT : '#' ~( '\r' | '\n' )*;
WHITESPACE : [ \t\r\n]+ -> skip ;
KEY : [a-z] | [0-9] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;
KEYPRESS : 'keyup' | 'keydown' ;

//Rules
file : define+ EOF ;
define : board | ball | gadget | fire | COMMENT | key ;
board : 'board' 'name' '=' name ('gravity' '=' gravity)? ('friction1' '=' friction1)? ('friction2' '=' friction2)? ;
ball : 'ball' 'name' '=' name 'x' '=' xfloat 'y' '=' yfloat 'xVelocity' '=' xvel 'yVelocity' '=' yvel ;
gadget : gadgettype 'name' '=' name 'x' '=' xint 'y' '=' yint ('width' '=' width 'height' '=' height)? ('orientation' '=' orientation)? ('otherBoard' '=' name 'otherPortal' '=' name)? ;
fire : 'fire' 'trigger' '=' trigger 'action' '=' action ;
key : keytype 'key' '=' KEY 'action' '=' name ;

name : NAME ;
gadgettype : GADGET ;
keytype : KEYPRESS ;
gravity : FLOAT ;
friction1 : FLOAT ;
friction2 : FLOAT ;
trigger : NAME ;
action : NAME ;
yfloat : FLOAT ;
xfloat : FLOAT ;
yint : INT ;
xint : INT ;
xvel : FLOAT ;
yvel : FLOAT ;
orientation : INT ;
width : INT ;
height : INT ;

This generates the lexer and parser fine. However, when I use it against the following file, it gives the following error:

line 12:0 extraneous input 'keyup' expecting {<EOF>, KEYPRESS}

File to Parse:

board name=keysBoard gravity=5.0 friction1=0.0 friction2=0.0

# define a ball
ball name=Ball x=0.5 y=0.5 xVelocity=2.5 yVelocity=2.5

# add some flippers
leftFlipper name=FlipL1 x=16 y=2 orientation=0
leftFlipper name=FlipL2 x=16 y=9 orientation=0

# add keys. lots of keys.
keyup key=space action=apple
keydown key=a action=ball
keyup key=backslash action=cat
keydown key=period action=dog

I went through other questions about this error in SO but none helped me. I cannot figure out what's going wrong. Why am I getting this error?

Ramie answered 13/5, 2014 at 1:57 Comment(8)
What is your question? What do you expect your code to do? - Nasalize
I want to parse the file correctly. Why am I getting the error? Sorry for not making it clear; I've updated the question. - Ramie
How are you dealing with newlines? - Jacklyn
I ignore all newlines. I have WHITESPACE : [ \t\r\n]+ -> skip ; in the grammar. - Ramie
What does "parse the file correctly" mean? - Nasalize
Parse the file. Right now I get the above error, so the "keys" just don't get parsed. - Ramie
What are the details of the parser rule 'name'? I tried your grammar and hard-coded 'apple' for 'name'. The parser could parse your first input line "keyup key=space action=apple". You probably need to provide a scaled-down but fully working grammar and an input that shows the error. - Gammadion
I've edited the question to include the entire grammar and the file I'm trying to parse. - Ramie

The string "keyup" is being tokenized as a NAME token: that is the problem.

You must realize that the lexer operates independently from the parser. If the parser is trying to match a KEYPRESS token, the lexer does not "listen" to it, but just constructs a token following the rules:

  1. match the rule that consumes the most characters
  2. if several rules match the same number of characters, choose the one that is defined first

Taking these rules into account, and the order of your rules:

NAME : [A-Za-z_][A-Za-z_0-9]* ;

INT : [0-9]+ ;

KEY : [a-z] | [0-9] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

the lexer will create a NAME token for most of the KEY alternatives and for all of the KEYPRESS alternatives, because NAME matches the same characters and is defined first.

And since INT matches one or more digits and is defined before KEY, which also has a single-digit alternative, a single digit becomes an INT as well. With this rule order the lexer will never produce a KEY or KEYPRESS token.
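
To make the two tokenization rules concrete, here is a rough Python sketch that imitates them for the four lexer rules quoted above. It is not how ANTLR is implemented: the names first_token, longest_match, ORIGINAL_ORDER and KEY_WORDS are made up for this illustration, the character sets are approximated with Python regexes, and the rest of the grammar is ignored.

```python
import re

# Literal alternatives of the question's KEY rule.
KEY_WORDS = ("shift ctrl alt meta space left right up down minus equals "
             "backspace openbracket closebracket backslash semicolon quote "
             "enter comma period slash").split()

# The four rules in the order used in the question's grammar.
# Each entry is (token name, list of alternatives as Python regexes).
ORIGINAL_ORDER = [
    ("NAME", [r"[A-Za-z_][A-Za-z_0-9]*"]),
    ("INT", [r"[0-9]+"]),
    ("KEY", [r"[a-z]", r"[0-9]"] + KEY_WORDS),
    ("KEYPRESS", ["keyup", "keydown"]),
]

def longest_match(alternatives, text):
    """Length of the longest prefix of text matched by any alternative."""
    best = 0
    for alt in alternatives:
        m = re.match(alt, text)
        if m:
            best = max(best, m.end())
    return best

def first_token(rules, text):
    """Rule 1: longest match wins. Rule 2: on a tie, the earliest rule wins."""
    best_name, best_len = None, 0
    for name, alternatives in rules:
        length = longest_match(alternatives, text)
        if length > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, length
    return best_name, text[:best_len]

for word in ["keyup", "space", "a", "7"]:
    print(word, "->", first_token(ORIGINAL_ORDER, word)[0])
# keyup -> NAME
# space -> NAME
# a -> NAME
# 7 -> INT
```

Every sample input ties with (or loses to) NAME or INT, so under these assumptions KEY and KEYPRESS are never chosen.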

If you move the NAME and INT rules below the KEY and KEYPRESS rules, I expect most of the tokens will be constructed the way you intend.

EDIT

A possible solution would look like:

KEY : [a-z] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

NAME : [A-Za-z_][A-Za-z_0-9]* ;

SINGLE_DIGIT : [0-9] ;

INT : [0-9]+ ;

I.e. I removed the [0-9] alternative from KEY and introduced a SINGLE_DIGIT rule (which is placed before the INT rule!).

Now create some extra parser rules:

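integer : INT | SINGLE_DIGIT ;

keyname : KEY | SINGLE_DIGIT ;

and change all occurrences of INT inside parser rules to integer (don't call your rule int: it is a reserved word in Java, ANTLR's default target) and all occurrences of KEY to keyname (not key, because the grammar in the question already has a parser rule with that name).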

You might also want to do something similar for NAME and the [a-z] alternative in KEY: as it stands, a single lowercase letter will never be tokenized as a NAME, always as a KEY.
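
The effect of the reordered lexer rules, including the single-letter caveat, can be checked with a small Python imitation of the longest-match and first-defined rules. This is only a sketch under assumptions: first_token, longest_match, REORDERED and KEY_WORDS are hypothetical names, the rules are approximated with Python regexes, and only the five lexer rules from the proposed solution are modeled.

```python
import re

# Literal alternatives of the KEY rule (the [0-9] alternative is removed).
KEY_WORDS = ("shift ctrl alt meta space left right up down minus equals "
             "backspace openbracket closebracket backslash semicolon quote "
             "enter comma period slash").split()

# Rule order from the proposed solution.
REORDERED = [
    ("KEY", [r"[a-z]"] + KEY_WORDS),
    ("KEYPRESS", ["keyup", "keydown"]),
    ("NAME", [r"[A-Za-z_][A-Za-z_0-9]*"]),
    ("SINGLE_DIGIT", [r"[0-9]"]),
    ("INT", [r"[0-9]+"]),
]

def longest_match(alternatives, text):
    """Length of the longest prefix of text matched by any alternative."""
    best = 0
    for alt in alternatives:
        m = re.match(alt, text)
        if m:
            best = max(best, m.end())
    return best

def first_token(rules, text):
    """Longest match wins; on a tie, the rule defined first wins."""
    best_name, best_len = None, 0
    for name, alternatives in rules:
        length = longest_match(alternatives, text)
        if length > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, length
    return best_name, text[:best_len]

for word in ["keyup", "space", "apple", "FlipL1", "16", "7", "a"]:
    print(word, "->", first_token(REORDERED, word)[0])
# keyup -> KEYPRESS
# space -> KEY
# apple -> NAME
# FlipL1 -> NAME
# 16 -> INT
# 7 -> SINGLE_DIGIT
# a -> KEY
```

The last line shows the caveat: a one-letter name such as a is claimed by KEY, and the same would happen to any name that happens to equal one of the KEY words.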

Anethole answered 13/5, 2014 at 19:17 Comment(2)
Thank you very much! That worked! However, now that I have KEY before NAME and INT, things like leftFlipper name=a x=16 y=2 orientation=0 give an error because name now matches KEY before it matches NAME. - Ramie
@AashishTripathee, indeed, a single digit would now not be matched as an INT but as a KEY. - Anethole
