ANTLR doesn't give correct output tokens for Scala Grammar [closed]

I am new to Scala and am trying to parse Scala files using ANTLR and the Scala grammar. I got the grammar from this GitHub link:

https://github.com/antlr/grammars-v4/tree/master/scala

The repo may be moved, so I am pasting the Scala grammar here:

grammar Scala;

literal           : '-'? IntegerLiteral
                | '-'? FloatingPointLiteral
                | BooleanLiteral
                | CharacterLiteral
                | StringLiteral
                | SymbolLiteral
                | 'null' ;

qualId            : Id ('.' Id)* ;

ids               : Id (',' Id)* ;

stableId          : (Id | (Id '.')? 'this') '.' Id
                | (Id '.')? 'super' classQualifier? '.' Id ;

classQualifier    : '[' Id ']' ;

type              : functionArgTypes '=>' type
                | infixType existentialClause? ;

functionArgTypes  : infixType
                | '(' ( paramType (',' paramType )* )? ')' ;

existentialClause : 'forSome' '{' existentialDcl (Semi existentialDcl)* '}';

existentialDcl    : 'type' typeDcl
                | 'val' valDcl;

infixType         : compoundType (Id Nl? compoundType)*;

compoundType      : annotType ('with' annotType)* refinement?
                | refinement;

annotType         : simpleType annotation*;

simpleType        : simpleType typeArgs
                | simpleType '#' Id
                | stableId
                | (stableId | (Id '.')? 'this') '.' 'type'
                | '(' types ')';

typeArgs          : '[' types ']';

types             : type (',' type)*;

refinement        : Nl? '{' refineStat (Semi refineStat)* '}';

refineStat        : dcl
                | 'type' typeDef
                | ;

typePat           : type;

ascription        : ':' infixType
                | ':' annotation+
                | ':' '_' '*';

expr              : (bindings | 'implicit'? Id | '_') '=>' expr
                | expr1 ;

expr1             : 'if' '(' expr ')' Nl* expr (Semi? 'else' expr)?
                | 'while' '(' expr ')' Nl* expr
                | 'try' ('{' block '}' | expr) ('catch' '{' caseClauses '}')? ('finally' expr)?
                | 'do' expr Semi? 'while' '(' expr ')'
                | 'for' ('(' enumerators ')' | '{' enumerators '}') Nl* 'yield'? expr
                | 'throw' expr
                | 'return' expr?
                | (('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) '.') Id '=' expr
                | simpleExpr1 argumentExprs '=' expr
                | postfixExpr
                | postfixExpr ascription
                | postfixExpr 'match' '{' caseClauses '}' ;

postfixExpr       : infixExpr (Id Nl?)? ;

infixExpr         : prefixExpr
                | infixExpr Id Nl? infixExpr ;

prefixExpr        : ('-' | '+' | '~' | '!')?
                  ('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) ;

simpleExpr1       : literal
                | stableId
                | (Id '.')? 'this'
                | '_'
                | '(' exprs? ')'
                | ('new' (classTemplate | templateBody) | blockExpr ) '.' Id
                | ('new' (classTemplate | templateBody) | blockExpr ) typeArgs
                | simpleExpr1 argumentExprs
      ;

exprs             : expr (',' expr)* ;

argumentExprs     : '(' exprs? ')'
                | '(' (exprs ',')? postfixExpr ':' '_' '*' ')'
                | Nl? blockExpr ;

blockExpr         : '{' caseClauses '}'
                | '{' block '}' ;
block             : blockStat (Semi blockStat)* resultExpr? ;

blockStat         : import_
                | annotation* ('implicit' | 'lazy')? def
                | annotation* localModifier* tmplDef
                | expr1
                | ;

resultExpr        : expr1
                | (bindings | ('implicit'? Id | '_') ':' compoundType) '=>' block ;

enumerators       : generator (Semi generator)* ;

generator         : pattern1 '<-' expr (Semi? guard | Semi pattern1 '=' expr)* ;

caseClauses       : caseClause+ ;

caseClause        : 'case' pattern guard? '=>' block ;

guard             : 'if' postfixExpr ;

pattern           : pattern1 ('|' pattern1 )* ;

pattern1          : Varid ':' typePat
                | '_' ':' typePat
                | pattern2 ;

pattern2          : Varid ('@' pattern3)?
                | pattern3 ;

pattern3          : simplePattern
                | simplePattern (Id Nl? simplePattern)* ;

simplePattern     : '_'
                | Varid
                | literal
                | stableId ('(' patterns ')')?
                | stableId '(' (patterns ',')? (Varid '@')? '_' '*' ')'
                | '(' patterns? ')' ;

patterns          : pattern (',' patterns)*
                | '_' * ;

typeParamClause   : '[' variantTypeParam (',' variantTypeParam)* ']' ;

funTypeParamClause: '[' typeParam (',' typeParam)* ']' ;

variantTypeParam  : annotation? ('+' | '-')? typeParam ;

typeParam         : (Id | '_') typeParamClause? ('>:' type)? ('<:' type)?
                  ('<%' type)* (':' type)* ;

paramClauses      : paramClause* (Nl? '(' 'implicit' params ')')? ;

paramClause       : Nl? '(' params? ')' ;

params            : param (',' param)* ;

param             : annotation* Id (':' paramType)? ('=' expr)? ;

paramType         : type
                | '=>' type
                | type '*';

classParamClauses : classParamClause*
                  (Nl? '(' 'implicit' classParams ')')? ;

classParamClause  : Nl? '(' classParams? ')' ;

classParams       : classParam (',' classParam)* ;

classParam        : annotation* modifier* ('val' | 'var')?
                  Id ':' paramType ('=' expr)? ;

bindings          : '(' binding (',' binding )* ')' ;

binding           : (Id | '_') (':' type)? ;

modifier          : localModifier
                | accessModifier
                | 'override' ;

localModifier     : 'abstract'
                | 'final'
                | 'sealed'
                | 'implicit'
                | 'lazy' ;

accessModifier    : ('private' | 'protected') accessQualifier? ;

accessQualifier   : '[' (Id | 'this') ']' ;

annotation        : '@' simpleType argumentExprs* ;

constrAnnotation  : '@' simpleType argumentExprs ;

templateBody      : Nl? '{' selfType? templateStat (Semi templateStat)* '}' ;

templateStat      : import_
                | (annotation Nl?)* modifier* def
                | (annotation Nl?)* modifier* dcl
                |  expr
                | ;

selfType          : Id (':' type)? '=>'
                | 'this' ':' type '=>' ;

import_           : 'import' importExpr (',' importExpr)* ;

importExpr        : stableId '.' (Id | '_' | importSelectors) ;

importSelectors   : '{' (importSelector ',')* (importSelector | '_') '}' ;

importSelector    : Id ('=>' Id | '=>' '_') ;

dcl               : 'val' valDcl
                | 'var' varDcl
                | 'def' funDcl
                | 'type' Nl* typeDcl ;

valDcl            : ids ':' type ;

varDcl            : ids ':' type ;

funDcl            : funSig (':' type)? ;

funSig            : Id funTypeParamClause? paramClauses ;

typeDcl           : Id typeParamClause? ('>:' type)? ('<:' type)? ;

patVarDef         : 'val' patDef
                | 'var' varDef ;

def               : patVarDef
                | 'def' funDef
                | 'type' Nl* typeDef
                | tmplDef ;

patDef            : pattern2 (',' pattern2)* (':' type)* '=' expr ;

varDef            : patDef
                | ids ':' type '=' '_' ;

funDef            : funSig (':' type)? '=' expr
                | funSig Nl? '{' block '}'
                | 'this' paramClause paramClauses
                  ('=' constrExpr | Nl constrBlock) ;

typeDef           :  Id typeParamClause? '=' type ;

tmplDef           : 'case'? 'class' classDef
                | 'case' 'object' objectDef
                | 'trait' traitDef ;

classDef          : Id typeParamClause? constrAnnotation* accessModifier?
                  classParamClauses classTemplateOpt ;

traitDef          : Id typeParamClause? traitTemplateOpt ;

objectDef         : Id classTemplateOpt ;

classTemplateOpt  : 'extends' classTemplate | ('extends'? templateBody)? ;

traitTemplateOpt  : 'extends' traitTemplate | ('extends'? templateBody)? ;

classTemplate     : earlyDefs? classParents templateBody? ;

traitTemplate     : earlyDefs? traitParents templateBody? ;

classParents      : constr ('with' annotType)* ;

traitParents      : annotType ('with' annotType)* ;

constr            : annotType argumentExprs* ;

earlyDefs         : '{' (earlyDef (Semi earlyDef)*)? '}' 'with' ;

earlyDef          : (annotation Nl?)* modifier* patVarDef ;

constrExpr        : selfInvocation
                | constrBlock ;

constrBlock       : '{' selfInvocation (Semi blockStat)* '}' ;
selfInvocation    : 'this' argumentExprs+ ;

topStatSeq        : topStat (Semi topStat)* ;

topStat           : (annotation Nl?)* modifier* tmplDef
                | import_
                | packaging
                | packageObject
                | ;

packaging         : 'package' qualId Nl? '{' topStatSeq '}' ;

packageObject     : 'package' 'object' objectDef ;

compilationUnit   : ('package' qualId Semi)* topStatSeq ;

// Lexer
BooleanLiteral   :  'true' | 'false';
CharacterLiteral :  '\'' (PrintableChar | CharEscapeSeq) '\'';
StringLiteral    :  '"' StringElement* '"'
               |  '"""' MultiLineChars '"""';
SymbolLiteral    :  '\'' Plainid;
IntegerLiteral   :  (DecimalNumeral | HexNumeral) ('L' | 'l');
FloatingPointLiteral
               :  Digit+ '.' Digit+ ExponentPart? FloatType?
               |  '.' Digit+ ExponentPart? FloatType?
               |  Digit ExponentPart FloatType?
               |  Digit+ ExponentPart? FloatType;
Id               :  Plainid
               |  '`' StringLiteral '`';
Varid            :  Lower Idrest;
Nl               :  '\r'? '\n';
Semi             :  ';' |  Nl+;

Paren            :  '(' | ')' | '[' | ']' | '{' | '}';
Delim            :  '`' | '\'' | '"' | '.' | ';' | ',' ;

Comment          :  '/*' .*?  '*/'
               |  '//' .*? Nl;

// fragments
fragment UnicodeEscape    : '\\' 'u' 'u'? HexDigit HexDigit HexDigit HexDigit ;
fragment WhiteSpace       :  '\u0020' | '\u0009' | '\u000D' | '\u000A';
fragment Opchar           : PrintableChar // printableChar not matched by (whiteSpace | upper | lower |
                        // letter | digit | paren | delim | opchar | Unicode_Sm | Unicode_So)
                        ;
fragment Op               :  Opchar+;
fragment Plainid          :  Upper Idrest
                        |  Varid
                        |  Op;
fragment Idrest           :  (Letter | Digit)* ('_' Op)?;

fragment StringElement    :  '\u0020'| '\u0021'|'\u0023' .. '\u007F'  // (PrintableChar  Except '"')
                        |  CharEscapeSeq;
fragment MultiLineChars   :  ('"'? '"'? .*?)* '"'*;

fragment HexDigit         :  '0' .. '9'  |  'A' .. 'Z'  |  'a' .. 'z' ;
fragment FloatType        :  'F' | 'f' | 'D' | 'd';
fragment Upper            :  'A'  ..  'Z' | '$' | '_';  // and Unicode category Lu
fragment Lower            :  'a' .. 'z'; // and Unicode category Ll
fragment Letter           :  Upper | Lower; // and Unicode categories Lo, Lt, Nl
fragment ExponentPart     :  ('E' | 'e') ('+' | '-')? Digit+;
fragment PrintableChar    : '\u0020' .. '\u007F' ;
fragment CharEscapeSeq    : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '"' | '\'' | '\\');
fragment DecimalNumeral   :  '0' | NonZeroDigit Digit*;
fragment HexNumeral       :  '0' 'x' HexDigit HexDigit+;
fragment Digit            :  '0' | NonZeroDigit;
fragment NonZeroDigit     :  '1' .. '9';

The Scala grammar above is the same as the one on the official Scala website:

http://www.scala-lang.org/files/archive/spec/2.11/13-syntax-summary.html

Now I am trying to generate tokens for a Scala file named scala.scala. The code in that file is below:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

I run one of the following commands to get the tokens:

grun Scala compilationUnit -tokens scala.scala

or

grun Scala expr -tokens scala.scala

or

grun Scala literal -tokens scala.scala

The output I got is:

[@0,0:18='object HelloWorld {',<68>,1:0]
[@1,19:19='\n',<70>,1:19]
[@2,20:52='  def main(args: Array[String]) {',<68>,2:0]
[@3,53:53='\n',<70>,2:33]
[@4,54:81='    println("Hello, world!")',<68>,3:0]
[@5,82:82='\n',<70>,3:28]
[@6,83:85='  }',<68>,4:0]
[@7,86:86='\n',<70>,4:3]
[@8,87:87='}',<14>,5:0]
[@9,88:88='\n',<70>,5:1]
[@10,89:88='<EOF>',<-1>,6:0]
line 1:19 no viable alternative at input 'object HelloWorld {\n'

The output in tree form looks like this:

(expr object HelloWorld { \n   def main(args: Array[String]) { \n     println("Hello, world!") \n   } \n } \n)

The output in the GUI looks like this:

[Image exported from the ANTLR tool]

That is completely stupid. Instead of tokens, it simply gives me lines of code (LOC). I tested the grammars for other languages, Java and C, and they work perfectly. They give me the correct tokens I expect, using the grammars from the following link:

https://github.com/antlr/grammars-v4

Please correct me if I am doing something wrong, because I am new to ANTLR and Scala.

By tokens I mean all the keywords, operands and operators. As I understand it, a token is never meant to be simply a whole line of code.

Below is the Scala.tokens file that ANTLR generated from Scala.g4 (the Scala grammar).

T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
T__5=6
T__6=7
T__7=8
T__8=9
T__9=10
T__10=11
T__11=12
T__12=13
T__13=14
T__14=15
T__15=16
T__16=17
T__17=18
T__18=19
T__19=20
T__20=21
T__21=22
T__22=23
T__23=24
T__24=25
T__25=26
T__26=27
T__27=28
T__28=29
T__29=30
T__30=31
T__31=32
T__32=33
T__33=34
T__34=35
T__35=36
T__36=37
T__37=38
T__38=39
T__39=40
T__40=41
T__41=42
T__42=43
T__43=44
T__44=45
T__45=46
T__46=47
T__47=48
T__48=49
T__49=50
T__50=51
T__51=52
T__52=53
T__53=54
T__54=55
T__55=56
T__56=57
T__57=58
T__58=59
T__59=60
T__60=61
BooleanLiteral=62
CharacterLiteral=63
StringLiteral=64
SymbolLiteral=65
IntegerLiteral=66
FloatingPointLiteral=67
Id=68
Varid=69
Nl=70
Semi=71
Paren=72
Delim=73
Comment=74
'-'=1
'null'=2
'.'=3
','=4
'this'=5
'super'=6
'['=7
']'=8
'=>'=9
'('=10
')'=11
'forSome'=12
'{'=13
'}'=14
'type'=15
'val'=16
'with'=17
'#'=18
':'=19
'_'=20
'*'=21
'implicit'=22
'if'=23
'else'=24
'while'=25
'try'=26
'catch'=27
'finally'=28
'do'=29
'for'=30
'yield'=31
'throw'=32
'return'=33
'new'=34
'='=35
'match'=36
'+'=37
'~'=38
'!'=39
'lazy'=40
'<-'=41
'case'=42
'|'=43
'@'=44
'>:'=45
'<:'=46
'<%'=47
'var'=48
'override'=49
'abstract'=50
'final'=51
'sealed'=52
'private'=53
'protected'=54
'import'=55
'def'=56
'class'=57
'object'=58
'trait'=59
'extends'=60
'package'=61

I am sure that these tokens are not correct. Can anyone confirm whether the problem is with the Scala grammar or with ANTLR itself?
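To check what the numbers in angle brackets in the -tokens output refer to, the dump can be cross-referenced against the Scala.tokens file. The following is a minimal sketch, assuming only the two line formats visible above (`Name=number` and `'literal'=number`); the helper names `parse_tokens_file` and `token_type` are made up for illustration and are not part of ANTLR.

```python
import re


def parse_tokens_file(text):
    """Map each token type number to the names listed for it in a .tokens file."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last '=' so that a literal such as '='=35 is handled.
        name, _, number = line.rpartition("=")
        types.setdefault(int(number), []).append(name)
    return types


def token_type(dump_line):
    """Extract the type number from one line of the -tokens dump."""
    match = re.search(r",<(-?\d+)>,\d+:\d+\]$", dump_line.strip())
    if match is None:
        raise ValueError(f"not a token dump line: {dump_line!r}")
    return int(match.group(1))


# Excerpt of the Scala.tokens file pasted above.
TOKENS_EXCERPT = """
T__13=14
Id=68
Nl=70
Semi=71
'='=35
'}'=14
'object'=58
"""

# Excerpt of the -tokens output pasted above.
DUMP_EXCERPT = [
    "[@0,0:18='object HelloWorld {',<68>,1:0]",
    "[@1,19:19='\\n',<70>,1:19]",
    "[@8,87:87='}',<14>,5:0]",
]

types = parse_tokens_file(TOKENS_EXCERPT)
for line in DUMP_EXCERPT:
    number = token_type(line)
    print(number, types.get(number, ["<unknown>"]))
```

Read this way, type 68 is `Id`, type 70 is `Nl` and type 14 is the literal `'}'`, so the whole first line was matched as a single `Id` token and the keyword `'object'` (type 58) never appears in the dump.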

Marrissa answered 8/11, 2016 at 8:32 Comment(7)
Can you try topStat instead of expr/literal in your command? - Brigittebriley
I tried it and got the same output as I pasted above. - Marrissa
It seems the grammar has never been properly tested, and might have been developed with an early v4 version. Also, looking at the rule compilationUnit (which should match your input), I can't see it trying to match an object. My recommendation: don't use it. - Breathing
@Bart Kiers Yes, I tried multiple Scala files and each one gives me the same error. I really need this to work, so please suggest any other solutions you know of. - Marrissa
LOC where? I don't understand the issue other than the tree shows all errors. - Outrank
@The ANTLR GUY By LOC I mean that the output I pasted above looks like lines of code, not tokens. To make it clear, I gave my definition of tokens as well: each distinct keyword, operator, operand, etc. But the output I got from ANTLR for the first line is [@0,0:18='object HelloWorld {',<68>,1:0], which is simply the first line of my source file. Does that make sense? I can also send you a video by Gmail to make it clearer (I already sent you an email about this, please check your inbox ([email protected])). - Marrissa
Hi The ANTLR GUY, did you get a chance to look into this? Is the problem caused by the Scala grammar or by the ANTLR tool itself? My next task relies on the tokens I get from the tool. - Marrissa

This file seems to parse fine now, so the grammar has probably been fixed.
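For anyone wondering why the grammar as pasted in the question produced one token per line: judging only from that listing, `Id` can be a `Plainid`, which can be an `Op`, which is `Opchar+`, and `Opchar` is defined as `PrintableChar`, i.e. any character from '\u0020' to '\u007F', space included. An ANTLR lexer picks the rule that matches the most input and, on a tie, the rule defined first, so `Id` swallows everything up to the newline. The sketch below is a simplified stand-in for that behaviour written with Python's `re` module, not ANTLR itself; the rule lists and the narrower identifier pattern are assumptions for illustration, not the actual fix made in grammars-v4.

```python
import re


def tokenize(source, rules, skip=()):
    """Longest-match tokenizer; on a tie the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_text = None, ""
        for name, pattern in rules:
            m = re.compile(pattern).match(source, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps priority when lengths are equal.
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in skip:
            tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


SOURCE = (
    "object HelloWorld {\n"
    "  def main(args: Array[String]) {\n"
    '    println("Hello, world!")\n'
    "  }\n"
    "}\n"
)

# Literals come first, as implicit tokens do in a combined grammar.
# Id mirrors the pasted grammar: any run of printable ASCII, space included.
BROAD_RULES = [
    ("'object'", r"object"),
    ("'{'", r"\{"),
    ("'}'", r"\}"),
    ("Id", r"[\x20-\x7F]+"),
    ("Nl", r"\r?\n"),
]

# Hypothetical narrower identifier rule plus a skipped whitespace rule.
NARROW_RULES = [
    ("'object'", r"object"),
    ("'{'", r"\{"),
    ("'}'", r"\}"),
    ("Id", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("Nl", r"\r?\n"),
    ("WS", r"[ \t]+"),
]

for token in tokenize(SOURCE, BROAD_RULES):
    print(token)

print(tokenize("object HelloWorld {", NARROW_RULES, skip=("WS",)))
```

With the broad rule, each line comes out as a single `Id` token followed by `Nl`, and only the final `}` is recognised as a literal because it ties with `Id` at length one and the literal is listed first. That is the same shape as the dump in the question. With the narrower rule, the first line splits into `'object'`, `Id` and `'{'`.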

Sponge answered 24/4, 2021 at 10:17 Comment(0)
