ANTLR4 Lexer getTokens() returning 0 tokens

I'm running code from here: https://github.com/bkiers/antlr4-csv-demo. I want to view the tokens analyzed by the lexer by adding this line:

System.out.println("Number of tokens: " + tokens.getTokens().size());

to Main.java:

public static void main(String[] args) throws Exception {  
    // the input source  
    String source =   
        "aaa,bbb,ccc" + "\n" +   
        "\"d,\"\"d\",eee,fff";  

    // create an instance of the lexer  
    CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));  

    // wrap a token-stream around the lexer  
    CommonTokenStream tokens = new CommonTokenStream(lexer);  

    // look at tokens analyzed
    System.out.println("Number of tokens: " + tokens.getTokens().size());

    // create the parser  
    CSVParser parser = new CSVParser(tokens);  

    // invoke the entry point of our grammar  
    List<List<String>> data = parser.file().data;  

    // display the contents of the CSV source  
    for(int r = 0; r < data.size(); r++) {  
      List<String> row = data.get(r);  
      for(int c = 0; c < row.size(); c++) {  
        System.out.println("(row=" + (r+1) + ",col=" + (c+1) + ") = " + row.get(c));  
      }  
    }  
  }  

The result printed out is: Number of tokens: 0. Why is the list returned by getTokens() empty? The rest of the parser code returns the data completely fine.

EDIT: So using lexer.getAllTokens() instead works, but why is the CommonTokenStream not returning the correct tokens?

csv.g4:

grammar CSV;

@header {
  package csv;
}

file returns [List<List<String>> data]  
@init {$data = new ArrayList<List<String>>();}  
 : (row {$data.add($row.list);})+ EOF  
 ; 

row returns [List<String> list]  
@init {$list = new ArrayList<String>();}  
 : a=value {$list.add($a.val);} (Comma b=value {$list.add($b.val);})* (LineBreak | EOF)  
 ;

value returns [String val]  
 : SimpleValue {$val = $SimpleValue.text;}  
 | QuotedValue   
   { 
     $val = $QuotedValue.text; 
     $val = $val.substring(1, $val.length()-1); // remove leading- and trailing quotes 
     $val = $val.replace("\"\"", "\""); // replace all `""` with `"` 
   }  
 ;  

Comma  
 : ','  
 ;  

LineBreak  
 : '\r'? '\n'  
 | '\r'  
 ;  

SimpleValue  
 : ~[,\r\n"]+  
 ;  

QuotedValue  
 : '"' ('""' | ~'"')* '"'  
 ;  
Watereddown answered 5/6, 2015 at 16:8

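CommonTokenStream is lazy: it only pulls tokens from the lexer when something asks for them, and normally that something is the Parser. At the point where you call getTokens(), nothing has been read yet, so the list is empty. To initiate lexing manually, call tokens.fill() (CommonTokenStream inherits it from BufferedTokenStream) before calling getTokens().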
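
To illustrate why the list is empty until something drains the lexer, here is a minimal, self-contained sketch of a lazily filled buffer. It does not use ANTLR: LazyTokenBuffer, newBuffer() and the string "tokens" are hypothetical stand-ins made up for this example, and the real BufferedTokenStream is more involved. It only models the idea that getTokens() reports what has been buffered so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LazyTokenDemo {

    /** Hypothetical stand-in for a lazily filled token stream; NOT ANTLR's class. */
    static final class LazyTokenBuffer {
        private final Iterator<String> source;                  // plays the role of the lexer
        private final List<String> buffered = new ArrayList<>(); // tokens pulled so far

        LazyTokenBuffer(Iterator<String> source) {
            this.source = source;
        }

        /** Returns only the tokens that have already been pulled from the source. */
        List<String> getTokens() {
            return buffered;
        }

        /** Pulls a single token on demand, the way a parser would. */
        String next() {
            if (!source.hasNext()) {
                return null;
            }
            String token = source.next();
            buffered.add(token);
            return token;
        }

        /** Drains the source eagerly; this is the role fill() plays in the answer above. */
        void fill() {
            while (source.hasNext()) {
                buffered.add(source.next());
            }
        }
    }

    /** Builds a buffer over five made-up tokens (assumption for the demo). */
    static LazyTokenBuffer newBuffer() {
        List<String> tokens = Arrays.asList("aaa", ",", "bbb", ",", "ccc");
        return new LazyTokenBuffer(tokens.iterator());
    }

    public static void main(String[] args) {
        LazyTokenBuffer tokens = newBuffer();
        System.out.println("Before fill: " + tokens.getTokens().size()); // prints "Before fill: 0"
        tokens.fill();
        System.out.println("After fill: " + tokens.getTokens().size());  // prints "After fill: 5"
    }
}
```

In the Main.java from the question, the corresponding change would be to call tokens.fill(); on the CommonTokenStream right before the println that uses tokens.getTokens().size().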

Amiens answered 5/6, 2015 at 22:18
