I'm running the code from https://github.com/bkiers/antlr4-csv-demo. I want to inspect the tokens produced by the lexer, so I added this line:
System.out.println("Number of tokens: " + tokens.getTokens().size());
to Main.java:
public static void main(String[] args) throws Exception {
    // the input source
    String source =
        "aaa,bbb,ccc" + "\n" +
        "\"d,\"\"d\",eee,fff";
    // create an instance of the lexer
    CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));
    // wrap a token-stream around the lexer
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    // look at tokens analyzed
    System.out.println("Number of tokens: " + tokens.getTokens().size());
    // create the parser
    CSVParser parser = new CSVParser(tokens);
    // invoke the entry point of our grammar
    List<List<String>> data = parser.file().data;
    // display the contents of the CSV source
    for (int r = 0; r < data.size(); r++) {
        List<String> row = data.get(r);
        for (int c = 0; c < row.size(); c++) {
            System.out.println("(row=" + (r+1) + ",col=" + (c+1) + ") = " + row.get(c));
        }
    }
}
The result printed out is: Number of tokens: 0. Why is the list returned by getTokens() empty? The rest of the parser code returns the data completely fine.
EDIT: Using lexer.getAllTokens() instead works, but why is the CommonTokenStream not returning the tokens?
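One likely explanation, offered as my understanding rather than a confirmed reading of ANTLR's source: CommonTokenStream buffers tokens lazily, so getTokens() only returns what has already been requested from the lexer, and nothing has been requested before the parser runs. In ANTLR 4, calling tokens.fill() before getTokens() should load every token up to EOF. The sketch below is a toy model in plain Java, not ANTLR code: LazyTokenBuffer and LazyTokenDemo are hypothetical names, and the iterator merely stands in for the lexer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a buffered token stream (hypothetical class, not ANTLR's implementation).
class LazyTokenBuffer {
    private final Iterator<String> source;               // plays the role of the lexer
    private final List<String> buffer = new ArrayList<>(); // tokens fetched so far

    LazyTokenBuffer(Iterator<String> source) {
        this.source = source;
    }

    // Returns only what has been buffered so far; never pulls from the source.
    List<String> getTokens() {
        return buffer;
    }

    // Pulls every remaining token from the source into the buffer.
    void fill() {
        while (source.hasNext()) {
            buffer.add(source.next());
        }
    }
}

class LazyTokenDemo {
    public static void main(String[] args) {
        // Assumed token sequence for the first CSV row, for illustration only.
        List<String> lexed = Arrays.asList("aaa", ",", "bbb", ",", "ccc");
        LazyTokenBuffer tokens = new LazyTokenBuffer(lexed.iterator());

        System.out.println("Before fill: " + tokens.getTokens().size()); // prints 0
        tokens.fill();
        System.out.println("After fill: " + tokens.getTokens().size());  // prints 5
    }
}
```

If this model matches what ANTLR does, it would also explain why lexer.getAllTokens() works: that call drives the lexer directly instead of reading an empty buffer.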
csv.g4:
grammar CSV;

@header {
  package csv;
}

file returns [List<List<String>> data]
@init {$data = new ArrayList<List<String>>();}
  : (row {$data.add($row.list);})+ EOF
  ;

row returns [List<String> list]
@init {$list = new ArrayList<String>();}
  : a=value {$list.add($a.val);} (Comma b=value {$list.add($b.val);})* (LineBreak | EOF)
  ;

value returns [String val]
  : SimpleValue {$val = $SimpleValue.text;}
  | QuotedValue
    {
      $val = $QuotedValue.text;
      $val = $val.substring(1, $val.length()-1); // remove leading- and trailing quotes
      $val = $val.replace("\"\"", "\""); // replace all `""` with `"`
    }
  ;

Comma
  : ','
  ;

LineBreak
  : '\r'? '\n'
  | '\r'
  ;

SimpleValue
  : ~[,\r\n"]+
  ;

QuotedValue
  : '"' ('""' | ~'"')* '"'
  ;