I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens. That is, tokens not separated by whitespace. E.g.: the string 39if
is supposed to be recognized as the number 39
and the keyword if
. Simultaneously, the lexer must also exit(1)
when it encounters invalid input.
A simplified version of the code I have:
%{
#include <stdio.h>
%}
%option main warn debug
%%
if |
then |
else printf("keyword: %s\n", yytext);
[[:digit:]]+ printf("number: %s\n", yytext);
[[:alpha:]][[:alnum:]]* printf("identifier: %s\n", yytext);
[[:space:]]+ // skip whitespace
[[:^space:]]+ { printf("ERROR: %s\n", yytext); exit(1); }
%%
When I run this (or my complete version), and pass it the input 39if
, the error rule is matched and the output is ERROR: 39if
, when I'd like it to be:
number: 39
keyword: if
(I.e. the same as if I entered 39 if
as the input.)
Going by the manual, I have a hunch that the cause is that the error rule matches a longer possible input than the number and keyword rules, and flex will prefer it. That said, I have no idea how to resolve this situation. It seems unfeasible to write an explicit regexp that will reject all non-error input, and I don't know how else to write a "catch-all" rule for the sake of handling lexer errors.
UPDATE: I suppose I could just make the catch-all rule be . { exit(1); }
but I'd like to get some nicer debug output than "I got confused on line 1".
39 if
(i.e. separated by white space), or is that illegal? – Moritaifa<bthenx=4
(no spaces between names and keywords), you'll need something more complex. – Steelworks