As proposed by @gsg, the correct way to solve this is to use an ordinary token lexer that accepts both lower- and upper-case, and then look up keywords in a table. Having a regexp for each keyword is actually an anti-pattern mentioned by the documentation of ocamllex:
12.7 Common errors
ocamllex: transition table overflow, automaton is too big
The deterministic automata generated by ocamllex are limited to at most 32767 transitions. The message above indicates that your lexer definition is too complex and overflows this limit. This is commonly caused by lexer definitions that have separate rules for each of the alphabetic keywords of the language, as in the following example. […]
http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html#toc111
The documentation continues with an explicit code example and a rewrite using a lookup table.
The lookup-table should encapsulate the “case-insensitivity” of lookups, we should therefore use a functorial associative structure (read: Map or HashTbl) that allows us to define our own comparison function. Since our lookup-table is likely to be immutable, we pick the Map:
{
type token = Case | Test | Ident of string
module KeywordTable =
Map.Make(struct
type t = string
let compare a b =
String.(compare (lowercase a) (lowercase b))
end)
let keyword_table =
List.fold_left
(fun (k, v) -> KeywordTable.add k v))
[
"case", Case;
"test", Test;
]
KeywordTable.empty
}
let ident_char = ['a'-'z' 'A'-'Z' '_']
rule next_token = parse
| ident_char+ as s {
try KeywordTable.find keyword_table s
with Not_found -> Ident s
}