OCamlLex case-insenstitive

Asked 28/1, 2016 at 17:50 Answered 29/1, 2016 at 16:8

Is there a way to have case in-sensitive token in Ocamllex specification? I already tried to make case in-sensitive token in this way:

let token = parser
    ...
   | ['C''c']['A''a']['S''s']['E''e'] { CASE }
    ...

but I'm searching for something else, if exists.

Franklinfranklinite answered 28/1, 2016 at 17:50 Comment(1)

show us what youve tried – Audette 28/1, 2016 at 18:2

Use an ordinary token lexer that accepts both lower- and upper-case, and look up keywords in a table, ignoring case:

{
type token = Case | Test | Ident of string

let keyword_tbl = Hashtbl.create 64

let _ = List.iter (fun (name, keyword) ->
    Hashtbl.add keyword_tbl name keyword) [
    "case", Case;
    "test", Test;
  ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      let canon = String.lowercase s in
      try Hashtbl.find keyword_tbl canon
      with Not_found ->
        (* `Ident canon` if you want case-insensitive vars as well
         * as keywords *)
        Ident s
    }

Befuddle answered 29/1, 2016 at 4:38 Comment(0)

As proposed by @gsg, the correct way to solve this is to use an ordinary token lexer that accepts both lower- and upper-case, and then look up keywords in a table. Having a regexp for each keyword is actually an anti-pattern mentioned by the documentation of ocamllex:

12.7 Common errors

ocamllex: transition table overflow, automaton is too big The deterministic automata generated by ocamllex are limited to at most 32767 transitions. The message above indicates that your lexer definition is too complex and overflows this limit. This is commonly caused by lexer definitions that have separate rules for each of the alphabetic keywords of the language, as in the following example. […]

http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html#toc111

The documentation continues with an explicit code example and a rewrite using a lookup table.

The lookup-table should encapsulate the “case-insensitivity” of lookups, we should therefore use a functorial associative structure (read: Map or HashTbl) that allows us to define our own comparison function. Since our lookup-table is likely to be immutable, we pick the Map:

{
type token = Case | Test | Ident of string

module KeywordTable =
  Map.Make(struct
    type t = string
    let compare a b =
      String.(compare (lowercase a) (lowercase b))
  end)

let keyword_table =
  List.fold_left
    (fun (k, v) -> KeywordTable.add k v))
    [
      "case", Case;
      "test", Test;
    ]
    KeywordTable.empty
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      try KeywordTable.find keyword_table s
      with Not_found -> Ident s
    }

Promontory answered 29/1, 2016 at 16:8 Comment(0)

12.7 Common errors

Recommended topics

Hot tags