Using Ocamllex for lexing strings (The Tiger Compiler)

I'm trying to follow Appel's "Modern Compiler Implementation in ML" and am writing the lexer using Ocamllex.

The specification asks for the lexer to return strings after translating escape sequences. The following code is an excerpt from the ocamllex input file:

 rule tiger = parse
 ...
 | '"'
     { let buffer = Buffer.create 1 in
       STRING (stringl buffer lexbuf)
     }
 and  stringl buffer = parse
 | '"' { Buffer.contents buffer }
 | "\\t" { Buffer.add_char buffer '\t'; stringl buffer lexbuf }
 | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
 | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
 | '\\' '"' { Buffer.add_char buffer '"'; stringl buffer lexbuf }
 | '\\' '\\' { Buffer.add_char buffer '\\'; stringl buffer lexbuf }
 | eof { raise End_of_file }
 | _ as char { Buffer.add_char buffer char; stringl buffer lexbuf }

Is there a better way?

Juliannjulianna answered 26/4, 2011 at 16:56 Comment(0)

You may be interested in how the OCaml compiler's own lexer does this (search its source for "and string"). In essence it uses the same method as yours, with three differences. It does not have your local buffer (I find your code nicer on this point, although it is slightly less efficient). It is a bit more complex, because it supports more escape sequences. And it uses an escape table (char_for_backslash) to factor similar rules into one.

Also, the rule "\\n" appears twice in your code, and the second copy can be deleted. I also think 1 is a very pessimistic estimate of your string length; I would rather use 20 as the initial buffer size, to avoid needless resizing.
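
To make the escape-table idea and the larger initial buffer concrete, here is a minimal sketch in plain OCaml. It is not the compiler's actual code: char_for_backslash borrows the name mentioned above but handles only the escapes from the question, and unescape is a hypothetical helper that imitates by hand what the stringl rule does, so that the idea can be tried without running ocamllex.

```ocaml
(* Escape table in the spirit of char_for_backslash: maps the character
   that follows a backslash to the character it denotes. Only the escapes
   from the question are special-cased (an assumption of this sketch). *)
let char_for_backslash = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c (* the quote and backslash escapes denote themselves *)

(* Hypothetical helper, not a library function: decodes the body of a
   string literal (without the surrounding quotes), one character at a
   time, the way the stringl rule consumes its input. *)
let unescape (s : string) : string =
  (* 20 is only an initial size; the buffer still grows on demand. *)
  let buffer = Buffer.create 20 in
  let n = String.length s in
  let rec loop i =
    if i >= n then Buffer.contents buffer
    else if s.[i] = '\\' && i + 1 < n then begin
      (* Backslash followed by another character: look it up in the table. *)
      Buffer.add_char buffer (char_for_backslash s.[i + 1]);
      loop (i + 2)
    end else begin
      (* Ordinary character (or a trailing lone backslash): copy it as is. *)
      Buffer.add_char buffer s.[i];
      loop (i + 1)
    end
  in
  loop 0
```

In the ocamllex file, the separate escape clauses of stringl could then collapse into a single clause, roughly:

 | '\\' (['\\' '"' 'n' 't'] as c)
     { Buffer.add_char buffer (char_for_backslash c); stringl buffer lexbuf }

Listing the accepted characters in the class keeps unknown escapes out of this clause, so they can still be reported as errors elsewhere.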

Heshvan answered 26/4, 2011 at 17:9 Comment(2)
Thanks! A goldmine of examples. I never thought the OCaml compiler code would be so clear. - Juliannjulianna
@Juliannjulianna You should also keep in mind that some of the code there is quite old, so not everything reflects the current best practice of OCaml development. - Heshvan
