Feed ocamlyacc parser from explicit token list?
Asked Answered
L

3

6

Is it possible to feed an OCamlYacc-generated parser an explicit token list for analysis?

I'd like to use OCamlLex to explicitly generate a token list which I then analyze using a Yacc-generated parser later. However, the standard use case generates a parser that calls a lexer implicitly for the next token. Here tokens are computed during the yacc analysis rather than before. Conceptually a parser should only work on tokens but a Yacc-generated parser provides an interface that relies on a lexer which in my case I don't need.

Lyford answered 5/6, 2012 at 14:49 Comment(0)
R
5

If you already have a list of tokens, you can just go the ugly way and ignore the lexing buffer altogether. After all, the parse-from-lexbuf function that your parser expects is a non-pure function :

let my_tokens = ref [ (* WHATEVER *) ]
let token lexbuf = 
  match !my_tokens with 
    | []     -> EOF 
    | h :: t -> my_tokens := t ; h 

let ast = Parser.parse token (Lexbuf.from_string "")

On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:

let deflate token = 
  let q = Queue.create () in
  fun lexbuf -> 
    if not (Queue.is_empty q) then Queue.pop q else   
      match token lexbuf with 
        | [   ] -> EOF 
        | [tok] -> tok
        | hd::t -> List.iter (fun tok -> Queue.add tok q) t ; hd 

let ast = Parser.parse (deflate my_lexer) lexbuf
Reactance answered 5/6, 2012 at 17:28 Comment(1)
A yacc-generated parser provides semantic actions access to position information hidden in the lexbuf state. I am therefore not sure that deflate will work but it gives me an idea.Lyford
H
7

As already mentioned by Jeffrey, Menhir specifically offers, as part of its runtime library, a module to the parsers with any kind of token stream (it just asks for a unit -> token function): MenhirLib.Convert.

(You could even use this code without using Menhir, with ocamlyacc instead. In practice the conversion is not terribly complicated so you could even re-implement it yourself.)

Husha answered 5/6, 2012 at 19:35 Comment(1)
That's very useful. I never used Menhir but the manual looks very convincing (I had overlooked the Convert module, though) and its author is well respected in the OCaml community.Lyford
R
5

If you already have a list of tokens, you can just go the ugly way and ignore the lexing buffer altogether. After all, the parse-from-lexbuf function that your parser expects is a non-pure function :

let my_tokens = ref [ (* WHATEVER *) ]
let token lexbuf = 
  match !my_tokens with 
    | []     -> EOF 
    | h :: t -> my_tokens := t ; h 

let ast = Parser.parse token (Lexbuf.from_string "")

On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:

let deflate token = 
  let q = Queue.create () in
  fun lexbuf -> 
    if not (Queue.is_empty q) then Queue.pop q else   
      match token lexbuf with 
        | [   ] -> EOF 
        | [tok] -> tok
        | hd::t -> List.iter (fun tok -> Queue.add tok q) t ; hd 

let ast = Parser.parse (deflate my_lexer) lexbuf
Reactance answered 5/6, 2012 at 17:28 Comment(1)
A yacc-generated parser provides semantic actions access to position information hidden in the lexbuf state. I am therefore not sure that deflate will work but it gives me an idea.Lyford
D
1

The OCamlYacc interface does look pretty complicated; it seems to require a Lexing.lexbuf. Maybe you could consider using Lexing.from_string to feed a fixed string rather than a fixed sequence of tokens. You could also look at Menhir. I haven't used it, but it gets excellent reviews here whenever anybody mentions OCaml parser generators. It might have a more flexible lexing interface.

Drear answered 5/6, 2012 at 15:12 Comment(1)
Menhir looks like a good alternative to OCamlYacc for many reasons. However, it also seems pretty closely tied to the lexer. Lexing.from_string is not an alternative because the essential problem is that some lexer actions generated two tokens rather than one because I can only recognize the token following an arbitrary string and end up with a string token and the one following it. Hence I had planned on building up a list of tokens first. Maybe I have to introduce weird hybrid tokens to work around the limitation.Lyford

© 2022 - 2024 — McMap. All rights reserved.