OCaml + Menhir Compiling/Writing

I'm a complete newbie when it comes to OCaml. I only started using the language about two weeks ago, but I've been tasked with writing a syntax analyzer (a parser plus a lexer, whose job is to accept or reject a sentence) for a made-up language using Menhir. I've found some materials on the internet regarding OCaml and Menhir:

The Menhir Manual.

This webpage for some French University course.

A short Menhir tutorial on Toss's homepage at Sourceforge.

A Menhir example on github by derdon.

A book on OCaml (with a few things about ocamllex+ocamlyacc).

A random ocamllex tutorial by SooHyoung Oh.

And the examples that come with Menhir's source code.

(I can't put more than two hyperlinks, so I can't link you directly to some of the websites I'm mentioning here. Sorry!)

So, as you can see, I've been desperately searching for more and more material to aid me in the making of this program. Unfortunately, I still cannot grasp many concepts, and as such, I'm having many, many difficulties.

For starters, I have no idea how to correctly compile my program. I've been using the following command:

ocamlbuild -use-menhir -menhir "menhir --external-tokens Tokens" main.native

My program is divided into four files: main.ml, lexer.mll, parser.mly and tokens.mly. main.ml reads its input from a file whose path is given as a command-line argument.

let filename = Sys.argv.(1)

let () =
    let inBuffer = open_in filename in
    let lineBuffer = Lexing.from_channel inBuffer in
    try
        let acceptance = Parser.main Lexer.main lineBuffer in
        match acceptance with
            | true -> print_string "Accepted!\n"
            | false -> print_string "Not accepted!\n"
    with
        | Lexer.Error msg -> Printf.fprintf stderr "%s%!\n" msg
        | Parser.Error -> Printf.fprintf stderr "At offset %d: syntax error.\n%!" (Lexing.lexeme_start lineBuffer)

The second file is lexer.mll.

{
  open Tokens
  exception Error of string
}

rule main = parse
  | [' ' '\t']+
      { main lexbuf }
  | ['0'-'9']+ as integer
      { INT (int_of_string integer) }
  | "True"
      { BOOL true }
  | "False"
      { BOOL false }
  | '+'
      { PLUS }
  | '-'
      { MINUS }
  | '*'
      { TIMES }
  | '/'
      { DIVIDE }
  | "def"
      { DEF }
  | "int"
      { INTTYPE }
  | ['A'-'Z' 'a'-'z' '_']['0'-'9' 'A'-'Z' 'a'-'z' '_']* as s
      { ID (s) }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | '>'
      { LARGER }
  | '<'
      { SMALLER }
  | ">="
      { EQLARGER }
  | "<="
      { EQSMALLER }
  | "="
      { EQUAL }
  | "!="
      { NOTEQUAL }
  | '~'
      { NOT }
  | "&&"
      { AND }
  | "||"
      { OR }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | "writeint"
      { WRITEINT }
  | '\n'
      { EOL }
  | eof
      { EOF }
  | _
      { raise (Error (Printf.sprintf "At offset %d: unexpected character.\n" (Lexing.lexeme_start lexbuf))) }

The third file is parser.mly.

%start <bool> main
%%

main:
| WRITEINT INT { true }

The fourth one is tokens.mly.

%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR

%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB

%{
type token =
  | ID of (string)
  | INT
  | BOOL
  | DEF
  | INTTYPE
  | LPAREN
  | RPAREN
  | WRITEINT
  | PLUS
  | MINUS
  | TIMES
  | DIVIDE
  | LARGER
  | SMALLER
  | EQLARGER
  | EQSMALLER
  | EQUAL
  | NOTEQUAL
  | NOT
  | AND
  | OR
  | EOF
  | EOL
%}

%%

Now, I know there are a lot of unused symbols here, but I intend to use them in my parser. No matter how many changes I make to the files, the compiler keeps blowing up in my face. I have tried everything I can think of, and nothing seems to work. What is making ocamlbuild explode in a plethora of errors about unbound constructors and undefined start symbols? What command should I be using to compile the program properly? Where can I find meaningful materials to learn about Menhir?

Jerrold answered 27/3, 2012 at 20:39 Comment(0)

A simpler way to do this is to remove the Parser/Tokens separation. As Thomas noted, there is no need for a declaration type token = ..., because it is automatically produced by menhir from the %token directives.

So you can define parser.mly as:

%start <bool> main

%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR

%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%%

main:
| WRITEINT INT { true }

and lexer.mll as:

{
  open Parser
  exception Error of string
}

[...] (* rest of the code not shown here *)

then remove tokens.mly, and compile with

ocamlbuild -use-menhir main.native

and it all works well.
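
One thing to watch once this compiles: in the lexer from the question, the "writeint" rule comes after the identifier rule. ocamllex picks the longest match and, among equally long matches, the rule that appears first, so writeint would come back as ID rather than WRITEINT. Moving the keyword rules above the identifier rule fixes it. Another common idiom is a keyword table consulted from the identifier rule. The sketch below is plain OCaml with a stand-in token type (in lexer.mll the constructors would come from open Parser); the names keywords and ident_or_keyword are hypothetical.

```ocaml
(* Stand-in for the constructors Menhir generates in Parser;
   in lexer.mll you would write `open Parser` instead. *)
type token = ID of string | DEF | INTTYPE | WRITEINT

(* Keyword table: reserved words map to their tokens. *)
let keywords : (string, token) Hashtbl.t = Hashtbl.create 8

let () =
  List.iter
    (fun (word, tok) -> Hashtbl.replace keywords word tok)
    [ ("def", DEF); ("int", INTTYPE); ("writeint", WRITEINT) ]

(* Hypothetical helper: return the keyword token if [s] is reserved,
   otherwise treat [s] as an ordinary identifier. *)
let ident_or_keyword (s : string) : token =
  match Hashtbl.find_opt keywords s with
  | Some tok -> tok
  | None -> ID s
```

With this in the lexer header, the identifier rule's action would become { ident_or_keyword s } instead of { ID (s) }, and the separate "def", "int" and "writeint" rules could be dropped.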

Zackzackariah answered 28/3, 2012 at 4:34 Comment(2)
Indeed, having only one mly is much simpler. I haven't proposed that solution in my answer because I was assuming that @Jerrold wanted to use the "separate compilation of parsing units" feature of menhir. - Eris
Thanks for all the help, guys, you've no idea how precious your posts were for me! Finally, things are starting to make some sense. - Jerrold

So first, you don't need to repeat the tokens in tokens.mly; the %token declarations on their own are enough:

%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR

%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB

%%
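
For reference, the token type that Menhir derives from these declarations looks roughly like the sketch below. Note that INT and BOOL carry a payload, which the hand-written type in the question left out. The describe function is a hypothetical debugging helper, not something Menhir generates.

```ocaml
(* Sketch of the token type derived from the %token declarations.
   Tokens declared with a <type> become constructors with a payload. *)
type token =
  | ID of string
  | INT of int
  | BOOL of bool
  | EOF | EOL | DEF | INTTYPE | LPAREN | RPAREN | WRITEINT
  | PLUS | MINUS | TIMES | DIVIDE
  | LARGER | SMALLER | EQLARGER | EQSMALLER | EQUAL | NOTEQUAL
  | NOT | AND | OR

(* Hypothetical helper (not part of Menhir's output):
   render a few tokens as text, e.g. to trace what the lexer returns. *)
let describe : token -> string = function
  | ID s -> "ID " ^ s
  | INT n -> "INT " ^ string_of_int n
  | BOOL b -> "BOOL " ^ string_of_bool b
  | WRITEINT -> "WRITEINT"
  | _ -> "other token"
```

This is why the lexer actions INT (int_of_string integer) and BOOL true only type-check against the generated type, not against a declaration where INT and BOOL take no argument.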

Then, I don't know the magic option to pass to ocamlbuild and I don't know menhir very well, but as I understand it, you need to "pack" all the .mly files into one parser unit:

menhir tokens.mly parser.mly -base parser

Then, if you replace every occurrence of Tokens by Parser in lexer.mll, ocamlbuild -no-hygiene main.byte should work. Note, however, that there may be a cleverer way to do it.

Eris answered 27/3, 2012 at 21:26 Comment(0)

I ran into the same problem, except that in addition the parser needed modules outside of the current directory. I couldn't figure out how to invoke ocamlbuild to specify that parser.{ml,mli} had to be built from 3 mly files, so I simply made a makefile that:

  • copies the modules' .cmi files from _build into the current directory (to satisfy menhir --infer)
  • invokes menhir
  • removes the copied modules to satisfy ocamlbuild
  • then invokes ocamlbuild

I am not satisfied with it, so I am interested in any better alternative, but if you really have to finish your project with minimal effort, I guess that's the way to go.

edit: Actually, there is no need to copy and remove the compiled modules, just pass the option to menhir at the second step: menhir --ocamlc "ocamlc -I \"../_build/modules/\"" --infer --base parser

Sadly, this still means that the parser is generated against the previous compilation of the modules, so an unnecessary (and failed) first compilation is to be expected.

Haematozoon answered 1/3, 2013 at 9:42 Comment(0)
