I'm a complete newbie when it comes to OCaml. I started using the language only about two weeks ago, but I've been tasked with writing a syntax analyzer (a lexer plus a parser, whose job is to accept or reject a sentence) for a made-up language using Menhir. I've found some materials on the internet regarding OCaml and Menhir:
The Menhir Manual.
This webpage for some French University course.
A short Menhir tutorial on Toss's homepage at Sourceforge.
A Menhir example on github by derdon.
A book on OCaml (with a few things about ocamllex+ocamlyacc).
A random ocamllex tutorial by SooHyoung Oh.
And the examples that come with Menhir's source code.
(I can't put more than two hyperlinks, so I can't link you directly to some of the websites I'm mentioning here. Sorry!)
So, as you can see, I've been desperately searching for more and more material to aid me in the making of this program. Unfortunately, I still cannot grasp many concepts, and as such, I'm having many, many difficulties.
For starters, I have no idea how to correctly compile my program. I've been using the following command:
ocamlbuild -use-menhir -menhir "menhir --external-tokens Tokens" main.native
My program is divided into four files: main.ml, lexer.mll, parser.mly, and tokens.mly. main.ml reads its input from a file whose path is given as a command-line argument.
let filename = Sys.argv.(1)

let () =
  let inBuffer = open_in filename in
  let lineBuffer = Lexing.from_channel inBuffer in
  try
    let acceptance = Parser.main Lexer.main lineBuffer in
    match acceptance with
    | true -> print_string "Accepted!\n"
    | false -> print_string "Not accepted!\n"
  with
  | Lexer.Error msg -> Printf.fprintf stderr "%s%!\n" msg
  | Parser.Error -> Printf.fprintf stderr "At offset %d: syntax error.\n%!" (Lexing.lexeme_start lineBuffer)
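A side note on the error messages in main.ml: Lexing.lexeme_start returns a character offset from the start of the input, which is awkward to map back to a source line. Below is a minimal sketch of a line/column formatter. It assumes the lexer keeps line numbers up to date by calling Lexing.new_line on every '\n' (the lexer shown in this post does not do that), and describe_position is a hypothetical helper name, not part of any library.

```ocaml
(* Hypothetical helper: render a Lexing.position as "line L, column C".
   pos_cnum is the absolute character offset; pos_bol is the offset of the
   start of the current line, so their difference is a 0-based column. *)
let describe_position (pos : Lexing.position) : string =
  Printf.sprintf "line %d, column %d"
    pos.Lexing.pos_lnum
    (pos.Lexing.pos_cnum - pos.Lexing.pos_bol + 1)

let () =
  (* Build a position by hand just to exercise the helper. *)
  let pos =
    { Lexing.pos_fname = "input.txt"; pos_lnum = 3; pos_bol = 20; pos_cnum = 27 }
  in
  print_endline (describe_position pos)
```

In the driver, the position of the offending token could be obtained with Lexing.lexeme_start_p lineBuffer and passed to this helper instead of printing the raw offset.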
The second file is lexer.mll.
{
  open Tokens
  exception Error of string
}

rule main = parse
  | [' ' '\t']+
      { main lexbuf }
  | ['0'-'9']+ as integer
      { INT (int_of_string integer) }
  | "True"
      { BOOL true }
  | "False"
      { BOOL false }
  | '+'
      { PLUS }
  | '-'
      { MINUS }
  | '*'
      { TIMES }
  | '/'
      { DIVIDE }
  | "def"
      { DEF }
  | "int"
      { INTTYPE }
  | ['A'-'Z' 'a'-'z' '_']['0'-'9' 'A'-'Z' 'a'-'z' '_']* as s
      { ID (s) }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | '>'
      { LARGER }
  | '<'
      { SMALLER }
  | ">="
      { EQLARGER }
  | "<="
      { EQSMALLER }
  | "="
      { EQUAL }
  | "!="
      { NOTEQUAL }
  | '~'
      { NOT }
  | "&&"
      { AND }
  | "||"
      { OR }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | "writeint"
      { WRITEINT }
  | '\n'
      { EOL }
  | eof
      { EOF }
  | _
      { raise (Error (Printf.sprintf "At offset %d: unexpected character.\n" (Lexing.lexeme_start lexbuf))) }
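A note on rule order: ocamllex picks the longest match and, when two rules match the same length, the one listed first. A keyword rule that comes after the identifier rule (as "writeint" does above) therefore never fires, because the identifier rule matches the same text. One common way to avoid depending on rule order is to match every word with the identifier rule and look it up in a keyword table. Here is a minimal, self-contained sketch of that lookup. The token type is a cut-down stand-in (in the real lexer the constructors would come from Tokens), and ident_or_keyword is a hypothetical helper name.

```ocaml
(* Stand-in token type for illustration only; the real one would come
   from the Tokens module. *)
type token = DEF | INTTYPE | WRITEINT | ID of string

(* Keyword table: maps reserved words to their tokens. *)
let keywords : (string, token) Hashtbl.t = Hashtbl.create 8

let () =
  List.iter
    (fun (word, tok) -> Hashtbl.add keywords word tok)
    [ ("def", DEF); ("int", INTTYPE); ("writeint", WRITEINT) ]

(* Hypothetical helper, meant to be called from the identifier rule's
   action: reserved words become keyword tokens, anything else an ID. *)
let ident_or_keyword (s : string) : token =
  match Hashtbl.find_opt keywords s with
  | Some tok -> tok
  | None -> ID s
```

With something like this in the lexer header, the identifier rule's action could become { ident_or_keyword s }, and the separate "def", "int", and "writeint" rules could be dropped.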
The third file is parser.mly.
%start <bool> main
%%
main:
| WRITEINT INT { true }
The fourth one is tokens.mly
%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR
%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%{
  type token =
    | ID of (string)
    | INT
    | BOOL
    | DEF
    | INTTYPE
    | LPAREN
    | RPAREN
    | WRITEINT
    | PLUS
    | MINUS
    | TIMES
    | DIVIDE
    | LARGER
    | SMALLER
    | EQLARGER
    | EQSMALLER
    | EQUAL
    | NOTEQUAL
    | NOT
    | AND
    | OR
    | EOF
    | EOL
%}
%%
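For comparison: when Menhir processes a token file with --only-tokens, it derives the token type from the %token declarations on its own, taking each payload type from the <...> annotation. The sketch below is a hand-written approximation of that type for the declarations above, for illustration only (the constructor order Menhir actually emits may differ). Note that INT and BOOL carry values here, unlike in the hand-written type inside the %{ ... %} header.

```ocaml
(* Hand-written approximation of the type Menhir would derive from the
   %token declarations; constructor order is not significant here. *)
type token =
  | ID of string
  | INT of int
  | BOOL of bool
  | EOF | EOL | DEF | INTTYPE | LPAREN | RPAREN | WRITEINT
  | PLUS | MINUS | TIMES | DIVIDE
  | LARGER | SMALLER | EQLARGER | EQSMALLER | EQUAL | NOTEQUAL
  | NOT | AND | OR

(* The lexer actions build tokens with payloads, e.g. INT (int_of_string ...)
   and BOOL true, which only type-checks against a definition like this. *)
let examples : token list =
  [ INT (int_of_string "42"); BOOL true; ID "x"; WRITEINT ]
```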
Now, I know there are a lot of unused symbols here, but I intend to use them in my parser. No matter how many changes I make to the files, the compiler keeps blowing up in my face. I have tried everything I can think of, and nothing seems to work. What is making ocamlbuild explode in a plethora of errors about unbound constructors and undefined start symbols? What command should I use to compile the program properly? Where can I find meaningful materials to learn about Menhir?
mly is much simpler. I haven't proposed that solution in my answer because I was assuming that @Jerrold wanted to use the "separate compilation of parsing units" feature of menhir. – Eris