I am messing around writing a toy programming language in OCaml with ocamllex, and I am trying to make the language sensitive to indentation changes, Python-style, but I am having a problem matching the beginning of a line with ocamllex's regex rules. I am used to using ^ to match the beginning of a line, but in OCaml that is the string concatenation operator. Google searches haven't turned up much, unfortunately :( Does anyone know how this would work?
OCamllex matching beginning of line?
The grammar for the regular expressions can be found here: caml.inria.fr/pub/docs/manual-ocaml/manual026.html#regexp – Hallucinatory
Yeah, I saw that. I was hoping that I was just missing something and that there was more than that. Thanks! – Progenitive
No problem. It's definitely a natural assumption, given their claim to be similar to Lex and general knowledge of regular expression languages. I'm not sure why they didn't include it; this could be something to ask the fa.caml Google group. – Hallucinatory
I'm not sure if there is explicit support for zero-length matching symbols (like ^ in Perl-style regular expressions, which matches a position rather than a substring). However, you should be able to let your lexer turn newlines into an explicit token, something like this:
parser.mly
%token EOL
%token <int> EOLWS
/* other stuff here */
%%
main:
    /* $1 is the length of the newline plus the spaces, hence the "- 1" */
    EOL stmt   { MyStmtDataType(0, $2) }
  | EOLWS stmt { MyStmtDataType($1 - 1, $2) }
;
lexer.mll
{
  open Parser
  exception Eof
}
rule token = parse
    [' ' '\t']            { token lexbuf }  (* skip other blanks *)
  | ['\n'][' ']+ as lxm   { EOLWS (String.length lxm) }  (* length includes the '\n' *)
  | ['\n']                { EOL }
  (* ... *)
This is untested, but the general idea is:
- Treat newlines as statement 'starters'
- Measure whitespace that immediately follows the newline and pass its length as an int
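To see what those two lexer rules would produce without running ocamllex, here is a rough standard-library-only sketch of the same idea. The type tok and the function newline_tokens are hypothetical names made up for this illustration; they are not part of the generated Parser module, and like the rule above the sketch only counts ' ' as indentation.

```ocaml
(* Hypothetical token type mirroring the EOL / EOLWS tokens above. *)
type tok =
  | EOL            (* a newline with no spaces directly after it *)
  | EOLWS of int   (* a newline plus spaces; the int is the matched length *)

(* Scan [s] and emit one token per newline, roughly as the two lexer
   rules would. Everything that is not a newline is simply skipped. *)
let newline_tokens (s : string) : tok list =
  let n = String.length s in
  (* Index of the first non-space character at or after [i]. *)
  let rec spaces i = if i < n && s.[i] = ' ' then spaces (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = '\n' then begin
      let j = spaces (i + 1) in
      (* Newline plus spaces, the same value as String.length lxm. *)
      let len = j - i in
      let t = if len = 1 then EOL else EOLWS len in
      go j (t :: acc)
    end
    else go (i + 1) acc
  in
  go 0 []
```

For the input "\na\n  b\n    c" this gives [EOL; EOLWS 3; EOLWS 5], so after the "- 1" in the grammar the indentation widths are 0, 2 and 4.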
Caveat: you will need to preprocess your input so that it starts with a single \n if it doesn't already begin with one.
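A minimal sketch of that preprocessing step, assuming the whole source is available as a string. The names ensure_leading_newline and lexbuf_of_source are hypothetical; Lexing.from_string is the standard library function for building a lexbuf from a string.

```ocaml
(* Hypothetical helper: prepend '\n' unless the text already starts with one,
   so the first statement is also preceded by an EOL / EOLWS token. *)
let ensure_leading_newline (src : string) : string =
  if String.length src > 0 && src.[0] = '\n' then src else "\n" ^ src

(* Build the lexbuf from the normalized source text. *)
let lexbuf_of_source (src : string) : Lexing.lexbuf =
  Lexing.from_string (ensure_leading_newline src)
```

The resulting lexbuf can then be passed to the generated lexer and parser as usual.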