Better way for parser combinators in C?
Asked Answered
E

5

6

I'm trying to bootstrap (a subset of) C from scratch, without using extra dependencies (parser generators, libraries, etc.). Also I want to make use of the idea of parser combinators, which is a fantastic technique in functional programming. I would like to borrow this idea from the functional world to procedural C, in a concise and practical way.

I tried to implement some necessary parser combinators for the following toy grammar, which is also an example from the book, Implementing Functional Languages - a tutorial, of Simon Peyton Jones.

greeting -> hg person "!"
hg       -> "hello"
          | "goodbye"

where person is any token beginning with a letter. For example, the token list

["goodbye", "James", "!"]

is parsed into

[(("goodbye", "James"), ["!"])]

(The book uses Haskell, and it's hard to make it language-agnostic, but you get the idea :-)

I implemented this in C, and you can view the code here: https://gist.github.com/4451478

This implementation costs 200+ lines of C code, which is far more than the ~20 lines of Haskell as written in the book. So I'm not sure whether I'm on the right track of doing parser combinators in C, and if there's any possible improvements. Any suggestions are welcomed. Thanks in advance.

Eating answered 4/1, 2013 at 10:27 Comment(6)
You could start using some library that provides higher-level data structures. I favor glib.Ledesma
@Ledesma I've updated my question with more details. I'm aware of the data structures, but that's apparently not what I'm asking for in this question. The keyword is parser combinators here.Eating
I realize that, which is why I only commented instead of posting an answer. You did spend some words in the question pointing out the number of lines of code, which made it sound like something you consider to be a problem, too.Ledesma
I you are aware of the limitations of the core of C and don't want to hear about implementations of lists etc, then what is your question. As your "question" stands, it looks more like "I am not happy with C". This is not a question.Vestment
@Ledesma Thanks anyway, since glib is really good :-)Eating
@AmigableClarkKant Sorry, I was in a hurry writing this question. I have cleaned it up and hope it can be understood better now.Eating
V
3

Try Cesium3 which is an implementation of parser combinators for C. (LLVM.)

Vestment answered 4/1, 2013 at 10:48 Comment(3)
I've updated my question with more details. I'm aware of the data structures, but that's apparently not what I'm asking for in this question. The keyword is parser combinators here.Eating
@XiaoJia, I updated my answer, but not sure a library is what you want?Vestment
I've done something similar with: github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_cSophia
G
8

I'm looking into the subject myself and I'm following the work of Daniel Holden, author of mpc , a very well written parser combinator library for C, which allows, among other things, to embed EBNF and Regex inside C code:

  mpc_parser_t *Expr  = mpc_new("expression");
  mpc_parser_t *Prod  = mpc_new("product");
  mpc_parser_t *Value = mpc_new("value");
  mpc_parser_t *Maths = mpc_new("maths");

  mpca_lang(MPCA_LANG_PREDICTIVE,
    " expression : <product> (('+' | '-') <product>)*; "
    " product : <value>   (('*' | '/')   <value>)*;    "
    " value : /[0-9]+/ | '(' <expression> ')';         "
    " maths : /^/ <expression> /$/;                    "
    Expr, Prod, Value, Maths, NULL);

Daniel Holden, also, has written an online book where he demonstrated how it's easy to write a new language using his library. The book is entitled "Build your own Lisp". I think you will find this really useful for your project. Last but not least, in the examples of the library, there is a ready-made program which generate a parser for a subset of C. ;-)

Gaily answered 2/9, 2014 at 1:57 Comment(0)
V
3

Try Cesium3 which is an implementation of parser combinators for C. (LLVM.)

Vestment answered 4/1, 2013 at 10:48 Comment(3)
I've updated my question with more details. I'm aware of the data structures, but that's apparently not what I'm asking for in this question. The keyword is parser combinators here.Eating
@XiaoJia, I updated my answer, but not sure a library is what you want?Vestment
I've done something similar with: github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_cSophia
P
3

Implementing parser combinator in C is a topic that also interests me, and recently, I wrote a parser combinator in C: https://github.com/petercommand/cparc

The following is a test case from my code that tries to parse comma separated numbers into a list of numbers. I use a list of parsers (and generate the parser from the 'list of parser' by calling parser_chain in the code) to mimic the 'do notation' in Haskell, though not as elegant.

parser_dp_return test_parser7_rest_dp(dynamic_parser_closure* ctx, input_t input) {
  parser_dp_return dp_ret;
  dp_ret.obj = ctx->objs[1]->obj;
  dp_ret.status = PARSER_NORMAL;
  dp_ret.i = input;
  dp_ret.discard_obj_callback = NULL;
  return dp_ret;
}

parser_dp_return test_parser7_full_dp(dynamic_parser_closure* ctx, input_t input) {
  parser_dp_return dp_ret;
  list* result = list_new();
  list_push_back(result, ctx->objs[0]->obj);//num
  if(ctx->objs[1] && ctx->objs[1]->obj) {
    list_append(result, ctx->objs[1]->obj);
  }
  dp_ret.status = PARSER_NORMAL;
  dp_ret.i = input;
  dp_ret.discard_obj_callback = (void (*)(void *))&list_delete;

  dp_ret.obj = result;
  return dp_ret;
}

bool test_parser7() {//comma separated values
  parser* number = num();
  parser* comma = symbol(',');
  parser* rest_parser = parser_chain_final(test_parser7_rest_dp);
  list* parser_chain_list = list_new();
  list_push_back(parser_chain_list, comma);//ctx 0
  list_push_back(parser_chain_list, number);//ctx 1
  list_push_back(parser_chain_list, rest_parser);

  parser* rest = parser_chain(parser_chain_list);
  list_delete(parser_chain_list);
  parser* many_rest = many(rest);

  list* parser_chain_full = list_new();
  list_push_back(parser_chain_full, number);//ctx 0
  list_push_back(parser_chain_full, many_rest);//ctx 1
  parser* full_parser = parser_chain_final(test_parser7_full_dp);
  list_push_back(parser_chain_full, full_parser);
  parser* final = parser_chain(parser_chain_full);

  const char* input = "1,20,300,4000,50000";
  input_t i;
  input_init(&i, input);
  parser_dp_return dp_ret = parse(final, i);
  parser_delete(number);
  parser_delete(comma);
  parser_delete(rest_parser);
  parser_delete(rest);
  parser_delete(many_rest);
  parser_delete(full_parser);
  parser_delete(final);
  bool result = true;
  test_true(&result, dp_ret.status == PARSER_NORMAL);
  list* l = dp_ret.obj;
  list_item* li = l->head;
  test_true(&result, ptr_to_int(li->item) == 1);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 20);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 300);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 4000);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 50000);
  return result;
}
Pictor answered 1/2, 2016 at 18:9 Comment(0)
C
0

I think your C code is reasonably compact for what you're doing. Haskell is just more compact for this sort of thing. Start with closures. You need functions closed over the surrounding scope to do this, and your code simulates those partially. Haskell has compact notation for lists, and properly used they are pretty good for trees like AST's, and C requires you to build your own AST library and work with "->left" and "->right" or else try to wrap those in concise macros.

Even in the C language implementations I've seen, and the unrefined C++ implementation I've written myself, I think parser combinators are a satisfying alternative to recursive descent code, which has been my parsing method of choice in the past.

Candace answered 13/4, 2013 at 19:51 Comment(0)
M
-3

Wondering, why not use something like Yacc or Bison?

I have some experience with LALR grammar in Erlang and looks quite useful to me. Much less lines of code.

Cheers...

http://www.erlang.org/doc/man/yecc.html

Massive answered 4/1, 2013 at 10:37 Comment(1)
I've been using these tools for years. What I'm asking for is whether I can implement parser combinators, which is an idea from functional programming, better in C, instead of using those old tools.Eating

© 2022 - 2024 — McMap. All rights reserved.