How to parse reserved words correctly in boost spirit
A

2

6

I'm trying to parse a sequence of the syntax: < direction > < type > < name >. For example:

in float foo

where the direction can be either in, out, or in_out. I've succeeded in parsing correct text by using a qi::symbols class to convert the direction keywords to an enum.

However, the problem shows when I don't have correct text. Take the example:

int foo

The symbol table parser will accept the 'in' part of the 'int' type and so the results will be:

direction: in
type: t
name: foo

And the error is not detected. What's the best way to be able to parse the in, out and in_out reserved words and ensure that they are followed by a non-identifier character so that the 'int' part of the previous text fails?

Thanks

Acatalectic answered 21/11, 2013 at 20:23 Comment(0)
B
5

In addition to the "manual" approach suggested by Mike you can

  1. use a convenience wrapper rule
  2. use the distinct parser directive from the Spirit Repository

1. Use a convenience wrapper

I just remembered, I once came up with this quick and dirty helper:

static const qi::rule<It, qi::unused_type(const char*)> kw 
      = qi::lit(qi::_r1) >> !qi::alnum;

Which you could use like (using +"lit" to decay the array-ref into const char*):

stmt = 
         kw(+"if") >> '(' >> expr >> ')' >> block
     >> -(kw(+"else") >> block)
     ;

You can make it considerably more convenient:

template <std::size_t N>
static auto kw(char const (&keyword)[N]) -> qi::rule<Iterator> {
    // qi::lit has problems with char arrays, use pointer instead.
    return qi::lit(+keyword) >> !qi::alnum;
}

So you can write:

kw_if   = kw("if");
kw_then = kw("then");
kw_else = kw("else");
kw_and  = kw("and");
kw_or   = kw("or");
kw_not  = kw("not");

2. Use the distinct directive from the Spirit Repository

The distinct parser directive from the Spirit Repository performs the "not followed by an identifier character" check for you:

int main()
{
    using namespace spirit_test;
    using namespace boost::spirit;

    {
        using namespace boost::spirit::ascii;

        qi::rule<char const*, space_type> r;
        r = distinct::keyword["description"] >> -lit(':') >> distinct::keyword["ident"];

        BOOST_TEST(test("description ident", r, space));
        BOOST_TEST(test("description:ident", r, space));
        BOOST_TEST(test("description: ident", r, space));
        BOOST_TEST(!test("descriptionident", r, space));
    }

    return boost::report_errors();
}
Berkshire answered 21/11, 2013 at 22:10 Comment(4)
But I would rather use a symbol table to map the keywords to values (which I store as an enum in an AST). Would this method require me to write semantic actions for every check? Could I wrap distinct::keyword around a symbol table parser? - Acatalectic
Of course! Have you, like, tried it? coliru.stacked-crooked.com/a/6956a8e9c18553d2 - Berkshire
And, just for fun, here's the kw() approach to the symbols parser: coliru.stacked-crooked.com/a/c949b4cd44f3c20e - Berkshire
Thanks for the comments - I'm busy with other stuff right now but I will test it out and get back to you! - Acatalectic
C
3

You can use the and predicate or the not predicate parser, depending on what you would like to express. The predicate parsers just check the next symbols but don't consume them.

This says, you expect a blank (space or tab) afterwards:

rule = symbol_parser >> &qi::blank;

This says, you don't want to have a letter, number or underscore afterwards:

rule = symbol_parser >> !(qi::alnum | qi::lit("_"));
Ceroplastics answered 21/11, 2013 at 21:34 Comment(1)
+1 and there are ways (in the Spirit code base) to automate that - Berkshire
