How to parse text for a DSL at compile time?
Asked Answered
G

3

35

Yes. That's right. I want to be able to paste an expression like:

"a && b || c"

directly into source code as a string:

const std::string expression_text("a && b || c");

Create a lazily evaluated structure with it:

Expr expr(magical_function(expression_text));

then later on evaluate substituting in known values:

evaluate(expr, a, b, c);

I'd want to expand this little DSL later so does something a little more complicated using some non-C++ syntax so I can't simply hardcode my expression the simple way. The use case is that I'll be able to copy and paste the same logic from another module used in a different development area for another language rather than have to adapt it each time to follow C++ syntax.

If someone can get me started on at least how to do the above simple concept of 1 expression and 2 boolean operators that would be really appreciated.

Note: I posted this question due to feedback from another question I posted: How to parse DSL input to high performance expression template. Here I actually wanted an answer to a slightly different problem, but the comments provoked this specific question that I thought was worth posting as the potential answers are really worth documenting.

Genitourinary answered 22/7, 2013 at 8:56 Comment(4)
It is possible (in quite a clumsy way) in C++11, using the variadic templates: cpp-next.com/archive/2012/10/…Gardener
@Gardener this is precisely what cv_and_he has done in his answer using metaparseCabbageworm
C++ templates are Turing capable. You could define this using them. I don't think that's a good idea. If you really want arbitrary syntax "embedded" in C++, you should use a metaprogramming system capable of processing the C++, extracting your equations, "compiling them" as you see fit, re-inserting back into the C++ and compiling the result. Make files can manage all this for you.Adolescence
Related: youtu.be/HMB9oXFobJc an 90 minute talk on parsing JSON at compile time using constexpr functions and matching data structures. Requires some C++17 I think (at least C++14) and some will to do. Parsing simple expressions should be simpler, but video might be worthwhile nonetheless.Hoxsie
P
49

Disclaimer: I know nothing about metaparse, and very little about proto. The following code is my attempt (mostly via trial and error) to modify this example to do something similar to what you want.

The code can be easily divided in several parts:

1. The grammar


1.1 Token definitions

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;
  • token<Parser>:
    token is a parser combinator that uses Parser to parse the input and then consumes (and discards) all whitespaces afterwards. The result of the parsing is the result of Parser.
  • lit_c<char>:
    lit_c matches the specific char and the result of the parsing is that same char. In the grammar this result is overridden by the use of always.
typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;
  • keyword<metaparse_string,result_type=undefined>:
    keyword matches the specific metaparse_string (_S("true") returns metaparse::string<'t','r','u','e'> which is what metaparse uses internally to do its magic) and the result of the parsing is result_type.
typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

In the case of and_token and or_token the result is undefined and in the grammar below it is ignored.


1.2 "Rules" of the grammar

struct paren_exp;

First paren_exp is forward-declared.

typedef one_of< 
        paren_exp, 
        transform<true_token, build_value>,
        transform<false_token, build_value>, 
        always<arg1_token, arg<0> >,
        always<arg2_token, arg<1> >, 
        always<arg3_token, arg<2> > 
    >
    value_exp;
  • one_of<Parsers...>:
    one_of is a parser combinator that tries to match the input to one of its parameters. The result is what the first parser that matches returns.
  • transform<Parser,SemanticAction>:
    transform is a parser combinator that matches Parser. The result type is the result type of Parser transformed by SemanticAction.
  • always<Parser,NewResultType>:
    matches Parser, returns NewResultType.

    The equivalent spirit rule would be:

    value_exp = paren_exp [ _val=_1 ]
        | true_token      [ _val=build_value(_1) ]
        | false_token     [ _val=build_value(_1) ]
        | argN_token      [ _val=phx::construct<arg<N>>() ];
    
typedef one_of< 
        transform<last_of<not_token, value_exp>, build_not>, 
        value_exp
    >
    not_exp;
  • last_of<Parsers...>:
    last_of matches every one of the Parsers in sequence and its result type is the result type of the last parser.

    The equivalent spirit rule would be:

    not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] 
        | value_exp                          [ _val=_1 ];
    
typedef
foldl_start_with_parser<
        last_of<and_token, not_exp>,
        not_exp,
        build_and
    > and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp);

typedef
foldl_start_with_parser<
    last_of<or_token, and_exp>,
    and_exp,
    build_or
> or_exp;     // or_exp = and_exp >> *(omit[or_token] >> and_exp);
  • foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>:
    this parser combinator matches InitialParser and then RepeatingParser multiple times until it fails. The result type is the result of mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>, where RepeatingParserSequence is a sequence of the result types of every application of RepeatingParser. If RepeatingParser never succeeds the result type is simply InitialParserResult.

    I believe (xd) that the equivalent spirit rule would be:

    or_exp = and_exp[_a=_1] 
        >> *( omit[or_token] >> and_exp [ _val = build_or(_1,_a), _a = _val ]);  
    
struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; 
   // paren_exp = '(' >> or_exp >> ')';
  • middle_of<Parsers...>:
    this matches the sequence of Parsers and the result type is the result of the parser that is in the middle.
typedef last_of<repeated<space>, or_exp> expression; 
   //expression = omit[*space] >> or_exp;
  • repeated<Parser>:
    this parser combinator tries to match Parser multiple times. The result is a sequence of the result types of every application of the parser, if the parser fails on its first try the result is an empty sequence. This rule simply removes any leading whitespace.
typedef build_parser<entire_input<expression> > function_parser;

This line creates a metafunction that accepts an input string and returns the result of parsing.


2. Construction of the expression

Let's look at an example walkthrough of the building of an expression. This is done in two steps: first the grammar constructs a tree that depends on build_or, build_and, build_value, build_not and arg<N>. Once you get that type, you can get the proto expression using the proto_type typedef.

"a || !b"

We start on or_expr:

  • or_expr: We try its InitialParser which is and_expr.
    • and_expr: We try its InitialParser which is not_expr.
      • not_expr: not_token fails so we try value_expr.
        • value_expr: arg1_token succeeds. The return type is arg<0> and we go back to not_expr.
      • not_expr: the return type is not modified at this step. We go back to and_expr.
    • and_expr: We try its RepeatingParser, it fails. and_expr succeeds and its return type is the return type of its InitialParser: arg<0>. We go back to or_expr.
    • or_expr: We try its RepeatingParser, or_token matches, we try and_expr.
    • and_expr: We try its InitialParser not_expr.
      • not_expr: not_token succeeds, we try value_expr.
        • value_expr: arg2_token succeeds. The return type is arg<1> and we go back to not_expr.
      • not_expr: the return type is modified by transform using build_not: build_not::apply< arg<1> >. We go back to and_expr.
    • and_expr: We try its RepeatingParser, it fails. and_expr succeeds and returns build_not::apply< arg<1> >. We go back to or_expr.
  • or_expr: RepeatingParser has succeeded, foldlp uses build_or on build_not::apply< arg<1> > and arg<0>, obtaining build_or::apply< build_not::apply< arg<1> >, arg<0> >.

Once we have this tree constructed we get its proto_type:

build_or::apply< build_not::apply< arg<1> >, arg<0> >::proto_type;
proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;

Full Sample Code (Running on Wandbox)

#include <iostream>
#include <vector>

#include <boost/metaparse/repeated.hpp>
#include <boost/metaparse/sequence.hpp>
#include <boost/metaparse/lit_c.hpp>
#include <boost/metaparse/last_of.hpp>
#include <boost/metaparse/middle_of.hpp>
#include <boost/metaparse/space.hpp>
#include <boost/metaparse/foldl_start_with_parser.hpp>
#include <boost/metaparse/one_of.hpp>
#include <boost/metaparse/token.hpp>
#include <boost/metaparse/entire_input.hpp>
#include <boost/metaparse/string.hpp>
#include <boost/metaparse/transform.hpp>
#include <boost/metaparse/always.hpp>
#include <boost/metaparse/build_parser.hpp>
#include <boost/metaparse/keyword.hpp>

#include <boost/mpl/apply_wrap.hpp>
#include <boost/mpl/front.hpp>
#include <boost/mpl/back.hpp>
#include <boost/mpl/bool.hpp>

#include <boost/proto/proto.hpp>
#include <boost/fusion/include/at.hpp>
#include <boost/fusion/include/make_vector.hpp>

using boost::metaparse::sequence;
using boost::metaparse::lit_c;
using boost::metaparse::last_of;
using boost::metaparse::middle_of;
using boost::metaparse::space;
using boost::metaparse::repeated;
using boost::metaparse::build_parser;
using boost::metaparse::foldl_start_with_parser;
using boost::metaparse::one_of;
using boost::metaparse::token;
using boost::metaparse::entire_input;
using boost::metaparse::transform;
using boost::metaparse::always;
using boost::metaparse::keyword;

using boost::mpl::apply_wrap1;
using boost::mpl::front;
using boost::mpl::back;
using boost::mpl::bool_;


struct build_or
{
    typedef build_or type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};

struct build_and
{
    typedef build_and type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_and<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};



template<bool I>
struct value //helper struct that will be used during the evaluation in the proto context
{};

struct build_value
{
    typedef build_value type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::terminal<value<V::type::value> >::type proto_type;
    };
};

struct build_not
{
    typedef build_not type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_not<typename V::proto_type >::type proto_type;
    };
};

template<int I>
struct placeholder //helper struct that will be used during the evaluation in the proto context
{};

template<int I>
struct arg
{
    typedef arg type;
    typedef typename boost::proto::terminal<placeholder<I> >::type proto_type;
};

#ifdef _S
#error _S already defined
#endif
#define _S BOOST_METAPARSE_STRING

typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;


struct paren_exp;

typedef
one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > >
value_exp; //value_exp = paren_exp | true_token | false_token | arg1_token | arg2_token | arg3_token;

typedef
one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp>
not_exp; //not_exp = (omit[not_token] >> value_exp) | value_exp;

typedef
foldl_start_with_parser <
last_of<and_token, not_exp>,
         not_exp,
         build_and
         >
         and_exp; // and_exp = not_exp >> *(and_token >> not_exp);

typedef
foldl_start_with_parser <
last_of<or_token, and_exp>,
         and_exp,
         build_or
         >
         or_exp; // or_exp = and_exp >> *(or_token >> and_exp);

struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; //paren_exp = lit('(') >> or_exp >> lit('(');

typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp;

typedef build_parser<entire_input<expression> > function_parser;


template <typename Args>
struct calculator_context
        : boost::proto::callable_context< calculator_context<Args> const >
{
    calculator_context ( const Args& args ) : args_ ( args ) {}
    // Values to replace the placeholders
    const Args& args_;

    // Define the result type of the calculator.
    // (This makes the calculator_context "callable".)
    typedef bool result_type;

    // Handle the placeholders:
    template<int I>
    bool operator() ( boost::proto::tag::terminal, placeholder<I> ) const
    {
        return boost::fusion::at_c<I> ( args_ );
    }

    template<bool I>
    bool operator() ( boost::proto::tag::terminal, value<I> ) const
    {
        return I;
    }
};

template <typename Args>
calculator_context<Args> make_context ( const Args& args )
{
    return calculator_context<Args> ( args );
}

template <typename Expr, typename ... Args>
int evaluate ( const Expr& expr, const Args& ... args )
{
    return boost::proto::eval ( expr, make_context ( boost::fusion::make_vector ( args... ) ) );
}

#ifdef LAMBDA
#error LAMBDA already defined
#endif
#define LAMBDA(exp) apply_wrap1<function_parser, _S(exp)>::type::proto_type{}

int main()
{
    using std::cout;
    using std::endl;

    cout << evaluate ( LAMBDA ( "true&&false" ) ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), false ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), true ) << endl;
    cout << evaluate ( LAMBDA ( "a&&b" ), true, false ) << endl;
    cout << evaluate ( LAMBDA ( "a&&(b||c)" ), true, false, true ) << endl;
    cout << evaluate ( LAMBDA ( "!a&&(false||(b&&!c||false))" ), false, true, false ) << endl;
}

/*int main(int argc , char** argv)
{
    using std::cout;
    using std::endl;

    bool a=false, b=false, c=false;

    if(argc==4)
    {
        a=(argv[1][0]=='1');
        b=(argv[2][0]=='1');
        c=(argv[3][0]=='1');
    }

    LAMBDA("a && b || c") expr;

    cout << evaluate(expr, true, true, false) << endl;
    cout << evaluate(expr, a, b, c) << endl;

    return 0;
}*/
Perrotta answered 22/7, 2013 at 8:56 Comment(7)
I'm not sure that it will be as useful as it should be. The combination of lack of knowledge about metaparse and difficulty to express myself in English may have diminished the value of the explanations. I hope you can salvage something from this mess. If you have any doubt, please ask here, and I'll try my best to help, although I can't promise that I will be able to.Foam
Thanks a lot for your edits. It's amazing, it doesn't even look like the same answer.Foam
I didn't change a word :) Just presenting it on SO is an art, and I was reading it anyways. I will still have to look at it some more, but I'm beginning to think it's not as hard as it seemed. Thanks!Cabbageworm
I found a really well explained and similar example here. The part that is the most similar to this one is in "lab 6", but the previous ones are also pretty interesting(and a really good introduction to template metaprogramming in general, and to Boost.MPL conventions in particular). Curiously it seems to be from around the same time I made this answer, shame I didn't see it then.Foam
I would be really interested in an example of the same thing using the grammar parser described here: boost.org/doc/libs/master/doc/html/metaparse/…Tasha
@JerryJeremiah This is my attempt at adapting the example using metaparse::grammar. The Proto part is unchanged, but the semantic actions have changed a lot (I think they are simpler). I'm not sure, but it seems that metaparse::grammar does not directly support recursion (needed for or_exp->paren_exp->or_exp), so I've been forced to define paren_exp outside of the grammar. Although I haven't checked any timings, this approach seems far slower at compile time and the errors are undoubtedly a lot harder to understand.Foam
Ok, so I emailed Abel Sinkovics directly to ask about the recursion. His reply was: Hi Jerry, You are right about the current implementation of grammar in Metaparse not supporting recursion. The reason for that is the current implementation: it inlines the body of the symbols when it resolves them. It can be fixed by adding some laziness to the right place, but that will probably make grammars (even) slower. Unless you are only experimenting with the library, I recommend using the combinators instead. Regards, ÁbelTasha
D
5

For long, compile-time parsing has meant using template meta programming - which seems line noise to most beginner and even intermediate C++ programmers.

However, with C++11 we got constexpr and in C++14, a lot of restrictions to constexpr were removed. C++17 even makes a few of the standard library things constexpr.

While trying to learn advanced modern C++ - I decided to write a compile time HTML parser - The idea was to create a fast HTML templating engine.

The entire code can be found here: https://github.com/rep-movsd/see-phit

I will briefly explain the things I learned when getting this to work.

Handling dynamic data structures

I needed to parse a const char* and convert it into a multiway tree - but dynamic allocation is a no-no in constexpr land.

The solution? Use an array of nodes, with indices pointing to children and siblings - essentially how you might do it in FORTRAN!

The caveat is that your list of nodes has to be of a fixed size initially. Keeping it very large seemed to make gcc slow down compilation by a huge amount. If you end up going past the end of an array, the compiler will throw an error. I wrote a small std::array like wrapper that is fully constexpr.

Parsing

Pretty much standard code that you write fro runtime parsing will work at compile time! Loops, recursion, conditionals - everything works perfectly.

One issue was - how to represent strings? Using the above mentioned approach - an array of char - is very memory hungry, tedious way of doing things. Fortunately in my case, all I ever needed was merely substrings of the original const char* input. So I wrote a small constexpr string_view like class that just holds pointers to the start and end of the relevant parsed token. Creating new literal strings is just a matter of making these views into a const char* literal.

Error reporting

The basic way to handle errors in constexpr code is to call a function that is not constexpr - the compiler stops and prints the offending line, which can easily contain an error string.

However I wanted more - I wanted the parser to display the row and the column too. I struggled for a while and ended up thinking it was impossible. But I went back to it and tried every possible thing I could think of. Finally I found a way that makes gcc print 2 numbers and an error description message. It essentially involves creating a template with two integer parameters (row and col) whose values come from the constexpr parser.

Performance

I could not clearly find any pattern as to what kind of constexpr code tends to slow the compiler down, but the default performance is not too shabby at all. I am able to parse a 1000 node HTML file in about 1.5 seconds on gcc.

clang is a bit faster.


I intend to write a more detailed description of how the code works in the wiki of the github repo - stay tuned.

Devy answered 21/8, 2017 at 21:38 Comment(0)
P
0

While technically possible to do that kind of metaprogramming with templates or constexpr, I wouldn't recommend that approach. You’ll end up with huge amount of very complex code. Gonna be hard to debug, and expensive to maintain and extend.

Instead, use any other language to generate C++ code from your expressions.

If you’re using Visual Studio, one good built-in way is T4 text templates. Here’s more details.

Otherwise, use any other language available on your platform, e.g. I did something similar in Python.

Possibility answered 22/8, 2017 at 0:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.