Define parsers parameterized with sub-parsers in Boost Spirit

I would like to convert some old hand-written parsing code to Boost Spirit and learn (more of) Spirit in the process. The old code uses streams and templates to parse definitions of some data types and some containers.

Some typical formats:

VECTOR[number_of_items,(item_1, item_2 .... item_n)]
PAIR(p1, p2)
RECT[(left,top)-(right,bottom)]
Point( x, y )
Size( x, y )

The parsing functions are templates, with the item type as the template parameter, and take streams as input, e.g.

 template<class T> std::istream& operator>>(std::istream& in, std::vector<T>& v);

 template<class T1, class T2> std::istream& operator>>(std::istream& in, std::pair<T1, T2>& p);

 template<class T> std::istream& operator>>(std::istream& in, RectType<T>& r);
 // etc.

The parser (stream extractor) for vectors calls the parser for the template's item type.

Using these it is possible to parse definitions of integer rectangles, double rectangles, and vectors of pairs of strings and integers.
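For comparison, here is a minimal, Boost-free sketch of the kind of stream extractors described above, assuming the `VECTOR[n,(...)]` and `PAIR(p1, p2)` formats with whitespace allowed between tokens. The `expect` helpers are hypothetical names, not part of the original code. The vector overload reads each item with `in >> item`, so whichever `operator>>` exists for `T` is picked at compile time.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: skip whitespace, then require character c (sets failbit on mismatch).
inline std::istream& expect(std::istream& in, char c) {
    char got = 0;
    if (in >> got && got != c) in.setstate(std::ios::failbit);
    return in;
}

// Hypothetical helper: skip whitespace, then require the exact keyword, e.g. "PAIR".
inline std::istream& expect(std::istream& in, const std::string& word) {
    in >> std::ws;
    for (char c : word)
        if (in.get() != c) { in.setstate(std::ios::failbit); break; }
    return in;
}

// PAIR(p1, p2): each element is read by whatever operator>> exists for its type.
// Declared before the vector overload so that the vector template can see it.
template <class T1, class T2>
std::istream& operator>>(std::istream& in, std::pair<T1, T2>& p) {
    expect(in, "PAIR");
    expect(in, '(');
    in >> p.first;
    expect(in, ',');
    in >> p.second;
    return expect(in, ')');
}

// VECTOR[n,(item_1, ..., item_n)]: every item is read by the operator>> for T.
template <class T>
std::istream& operator>>(std::istream& in, std::vector<T>& v) {
    std::size_t n = 0;
    expect(in, "VECTOR");
    expect(in, '[');
    in >> n;
    expect(in, ',');
    expect(in, '(');
    v.clear();
    for (std::size_t i = 0; i < n && in; ++i) {
        if (i > 0) expect(in, ',');
        T item{};
        if (in >> item) v.push_back(item);
    }
    expect(in, ')');
    return expect(in, ']');
}
```

The declaration order matters: `std::pair`'s only associated namespace is `std`, so argument-dependent lookup would not find a global overload at instantiation time. Also note that reading a `std::string` item with `>>` would swallow the following comma, so string items would need their own extractor.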

Is it possible with Spirit to write templated parsers that call sub-parsers for the template type?

Heldentenor asked 20/1, 2014 at 15:53. 1 comment:
Also relevant: factoring out common parts of Spirit rules. (Stonewall)

As the other answer almost made clear, Qi already has a mechanism to generate parsers on the fly, given an attribute type.

The end-user-facing bit here is qi::auto_, which is a parser rather than a grammar.

This has distinct advantages[1].

  • Most of all, it allows users to use the parser inside a grammar with a skipper of their own choice, and perhaps with qi::locals<>.
  • Also, the auto_ Qi expression terminal is already defined, so there is no need at all to instantiate a grammar using a verbose template argument list.
  • Finally, the parser returns an expression template, so there is no type erasure going on, and combining several auto_ parsers in this way is therefore no less efficient than manually composing the grammar (whereas wrapping in either qi::rule<> or qi::grammar<> incurs performance overhead).

Let's see how it's used:

std::vector<std::pair<double, int> > parsed;
bool result_ = qi::phrase_parse(first, last, qi::auto_, qi::space, parsed);

As you can see, this accommodates a skipper and 'magically' selects the parser that matches the type of parsed. Now, to get the sample format from the OP, you'd need to hook into the customization point for the auto_ parser:

namespace boost { namespace spirit { namespace traits {   
    // be careful copying expression templates. Boost trunk has `qi::copy` for this too, now
    #define PARSER_DEF(a) using type = decltype(boost::proto::deep_copy(a)); static type call() { return boost::proto::deep_copy(a); }

    template<typename T1, typename T2> 
        struct create_parser<std::pair<T1, T2> > 
        {
            PARSER_DEF('(' >> create_parser<T1>::call() >> ',' >> create_parser<T2>::call() >> ')');
        };

    template<typename TV, typename... TArgs>
        struct create_parser<std::vector<TV, TArgs...> >
        {
            PARSER_DEF('[' >> qi::omit[qi::uint_] >> ',' >> '(' >> create_parser<TV>::call() % ',' >> ')' >> ']' );
        };

    #undef PARSER_DEF
} } }

That's literally all that's needed (the full listing below additionally matches the VECTOR and PAIR keywords). Here's a demo that parses:

VECTOR[ 1 ,
 (               
     PAIR (0.97, 
           5),   
     PAIR (1.75,10)   
 )               
]

And prints the parsed data as:

Parsed:
 0.97 5 
 1.75 10 

See it Live On Coliru

Full Code Listing

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/qi.hpp>

namespace qi = boost::spirit::qi;

namespace boost { namespace spirit { namespace traits {   
    // be careful copying expression templates. Boost trunk has `qi::copy` for this too, now
    #define PARSER_DEF(a) using type = decltype(boost::proto::deep_copy(a)); static type call() { return boost::proto::deep_copy(a); }

    template<typename T1, typename T2> 
        struct create_parser<std::pair<T1, T2> > 
        {
            PARSER_DEF(lexeme [ lit("PAIR") ] >> '(' >> create_parser<T1>::call() >> ',' >> create_parser<T2>::call() >> ')');
        };

    template<typename TV, typename... TArgs>
        struct create_parser<std::vector<TV, TArgs...> >
        {
            PARSER_DEF(lexeme [ lit("VECTOR") ] >> '[' >> qi::omit[qi::uint_] >> ',' >> '(' >> create_parser<TV>::call() % ',' >> ')' >> ']' );
        };

    #undef PARSER_DEF
} } }

#include <boost/spirit/home/karma.hpp>
namespace karma = boost::spirit::karma;

int main()
{
    std::string const input("VECTOR[ 1 ,\n"
                " (               \n"
                "     PAIR (0.97, \n"
                "           5),   \n"
                "     PAIR (1.75,10)   \n"
                " )               \n"
            "]");

    std::cout << input << "\n\n";

    auto first = input.begin();
    auto last = input.end();

    std::vector<std::pair<double, int> > parsed;
    bool result_ = qi::phrase_parse(first, last, qi::auto_, qi::space, parsed);

    if (first!=last)
        std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";

    if (result_)
        std::cout << "Parsed:\n " << karma::format_delimited(karma::auto_ % karma::eol, " ", parsed) << "\n";
    else
        std::cout << "Parsing did not succeed\n";
}

[1] A potential downside is that the customization point is fixed, so you can associate only one auto_ parser with any given type. Rolling your own base template gives you more control and makes it (more) easy to have different 'parser flavours'. However, in the end it's possible to have the best of both worlds, so I'd go for convenience first.
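The "roll your own base template" option can be sketched without Boost. In the snippet below, `parser_for`, the two flavour tags, `match` and `parse_as` are hypothetical names, and plain `std::istream` reading stands in for Qi, so it only illustrates the dispatch structure: the attribute type selects a specialization (much as `traits::create_parser` does for `qi::auto_`), and an extra `Flavour` parameter lets the same type have more than one textual format.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flavour tags; each selects a different textual format.
struct verbose_flavour {};  // PAIR(a, b)  and  VECTOR[n,(x, y, ...)]
struct terse_flavour {};    // (a, b)      and  [n,(x, y, ...)]

inline const char* pair_keyword(verbose_flavour) { return "PAIR"; }
inline const char* pair_keyword(terse_flavour) { return ""; }
inline const char* vector_keyword(verbose_flavour) { return "VECTOR"; }
inline const char* vector_keyword(terse_flavour) { return ""; }

// Skip whitespace, then require the exact character c.
inline bool match(std::istream& in, char c) {
    char got = 0;
    return (in >> got) && got == c;
}

// Skip whitespace, then require the exact keyword (an empty keyword always matches).
inline bool match(std::istream& in, const std::string& word) {
    in >> std::ws;
    for (char c : word)
        if (in.get() != c) return false;
    return true;
}

// Primary template, i.e. the customization point: by default read T with >>.
template <class T, class Flavour>
struct parser_for {
    static bool call(std::istream& in, T& value) {
        return static_cast<bool>(in >> value);
    }
};

// std::pair delegates each element to the parser for that element's type.
template <class T1, class T2, class Flavour>
struct parser_for<std::pair<T1, T2>, Flavour> {
    static bool call(std::istream& in, std::pair<T1, T2>& p) {
        return match(in, pair_keyword(Flavour{})) && match(in, '(')
            && parser_for<T1, Flavour>::call(in, p.first) && match(in, ',')
            && parser_for<T2, Flavour>::call(in, p.second) && match(in, ')');
    }
};

// std::vector: the count is read and ignored (like qi::omit[qi::uint_]),
// then a non-empty, comma-separated list is parsed item by item.
template <class T, class Alloc, class Flavour>
struct parser_for<std::vector<T, Alloc>, Flavour> {
    static bool call(std::istream& in, std::vector<T, Alloc>& v) {
        std::size_t count = 0;
        if (!(match(in, vector_keyword(Flavour{})) && match(in, '[')
              && (in >> count) && match(in, ',') && match(in, '(')))
            return false;
        v.clear();
        for (;;) {
            T item{};
            if (!parser_for<T, Flavour>::call(in, item)) return false;
            v.push_back(item);
            in >> std::ws;
            if (in.peek() != ',') break;  // no further items
            in.get();                     // consume the separator
        }
        return match(in, ')') && match(in, ']');
    }
};

// Entry point: the attribute type T selects the parser, much like qi::auto_.
template <class Flavour = verbose_flavour, class T>
bool parse_as(const std::string& text, T& value) {
    std::istringstream in(text);
    const bool ok = parser_for<T, Flavour>::call(in, value);
    return ok && (in >> std::ws).eof();  // require that all input was consumed
}
```

With this, `parse_as(text, v)` expects the `VECTOR[...]`/`PAIR(...)` format, while `parse_as<terse_flavour>(text, v)` accepts `[2,((0.97,5),(1.75,10))]` into the same `std::vector<std::pair<double, int>>`.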

Stonewall answered 21/1, 2014 at 0:55. 1 comment:
+1 Thanks for the explanation of qi::auto_. I didn't understand it until now. (Crooked)

Yes, it's possible. I'd implement it this way:

#include <string>
#include <type_traits>
#include <utility>
#include <vector>

#include <boost/fusion/include/std_pair.hpp> // adapts std::pair so a parsed sequence can fill it
#include <boost/spirit/home/qi.hpp>

namespace qi = boost::spirit::qi;

// Primary template; Enable is the hook for std::enable_if-based specializations.
template < typename Type, typename Iterator, typename Enable = void >
struct parser;

// Any arithmetic type: reuse the parser Qi itself associates with that type.
template < typename Type, typename Iterator >
struct parser < Type, Iterator, typename std::enable_if < std::is_arithmetic<Type>::value >::type > :
    qi::grammar<Iterator, Type() >
{
    parser()
        : parser::base_type(impl)
    {
        impl = qi::create_parser<Type>();
    }

    qi::rule<Iterator, Type()> impl;
};

// Explicit choice for double; more specialized than the arithmetic catch-all above.
template < typename Iterator >
struct parser < double, Iterator> :
    qi::grammar<Iterator, double() >
{
    parser()
        : parser::base_type(impl)
    {
        impl = qi::double_;
    }

    qi::rule<Iterator, double()> impl;
};

// std::pair: "(first,second)", delegating to the sub-parsers for each element type.
template < typename First, typename Second, typename Iterator >
struct parser < std::pair<First, Second>, Iterator> :
    qi::grammar<Iterator, std::pair<First, Second>() >
{
    parser()
        : parser::base_type(impl)
    {
        impl = qi::lit('(') >> first >> ',' >> second >> ')';
    }

    qi::rule<Iterator, std::pair<First, Second>()> impl;

    parser<First, Iterator> first;
    parser<Second, Iterator> second;
};

// std::vector: "[count,(item,item,...)]"; the count is parsed but discarded.
template < typename Type, typename Alloc, typename Iterator >
struct parser < std::vector<Type, Alloc>, Iterator> :
    qi::grammar<Iterator, std::vector<Type, Alloc>() >
{
    parser()
        : parser::base_type(impl)
    {
        impl = qi::lit('[') >> qi::omit[qi::uint_] >> ",(" >> item % ',' >> ")]";
    }

    qi::rule<Iterator, std::vector<Type, Alloc>()> impl;

    parser<Type, Iterator> item;
};

Usage example:

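#include <iostream>

int main()
{
    typedef std::pair<double, int> pair;

    using string = std::string;

    string input_ = { "[1,((0.97,5),(1.75,10))]" };

    string::const_iterator iterator_ = input_.begin();
    string::const_iterator end_ = input_.end();

    std::vector<pair> pairs_;

    // No skipper is used, so the input must not contain whitespace.
    bool result_ = qi::parse(iterator_, end_, parser <std::vector<pair>, string::const_iterator> (), pairs_);

    if (result_ && iterator_ == end_)
        for (pair const& p : pairs_)
            std::cout << p.first << ' ' << p.second << '\n';
    else
        std::cout << "Parsing did not succeed\n";

    return result_ ? 0 : 1;
}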
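The enable_if-based dispatch and the "sub-parsers as members" composition used above can also be illustrated with the standard library alone. This is a simplified sketch, not a drop-in replacement: `reader`, `expect_char` and `read_from` are hypothetical names, only arithmetic types and `std::pair` are covered, and `std::istream` extraction replaces the Qi grammars.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Primary template: Enable is the enable_if hook; unsupported types stay undefined.
template <class T, class Enable = void>
struct reader;

// One partial specialization covers every arithmetic type.
template <class T>
struct reader<T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
    bool operator()(std::istream& in, T& value) const {
        return static_cast<bool>(in >> value);
    }
};

// Reads one non-whitespace character and checks that it is c.
inline bool expect_char(std::istream& in, char c) {
    char got = 0;
    return (in >> got) && got == c;
}

// std::pair "(a, b)": composed from one sub-reader per element,
// much like the `first`/`second` grammar members in the answer above.
template <class T1, class T2>
struct reader<std::pair<T1, T2>> {
    reader<T1> first;
    reader<T2> second;

    bool operator()(std::istream& in, std::pair<T1, T2>& p) const {
        return expect_char(in, '(') && first(in, p.first)
            && expect_char(in, ',') && second(in, p.second)
            && expect_char(in, ')');
    }
};

// Convenience entry point: the target type selects the reader.
template <class T>
bool read_from(const std::string& text, T& value) {
    std::istringstream in(text);
    return reader<T>{}(in, value);
}
```

Nested types compose automatically: `read_from("(1, (2.5, 3))", p)` works for a `std::pair<int, std::pair<double, int>>`. A `std::vector` specialization would follow the same pattern, holding a single `reader<T>` member for its items.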
Crooked answered 20/1, 2014 at 17:23. 2 comments:
Kinda cool. I suspect it can be done much shorter and more flexibly, though. Let me check. (Stonewall)
Yup. I was right :) Here's the same, but more flexible, in 31 lines of code. That's 51 lines fewer than your version. See my answer for background ideas. By the way, +1 for the inspiration! (Stonewall)
