Parsing comma-separated list of ranges and numbers with semantic actions

Using Boost.Spirit X3, I want to parse a comma-separated list of ranges and individual numbers (e.g. 1-4, 6, 7, 9-12) into a single std::vector<int>. Here's what I've come up with:

namespace ast {
    struct range 
    {
        int first_, last_;    
    };    

    using expr = std::vector<int>;    
}

namespace parser {        
    template<typename T>
    auto as_rule = [](auto p) { return x3::rule<struct _, T>{} = x3::as_parser(p); };

    auto const push = [](auto& ctx) { 
        x3::_val(ctx).push_back(x3::_attr(ctx)); 
    };  

    auto const expand = [](auto& ctx) { 
        for (auto i = x3::_attr(ctx).first_; i <= x3::_attr(ctx).last_; ++i) 
            x3::_val(ctx).push_back(i);  
    }; 

    auto const number = x3::uint_;
    auto const range  = as_rule<ast::range> (number >> '-' >> number                   ); 
    auto const expr   = as_rule<ast::expr>  ( -(range [expand] | number [push] ) % ',' );
} 

Given the input

    "1,2,3,4,6,7,9,10,11,12",   // individually enumerated
    "1-4,6-7,9-12",             // short-hand: using three ranges

this is successfully parsed as ( Live On Coliru ):

OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 
OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 

Question: I think I understand that applying the semantic action expand to the range part is necessary, but why do I also have to apply the semantic action push to the number part? Without it (i.e. with a plain ( -(range [expand] | number) % ',' ) rule for expr), the individual numbers don't get propagated into the AST ( Live On Coliru ):

OK! Parsed: 
OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 

Bonus Question: do I even need semantic actions at all to do this? The Spirit X3 documentation seems to discourage them.

Garner answered 4/1, 2016 at 20:44 Comment(3)
And - as usual, looking at it livecoding.tv/sehe (no voice today) - Disencumber
great! BTW, I think I got the hint from our chat the other day that I needed to focus on a clean AST to get out of the parser, given that the grammar is fixed. For such hybrid inputs I think that means making the parser itself a bit more complicated. - Garner
That's the trade off, indeed. The cut-off is (for me) when the grammar gets hard to reason about/maintain. In that case I opt for postprocessing the AST (or some hybrid) - Disencumber

The FAQ answer to this is that semantic actions suppress automatic attribute propagation, the assumption being that the semantic action will take care of it instead.

In general there are two approaches:

  • either use operator%= instead of operator= to assign the definition to the rule

  • or use the third (optional) template argument to the rule<> template, which can be specified as true to force automatic propagation semantics.


Simplified sample

Here, I simplify mostly by moving the semantic action inside the range rule itself. Now, we can drop the ast::range type altogether. No more fusion adaptation.

Instead we use the "naturally" synthesized attribute of number >> '-' >> number, which is a fusion sequence of two unsigned integers (fusion::deque<unsigned, unsigned> in this case, since number is x3::uint_).

Now, all that's left to make it work is to make sure the branches of | yield compatible types. Wrapping number in a simple repeat(1)[] fixes that, because it synthesizes a container just like the range rule does.

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace x3 = boost::spirit::x3;

namespace ast {
    using expr = std::vector<int>;    

    struct printer {
        std::ostream& out;

        auto operator()(expr const& e) const {
            std::copy(std::begin(e), std::end(e), std::ostream_iterator<expr::value_type>(out, ", "));
        }
    };    
}

namespace parser {        
    auto const expand = [](auto& ctx) { 
        using boost::fusion::at_c;

        for (auto i = at_c<0>(x3::_attr(ctx)); i <= at_c<1>(x3::_attr(ctx)); ++i) 
            x3::_val(ctx).push_back(i);  
    }; 

    auto const number = x3::uint_;
    auto const range  = x3::rule<struct _r, ast::expr> {} = (number >> '-' >> number) [expand]; 
    auto const expr   = x3::rule<struct _e, ast::expr> {} = -(range | x3::repeat(1)[number]  ) % ',';
} 

template<class Phrase, class Grammar, class Skipper, class AST, class Printer>
auto test(Phrase const& phrase, Grammar const& grammar, Skipper const& skipper, AST& data, Printer const& print)
{
    auto first = phrase.begin();
    auto last = phrase.end();
    auto& out = print.out;

    auto const ok = phrase_parse(first, last, grammar, skipper, data);
    if (ok) {
        out << "OK! Parsed: "; print(data); out << "\n";
    } else {
        out << "Parse failed:\n";
        out << "\t on input: " << phrase << "\n";
    }
    if (first != last)
        out << "\t Remaining unparsed: '" << std::string(first, last) << "'\n";    
}

int main() {
    std::string numeric_tests[] =
    {
        "1,2,3,4,6,7,9,10,11,12",   // individually enumerated
        "1-4,6-7,9-12",             // short-hand: using three ranges
    };

    for (auto const& t : numeric_tests) {
        ast::expr numeric_data;
        test(t, parser::expr, x3::space, numeric_data, ast::printer{std::cout});
    }
}

Prints:

OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 
OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 
Disencumber answered 4/1, 2016 at 21:0 Comment(4)
Completed a simplified sample - Disencumber
Awesome! Thanks a lot, very educational experience watching the livecoding session! - Garner
Cheers. Here's the link for future reference: livecoding.tv/video/… - Disencumber
BTW, I never really thought about adding the [expand] action after the range rule, that's much cleaner indeed. The fusion::at_c is also nice to know about. - Garner
