Understanding the List Operator (%) in Boost.Spirit

Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,

The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).

the following program produces different results depending on which is used:

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
  int id;
  std::vector<int> values;
};

BOOST_FUSION_ADAPT_STRUCT(Record,
  (int, id)
  (std::vector<int>, values)
)

int main() {
  namespace qi = boost::spirit::qi;

  const auto str = std::string{"1: 2, 3, 4"};

  const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi;
  const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;

  Record record1;
  if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
    std::cout << record1.id << ": ";
    for (const auto& value : record1.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }

  Record record2;
  if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
    std::cout << record2.id << ": ";
    for (const auto& value : record2.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }
}

1: 2, 3, 4, 
1: 2, 

rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?

Slice answered 20/11, 2015 at 0:1 Comment(5)
Someone who has the pleasure of remembering the Spirit terminology can explain it better, but they look equivalent in what they match, but not in the type they accept for storing the values. The second appears to work with (int, int, vector<int>) rather than (int, vector<int>). I'm guessing vector<int> is compatible with a single int_ and then the repeated int_s (within the *(…)) are ignored when storing values. - Propagandist
Yup that's it, see my answer @chris. There's a big error in the code though, so there's that (anything could have happened). - Fasten
@sehe, Thanks, I'm interested in Spirit, but I haven't had a chance to use it for something yet. The CppCon X3 talk was pretty cool. - Propagandist
@Propagandist That triggered me to do the corresponding test cases in Spirit X3. As expected, the situation is a whole lot better on the whole (no UB, no silent attribute propagation failures, not to mention vastly improved compile times :)). See my second answer. - Fasten
Another difference is that if b has an attribute (not the case in your example), it is ignored in a % b but not in a >> *(b >> a). a >> *(omit[b] >> a) would be closer (but with the same problems shown in sehe's answer). - Tarsus

Update: X3 version added

First off, you've fallen into a deep trap here:

Qi parser expressions don't work with auto. Use qi::copy or just use qi::rule<>. The expression template keeps references to temporaries that are gone by the end of the statement, so your program has undefined behaviour, and indeed it crashed for me (valgrind pointed out where the dangling references originated).

So, to begin with:

const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi); 

Now, when you delete the redundancy in the program, you get:

Reproducing the problem

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}

Printing

1: 2, 3, 4, 
1: 2, 

The cause and the fix

What happened to 3 and 4, which were successfully parsed?

Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM), Spirit accidentally misfires and "assigns" the int into the attribute reference, ignoring the remaining vector<int>.

If you want to make container attributes parse as "an atomic group", use qi::as<>:

test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));

Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
  int id;
  using values_t = std::vector<int>;
  values_t values;
};

BOOST_FUSION_ADAPT_STRUCT(Record, id, values)

namespace qi = boost::spirit::qi;

template <typename T>
void test(T const& rule) {
    const std::string str = "1: 2, 3, 4";

    Record record;

    if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
        std::cout << record.id << ": ";
        for (const auto& value : record.values) { std::cout << value << ", "; }
        std::cout << '\n';
    } else {
        std::cerr << "syntax error\n";
    }
}

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
    test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}

Prints

1: 2, 3, 4, 
1: 2, 
1: 2, 3, 4, 
Fasten answered 20/11, 2015 at 0:28 Comment(8)
Does it mean that, when Qi tried to match tuple<int, vector<int>> = {2, {3, 4}} with vector<int> = Record::values, Qi happened to convert tuple<int, vector<int>> = {2, {3, 4}} into vector<int> = {2}, not the expected vector<int> = {2, 3, 4}, due to some implementation details? - Slice
And how can qi::as<> rectify the problem? According to the manual, qi::as<> is about commit/rollback semantics. So, when parsing is successful, shouldn't the parser produce the same result regardless of the existence of qi::as<>? - Slice
@Slice qi::hold is about commit/rollback semantics. As the docs for as<>[] indicate: "Note The as<T> implicitly causes commit/rollback semantics similar in nature to the hold directive." but that's it. It's really an annotation to tell the parser how you would like your container attribute synthesized. And that is made "atomically" there. (The wording is a bit confusing.) - Fasten
So can I understand it this way? --- The compound attribute rule (a: A, b: vector<A> --> (a >> b): vector<A>) does not always apply. The exact condition on when it does not apply is not specified (at least not documented) or too complicated to describe. Fortunately we can always enforce it by qi::as<vector<A>> or x3::rule<..., vector<A>>. - Slice
(Continuing the previous comment) For tuple<A, vector<A>>, Qi has been accidentally written so that a parser stores only the first element of the tuple into its supplied container attribute of type vector<A>, ignoring the second element of the tuple (vector<A>). X3 rigorously rejects it at compile-time. - Slice
Basically yes. I'd try to be a whole lot less confusing in the description, but I guess my answer was for that already. I wouldn't trust "rigorously" - there are always heuristics involved in a framework that does automatic, generic, extensible attribute transformations. But yeah, it's more rigid and controlled than in the old days. - Fasten
Thanks! I feel that I understand it now. - Slice
@Slice It's not exactly the same cases but you can see an attempt at an explanation of similar problems here and here. - Tarsus

Because it's time to get people started with X3 (the new version of Spirit), and because I like to challenge myself to do the corresponding tasks in Spirit X3, here is the Spirit X3 version.

There's no problem with auto in X3.

The "broken" case also behaves much better, triggering this static assertion:

    // If you got an error here, then you are trying to pass
    // a fusion sequence with the wrong number of elements
    // as that expected by the (sequence) parser.
    static_assert(
        fusion::result_of::size<Attribute>::value == (l_size + r_size)
      , "Attribute does not have the expected size."
    );

That's nice, right?

The workaround seems a bit less readable:

test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));

But it would be trivial to write your own as<> "directive" (or just a function), if you wanted:

namespace {
    template <typename T>
    struct as_type {
        template <typename Expr>
            auto operator[](Expr&& expr) const {
                return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
            }
    };

    template <typename T> static const as_type<T> as = {};
}

DEMO

#include <iostream>
#include <string>
#include <tuple>    // std::tie
#include <utility>  // std::forward
#include <vector>

#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/home/x3.hpp>

struct Record {
    int id;
    using values_t = std::vector<int>;
    values_t values;
};

namespace x3 = boost::spirit::x3;

template <typename T>
void test(T const& rule) {
    const std::string str = "1: 2, 3, 4";

    Record record;

    auto attr = std::tie(record.id, record.values);

    if (x3::phrase_parse(str.begin(), str.end(), rule >> x3::eoi, x3::space, attr)) {
        std::cout << record.id << ": ";
        for (const auto& value : record.values) { std::cout << value << ", "; }
        std::cout << '\n';
    } else {
        std::cerr << "syntax error\n";
    }
}

namespace {
    template <typename T>
    struct as_type {
        template <typename Expr>
            auto operator[](Expr&& expr) const {
                return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
            }
    };

    template <typename T> static const as_type<T> as = {};
}

int main() {
    using namespace x3;
    test(int_ >> ':' >> (int_ % ','));
    //test(int_ >> ':' >> (int_ >> *(',' >> int_))); // COMPILER asserts "Attribute does not have the expected size."

    // "clumsy" x3 style workaround
    test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));

    // using an ad-hoc `as<>` implementation:
    test(int_ >> ':' >> as<Record::values_t>[int_ >> *(',' >> int_)]);
}

Prints

1: 2, 3, 4, 
1: 2, 3, 4, 
1: 2, 3, 4, 
Fasten answered 20/11, 2015 at 0:49 Comment(0)
