Spirit X3: Is this error handling approach useful?

Asked 15/7, 2019 at 22:44 Answered 17/7, 2019 at 1:3

After reading the Spirit X3 tutorial on error handling and doing some experimentation, I was drawn to a conclusion.

I believe there is some room for improvement in X3's error handling. An important goal, from my perspective, is to provide meaningful error messages. First and foremost, adding a semantic action that sets _pass(ctx) to false won't do it, because X3 will then simply try to match something else. Only throwing an x3::expectation_failure quits the parse function prematurely, i.e. without trying to match anything else. So what is left are the parser directive expect[a] and the parser operator>, as well as manually throwing x3::expectation_failure from a semantic action. I do believe the vocabulary for this kind of error handling is too limited. Please consider the following lines of X3 PEG grammar:

const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 a |
 b |
 c );

Now, for expression a I cannot use expect[] or operator>, because other alternatives might still be valid. I could be wrong, but I think X3 requires me to spell out alternative erroneous expressions that can match and, if they do, throw x3::expectation_failure, which is cumbersome.
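A minimal, X3-free model of PEG ordered choice makes the distinction concrete. It is only an illustration under simplifying assumptions (the names `alternative`, `choice` and `hard_failure` are hypothetical): an alternative that returns false behaves like a semantic action setting _pass(ctx) to false, so the choice rewinds and tries the next branch, while an alternative that throws behaves like x3::expectation_failure and aborts the whole choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative model only (plain C++, not Spirit X3); names are hypothetical.

// Stand-in for x3::expectation_failure: a "hard" failure.
struct hard_failure : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// An alternative tries to match `in` at `pos`: it returns true (advancing pos)
// on success, returns false on a "soft" mismatch, or throws hard_failure.
using alternative = std::function<bool(std::string const&, std::size_t&)>;

// PEG ordered choice: every alternative starts from the same position.
// Returns the index of the first alternative that matched, or -1 if none did.
// A thrown hard_failure is deliberately not caught, so it aborts the whole choice.
int choice(std::vector<alternative> const& alts, std::string const& in, std::size_t& pos) {
    for (std::size_t i = 0; i < alts.size(); ++i) {
        std::size_t const save = pos;    // backtracking point
        if (alts[i](in, pos))
            return static_cast<int>(i);
        pos = save;                      // soft failure: rewind and try the next one
    }
    return -1;
}
```

The asymmetry is the whole point: a soft failure is invisible to the caller of the choice, so any diagnostic has to come from a throw (or from bookkeeping done on the side).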

The question is, is there a good way of checking for error conditions in my PEG construct with the ordered alternatives for a, b and c using current X3 facilities?

If the answer is no, I would like to present my idea for a reasonable solution. I believe it would need a new parser directive. What should this directive do? It should call the attached semantic action when the parse fails, instead of when it succeeds. The attribute is obviously unused, but I would need _where(ctx) to be set to the iterator position of the first parsing mismatch. So if a2 fails, _where should point just past the last character matched by a1. Let's call the directive neg_sa, short for "negated semantic action".

pseudocode

// semantic actions
auto a_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto b_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto c_sa = [&](auto& ctx)
{
  // add _where to vector v

  // now we know we have a *real* error.
  // find the peak iterator value in the vector v
  // the position tells whether it belongs to a, b or c.
  // now we can formulate an error message like: "cannot make sense of b up to this position."
  // lastly throw x3::expectation_failure
};

// PEG
const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );
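The heuristic behind this pseudocode (blame the alternative that advanced furthest before failing) can be prototyped without X3. The sketch below is a simplified model, not an implementation of the proposed neg_sa directive: each alternative is assumed to be a fixed sequence of literal tokens, and `branch`, `try_sequence` and `parse_or_blame` are hypothetical names. Ties go to the earlier alternative, mirroring PEG ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative model only (plain C++, not Spirit X3); names are hypothetical.
struct branch {
    std::string name;                 // e.g. "a"
    std::vector<std::string> tokens;  // e.g. {"a1", "a2", "a3"}
};

// Match the tokens in order. `reached` is where matching stopped:
// the end of the match on success, the mismatch position on failure.
bool try_sequence(branch const& b, std::string const& in, std::size_t& reached) {
    reached = 0;
    for (auto const& tok : b.tokens) {
        if (in.compare(reached, tok.size(), tok) != 0)
            return false;
        reached += tok.size();
    }
    return true;
}

// Ordered choice with "peak" tracking: returns an empty string on success,
// otherwise a message blaming the branch that got the furthest.
std::string parse_or_blame(std::vector<branch> const& branches, std::string const& in) {
    std::size_t peak = 0;
    std::string const* culprit = nullptr;
    for (auto const& b : branches) {
        std::size_t reached = 0;
        if (try_sequence(b, in, reached))
            return "";                               // first match wins (PEG order)
        if (culprit == nullptr || reached > peak) {  // strictly further: ties keep the earlier branch
            peak = reached;
            culprit = &b.name;
        }
    }
    if (culprit == nullptr)
        return "no alternatives";
    return "cannot make sense of " + *culprit + " up to position " + std::to_string(peak);
}
```

With alternatives a = a1 a2 a3, b = b1 b2 b3 and c = c1 c2 c3, the input b1b2xx is blamed on b at offset 4, where b3 was expected.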

I hope I presented this idea clearly. Let me know in the comment section if I need to explain something further.

Marelya asked 15/7, 2019 at 22:44

@sehe as you can see I refined my idea a bit on error handling. – Marelya 15/7, 2019 at 23:36

Re-reading your pseudo-code exposition, I'm curious what actual logic would go in there, because you could use x3::with(peak_ref) [....] and report that location in the on_error handler. If you show a minimal example (without error handling), I could try my hand at that. – Blackheart 16/7, 2019 at 11:7

Okay, risking conflating too many things in an example, here goes:

namespace square::peg {
    using namespace x3;

    const auto quoted_string = lexeme['"' > *(print - '"') > '"'];
    const auto bare_string   = lexeme[alpha > *alnum] > ';';
    const auto two_ints      = int_ > int_;

    const auto main          = quoted_string | bare_string | two_ints;

    const auto entry_point   = skip(space)[ expect[main] > eoi ];
} // namespace square::peg

That should do. The key is that the only things that should be expectation points are those that make the respective branch fail BEYOND the point where it was unambiguously the right branch. (Otherwise, there would literally not be a hard expectation.)
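To illustrate where the commit points sit, here is a hand-written C++ model of the same grammar (plain C++, not Spirit; `mirror`, `expectation_error` and `first_failure` are hypothetical names). Each branch returns false while it is still undecided and throws only once its first token has committed it, as the `>` placement above intends. Whitespace skipping is modelled on the sample output that follows, so the reported offsets agree with the carets printed there (for example 7 for "   -89 oops  " and 12 for "   bareword ").

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative model only (plain C++, not Spirit X3); all names are hypothetical.
// Stand-in for x3::expectation_failure: what() is the "which", plus a position.
struct expectation_error : std::runtime_error {
    std::size_t where;
    expectation_error(std::string const& which, std::size_t w)
        : std::runtime_error(which), where(w) {}
};

class mirror {
public:
    explicit mirror(std::string const& input) : s(input) {}

    // entry_point = skip(space)[ expect[main] > eoi ]
    void entry_point() {
        if (!main_rule()) throw expectation_error("main", pos);
        skip_space();
        if (pos != s.size()) throw expectation_error("eoi", pos);
    }

private:
    std::string const& s;
    std::size_t pos = 0;

    unsigned char peek() const { return pos < s.size() ? static_cast<unsigned char>(s[pos]) : 0; }
    void skip_space() { while (std::isspace(peek())) ++pos; }

    // main = quoted_string | bare_string | two_ints (rewind before each branch)
    bool main_rule() {
        std::size_t const save = pos;
        if (quoted_string()) return true;
        pos = save;
        if (bare_string()) return true;
        pos = save;
        if (two_ints()) return true;
        pos = save;
        return false;
    }
    // lexeme['"' > *(print - '"') > '"']
    bool quoted_string() {
        skip_space();
        if (peek() != '"') return false;  // soft mismatch: maybe another branch
        ++pos;                             // committed: failures below are hard
        while (std::isprint(peek()) && peek() != '"') ++pos;
        if (peek() != '"') throw expectation_error("'\"'", pos);
        ++pos;
        return true;
    }
    // lexeme[alpha > *alnum] > ';'
    bool bare_string() {
        skip_space();
        if (!std::isalpha(peek())) return false;  // soft mismatch
        ++pos;
        while (std::isalnum(peek())) ++pos;
        skip_space();                              // the skipper runs before ';'
        if (peek() != ';') throw expectation_error("';'", pos);
        ++pos;
        return true;
    }
    // int_: optional sign, then digits; rewinds (after skipping) on mismatch
    bool int_() {
        skip_space();
        std::size_t const save = pos;
        if (peek() == '+' || peek() == '-') ++pos;
        if (!std::isdigit(peek())) { pos = save; return false; }
        while (std::isdigit(peek())) ++pos;
        return true;
    }
    // int_ > int_
    bool two_ints() {
        if (!int_()) return false;  // soft mismatch
        if (!int_()) throw expectation_error("integral number", pos);
        return true;
    }
};

// Empty on success, otherwise {which, where} of the expectation that failed.
std::optional<std::pair<std::string, std::size_t>> first_failure(std::string const& input) {
    try {
        mirror m(input);
        m.entry_point();
    } catch (expectation_error const& e) {
        return std::make_pair(std::string(e.what()), e.where);
    }
    return std::nullopt;
}
```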

With two minor get_info specializations for prettier messages¹, this can lead to decent error messages even when manually catching the exception:

Live On Coliru

int main() {
    using It = std::string::const_iterator;

    for (std::string const input : {
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        try {
            if (parse(iter, end, square::peg::entry_point)) {
                std::cout << "Parsed successfully\n";
            } else {
                std::cout << "Parsing failed\n";
            }
        } catch (x3::expectation_failure<It> const& ef) {
            auto pos = std::distance(input.begin(), ef.where());
            std::cout << "Expect " << ef.which() << " at "
                << "\n\t" << input
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";
        }
    }
}

Prints

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expect quoted string, bare string or integer number pair at

    ^
====== "   -89 oops  "
Expect integral number at
       -89 oops 
    -------^
====== "   \"-89 0038  "
Expect '"' at
       "-89 0038 
    --------------^
====== "   bareword "
Expect ';' at
       bareword
    ------------^
====== "   -89 3.14  "
Expect eoi at
       -89 3.14 
    --------^

This is already beyond what most people expect from their parsers.
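One detail in the catch block above: std::setw applies only to the next output, but std::setfill is sticky, so after the first report std::cout keeps '-' as its fill character. That is harmless in this program, but building the marker with a std::string avoids touching stream state at all. A small sketch producing the same two-line report (`caret_report` is a hypothetical helper name, not part of X3):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: same layout as the catch block above, i.e. the input
// on one line, then `where` dashes and a '^' under the failure position.
std::string caret_report(std::string const& which, std::string const& input, std::size_t where) {
    return "Expect " + which + " at \n\t" + input + "\n\t" + std::string(where, '-') + "^\n";
}
```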

But: Automate That, Also, More Flexible

We might not be content with reporting just the one expectation and bailing out. Indeed, you can report and continue parsing as if there had just been a regular mismatch: this is where on_error comes in.

Let's create a tag base:

struct with_error_handling {
    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const&) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::cout << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";

            return error_handler_result::fail;
        }
};

Now, all we have to do is derive our rule ID from with_error_handling and, BAM!, we don't have to write any exception handlers: rules will simply "fail" with the appropriate diagnostics. What's more, some inputs can lead to multiple (hopefully helpful) diagnostics:

auto const eh = [](auto p) {
    struct _ : with_error_handling {};
    return rule<_> {} = p;
};

const auto quoted_string = eh(lexeme['"' > *(print - '"') > '"']);
const auto bare_string   = eh(lexeme[alpha > *alnum] > ';');
const auto two_ints      = eh(int_ > int_);

const auto main          = quoted_string | bare_string | two_ints;
using main_type = std::remove_cv_t<decltype(main)>;

const auto entry_point   = skip(space)[ eh(expect[main] > eoi) ];
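Conceptually, an on_error handler that returns error_handler_result::fail turns the hard expectation failure back into an ordinary mismatch of that rule, after recording a diagnostic. That is why, in the output below, an enclosing rule can fail afterwards and add a second message. A plain C++ model of the idea follows; it is not X3's implementation, `with_diagnostics` and `expectation_error` are hypothetical names, and it collects messages in a vector rather than printing them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative model only (plain C++, not Spirit X3); names are hypothetical.
struct expectation_error : std::runtime_error {
    std::size_t where;
    expectation_error(std::string const& which, std::size_t w)
        : std::runtime_error(which), where(w) {}
};

// Wrap a parser so that an expectation failure inside it is recorded and then
// converted into a plain `false`, like an on_error handler returning `fail`.
template <typename Parser>
auto with_diagnostics(Parser p, std::vector<std::string>& diags) {
    auto* sink = &diags;
    return [p, sink](std::size_t& pos) -> bool {
        std::size_t const save = pos;
        try {
            return p(pos);
        } catch (expectation_error const& e) {
            sink->push_back("Expecting " + std::string(e.what()) + " at " + std::to_string(e.where));
            pos = save;  // the rule fails as a whole, consuming no input
            return false;
        }
    };
}
```

Nesting two such wrappers (an inner branch that misses its ';' and an outer expect that then fails) yields two messages, the same shape as the "   bareword " case in the output below.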

Now, main becomes just:

Live On Coliru

for (std::string const input : { 
        "   -89 0038  ",
        "   \"-89 0038\"  ",
        "   something123123      ;",
        // undecidable
        "",
        // violate expectations, no successful parse
        "   -89 oops  ",   // not an integer
        "   \"-89 0038  ", // missing "
        "   bareword ",    // missing ;
        // trailing debris, successful "main"
        "   -89 3.14  ",   // followed by .14
    })
{
    std::cout << "====== " << std::quoted(input) << "\n";

    It iter = input.begin(), end = input.end();
    if (parse(iter, end, square::peg::entry_point)) {
        std::cout << "Parsed successfully\n";
    } else {
        std::cout << "Parsing failed\n";
    }
}

And the program prints:

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

Attribute Propagation, `on_success`

Parsers aren't very useful when they don't actually parse anything, so let's add some constructive value handling, also showcasing on_success:

Defining some AST types to receive the attributes:

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

Make sure we can print Values:

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}
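For comparison, here is the same printer written against std::variant and std::visit (C++17) instead of boost::variant. This sketch covers only the printing part: it does not claim that X3 accepts std::variant as an attribute type, and the X3 code in this answer keeps using boost::variant.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <variant>

// Standalone variant of the AST types above (C++17 std::variant, not boost::variant).
struct quoted : std::string {};
struct bare   : std::string {};
using two_i   = std::pair<int, int>;
using Value   = std::variant<quoted, bare, two_i>;

std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& q) const { _os << "quoted(" << std::quoted(static_cast<std::string const&>(q)) << ")"; }
        void operator()(bare const& b) const   { _os << "bare(" << static_cast<std::string const&>(b) << ")"; }
        void operator()(two_i const& p) const  { _os << "two_i(" << p.first << ", " << p.second << ")"; }
    } vis{os};

    std::visit(vis, v);  // dispatch on the active alternative
    return os;
}
```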

Now, use the old as<> trick to coerce attribute types, this time with error handling (the as<> helper and the rules built with it appear in the full listing below):

As icing on the cake, let's demonstrate on_success in with_error_handling:

    template<typename It, typename Ctx>
        void on_success(It f, It l, two_i const& v, Ctx const&) const {
            std::cout << "Parsed " << std::quoted(std::string(f,l)) << " as integer pair " << v.first << ", " << v.second << "\n";
        }

Now, with a largely unmodified main program (it just prints the result value as well):

Live On Coliru

    It iter = input.begin(), end = input.end();
    Value v;
    if (parse(iter, end, square::peg::entry_point, v)) {
        std::cout << "Result value: " << v << "\n";
    } else {
        std::cout << "Parsing failed\n";
    }

Prints

====== "   -89 0038  "
Parsed "-89 0038" as integer pair -89, 38
Result value: two_i(-89, 38)
====== "   \"-89 0038\"  "
Result value: quoted("-89 0038")
====== "   something123123      ;"
Result value: bare(something123123)
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Parsed "-89 3" as integer pair -89, 3
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

Really Overdoing Things

I don't know about you, but I hate side effects in a parser, let alone printing to the console from one. Let's use x3::with instead.

We want to append to the diagnostics via the Ctx& argument instead of writing to std::cout in the on_error handler:

struct with_error_handling {
    struct diags;

    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::ostringstream oss;
            oss << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

            x3::get<diags>(ctx).push_back(oss.str());

            return error_handler_result::fail;
        }
};

And at the call site, we can pass the context:

using D = square::peg::with_error_handling::diags;
std::vector<std::string> diags;

if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
    std::cout << "Result value: " << v;
} else {
    std::cout << "Parsing failed";
}

std::cout << " with " << diags.size() << " diagnostics messages: \n";
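x3::with<D>(diags) makes diags reachable from every rule parsed inside it, and x3::get<D>(ctx) retrieves it in the handler. A plain C++ model of that lookup-by-tag idea follows; it is a simplified sketch, not how X3 actually lays out its context, and `entry`, `context`, `with_ref`, `get_ref` and `report` are hypothetical names. The handler only receives the context by const reference, yet it can append to the caller's vector because the stored member is itself a reference.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Illustrative model only (not Spirit X3's real context); names are hypothetical.

// One tagged slot holding a reference, compare x3::with<Tag>(ref).
template <typename Tag, typename T>
struct entry { T& ref; };

// A context bundles tagged slots, each as a base class.
template <typename... Entries>
struct context : Entries... {
    explicit context(Entries... es) : Entries(es)... {}
};

template <typename Tag, typename T>
entry<Tag, T> with_ref(T& value) { return entry<Tag, T>{value}; }

template <typename... Entries>
context<Entries...> make_context(Entries... es) { return context<Entries...>(es...); }

// Lookup by tag, compare x3::get<Tag>(ctx): T is deduced from the unique base
// class entry<Tag, T>. Returns T& even for a const context, because the
// stored member is a reference.
template <typename Tag, typename T>
T& get_ref(entry<Tag, T> const& e) { return e.ref; }

struct diags_tag;  // example tags
struct depth_tag;

// A handler that, like on_error, only receives the context by const reference.
template <typename Ctx>
void report(Ctx const& ctx, std::string msg) {
    get_ref<diags_tag>(ctx).push_back(std::move(msg));
}
```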

The full program also prints the diagnostics:

Live On Wandbox²

Full Listing

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

namespace x3 = boost::spirit::x3;

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

namespace square::peg {
    using namespace x3;

    struct with_error_handling {
        struct diags;

        template<typename It, typename Ctx>
            x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
                std::string s(f,l);
                auto pos = std::distance(f, ef.where());

                std::ostringstream oss;
                oss << "Expecting " << ef.which() << " at "
                    << "\n\t" << s
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

                x3::get<diags>(ctx).push_back(oss.str());

                return error_handler_result::fail;
            }
    };

    template <typename T = x3::unused_type> auto const as = [](auto p) {
        struct _ : with_error_handling {};
        return rule<_, T> {} = p;
    };

    const auto quoted_string = as<quoted>(lexeme['"' > *(print - '"') > '"']);
    const auto bare_string   = as<bare>(lexeme[alpha > *alnum] > ';');
    const auto two_ints      = as<two_i>(int_ > int_);

    const auto main          = quoted_string | bare_string | two_ints;
    using main_type = std::remove_cv_t<decltype(main)>;

    const auto entry_point   = skip(space)[ as<Value>(expect[main] > eoi) ];
} // namespace square::peg

namespace boost::spirit::x3 {
    template <> struct get_info<int_type> {
        typedef std::string result_type;
        std::string operator()(int_type const&) const { return "integral number"; }
    };
    template <> struct get_info<square::peg::main_type> {
        typedef std::string result_type;
        std::string operator()(square::peg::main_type const&) const { return "quoted string, bare string or integer number pair"; }
    };
}

int main() {
    using It = std::string::const_iterator;
    using D = square::peg::with_error_handling::diags;

    for (std::string const input : { 
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        Value v;
        std::vector<std::string> diags;

        if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
            std::cout << "Result value: " << v;
        } else {
            std::cout << "Parsing failed";
        }

        std::cout << " with " << diags.size() << " diagnostics messages: \n";

        for(auto& msg: diags) {
            std::cout << " - " << msg << "\n";
        }
    }
}

¹ You could use rules with their names instead, obviating this more complex trick.

² On older versions of the library you may have to battle to get reference semantics on the with<> data: Live On Coliru

Blackheart answered 17/7, 2019 at 1:3

Quoting the question: "Now for expression a I cannot use expect[] or operator>, as other alternatives might be valid. I could be wrong but I think X3 requires me to spell out alternate wrong expressions that can match and if they match they can throw x3::expectation_failure which is cumbersome."

That's simple:

const auto main_rule__def = x3::expect [
 a |
 b |
 c ];

Or, even:

const auto main_rule__def = x3::eps > (
 a |
 b |
 c );

Quoting the question: "If the answer is no, I would like to present my idea to provide a reasonable solution for this. I believe I would need a new parser directive for that. What should this directive do? It should call the attached semantic action when the parse fails instead."

The existing on_error mechanism already knows how to do this. Mind you: it's a little bit intricate, but by the same token it's also pretty flexible.

Basically what it requires is for you to implement a static interface on the ID type (x3::rule<ID, Attr>, likely main_rule_class in your chosen convention). There are compiler examples in the repository that show how to use it.

Side note: both on_success and on_error use this paradigm.

The on_error member will be called on a default-constructed instance of the ID type, with the parameters ID().on_error(first, last, expectation_failure_object, context).
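The shape of that hook protocol can be sketched without X3: a rule is parameterised on an ID type, checks at compile time whether ID provides on_success / on_error members and, if so, calls them on a default-constructed ID(). This is a simplified C++17 model with hypothetical names (`run_rule`, `has_on_error`, `noisy_rule`, ...); X3's real hooks also receive the end iterator and the context, and on_error returns an error_handler_result rather than always failing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Illustrative model only (C++17, not Spirit X3); names and signatures are simplified.
struct expectation_error : std::runtime_error {
    std::size_t where;
    expectation_error(std::string const& which, std::size_t w)
        : std::runtime_error(which), where(w) {}
};

// Detect at compile time whether an ID type provides the optional hooks.
template <typename ID, typename = void>
struct has_on_error : std::false_type {};
template <typename ID>
struct has_on_error<ID, std::void_t<decltype(ID().on_error(
    std::size_t{}, std::declval<expectation_error const&>()))>> : std::true_type {};

template <typename ID, typename = void>
struct has_on_success : std::false_type {};
template <typename ID>
struct has_on_success<ID, std::void_t<decltype(ID().on_success(
    std::size_t{}, std::size_t{}))>> : std::true_type {};

// Run `body` as "the rule with this ID", calling hooks on a fresh ID() if present.
template <typename ID, typename Body>
bool run_rule(Body body, std::size_t& pos) {
    std::size_t const first = pos;
    try {
        if (!body(pos)) { pos = first; return false; }
    } catch (expectation_error const& e) {
        if constexpr (has_on_error<ID>::value) {
            ID().on_error(first, e);  // report, then fail softly (modelled on `fail`)
            pos = first;
            return false;
        } else {
            throw;                    // no handler: the exception keeps propagating
        }
    }
    if constexpr (has_on_success<ID>::value)
        ID().on_success(first, pos);
    return true;
}

// Hooks run on a default-constructed ID, so these example IDs log to a shared vector.
std::vector<std::string> hook_log;

struct quiet_rule {};  // no hooks at all
struct noisy_rule {
    void on_success(std::size_t f, std::size_t l) const {
        hook_log.push_back("success " + std::to_string(f) + ".." + std::to_string(l));
    }
    void on_error(std::size_t f, expectation_error const& e) const {
        hook_log.push_back(std::string("error ") + e.what() + " at " +
                           std::to_string(e.where) + " (rule began at " + std::to_string(f) + ")");
    }
};
```

Because the hooks are invoked on a fresh ID(), any state they need has to live outside the ID object, which is exactly why the answer above routes its diagnostics through the parse context.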

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );

To be honest, I think you're paving over your confusion here. What good does it give that you have 3 separate error actions? How would you decide which error happened?

Really, there are only two possibilities:

Either you DO know that a specific branch was required AND it failed (that's an expectation failure and you can by definition code that as an expectation point inside one of a, b or c).
Or you DO NOT know which branch was implied (say, when branches can start out with similar productions and they failed inside those). In that case, nobody can ever tell which error handler should be invoked, so having more than one is beside the point.

Actually, the correct thing to do is to fail main_rule at the higher level, which would mean "none of the possible branches succeeded".

This is the expect[ a | b | c ] way of dealing with it.

Blackheart answered 16/7, 2019 at 9:52

I think I need to explain it somewhat better. You say: "Or you DO NOT know which branch was implied". Yes, that's the case. But I believe it is telling in which branch the iterator advanced the furthest before failing, compared to the other alternatives. Then I assume the input was intended for that branch, and I write an error message referring to it. That is what happens in my c_sa semantic action. But maybe this is already possible with on_error. – Marelya 16/7, 2019 at 15:31

That was my thinking. I'm willing to help out with such. And I just remember solving an eerily similar task before in X3: https://mcmap.net/q/1470923/-spirit-qi-how-can-i-write-a-nonterminal-parser – Blackheart 16/7, 2019 at 21:50
