parsing into several vector members

I want to recursively parse a string and store the results in one struct. I've written a parser that can handle one iteration. The input is formatted as follows:

v  1.5 2.0 2.5
v  3.0 3.5 4.0
f 1 2 3
f 4 5 6 
v  4.5 5.0 5.5
v  6.0 6.5 7.0
f 7 8 9
f 10 11 12

The problem is that it only parses the first 4 lines: it stops at the third 'v' it encounters. The complete code is given below. How do I modify this code so that it also parses the rest of the input into the same struct? I've tried changing the start rule from start = vertex >> elements to start = *(vertex >> elements), but that only gives a huge compilation error. The same goes for start = +(vertex >> elements). Any ideas how I should modify the rules?

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>

#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/support_iso8859_1.hpp"
#include "boost/fusion/include/adapt_struct.hpp"


struct ElemParseData
{
    std::vector<float> verts;
    std::vector<unsigned int> idx;
};

BOOST_FUSION_ADAPT_STRUCT(
    ElemParseData,
    (std::vector<float>, verts)
    (std::vector<unsigned int>, idx)
)


bool doParse( ElemParseData &parseData, const std::string &data )
{
    namespace qi      = boost::spirit::qi;
    namespace iso8859 = boost::spirit::iso8859_1;

    struct objGram : qi::grammar<std::string::const_iterator, ElemParseData(), iso8859::space_type>
    {
        objGram() : objGram::base_type(start)
        {
            vertex   = *('v' >> qi::double_ >> qi::double_ >> qi::double_);
            elements = *('f' >> qi::int_ >> qi::int_ >> qi::int_);

            start = vertex >> elements;
        }

        qi::rule<std::string::const_iterator, ElemParseData(), iso8859::space_type> start;
        qi::rule<std::string::const_iterator, std::vector<float>(), iso8859::space_type> vertex;
        qi::rule<std::string::const_iterator, std::vector<unsigned int>(), iso8859::space_type> elements;

    } objGrammar;

    std::string::const_iterator f = data.cbegin();
    bool res = qi::phrase_parse( f, data.cend(), objGrammar, iso8859::space, parseData );


    // print everything that hasn't been processed by the parser
    std::cout << "#### Trail ####" << std::endl;
    std::cout << std::string(f, data.cend()) << std::endl;

    return res;
}


int main( int argc, char* argv[] )
{
    std::stringstream ss;
    std::filebuf fb;
    if ( fb.open("parsetest.txt", std::ios::in) )
    {
        std::istream is(&fb);
        while (is)
            ss << char(is.get());
        fb.close();
    }


    ElemParseData parseData;
    bool res = doParse( parseData, ss.str() );


    // print results
    std::cout << std::endl << "Parsing result: " << res << std::endl;
    std::cout << "---######### ResultData #########---" << std::endl;
    std::cout << "---- Begin vertex data ----" << std::endl;
    std::vector<float>::iterator it;
    for ( it = parseData.verts.begin(); it != parseData.verts.end(); ++it )
        std::cout << *it << std::endl;
    std::cout << "---- End vertex data ----" << std::endl;

    std::cout << std::endl;

    std::cout << "---- Begin index data ----" << std::endl;
    std::vector<unsigned int>::iterator idxIt;
    for ( idxIt = parseData.idx.begin(); idxIt != parseData.idx.end(); ++idxIt )
            std::cout << *idxIt << std::endl;
    std::cout << "---- End index data ----" << std::endl;

    std::cout << "Press enter to exit" << std::endl;
    std::cin.get();
}

P.S.: If needed, the compilation error can be found here.

EDIT: I'm trying to write a Wavefront .OBJ parser. The input given here is just a simplification of my problem.

Darkish answered 15/7, 2013 at 18:16 Comment(3)
Why is there a need to recursively parse? Looks like iteration would be simpler. – Olin
I want recursion because I do not know how many times 'v' and 'f' repeat. 'v' and 'f' always follow each other, but this combination can occur an unbounded number of times (theoretically). – Darkish
Still, iteration fits the bill: it's a simple loop of either vertices or indices :) See my answer, of course. – Interdict

There are several ways :)

  1. Custom attribute traits
  2. The same using semantic actions
  3. Everything in semantic actions, at detail level

1. Custom attribute traits

The cleanest, IMO, would be to replace the Fusion Sequence Adaptation (BOOST_FUSION_ADAPT_STRUCT) with custom container attribute traits for Spirit:

namespace boost { namespace spirit { namespace traits {

    template<> 
        struct is_container<ElemParseData, void> : mpl::true_ { };
    template<> 
        struct container_value<ElemParseData, void> { 
             typedef boost::variant<float, unsigned int> type;
        };
    template <>
        struct push_back_container<ElemParseData, std::vector<float>, void> {
            static bool call(ElemParseData& c, std::vector<float> const& val) {
                c.verts.insert(c.verts.end(), val.begin(), val.end());
                return true;
            }
        };
    template <>
        struct push_back_container<ElemParseData, std::vector<unsigned int>, void> {
            static bool call(ElemParseData& c, std::vector<unsigned int> const& val) {
                c.idx.insert(c.idx.end(), val.begin(), val.end());
                return true;
            }
        };
}}}

With the grammar left unchanged, this behaves exactly as before. However, you can now modify the parser to express the desired grammar:

    vertex   = 'v' >> qi::double_ >> qi::double_ >> qi::double_;
    elements = 'f' >> qi::int_ >> qi::int_ >> qi::int_;

    start = *(vertex | elements);

And because of the traits, Spirit will "just know" how to insert into ElemParseData. See it live on Coliru
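
To make the intent of start = *(vertex | elements) concrete, here is a minimal hand-written sketch of the same "loop over either a vertex or a face" idea using only the standard library. It is not Spirit code and not what the traits generate; the names appendTriple and parseObjLines are hypothetical, and unlike the skipper-based grammar it assumes one record per line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same shape as the struct in the question.
struct ElemParseData {
    std::vector<float> verts;
    std::vector<unsigned int> idx;
};

// Hypothetical helper: reads exactly three values of type T and appends
// them to `out` only if all three were read successfully.
template <typename T>
bool appendTriple(std::istream& in, std::vector<T>& out) {
    T a, b, c;
    if (!(in >> a >> b >> c)) return false;
    out.push_back(a);
    out.push_back(b);
    out.push_back(c);
    return true;
}

// Loop over lines; each non-blank line must be a vertex ('v') or a face ('f').
// Both kinds are appended to the same struct, in whatever order they appear.
// Returns false at the first line that matches neither form.
bool parseObjLines(const std::string& text, ElemParseData& data) {
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream in(line);
        std::string tag;
        if (!(in >> tag)) continue;  // skip blank lines
        if (tag == "v") {
            if (!appendTriple(in, data.verts)) return false;
        } else if (tag == "f") {
            if (!appendTriple(in, data.idx)) return false;
        } else {
            return false;
        }
    }
    return true;
}
```

Calling parseObjLines on the sample input from the question fills verts and idx with all twelve values each, because the loop does not care how the 'v' and 'f' groups are interleaved.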

2. The same using semantic actions

You can wire it up in semantic actions (here phx is an alias for boost::phoenix):

    start = *(  
               vertex   [phx::bind(insert, _val, _1)] 
             | elements [phx::bind(insert, _val, _1)]
             );

Here, insert is a member of type inserter:

struct inserter {
    template <typename,typename> struct result { typedef void type; };

    template <typename Attr, typename Vec>
        void operator()(Attr& attr, Vec const& v) const { dispatch(attr, v); }
    private:
    static void dispatch(ElemParseData& data, std::vector<float> vertices) {
        data.verts.insert(data.verts.end(), vertices.begin(), vertices.end());
    }
    static void dispatch(ElemParseData& data, std::vector<unsigned int> indices) {
        data.idx.insert(data.idx.end(), indices.begin(), indices.end());
    }
};

This looks largely the same, and it does the same: live on Coliru

3. Everything in semantic actions, at detail level

This is the only solution that doesn't require any kind of plumbing, except perhaps inclusion of boost/spirit/include/phoenix.hpp:

struct objGram : qi::grammar<std::string::const_iterator, ElemParseData(), iso8859::space_type>
{
    objGram() : objGram::base_type(start)
    {
        using namespace qi;

        auto add_vertex = phx::push_back(phx::bind(&ElemParseData::verts, _r1), _1);
        auto add_index  = phx::push_back(phx::bind(&ElemParseData::idx,   _r1), _1);
        vertex   = 'v' >> double_ [add_vertex] >> double_ [add_vertex] >> double_ [add_vertex];
        elements = 'f' >> int_    [add_index]  >> int_    [add_index]  >> int_    [add_index] ;

        start = *(vertex(_val) | elements(_val));
    }

    qi::rule<std::string::const_iterator, ElemParseData(), iso8859::space_type> start;
    qi::rule<std::string::const_iterator, void(ElemParseData&), iso8859::space_type> vertex, elements;
} objGrammar;

Note:

  • One slight advantage here would be that there is less copying of values
  • A disadvantage is that you lose 'atomicity' (if a line fails to parse after, say, the second value, the first two values will have been pushed into the ElemParseData members irrevocably).
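
The atomicity point can be illustrated without Spirit. The sketch below is a standard-library analogy only, with hypothetical function names: parseVertexEager commits each value as soon as it is read (like the per-value semantic actions above), while parseVertexAtomic collects into a local buffer and commits only when the whole line parsed.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Eager: push each value as soon as it parses.
// On a malformed line, values read before the error remain in `out`.
bool parseVertexEager(const std::string& line, std::vector<float>& out) {
    std::istringstream in(line);
    char tag = 0;
    if (!(in >> tag) || tag != 'v') return false;
    for (int i = 0; i < 3; ++i) {
        float value;
        if (!(in >> value)) return false;  // earlier values are already committed
        out.push_back(value);
    }
    return true;
}

// Atomic: parse into a local buffer first, then append all three values
// to `out` only if the complete line was valid.
bool parseVertexAtomic(const std::string& line, std::vector<float>& out) {
    std::vector<float> local;
    if (!parseVertexEager(line, local)) return false;
    out.insert(out.end(), local.begin(), local.end());
    return true;
}
```

For a line such as "v 1.0 2.0 oops", both functions report failure, but the eager version leaves two values behind in the output vector while the atomic version leaves it untouched. The extra local buffer is the copying cost that option 3 avoids.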

Side note

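There is a bug in the read loop: is.get() is called once more after the last character, so a spurious character is appended after EOF. Prefer the simpler option: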

std::filebuf fb;
if (fb.open("parsetest.txt", std::ios::in))
{
    ss << &fb;
    fb.close();
}

Or consider boost::spirit::istream_iterator
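
The read-loop bug can be reproduced in isolation. This is a small sketch using an arbitrary std::istream instead of the file from the question; slurpBuggy and slurpRdbuf are hypothetical names introduced for the comparison.

```cpp
#include <istream>
#include <sstream>
#include <string>

// The loop from the question: the stream is still "good" after the last
// character has been read, so get() runs one more time, returns EOF,
// and that value is cast to char and appended.
std::string slurpBuggy(std::istream& is) {
    std::stringstream ss;
    while (is)
        ss << char(is.get());
    return ss.str();
}

// Streaming the underlying buffer copies exactly the available characters.
std::string slurpRdbuf(std::istream& is) {
    std::stringstream ss;
    ss << is.rdbuf();
    return ss.str();
}
```

With a 7-character input such as "f 1 2 3", slurpBuggy returns 8 characters (the last one is the EOF value converted to char), whereas slurpRdbuf returns the input unchanged. That trailing character is what shows up in the "Trail" output of the original program.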

Interdict answered 15/7, 2013 at 21:19 Comment(12)
Yes! This is exactly what I wanted. Thank you :). Those traits look interesting. I will look into those some more, see what other cool things I can do with them. About your side note: can you please tell me what the bug is? I don't really see it. It's about the char cast, isn't it? – Darkish
@KrienLinnenbank I just added two more approaches, in decreasing levels of encapsulation. The latter may have some interesting performance characteristics. However, see this: Boost Spirit: “Semantic actions are evil”? – Interdict
And option 3 is live on Coliru as well. About the bug: it was a spurious character being read after EOF. – Interdict
@Interdict - Krien mentioned that he tried something like this: vertex = *('v' >> qi::double_ >> qi::double_ >> qi::double_); elements = *('f' >> qi::int_ >> qi::int_ >> qi::int_); start = *(vertex >> elements); If I omit attribute propagation to start, is there any explanation why this cannot work? – Vizor
@Interdict Your answers have always been great, but lately they are simply amazing. stackoverflow.com/search?q=user:85371+[boost-spirit] is a great learning resource about spirit. – Wakashan
@G.Civardi The problem is that the attribute that you get from that starting rule is basically std::vector<ElemParseData>. This approach should work if Krien changed the rule signature, passed a vector to phrase_parse and then "flattened" the vector into a single ElemParseData. But that is extra work compared to "simply" adding the specializations sehe recommends. – Wakashan
@cv_and_he - I tried to solve the task using semantic actions (a similar approach to sehe/2), but I used the rules as I stated above. Everything compiled fine, but during parsing it went into an endless loop. I would like to understand why that happens... :-) – Vizor
@cv_and_he Indeed, I also managed to get that working before I asked the question, but the extra work of concatenating the vectors didn't seem like the "right" way to do it. – Darkish
@cv_and_he Indeed it does (but given slightly changed Spirit Traits). And thanks for the kind words. I'm a little surprised, though, because I've always been gleaning the 'advanced' bits from you :/ – Interdict
@Interdict Do you do anything special to compile spirit code successfully on coliru? The times I've tried, even with really simple programs, it has always timed out. – Wakashan
@cv_and_he Yeah, sorry; I know the author and maintainer there (he's a lounger and he kindly gave me a backdoor to raise the compilation time limits)... I'm sure I could tell you, but not here :( It's such a shame that liveworkspace.org is still "Temporarily under maintenance". – Interdict
@Interdict No problem, it was simply curiosity. I'm sure that you will help lots of people with that. There is nothing better than a quick demo, especially with code that is usually hard to read. – Wakashan
