Parsing text file with binary envelope using boost Spirit

Asked 19/3, 2015 at 12:39 Answered 19/3, 2015 at 12:49

Solved c++ parsing boost boost-spirit boost-spirit-qi

I'm currently trying to write a parser for an ASCII text file that is surrounded by a small envelope with checksum.

The basic structure of the file is: <0x02><"File payload"><0x03><16bit CRC>

and I want to extract the payload in another string to feed it to the next parser.

The parser expression I use to parse this envelope is:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print >> char_('\x03') >> *xdigit,
    space
);

The input is consumed... and I already tried to dump out the payload:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print[cout << _1] >> char_('\x03') >> *xdigit,
    space
);

But the problem is that every newline, blank etc. is omitted!

Now my questions:

How do I extract the content between the 0x02/0x03 (STX/ETX) bytes correctly, without omitting spaces, newlines etc.?
And is my approach of first removing the envelope and then parsing the payload a good one, or is there a better approach I should use?

Dogfight answered 19/3, 2015 at 12:39 Comment(0)

Use e.g. qi::seek/qi::confix to get you started (both are part of the Spirit repository: http://www.boost.org/doc/libs/1_57_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/confix.html).

But the problem is that every newline, blank etc. is omitted!

Well, that's what a skipper does. Don't use one, or:

Use `qi::raw[]`

To extract the intervening text, I suggest using qi::raw, although I'm not sure you actually want to copy it to a string (copying sounds expensive). You could probably do this when the source is a stream (or another source of input iterators).

Seminal rule:

myrule = '\x02' > raw [ *(char_ - '\x03') ] > '\x03';

You could add the checksumming:

myrule = '\x02' > raw [ *(char_ - '\x03') ] [ _a = _checksum(_1) ] > '\x03' >> qi::word(_a);

Assuming

the rule declares qi::locals<uint16_t> (this is what provides _a)
_checksum is a suitable Phoenix functor that takes a pair of source iterators and returns uint16_t

Of course you might prefer to keep checksumming outside the parser.
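If you go that route, the checksum can be a plain function applied to the extracted payload. The sketch below assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection); that choice is only an assumption for illustration, since the question just says "16bit CRC", and crc16_ccitt_false is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection,
// no final XOR. ASSUMPTION: the file format may use a different CRC variant.
std::uint16_t crc16_ccitt_false(const std::string& data)
{
    std::uint16_t crc = 0xFFFF;
    for (std::string::const_iterator it = data.begin(); it != data.end(); ++it) {
        // Fold the next byte into the high byte of the register.
        crc = static_cast<std::uint16_t>(crc ^ (static_cast<unsigned char>(*it) << 8));
        // Shift out eight bits, applying the polynomial whenever the top bit is set.
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000)
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            else
                crc = static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}
```

For this variant the standard check value is 0x29B1 for the input "123456789", which is a handy sanity test before comparing against the value read from the file.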

Chancechancel answered 19/3, 2015 at 12:49 Comment(7)

Thanks a lot! I missed the 'confix' chapter so far and it seems the best approach to me. In the end I'd prefer to directly parse the file into the data structures of the program. However: Can I do that and calculate the checksum at the same time? Calling two functors? – Dogfight 19/3, 2015 at 13:5

No problem. Just, do as I showed and don't forget about %= assignment to the rule – Chancechancel 19/3, 2015 at 13:7

Ok... thanks. I'll try to follow your suggestion! Before I have to write the lexer I guess. Let's see how far I can get without asking dumb questions again. :-) – Dogfight 19/3, 2015 at 13:16

Just one last question: What is the best approach to develop the parser using boost Qi AND Spirit.Lex? Start with the lexer first or implement parsers for the tokens? – Dogfight 19/3, 2015 at 15:18

Well, if you're building a parser on the tokenstream, you better have a tokenstream :) That said, I don't generally recommend separating the Lexer out. It adds enough complexity to knock your parser over the sweet spot for Qi usage – Chancechancel 19/3, 2015 at 15:23

Sorry... I don't understand your last answer. You mean: You'd recommend no lexer at all? – Dogfight 19/3, 2015 at 15:33

It's hard to provide a general recommendation, without knowing the grammar, but in general, once grammars get involved enough to "require" (benefit from) a lexer, I'd say Spirit is likely not the most convenient tool anymore. Of course, if you know exactly the ins and outs and the limitations, you could find the exceptions to this rule of thumb :) – Chancechancel 19/3, 2015 at 16:13
