Parsing text file with binary envelope using Boost Spirit

I'm currently trying to write a parser for an ASCII text file that is wrapped in a small envelope with a checksum.

The basic structure of the file is: <0x02><"File payload"><0x03><16bit CRC>

and I want to extract the payload into another string to feed it to the next parser.

The parser expression I use to parse this envelope is:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print >> char_('\x03') >> *xdigit,
    space
);

The input is consumed... and I already tried to dump out the payload:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print[cout << _1] >> char_('\x03') >> *xdigit,
    space
);

But the problem is that every newline, blank, etc. is omitted!

Now my questions:

  1. How do I extract the content between the 0x02/0x03 (STX/ETX) bytes correctly, without omitting spaces, newlines, etc.?

  2. Is my approach of first removing the envelope and then parsing the payload a good one, or is there a better approach I should use?

Dogfight answered 19/3, 2015 at 12:39 Comment(0)

Use e.g. qi::seek/qi::confix to get you started (both part of the repository http://www.boost.org/doc/libs/1_57_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/confix.html).

But the problem is that every newline, blank, etc. is omitted!

Well, that's what a skipper does. Don't use one, or:

Use qi::raw[]

To extract the intervening text, I suggest using qi::raw, which exposes the matched input as a pair of source iterators. I'm not sure you actually want to copy it to a string, though (copying sounds expensive). Copying probably does make sense when the source is a stream (or another source of input iterators).

Seminal rule:

myrule = '\x02' > raw [ *(char_ - '\x03') ] > '\x03';

You could add the checksumming:

myrule = '\x02' > raw [ *(char_ - '\x03') ] [ _a = _checksum(_1) ] > '\x03' >> qi::word(_a);

Assuming

  • qi::locals<uint16_t>
  • _checksum is a suitable Phoenix functor that takes a pair of source iterators and returns uint16_t

Of course you might prefer to keep checksumming outside the parser.

Chancechancel answered 19/3, 2015 at 12:49 Comment(7)
Thanks a lot! I had missed the 'confix' chapter so far, and it seems the best approach to me. In the end I'd prefer to parse the file directly into the data structures of the program. However: can I do that and calculate the checksum at the same time? By calling two functors? - Dogfight
No problem. Just do as I showed, and don't forget the %= assignment to the rule. - Chancechancel
Ok... thanks. I'll try to follow your suggestion! I guess I have to write the lexer first. Let's see how far I can get without asking dumb questions again. :-) - Dogfight
Just one last question: what is the best approach to developing the parser using Boost Qi AND Spirit.Lex? Start with the lexer first, or implement parsers for the tokens? - Dogfight
Well, if you're building a parser on the token stream, you'd better have a token stream :) That said, I don't generally recommend separating the lexer out. It adds enough complexity to knock your parser out of the sweet spot for Qi usage. - Chancechancel
Sorry... I don't understand your last answer. You mean you'd recommend no lexer at all? - Dogfight
It's hard to provide a general recommendation without knowing the grammar, but in general, once grammars get involved enough to "require" (benefit from) a lexer, I'd say Spirit is likely not the most convenient tool anymore. Of course, if you know exactly the ins and outs and the limitations, you could find the exceptions to this rule of thumb :) - Chancechancel
