Framework for building structured binary data parsers?

I have some experience with Pragmatic-Programmer-type code generation: specifying a data structure in a platform-neutral format and writing templates for a code generator that consume these data structure files and produce code that pulls raw bytes into language-specific data structures, does scaling on the numeric data, prints out the data, etc. The nice pragmatic(TM) ideas are that (a) I can change data structures by modifying my specification file and regenerating the source (which is DRY and all that) and (b) I can add additional functions that can be generated for all of my structures just by modifying my templates.
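
For readers unfamiliar with this workflow, here is a minimal sketch of the spec-plus-templates idea. It is written in Python for brevity; a generator that emits C#, C++ or Java has the same shape. The spec format, the `SensorRecord` layout and the template functions `gen_parser`/`gen_printer` are all hypothetical, not Jeeves or any existing tool. Each template turns the spec into source text. Editing the spec changes every generated function (point a), and adding a template adds a new function for every structure (point b).

```python
import struct

# Hypothetical spec. In practice it would live in its own platform-neutral
# file (XML, YAML, a small DSL, ...) that the generator reads.
SPEC = {
    "name": "SensorRecord",
    "endian": "<",  # little-endian, in struct module notation
    "fields": [
        # (field name, struct format code, scale applied after unpacking)
        ("sensor_id", "H", 1),
        ("temperature", "h", 0.01),  # raw value in hundredths of a degree
        ("pressure", "I", 0.1),
    ],
}


def gen_parser(spec):
    """Template 1: unpack raw bytes into a dict and apply the scale factors."""
    fmt = spec["endian"] + "".join(code for _, code, _ in spec["fields"])
    names = ", ".join(name for name, _, _ in spec["fields"])
    lines = [
        f"def parse_{spec['name']}(buf, offset=0):",
        f"    {names}, = struct.unpack_from({fmt!r}, buf, offset)",
        "    return {",
    ]
    for name, _, scale in spec["fields"]:
        lines.append(f"        {name!r}: {name} * {scale!r},")
    lines.append("    }")
    return "\n".join(lines)


def gen_printer(spec):
    """Template 2: a printer, added without touching the spec."""
    lines = [f"def print_{spec['name']}(rec):"]
    for name, _, _ in spec["fields"]:
        lines.append(f"    print({name!r}, '=', rec[{name!r}])")
    return "\n".join(lines)


TEMPLATES = [gen_parser, gen_printer]

# "Regenerate the source": run every template over the spec.
source = "import struct\n\n\n" + "\n\n\n".join(t(SPEC) for t in TEMPLATES) + "\n"

# For this sketch the generated code is exec'd directly; a C#/C++/Java
# generator would write it to a file for the normal build instead.
namespace = {}
exec(source, namespace)
parse_SensorRecord = namespace["parse_SensorRecord"]
print_SensorRecord = namespace["print_SensorRecord"]

if __name__ == "__main__":
    print(source)
    print_SensorRecord(parse_SensorRecord(struct.pack("<HhI", 7, -250, 10130)))
```

The generated code is plain, dependency-free source. That is the main appeal of generation over a runtime interpreter of the spec.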

What I used was a Perl script called Jeeves. It worked, but it's general-purpose, so any functions I wanted for manipulating my data I had to write from the ground up.

Are there any frameworks that are well-suited for creating parsers for structured binary data? What I've read of ANTLR suggests that it's overkill. My current target languages of interest are C#, C++, and Java, if it matters.

Thanks as always.

Edit: I'll put a bounty on this question. If there are any areas I should be looking at (keywords to search on) or other ways of attacking this problem that you've developed yourself, I'd love to hear about them.

Cunaxa asked 14/3, 2011 at 23:20
Not a ready-to-use framework, but at least some food for thought: okmij.org/ftp/Scheme/binary-parse.scm - the same approach can be applied in C#. – Dogmatism
I don't know why you think ANTLR (or any parser generator) is overkill. You need some way to define your data, which requires a specification language, and that language needs a parser somewhere; there is little point in writing a completely ad hoc parser for it. You will also need some kind of parser for the raw binary data itself. Conventional parser generators are generally implemented to process text, but you can easily abuse them to process binary data. Now, for parsing binary data, a regular expression (FSA) may be adequate; a full CFG is probably not necessary. – Huckaby
@Ira: Good point about the specification language. My Jeeves setup had one, and I just used it rather than extending it or designing my own. – Cunaxa
(Correcting a typo in the last sentence of my comment:) "Now, for parsing binary data, a regular expression (FSA) may be adequate; a full CFG is probably not necessary." – Huckaby
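
The regular-expression point can be illustrated with Python's `re` module, which accepts bytes patterns. The frame format here is made up: sync bytes `AA 55`, a type byte, then a 2- or 4-byte little-endian value depending on the type. The alternation handles the type-dependent layout, which is still a regular language. A length prefix that decides how many bytes follow can't be expressed this directly and would need a little code after the match.

```python
import re
import struct

# Hypothetical frame format, for illustration only:
#   AA 55 01 <uint16 LE>   or   AA 55 02 <uint32 LE>
# re.DOTALL lets '.' match any byte value, including 0x0A (newline).
FRAME = re.compile(rb"\xAA\x55(?:\x01(..)|\x02(....))", re.DOTALL)


def scan_frames(buf):
    """Return (type, value) for each frame, skipping any bytes between frames."""
    frames = []
    for m in FRAME.finditer(buf):
        if m.group(1) is not None:  # the type 0x01 branch matched
            frames.append((1, struct.unpack("<H", m.group(1))[0]))
        else:  # the type 0x02 branch matched
            frames.append((2, struct.unpack("<I", m.group(2))[0]))
    return frames


if __name__ == "__main__":
    data = (b"noise" + b"\xAA\x55\x01" + struct.pack("<H", 513)
            + b"\xAA\x55\x02" + struct.pack("<I", 70000))
    print(scan_frames(data))
```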

You might find ASN.1 interesting, as it provides an abstract way to describe the data you might be processing. If you use ASN.1 to describe the data abstractly, you need a way to map that abstract data to concrete binary streams, for which ECN (Encoding Control Notation) is likely the right choice.
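
ECN is a notation for specifying custom encodings and is not shown here. As a simpler illustration of mapping abstract ASN.1 values to concrete bytes, the sketch below hand-encodes a hypothetical type `Record ::= SEQUENCE { id INTEGER, payload OCTET STRING }` using the standard DER rules from X.690 (tag, length, content). In practice an ASN.1 compiler would generate this kind of code from the type definition. The function names are made up for this sketch.

```python
def der_length(n):
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag, content):
    """One tag-length-value triple."""
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(v):
    # DER requires the minimal two's-complement encoding.
    n = (v if v >= 0 else ~v).bit_length() // 8 + 1
    return der_tlv(0x02, v.to_bytes(n, "big", signed=True))


def der_octet_string(data):
    return der_tlv(0x04, data)


def der_sequence(*encoded_items):
    return der_tlv(0x30, b"".join(encoded_items))


# Hypothetical type: Record ::= SEQUENCE { id INTEGER, payload OCTET STRING }
def encode_record(record_id, payload):
    return der_sequence(der_integer(record_id), der_octet_string(payload))


if __name__ == "__main__":
    print(encode_record(5, b"hi").hex())
```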

The New Jersey Machine Toolkit is actually focused on binary data streams corresponding to instruction sets, but I think instruction-set encodings are a superset of plain binary data streams. It has very nice facilities for defining fields in terms of bit strings and for automatically generating accessors and generators for them. This might be particularly useful if your binary data structures contain pointers to other parts of the data stream.
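
Here is a rough sketch of the "fields as bit strings, generated accessors" idea. It is written in Python rather than in the toolkit's own specification language. The field table (a MIPS I-type-like 32-bit layout) and `make_word_class` are hypothetical, and the accessors are built at runtime instead of being emitted as source. Each getter shifts and masks. Each setter range-checks the value, clears the field, and ORs in the new bits.

```python
# Hypothetical field layout for a 32-bit word: (name, lowest bit, width in bits).
# Loosely in the spirit of NJMCT field declarations, not its actual syntax.
FIELDS = [
    ("opcode", 26, 6),
    ("rs", 21, 5),
    ("rt", 16, 5),
    ("imm", 0, 16),
]


def make_word_class(name, fields):
    """Build a class with one property per bit field over an integer `word`."""

    def make_property(lo, width):
        mask = (1 << width) - 1

        def getter(self):
            return (self.word >> lo) & mask

        def setter(self, value):
            if not 0 <= value <= mask:
                raise ValueError(f"value {value} does not fit in {width} bits")
            # Clear the field's bits, then insert the new value.
            self.word = (self.word & ~(mask << lo)) | (value << lo)

        return property(getter, setter)

    def __init__(self, word=0):
        self.word = word

    attrs = {"__init__": __init__}
    for field_name, lo, width in fields:
        attrs[field_name] = make_property(lo, width)
    return type(name, (), attrs)


Instruction = make_word_class("Instruction", FIELDS)

if __name__ == "__main__":
    insn = Instruction()
    insn.opcode, insn.rs, insn.rt, insn.imm = 0x23, 29, 8, 16
    print(hex(insn.word))
```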

Huckaby answered 18/3, 2011 at 16:54

You might also look at Kaitai Struct, a relatively new project that provides a language for exactly this purpose and also has a good IDE: Kaitai.io
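
Kaitai Struct specs are YAML `.ksy` files that the Kaitai compiler turns into parsers for several languages, including C++, C# and Java. To show the declarative style without the real toolchain, here is a toy Python interpreter for a dict that mirrors a few `.ksy` keys (`meta`, `endian`, `seq`, `id`, `type`, `contents`). The `toy_header` format and the `parse` function are hypothetical. Only unsigned integer types and fixed magic bytes are supported.

```python
import struct

# A dict mirroring a small subset of .ksy keys; the format itself is hypothetical.
SPEC = {
    "meta": {"id": "toy_header", "endian": "le"},
    "seq": [
        {"id": "magic", "contents": [0x54, 0x4F, 0x59]},  # ASCII "TOY"
        {"id": "version", "type": "u2"},
        {"id": "num_records", "type": "u4"},
        {"id": "flags", "type": "u1"},
    ],
}

# Unsigned Kaitai-style type names mapped to struct format codes.
_UNSIGNED = {"u1": "B", "u2": "H", "u4": "I", "u8": "Q"}


def parse(spec, buf):
    """Walk `seq` in order, reading each field at the current position."""
    prefix = "<" if spec["meta"]["endian"] == "le" else ">"
    pos = 0
    result = {}
    for field in spec["seq"]:
        if "contents" in field:
            # Fixed magic bytes: verify them rather than just reading them.
            expected = bytes(field["contents"])
            actual = buf[pos:pos + len(expected)]
            if actual != expected:
                raise ValueError(f"{field['id']}: expected {expected!r}, got {actual!r}")
            result[field["id"]] = actual
            pos += len(expected)
        else:
            fmt = prefix + _UNSIGNED[field["type"]]
            (result[field["id"]],) = struct.unpack_from(fmt, buf, pos)
            pos += struct.calcsize(fmt)
    return result


if __name__ == "__main__":
    print(parse(SPEC, b"TOY" + struct.pack("<HIB", 2, 1000, 7)))
```

Unlike the real compiler, this interprets the spec at runtime, which is simpler but slower than generated code.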

Forewent answered 10/11, 2016 at 12:1