Are there any Java Frameworks for binary file parsing?
S

8

17

I want to parse binary files of different types with a generic parser implemented in Java. Ideally, the file format would be described in a configuration file that the parser reads, or by Java classes that parse the files according to some set of parsing rules.

I have searched quite a bit on the internet but found almost nothing on this topic.

All I have found are compiler generators (Jay, Cojen, etc.), and I don't think I can use them to generate something that parses binary files. I could be wrong about that, though.

Are there any frameworks which deal especially with easy parsing of binary files or can anyone give me a hint how I could use parser/compiler-generators to do so?

Update: I'm looking for something where I can write a config-file like

file:
  header: FIXED("MAGIC")
  body: content(10)

content:
  value1: BYTE
  value2: LONG
  value3: STRING(10)

and have it automatically generate something that parses files which start with "MAGIC", followed by ten repetitions of the content package (which itself consists of a byte, a long and a 10-byte string).

Update2: I found something comparable to what I'm looking for, "Construct", but sadly it is a Python framework. Maybe it helps someone get an idea of what I'm looking for.

Subternatural answered 13/3, 2009 at 21:45 Comment(0)
P
11

Give Preon a try.

Passional answered 12/6, 2009 at 9:1 Comment(3)
Unfortunately the license is GPL. Update: It's GPL Version 2. See here for more info: #6699240Hat
@Hat - As a comment on that Question indicates, you can get a commercial license if GPL doesn't work for you.Beebe
Preon is now MIT licensedCordell
L
12

Using Preon:

public class File {

  @BoundString(match="MAGIC")
  private String header;

  @BoundList(size="10", type=Body.class)
  private List<Body> body;

  private static class Body {

    @Bound
    byte value1;

    @Bound
    long value2;

    @BoundString(size="10")
    String value3;

  }


}

Decoding data:

Codec<File> codec = Codecs.create(File.class);
File file = Codecs.decode(codec, buffer);

Let me know if you are running into problems.

Lamori answered 10/7, 2009 at 19:5 Comment(0)
K
9

I have used DataInputStream for reading binary files and I write the rules in Java. ;) Binary files can have just about any format so there is no general rule for how to read them.

Frameworks don't always make things simpler. In your case, the description file is longer than the code to just read the data using a DataInputStream.

public static void parse(DataInput in) throws IOException {
//        file:
//          header: FIXED("MAGIC")
    String header = readAsString(in, 5);
    assert header.equals("MAGIC");
//          body: content(10)
// ?? not sure what this means
//        content:
    for(int i=0;i<10;i++) {
//          value1: BYTE
        byte value1 = in.readByte();
//          value2: LONG
        long value2 = in.readLong();
//          value3: STRING(10)
        String value3 = readAsString(in, 10);
    }
}

public static String readAsString(DataInput in, int len) throws IOException {
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    return new String(bytes);
}
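A variant of the code above, offered only as a minimal sketch: it names the character set explicitly (US-ASCII is an assumption about the file format), checks the header with an exception instead of `assert` (assertions are disabled by default in the JVM), and returns the parsed records. The names `MagicFileReader`, `Content`, `readAscii` and `sample` are hypothetical and not part of the original answer. Note that `DataInput` reads multi-byte values in big-endian order, which is also assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MagicFileReader {

    /** One "content" package: a byte, a long and a 10-byte string. */
    static final class Content {
        final byte value1;
        final long value2;
        final String value3;

        Content(byte value1, long value2, String value3) {
            this.value1 = value1;
            this.value2 = value2;
            this.value3 = value3;
        }
    }

    /** Reads the "MAGIC" header followed by ten content packages. */
    static List<Content> parse(DataInput in) throws IOException {
        String header = readAscii(in, 5);
        if (!"MAGIC".equals(header)) {
            // Unlike assert, this check also runs when assertions are disabled.
            throw new IOException("Bad header: " + header);
        }
        List<Content> body = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            byte value1 = in.readByte();
            long value2 = in.readLong();          // big-endian, as defined by DataInput
            String value3 = readAscii(in, 10);
            body.add(new Content(value1, value2, value3));
        }
        return body;
    }

    /** Reads exactly len bytes and decodes them as US-ASCII (assumed encoding). */
    static String readAscii(DataInput in, int len) throws IOException {
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Builds a made-up sample file in memory, purely for demonstration. */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write("MAGIC".getBytes(StandardCharsets.US_ASCII));
        for (int i = 0; i < 10; i++) {
            out.writeByte(i);
            out.writeLong(1000L + i);
            // "item-" plus five digits is exactly 10 bytes
            out.write(String.format(Locale.ROOT, "item-%05d", i)
                    .getBytes(StandardCharsets.US_ASCII));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(sample()));
        List<Content> body = parse(in);
        System.out.println(body.size() + " records, first string: " + body.get(0).value3);
    }
}
```

If the real format uses little-endian values or a different text encoding, `readLong` and the charset would need to be replaced accordingly.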

If you want to have a configuration file you could use a Java Configuration File. http://www.google.co.uk/search?q=java+configuration+file

Krongold answered 13/3, 2009 at 21:52 Comment(12)
I know that binary files can have any format, but a PNG file always has the same structure, a BMP always has the same structure... What I want is: create a "description" for, e.g., a BMP file, feed it to the framework, feed it a BMP and then have easy access to each single element of the parsed content.Subternatural
Kosi2801: If you can write a specification, or just a series of "how it really works" steps, for such a framework, then it's doable. If you can't, it's not.Tecu
@Perter Lawrey: In that simple case you're right, the config is longer. But my goal is to be able to deliver something to the client where I can provide additional configs later to support additional file formats. Think of parsing image files where you cannot update the binary for the customer...Subternatural
@Subternatural I am not sure why you feel providing a text file is any easier than providing a class or jar file. If you need to provide the update as text you can use BeanShell, apache-jci or the builtin Compiler API to compile/load java code on the fly.Krongold
Nice example, but I'd strongly discourage calling that String-constructor! Always define a character set, or it will come back to bite you in the behind ...Alecalecia
With your update you've lined out a solution where the structure of the file to read is hardcoded in the source. What I need is, that the config-file describes the structure of the file and the parser reads it in in an easy-accessible way. Changes only happen in the config file...Subternatural
You can deploy a Java file as a config file. There is no difference except the format you use. What is the difference between a config file written in text and a Java file written in text? Neither is more hard-coded than the other.Krongold
Tried to keep business out as long as possible, but: With the software the customer has to be able to provide a description of the file-format to the app without the need to learn Java and have to deal with all of our (to be done then) interfaces. Just let him "describe" his file-format and done...Subternatural
So instead of learning a sub-set of java (which they might know or be able to get help off the web for) they have to learn a new language which is unlikely to describe all the situations they will need which they definitely won't know and won't have much incentive to invest much time in.Krongold
Exactly. :) If the customer orders it and doesn't want to be convinced, we try to deliver. I also think it's safer and easier to provide him with 2 pages of description and a restricted language instead of allowing him to fiddle around in the guts of the application and do all sorts of stuff.Subternatural
This reason is valid. In this case I would suggest something which has a simple mapping to what you would do in Java. Note: you may have to support loops and conditional logic which is not simple. You may find that you introduce more bugs than you solve.Krongold
Right, that's why I've been initially asking for an already existing framework. But it seems, there is nothing in that direction available and I'm on my own here. I think I'll stick to parser generators and create a file-reading backend for the new commands. Thanks for the conversation&ideas :)Subternatural
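To make the idea discussed in this thread more concrete, here is a rough sketch of a description-driven reader using only the JDK. It is not an existing framework: the class `DescribedReader`, its methods, and the simplified one-field-per-line syntax are all assumptions made for illustration. It supports only the four field types from the question (FIXED, BYTE, LONG, STRING), assumes US-ASCII text and big-endian numbers, and leaves nesting and repetition (such as `content(10)`) to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescribedReader {

    /** One parsed description line, e.g. "value3: STRING(10)". */
    static final class Field {
        final String name, type, arg;   // arg is null when the type has no argument

        Field(String name, String type, String arg) {
            this.name = name;
            this.type = type;
            this.arg = arg;
        }
    }

    // Matches "name: TYPE" or "name: TYPE(arg)"; arg may be a number or a quoted string.
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\w+)\\s*:\\s*(\\w+)(?:\\(\"?([^\")]*)\"?\\))?\\s*");

    /** Turns a textual description into a list of fields, one per non-empty line. */
    static List<Field> parseDescription(String text) {
        List<Field> fields = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            Matcher m = LINE.matcher(line);
            if (!m.matches()) throw new IllegalArgumentException("Bad description line: " + line);
            fields.add(new Field(m.group(1), m.group(2), m.group(3)));
        }
        return fields;
    }

    /** Reads one record as described by fields; values are keyed by field name. */
    static Map<String, Object> read(List<Field> fields, DataInput in) throws IOException {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Field f : fields) {
            switch (f.type) {
                case "BYTE":   record.put(f.name, in.readByte()); break;
                case "LONG":   record.put(f.name, in.readLong()); break;
                case "STRING": record.put(f.name, readAscii(in, Integer.parseInt(f.arg))); break;
                case "FIXED":
                    String actual = readAscii(in, f.arg.length());
                    if (!actual.equals(f.arg)) {
                        throw new IOException("Expected " + f.arg + " but found " + actual);
                    }
                    record.put(f.name, actual);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown type: " + f.type);
            }
        }
        return record;
    }

    private static String readAscii(DataInput in, int len) throws IOException {
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        List<Field> header = parseDescription("header: FIXED(\"MAGIC\")");
        List<Field> content = parseDescription("value1: BYTE\nvalue2: LONG\nvalue3: STRING(10)");

        // Made-up sample data: the header followed by ten content packages.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write("MAGIC".getBytes(StandardCharsets.US_ASCII));
        for (int i = 0; i < 10; i++) {
            out.writeByte(i);
            out.writeLong(100L * i);
            out.write("abcdefghij".getBytes(StandardCharsets.US_ASCII));
        }

        DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(read(header, in));
        for (int i = 0; i < 10; i++) {
            System.out.println(read(content, in));
        }
    }
}
```

As the last comments point out, real formats tend to need loops and conditional fields, which is exactly where a small description language like this starts to grow.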
C
3

Google's Protocol Buffers

Candie answered 15/3, 2009 at 20:2 Comment(1)
I had a short look at it, but it does not seem to provide bit-level access to the data stream content; instead it encapsulates the data somehow (to provide optionality etc.). This assumption is supported by the fact that Protocol Buffers apparently has to be used on both ends of the communication channel.Subternatural
R
1

A parser combinator library is an option. JParsec works fine, although it can be slow.

Rubber answered 14/3, 2009 at 21:51 Comment(0)
N
1

I have been developing a framework for Java that parses binary data: https://github.com/raydac/java-binary-block-parser. With it, you only need to describe the structure of your binary file in a pseudo-language.

Nicollenicolson answered 11/7, 2014 at 9:59 Comment(0)
P
-2

You can parse binary files with parsers like JavaCC. Here you can find a simple example. Probably it's a bit more difficult than parsing text files.

Pharynx answered 14/3, 2009 at 9:19 Comment(0)
P
-3

Have you looked into the world of parsers? A good parser generator is yacc, and there may be a port of it for Java.

Proboscidean answered 14/3, 2009 at 3:14 Comment(0)
