Preferred way to parse a custom binary flat file?

5

6

I have a flat file generated by a C program. Each record in the file consists of a fixed length header followed by data. The header contains a field indicating the size of the following data. My ultimate goal is to write a C#/.NET program to query this flat file, so I'm looking for the most efficient way to read the file using C#.

I am having trouble finding the .NET equivalent of line 7 in the following code. As far as I can tell, I have to issue multiple reads (one for each field of the header using BinaryReader) and then issue one read to get the data following the header. I'm trying to learn a way to parse a record in two read operations (one read to get the fixed length header and a second read to get the following data).

This is the C code I am trying to duplicate using C#/.NET:

struct header header; /* 1-byte aligned structure (48 bytes) */
char *data;

FILE* fp = fopen("flatfile", "rb"); /* binary mode, so bytes are not translated */
for (;;)
{
  if (fread(&header, 48, 1, fp) != 1) break; /* stop at EOF or on a short read */
  /* Read header.len number of bytes to get the data. */
  data = (char*)malloc(header.len);
  fread(data, header.len, 1, fp);
  /* Do stuff... */
  free(data);
}
fclose(fp);

This is the C structure of the header:

struct header
{
    char  id[2];
    char  toname[12];
    char  fromname[12];
    char  routeto[6];
    char  routefrom[6];
    char  flag1;
    char  flag2;
    char  flag3;
    char  flag4;
    char  cycl[4];
    unsigned short len;
};

I've come up with this C# object to represent the C header:

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi, Size = 48)]
class RouterHeader
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
    char[] Type;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    char[] To;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    char[] From;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]
    char[] RouteTo;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]
    char[] RouteFrom;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    char[] Flags;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    char[] Cycle;

    UInt16 Length;
}
Gayden answered 21/8, 2010 at 15:40 Comment(6)
What does your header look like? - Timehonored
Possible duplicate of "A C# equivalent of C's fread file i/o" - Confounded
I've edited the post to contain the header structure. - Gayden
You should look at Marshal.PtrToStructure (see the example as well): msdn.microsoft.com/en-us/library/4ca6d5z7.aspx - Timehonored
The link Hans Passant provided has the answer. I would give him credit, but I'm not sure what to do since he posted as a comment instead of an answer. - Gayden
Have you looked at fixed array buffers (MSDN link: msdn.microsoft.com/en-us/library/zycewsya.aspx; other link: dotnetperls.com/fixed-buffer)? - Signora
G
0

The link Hans Passant provided has the answer. I would give him credit, but I'm not sure what to do since he posted as a comment instead of an answer.
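
For anyone landing here, this is roughly what the Marshal.PtrToStructure approach from that link can look like. It is only a sketch: the names MarshalledHeader and HeaderMarshaller are made up for this example, the fields are declared as byte[] rather than char[] to keep the raw bytes, and it assumes the file was written on a little-endian machine with the same 1-byte packing as the C struct.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical mirror of the 48-byte C header; Pack = 1 matches the 1-byte alignment.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MarshalledHeader
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]  public byte[] Type;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)] public byte[] To;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)] public byte[] From;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]  public byte[] RouteTo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 6)]  public byte[] RouteFrom;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]  public byte[] Flags;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]  public byte[] Cycle;
    public ushort Length;
}

public static class HeaderMarshaller
{
    // Marshalled size of the struct; expected to be 48 for the layout above.
    public static readonly int HeaderSize = Marshal.SizeOf(typeof(MarshalledHeader));

    // Converts a buffer that already holds one full header into the struct.
    public static MarshalledHeader FromBytes(byte[] buffer)
    {
        if (buffer == null || buffer.Length < HeaderSize)
            throw new ArgumentException("Buffer is smaller than one header.");

        // Pin the managed array so the GC cannot move it while we read from it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (MarshalledHeader)Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(MarshalledHeader));
        }
        finally
        {
            handle.Free();
        }
    }
}
```

The buffer still has to be filled by a read of HeaderSize bytes first, so this gives the "one read for the header, one read for the data" shape asked about in the question.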

Gayden answered 25/8, 2010 at 14:26 Comment(0)
E
2

Well, you can use one call to Stream.Read to read the header (although you need to check the return value to make sure you've read everything you've asked for; you may not get it all in one go) and then another call to Stream.Read to get the data itself into a byte array (again, looping until you've read everything). Once it's all in memory, you can pick out the appropriate bytes from the buffer to create an instance of your struct (or class).

Personally I prefer to do all of this explicitly rather than using StructLayout - the latter always feels somewhat brittle to me.
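
A minimal sketch of that read loop, in case it helps. The names RecordReader, TryReadExact, ReadRecord and ToName are invented for this example, and it assumes the len field sits at offset 46 of the 48-byte header and was written little-endian; adjust both if your file differs.

```csharp
using System;
using System.IO;
using System.Text;

public static class RecordReader
{
    public const int HeaderSize = 48;     // size of the C header in the question
    private const int LengthOffset = 46;  // 'len' follows 46 bytes of char fields

    // Loops until 'count' bytes are read. Returns false if the stream is already
    // at its end; throws if it ends part-way through the requested block.
    public static bool TryReadExact(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
            {
                if (total == 0) return false;
                throw new EndOfStreamException("Stream ended in the middle of a block.");
            }
            total += n;
        }
        return true;
    }

    // Fills 'header' and returns the record data, or null at a clean end of file.
    public static byte[] ReadRecord(Stream stream, byte[] header)
    {
        if (!TryReadExact(stream, header, HeaderSize))
            return null;

        // Assumption: the length is a little-endian unsigned 16-bit value.
        int length = header[LengthOffset] | (header[LengthOffset + 1] << 8);

        byte[] data = new byte[length];
        if (!TryReadExact(stream, data, length))
            throw new EndOfStreamException("File ended before the record data.");
        return data;
    }

    // Example of picking a field out of the buffer: 'toname' is bytes 2..13.
    public static string ToName(byte[] header)
    {
        return Encoding.ASCII.GetString(header, 2, 12).TrimEnd('\0', ' ');
    }
}
```

Typical use would be to allocate one header buffer of RecordReader.HeaderSize bytes, then call ReadRecord in a loop until it returns null.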

Earache answered 21/8, 2010 at 17:34 Comment(0)
A
0

As an alternative, you could try using a union-like structure to create a header struct that you could read in one go (as a String of an appropriate length, for example), while still being able to reference the individual fields when you're extracting information from that struct.

You can find some more details on using StructLayouts and FieldOffsets to achieve that sort of thing here.

There's some further discussion on reading and writing binary files with C# here. It's suggested there that using BinaryReader to read in multiple fields is generally more efficient for a small number (<40) of fields.

Arrowroot answered 21/8, 2010 at 15:52 Comment(0)
K
0

I would recommend you just write code (one statement per field) that reads the fields one by one. It is a little extra code, but it gives more flexibility. To begin with, it relieves you of the requirement that your in-memory data structure has the same layout as the file has on disk. For example, it could be part of another structure, or you could use String instead of char[].

Also consider: What if you need to write a version 2.0, where a new field is added at the end of the struct? In your example, you'd need to define a new struct, and you'll be stuck with both definitions. If you choose the read/write in code, you can support both with the same code by reading the new element conditionally.
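
Something along these lines, as a rough sketch. ParsedHeader and ReadText are names made up for this example, and the text fields are assumed to be ASCII padded with NULs or spaces. BinaryReader.ReadUInt16 always reads little-endian, which is assumed to match what the C program wrote.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical in-memory form of the header; it does not need to match the disk layout.
public sealed class ParsedHeader
{
    public string Type;
    public string To;
    public string From;
    public string RouteTo;
    public string RouteFrom;
    public byte[] Flags;
    public string Cycle;
    public ushort Length;

    // Reads the fields one by one, in the same order as the C struct.
    public static ParsedHeader Read(BinaryReader reader)
    {
        ParsedHeader h = new ParsedHeader();
        h.Type = ReadText(reader, 2);
        h.To = ReadText(reader, 12);
        h.From = ReadText(reader, 12);
        h.RouteTo = ReadText(reader, 6);
        h.RouteFrom = ReadText(reader, 6);
        h.Flags = reader.ReadBytes(4);
        h.Cycle = ReadText(reader, 4);
        h.Length = reader.ReadUInt16(); // little-endian by definition of BinaryReader
        return h;
    }

    // Reads a fixed-width text field and strips trailing padding.
    private static string ReadText(BinaryReader reader, int count)
    {
        byte[] bytes = reader.ReadBytes(count);
        if (bytes.Length != count)
            throw new EndOfStreamException("Header is truncated.");
        return Encoding.ASCII.GetString(bytes).TrimEnd('\0', ' ');
    }
}
```

A field added in a later file version would then be one more conditional read at the end of Read, rather than a second struct definition.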

Kelter answered 21/8, 2010 at 19:20 Comment(0)
A
0

My inclination would be to read the data into an array, and then assemble the data object appropriately, using shifts and adds to handle words, longwords, etc. I have some utility classes to handle that sort of thing.
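
To give an idea of what such helpers might look like (this is not the actual utility code mentioned above, just an illustrative sketch with made-up names), here are little-endian readers built from shifts and ORs. Unlike BitConverter, they give the same result regardless of the byte order of the machine running the code.

```csharp
using System;

public static class LittleEndian
{
    // Assembles a 16-bit word from two bytes, least significant byte first.
    public static ushort ToUInt16(byte[] buffer, int offset)
    {
        return (ushort)(buffer[offset] | (buffer[offset + 1] << 8));
    }

    // Assembles a 32-bit longword from four bytes, least significant byte first.
    public static uint ToUInt32(byte[] buffer, int offset)
    {
        return (uint)buffer[offset]
             | ((uint)buffer[offset + 1] << 8)
             | ((uint)buffer[offset + 2] << 16)
             | ((uint)buffer[offset + 3] << 24);
    }
}
```

For the header in the question, the data length would be LittleEndian.ToUInt16(header, 46), assuming the file was written little-endian.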

Antinode answered 21/8, 2010 at 22:10 Comment(0)