Read fixed width record from text file
D

7

21

I've got a text file full of records where each field in each record is a fixed width. My first approach would be to parse each record simply using string.Substring(). Is there a better way?

For example, the format could be described as:

<Field1(8)><Field2(16)><Field3(12)>

And an example file with two records could look like:

SomeData0000000000123456SomeMoreData
Data2   0000000000555555MoreData    

I just want to make sure I'm not overlooking a more elegant way than Substring().


Update: I ultimately went with a regex like Killersponge suggested:

private readonly Regex reLot = new Regex(REGEX_LOT, RegexOptions.Compiled);
const string REGEX_LOT = "^(?<Field1>.{8})" +
                        "(?<Field2>.{16})" +
                        "(?<Field3>.{12})";

I then use the following to access the fields:

Match match = reLot.Match(record);
string field1 = match.Groups["Field1"].Value;
Diarmuid asked 2/10, 2008 at 14:49 Comment(2)
The following library can be used: https://github.com/borisdj/FixedWidthParserWriter - Wira
SoftCircuits.FixedWidthParser is free and makes this very easy. It will also automatically map fixed-width fields to class properties. - Peculiarity
H
8

Substring sounds good to me. The only downside I can immediately think of is that it means copying the data each time, but I wouldn't worry about that until you prove it's a bottleneck. Substring is simple :)

You could use a regex to match a whole record at a time and capture the fields, but I think that would be overkill.
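For reference, a minimal sketch of the Substring approach, assuming the 8/16/12 layout from the question. The `LotRecord` class and its property names are hypothetical, chosen only for illustration, and trimming the padded text fields is my assumption about what the caller wants:

```csharp
using System;

// Hypothetical record type for the <Field1(8)><Field2(16)><Field3(12)> layout.
public sealed class LotRecord
{
    public const int RecordLength = 8 + 16 + 12; // 36 characters per record

    public string Field1 { get; private set; }
    public string Field2 { get; private set; }
    public string Field3 { get; private set; }

    public static LotRecord Parse(string line)
    {
        if (line == null) throw new ArgumentNullException("line");
        if (line.Length < RecordLength)
            throw new FormatException("Record is shorter than " + RecordLength + " characters.");

        return new LotRecord
        {
            // Each Substring call copies its slice into a new string.
            Field1 = line.Substring(0, 8).TrimEnd(),   // drop right padding
            Field2 = line.Substring(8, 16),            // keep leading zeros as-is
            Field3 = line.Substring(24, 12).TrimEnd()
        };
    }
}
```

Calling `LotRecord.Parse` on the second sample line from the question should give "Data2", "0000000000555555" and "MoreData". Keeping the offsets in one place like this makes it easy to adjust when the layout changes.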

Halfblood answered 2/10, 2008 at 14:55 Comment(2)
Yea, I tried to think of a way to use a regex, but think it's the wrong tool for the job and as you said, overkill. - Diarmuid
regex? ^(.{8})(.{16})(.*)$ for the above definition of fields, assuming that the last field may or may not be padded out with spaces. - Naif
A
34

Use FileHelpers.

Example:

[FixedLengthRecord()] 
public class MyData
{ 
  [FieldFixedLength(8)] 
  public string someData; 

  [FieldFixedLength(16)] 
  public int SomeNumber; 

  [FieldFixedLength(12)] 
  [FieldTrim(TrimMode.Right)]
  public string someMoreData;
}

Then, it's as simple as this:

var engine = new FileHelperEngine<MyData>(); 

// To Read Use: 
var res = engine.ReadFile("FileIn.txt"); 

// To Write Use: 
engine.WriteFile("FileOut.txt", res); 
Allen answered 2/10, 2008 at 15:14 Comment(2)
That's in need of some Generics, maybe I should take a look and do it up :P - Naif
-1 for external library dependent solution, suboptimal. - Heathcote
L
8

Why reinvent the wheel? Use .NET's TextFieldParser class (it lives in the Microsoft.VisualBasic.FileIO namespace but is usable from C# too), as described in the Visual Basic how-to "How to read from fixed-width text files".
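A rough C# sketch of what that might look like for the layout in the question. The `FixedWidthReader` class and `ReadRecords` method are hypothetical names; the project needs a reference to the Microsoft.VisualBasic assembly. Passing -1 as the last width tells the parser to treat the final field as variable width, which is my choice here so that unpadded last fields don't cause an error:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthReader
{
    // Reads every record from the reader and returns the fields of each line.
    public static List<string[]> ReadRecords(TextReader reader)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            // Field1 = 8 chars, Field2 = 16 chars, Field3 = rest of the line.
            parser.SetFieldWidths(8, 16, -1);

            while (!parser.EndOfData)
            {
                // TrimWhiteSpace is on by default, so padding is stripped.
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```

To read from a file, something like `FixedWidthReader.ReadRecords(new StreamReader("FileIn.txt"))` should work; the parser disposes the reader when it is done.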

Laflam answered 23/9, 2012 at 4:50 Comment(0)
N
2

Watch out: if the ends of the lines aren't padded with spaces to fill the last field, Substring will throw unless you first work out how much of the line is actually left to read. This of course only applies to the last field :)
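One way to do that fiddling is a small helper that clamps the requested length to what is left on the line. This is only a sketch; `FixedWidth.Slice` is a made-up name, not a framework method:

```csharp
using System;

public static class FixedWidth
{
    // Returns up to `length` characters starting at `start`. If the line is
    // too short, returns whatever remains (or an empty string) instead of
    // throwing the way a bare Substring call would.
    public static string Slice(string line, int start, int length)
    {
        if (line == null) throw new ArgumentNullException("line");
        if (start >= line.Length) return string.Empty;

        int available = line.Length - start;
        return line.Substring(start, Math.Min(length, available));
    }
}
```

With the second sample record stripped of its trailing spaces, `FixedWidth.Slice(line, 24, 12)` would return "MoreData" rather than throwing an ArgumentOutOfRangeException.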

Naif answered 2/10, 2008 at 15:0 Comment(0)
F
1

Unfortunately out of the box the CLR only provides Substring for this.

Someone over at CodeProject made a custom parser using attributes to define fields, you might wanna look at that.

Flannel answered 2/10, 2008 at 15:17 Comment(0)
C
0

Nope, Substring is fine. That's what it's for.

Camel answered 2/10, 2008 at 14:55 Comment(0)
H
0

You could set up an ODBC data source for the fixed format file, and then access it as any other database table. This has the added advantage that specific knowledge of the file format is not compiled into your code for that fateful day that someone decides to stick an extra field in the middle.

Hedi answered 2/10, 2008 at 19:26 Comment(0)
