Parsing a fixed-width formatted file in Java

10

20

I've got a file from a vendor that has 115 fixed-width fields per line. How can I parse that file into the 115 fields so I can use them in my code?

My first thought is just to make constants for each field, like NAME_START_POSITION and NAME_LENGTH, and use substring. That seems ugly, so I'm curious about better ways of doing this. None of the couple of libraries a Google search turned up seemed any better, either.
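For reference, the constants-plus-substring idea can be kept in one place by describing the layout as an enum. This is only a minimal sketch: the class name VendorRecord, the field names, and the offsets and lengths are all hypothetical and not taken from any real vendor format.

```java
import java.util.EnumMap;
import java.util.Map;

public class VendorRecord {

    /** Hypothetical field layout; names, offsets and lengths are illustrative only. */
    public enum Field {
        ID(0, 4),
        NAME(4, 10),
        CITY(14, 8);

        final int start;   // zero-based offset of the first character
        final int length;  // number of characters in the field

        Field(int start, int length) {
            this.start = start;
            this.length = length;
        }

        /** Extracts this field from a line and trims surrounding padding spaces. */
        String extract(String line) {
            return line.substring(start, start + length).trim();
        }
    }

    /** Slices one fixed-width line into a map keyed by field. */
    public static Map<Field, String> parse(String line) {
        Map<Field, String> values = new EnumMap<>(Field.class);
        for (Field f : Field.values()) {
            values.put(f, f.extract(line));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Field, String> record = parse("0001Jane Doe  Portland");
        System.out.println(record.get(Field.NAME));
    }
}
```

With 115 fields the enum gets long, but each offset and length is declared exactly once, and the parsing loop does not change as fields are added.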

Jovita answered 22/10, 2009 at 20:38 Comment(2)
You might want to check out the related question #5885563. – Nuncupative
Look at the com.ancientprogramming.fixedformat4j library. – Remmer
23

I would use a flat file parser like flatworm instead of reinventing the wheel: it has a clean API, is simple to use, has decent error handling and a simple file format descriptor. Another option is jFFP but I prefer the first one.

Operation answered 22/10, 2009 at 20:45 Comment(6)
I just wanted to follow up with a thanks for the pointer to Flatworm. It works like a champ and my whole team at work is now using it. – Jovita
@Jovita I'm glad to know you liked it. And thank you very much for the follow-up, it's very much appreciated! – Operation
I tried the library a few days ago and it was broken beyond repair. I would try the previous version, but I do not see any docs for it. – Microbalance
This is a great tool! Is there a way to integrate it into some kind of editor, such as Eclipse? – Breechloader
Are you guys still using this flatworm tool? The DTD reference is broken in the file format XML definition. How can I resolve this? – Yiddish
Late to the game, but github.com/ffpojo/ffpojo looks nice as it maps to and from POJOs. – Drillstock
8

I've played around with fixedformat4j and it is quite nice. It's easy to configure converters and the like.

Flagellate answered 22/10, 2009 at 21:2 Comment(1)
Note that ff4j uses runtime annotations, which makes mass parsing pretty slow. – Knothole
7

uniVocity-parsers comes with a FixedWidthParser and a FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.

// creates the sequence of field lengths in the file to be parsed
FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);

// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.

// sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');

// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));

Here are a few examples for parsing all sorts of fixed-width inputs.

And here are some other examples for writing in general, plus examples specific to the fixed-width format.

Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 License).

Cloistral answered 2/5, 2016 at 3:49 Comment(0)
1

Here is a basic implementation I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class FlatFileParser {

  public static void main(String[] args) {
    File inputFile = new File("data.in");
    File outputFile = new File("data.out");
    int columnLengths[] = {7, 4, 10, 1};
    String charset = "ISO-8859-1";
    String delimiter = "~";

    System.out.println(
        convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
        + " lines written to " + outputFile.getAbsolutePath());
  }

  /**
   * Converts a fixed width file to a delimited file.
   * <p>
   * This method ignores (consumes) newline and carriage return
   * characters. The number of lines returned is based strictly on
   * the aggregated lengths of the columns.
   *
   * A RuntimeException is thrown if run-off characters are detected
   * at EOF.
   *
   * @param inputFile the fixed width file
   * @param outputFile the generated delimited file
   * @param columnLengths the array of column lengths
   * @param delimiter the delimiter used to split the columns
   * @param charsetName the charset name of the supplied files
   * @return the number of completed lines
   */
  public static final long convertFixedWidthFile(
      File inputFile,
      File outputFile,
      int columnLengths[],
      String delimiter,
      String charsetName) {

    InputStream inputStream = null;
    Reader inputStreamReader = null;
    OutputStream outputStream = null;
    Writer outputStreamWriter = null;
    String newline = System.getProperty("line.separator");
    String separator;
    int data;
    int currentIndex = 0;
    int currentLength = columnLengths[currentIndex];
    int currentPosition = 0;
    long lines = 0;

    try {
      inputStream = new FileInputStream(inputFile);
      inputStreamReader = new InputStreamReader(inputStream, charsetName);
      outputStream = new FileOutputStream(outputFile);
      outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);

      while((data = inputStreamReader.read()) != -1) {
        if(data != 13 && data != 10) {
          outputStreamWriter.write(data);
          if(++currentPosition > (currentLength - 1)) {
            currentIndex++;
            separator = delimiter;
            if(currentIndex > columnLengths.length - 1) {
              currentIndex = 0;
              separator = newline;
              lines++;
            }
            outputStreamWriter.write(separator);
            currentLength = columnLengths[currentIndex];
            currentPosition = 0;
          }
        }
      }
      if(currentIndex > 0 || currentPosition > 0) {
        String line = "Line " + ((int)lines + 1);
        String column = ", Column " + ((int)currentIndex + 1);
        String position = ", Position " + ((int)currentPosition);
        throw new RuntimeException("Incomplete record detected. " + line + column + position);
      }
      return lines;
    }
    catch (Throwable e) {
      throw new RuntimeException(e);
    }
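    finally {
      try {
        // Either stream may still be null if opening the files failed.
        if (inputStreamReader != null) {
          inputStreamReader.close();
        }
        if (outputStreamWriter != null) {
          outputStreamWriter.close();
        }
      }
      catch (Throwable e) {
        throw new RuntimeException(e);
      }
    }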
  }
}
Undoing answered 11/6, 2016 at 11:44 Comment(2)
2 years later but I hope you see this. Why do you need to check if the read-in character, data, is equal to 13 or 10 if the only possible returns are the character from the input stream or -1, which denotes the end of a file? – Ujiji
You are correct ... This implementation is used for fixed width records that end in newline. – Undoing
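To make the point of the exchange above concrete: 13 and 10 are the character codes for carriage return and line feed, so the check simply drops line terminators from the input. Below is a minimal sketch of the same idea written against Reader and Writer, so it can be tried on in-memory strings. The class and method names are hypothetical, the sketch assumes every column width is positive, and it always writes '\n' after a record rather than the platform line separator.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class FixedWidthToDelimited {

    /**
     * Copies fixed-width records from in to out, writing the delimiter between
     * columns and '\n' after each record. CR and LF in the input are skipped.
     * Assumes all widths are positive. Returns the number of complete records.
     */
    public static long convert(Reader in, Writer out, int[] widths, String delimiter)
            throws IOException {
        int column = 0;    // index of the column currently being copied
        int position = 0;  // characters copied so far within that column
        long records = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r' || c == '\n') {
                continue;  // line terminators are not part of the record data
            }
            out.write(c);
            if (++position == widths[column]) {
                position = 0;
                column++;
                if (column == widths.length) {
                    column = 0;
                    records++;
                    out.write('\n');
                } else {
                    out.write(delimiter);
                }
            }
        }
        if (column > 0 || position > 0) {
            throw new IOException("Incomplete record after " + records + " complete records");
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both streams even if convert throws
        try (Reader in = new StringReader("ab123\r\ncd456\n");
             Writer out = new StringWriter()) {
            long n = convert(in, out, new int[] {2, 3}, "~");
            System.out.println(n + " records:\n" + out);
        }
    }
}
```

For files, the same method could be called with a reader and writer opened in a try-with-resources statement, which avoids the manual close logic in a finally block.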
1

Most suitable for Scala, but you could probably use it from Java.

I was so fed up with the fact that there is no proper library for fixed length format that I have created my own. You can check it out here: https://github.com/atais/Fixed-Length

Basic usage: you create a case class and describe it as an HList (Shapeless):

case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {

    import com.github.atais.util.Read._
    import cats.implicits._
    import com.github.atais.util.Write._
    import Codec._

    implicit val employeeCodec: Codec[Employee] = {
      fixed[String](0, 10) <<:
        fixed[Option[Int]](10, 13, Alignment.Right) <<:
        fixed[Boolean](13, 18)
    }.as[Employee]
}

And you can easily decode your lines now or encode your object:

import Employee._
Parser.decode[Employee](exampleString)
Parser.encode(exampleObject)
Seismism answered 12/7, 2017 at 10:18 Comment(0)
1

If your string is called inStr, convert it to a char array and use the String(char[], start, length) constructor

char[] inStrChar = inStr.toCharArray();
// The third argument is a count of characters, not an end index.
String charFirst10 = new String(inStrChar, 0, 10);
String char10to20 = new String(inStrChar, 10, 10);
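The same slicing can be driven by a list of widths instead of hand-written offsets. The sketch below uses String.substring(beginIndex, endIndex) rather than the char array constructor; note that substring takes an end index while new String(char[], offset, count) takes a count. The class and method names are hypothetical, and returning empty strings for fields beyond the end of a short line is a design choice made here, not something the answer specifies.

```java
import java.util.Arrays;

public class FixedWidthSlicer {

    /** Splits a line into consecutive fields of the given widths (in characters). */
    public static String[] slice(String line, int... widths) {
        String[] fields = new String[widths.length];
        int start = 0;
        for (int i = 0; i < widths.length; i++) {
            // Clamp the end index so a short line does not throw.
            int end = Math.min(start + widths[i], line.length());
            // Fields that start past the end of the line come back empty.
            fields[i] = start < line.length() ? line.substring(start, end) : "";
            start += widths[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(slice("ABCDEFGHIJ0123456789", 10, 10)));
    }
}
```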
Arella answered 1/6, 2018 at 13:7 Comment(0)
0

The Apache Commons CSV project can handle fixed-width files.

Looks like the fixed width functionality didn't survive promotion from the sandbox.

Fluctuant answered 22/10, 2009 at 21:33 Comment(4)
That seems to be "in the sandbox". I'm not familiar with Commons, but I get the impression that it means it's not 'done' yet? – Fraud
It means there is no official release. This is significantly different from "doesn't work". Based on the amount of time it's been in the sandbox, no one appears to be pushing it towards release, but it still ends up getting widely used. – Fluctuant
Can you elaborate on that? I just had a look at the API and could not find any hint/proof that it actually supports fixed width columns instead of delimiters. BTW the current URL is commons.apache.org/proper/commons-csv – Colorado
You could vote for such a feature: issues.apache.org/jira/browse/CSV-272 – Discolor
0

Here is plain Java code to read a fixed-width file:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FixedWidth {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        // String S1="NHJAMES TURNER M123-45-67890004224345";
        String FixedLengths = "2,15,15,1,11,10";

        List<String> items = Arrays.asList(FixedLengths.split("\\s*,\\s*"));
        File file = new File("src/sample.txt");

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line1;
            while ((line1 = br.readLine()) != null) {
                // process the line.

                int n = 0;
                StringBuilder line = new StringBuilder();
                for (int idx = 0; idx < items.size(); idx++) {
                    int width = Integer.parseInt(items.get(idx));
                    // Append the trimmed field, comma-separated except after the last one.
                    line.append(line1.substring(n, n + width).trim());
                    if (idx < items.size() - 1) {
                        line.append(",");
                    }
                    n = n + width;
                }
                System.out.println(line);
            }
        }

    }

}
Gumwood answered 17/9, 2015 at 16:41 Comment(0)
0
/* The method takes three parameters: the input string, the number of values per record (which will come from the schema, say 10 columns), and the delimiter. */
public class Testing {

    public static void main(String as[]) throws InterruptedException {

        fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");

    }

    public static void fixedLengthRecordProcessor(String input, int reclength, String delimiter) {
        // Note: String.split treats the delimiter as a regular expression.
        String[] values = input.split(delimiter);
        String record = "";
        int recCounter = 0;
        for (Object O : values) {

            if (recCounter == reclength) {
                System.out.println(record.substring(0, record.length() - 1));// process
                                                                                // your
                                                                                // record
                record = "";
                record = record + O.toString() + ",";
                recCounter = 1;
            } else {

                record = record + O.toString() + ",";

                recCounter++;

            }

        }
        System.out.println(record.substring(0, record.length() - 1)); // process
                                                                        // your
                                                                        // record
    }

}
Hasdrubal answered 10/7, 2016 at 13:13 Comment(0)
0

Another library that can be used to parse a fixed width text source: https://github.com/org-tigris-jsapar/jsapar

It allows you to define a schema in XML or in code and parse fixed-width text into Java beans or fetch values from an internal format.

Disclosure: I am the author of the jsapar library. If it does not fulfill your needs, on this page you can find a comprehensive list of other parsing libraries. Most of them are only for delimited files but some can parse fixed width as well.

Nun answered 10/6, 2019 at 14:33 Comment(1)
If you're going to link to a library you wrote, as can be seen on the project's contributor's page, you must disclose that it's yours directly in your answer. Posts that link to affiliated content and do not disclose that affiliation will be marked as spam and removed. Please read this guide for how to format your posts. – Klopstock
