Why are fixed-width file formats still in use?
R

6

17

Are there any advantages to a fixed-width file format over something like XML? I realize XML would likely take up more disk space to store the same amount of data but the file could also be compressed. I guess you could also, in theory, read a specific piece of data based on where it is in the file (just grab those bytes). But other than that, what else?

Rusty answered 5/10, 2011 at 19:41 Comment(7)
File formats for what? Do you want images or video as compressed XML?Manner
No, I really don't. I don't know what kind of file formats you are talking about and it's not clear to me from your question. Maybe I just didn't encounter those formats. Could you specify that more explicitly?Manner
I haven't seen a fixed-width file format for years. I thought they died out with punched cards. They're very efficient of course, but have zero potential for change.Yuzik
I have seen them for well logs (drilling), and for geological information, such as CanStrat. Usually, these are legacy files, and may be coming from/going to equipment that just hasn't been updated in a long time. Sometimes, these files are also printed out on dot-matrix printers, and the fixed width ensures that the output stays within the width limit (e.g., 72 characters).Torbernite
As long as there are mainframes and COBOL there will be fixed-width file formats.Opposable
I'll necro this post. @MichaelKay data warehousing uses fixed-width files like mad, even to this day. Personally, for database work, I prefer fixed width over any delimited or hierarchical format 24/7.Narrowminded
In my experience, most of the fixed-width data formats in use are from legacy systems, or are interacting with a legacy system.Fiddlestick
I
29

When the data is large (gigabytes or terabytes), fixed-width files can be MUCH more efficient.

Since each record and field has a fixed size, you can simply seek to (for example) the n-millionth row and read a couple of records from there. You can also memory-map the whole file and get rather efficient and easy random access to everything.
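
As a rough sketch of both ideas in Python: the 16-byte record size, the file name, and the `read_record` helper are assumptions made up for this illustration, not part of any real format.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical: 15 data bytes plus a newline


def read_record(f, index):
    """Return record number `index` (0-based) by seeking straight to it."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)


# Build a small sample file: 1000 records, each exactly RECORD_SIZE bytes.
path = os.path.join(tempfile.mkdtemp(), "records.dat")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(f"{i:>15}\n".encode("ascii"))

with open(path, "rb") as f:
    # Direct seek: no need to scan the 750 records that come before this one.
    rec = read_record(f, 750)

    # Memory-mapped access: the file can be sliced like one large bytes object.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mapped = mm[750 * RECORD_SIZE:751 * RECORD_SIZE]
```

The offset of any record is just `index * RECORD_SIZE`, which is what makes the lookup independent of file size. With a delimited or XML file you would generally have to scan or index the file first.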

XML files aren't a good fit in these cases.

Instillation answered 27/1, 2012 at 21:48 Comment(1)
"can be" - yes. But not always. I remember a COBOL application that was rewritten in a 4GL and ran four times faster; on investigation the speed gain was entirely due to use of variable-length fields and records. COBOL was spending all its time doing unnecessary space-padding to fill up the fixed width.Yuzik
M
9

XML is complicated, especially if you validate against a schema. This may not look important, because somebody else has already written an XML parser that you can use. But parsing and validation add quite a lot of processing, which means it takes longer. This may not be a problem in many cases, but sometimes it is.

If you want to save one integer into a custom file format, it takes just 4 bytes, and when you want to load it, you just copy those 4 bytes into memory (assuming the file format and your platform have the same endianness). But with XML, it might take something like 10–30 bytes. And loading it means comparing strings, parsing decimal representations of integers, and probably more.
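
To make the comparison concrete, here is a small Python sketch. The `<value>` element name and the sample number are assumptions chosen for illustration; a real schema would have its own tag names and sizes.

```python
import struct
import xml.etree.ElementTree as ET

value = 123456789

# Binary: a 32-bit little-endian signed integer is always 4 bytes.
binary = struct.pack("<i", value)

# XML: the same value as text inside a hypothetical <value> element.
xml_bytes = f"<value>{value}</value>".encode("ascii")

# Loading the binary form is a single fixed-size unpack.
from_binary = struct.unpack("<i", binary)[0]

# Loading the XML form means parsing markup, then parsing decimal digits.
from_xml = int(ET.fromstring(xml_bytes).text)

print(len(binary), len(xml_bytes))  # prints "4 24"
```

The explicit `"<i"` format pins the byte order, which sidesteps the endianness caveat at the cost of a conversion on big-endian platforms.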

Again, those performance and storage size differences may very well be too minuscule for you to even consider (and the work that it would take to devise a custom format might be non-trivial), but in many cases, those differences do matter.

For example, I encountered a system that uses SMS messages for transmission of some data. That means you have 140 bytes (!) per message. And the device that sends and receives those messages doesn't have GBs of memory and GHz of CPU. In that situation, you make sure that every bit counts and you certainly don't use XML.

Manner answered 5/10, 2011 at 19:59 Comment(2)
Thanks for the answer. However, I'd argue that a complex fixed-width file is more confusing than a complex XML file. At least you can read the XML file!Rusty
I still have no idea what kind of fixed-width formats you are talking about, so I can't respond to that.Manner
H
8

I know this is old, but I deal with both Fixed Width and XML daily. You can pretty much sum it up as:

XML = Readability

Fixed Width = Speed and Low Resource Consumption

XML is largely for readability by a human. I don't care what anyone says about structure and validation. If you're running a system that really doesn't need, and shouldn't have, humans reading the files you're passing back and forth, then you're really just adding overhead: the files take longer to process, and they are larger, which affects how long they take to transfer. All of this also increases memory usage on the system consuming the XML file.

There are advantages to XML, however. You can define your structure more loosely. Sometimes it's easier if your file and your code don't both have to require a field to be 255 characters long; only your code needs to enforce that limit. Another advantage is that XML can (and should) come with an XML Schema that defines the requirements for the XML contents. This helps when multiple systems consume a single API. If you can provide your schema to a developer, they can pretty quickly make typed objects that serialize into properly formatted and structured XML.

Fixed Width is for speed and minimal resource consumption. It can be more tedious to set up than XML, because you have to ensure that all systems know the exact positions of the "columns" in the Fixed Width file. Often not all systems use the same columns, or all of them, so you end up with only a single system that fully understands the Fixed Width contents. This can make it challenging to grow an API or system that relies on the transferred file contents. However, because there are no field labels, no tags, nothing but raw data, you can often send a smaller package across the wire. That is not always true: in some cases you may have a large number of text fields that commonly hold small amounts of data but must retain a large column width for one-off cases where a paragraph-length value was entered. Now you've got a bunch of white space holding positions in your Fixed Width file, and XML may actually reduce your overall package size.
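
A minimal Python sketch of what "knowing the column positions" looks like in practice, and of how a wide, mostly empty column turns into padding. The `LAYOUT` table, its field names, and its widths are hypothetical.

```python
# Hypothetical layout: (field name, start offset, width in characters).
LAYOUT = [
    ("id", 0, 6),
    ("name", 6, 10),
    ("notes", 16, 40),
]


def format_record(values):
    """Pad each field with spaces to its fixed width (truncating if too long)."""
    return "".join(
        f"{str(values[name]):<{width}.{width}}" for name, _, width in LAYOUT
    )


def parse_record(line):
    """Slice each field out by position and strip the padding."""
    return {
        name: line[start:start + width].rstrip()
        for name, start, width in LAYOUT
    }


record = {"id": 42, "name": "Ada", "notes": "ok"}
line = format_record(record)   # always 56 characters long
parsed = parse_record(line)    # every value comes back as a string

# Most of this record is padding: the 40-character "notes" column holds "ok".
padding = len(line) - sum(len(str(v)) for v in record.values())
```

Every reader and writer has to share the same layout table. If one side changes a width without the other, fields are silently misread rather than rejected, which is part of why these formats are hard to evolve.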

Generally speaking though, XML is for readability. You can't typically just pick up a Fixed Width file or even a CSV file and immediately start grasping what the data means, whereas with well-labeled XML files, you can.

There are a number of advantages and disadvantages that I've not gone into, but this is where I see the real meat and potatoes of the differences.

Hoodlum answered 25/1, 2018 at 23:36 Comment(1)
I saw a whitepaper years ago where an enterprise looked at its network use, and 75% of the bandwidth was useless XML going back and forth between internal "services". This does not even count the CPU burned in parsing it. XML has its place, just like every other tool, but it got turned into one heck of a hammer and a lot of people went looking for nails, and found them everywhere.Narrowminded
N
5

I too had the same questions until I realized the power of fixed width. We have a table with millions of records. Extracting them into a JSON file swelled the file size to 15 GB and took 2+ hours, while using fixed width brought it down to 6.5 GB and 15 minutes.

Extracting and writing fixed-width files is faster than JSON.

I tried CSV too, and even there fixed width scored better.
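
For a rough sense of where the size difference comes from, here is a small Python sketch. The rows, field names, and column widths are made up for illustration; it does not reproduce the numbers above, and the ratio depends entirely on the data.

```python
import json

# Hypothetical rows; the field names and widths are illustrative only.
rows = [
    {"customer_id": i, "status": "ACTIVE", "balance": i * 10}
    for i in range(1000)
]

# One JSON object per line: every row repeats the key names and punctuation.
json_size = sum(len(json.dumps(row)) + 1 for row in rows)

# Fixed width: 8 + 8 + 10 characters plus a newline, with no labels at all.
fixed_size = sum(
    len(f"{r['customer_id']:>8}{r['status']:<8}{r['balance']:>10}\n")
    for r in rows
)

print(f"JSON: {json_size} bytes, fixed width: {fixed_size} bytes")
```

With short values and long key names, the labels dominate the JSON output. With wide, mostly empty text columns the comparison can go the other way, since padding then dominates the fixed-width output.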

Nemesis answered 8/8, 2016 at 6:2 Comment(0)
C
2

Probably mostly for legacy reasons, since parsers for XML, JSON (etc) exist pretty much on all platforms.

Theoretically, fixed-width formats can be more space-efficient, as you suggest, and reading them is a bit simpler. But these do not seem like significant benefits.

For what it is worth, tabular (but not fixed-width) formats like CSV have their uses, combining a somewhat more compact representation with possibly better readability; CSV works quite nicely for map/reduce style jobs.

Copalm answered 5/10, 2011 at 19:48 Comment(0)
C
1

One reason could be that processing XML (not just reading and loading into memory structures, but think about regex searching in an XML file vs. a simple fixed-width or delimited file, or even making manual quick-fixes to bad data) is more complicated than fixed-width files. Sure, there are many libraries that can do it for you now, but if there isn't one for the platform you're working on, do you really want to write an XML parser, or a program that just reads n bytes at location x?

Chromomere answered 5/10, 2011 at 19:44 Comment(0)
