When and Why is XML preferable to CSV? [closed]

20

34

Sometimes it feels like XML has been used just because it was fashionable.

Ivonneivor answered 30/11, 2009 at 14:25 Comment(0)

66

Some strengths:

You can validate XML data against an XSD
You can easily provide contracts (as XSD) to other parties that create or consume the XML data, without having to describe the format in prose
You can represent one-to-many relations at multiple levels
XML is arguably more readable than CSV
XML is natively supported by the .NET Framework

To name a few off the top of my head.

Xena answered 30/11, 2009 at 14:25 Comment(11)

This list is already quite good. In my opinion, you should also add that there are standard parsers available everywhere, and that XML compresses quite well, so the larger size is not really an issue. – Subaquatic 30/11, 2009 at 14:34

XML is more readable than CSV - > far from always! Compare firstname,lastname rows like John,Doe Bart,Smith Will,Bate with the whole <firstname><lastname> stuff, for example. With few fields, the tags often make things less readable – Phia 30/11, 2009 at 14:39

+1 But XML is not always more readable – Darby 30/11, 2009 at 14:42

@Phia & @surfrbum: I didn't say that XML is faster readable than CSV. If you only have few columns, CSV is almost certainly faster to read. When there are lots of columns CSV is impossible to read especially when there are undefined values inside. But I edited my answer anyway. :) – Xena 30/11, 2009 at 16:25

I feel XMLs are intuitive to read especially when they hold hierarchical data. Imagine reading a comma separated App.Config file!. – Shamble 30/11, 2009 at 16:53

Well, I would say it's a rather biased opinion, as no weaknesses are presented, and it therefore gives incomplete information. – Urus 4/12, 2009 at 16:19

It was never stated to be unbiased. I work with CSV a lot. Don't like it one bit. Its only advantage is that file sizes are significantly smaller for large datasets. Although OLE DB does allow a schema.ini file to specify data types, it is very hacky IMO. I am strongly leaning towards not using CSV anymore. – Mccormac 22/12, 2009 at 14:49

"XML is more readable than CSV - > far from always!". – Thai 30/1, 2012 at 16:38

@SteJav: You've misread the statement. XML is arguably more readable than CSV is the way that's written. – Xena 30/1, 2012 at 20:31

Xml vs CSV is a trade off. XML gives you all of these wonderful features at the cost of speed and efficiency. – Panache 27/8, 2012 at 1:47

Robert - No mis-reading. "...You've misread the statement. XML is arguably more readable than CSV..". Hence my statement stands:- "far from always" is therefore not incorrect. – Thai 26/11, 2012 at 10:46

23

.csv files are good when your data is strictly tabular and you know its structure. As soon as you start having relationships between different levels of your data, xml tends to work better because relationships can be made obvious (even without schemas) just by nesting.
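To make the nesting point concrete, here is a minimal sketch using Python's standard library. The customer/order data and the function names are made up for illustration. It shows a one-to-many relation read straight from nested XML, and the same relation rebuilt from a flattened CSV by grouping on a repeated key column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data: one customer with several orders (a one-to-many relation).
XML_DOC = """
<customers>
  <customer name="Alice">
    <order id="1"><item>Keyboard</item><item>Mouse</item></order>
    <order id="2"><item>Monitor</item></order>
  </customer>
</customers>
"""

# The same data flattened for CSV: the parent columns repeat on every row.
CSV_DOC = """customer,order_id,item
Alice,1,Keyboard
Alice,1,Mouse
Alice,2,Monitor
"""


def items_per_order_xml(text):
    """Read the hierarchy directly from the nesting."""
    root = ET.fromstring(text)
    return {
        order.get("id"): [item.text for item in order.findall("item")]
        for order in root.iter("order")
    }


def items_per_order_csv(text):
    """Rebuild the hierarchy by grouping rows on the repeated key column."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(text)):
        grouped.setdefault(row["order_id"], []).append(row["item"])
    return grouped


print(items_per_order_xml(XML_DOC))
print(items_per_order_csv(CSV_DOC))
```

Both routes arrive at the same structure. The difference is that with CSV the reader has to know which column carries the relationship, while in XML the nesting already says so.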

Gumdrop answered 30/11, 2009 at 14:25 Comment(1)

This has been my experience as well, when choosing between xml and csv. If your data naturally lends itself to a structured table format, then csv may be the best choice. If not, xml may be the safe choice. – Stocking 30/11, 2009 at 16:24

20

XML has become the default for its many benefits that lots of other people have already mentioned. So the question really becomes "When and Why is CSV preferable to XML?".

I feel CSV is preferable to XML when:
- you are loading simple tabular data
- you are in control of both the generation and consumption of the data file
- the dataset is large

CSV is perfectly usable if the first 2 points are true, and has a performance benefit that becomes more significant the larger the dataset is.

I did a quick test loading ~8000 records each with 6 text fields. Loading and parsing the XML took ~8 seconds. Loading the CSV took less than 1 second.
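For anyone who wants to repeat a test like this, here is a rough sketch in Python. It is not the code behind the numbers above (the platform and parser used there are not stated). The record and field counts come from the description, and everything else (field contents, function names) is an assumption. Timings depend heavily on the parser and machine, so treat the output as indicative only.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET

ROWS, FIELDS = 8000, 6  # sizes taken from the test described above


def make_rows():
    """Generate made-up text records."""
    return [[f"value-{r}-{c}" for c in range(FIELDS)] for r in range(ROWS)]


def to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    root = ET.Element("records")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for i, value in enumerate(row):
            ET.SubElement(rec, f"field{i}").text = value
    return ET.tostring(root, encoding="unicode")


def parse_csv(text):
    return list(csv.reader(io.StringIO(text)))


def parse_xml(text):
    return [[field.text for field in rec] for rec in ET.fromstring(text)]


def timed(func, arg):
    """Run func(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(arg)
    return result, time.perf_counter() - start


rows = make_rows()
csv_text, xml_text = to_csv(rows), to_xml(rows)
csv_rows, csv_secs = timed(parse_csv, csv_text)
xml_rows, xml_secs = timed(parse_xml, xml_text)

print(f"CSV: {len(csv_text)} chars, parsed in {csv_secs:.4f}s")
print(f"XML: {len(xml_text)} chars, parsed in {xml_secs:.4f}s")
```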

The overhead of XML is worth it in a lot of cases, but when the stars align, CSV makes more sense.

Aorist answered 30/11, 2009 at 14:25 Comment(0)

14

CSV is useful when you just have a series of values that relate to some piece of information and you know you will always store values for each field.

XML has the benefit of having self-describing data (tags) and having hierarchy - which gives you a lot more flexibility in the way that you store the data.
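A small sketch of that difference, assuming a made-up list of people where one record has no phone number. In CSV the empty slot has to stay in position; in XML the element can simply be left out because each value is looked up by its tag.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records: the second person has no phone number.
CSV_DOC = "name,phone,city\nAlice,555-0100,Oslo\nBob,,Bergen\n"
XML_DOC = """
<people>
  <person><name>Alice</name><phone>555-0100</phone><city>Oslo</city></person>
  <person><name>Bob</name><city>Bergen</city></person>
</people>
"""

# CSV is positional: the empty slot must stay in place or later columns shift.
csv_people = list(csv.DictReader(io.StringIO(CSV_DOC)))

# XML is tagged: a missing element is simply absent, and lookup is by name.
xml_people = [
    {child.tag: child.text for child in person}
    for person in ET.fromstring(XML_DOC)
]

print(csv_people[1])
print(xml_people[1])
```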

Huffman answered 30/11, 2009 at 14:25 Comment(0)

8

I found an interesting performance test on the net. Good example of the drawbacks of XML when the features of XML are not needed.

"I tried Steven's experiment from a different angle. I filled an Excel XP spreadsheet with a single-digit number, saved it in both XML and in a comma-delimited text file (CSV). I then compressed both with WinZip and then opened both with Excel. Here's what I found:

The XML file was 840MB, the CSV 34MB -- a 2,500% difference. Compressed, the XML file was 2.5MB, the CSV 0.15MB (150KB) -- a 1,670% difference.

Equally dramatic is the time it took to uncompress and render the files as an Excel spreadsheet: It took about 20 minutes with the XML file; the CSV took 1 minute -- a 2,000% difference."

http://www.xml.com/pub/a/2004/12/15/deviant.html
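A quick sanity check of the quoted percentages, reading each one as (larger / smaller) × 100 and taking the compressed CSV as the stated 150 KB. The helper name is my own.

```python
def ratio_percent(larger, smaller):
    """How large `larger` is relative to `smaller`, as a percentage."""
    return larger / smaller * 100


# Figures from the quoted test; sizes in megabytes (150 KB taken as 0.15 MB).
raw = ratio_percent(840, 34)
compressed = ratio_percent(2.5, 0.15)
load_time = ratio_percent(20, 1)  # minutes

print(f"raw: {raw:.0f}%  compressed: {compressed:.0f}%  load time: {load_time:.0f}%")
```

The results land close to the rounded figures in the quote (about 2,500%, 1,670% and 2,000%).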

Overreach answered 30/11, 2009 at 14:25 Comment(1)

That's a limit case because you have a lot of small data with a '<data></data>' delimiter that weighs the file. – Butterflies 23/7, 2019 at 9:15

7

You can have a much more complex hierarchy and structure with XML than with CSV. It offers a lot more flexibility.

Highboy answered 30/11, 2009 at 14:25 Comment(1)

Flexibility always comes with complexity. The more stuff your TV does, the more buttons on the remote. – Paranoia 9/2, 2013 at 15:59

5

XML is preferable over CSV when the data is unstructured (unknown schema) and will be read by a human.

Arguably, unless the data contains predominantly text, CSV is also meant for human consumption.

Also relevant is whether your data is 2- or 3-dimensional. CSV is most suitable for 2-dimensional text, and due to its verbosity, XML works well with 3-dimensional data.

The whole "standardness" of XML is hyperbole, and should not be taken literally. XML does have huge technical issues and many of the solutions aren't particularly elegant, or in many cases useful:

It uses text to specify its own text-encoding (chicken and egg?)
None of the more common schema languages for XML work particularly well.
The ancient and commonplace way of creating mark-up languages using <tags> is not particularly helpful as a standard.
XML tries to retroactively shoehorn more powerful mark-up languages, such as the SGML-based ones, into itself, creating a mess of incompatible legacy.
It still remains to be determined whether or not XML text escape sequences can work for anything but the simplest cases (i.e. friendly data).

To be clear, XML is probably the incorrect choice for 90% of the data interchange it is currently being used for, since those uses break some or all of the above assumptions.

Stimson answered 30/11, 2009 at 14:25 Comment(0)

4

I have found one of the greatest advantages of XML to be the parsing functionality and the strict validation that comes out-of-the-box with most XML libraries. The insistence on well-formedness and the easy-to-understand error messages (xyz not closed in line x, column y) are a real help compared to hunting broken values, or unknown behaviour, because of an error in the CSV file.
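As an illustration, here is a minimal sketch with Python's standard library (the sample data and the check_xml helper are made up). The XML parser rejects the broken document and reports a position, while the CSV reader hands back the short row without complaint.

```python
import csv
import io
import xml.etree.ElementTree as ET

# <name> is never closed before </person>.
BROKEN_XML = "<people>\n  <person><name>Alice</person>\n</people>"
# The last row lost a field.
BROKEN_CSV = "name,city\nAlice,Oslo\nBob\n"


def check_xml(text):
    """Return None if the text is well-formed, else the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return f"{err} (line={line}, column={column})"
    return None


xml_error = check_xml(BROKEN_XML)
csv_rows = list(csv.reader(io.StringIO(BROKEN_CSV)))

print(xml_error)  # the parser points at the offending position
print(csv_rows)   # the short row is returned as if nothing were wrong
```

Note that this only covers well-formedness. Checking the content against a schema is a separate step and needs a validating library.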

Banneret answered 30/11, 2009 at 14:25 Comment(0)

4

Of course it is fashionable and buzz-worthy sometimes. It all depends on your application. I prefer config files in XML because they are easy to parse, whereas I use CSV files for DataGridView or database dumps.

The Daily WTF article "XML vs CSV: The Choice is Obvious" will help you make your decision ;)

Darby answered 30/11, 2009 at 14:25 Comment(0)

3

In addition to the other answers, XML allows you to specify which character set the document is in.
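A small sketch of what that buys you, using Python's standard library and a made-up Latin-1 document. The XML parser picks the encoding up from the declaration; for the CSV bytes the reader has to be told the encoding, and a wrong guess garbles the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document stored as Latin-1 bytes; the declaration says so.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'
).encode("latin-1")

# The parser reads the declaration and decodes the bytes accordingly.
name = ET.fromstring(xml_bytes).text

# A CSV file carries no such marker: the reader has to be told, or guess.
csv_bytes = "name\nJosé\n".encode("latin-1")
right = csv_bytes.decode("latin-1")
wrong = csv_bytes.decode("utf-8", errors="replace")

print(name)
print(right.splitlines()[1])
print(wrong.splitlines()[1])  # the accented letter is lost
```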

Prohibitive answered 30/11, 2009 at 14:25 Comment(0)

2

I would say use XML (and/or JSON) because someday you or someone (with a short temper and a large gun collection) may have to go find an error in the CSV data.

So yes, I'm saying readability, don't forget to think of the other guy! He may be thinking about you.

Segarra answered 30/11, 2009 at 14:25 Comment(1)

+1 lol yes very true - Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. ~Martin Golding – Sabec 24/1, 2012 at 11:54

2

I don't have enough reputation to comment on the relevant answer, but someone suggested compressing the XML as a way to gain size parity with CSV. While this is true, XML compression can sometimes come back to bite you. If you are transferring XML data from point to point and it fails, it's nice to be able to read the XML and figure out what went wrong. If the XML is compressed and the transfer fails, it's sometimes not possible to decompress it and examine the contents. In other words, compressing XML cancels out the human-readability advantage it has.

Fritter answered 30/11, 2009 at 14:25 Comment(1)

That's why you compress it with gzip (transparently using zlib): you can view the data with zcat as easily as you can cat a CSV file, and gzip -t will tell you if there has been any corruption. How do you know if your CSV file has been corrupted? – Parttime 30/11, 2009 at 16:31
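The same idea can be sketched with Python's gzip module instead of the command-line tools. The is_intact helper is my own rough stand-in for `gzip -t`, and the sample XML is made up. The gzip trailer stores a CRC-32 of the uncompressed data, so a damaged file is detected on decompression.

```python
import gzip
import zlib

xml_text = "<records>" + "<record>hello</record>" * 1000 + "</records>"
packed = gzip.compress(xml_text.encode("utf-8"))


def is_intact(blob):
    """Rough equivalent of `gzip -t`: decompress and let the CRC check run."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Flip one bit in the stored CRC-32 (the trailer is CRC-32 followed by length).
damaged = bytearray(packed)
damaged[-8] ^= 0x01

print(len(xml_text), len(packed))                    # size before and after
print(is_intact(packed), is_intact(bytes(damaged)))  # integrity check
print(gzip.decompress(packed)[:30].decode("utf-8"))  # peek, like zcat | head
```

A plain CSV (or plain XML) file has no built-in checksum, so silent corruption in transit would need a separate check.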

2

CSV is more lightweight if you want to move things about, since it's normally two or more times smaller than XML.

XML is standard and won't be hit by different OSes' versions of CSV.

Maestricht answered 30/11, 2009 at 14:25 Comment(2)

If you use attributes you can really cut down on how large the XML is. – Nonprofessional 30/11, 2009 at 14:30
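A quick way to see the effect, sketched with Python's ElementTree and a made-up record: the same three values written once as child elements and once as attributes. With attributes each name appears once instead of in an opening and a closing tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, written once with child elements, once with attributes.
record = {"id": "42", "name": "Alice", "city": "Oslo"}

as_elements = ET.Element("person")
for key, value in record.items():
    ET.SubElement(as_elements, key).text = value

as_attributes = ET.Element("person", record)

elements_xml = ET.tostring(as_elements, encoding="unicode")
attributes_xml = ET.tostring(as_attributes, encoding="unicode")

print(len(elements_xml), elements_xml)
print(len(attributes_xml), attributes_xml)
```

The trade-off is that attributes can only hold flat text values, so this works for simple fields but not for nested data.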

You can also compress XML quite well, even if you don't use attributes. I have some test data here where XML with really small (read: one- to three-letter) tags and attributes compresses to about 14% of the original size. – Subaquatic 30/11, 2009 at 14:35

1

You can easily traverse through XML data even when you have complex data.

Check these links:

Sherrell answered 30/11, 2009 at 14:25 Comment(0)

1

XML provides a way of tagging your data with metadata (provided by the tag names and attribute names), whereas CSV does not. Couple this with the ability to define structured hierarchies and it makes XML easier to understand when provided with just the data, whereas CSV would require an accompanying tool or document to describe how each value is interpreted.

Thunderstone answered 30/11, 2009 at 14:25 Comment(0)

0

1. There are existing parsers and emitters for it in every language and database
2. They deal with encoding for me
3. They deal with escaping for me

That's all that matters to me.

Sure, there's a semi-standard way to do escaping in CSV (i.e., "the way Excel does it"), and it's not exactly hard to write yourself, but it does take some time. And then you've got to implicitly agree on a character encoding out-of-band. But then, because it's so simple, people try to write it themselves, and invariably screw up either #2 or #3.

JSON also meets #2 and #3 and is getting close to satisfying #1. It's also arguably simpler, at least for non-document files. Not surprisingly, I find myself using it more and more, internally and externally.
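Here is a minimal sketch of what "they deal with encoding and escaping for me" looks like in practice, using Python's standard XML and JSON modules on a made-up awkward string. Nothing is escaped by hand in either case.

```python
import json
import xml.etree.ElementTree as ET

# Awkward but realistic text: markup characters, quotes, a newline, non-ASCII.
tricky = 'He said "5 < 6 & 7 > 3",\nthen left for Zürich.'

# XML: the serializer escapes the markup characters; the parser undoes it.
node = ET.Element("note")
node.text = tricky
xml_bytes = ET.tostring(node, encoding="utf-8")
xml_back = ET.fromstring(xml_bytes).text

# JSON: same idea, quotes and newlines are escaped by the library.
json_text = json.dumps({"note": tricky})
json_back = json.loads(json_text)["note"]

print(xml_bytes.decode("utf-8"))
print(json_text)
```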

Larcher answered 30/11, 2009 at 14:25 Comment(0)

0

I've also found that some CSV generators/parsers have a lot of difficulty with general text data. Long text strings with a lot of carriage returns and commas and quotations, etc etc, just make life really difficult when it comes to manipulating a CSV.
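To show where this goes wrong, here is a small sketch with a made-up free-text field. Python's csv module quotes the field on the way out and restores it on the way in; a naive split on lines and commas, which is roughly what a hand-rolled parser does, tears it apart.

```python
import csv
import io

# Hypothetical free-text field with a comma, quotes and an embedded line break.
comment = 'She wrote: "fine, thanks"\nSecond line, with more commas'
rows = [["id", "comment"], ["1", comment]]

# Writing: the csv module quotes the field and doubles the inner quotes.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Naive reading: splitting on lines and commas tears the field apart.
naive = [line.split(",") for line in csv_text.splitlines()]

# Proper reading: a real parser restores the original rows.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))

print(csv_text)
print(len(naive), len(parsed))
```

So the format can carry this kind of text, but only if both the writer and the reader follow the same quoting rules.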

SSMS likes to truncate csv for fun.

Marchpast answered 30/11, 2009 at 14:25 Comment(1)

Exactly the problem. If you are considering csv, you should more consider colon separation such as, key:value:key:value or whatever. This is how *nix has always done it and for good reason. – Stridulate 30/11, 2009 at 15:41

0

And again one more for XML: The X in XML stands for Extensible (I know, not really mnemonic :-P). That means, with the help of the XML namespace mechanism, you can join any two XML languages you like and combine them in the same document. Given that there is only one CSV 'language' (not counting the myriads of delimiter styles), XML can handle quite a lot of complexity, and that in a modular way.
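A minimal sketch of that mixing, using Python's ElementTree. The two vocabularies and their namespace URIs are placeholders invented for the example. Both define a name element, and the namespace is what keeps them apart inside one document.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies (the namespace URIs are placeholders, not schemas).
NS = {
    "inv": "http://example.com/invoice",
    "geo": "http://example.com/geo",
}

DOC = """
<inv:invoice xmlns:inv="http://example.com/invoice"
             xmlns:geo="http://example.com/geo">
  <inv:name>Widget</inv:name>
  <geo:name>Oslo</geo:name>
</inv:invoice>
"""

root = ET.fromstring(DOC)

# Both vocabularies define <name>; the namespace keeps them apart.
product = root.findtext("inv:name", namespaces=NS)
place = root.findtext("geo:name", namespaces=NS)

print(product, place)
```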

This, however, is the advantage of CSV: if you really have tabular data, XML syntax is most often overkill.

Violone answered 30/11, 2009 at 14:25 Comment(1)

only one CSV 'language' = Quick question what's the separator for a csv file in the european version of Excel? – Parttime 30/11, 2009 at 16:30
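The separator problem is easy to reproduce. The sketch below uses a made-up export of the kind produced in locales where ';' separates fields and ',' is the decimal mark, and reads it with Python's csv module twice: once with the default comma dialect and once with the delimiter stated explicitly.

```python
import csv
import io

# Hypothetical export from a locale that uses ';' between fields
# and ',' as the decimal mark.
data = "name;amount\nAlice;1,5\nBob;2,5\n"

# Read with the default comma dialect: fields are split in the wrong places.
as_comma = list(csv.reader(io.StringIO(data)))

# Read with the delimiter stated explicitly: the table comes out as intended.
as_semicolon = list(csv.reader(io.StringIO(data), delimiter=";"))

print(as_comma)
print(as_semicolon)
```

Neither read raises an error, which is the awkward part: the file itself does not say which dialect it uses, so that has to be agreed on separately.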

0

Structured, human readable, easier to edit, validation, parsability, transformability, typing, namespaces, powerful libraries behind it, are all amongst many of the reasons.

Above all else though it is standard.

Asylum answered 30/11, 2009 at 14:25 Comment(2)

HTML is SGML. XML is SGML. HTML is NOT XML. XHTML is XML. – Thunderstone 30/11, 2009 at 14:36

Good point. Strong similarities being XML is SGML, but yes HTML is not XML. XHTML is my preference these days. I like my HTML to be as structured as XML. Still preferable to coding a web site in CSV though ;) – Asylum 30/11, 2009 at 15:1

-2

And I also prefer it because it's much more readable.

Probity answered 30/11, 2009 at 14:25 Comment(0)
