What is the difference between YAML and JSON?
N

14

939

What are the differences between YAML and JSON, specifically considering the following things?

  • Performance (encode/decode time)
  • Memory consumption
  • Expression clarity
  • Library availability, ease of use (I prefer C)

I was planning to use one of these two in our embedded system to store configuration files.

Navarrette answered 13/11, 2009 at 2:42 Comment(9)
Be aware that JSON can be considered a subset of YAML: en.wikipedia.org/wiki/JSON#YAML Bump
@Charles, yes, but they have some subtle differences: ajaxian.com/archives/json-yaml-its-getting-closer-to-truth Navarrette
Since YAML's (approximately) a superset of JSON, the question of performance can't be answered without assumptions of whether you will use that expressiveness. If you don't need it: how fast are YAML parsers at reading JSON? If you do need it: how much slower are JSON parsers when you allow for a possibly-longer JSON representation of the same idea?Uela
@jokoon I guess "I'd prefer a C library" (e.g libyaml)Wojcik
Not advocating it, but I had seen some Lua data formats which are very very simple.Barchan
YAML documents can be complex and hard to read. A "Billion laughs" attack is possible with YAML. On the other hand, complex objects, graphs and other structures can be serialized efficiently in YAML. For interchange formats and simple structures, JSON is preferred. For complex object serialization, or for grammar definitions, YAML may be preferred.Booth
The biggest difference is that YAML needs to be formatted with a ruler.Scheldt
HTTP and email headers look like YAML, so YAML beauty is older and more used than JSON on web ;)Mulford
@pierrotlefou, there's something wrong with the link you've posted for subtle differences between YAML and JSON. The link seems to lead to a totally irrelevant page.Acrolith
A
808

Technically YAML is a superset of JSON. This means that, in theory at least, a YAML parser can understand JSON, but not necessarily the other way around.

See the official specs, in the section entitled "YAML: Relation to JSON".

In general, there are certain things I like about YAML that are not available in JSON.

  • As @jdupont pointed out, YAML is visually easier to look at. In fact the YAML homepage is itself valid YAML, yet it is easy for a human to read.
  • YAML has the ability to reference other items within a YAML file using "anchors." Thus it can handle relational information as one might find in a MySQL database.
  • YAML is more robust about embedding other serialization formats such as JSON or XML within a YAML file.

In practice neither of these last two points will likely matter for things that you or I do, but in the long term, I think YAML will be a more robust and viable data serialization format.

Right now, AJAX and other web technologies tend to use JSON. YAML is currently being used more for offline data processes. For example, it is included by default in the C-based OpenCV computer vision package, whereas JSON is not.

You will find C libraries for both JSON and YAML. YAML's libraries tend to be newer, but I have had no trouble with them in the past. See for example yaml-cpp (a C++ library); libyaml is a C option.

Apparitor answered 13/11, 2009 at 14:28 Comment(12)
YAML is a superset of a particular form of JSON syntax. That is, if you use JSON in a way that's compatible with YAML, then it is a proper subset. As pierr commented above, the specs are [aiming toward compatibility](ajaxian.com/archives/json-yaml-its-getting-closer-to-truth).Odeliaodelinda
Also YAML supports comments which is handy.Premillennial
Doesn't JSON "support comments"? Isn't JSON a more particular form of object literal notation?Lack
@ErikAronesty JSON was close to a subset of YAML 1.1, but since YAML 1.2 it is now a true subset. YAML 1.2 was primarily released to iron out the last few incompatibilities between the two specifications.Tabbie
From the YAML 1.2 spec: "The primary objective of this revision is to bring YAML into compliance with JSON as an official subset."Silverpoint
Is json a subset of YAML 1.2, given that this cstring will not parse in yaml: "{\n\t\"abc\":\"xyz\"\n}"Arria
@EvanBenn 1. json is not a strict subset of yaml. json allows duplicate keys. yaml does not. if someone decided to deserialize into a multimap (seen it, been there), then yaml simply doesn't work. 2. i would hazard that bugs are more likely in the necessarily more complex yaml parsersBooth
@Lack JSON5 supports comments, but not normal JSONCarbineer
@Evan Benn -- your string will parse in the official parser at ben-kiki.org/ypaste/cgi-bin/ypaste.pl. PyYAML 5.2 gives the same error as various other online parsers, it looks like it doesn't understand the difference between indentation space ("s-space", SP only) and ignored whitespace ("s-white", SP and TAB). Also see yaml.org/spec/1.2/spec.html#id2778101 (Example 6.2) which PyYAML fails on.Gingham
If you have a compelling need to use JSON, and want comments, it might be reasonable to add text fields whose only purpose is documentation.Russell
YAML is a superset of a subset of JSON, is that right?Piscine
YAML can support multiple sections with --- which JSON can't.Dagney
B
276

Differences:

  1. YAML, depending on how you use it, can be more readable than JSON
  2. JSON is often faster and is probably still interoperable with more systems
  3. It's possible to write a "good enough" JSON parser very quickly
  4. Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
  5. YAML has a ton of features, including comments and relational anchors. YAML syntax is accordingly quite complex, and can be hard to understand.
  6. It is possible to write recursive structures in yaml: {a: &b [*b]}, which will loop infinitely in some converters. Even with circular detection, a "yaml bomb" is still possible (see xml bomb).
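  7. JSON has no references, so complex structures with shared or cyclic object references cannot be serialized directly in JSON. YAML serialization can therefore be more efficient.
  8. In some coding environments, loading YAML from an untrusted source can allow an attacker to execute arbitrary code, typically when the loader is allowed to construct arbitrary language objects from tags (PyYAML's unsafe loaders are one example); safe-loading APIs avoid this.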
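
Point 4 can be checked with Python's standard `json` module, used here purely as an illustration (the question itself prefers C). By default `json.loads` accepts duplicate keys and keeps the last value. The helper `reject_duplicates` below is a hypothetical name, not part of any library; it uses the real `object_pairs_hook` parameter to reject duplicates, which is the behaviour the YAML spec requires.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: raise if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"port": 80, "port": 8080}'

# Default behaviour: the document is accepted and the last value wins.
lenient = json.loads(doc)

# Strict behaviour: the duplicate is reported as an error instead.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

A config loader that may later be switched to YAML could use a hook like this so that both formats reject the same files.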

Observations:

  1. Python programmers are generally big fans of YAML, because of the use of indentation, rather than bracketed syntax, to indicate levels.
  2. Many programmers consider the attachment of "meaning" to indentation a poor choice.
  3. If the data format will be leaving an application's environment, parsed within a UI, or sent in a messaging layer, JSON might be a better choice.
  4. YAML can be used, directly, for complex tasks like grammar definitions, and is often a better choice than inventing a new language.
Booth answered 7/6, 2013 at 14:17 Comment(12)
It is. The entire purpose of Yaml 1.2 was to resolve the few compatibility differences to make JSON a strict subset. If you believe the spec didn't achieve its purpose, Erik, please point to an example somewhere of valid JSON that violates the YAML spec and/or breaks a verified 1.2-compliant YAML parser.Kruter
@Kruter The YAML Spec says there are potentially valid JSON files that would be invalid YAML. But it's not likely in real use. "JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique, while YAML insists they “MUST” be. Technically, YAML therefore complies with the JSON spec, choosing to treat duplicates as an error. In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files." - yaml.org/spec/1.2/spec.html#id2759572Milne
Fair point. (Albeit unlikely to come up in practice, but that wasn’t the question.) Thanks.Kruter
To comment on the use of indent; well, I believe that might require getting used to and not everyone would like it. For example, I am a .NET guy. I was looking at a travis.yml file and was wondering why there was a problem. I found out that I had a tab where it ought not to be. Not everyone is used to things blowing up due to space/tab/newline preferences.Gae
Tabs are simply not allowed at all as indentation characters. IMHO, that is good coding style in all languages - with or without syntactic indentation.Tabbie
On the matter of tabs versus spaces: medium.com/@hoffa/…Grazynagreabe
Observation point 2 "Many programmers" - this is unsourced "weasel" language. Find a source and make a quantifiable statement or remove it.Phenothiazine
@Phenothiazine I personally like python and YAML and literally use them every day. I tend to use YAML for stuff people have to edit often and JSON for stuff that people "might" need to look at. I have been subjected to valid criticism by C++ devs who find indentation to be confusing.... especially if there are multiple levels or longer function blocks. Of course... good testable code doesn't have those things, so it's usually not an issue. This is my personal observation, but any casual google search will yield many results... .so it's trivial to verify.Booth
@Kruter this valid json will not parse as yaml: "{\n\t\"abc\":\"xyz\"\n}"Arria
@EvanBenn that works in PyYAML==5.3.1, also jsonformatter.org/yaml-validator/55a9b2 says validBooth
@EvanBenn, or later readers I guess: That snippet on jsonformatter.org is actually a string, equally valid in JSON and YAML. But if you expand it to a three line JSON object (which can't be displayed in a comment here, but I saved it as jsonformatter.org/yaml-validator/73c67a), it still passes validation as YAML. Older versions of PyYAML didn't like the TAB, leading people to think the problem was with the YAML 1.2 spec itself.Gingham
The only remaining incompatibility between the two specifications to my knowledge is the fact that duplicate keys are always invalid in YAML and are not invalid in JsonBooth
K
122

Bypassing esoteric theory

This answers the title rather than the details, since most people (like me) arrive from a Google search and read only the title, so I felt it was necessary to explain from a web developer's perspective.

  1. YAML uses space indentation, which is familiar territory for Python developers.
  2. JavaScript developers love JSON because it is a subset of JavaScript syntax, so it can be written and interpreted directly inside JavaScript. JavaScript object literals additionally allow a shorthand that drops the double quotes around keys that are valid identifiers (this shorthand is not valid JSON).
  3. There are a plethora of parsers that work very well in all languages for both YAML and JSON.
  4. YAML's space format can be much easier to look at in many cases because the formatting requires a more human-readable approach.
  5. YAML's form, while appearing more compact and being easier to look at, can be deceptively difficult to edit by hand if your editor does not make whitespace visible. Tabs are not spaces, which adds to the confusion if your editor does not convert tab keystrokes into spaces.
  6. JSON is much faster to serialize and deserialize because it has significantly fewer features than YAML to check for, which allows smaller and lighter code to process JSON.
  7. A common misconception is that YAML needs less punctuation and is therefore more compact than JSON, but this is false once whitespace is counted. Whitespace is invisible, so it seems like there are fewer characters, but if you count the whitespace that YAML requires for indentation, nested YAML often needs more characters than the equivalent JSON. JSON doesn't use whitespace to represent hierarchy or grouping, so it can be flattened, with unnecessary whitespace removed, for more compact transport.
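
The size claim in point 7 is easy to measure for your own data. The sketch below uses Python's standard `json` module and a hand-written block-style YAML rendering of the same made-up config (the YAML string is typed by hand, not produced by a YAML library, and the sample data is an assumption chosen for illustration).

```python
import json

# Made-up nested config used only for this comparison.
data = {
    "server": {
        "host": "localhost",
        "ports": [80, 443],
        "tls": {"enabled": True, "cert": "a.pem"},
    }
}

# Flattened JSON: no whitespace at all between tokens.
compact = json.dumps(data, separators=(",", ":"))

# Human-friendly JSON with two-space indentation.
pretty = json.dumps(data, indent=2)

# Hand-written block-style YAML for the same data.
yaml_text = (
    "server:\n"
    "  host: localhost\n"
    "  ports:\n"
    "    - 80\n"
    "    - 443\n"
    "  tls:\n"
    "    enabled: true\n"
    "    cert: a.pem\n"
)

sizes = {"compact JSON": len(compact), "YAML": len(yaml_text), "pretty JSON": len(pretty)}
for name, size in sizes.items():
    print(f"{name:13s} {size} characters")
```

For this sample the flattened JSON is the shortest of the three and the indented JSON the longest, with YAML in between. The ordering depends on nesting depth and on how many strings need quoting, so it is worth running on representative files rather than trusting either rule of thumb.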

The Elephant in the room: The Internet itself

JavaScript dominates the web by a huge margin, and JavaScript developers, along with popular web APIs, overwhelmingly prefer JSON as the data format. It is therefore difficult to argue for YAML over JSON in general web programming, as you will likely be outvoted in a team environment. In fact, the majority of web programmers aren't even aware YAML exists, let alone consider using it.

If you are doing any web programming, JSON is the default choice because no translation step is needed when working with JavaScript, so you need a strong argument to use YAML over JSON in that case.

Korten answered 23/1, 2016 at 2:40 Comment(15)
I disagree that Python developers prefer YAML. Python's dict is basically JSON, and a list of dicts is also basically JSON. Python has a built-in json lib. On a side note, I'm a Python developer and I prefer JSON (most of the Python developers I know prefer JSON).Haematogenous
When you talk about a "translation step", it's worth distinguishing between mental and mechanical translation. The human reader familiar with JavaScript will be equally familiar with JSON. To the machine the difference between JavaScript and YAML is marginal; in practice one rarely does eval(jsonString) because of injection attacks. JSON's advantage of being native JavaScript is lost when both JSON and YAML must be parsed.Kordofan
@toolbear it's safe to use JSON.parse(jsonString) as opposed to eval(jsonString)Korten
In my experience one reason JSON's simplicity/minimalism is advantageous over YAML has to do with interop between parsing libraries, particularly across platforms. Just compare the JSON and YAML specs and it will be obvious that YAML implementations will necessarily have more bugs, missing features, or different interpretations of the spec; each one of these is a potential interop error when producing and consuming YAML from different stacks. JSON's simplicity combined with it having more eyes on it leads to better interoperability. This matches my experience.Kordofan
@JasonSebring that's my point. JSON.parse is a parser. Which puts JSON on the same footing as YAML in terms of the presence of a "translation step" in the context of machines.Kordofan
@toolbear agreed but what I was meaning was you can actually declare JSON within JavaScript and its actually JavaScript so there is no translation step in that regard. If you are parsing incoming data then that is a different story but always the case you must parse anything that is a data structure in that case so it is a wash on that point.Korten
The one thing that really bothers me about white-space is how easy it is to be confused and get it wrong as indenting over or not could mean its nested or on the same level and also very easy to error that out if you don't have a guide rule. its like the hidden oops this is really not that easy type scenario no one says when editing yaml. never had that issue with json.Korten
@JasonSebring. You'd almost wonder why YAML went with spaces. My first 'dip in the ocean' of YAML led to a broken app... all because of spaces. You'd have thought that maybe using an indentation not using non-printing chars would have made a lot more sense! (That is, why on earth didn't they choose '.' rather than ' '?) To understand YAML you have to go to the specs. To understand JSON doesn't need that. (I've been to the former, and never the latter). This to me indicates a format that's not really 'human readable'Mysia
@Mysia yah this was my experience. My boss forced us to use YAML over JSON and it made things unnecessarily crappy to edit and ingest. I wrote this because of that hoping up votes would vindicate me.Korten
A very well thought out write-up. And yes, that elephant in the room is sitting on the couch eating a bag of { json: peanuts}Leeuwenhoek
@HonoredMule As a random IT guy who more often hacks things than creates them from scratch... being human writable IS being human readable, and being human readable across multiple IDEs and platforms without wondering how whitespace is being rendered is golden. To me this makes the supposed innate human readability of whitespace a wash. I've gone crosseyed again, crap.Coerce
Trouble with tabs in YAML means (a) not reading the error message, and (b) having an editor that does not highlight tabs. Both problems are easily rectifiable, so I do not understand the complaints.Democrat
You mix up web development with frontend development. If you are working as well on the backend you will have to deal a lot with yaml files. Being able to comment is a huge advantage over JSON. Also JSON sucks a lot with some strict rules like the use of only double quotes or not allowing commas after last property in the object. Readability is also a huge advantage.Nutriment
@TechNomad i do more backend work actually but ok. I agree about comments for sure. I did that post because at the time, the boss demanded yaml through the entire stack...sucked more than helped.Korten
@JasonSebring ah, ok, sorry. I just have to edit a lot of yaml when dealing with Symfony and Docker. While on the frontend I in fact never come across any yaml configs. When starting working with Symfony I was also uncomfortable with Yaml. But when it comes to a comparison with JSON in a non frontend environment - I personally think yaml is more enjoyable.Nutriment
E
73

This question is 6 years old, but strangely, none of the answers really addresses all four points (speed, memory, expressiveness, portability).

Speed

Obviously this is implementation-dependent, but because JSON is so widely used, and so easy to implement, it has tended to receive greater native support, and hence speed. Considering that YAML does everything that JSON does, plus a truckload more, it's likely that of any comparable implementations of both, the JSON one will be quicker.

However, given that a YAML file can be slightly smaller than its JSON counterpart (due to fewer " and , characters), it's possible that a highly optimised YAML parser might be quicker in exceptional circumstances.

Memory

Basically the same argument applies. It's hard to see why a YAML parser would ever be more memory efficient than a JSON parser, if they're representing the same data structure.

Expressiveness

As noted by others, Python programmers tend towards preferring YAML, JavaScript programmers towards JSON. I'll make these observations:

  • It's easy to memorise the entire syntax of JSON, and hence be very confident about understanding the meaning of any JSON file. YAML is not truly understandable by any human. The number of subtleties and edge cases is extreme.
  • Because few parsers implement the entire spec, it's even harder to be certain about the meaning of a given expression in a given context.
  • The lack of comments in JSON is, in practice, a real pain.
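
To make the comment limitation concrete, here is a small sketch using Python's standard `json` module. It shows a strict parser rejecting a `//` comment, followed by the workaround suggested in the comments on an earlier answer: carrying the note as ordinary data. The `_comment` key name is an arbitrary choice for this example, not a convention defined by JSON.

```python
import json

with_comment = """{
  // listen port
  "port": 8080
}"""

# Standard JSON has no comment syntax, so a strict parser rejects this text.
try:
    json.loads(with_comment)
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

# Workaround: store the note as a regular string field and drop it after loading.
with_note = '{"_comment": "listen port", "port": 8080}'
config = json.loads(with_note)
note = config.pop("_comment", None)  # remove the note before using the settings
```

The drawback is that such notes are data: every consumer has to know to ignore them, and they can only appear where a key is allowed.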

Portability

It's hard to imagine a modern language without a JSON library. It's also hard to imagine a JSON parser implementing anything less than the full spec. YAML has widespread support, but is less ubiquitous than JSON, and each parser implements a different subset. Hence YAML files are less interoperable than you might think.

Summary

JSON is the winner for performance (if relevant) and interoperability. YAML is better for human-maintained files. HJSON is a decent compromise although with much reduced portability. JSON5 is a more reasonable compromise, with well-defined syntax.

Erotomania answered 12/4, 2016 at 13:36 Comment(5)
I actually thought YAML was smaller because of invisible characters that fooled me as such. Invisible => Not there, well actually not. If you count the invisible characters having to be there, especially as YAML gets larger nesting, it quickly surpasses JSON. I just thought that was very interesting as the human-readable portion fools most of us into that notion until I really thought about it as you can flatten JSON and YAML, not so much. I also found YAML to be very difficult to hand edit, not read, just edit as you need editor guides turned on, easily mistaking nested items at times.Korten
I feel that none of the answers here state this explicitly: "For settings/config files, YAML is better (for the reasons mentioned above by everybody). For machine/machine interop use JSON". In other words: if your target audience is a human, YAML is better. If the target is another program (but you still want the data to be human readable), use JSON.Pish
That's true, but the question laid out some pretty specific parameters about how they wanted the two compared. Personally, I would never use YAML for anything. I'd either use JSON for interoperability, or JSON6 if human maintenance is important.Erotomania
Actually to refine that: I generally prefer to use straight JavaScript, rather than JSON, for configuration files, for NodeJS projects. It looks like JSON, but with tons of advantages, like commenting, less verbose quotes, and the ability to write expressions etc.Erotomania
HJSON.org is now a gambling site. The project can be found at https://hjson.github.io/ Gogetter
A
56

GIT and YAML

The other answers are good. Read those first. But I'll add one other reason to use YAML sometimes: git.

Increasingly, many programming projects use git repositories for distribution and archival. And, while a git repo's history can store JSON and YAML files equally well, the "diff" method used for tracking and displaying changes to a file is line-oriented. Since YAML is forced to be line-oriented, small changes in a YAML file are easier for a human to see.

It is true, of course, that JSON files can be "made pretty" by sorting the strings/keys and adding indentation. But this is not the default and I'm lazy.
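
For anyone less lazy, the "made pretty" step can be sketched with Python's standard library: `json.dumps` with `sort_keys=True` and `indent=2` puts one value per line in a stable order, and `difflib` stands in for git's line-oriented diff. The helper name `stable_lines` and the sample data are made up for this illustration.

```python
import difflib
import json


def stable_lines(obj):
    """Render obj as JSON with sorted keys and one value per line."""
    return json.dumps(obj, sort_keys=True, indent=2).splitlines()


before = {"name": "demo", "retries": 3, "debug": False}
# Same keys in a different order, with one value changed.
after = {"debug": False, "retries": 5, "name": "demo"}

# Keep only the added/removed lines, skipping the "---"/"+++" file headers.
diff = [
    line
    for line in difflib.unified_diff(stable_lines(before), stable_lines(after), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in diff:
    print(line)
```

Because the keys are sorted, the reordering produces no noise and only the changed value shows up in the diff.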

Personally, I generally use JSON for system-to-system interaction. I often use YAML for config files, static files, and tracked files. (I also generally avoid adding YAML relational anchors. Life is too short to hunt down loops.)

Also, if speed and space are really a concern, I don't use either. You might want to look at BSON.

Astern answered 23/3, 2018 at 19:35 Comment(1)
YAML is often compiled to JSON, especially when using git. In GitHub Actions, for example, a .workflow.yml file is required to define a workflow, but it just gets compiled to JSON when it's runChemise
D
28

I find YAML to be easier on the eyes: fewer brackets, quotation marks, etc. Although there is the annoyance of tabs in YAML... but one gets the hang of it.

In terms of performance/resources, I wouldn't expect big differences between the two.

Furthermore, we are talking about configuration files and so I wouldn't expect a high frequency of encode/decode activity, no?

Dowlen answered 13/11, 2009 at 2:46 Comment(3)
I wondered what you meant by the annoyance of tabs. I believe the thing is tab characters are not allowed in yaml, which personally I think is a good idea in any source file.Uela
@poolie: jldupont is likely referring to the syntactically significant leading whitespace in YAML.Odeliaodelinda
OK but they're not tabs.Uela
P
27

Technically YAML offers a lot more than JSON (YAML v1.2 is a superset of JSON):

  • comments
  • anchors and inheritance - example of 3 identical items:

    item1: &anchor_name
      name: Test
      title: Test title
    item2: *anchor_name
    item3:
      <<: *anchor_name
      # You may add extra stuff.
    
  • ...
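
For contrast, here is roughly what happens to the same data on the JSON side, sketched with Python's standard `json` module. The dictionaries below are a hand-built analogue of what a YAML loader would typically produce for the example above (an assumption for illustration, not actual parser output). JSON has no alias syntax, so shared data is written out in full each time, and a self-referencing structure such as the recursive YAML mentioned in an earlier answer cannot be encoded at all.

```python
import json

base = {"name": "Test", "title": "Test title"}

# Hand-built analogue of the YAML example: item2 is the very same object as
# item1 (an alias), item3 is a fresh mapping filled from it (a merge key).
doc = {"item1": base, "item2": base, "item3": {**base}}

text = json.dumps(doc)
copies = text.count('"title": "Test title"')  # the shared data is repeated per item

reloaded = json.loads(text)
still_shared = reloaded["item1"] is reloaded["item2"]  # identity is lost on reload

# A list that contains itself has no JSON representation.
loop = []
loop.append(loop)
try:
    json.dumps({"a": loop})
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```

This is the trade-off behind anchors: they keep the file short and preserve sharing, at the cost of a loader that has to track references.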

Most of the time people will not use those extra features and the main difference is that YAML uses indentation whilst JSON uses brackets. This makes YAML more concise and readable (for the trained eye).

Which one to choose?

  • YAML's extra features and concise notation make it a good choice for configuration files (non-user-provided files).
  • JSON's limited features, wide support, and faster parsing make it a great choice for interoperability and user-provided data.
Pyrazole answered 3/3, 2016 at 13:31 Comment(0)
P
23

If you don't need any features which YAML has and JSON doesn't, I would prefer JSON because it is very simple and is widely supported (has a lot of libraries in many languages). YAML is more complex and has less support. I don't think the parsing speed or memory use will be very much different, and maybe not a big part of your program's performance.

Pyrophotometer answered 13/11, 2009 at 2:50 Comment(5)
in which way is YAML more complex?Henden
For example, YAML supports anchors, as has been noted in another answer. There are other features, such as extensible data types. This makes it more complex to parse, and explains why YAML has a larger spec. It may hurt performance depending on parser implementation (take a look at this question: #2452232).Fascicule
Complexity is better than simplicity if that complexity buys you power to achieve overall greater simplicity. That is certainly true depending on the complexity of your data model.Crossstaff
I may be a little late here but YAML can add in comments whereas JSON can't. To me it is a big help when comes to documentation of specificationsSpurling
@Accatyyc. I think the fact that people are here asking questions about the difference is a sure sign YAML ain't all that easy. I've never asked a question about JSON (except "why can't I have comments in it?")Mysia
L
19

Benchmark results

Below are the results of a benchmark comparing YAML and JSON loading times in Python and Perl.

JSON is much faster, at the expense of some readability and of features such as comments.

Test method

Results

Python 3.8.3 timeit
    JSON:            0.108
    YAML CLoader:    3.684
    YAML:           29.763

Perl 5.26.2 Benchmark::cmpthese
    JSON XS:         0.107
    YAML XS:         0.574
    YAML Syck:       1.050

Perl 5.26.2 Dumbbench (Brian D Foy, excludes outliers)
    JSON XS:         0.102
    YAML XS:         0.514
    YAML Syck:       1.027
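
The test script itself is not preserved in this copy, so the following is only a sketch of how such a comparison could be reproduced in Python, not the method that produced the numbers above. The payload, the repeat count, and the helper name `time_loader` are all assumptions. The YAML timings run only if the third-party PyYAML package is installed, and the C-accelerated loader only if PyYAML was built with libyaml.

```python
import json
import timeit

# Made-up payload; the input used for the figures above is not shown in the post.
records = [{"id": i, "name": f"item{i}", "tags": ["a", "b"]} for i in range(200)]
json_text = json.dumps(records)


def time_loader(load, text, repeats=5):
    """Return the total seconds taken by `repeats` calls of load(text)."""
    return timeit.timeit(lambda: load(text), number=repeats)


results = {"JSON": time_loader(json.loads, json_text)}

try:
    import yaml  # PyYAML, third-party; this part is skipped if it is missing
except ImportError:
    yaml = None

if yaml is not None:
    # The JSON text doubles as YAML input so both loaders parse identical data.
    results["YAML"] = time_loader(yaml.safe_load, json_text)
    c_loader = getattr(yaml, "CSafeLoader", None)  # present only with libyaml
    if c_loader is not None:
        results["YAML CLoader"] = time_loader(
            lambda text: yaml.load(text, Loader=c_loader), json_text
        )

for name, seconds in results.items():
    print(f"{name:14s} {seconds:.3f}")
```

Absolute numbers will differ by machine and library version; only the ratios between loaders on the same input are meaningful.
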
Longlimbed answered 10/7, 2020 at 22:13 Comment(1)
JSON faster because it doesn't have to deal with references, many types of containers, tags etc.Illusion
H
16

From Arnaud Lauret's book “The Design of Web APIs”:

The JSON data format

JSON is a text data format based on how the JavaScript programming language describes data but is, despite its name, completely language-independent (see https://www.json.org/). Using JSON, you can describe objects containing unordered name/value pairs and also arrays or lists containing ordered values, as shown in this figure.

[Figure: JSON example (image not preserved)]

An object is delimited by curly braces ({}). A name is a quoted string ("name") and is separated from its value by a colon (:). A value can be a string like "value", a number like 1.23, a Boolean (true or false), the null value null, an object, or an array. An array is delimited by brackets ([]), and its values are separated by commas (,). The JSON format is easily parsed using any programming language. It is also relatively easy to read and write. It is widely adopted for many uses such as databases, configuration files, and, of course, APIs.
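
As a quick illustration of the value kinds listed above (not taken from the book), this sketch parses one made-up JSON object with Python's standard `json` module and records which Python type each JSON value maps to.

```python
import json

text = (
    '{"name": "value", "price": 1.23, "ok": true, '
    '"none": null, "obj": {}, "list": [1, "two"]}'
)

parsed = json.loads(text)

# Map each key to the name of the Python type its JSON value became.
kinds = {key: type(value).__name__ for key, value in parsed.items()}

for key, kind in kinds.items():
    print(f"{key:6s} -> {kind}")
```

Every JSON value kind lands on a built-in type, which is part of why the format is so easy to consume from most languages.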

YAML

YAML (YAML Ain’t Markup Language) is a human-friendly, data serialization format. Like JSON, YAML (http://yaml.org) is a key/value data format. The figure shows a comparison of the two.

[Figure: the same data in JSON and YAML (image not preserved)]

Note the following points:

  • There are no double quotes (" ") around property names and values in YAML.

  • JSON’s structural curly braces ({}) and commas (,) are replaced by newlines and indentation in YAML.

  • Array brackets ([]) and commas (,) are replaced by dashes (-) and newlines in YAML.

  • Unlike JSON, YAML allows comments beginning with a hash mark (#).

It is relatively easy to convert one of those formats into the other. Be forewarned though, you will lose comments when converting a YAML document to JSON.

Holocaine answered 21/10, 2019 at 15:8 Comment(0)
D
7

Since this question now features prominently when searching for YAML and JSON, it's worth noting one rarely-cited difference between the two: license. JSON purports to have a license which JSON users must adhere to (including the legally-ambiguous "shall be used for Good, not Evil"). YAML carries no such license claim, and that might be an important difference (to your lawyer, if not to you).

Defrayal answered 20/9, 2016 at 7:18 Comment(1)
I don't use JSON, I use the exact equivalent of JSON without calling it JSON. I call it PS-OFF. You going to sue me for using { "": #, [] }???Reforest
D
6

Sometimes you don't have to choose one over the other.

In Go, for example, you can have both at the same time:

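// Person carries both tags: encoding/json reads `json`, and YAML packages
// such as gopkg.in/yaml.v3 read `yaml`.
type Person struct {
    Name string `json:"name" yaml:"name"`
    Age  int    `json:"age" yaml:"age"`
}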
Demonize answered 29/3, 2017 at 14:0 Comment(0)
T
4

I find both YAML and JSON to be very effective. Only two things really dictate which one I use. The first is which format is most popular with the language in question. For example, if I'm using Java or JavaScript, I'll use JSON. For Java, I'll use its own objects, which are pretty much JSON but lacking some features, and convert them to JSON if I need to, or write JSON in the first place. I do that because it's common practice in Java and makes it easier for other Java developers to modify my code. The second is whether the program is storing attributes for its own use or receiving instructions in the form of a config file. In the latter case I'll use YAML, because it's very easy for humans to read, has nice-looking syntax, and is very easy to modify, even if you have no idea how YAML works. The program then reads it and converts it to JSON, or whatever is preferred for that language.

In the end, it honestly doesn't matter. Both JSON and YAML are easily read by any experienced programmer.

Toiletry answered 6/12, 2016 at 0:52 Comment(0)
U
1

If you are concerned about better parsing speed then storing the data in JSON is the option. I had to parse the data from a location where the file was subject to modification from other users and hence I used YAML as it provides better readability compared to JSON. And you can also add comments in the YAML file which can't be done in a JSON file.

Unborn answered 21/4, 2021 at 7:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.