TOML vs YAML vs StrictYAML

TOML said "TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. TOML differs in combining these, allowing comments (unlike JSON) but preserving simplicity (unlike YAML)."

I can see that TOML doesn't rely on significant whitespace, but other than that I am not sure about the simplicity it claims. What exactly is it?

Then I see StrictYAML: "StrictYAML is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification." Type-safe: what exactly is that (again)? What problem did TOML not fix for YAML that StrictYAML thinks it does? I did read through the articles on the StrictYAML site, but I am still not clear.

So both TOML and StrictYAML want to fix the "problem" YAML has. But apart from the indentation, what is the problem?

---- update ----

I found a Reddit post where the author of StrictYAML talked about YAML vs TOML (quoted below). But the answer I got so far says "strictyaml displays a rather poor understanding of YAML", while https://github.com/crdoconnor/strictyaml had 957 stars as of 2021/12/28. So I am a bit lost as to which one I should use, and I stick with YAML because most of my YAML is simple.

YAML downsides:

  • Implicit typing causes surprise type changes (e.g. put 3 where you previously had a string and it will magically turn into an int).
  • A bunch of nasty "hidden features" like node anchors and references that make it look unclear (although to be fair a lot of people don't use them).

TOML downsides:

  • Noisier syntax (especially with multiline strings).
  • The way arrays/tables are done is confusing, especially arrays of tables.

I wrote a library that removed most of the nasty stuff I didn't like about YAML leaving the core which I liked. It's got a pretty detailed comparison between it and a bunch of other config formats, e.g.: https://hitchdev.com/strictyaml/why-not/toml/

Paulie answered 14/12, 2020 at 3:28
I have gained more insight about TOML since I first asked this question a year ago. To me, it is a configuration file format for key-value pairs, grouped by section ("tables", as they call them). For small, simple projects, it's fine. Otherwise, I see criticism from time to time. - Paulie

This may be an opinionated answer as I have written multiple YAML implementations.


Common Criticism of YAML addressed by the alternatives

YAML's outstanding semantic feature is that it can represent a possibly cyclic graph. Moreover, YAML mappings can use complex nodes (sequences or mappings) as keys. These features are what you potentially need when you want to represent an arbitrary data structure.

Another exotic YAML feature is tags. Their goal is to abstract over different types in different programming languages, e.g., a !!map would be a dict in Python but an object in JavaScript. While seldom used explicitly, implicit tag resolution is why false is usually loaded as a boolean value while droggeljug is loaded as a string. The apparent goal here was to reduce noise by not requiring users to write boolean values like !!bool false or to put quotes around every string value.

However, reality has shown that many people are confused by this, and the fact that YAML 1.1 defines that yes may be parsed as a boolean has not helped either. YAML 1.2 tried to remedy this a bit by describing different schemas you can use, where the basic "failsafe" schema exclusively loads to mappings, sequences, and strings, and the more complex "JSON" and "core" schemas do additional type guessing. However, most YAML implementations, prominently PyYAML, remained on YAML 1.1 for a long time (many implementations were originally rewrites of PyYAML code, e.g., libyaml, SnakeYAML). This cemented the view that YAML makes questionable typing decisions that need fixing.
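To make the schema difference concrete, here is a minimal sketch of how a plain (unquoted) scalar might be resolved under each schema. The function name `resolve_plain_scalar` and the schema labels are hypothetical, and the patterns are deliberately simplified: only booleans and decimal integers are handled, with a YAML 1.1-style boolean pattern that also accepts yes/no/on/off, versus the core schema's true/false only.

```python
import re

# Simplified patterns (assumption: booleans and decimal ints only, not the
# full regular expressions from either YAML version).
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
CORE_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")
DECIMAL_INT = re.compile(r"^[-+]?[0-9]+$")

TRUTHY = {"yes", "true", "on"}


def resolve_plain_scalar(text: str, schema: str):
    """Return the value a plain scalar resolves to under the given schema."""
    if schema == "failsafe":
        return text  # failsafe: every scalar stays a string
    bool_pattern = YAML11_BOOL if schema == "yaml1.1" else CORE_BOOL
    if bool_pattern.match(text):
        return text.lower() in TRUTHY
    if DECIMAL_INT.match(text):
        return int(text)
    return text  # anything else, e.g. droggeljug, is a string


# The same three plain scalars under a YAML 1.1-style resolver:
countries = [resolve_plain_scalar(c, "yaml1.1") for c in ("FR", "DE", "NO")]
```

With the 1.1-style pattern, `countries` ends up as `['FR', 'DE', False]`; under `"core"` or `"failsafe"` the `NO` stays a string.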

Nowadays, some implementations improved, and you can use the failsafe schema to avoid unwanted boolean values. In this regard, StrictYAML restricts itself to the failsafe schema; don't believe its argument that this is some novelty PyYAML can't do.

A common security issue with YAML implementations is that they mapped tags to arbitrary constructor calls (an exploit in Ruby on Rails was based on this). Mind that this is not a YAML shortcoming; YAML nowhere suggests calling unknown functions during object construction. The base issue here is that data serialization is the enemy of data encapsulation: if your programming language offers constructors as the sole method for constructing an object, that's what you need to do when deserializing data. The remedy is to call only known constructors, which was implemented broadly after a series of such exploits (another one with SnakeYAML, iirc) surfaced. Nowadays, PyYAML requires you to opt in to calling unknown constructors, e.g., via a loader class aptly named UnsafeLoader.
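The "only call known constructors" remedy amounts to an allowlist from tag to constructor. Below is a minimal sketch of that idea; it is not PyYAML's API, and `Point`, `SAFE_CONSTRUCTORS`, `construct` and `UnknownTagError` are hypothetical names. The tagged node is assumed to have already been parsed into a dict of strings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Point:  # hypothetical application type
    x: int
    y: int


# Only tags registered here can produce objects; any other tag is rejected
# instead of being resolved to an arbitrary class or function by name.
SAFE_CONSTRUCTORS: Dict[str, Callable[[dict], Any]] = {
    "!point": lambda fields: Point(int(fields["x"]), int(fields["y"])),
}


class UnknownTagError(ValueError):
    pass


def construct(tag: str, fields: dict) -> Any:
    """Build an object for a tagged mapping using the allowlist only."""
    try:
        constructor = SAFE_CONSTRUCTORS[tag]
    except KeyError:
        raise UnknownTagError(f"refusing to construct unknown tag {tag!r}") from None
    return constructor(fields)
```

`construct("!point", {"x": "1", "y": "2"})` yields `Point(x=1, y=2)`, while a tag naming, say, a system call is refused rather than executed.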

TOML

TOML's main semantic difference is that it doesn't support cycles, complex keys, or tags. This means that while you can load YAML into an arbitrary user-defined class, you always load TOML into tables or arrays containing your data.

For example, while YAML allows you to load {foo: 1, bar: 2} into an object of a class with foo and bar integer fields, TOML will always load this into a table. A prominent example of YAML's capabilities you usually find in documentation is that it can load the scalar 1d6 into an object {number: 1, sides: 6}; TOML will always load it as string "1d6".

TOML's perceived simplicity here is that it doesn't do some stuff that YAML does. For example, if you're using a statically typed language like Java, after loading {foo: 1, bar: 2} into an object myObject, you can access myObject.foo safely (getting the integer 1). If you used TOML, you would need to do myObject["foo"], which could raise an exception if the key doesn't exist. This is less true in scripting languages like Python: Here, myObject.foo compiles and fails with a runtime error if foo does not happen to be a property of myObject.

My perspective from answering a lot of YAML questions here is that people don't use YAML's features and often load everything into a structure like Map<String, Object> (using Java as an example) and take it from there. If you do this, you could as well use TOML.

A different kind of simplicity TOML offers is its syntax: Since it is vastly simpler than YAML, it is easier to emit errors users can understand. For example, a common error text in YAML syntax errors is "mapping values are not allowed in this context" (try searching for this on SO to find tons of questions). You get this, for example, here:

foo: 1
  bar: 2

The error message does not help the user fix the error. This is because of YAML's complex syntax: YAML thinks 1 and bar are part of a multi-line scalar (because bar: is indented more than foo:), puts them together, then sees a second : and fails because multi-line scalars may not be used as implicit keys. However, most likely, the user simply mis-indented bar: or was under the impression that they can give foo both a scalar value (1) and some children. Because of the many possibilities in YAML syntax, it would be tough to write error messages that can help the user.

Since TOML's syntax is much simpler, the error messages are easier to understand. This is a big plus if the user writing TOML is not expected to be someone with a background in parsing grammars.

TOML has a conceptual advantage over YAML: Since its structure allows less freedom, it tends to be easier to read. When reading TOML, you always know, „okay, I'm gonna have nested tables with values in them“ while with YAML, you have some arbitrary structure. I believe this requires more cognitive load when reading a YAML file.

StrictYAML

StrictYAML argues that it provides type safety, but since YAML isn't a programming language and specifically doesn't support assignments, this claim doesn't make any sense under the Wikipedia definition that StrictYAML links to (type safety comes and goes with the programming language you use; e.g., any YAML is type-safe after loading it into a proper Java class instance, but you'll never be type-safe in a language like Python). Going over its list of removed features, it displays a rather poor understanding of YAML:

  • Implicit Typing: Can be deactivated in YAML implementations using the failsafe schema, as discussed above.
  • Direct representations of objects: It simply links to the Ruby on Rails incident, implying that this cannot be avoided, even though most implementations are safe today without removing the feature.
  • Duplicate Keys Disallowed: The YAML specification already requires this.
  • Node anchors and refs: StrictYAML argues that using this for deduplication is unreadable to non-programmers, ignoring that the intention was to be able to serialize cyclic structures, which is not possible without anchors and aliases.
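To see why cyclic structures need anchors and aliases, here is a minimal sketch of a flow-style emitter that anchors any list or dict reached more than once. `dump_flow` is a hypothetical name; it handles only lists, dicts and scalars, and writes scalars with `str()` without any quoting or escaping.

```python
from typing import Any, Dict


def dump_flow(root: Any) -> str:
    """Emit flow-style YAML, using &anchors/*aliases for shared or cyclic nodes."""
    seen: Dict[int, int] = {}

    # Pass 1: count how often each collection is reached.
    def count(node: Any) -> None:
        if isinstance(node, (list, dict)):
            seen[id(node)] = seen.get(id(node), 0) + 1
            if seen[id(node)] > 1:
                return  # already visited; recursing again would loop on cycles
            children = node if isinstance(node, list) else node.values()
            for child in children:
                count(child)

    count(root)
    anchors: Dict[int, str] = {}

    # Pass 2: emit; the first occurrence gets an anchor, later ones an alias.
    def emit(node: Any) -> str:
        if not isinstance(node, (list, dict)):
            return str(node)
        key = id(node)
        if key in anchors:
            return "*" + anchors[key]
        prefix = ""
        if seen[key] > 1:
            anchors[key] = f"id{len(anchors) + 1:03d}"
            prefix = "&" + anchors[key] + " "
        if isinstance(node, list):
            body = "[" + ", ".join(emit(child) for child in node) + "]"
        else:
            body = "{" + ", ".join(f"{k}: {emit(v)}" for k, v in node.items()) + "}"
        return prefix + body

    return emit(root)


cycle: list = []
cycle.append(cycle)      # a list that contains itself
print(dump_flow(cycle))  # &id001 [*id001]
```

Without the alias, the emitter would have to write the list inside itself forever.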

On the deserialization side,

All data is a string, list or OrderedDict

It is basically the same structure TOML supports (I believe StrictYAML doesn't support complex keys in mappings, as neither list nor OrderedDict is hashable in Python).

You are also losing the ability to deserialize to predefined class structures. One could argue that the inability to construct a class object with well-defined fields makes StrictYAML less type-safe than standard YAML: A standard YAML implementation can guarantee that the returned object has a certain structure described by types, while StrictYAML gives you on every level either a string, a list or an OrderedDict and you can't do anything to restrict it.

While quite a few of its arguments are flawed, the resulting language is still usable. For example, with StrictYAML, you do not need to care about the billion laughs attack that haunts some YAML implementations. Again, this is not a YAML problem but an implementation problem, as YAML does not require an implementation to duplicate a node that is anchored and referred to from multiple places.
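The difference between sharing and copying an aliased node can be made concrete. This sketch builds the classic billion-laughs shape (9 levels, each referring 9 times to the level below) the way an alias-aware loader could, with shared references, and then counts how many strings a loader that copies every alias would have to materialize. Python lists stand in for loaded nodes; all function names are hypothetical.

```python
from typing import Any, Dict, Optional, Set


def build_laughs(levels: int = 9, fanout: int = 9) -> list:
    """Each level is a list holding the level below `fanout` times (shared)."""
    node: Any = ["lol"] * fanout
    for _ in range(levels - 1):
        node = [node] * fanout  # the same object repeated, nothing copied
    return node


def expanded_leaf_count(node: Any, memo: Optional[Dict[int, int]] = None) -> int:
    """Strings a loader would create if it duplicated every aliased node."""
    if memo is None:
        memo = {}
    if not isinstance(node, list):
        return 1
    if id(node) not in memo:
        memo[id(node)] = sum(expanded_leaf_count(child, memo) for child in node)
    return memo[id(node)]


def distinct_lists(node: Any, seen: Optional[Set[int]] = None) -> int:
    """Number of list objects that actually exist when nodes are shared."""
    if seen is None:
        seen = set()
    if isinstance(node, list) and id(node) not in seen:
        seen.add(id(node))
        for child in node:
            distinct_lists(child, seen)
    return len(seen)


laughs = build_laughs()
```

Sharing keeps 9 small lists in memory, whereas copying would produce 9**9 = 387,420,489 strings.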

Bottom Line

Quite a few YAML issues stem from poor implementations, not from issues in the language itself. However, YAML as a language certainly is complex, and syntactic errors can be hard to understand, which could be a valid reason to use a language like TOML. As for StrictYAML, it does offer some benefit, but I advise against using it because it does not have a proper specification and only a single implementation, which is a setup that is very prone to becoming a maintenance nightmare (the project could be discontinued, and breaking changes are easily possible).

Invariant answered 14/12, 2020 at 13:2
Hi, I have read your answers to other YAML questions, e.g. #42248035. Do you see indentation as a weakness YAML has compared to TOML? - Paulie
@Paulie I see indentation for structuring as an interesting but ultimately flawed concept (not just for YAML, also for Python or Nim). Without a visible token that ends a structure, code gets more difficult to read the more nesting it has and the longer it is. While in Python you can restructure your code to battle excessive nesting, this is not really possible in YAML. However, a similar argument has also been made for C's } (versus Pascal's end <name>;), and } is still used by most modern languages, so it seems to be personal preference. - Invariant
But at least TOML doesn't have this problem. Maybe that is one of the reasons for its "simplicity"? - Paulie
For example, blog.heroku.com/why-buildpacks-use-toml says they chose TOML over YAML because of that. - Paulie
@Paulie I'd say TOML is bad for deeply nested structures, since the necessity of giving the complete table path each time makes it hard to understand the structure of the data. You can indent your code to remedy this a bit, but then the fully blown paths seriously violate DRY. If you dislike significant indentation, you can still use YAML with flow syntax; if YAML had an option to disallow block (indentation-based) syntax, that would more often than not be the better choice imho. - Invariant
Nesting in TOML is indeed a problem that has yet to be addressed; e.g., see the issue about it and my example in TOML and YAML here: github.com/toml-lang/toml/issues/781#issuecomment-1004397808 - Hot
My first exposure to TOML made me perceive YAML as more readable: github.com/42wim/matterbridge/wiki/Gateway-config-%28basic%29. After reading more about TOML some years later, I finally realized it is an array, and because it does not require indentation for the members of [[gateway.inout]], it is harder to follow as well. - Bodhisattva

StrictYAML Type Safety -- The "Norway Problem"

Here's an example given by the StrictYAML people:

countries:
- FR
- DE
- NO

Implicitly translates "NO" to a False value.

>>> from yaml import load
>>> load(the_configuration)
{'countries': ['FR', 'DE', False]}

https://hitchdev.com/strictyaml/why/implicit-typing-removed

Concordant answered 2/4, 2021 at 19:4
Thanks for clarifying that! Maybe you can also comment on TOML. :) The other answer I got said it was bad. - Paulie
The actual output of that load call also includes <stdin>:1: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.. You can easily do load(the_configuration, Loader=yaml.BaseLoader) to avoid this, as described in the page linked by the warning. This fixes all problems described on that StrictYAML page. - Invariant
Yet safe_load(the_configuration) still suffers from the issue described in this answer and produces no warning. - Callus

YAML is extremely complex, way more than it should be for what most people use it for: a config/data interchange language. Not only does the complexity lead to typing bugs, as mentioned already, but even to security bugs. Yes, YAML can even instantiate objects in the programming language loading it!

Hence, many libraries implement a "safe loader", but it is not the default in some of them. You have to opt in. :-/

Both StrictYAML and TOML try to tame that complexity. StrictYAML does that by removing 90% of the features, all the problematic ones. It returns only strings, and you must define a schema for properly typed output. TOML does it with an INI-like syntax and by not supporting those features in the first place.
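The "everything is a string, a schema adds the types" approach can be sketched in a few lines. This is not StrictYAML's API (it has its own schema classes); `SCHEMA`, `parse_bool` and `apply_schema` are hypothetical names, and the input dict stands in for what a string-only load of a small document would return.

```python
from typing import Any, Callable, Dict


def parse_bool(text: str) -> bool:
    # Only the two spellings this schema allows; "NO" or "yes" are errors here.
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"expected true or false, got {text!r}")


# Hypothetical schema: field name -> converter applied to the loaded string.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "name": str,
    "port": int,
    "debug": parse_bool,
}


def apply_schema(raw: Dict[str, str], schema: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Type a mapping whose values were all loaded as strings."""
    unknown = set(raw) - set(schema)
    missing = set(schema) - set(raw)
    if unknown or missing:
        raise ValueError(f"unknown keys {sorted(unknown)}, missing keys {sorted(missing)}")
    return {key: convert(raw[key]) for key, convert in schema.items()}


raw = {"name": "NO", "port": "8080", "debug": "false"}
typed = apply_schema(raw, SCHEMA)  # name stays "NO"; only declared fields get types
```

The design choice is that typing is opt-in per field, so a value like `NO` can only become a boolean if the schema says that field is a boolean.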

All in all, StrictYAML is more powerful and still safe. For larger documents, however, one probably needs a competent editor with collapsible indentation guides.

TOML is fine for small to mid-size files, but it is in an awkward place between simple .ini and the powerful StrictYAML.

Jaipur answered 23/8, 2023 at 19:10
Thanks for the answer, but I don't see how we can draw the conclusion "StrictYaml is more powerful" from it. - Paulie
It supports a validating schema. Also, the hierarchical data syntax is more obvious, so hierarchy is less discouraged. A bit more powerful and better for larger files. - Jaipur
