As you noticed, what the specifications say is one thing; what commonly available parsers (both YAML and JSON) actually process is another. You should therefore take several aspects into account and aim for the lowest common denominator if you want to be able to load your JSON with a YAML parser.
On the JSON side there are multiple standards and best practices. Originally a JSON text had to have an object or array at the topmost level. This is still so according to the fail1.json file available on the json.org site:
"A JSON payload should be an object or array, not a string."
According to RFC 7159 any value can be at the top level (although, apart from a string, this leads to rather boring JSON files):
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array. Implementations that generate only objects or arrays where a
JSON text is called for will be interoperable in the sense that all
implementations will accept these as conforming JSON texts.
Because of the problems with JSON hijacking (by redefining the array handling in older browsers) there have been implementations that only accept an object at the top level (i.e. the first character of the file has to be {).
On the YAML side there are fewer competing standards than with JSON, but things get muddled by the persistent usage of YAML 1.1. It does not help that if you google for "yaml current spec" the first hit is yaml.org/spec/current.html, which is actually an old working draft for YAML 1.1.
Apart from the UTF-32 support the other answer mentioned, which is largely a non-issue in a world using UTF-8 almost exclusively, there are a few things to take into account, especially if you want PyYAML to be able to parse your JSON (PyYAML still implements most of YAML 1.1 only, close to eight years after the YAML 1.2 spec release):
Numbers in JSON don't need a dot in the mantissa, even if such a number has an exponent, but the Floating-Point Language-Independent Type for YAML™ Version 1.1 does require that dot:
|[-]?0\.([0-9]*[1-9])?e[-+](0|[1-9][0-9]+) (scientific)
^--- no ? or * associated with this dot
In the YAML 1.2 spec this regex has changed to:
-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
which allows the dot to be left out even if there is an exponent (introduced by e, not E).
This is the cause for your 12345e999 being handled differently by JSON (overflow) and PyYAML (string). In YAML 1.1 this can only be interpreted as a string, and hence it doesn't need quotes and can be a plain scalar.
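The JSON half of that difference is easy to check. A minimal sketch, assuming CPython's standard json module stands in for "JSON" here; other JSON parsers may raise an error or return a different numeric type instead:

```python
import json
import math

# The token has an exponent but no dot in the mantissa; this is valid JSON.
value = json.loads("12345e999")

# The value is far too large for a double, so the parser overflows to infinity.
print(type(value).__name__, math.isinf(value))
```

The same token given to a YAML 1.1 loader such as PyYAML comes back as a string, as described above.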
In YAML 1.1 there are escape sequences, but these are not a superset of what JSON supports. The forward slash (/) can be escaped in JSON, but not in YAML 1.1 (it can in YAML 1.2, rule 53).
In JSON as well as in YAML 1.1 you can use \uNNNN to indicate a 16-bit Unicode code point. Although the YAML 1.1 spec (and YAML 1.2) mentions surrogate pairs in conjunction with using UTF-16, nothing is mentioned about such pairs as escape sequences ("\uD834\uDD1E"). This string sequence is explicitly mentioned in RFC 7159 as representing the G clef character (U+1D11E). I don't know of any YAML parser that supports this; PyYAML throws a:
yaml.reader.ReaderError: unacceptable character #xd834: special characters are not allowed
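For comparison, a JSON parser accepts both escapes discussed above. The sketch below assumes Python's standard json module as the JSON reader and writer. It also shows that, if you generate the JSON with that module, passing ensure_ascii=False writes the character itself instead of a surrogate-pair escape:

```python
import json

# JSON allows the forward slash to be escaped; it decodes to a plain "/".
slash = json.loads('"a\\/b"')

# An escaped surrogate pair decodes to the single code point U+1D11E.
clef = json.loads('"\\uD834\\uDD1E"')

# With the default ensure_ascii=True the writer emits a surrogate-pair escape again.
escaped = json.dumps(clef)

# With ensure_ascii=False the character is written literally (save the file as UTF-8).
literal = json.dumps(clef, ensure_ascii=False)

print(slash, len(clef), escaped, literal)
```

This module's json.dumps does not write \/ on its own either, so output produced with ensure_ascii=False avoids both problematic escapes.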
So as long as you write your JSON
- as UTF-8
- with the top-level being an object
- scientific numbers always with a dot
- no \/ escape sequence
- no \uNNNN characters between \uD7FF and \uE000 (exclusive), nor \uFFFE, nor \uFFFF
you should be fine for both JSON and YAML (1.1) parsers.
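The checklist above can be turned into a rough lint for an existing JSON text. This is only a sketch: the function name checklist_problems and the messages are made up for illustration, it uses Python's standard json and re modules, and because it works on an already decoded string it does not check the UTF-8 item.

```python
import json
import re

# Matches either a complete JSON string literal or a JSON number token.
_TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'|-?(?:0|[1-9][0-9]*)(?P<frac>\.[0-9]+)?(?P<exp>[eE][-+]?[0-9]+)?'
)
# Matches one escape sequence inside a string literal.
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|(.))')


def checklist_problems(text):
    """Return a list of checklist violations found in a JSON text (hypothetical helper)."""
    problems = []
    # json.loads raises ValueError if the text is not valid JSON at all.
    if not isinstance(json.loads(text), dict):
        problems.append("top level is not an object")
    for token in _TOKEN.finditer(text):
        literal = token.group(0)
        if literal.startswith('"'):
            # Walk the escapes in order so that \\ followed by / is not misread as \/.
            for escape in _ESCAPE.finditer(literal):
                if escape.group(2) == "/":
                    problems.append("escaped forward slash in " + literal)
                elif escape.group(1) is not None:
                    code = int(escape.group(1), 16)
                    if 0xD800 <= code <= 0xDFFF or code in (0xFFFE, 0xFFFF):
                        problems.append("\\u%04X escape in %s" % (code, literal))
        elif token.group("exp") and not token.group("frac"):
            problems.append("exponent without a dot: " + literal)
    return problems


sample = r'{"n": 12345e999, "s": "a\/b", "g": "\uD834\uDD1E"}'
for problem in checklist_problems(sample):
    print(problem)
```

An empty list means none of the checked items were violated; it does not prove that every YAML 1.1 parser will load the text.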
¹ In ruamel.yaml, a YAML 1.2 parser of which I am the author, the \/ escape and scientific numbers without a dot are handled correctly: your 12345e999 loads as type float and prints as inf.
12345e999
example shows that the file wasn't valid JSON or YAML. 1) It was after all interpreted without error by both implementations (which, of course, might be buggy); and 2) AFAIK neither YAML nor JSON spec strictly define the range of floating point values that have to be supported by an implementation, so implementation-specific behaviour is fair game. – Ravishing