Why should json.loads be preferred to ast.literal_eval for parsing JSON?

I have a dictionary that is stored in a db field as a string. I am trying to parse it into a dict, but json.loads gives me an error.

Why does json.loads fail on this and ast.literal_eval works? Is one preferable over the other?

>>> c.iframe_data
u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"

# json fails
>>> json.loads(c.iframe_data)
Traceback (most recent call last):
ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

# ast.literal_eval works
>>> ast.literal_eval(c.iframe_data)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Ulcerous answered 9/2, 2015 at 6:7 Comment(3)
It would be great to have this question be the canonical one for this topic (it keeps getting asked, implicitly or explicitly). However, since the JSON the OP supplied is actually illegal/malformed, should answers in part also consider "how forgiving is each approach to illegal/malformed JSON?" (similar to BeautifulSoup vs lxml on scraping XML/HTML/CSS)? For example, should the OP here use a regex to fix up/preprocess the unwanted and illegal u' prefixes (an issue that only occurs in Python 2.x, which is near EOL) and then simply use json.loads, already?Chartism
If there are DBs out there still containing illegal/malformed JSON exported the wrong way from Python 2.x, then this question becomes less canonical.Chartism
we could use ast.literal_eval to make "py(s)on"Puppet

json.loads failed because your c.iframe_data value is not a valid JSON document. In a valid JSON document, strings are quoted with double quotes, and there is no such thing as a u prefix for marking strings as unicode.

Using json.loads(c.iframe_data) means deserializing the JSON document contained in c.iframe_data.

ast.literal_eval is for evaluating Python literal expressions: use it when the input you want to evaluate is a Python expression, as a safe alternative to eval.

Is one preferable over the other?

It depends on the data. See this answer for more context.
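To make the distinction concrete, here is a minimal sketch (using shortened hypothetical values, not the OP's real data): json.loads accepts only valid JSON text, while ast.literal_eval accepts Python literal syntax such as the repr of a dict.

```python
import ast
import json

# The repr of a Python dict uses single quotes, so it is not valid JSON.
py_literal = "{'person': 'Annabelle!', 'token': 'abc123'}"

# ast.literal_eval parses Python literal syntax safely.
assert ast.literal_eval(py_literal) == {'person': 'Annabelle!', 'token': 'abc123'}

# json.loads rejects it: JSON requires double-quoted strings.
try:
    json.loads(py_literal)
except ValueError as exc:
    print(exc)  # same "Expecting property name enclosed in double quotes" error as above

# The same data expressed as valid JSON parses fine.
json_text = '{"person": "Annabelle!", "token": "abc123"}'
assert json.loads(json_text) == ast.literal_eval(py_literal)
```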

Bhang answered 9/2, 2015 at 6:16 Comment(2)
Also, in JSON data there isn't anything like u for marking strings as unicode.Communalism
In Python 3, "strings" are all Unicode, and binary data is held in bytes. The separation is much less clear in Python 2, for which there will be no further support after 31st December 2019.Officialdom

I have a dictionary that is stored in a db field as a string.

This is a design fault. While it's perfectly possible, as someone appears to have done, to extract the repr of a dictionary, there's no guarantee that the repr of an object can be evaluated at all.

In the presence of only string keys and string and numeric values, the Python eval function will usually reproduce the value from its repr, but I am unsure why you would think that makes it valid JSON, for example.

I am trying to parse it into a dict, but json.loads gives me an error.

Naturally. You aren't storing JSON in the database, so it hardly seems reasonable to expect it to parse as JSON. While it's interesting that ast.literal_eval can be used to parse the value, again there are no guarantees beyond relatively simple Python types.

Since it appears your data is indeed limited to such types, the real solution to your problem is to correct the way the data is stored, by converting the dictionary to a string with json.dumps before storage in the database. Some database systems (e.g., PostgreSQL) have JSON types to make querying such data simpler, and I'd recommend you use such types if they are available to you.

As to which is "better," that will always depend on the specific application, but JSON was explicitly designed as a compact human-readable machine-parseable format for simple structured data, whereas your current representation is based on formats specific to Python, which (for example) would be tediously difficult to evaluate in other languages. JSON is the applicable standard here, and you will benefit from using it.
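A sketch of the fix described above, assuming a hypothetical save/read boundary around the database field: serialize with json.dumps before writing and deserialize with json.loads after reading, so the stored text is always valid JSON.

```python
import json

data = {'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}

# Before storing: produce a valid JSON document, not repr(data).
stored = json.dumps(data)
assert stored == '{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'

# After reading the field back: a clean round-trip.
assert json.loads(stored) == data

# repr(), by contrast, produces a Python literal that json.loads rejects.
try:
    json.loads(repr(data))
except ValueError:
    pass  # single quotes are not valid JSON
```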

Officialdom answered 10/3, 2019 at 10:39 Comment(0)

json.loads is used specifically to parse JSON which is quite a restrictive format. There is no u'...' syntax and all strings are delimited by double quotes, not single quotes. Use json.dumps to serialise something that can be read by json.loads.

So json.loads(string) is the inverse of json.dumps(object) whereas ast.literal_eval(string) is (vaguely) the inverse of repr(object).

JSON is nice because it's portable -- there are parsers for it trivially available in pretty much every language. So if you want to send JSON to a Javascript frontend you'll have no issues.

ast.literal_eval isn't easily portable but it's slightly richer: you can use tuples, sets, and dicts whose keys aren't restricted to strings, for example.

Also json.loads is significantly faster than ast.literal_eval.
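A short sketch of that richer type support: these Python literals round-trip through repr/ast.literal_eval but cannot be represented directly in JSON.

```python
import ast
import json

richer = {
    (1, 2): {3, 4},   # tuple key and set value: fine in Python literal syntax
    5: 'int key',      # non-string key (JSON would coerce it to a string)
}

# repr -> ast.literal_eval round-trips the structure exactly.
assert ast.literal_eval(repr(richer)) == richer

# json.dumps cannot serialize tuple keys or sets at all.
try:
    json.dumps(richer)
except TypeError:
    pass
```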

Buckingham answered 9/2, 2015 at 6:38 Comment(0)

Because that u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}" is a Python unicode string, not JavaScript Object Notation. In the Chrome console:

bad = {u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
SyntaxError: Unexpected string
good = {'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Object {person: "Annabelle!", csrfmiddlewaretoken: "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}

Or you can try yaml, though note how it treats the u prefixes as part of the strings:

>>> a = '{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> json.loads(a)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import ast
>>> ast.literal_eval(a)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import yaml
>>> a = '{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
>>> a = u'{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
Steeplejack answered 9/2, 2015 at 6:24 Comment(1)
Despite YAML "mostly working", it should not be abused in this way. That's a Python Dictionary Literal, use literal_eval, the tool designed to parse those structures, specifically, and safely.Pigeonhearted

First, and most importantly, do not serialize data twice. Your database is itself a serialization of data, with a rich and expressive set of tools to query, explore, manipulate, and present it. Serializing data to be subsequently placed in a database eliminates the possibility for isolated sub-component updates, sub-component querying & indexing, and couples all writes to mandatory initial reads, for a few of the most significant issues.

Next, JavaScript Object Notation (JSON) is a limited subset of the JavaScript language suitable for the representation of static data in service of data interchange. Being a subset of the language means you can naively eval it within JS to reconstruct the original object. It is a simple serialization (no advanced features such as internal references, template definition, or type extension) with the limitations of the JavaScript language baked in, and with penalties for strings requiring large amounts of escaping. The use of end markers also makes it difficult to use in purely streaming scenarios: you can't "finalize" an object until hitting its paired }, and there is no marker for record separation. Other notable limitations: delivering HTML within JSON requires excessive escaping; all numbers are floating point (53-bit integer accuracy, rounding errors, …), making it patently unsuitable for the storage or transfer of financial information or the use of technologies (e.g. crypto) requiring 64-bit integers; and there is no native date representation.

There are some significant differences between JS and Python as languages, and thus in how JSON "JavaScript Object Notation" vs. PLS (Python Literal Syntax) behave. It just so happens that for the purpose of literal definition, most of JavaScript literal syntax is directly compatible with Python, albeit with slightly differing interpretations. The reverse is not true, see the above examples of disparity. If you care about preserving the fidelity of your data for Python, Python literals are more expressive and less "lossy" than their JS equivalents. However, as other answers/comments have noted, repr() is not a reliable way to generate this representation; Python literal syntax is not meant to be used this way. For the greatest type fidelity I generally recommend YAML serialization, of which JSON is a fully valid subset.

FYI, to address the practical concern of storage of dictionary-like mappings associated with entities, there are entity-attribute-value data models. Arbitrary key-value stores in relational databases FTW, but with power comes responsibility. Use this pattern carefully and only when absolutely needed. (If this is a frequent pattern, look into document stores.)

Pigeonhearted answered 11/3, 2019 at 2:42 Comment(0)

json.loads should be strongly preferred to ast.literal_eval for parsing JSON, for all the reasons below (summarizing the other posters).

In your specific example, the input was illegal/malformed JSON, exported the wrong way from Python 2.x (hence all the unwanted and illegal u' prefixes). Python 2.x is itself near EOL; please move to 3.x. You can simply use a regex to fix up/preprocess such strings:

>>> import json
>>> import re
>>> malformed_json = u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"

>>> legal_json = re.sub(r'u\'([^\']*)\'', r'"\1"', malformed_json)
>>> legal_json
'{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'

>>> json.loads(legal_json)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
  • (Note: if your architecture has lots of malformed JSON strings exported the wrong way from 2.x and stored in a DB, that's not a legitimate reason to avoid json.loads, but it is a reason to revisit your architecture. At the least, run the fixup regex on all your strings once and store the legal JSON back.)

json.loads Pros/Cons:

  • Pro: handles all legal JSON, unlike ast.literal_eval

  • Con: slow. There are much faster JSON libraries such as ultrajson, yajl, and simplejson. Also, on large import jobs you can use multiprocessing/multithreading (which also gives you protection from memory leaks, a common issue with all parsers).

  • Con: numeric fields: JSON has a single number type, and consumers such as JavaScript read every number as a double (53-bit integer accuracy, rounding errors), so precision may be lost (@amcgregor)

Chartism answered 11/3, 2019 at 6:58 Comment(0)

In my case, I dropped ast.literal_eval(selected_cell_from_db): it could evaluate the single-quoted dict string, but it was not giving me a JSON dict I could use with the DeepDiff package.

Instead, I needed to save the object with json.dumps(obj_to_save_to_db) rather than str(obj_to_save_to_db) (json.dumps creates a double-quoted, JSON-readable string; str() does not),

then read it back with json.loads(selected_cell_from_db).
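The difference described above can be sketched as follows (obj is a hypothetical stand-in for the object being saved): str() on a dict produces single-quoted Python literal syntax, while json.dumps produces JSON that json.loads can read back.

```python
import json

obj = {'person': 'Annabelle!'}

# str() gives single-quoted Python syntax; json.loads cannot parse it.
assert str(obj) == "{'person': 'Annabelle!'}"
try:
    json.loads(str(obj))
except ValueError:
    pass

# json.dumps gives double-quoted JSON; json.loads round-trips it.
assert json.loads(json.dumps(obj)) == obj
```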

Emplacement answered 28/2, 2023 at 16:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.