Why is json.loads an order of magnitude faster than ast.literal_eval?

After answering a question about how to parse a text file containing arrays of floats, I ran the following benchmark:

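import timeit
import random

# One line of 1000 random floats; str() of this list is both valid JSON
# and a valid Python literal.
line = [random.random() for _ in range(1000)]
n = 10000  # number of parses per timing run

# Time json.loads on the stringified list.
json_setup = 'line = "{}"; import json'.format(line)
json_work = 'json.loads(line)'
json_time = timeit.timeit(json_work, json_setup, number=n)
print("json: {}".format(json_time))

# Time ast.literal_eval on the same string.
ast_setup = 'line = "{}"; import ast'.format(line)
ast_work = 'ast.literal_eval(line)'
ast_time = timeit.timeit(ast_work, ast_setup, number=n)
print("ast: {}".format(ast_time))

# print() with str.format behaves the same on Python 2.7 and Python 3.
print("time ratio ast/json: {}".format(ast_time / json_time))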

I ran this code several times and consistently got results like these:

$ python json-ast-bench.py 
json: 4.3199338913
ast: 28.4827561378
time ratio ast/json:  6.59333148483

So json appears to be about 6.6 times faster than ast for this use case, which is close to an order of magnitude.

I got similar results with both Python 2.7.5+ and Python 3.3.2+.

Questions:

  1. Why is json.loads so much faster? This question seems to imply that ast is more flexible regarding the input data (double or single quotes).
  2. Are there use cases where I would prefer ast.literal_eval over json.loads even though it's slower?

Edit: Anyway, if performance matters, I would recommend UltraJSON (it's what I use at work; about 4 times faster than json on the same mini-benchmark).

Impeccant answered 19/1, 2014 at 22:16 Comment(6)
The JSON grammar is simpler than the Python grammar; the latter supports many different forms of string literals, for example. - Pickerelweed
OK, I was just surprised that you would recommend ast to parse arrays of floats, since they made me think of JSON. Thanks for your answer! - Impeccant
Grammar complexity is a red herring here. Yes, Python's literal grammar is more complex, but these things were all designed to be parsed correctly in one pass without backtracking. The real reason is that ast.literal_eval is so lightly used that nobody felt it was worth the time to work (& work, & work) at speeding it. In contrast, the JSON libraries are routinely used to parse gigabytes of data. - Gean
@MaximeR.: because the format was probably saved as str(python_list) rather than JSON; JSON just didn't spring to mind immediately. - Pickerelweed
@MartijnPieters OK, thanks to you I'm good to go study the other functions of the ast module! - Impeccant
@MaximeR.: There aren't any other "general-purpose helper" type functions there; the rest of the module is all about parsing the full Python grammar into parse trees. That is very cool and worth learning: if you've ever wondered how Python parses -2 or 1+2j or [i+1 for i in range(5)], just feed it to ast.dump(ast.parse(s)). - Stash

The two functions parse entirely different languages: JSON, and Python literal syntax.* As the documentation for literal_eval says:

The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

JSON, by contrast, only handles double-quoted JavaScript string literals (not quite identical to Python's**), JavaScript numbers (only int and float***), objects (roughly equivalent to dicts), arrays (roughly equivalent to lists), JavaScript booleans (which are different from Python's), and null.

The fact that these two languages happen to have some overlap doesn't mean they're the same language.
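To make the overlap and the mismatch concrete, here is a small sketch. The `accepts` helper and the sample strings are made up for illustration; only `json.loads` and `ast.literal_eval` come from the standard library.

```python
import ast
import json

# Input that sits in the overlap: a list of floats is valid in both languages.
shared = "[0.5, 1.25, 3.0]"
assert json.loads(shared) == ast.literal_eval(shared) == [0.5, 1.25, 3.0]


def accepts(parser, text):
    """Hypothetical helper: True if parser(text) succeeds, False if it rejects the input."""
    try:
        parser(text)
        return True
    except (ValueError, SyntaxError):
        # json.JSONDecodeError is a ValueError; literal_eval raises
        # ValueError or SyntaxError depending on the input.
        return False


json_only = "[true, false, null]"   # JSON spellings of the constants
python_only = "('a', True, None)"   # single quotes, a tuple, Python constants

results = {
    "json_only": (accepts(json.loads, json_only), accepts(ast.literal_eval, json_only)),
    "python_only": (accepts(json.loads, python_only), accepts(ast.literal_eval, python_only)),
}
print(results)
```

Each tuple reads (accepted by json.loads, accepted by ast.literal_eval), so each sample should be accepted by exactly one of the two parsers.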


Why is json.loads so much faster?

Because Python literal syntax is a more complex and powerful language than JSON, it's likely to be slower to parse. And, probably more importantly, because Python literal syntax is not intended to be used as a data interchange format (in fact, it's specifically not supposed to be used for that), nobody is likely to put much effort into making it fast for data interchange.****

This question seems to imply that ast is more flexible regarding the input data (double or single quotes)

That, and raw string literals, and Unicode vs. bytes string literals, and complex numbers, and sets, and all kinds of other things JSON doesn't handle.
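A quick sketch of some of those extras, assuming Python 3; the sample strings are my own, chosen only to illustrate the categories listed above:

```python
import ast

# Each value is source text that ast.literal_eval accepts but JSON has no syntax for.
samples = {
    "single-quoted string": "'abc'",
    "raw string": r"r'\d+'",
    "bytes": "b'abc'",
    "complex number": "1+2j",
    "set": "{1, 2, 3}",
    "tuple": "(1, 2)",
}

parsed = {name: ast.literal_eval(text) for name, text in samples.items()}
for name, value in parsed.items():
    print(name, type(value).__name__, repr(value))
```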

Are there use cases where I would prefer to use ast.literal_eval over json.loads although it's slower?

Yes. When you want to parse Python literals, you should use ast.literal_eval. (Or, better yet, re-think your design so you don't want to parse Python literals…)
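As one possible reading of "re-think your design": if you control the code that writes the file, you could write JSON instead of str(some_list) and never need literal_eval at all. A minimal sketch, with made-up variable names:

```python
import json
import random

values = [random.random() for _ in range(5)]

# Instead of writing str(values) and reading it back with ast.literal_eval,
# serialize with json.dumps and parse with json.loads.
line = json.dumps(values)
restored = json.loads(line)

print(line)
print(restored == values)
```

For a list of floats the two text forms look almost the same, so the change is mostly about declaring which language the file is written in.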


* This is a bit of a vague term. For example, -2 is not a literal in Python, but an operator expression, but literal_eval can handle it. And of course tuple/list/dict/set displays are not literals, but literal_eval can handle them—except that comprehensions are also displays, and literal_eval cannot handle them. Other functions in the ast module can help you find out what really is and isn't a literal—e.g., ast.dump(ast.parse("expr")).
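For instance, a rough sketch of that kind of inspection (the `top_node` helper is hypothetical, and the node names are what recent Python 3 versions report):

```python
import ast


def top_node(expr):
    """Hypothetical helper: name of the AST node class at the top of an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


exprs = ["-2", "1+2j", "[1, 2]", "[i+1 for i in range(5)]"]
kinds = {expr: top_node(expr) for expr in exprs}
print(kinds)

# The full tree for -2 shows a unary operator applied to the literal 2.
print(ast.dump(ast.parse("-2", mode="eval")))

# literal_eval accepts the operator expression but rejects the comprehension.
minus_two = ast.literal_eval("-2")
try:
    ast.literal_eval("[i+1 for i in range(5)]")
    comprehension_ok = True
except ValueError:
    comprehension_ok = False
print(minus_two, comprehension_ok)
```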

** For example, "\q" is an error in JSON.
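A small check of the JSON side of this, using a hypothetical `json_accepts` helper:

```python
import json


def json_accepts(text):
    """Hypothetical helper: True if json.loads parses text, False if it rejects it."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


bad_escape = r'"\q"'   # backslash followed by q: not a JSON escape sequence
good_escape = r'"\n"'  # backslash followed by n: a valid JSON escape

print(json_accepts(bad_escape), json_accepts(good_escape))
```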

*** Technically, JSON only handles one "number" type, which is floating-point. But Python's json module parses numbers with no decimal point or exponent as integers, and the same is true in many other languages' JSON modules.
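This is easy to see with Python's json module; the inputs below are arbitrary examples:

```python
import json

# Map each JSON number text to the Python type json.loads produces for it.
inputs = ["1", "-17", "1.0", "1e2", "2.5"]
number_types = {text: type(json.loads(text)).__name__ for text in inputs}
print(number_types)
```

Numbers written with a decimal point or an exponent come back as float; the rest come back as int.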

**** If you missed Tim Peters's comment on the question: "ast.literal_eval is so lightly used that nobody felt it was worth the time to work (& work, & work) at speeding it. In contrast, the JSON libraries are routinely used to parse gigabytes of data."

Stash answered 19/1, 2014 at 22:18 Comment(6)
What makes you say tuples/lists/dicts/sets are not literals? - Musicianship
@BrenBarn: The reference documentation on literals that I linked to. A display is an enclosure, which is a different kind of atom from a literal. - Stash
@BrenBarn: PS, although I linked to Python 3 docs, all of the same things are true in 2.x, except that comprehensions were not a kind of display until… 2.6, I think. - Stash
Thanks for your detailed answer :) Also thanks @Musicianship, because I was going to ask for the definition of Python literals! - Impeccant
@MaximeR.: Do you think that needs to be explained more prominently in the answer? The fact that two out of two people looking for that information missed it probably isn't a great sign… - Stash
Well, I missed it before you added the link; your current answer actually looks good to me. - Impeccant
