Pickle or json? [duplicate]

I need to save a small dict to disk and then load it back. Its keys are strs and its values are ints, something like this:

{'juanjo': 2, 'pedro': 99, 'other': 333}

What is the best option, and why: serialize it with pickle or with simplejson?

I am using Python 2.6.

Yoshida asked 13/2, 2010 at 22:12

Convert it to what? Also, better in what sense? – Helenehelenka 13/2, 2010 at 22:24

In 2.6 you wouldn't use simplejson, you'd use the builtin json module (which has the same exact interface). – Alter 13/2, 2010 at 22:33

"best"? Best for what? Speed? Complexity? Flexibility? Cost? – Grisham 13/2, 2010 at 22:39

see also #8969384 – Odense 16/11, 2014 at 21:31

@Trilarion: YAML is a superset of JSON – Decasyllable 19/3, 2016 at 22:5

For posterity: JSON has a problem with tuples as keys; pickle doesn't. For example, pickle can handle {('a','b'): 'c'}, but JSON cannot (as of mid-2016). So bear that in mind. See: #7002106 – Doubletongue 12/5, 2016 at 2:9
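A minimal Python 3 sketch of that difference, using only the standard `json` and `pickle` modules (the dict is the one from the comment): `json.dumps` raises `TypeError` on a tuple key, while a pickle round trip keeps it.

```python
import json
import pickle

data = {('a', 'b'): 'c'}

# pickle stores the tuple key as-is and restores it unchanged.
restored = pickle.loads(pickle.dumps(data))
print(restored)

# JSON object keys must be strings, so the json module refuses a tuple key.
json_error = None
try:
    json.dumps(data)
except TypeError as exc:
    json_error = exc
    print("json failed:", exc)
```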

JSON cannot handle much more than that as keys, @Salmonstrikes: numbers, for a start. JSON is a good, firm format, especially compared to something like YAML with all its formatting and interpretation problems, but it is quite restricted. – Mews 2/12, 2023 at 3:39
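Worth noting for the dict in the question: Python's `json` module does not reject int keys, it silently converts them to strings, so the round trip is lossy rather than an error. A small Python 3 sketch (the `scores` dict is just an illustration; converting keys back with `int` assumes you already know they were ints):

```python
import json

scores = {1: 'one', 2: 'two'}

text = json.dumps(scores)   # non-string keys are converted to strings
back = json.loads(text)

print(text)             # {"1": "one", "2": "two"}
print(back == scores)   # False: the keys came back as '1' and '2'

# Recovering the int keys requires knowing they were ints to begin with.
fixed = {int(k): v for k, v in back.items()}
print(fixed == scores)  # True
```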

If you do not have any interoperability requirements (e.g. you are only going to use the data from Python) and a binary format is fine, go with cPickle, which gives you very fast Python object serialization.
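In Python 3 there is no separate `cPickle`; the standard `pickle` module uses its C accelerator automatically (on Python 2.6 you would write `import cPickle as pickle` instead). A minimal Python 3 sketch of saving and reloading the dict from the question; the file name `scores.pickle` is a hypothetical choice.

```python
import os
import pickle
import tempfile

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Hypothetical file location for the example.
path = os.path.join(tempfile.gettempdir(), 'scores.pickle')

# Pickle is a binary format, so the file is opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```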

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
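The JSON equivalent, as a Python 3 sketch (the `encoding` argument to `open` does not exist on Python 2.6); the file name `scores.json` is hypothetical. Because the keys are strs and the values are ints, this particular dict survives the round trip unchanged, and the file is readable text.

```python
import json
import os
import tempfile

data = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Hypothetical file location for the example.
path = os.path.join(tempfile.gettempdir(), 'scores.json')

# JSON is text, so the file is opened in text mode.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2)

with open(path, encoding='utf-8') as f:
    loaded = json.load(f)

print(loaded == data)  # True: str keys and int values round-trip through JSON
```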

Ruination answered 13/2, 2010 at 22:22

My answer highlights the concerns I think are most important to consider when choosing either solution. I make no claim about either being faster than the other. If JSON is faster AND otherwise suitable, go with JSON! (I.e., there's no reason for your down-vote.) – Closed 4/10, 2012 at 12:12

My point is: there is no real reason for using cPickle (or pickle) based on your premises over JSON. When I first read your answer I thought the reason might have been speed, but since this is not the case... :) – Octamerous 4/10, 2012 at 17:54

The benchmark cited by @Octamerous only tests strings. I tested str, int and float separately and found that json is slower than cPickle for float serialization, but faster for float deserialization. For int (and str), json is faster both ways. Data and code: gist.github.com/marians/f1314446b8bf4d34e782 – Warila 3/7, 2014 at 9:20

Given that json is more interoperable, more secure and in many cases faster than cPickle, for simple data structures I would prefer json over cPickle. – Warila 3/7, 2014 at 9:22

cPickle's latest protocol is now faster than JSON. The up-voted comment about JSON being faster is outdated by a few years. https://mcmap.net/q/197351/-pickle-or-json-duplicate – Edrick 22/9, 2016 at 1:34

@JDiMatteo: I suspect cPickle would have been faster even at the time of that comment if the test suite had used protocol 2 (available since 2.3 or so, but not the default for back compat reasons) rather than the default Python 2 protocol, 0. 0 is severely limited, only using 7 of 8 bits in each byte (this hurts a lot for raw binary data, which has to be reencoded, instead of dumped raw), not supporting new-style classes well, etc. Protocol 2 with cPickle (or on Python 3, plain pickle with the default protocol 3 or higher) would likely beat JSON in all but the most contrived cases. – Birkenhead 2/2, 2018 at 23:40
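Since the speed claims above depend on the Python version, the protocol and the data, a rough Python 3 harness for checking them on your own machine may be more useful than any single result. It makes no claim about which format wins; the sample data (mixed int and float values) and the repeat count are arbitrary assumptions. It also reports the encoded size under pickle protocol 0 and the highest available protocol.

```python
import json
import pickle
import timeit

# Arbitrary sample data: str keys with a mix of int and float values.
data = {'key%d' % i: (i if i % 2 else i / 3.0) for i in range(1000)}

encoders = {
    'json': (json.dumps, json.loads),
    'pickle p0': (lambda d: pickle.dumps(d, protocol=0), pickle.loads),
    'pickle highest': (lambda d: pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL),
                       pickle.loads),
}

results = {}
for name, (dump, load) in encoders.items():
    blob = dump(data)
    assert load(blob) == data  # every format must round-trip this data
    # timeit runs immediately, so the lambdas see this iteration's dump/blob.
    t_dump = timeit.timeit(lambda: dump(data), number=200)
    t_load = timeit.timeit(lambda: load(blob), number=200)
    results[name] = (len(blob), t_dump, t_load)
    print('%-15s size=%7d dump=%.4fs load=%.4fs' % (name, len(blob), t_dump, t_load))
```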

A (possibly minor) downside of JSON: JSON doesn't have tuples. A Python tuple will end up as a list after serializing/deserializing. If your data contains tuples and you want them back as tuples, you need to avoid JSON. – Nadenenader 12/3, 2018 at 18:1
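A short Python 3 illustration of that point (the `data` dict is made up): the same value comes back as a list through `json` and as a tuple through `pickle`.

```python
import json
import pickle

data = {'point': (1, 2)}

via_json = json.loads(json.dumps(data))
via_pickle = pickle.loads(pickle.dumps(data))

print(via_json)    # {'point': [1, 2]}  (the tuple became a list)
print(via_pickle)  # {'point': (1, 2)}  (the tuple is preserved)
```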

Inter-language portability aside, did someone mention lack of intra-language portability (between minor versions of the same language)? – Barabarabarabas 13/5, 2023 at 7:48

A

132

I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.
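To make the "arbitrary code" point concrete, here is a minimal Python 3 sketch. The `Payload` class is hypothetical; it uses `__reduce__` to tell pickle "call this callable with these arguments on load". A harmless `eval('6 * 7')` stands in for what an attacker could do with something like `os.system`.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object: call this callable
    # with these arguments. A crafted blob can name any importable callable.
    def __reduce__(self):
        return (eval, ('6 * 7',))

blob = pickle.dumps(Payload())

# Unpickling runs eval('6 * 7'). The Payload class is not needed on the
# loading side, because the blob only references builtins.eval.
result = pickle.loads(blob)
print(result)  # 42
```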

Alter answered 13/2, 2010 at 22:33

Thanks. Anyway I'll be dumping and loading in the same program. – Yoshida 13/2, 2010 at 22:39

Though the security risks may be low in your current application, JSON allows you to close the hole altogether. – Alter 13/2, 2010 at 23:54

One can create a pickle virus that, once loaded, pickles itself into everything that is pickled afterwards. With JSON this is not possible. – Marabou 20/11, 2013 at 11:32

Apart from security, JSON has the additional advantage that it makes migrations easy, so you can load data that was saved by an older version of your application. Meanwhile you could have added a field, or replaced a whole substructure. Writing such a converter (migration) for dicts/lists is straightforward, but with pickle you'll have a hard time loading the data in the first place, before you can even think of converting it. – Rotenone 16/1, 2017 at 11:25
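A sketch of such a migration in Python 3, under a hypothetical schema: an older release wrote records without `version` or `tags` fields, and the current one expects both. The `migrate` function and field names are illustrative only. The point is that the old file loads as plain dicts and lists without any of the application's classes, so upgrading it is ordinary dict manipulation.

```python
import json

# Hypothetical: an older release wrote records without "version" or "tags".
old_text = '{"name": "juanjo", "score": 2}'

def migrate(record):
    """Upgrade a loaded dict to the current (hypothetical) schema, step by step."""
    version = record.get('version', 1)
    if version < 2:
        record.setdefault('tags', [])  # field introduced in version 2
        version = 2
    record['version'] = version
    return record

record = migrate(json.loads(old_text))
print(record)  # {'name': 'juanjo', 'score': 2, 'tags': [], 'version': 2}
```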

I hadn't thought about this aspect (security and the ability for pickled objects to run arbitrary code). Thanks for pointing that out! – Jacintajacinth 25/7, 2018 at 12:47

'only unpickle data you trust' - docs.python.org/3/library/pickle.html – Usherette 5/6, 2022 at 16:49
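If you must load pickles whose origin you are not fully sure of, the Python docs describe restricting globals by overriding `pickle.Unpickler.find_class`. A minimal Python 3 sketch of that idea for the plain str/int dict in the question, which needs no globals at all, so every global lookup is refused. This narrows the risk; it does not make untrusted pickles safe in general. `NoGlobalsUnpickler`, `safe_loads` and `Payload` are hypothetical names.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every class or function reference found in the stream.

    Plain containers of str/int use dedicated opcodes and never call
    find_class, so any call here indicates something unexpected.
    """
    def find_class(self, module, name):
        raise pickle.UnpicklingError('global %s.%s is forbidden' % (module, name))

def safe_loads(blob):
    return NoGlobalsUnpickler(io.BytesIO(blob)).load()

# Plain data still loads.
ok = safe_loads(pickle.dumps({'juanjo': 2, 'pedro': 99}))

class Payload:
    def __reduce__(self):
        return (eval, ('6 * 7',))

# A blob that references builtins.eval is rejected before anything runs.
try:
    safe_loads(pickle.dumps(Payload()))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```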

Another argument against the pickle format is the lack of portability guarantees between Python minor versions (build differences are most likely fine). – Barabarabarabas 13/5, 2023 at 7:45