I'm using Python 2.7 and NumPy 1.11.2, as well as the latest version of dill (I just ran pip install dill), on Ubuntu 16.04.
When storing a NumPy array with pickle, I find that pickle is much slower than dill and stores the array at almost three times the 'necessary' size.
For example, in the following code, pickle is approximately 50 times slower (about 1 s for dill versus 50 s for pickle) and creates a file that is 2.2 GB instead of 800 MB.
import numpy
import pickle
import dill

B = numpy.random.rand(10000, 10000)

with open('dill', 'wb') as fp:
    dill.dump(B, fp)

with open('pickle', 'wb') as fp:
    pickle.dump(B, fp)
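For reproducibility, here is a minimal sketch of how the timings and file sizes can be measured, using only time and os from the standard library (exact numbers will of course vary by machine):

import os
import time
import numpy
import pickle
import dill

B = numpy.random.rand(10000, 10000)

def dump_and_measure(dumper, path):
    # Time a single dump call and report the resulting file size.
    start = time.time()
    with open(path, 'wb') as fp:
        dumper(B, fp)
    elapsed = time.time() - start
    size_mb = os.path.getsize(path) / 1e6
    print('%s: %.1f s, %.0f MB' % (path, elapsed, size_mb))

dump_and_measure(dill.dump, 'dill')
dump_and_measure(pickle.dump, 'pickle')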
I thought dill was just a wrapper around pickle. If this is true, is there a way that I can improve the performance of pickle myself? Is it generally not advisable to use pickle for NumPy arrays?
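I suspect the pickle protocol may matter: pickle.dump accepts an optional protocol argument, and on Python 2 it defaults to the ASCII-based protocol 0, while pickle.HIGHEST_PROTOCOL is 2. A sketch of the kind of 'improve pickle myself' tweak I am asking about (just an assumption, not a confirmed fix for the numbers above):

import numpy
import pickle

B = numpy.random.rand(10000, 10000)

# Default call: on Python 2 this uses protocol 0 (ASCII-based).
with open('pickle_default', 'wb') as fp:
    pickle.dump(B, fp)

# Explicit binary protocol; pickle.HIGHEST_PROTOCOL is 2 on Python 2.7.
with open('pickle_proto2', 'wb') as fp:
    pickle.dump(B, fp, protocol=pickle.HIGHEST_PROTOCOL)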
EDIT: Using Python 3, I get the same performance for pickle and dill.
PS: I know about numpy.save, but I am working in a framework where I store lots of different objects, all residing in a dictionary, to a file.
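What I mean by 'lots of different objects in a dictionary' is roughly the following (a hypothetical example, not my actual framework code):

import numpy
import pickle

# Hypothetical mixed dictionary of the kind I need to persist in one file.
state = {
    'weights': numpy.random.rand(1000, 1000),
    'labels': ['a', 'b', 'c'],
    'metadata': {'created_by': 'me', 'version': 3},
}

with open('state.pkl', 'wb') as fp:
    pickle.dump(state, fp, protocol=pickle.HIGHEST_PROTOCOL)

with open('state.pkl', 'rb') as fp:
    restored = pickle.load(fp)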
dill seems to be an extension of pickle, plus you can save a session state, so it should just work fine. – Dhiman
cPickle – that is the standard version in Python 3. – Yarborough
Using cPickle instead does not make any difference in runtime or memory consumption. – Conchiferous
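For completeness, the cPickle swap mentioned in the comments is only an import change (a sketch for Python 2; on Python 3 pickle already uses the C implementation automatically):

try:
    import cPickle as pickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle             # Python 3: pickle wraps the C version already

import numpy

B = numpy.random.rand(10000, 10000)

# Even with cPickle, the protocol argument still controls the on-disk format.
with open('cpickle', 'wb') as fp:
    pickle.dump(B, fp, protocol=2)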