dill vs cPickle speed difference

I am trying to serialize thousands of objects and some of these objects are lambda objects.

Since cPickle doesn't work for lambdas, I tried using dill. However, unpickling (or undilling?) is more than 10 times slower. Looking through the source, it seems that dill uses pickle internally, which might be the reason for the speed drop.

Is there another option that combines the best of both modules?

EDIT: The most significant speed drop is during unpickling.

Northamptonshire answered 19/6, 2016 at 10:17 Comment(4)
Check this answer. - Wonted
The problem, as I stated in my question, is that dill is too slow compared to cPickle. - Northamptonshire
"PiCloud-serialized objects can be de-serialized using the normal pickle/cPickle load and loads functions.", so I thought it could help you if your serialization / deserialization ratio is << 1. - Wonted
Ah, sorry. I just clarified in the question that the significant speed drop is during deserialization. - Northamptonshire

I'm the dill author. Yes, dill is typically slower, but that's the penalty you pay for more robust serialization. If you are serializing a lot of classes and functions, then you might want to try one of the dill variants in dill.settings. If you use byref=True, then dill will pickle several objects by reference (which is faster than the default). Other settings trade off picklability for speed in selected objects.

In [1]: import dill

In [2]: f = lambda x:x

In [3]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 286 us per loop

In [4]: dill.settings['byref'] = True

In [5]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 237 us per loop

In [6]: dill.settings
Out[6]: {'byref': True, 'fmode': 0, 'protocol': 2, 'recurse': False}

In [7]: dill.settings['recurse'] = True

In [8]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 408 us per loop

In [9]: class Foo(object):
   ...:     x = 1
   ...:     def bar(self, y):
   ...:         return y + self.x
   ...:     

In [10]: g = Foo()

In [11]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.6 us per loop

In [12]: dill.settings['recurse'] = False

In [13]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.4 us per loop

In [14]: dill.settings['byref'] = False

In [15]: %timeit dill.loads(dill.dumps(g))
1000 loops, best of 3: 499 us per loop

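Since the question is specifically about load time, it may help to time dumps and loads separately instead of timing the round trip. The following is a minimal sketch, not part of the original session: `time_roundtrip` and `main` are hypothetical helper names, and it assumes a dill version whose `dumps` accepts `byref` as a keyword argument. The numbers it prints will depend on your machine and dill version.

```python
import pickle
import timeit

try:
    import dill  # third-party; optional so the helper itself runs without it
except ImportError:
    dill = None


def time_roundtrip(dumps, loads, obj, number=1000):
    """Return (seconds per dumps call, seconds per loads call) for obj."""
    payload = dumps(obj)  # serialize once so loads is timed in isolation
    dump_time = timeit.timeit(lambda: dumps(obj), number=number) / number
    load_time = timeit.timeit(lambda: loads(payload), number=number) / number
    return dump_time, load_time


class Foo(object):
    x = 1

    def bar(self, y):
        return y + self.x


def main():
    # Baseline: the standard pickle module on the same instance.
    dump_time, load_time = time_roundtrip(pickle.dumps, pickle.loads, Foo())
    print("pickle:      dumps %.1f us, loads %.1f us"
          % (dump_time * 1e6, load_time * 1e6))

    if dill is None:
        print("dill is not installed; skipping the dill timings")
        return

    # Pass byref per call instead of mutating the global dill.settings dict.
    for byref in (False, True):
        dump_time, load_time = time_roundtrip(
            lambda obj: dill.dumps(obj, byref=byref), dill.loads, Foo()
        )
        print("dill byref=%-5s dumps %.1f us, loads %.1f us"
              % (byref, dump_time * 1e6, load_time * 1e6))


if __name__ == "__main__":
    main()
```

Passing the option per call keeps the global settings untouched, which makes it easier to compare variants side by side in one run.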
Enounce answered 20/7, 2016 at 9:56 Comment(2)
Hi @Mike McKerns, when I started using dill, it was for custom Python classes I instantiated that had many complicated data types inside, and it worked perfectly (when pickle didn't). I've been using dill since then, but I'm wondering which data types I can use pickle for without it breaking. This might be out of the scope of a comment, but I feel like you would be the expert since you made dill for a reason. - Pyrexia
Look at github.com/uqfoundation/dill/blob/master/dill/_objects.py. It's my best effort to keep track of what can be pickled and what can't (by dill and/or pickle). There's an associated test for this file as well. Now, putting them inside classes… that's a bit more untested. - Enounce
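For a quick empirical check on your own objects, one option is to attempt a round trip with the standard pickle module and see whether it raises. This is only a rough sketch and not taken from dill's `_objects.py`; `stdlib_pickle_works`, `Point`, and the sample objects are hypothetical names chosen for illustration. Anything that fails here is a candidate for dill; anything that passes can stay on the faster pickle path.

```python
import pickle


def stdlib_pickle_works(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:  # pickle raises several error types depending on the object
        return False
    return True


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


samples = {
    "list of ints": [1, 2, 3],
    "nested dict": {"a": {"b": (1, 2.0, "c")}},
    "instance of module-level class": Point(1, 2),
    "built-in function": len,
    "lambda": lambda x: x,
    "generator": (i for i in range(3)),
}

for label, obj in samples.items():
    print("%-32s %s" % (label, "ok" if stdlib_pickle_works(obj) else "fails"))
```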
