Python TypeError on Load Object using Dill
I'm trying to save a large (and possibly very unpicklable) object to a file for later use.

No complaints on the dill.dump(file) side:

In [1]: import echonest.remix.audio as audio

In [2]: import dill

In [3]: audiofile = audio.LocalAudioFile("/Users/path/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmpWbonbH.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis

In [4]: with open('audio_object_dill.pkl', 'wb') as f:
   ...:     dill.dump(audiofile, f)
   ...:  

In [5]: 

But trying to load the .pkl file:

In [1]: import dill

In [2]: with open('audio_object_dill.pkl', 'rb') as f:
   ...:     audio_object = dill.load(f)
   ...:  

Returns following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-203b696a7d73> in <module>()
      1 with open('audio_object_dill.pkl', 'rb') as f:
----> 2     audio_object = dill.load(f)
      3 

/Users/mikekilmer/Envs/GLITCH/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.pyc in load(file)
    185     pik = Unpickler(file)
    186     pik._main_module = _main_module
--> 187     obj = pik.load()
    188     if type(obj).__module__ == _main_module.__name__: # point obj class to main
    189         try: obj.__class__ == getattr(pik._main_module, type(obj).__name__)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load_newobj(self)
   1081         args = self.stack.pop()
   1082         cls = self.stack[-1]
-> 1083         obj = cls.__new__(cls, *args)
   1084         self.stack[-1] = obj
   1085     dispatch[NEWOBJ] = load_newobj

TypeError: __new__() takes at least 2 arguments (1 given)

The AudioObject is much larger and more complex than the class object those calls are made on in the SO answer I was following. I'm unclear whether I need to pass a second argument via dill and, if so, what that argument would be. I also don't know how to tell whether any approach to pickling is viable for this specific object.
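For context, one common way to hit this kind of error in load_newobj is a class whose __new__ requires an argument that the pickle stream does not carry. The sketch below uses hypothetical classes (NeedsArg, NeedsArgFixed), not echonest code, and is written for Python 3, where the message wording differs from the Python 2.7 traceback above. Whether LocalAudioFile fails for exactly this reason is an assumption.

```python
import pickle


class NeedsArg(object):
    """Hypothetical class whose __new__ requires an extra argument."""

    def __new__(cls, filename):
        obj = super().__new__(cls)
        obj.filename = filename
        return obj


class NeedsArgFixed(NeedsArg):
    """Same class, but it tells pickle which arguments to replay into __new__."""

    def __getnewargs__(self):
        return (self.filename,)


def round_trip(obj):
    # Protocol 2 and later rebuild objects via cls.__new__(cls, *args).
    return pickle.loads(pickle.dumps(obj, protocol=2))


try:
    round_trip(NeedsArg("Track01.mp3"))
    outcome = "loaded"
except TypeError as exc:
    # Dumping succeeds; the failure only shows up at load time.
    outcome = "TypeError: %s" % exc

restored = round_trip(NeedsArgFixed("Track01.mp3"))
print(outcome)
print(restored.filename)
```

The dump side stays silent in both cases, which matches the symptom described here: the problem only surfaces when the object is rebuilt.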

Examining the object itself a bit:

In [4]: for k, v in vars(audiofile).items():
...:     print k, v
...: 

returns:

is_local False
defer False
numChannels 2
verbose True
endindex 13627008
analysis <echonest.remix.audio.AudioAnalysis object at 0x103c61bd0>
filename /Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3
convertedfile /var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp9ADD_Z.wav
sampleRate 44100
data [[0 0]
 [0 0]
 [0 0]
 ..., 
 [0 0]
 [0 0]
 [0 0]]

Also, audiofile.analysis has an attribute, audiofile.analysis.source, which in turn contains (or apparently points back to) audiofile.analysis.source.analysis, so the object graph looks circular.
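A back-reference like this is not, by itself, a problem for pickle, which keeps a memo of objects it has already seen. A minimal sketch with hypothetical stand-in classes (Track and Analysis are not the echonest classes):

```python
import pickle


class Analysis(object):
    """Hypothetical stand-in for an analysis object that points back to its source."""

    def __init__(self, source):
        self.source = source


class Track(object):
    """Hypothetical stand-in for the audio object."""

    def __init__(self, filename):
        self.filename = filename
        self.analysis = Analysis(self)  # back-reference creates a cycle


track = Track("Track01.mp3")
clone = pickle.loads(pickle.dumps(track))

# The cycle is rebuilt rather than duplicated: the clone's analysis points at the clone.
cycle_preserved = clone.analysis.source is clone
print(cycle_preserved)
```

So the circular reference alone would not be expected to cause the TypeError above.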

Malt answered 28/8, 2014 at 0:6 Comment(14)
exploring the docs a bit more in depth - contained at pypi.python.org/pypi/dillMalt
Am reading in docs.python.org/2/library/pickle.html that "file must have two methods". Maybe the file I'm saving only has one method and that's the missing second argument that's breaking cls.__new__(cls, *args)Malt
Is the echonest API something I could grab a hold of to try out if needed? Anyway, there are a few things that you can try to discover what's going on. First, since it's a class, you can try to toggle the byref in dill.dumps, to toggle pickling the class "by reference". If that doesn't work, try turning on dill.detect.trace(True) to see internal checkpoints in the (de)serialization. You can also look at methods in dill.detect, such as badobjects, that can help diagnose what's going on. It looks like a mismatch in __getstate__ and __setstate__, which would be weird.Tugboat
It is, @MikeMcKerns. echonest.com. There are two relevant modules available via developer.echonest.com and I shared the procedure at: mzoo.org/getting-the-python-echonest-remix-package-running. Are pypi.python.org/pypi/dill, trac.mystic.cacr.caltech.edu/project/pathos/wiki/dill and the pickle docs basically the extent of the reading material I should be looking at (in implementing the above recommendations)?Malt
If using with open('audio_object_dill.pkl', 'wb') as f: byref would be set like this, dill.dump(audiofile, f, byref=True) with False being the default, right? dill.load results are the same. Entered dill.detect.trace(True) prior to dill.dump call results: pastebin.com/V0fA7aVJ. Lastly, dill.detect.badobjects(audiofile) returns <echonest.remix.audio.LocalAudioFile at 0x103ebc710>. Hmph.Malt
First of all, wow is this a cool package. Digging around a bit, dill.detect.children(audiofile, echonest.remix.audio.LocalAudioFile) yields name 'echonest' is not defined - actually simply had to call it with the variable module was imported with: dill.detect.children(audiofile, audio.LocalAudioFile), which yields our old friend [<echonest.remix.audio.LocalAudioFile at 0x103ebc710>]Malt
Wait! Apparently the API has a built-in method: echonest.github.io/remix/apidocs/…Malt
Geez that's a horrendous trace you have in the pastebin. Yes, that's all the reading material on dill, unfortunately. By the way, you should try badobjects(audiofile, depth=1) -- that allows you to dig into each object, even ones that fail. Also check out this as an example of what dill detection can do. #10082741 #25241639Tugboat
Definitely had played with badobjects(audiofile, depth=1), but it hangs giving f(self, obj) # Call unbound method with explicit self in pickle's save method.Malt
So, did the built-in "save" work from the API? It looked like that may be what they expect you to use (instead of dump, directly), and that might be why it seems like load expects something different.Tugboat
Yes. Built-in save works and re-loads beautifully using Dill.Malt
then you should answer your own question(s), as others might run into the same thing.Tugboat
or I'll answer it. someone should, so people don't need to dig into the comments.Tugboat
@MikeMcKerns I will answer it and look forward to having the opportunity to. Probably tomorrow, and thank you for the reminder. Have been thinking about it. I think I might even have the S.O. cred to add echonest as a keyword.Malt

In this case, the answer lay within the module itself.

The LocalAudioFile class provides (and each of its instances can therefore use) its own save method, called via LocalAudioFile.save or, more likely, the_audio_object_instance.save.

In the case of an .mp3 file, the LocalAudioFile instance consists of a pointer to a temporary .wav file, which is the decompressed version of the .mp3, along with a large amount of analysis data returned for the initial audio file by the (internet-based) Echonest API.

LocalAudioFile.save calls shutil.copyfile(path_to_wave, wav_path) to save the .wav file with the same name and path as the original file linked to the audio object, and returns an error if the file already exists. It then calls pickle.dump(self, f) to save the analysis data to a file, also in the directory the original audio file was loaded from.

The LocalAudioFile object can then be restored simply via pickle.load().
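As a rough illustration of that save-then-load pattern, here is a minimal sketch in Python 3. SketchAudioFile, its attributes, and the ".wav" naming are hypothetical stand-ins and not the echonest implementation; only the ".analysis.en" suffix is taken from the session output in this answer.

```python
import os
import pickle
import shutil
import tempfile


class SketchAudioFile(object):
    """Hypothetical stand-in; not the echonest implementation."""

    def __init__(self, filename, convertedfile, analysis):
        self.filename = filename            # path of the original audio file
        self.convertedfile = convertedfile  # temporary decoded .wav
        self.analysis = analysis            # analysis data (a plain dict here)

    def save(self):
        wav_path = self.filename + ".wav"  # naming is an assumption
        analysis_path = self.filename + ".analysis.en"
        if os.path.exists(wav_path):
            raise IOError("refusing to overwrite %s" % wav_path)
        # Copy the temporary .wav next to the original file.
        shutil.copyfile(self.convertedfile, wav_path)
        # Pickle the whole object next to the original file as well.
        with open(analysis_path, "wb") as f:
            pickle.dump(self, f)
        return analysis_path


workdir = tempfile.mkdtemp()
mp3_path = os.path.join(workdir, "Track01.mp3")
tmp_wav = os.path.join(workdir, "tmp_decoded.wav")
with open(tmp_wav, "wb") as f:
    f.write(b"RIFF")  # placeholder bytes, not real audio

audio_obj = SketchAudioFile(mp3_path, tmp_wav, {"duration": 308.96})
saved_to = audio_obj.save()
wav_copied = os.path.exists(mp3_path + ".wav")

# Reload in binary mode, which is what pickle expects.
with open(saved_to, "rb") as f:
    reloaded = pickle.load(f)

print(reloaded.analysis["duration"])
shutil.rmtree(workdir)
```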

Here's an IPython session in which I used dill, a very useful wrapper around pickle that offers most of the standard pickle methods plus a number of others:

In [1]: import echonest.remix.audio as audio

In [2]: import dill
# create the audio_file object
In [3]: audiofile = audio.LocalAudioFile("/Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp_3Ei0_.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis
#call the LocalAudioFile save method
In [4]: audiofile.save()
Saving analysis to local file /Users/path/audio/Track01.mp3.analysis.en
#confirm the object is valid by reading its duration attribute
In [5]: audiofile.duration
Out[5]: 308.96
#delete the object - there's probably a "correct" way to do this
In [6]: audiofile = 0
#confirm it's no longer an audio_object
In [7]: audiofile.duration
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-04baaeda53a4> in <module>()
----> 1 audiofile.duration

AttributeError: 'int' object has no attribute 'duration'


#open the pickled version (using dill)
In [8]: with open('/Users/path/audio/Track01.mp3.analysis.en', 'rb') as f:
   ....:     audiofile = dill.load(f)
   ....:     
#confirm it's a valid LocalAudioFile object
In [9]: audiofile.duration
Out[9]: 308.96

Echonest is a very robust API and the remix package provides a ton of functionality. There's a small list of relevant links assembled here.

Malt answered 2/9, 2014 at 20:7 Comment(1)
I keep running into the following error: maximum recursion depth exceeded. Is that normal?Treasurehouse
