Can Pickle handle files larger than the RAM installed on my machine?

I'm using pickle to save my NLP classifier, built with the TextBlob library, to disk.

I settled on pickle after a lot of searching related to this question. At the moment I'm working locally and I have no problem loading the pickle file (which is 1.5 GB) on my machine with an i7 and 16 GB of RAM. But the idea is that my program will eventually have to run on my server, which only has 512 MB of RAM installed.

Can pickle handle such a large file or will I face memory issues?

My server runs Linux (I'm not sure which distribution) and has Python 3.5 installed.

I'm asking because I can't access my server at the moment, so I can't just try it and see what happens, and I'm unsure whether I can keep this approach or need to find another solution.

Rexanna answered 27/11, 2015 at 21:31 Comment(5)
IMHO you cannot even have variables that refer to things larger than RAM, can you? - Labana
That's no dumb question. It's a really interesting one! +1 - Labana
@Labana Honestly, I have no idea. I'm not really an expert on how pickle works, or on whether my machine can use virtual memory when RAM runs out. - Rexanna
Is your solution here? deeplearning.net/software/theano/tutorial/… - Labana
Do you want to accept skrrgwasme's answer? IMHO it's enough. - Labana

Unfortunately, this is difficult to answer accurately without testing it on your machine.

Here are some initial thoughts:

  1. There is no inherent size limit enforced by the pickle module, but you're pushing the boundaries of its intended use. It's not designed for individual large objects. However, since you're using Python 3.5, you will be able to take advantage of PEP 3154 (pickle protocol 4), which adds better support for large objects. You should specify pickle.HIGHEST_PROTOCOL when you dump your data.

  2. You will likely take a large performance hit because you're trying to deal with an object that is 3x the size of your RAM. Your system will probably start swapping, and possibly even thrashing. RAM is cheap these days, so bumping it up to at least 2 GB should help significantly.

  3. To handle the swapping, make sure you have enough swap space available (a large swap partition if you're on Linux, or enough space for the swap file on your primary partition on Windows).

  4. As pal sch's comment shows, pickle is not very friendly to RAM consumption during the pickling process, so you may have to deal with Python trying to get even more memory from the OS than the 1.5 GB we might expect for your object.
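As a rough illustration of point 1, here is a minimal sketch of dumping and loading with the highest available protocol. The helper names `save_classifier` and `load_classifier` and the small dictionary standing in for the real classifier are assumptions made for this example, not part of TextBlob.

```python
import os
import pickle
import tempfile


def save_classifier(obj, path):
    """Write obj to path using the newest pickle protocol available."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Read a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the real (1.5 GB) classifier object.
classifier = {"labels": ["pos", "neg"], "weights": [0.25, 0.75]}

path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
save_classifier(classifier, path)
restored = load_classifier(path)
```

Note that the protocol only changes how the data is written. Loading still rebuilds the whole object in memory, so this does not by itself solve the 512 MB problem.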

Given these considerations, I don't expect it to work out very well for you. I'd strongly suggest upgrading the RAM on your target machine to make this work.
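Before committing to a target machine, it may help to get a rough idea of how much memory pickling and unpickling actually use. The sketch below uses the standard library's tracemalloc module on a small sample list. The helper name `peak_bytes_during` and the sample data are assumptions for illustration, and tracemalloc only sees memory allocated through Python's allocators, so treat the numbers as an estimate rather than the true process footprint.

```python
import pickle
import tracemalloc


def peak_bytes_during(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Small hypothetical sample; the real classifier would be far larger.
sample = [str(i) * 10 for i in range(100000)]

payload, dump_peak = peak_bytes_during(
    pickle.dumps, sample, pickle.HIGHEST_PROTOCOL
)
restored, load_peak = peak_bytes_during(pickle.loads, payload)

print("pickled size:", len(payload), "bytes")
print("peak while dumping:", dump_peak, "bytes")
print("peak while loading:", load_peak, "bytes")
```

Running the same measurement on the real classifier on the 16 GB machine would show how far the load-time peak sits above 512 MB.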

Ceilidh answered 27/11, 2015 at 21:50 Comment(1)
Do swapping and thrashing happen automatically on a Linux system? I currently use Windows, and in my experience I get a MemoryError whenever RAM runs out. - Pinguid

I don't see how you could load an object into RAM that exceeds the RAM, i.e. bytes(num_bytes_greater_than_ram) will always raise a MemoryError.

Nomarchy answered 27/11, 2015 at 22:1 Comment(4)
Maybe pickled variables refer to the drive or something like that? - Labana
By default pickle does not pickle references. If the object generated by the TextBlob library is disk-backed and implements a custom pickling interface, then maybe. - Nomarchy
As an example, try pickle.loads(pickle.dumps(memoryview(b"abc"))). - Nomarchy
OK. So +1 from me. ;-) - Labana
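The memoryview example from the comments can be tried out with a small sketch like the one below. The helper name `can_pickle` is an assumption made for this example. In CPython, pickling a memoryview is rejected with a TypeError, while the underlying bytes object pickles by value.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if it is rejected."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError):
        return False
    return True


data = b"abc"

# The bytes object itself is copied into the pickle stream.
bytes_ok = can_pickle(data)

# A memoryview is only a reference to someone else's buffer,
# and pickle refuses to serialize it.
view_ok = can_pickle(memoryview(data))

print("bytes picklable:", bytes_ok)
print("memoryview picklable:", view_ok)
```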
