multiprocessing.RawArray operation

I read that RawArray can be shared between processes without being copied, and wanted to understand how that is possible in Python.

I saw in sharedctypes.py that a RawArray is constructed from a BufferWrapper from heap.py, then zeroed with ctypes.memset.

BufferWrapper is built on an Arena object, which itself is built from an mmap (or 100 mmaps on Windows, see line 40 in heap.py).

I read that the mmap system call is what actually allocates the memory on Linux/BSD, and that the Python mmap module uses MapViewOfFile on Windows.
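To make the sharing concrete, here is a minimal sketch, assuming a platform where the "fork" start method is available (Linux/BSD, not Windows). The names fill_in_child and run_demo are made up for illustration. A child process writes into a RawArray and the parent reads the values back, although nothing is returned from the child.

```python
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray


def fill_in_child(shared, value):
    # Runs in the child process: writes go straight into the shared mapping.
    for i in range(len(shared)):
        shared[i] = value


def run_demo():
    # Assumption: "fork" is available, so the child inherits the mapping.
    ctx = mp.get_context("fork")
    shared = RawArray("i", 4)  # four C ints, zero-initialized
    child = ctx.Process(target=fill_in_child, args=(shared, 9))
    child.start()
    child.join()
    # The parent sees the child's writes because both use the same memory.
    return list(shared)


if __name__ == "__main__":
    print(run_demo())
```

With a plain list in place of the RawArray, the child would only modify its own copy and the parent's list would stay unchanged.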

mmap seems handy, then. It seems to work directly with multiprocessing.Pool:

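from multiprocessing import Pool
from struct import pack_into
from mmap import mmap

def pack_into_mmap(idx_nums_tup):
    # Each task writes its half of the integers at its own byte offset (4 bytes per int).
    # shared_mmap and total are globals set in the parent, so this relies on the
    # "fork" start method, where workers inherit them.
    idx, ints_to_pack = idx_nums_tup
    pack_into(str(len(ints_to_pack)) + 'i', shared_mmap, idx * (total // 2) * 4, *ints_to_pack)


if __name__ == '__main__':

    total = 5 * 10**7
    shared_mmap = mmap(-1, total * 4)  # anonymous mapping, created before the pool
    ints_to_pack = range(total)

    pool = Pool()
    pool.map(pack_into_mmap, enumerate((ints_to_pack[:total//2], ints_to_pack[total//2:])))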

My question is:

How does the multiprocessing module know not to copy the mmap-based RawArray object between processes, as it does with "regular" Python objects?

Trinidad answered 7/6, 2019 at 13:41

[Python 3.Docs]: multiprocessing - Process-based parallelism serializes / deserializes data exchanged between processes using Python's own serialization protocol: [Python 3.Docs]: pickle - Python object serialization (hence the terms pickle / unpickle).

According to [Python 3.Docs]: pickle - object.__getstate__():

Classes can further influence how their instances are pickled; if the class defines the method __getstate__(), it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the __getstate__() method is absent, the instance’s __dict__ is pickled as usual.

As seen in the (Windows variant of) Arena.__getstate__ (class chain: sharedctypes.RawArray -> heap.BufferWrapper -> heap.Heap -> heap.Arena), only the metadata (name and size) is pickled for the Arena instance, not the buffer itself.

Conversely, __setstate__ reconstructs the buffer in the receiving process from that metadata.
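The same pattern can be sketched outside the standard library. The class below is hypothetical (FileBackedBuffer and round_trip_demo are illustrative names, not part of multiprocessing), and it uses a temporary file as the backing store rather than whatever heap.Arena does internally. It only shows the idea: __getstate__ hands pickle a small tuple of metadata, and __setstate__ maps the same memory again.

```python
import mmap
import os
import pickle
import tempfile


class FileBackedBuffer:
    """Hypothetical stand-in for an Arena: pickles (path, size), not the bytes."""

    def __init__(self, size):
        fd, self.path = tempfile.mkstemp()
        try:
            os.ftruncate(fd, size)            # give the file its full length
            self.buffer = mmap.mmap(fd, size)
        finally:
            os.close(fd)                      # the mapping stays valid
        self.size = size

    def __getstate__(self):
        # Only metadata is serialized; the mapped bytes are left out.
        return (self.path, self.size)

    def __setstate__(self, state):
        # Rebuild the mapping from the metadata instead of copying data.
        self.path, self.size = state
        with open(self.path, "r+b") as f:
            self.buffer = mmap.mmap(f.fileno(), self.size)


def round_trip_demo():
    original = FileBackedBuffer(1024 * 1024)  # 1 MiB buffer
    payload = pickle.dumps(original)          # far smaller than the buffer
    clone = pickle.loads(payload)             # maps the same file again
    original.buffer[:5] = b"hello"            # write through one mapping
    seen = clone.buffer[:5]                   # read through the other
    original.buffer.close()
    clone.buffer.close()
    os.unlink(original.path)
    return len(payload), seen
```

The pickled payload stays small regardless of the buffer size, and a write made through one object is visible through the unpickled one, since both map the same file.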

Mangrum answered 8/6, 2019 at 8:54 Comment(6)
So it's not about avoiding pickle, but rather about pickling minimal metadata. Thanks! - Trinidad
Yes, indeed. :) - Mangrum
Follow-up: what happens in the example above, where a raw mmap is used without a class that defines __getstate__()? Does the entire memory region get pickled? - Trinidad
Then the default pickling mechanism is used, which unfortunately I don't know at this time. Did you try pickling such a (big) memory object to see whether the result size is comparable? - Mangrum
docs.python.org/3/library/pickle.html#pickle.dump (or dumps). - Mangrum
Tried dumps; I get: cannot pickle 'mmap.mmap' object - Trinidad
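
The result reported in the last comment can be reproduced with a short sketch (try_pickle_mmap is an illustrative name). It suggests that a raw mmap is never pickled at all. In the question's Pool example, shared_mmap is a module-level global and not an argument to pool.map, so under the "fork" start method the workers presumably get it by inheritance and pickle is not involved.

```python
import mmap
import pickle


def try_pickle_mmap(size=4096):
    """Return the error message raised when pickling an anonymous mmap."""
    buf = mmap.mmap(-1, size)
    try:
        pickle.dumps(buf)
    except TypeError as exc:
        # mmap objects define no pickling support, so pickle refuses them.
        return str(exc)
    finally:
        buf.close()
    return None


if __name__ == "__main__":
    print(try_pickle_mmap())
```

The exact wording of the message varies between Python versions.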
