Understanding shared_memory in Python 3.8

I'm trying to understand some of shared_memory's operation.

Looking at the source, it looks like the module uses shm_open() on UNIX, and CreateFileMapping \ OpenFileMapping on Windows, combined with mmap.

I understand from here that, in order to avoid a full serialization / deserialization by pickle, one needs to implement __setstate__() and __getstate__() explicitly for one's shared datatype.

I do not see any such implementation in shared_memory.py.

How does shared_memory circumvent the pickle treatment?

Also, on a Windows machine, this alone seems to survive across interpreters:

from mmap import mmap

shared_size = 12
shared_label = "my_mem"

mmap(-1, shared_size, shared_label)

Why then is CreateFileMapping \ OpenFileMapping needed here?

Footlight answered 3/7, 2019 at 20:39 Comment(1)
I think shared_memory dodges pickling because it only provides a memoryview that wraps the shared buffer. You can read and write raw bytes, but you cannot pass objects to SharedMemory. It has no interface for that. To get an object into memory, you would need to serialize it to raw bytes and blast those into the buffer. Pickling creeps back into the equation because of the serialization step. Got no clue on the second question. Lastly, note that mmap plays a role in both the unix and windows branches in the constructor. See line 111. Disclaimer, not an authority on anything.
Apropos
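To make the comment's point concrete, here is a minimal sketch of how an object only reaches the block after being turned into bytes. The `record` dictionary is a made-up example, and using pickle for the byte conversion is just one possible choice (struct or any other encoding would do as well).

```python
import pickle
from multiprocessing import shared_memory

# Hypothetical object we want other processes to see.
record = {"id": 7, "tags": ["a", "b"]}

# SharedMemory has no "put object" call; we must produce bytes ourselves.
payload = pickle.dumps(record)

shm = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    # shm.buf is a memoryview over the raw block: plain byte copy, no pickling
    # happens inside SharedMemory itself.
    shm.buf[:len(payload)] = payload

    # A reader has to know the payload length and the encoding to decode it.
    restored = pickle.loads(bytes(shm.buf[:len(payload)]))
finally:
    shm.close()
    shm.unlink()
```

The block may be larger than the requested size on some platforms, which is why the sketch slices by `len(payload)` instead of reading the whole buffer.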

How does shared_memory circumvent the pickle treatment?

I think you are confusing shared ctypes and shared objects between processes.

First, you don't have to use the sharing mechanisms provided by multiprocessing in order to get shared objects. You can just wrap basic primitives such as mmap (or the Windows equivalent), or get fancier using any API that your OS/kernel provides.

Next, the second link you mention, about how copying is done and how __getstate__ defines the pickling behavior, only applies if you use the sharedctypes module API. You are not forced to perform pickling to share memory between two processes.

In fact, sharedctypes is backed by anonymous shared memory which uses: https://github.com/python/cpython/blob/master/Lib/multiprocessing/heap.py#L31

Both implementations rely on an mmap-like primitive.

Anyway, if you try to copy something using sharedctypes, you will hit:

This function uses ForkingPickler, which makes use of pickle, so ultimately you will end up calling __getstate__ somewhere.

But that is not relevant to shared_memory, because shared_memory is not really a ctypes-like object.

You have other ways to share objects between processes, using the Resource Sharer / Tracker API: https://github.com/python/cpython/blob/master/Lib/multiprocessing/resource_sharer.py which will rely on pickle serialization/deserialization.

But you don't share shared memory through shared memory, right?

When you use: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py

You create a block of memory with a unique name, and every process must know that name before it can share the memory; otherwise it will not be able to attach to the block.

Basically, the analogy is:

You have a group of friends, and you all have a secret base whose location only you know. You will go on errands and be away from each other, but you can all meet at this one location.

In order for this to work, you must all know the location before going away from each other. If you do not have it beforehand, you cannot be certain that you will figure out where to meet them.

It is the same with shared_memory: you only need its name to open it. You don't share / transfer shared_memory between processes. You read from and write to shared_memory using its unique name from multiple processes.
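A minimal sketch of the "meeting place" idea. Both handles live in one process here to keep the example short; in practice the second handle would be opened in another process that was given `block_name` (an illustrative variable name) through some other channel.

```python
from multiprocessing import shared_memory

# The creator picks (or is assigned) a unique name for the block.
creator = shared_memory.SharedMemory(create=True, size=16)
block_name = creator.name  # the only thing other processes need to know
creator.buf[:5] = b"hello"

# Normally done in a different process: attach using nothing but the name.
attached = shared_memory.SharedMemory(name=block_name)
seen = bytes(attached.buf[:5])       # bytes written by the creator

# Writes through one handle are visible through the other; nothing is copied.
attached.buf[:5] = b"HELLO"
changed = bytes(creator.buf[:5])

# Every handle closes; exactly one side unlinks the block.
attached.close()
creator.close()
creator.unlink()
```

No object is transferred at any point. The only piece of information that travels is the name string.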

As a result, why would you pickle it? You can. You can absolutely pickle it. But that might not be built-in, because it's straightforward to just send the unique name to all your processes through another shared memory channel or anything like that.
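For what it is worth, the CPython versions I have looked at appear to define `__reduce__` on SharedMemory, so pickling a handle does seem to work out of the box. The sketch below assumes that behaviour: what gets pickled is a small description of the block (its name and size), not its contents, and unpickling simply attaches by name again.

```python
import pickle
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:4] = b"data"

# Assumption: SharedMemory.__reduce__ exists and serializes only name/size.
handle_bytes = pickle.dumps(shm)
smaller_than_block = len(handle_bytes) < 1024

# Unpickling re-attaches to the same block by name (no data is copied).
clone = pickle.loads(handle_bytes)
same_name = clone.name == shm.name

# A write through the unpickled handle is visible through the original one.
clone.buf[:4] = b"DATA"
visible = bytes(shm.buf[:4])

clone.close()
shm.close()
shm.unlink()
```

So pickle is only ever applied to the handle, which is equivalent to sending the name yourself.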

There is no circumvention required here. ShareableList is just an example application of the SharedMemory class, as you can see here: https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py#L314

It requires something akin to a unique name. You can also use anonymous shared memory and transmit its name later through another channel (write a temporary file, send it back to some API, whatever).
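A short sketch of ShareableList used the same way: one side creates it, the other attaches with the name of the underlying block. As before, both sides are shown in a single process for brevity, and the sample values are arbitrary.

```python
from multiprocessing import shared_memory

# Creating the list allocates a SharedMemory block and packs the items into it.
original = shared_memory.ShareableList([1, 2.5, "abc", None])

# Another process would attach using only the name of the underlying block.
attached = shared_memory.ShareableList(name=original.shm.name)

# Item assignment rewrites bytes in the shared block, so both views agree.
attached[0] = 42
first = original[0]
items = list(original)

attached.shm.close()
original.shm.close()
original.shm.unlink()
```

The list packs its items into the block itself, which is also why it only accepts a few basic types (int, float, bool, str, bytes, None) and cannot change length.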

Why then is CreateFileMapping \ OpenFileMapping needed here?

Because it depends on your Python interpreter. Here you are probably using CPython, which does the following:

https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1440

It's already using CreateFileMapping indirectly, so calling CreateFileMapping and then attaching to the mapping just duplicates work that CPython has already done.

But what about other interpreters? Do all interpreters do what is necessary to make mmap work on non-POSIX platforms? Maybe that was the developer's rationale.

Anyway, it is not surprising that mmap would work out of the box.

Getup answered 11/7, 2019 at 23:28 Comment(0)
