Check if Numpy Array is Stored in Shared Memory

In Python 3.8+, is it possible to check whether a numpy array is being stored in shared memory?

In the following example, a numpy array sharedArr is created using the buffer of a multiprocessing.shared_memory.SharedMemory object. I would like to know whether we can write a function that detects whether SharedMemory is used.

import numpy as np
from multiprocessing import shared_memory

if __name__ == '__main__':
    # Create numpy array `sharedArr` in shared memory
    arr = np.zeros(5)
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    sharedArr = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    sharedArr[:] = arr[:]

    # How to tell if numpy array is stored in shared memory?
    print(type(sharedArr))      # <class 'numpy.ndarray'>
    print(hex(id(sharedArr)))   # 0x7fac99469f30

    shm.close()
    shm.unlink()
Confection answered 3/9, 2020 at 18:30 Comment(1)
hex(id(...)) is a huge red herring. It is not a useful way to determine what is going on with underlying buffers. id is not specified to return a memory address; that it does is a CPython implementation detail. It is merely a number that is guaranteed to be unique for the lifetime of an object. In this case, it is the address of the PyObject header of the np.ndarray object, but that has nothing to do with the buffer. Indeed, you can create an arbitrary number of numpy.ndarray objects with different ids that share the same buffer. - Reconcilable
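
To illustrate the comment above, here is a minimal sketch in which two ndarray objects are built over one buffer. The bytearray and the variable names are only stand-ins chosen for illustration; the same idea applies to a shared memory buffer.

```python
import numpy as np

# One plain buffer with room for five float64 values (a stand-in for shm.buf).
buf = bytearray(5 * 8)

# Two separate ndarray objects built over the same bytes.
a = np.ndarray((5,), dtype=np.float64, buffer=buf)
b = np.ndarray((5,), dtype=np.float64, buffer=buf)

a[0] = 42.0

distinct_objects = id(a) != id(b)      # two live Python objects, two ids
same_memory = np.shares_memory(a, b)   # but one underlying buffer
b_sees_write = bool(b[0] == 42.0)      # a write through `a` is visible via `b`
```

The ids say nothing about the data: np.shares_memory is the tool that compares the underlying memory.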

In this particular case, you can use the base attribute of the shared array. The attribute is a reference to the underlying object from which this array derives its memory. This is None for most arrays, to indicate that such an array owns its data. Running this code on my machine indicates that this array's base is a mmap object:

>>> sharedArr.base
<mmap.mmap at 0x11a4aa670>
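
For comparison, here is a small sketch of what base looks like for ordinary arrays. The variable names are made up for illustration.

```python
import numpy as np

owner = np.zeros(5)   # allocates and owns its data
alias = owner         # plain assignment: the very same object, no new array
view = owner[1:4]     # slicing creates a view onto owner's memory

owner_has_no_base = owner.base is None
alias_has_no_base = alias.base is None
view_points_to_owner = view.base is owner
```

So base is only set when an array borrows its memory from something else, whether that is another array or an external buffer.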

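If you still have a reference to the shared memory object from which the array was allocated, you can compare the array's base to the shared memory segment's memory map. Note that _mmap is a private, undocumented attribute of SharedMemory, so this may break between Python versions: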

>>> sharedArr.base is shm._mmap
True

If you don't have the shm object lying around, as you wouldn't in a standalone function which could hypothetically perform this task, I doubt there's a portable and foolproof way to do it.

Since NumPy provides its own memory-map object, it may suffice for your case to check only the type of the array's base. That is, make the assumption that if the array is backed by a vanilla, builtin Python memory map, it is allocated from shared memory:

import mmap

def array_is_from_shared_memory(arr):
    return isinstance(arr.base, mmap.mmap)

This works in your particular example, but you'd have to be careful with it, clearly document the assumptions that it makes, and test that it provides you with the actual information you need in your exact application.
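
One caveat with the one-line check is that a slice of the shared array has the shared array as its base, not the mmap itself. A hedged sketch of a variant that follows the chain of bases is below. The function names are hypothetical, the memoryview branch is a defensive assumption in case base turns out to be a memoryview, and np.memmap arrays are excluded explicitly on the assumption that they would otherwise also resolve to a builtin mmap.

```python
import mmap
from multiprocessing import shared_memory

import numpy as np


def looks_like_shared_memory(arr):
    """Heuristic: does following arr's bases end at a builtin mmap.mmap?

    Same assumption as the simpler check: a plain mmap means shared memory.
    """
    obj = arr
    while isinstance(obj, (np.ndarray, memoryview)):
        if isinstance(obj, np.memmap):
            # Assumption: file-backed np.memmap arrays should not count.
            return False
        # ndarray -> .base, memoryview -> .obj (the object it was taken from)
        obj = obj.base if isinstance(obj, np.ndarray) else obj.obj
    return isinstance(obj, mmap.mmap)


def demo_with_shared_memory():
    """Run the heuristic on a SharedMemory-backed array and on a slice of it."""
    shm = shared_memory.SharedMemory(create=True, size=5 * 8)
    try:
        shared_arr = np.ndarray((5,), dtype=np.float64, buffer=shm.buf)
        result = (
            looks_like_shared_memory(shared_arr),
            looks_like_shared_memory(shared_arr[1:3]),
        )
        del shared_arr  # drop the array before closing the segment
    finally:
        shm.close()
        shm.unlink()
    return result
```

If base behaves as shown earlier in this answer, demo_with_shared_memory() should report True for both the array and its slice, while an ordinary np.zeros(5) gives False. It is still only a heuristic with the same caveats as the simpler check.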

Blavatsky answered 3/9, 2020 at 19:19 Comment(6)
Amazing answer, thank you! What are some examples of objects that are not backed by mmap but are still held in shared memory? - Confection
I'm glad you found the answer helpful! What can go wrong depends entirely on what you're doing with these arrays. If that assumption doesn't hold, there are two cases. First, consider an array that is backed by shared memory but doesn't use an mmap under the hood. This function would cause you to "miss" that array in whatever code you want to run on shared arrays. OTOH, an array that does use an mmap but isn't actually shared might cause you to inadvertently overwrite some data, or to expect a change to show up somewhere else, even though it won't. How "bad" those are depends on your use case :) - Blavatsky
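
A sketch of the second case from the comment above: an array over a private, copy-on-write mapping passes the mmap check even though its writes are not shared with anything. The temporary file and the private_mapping_demo helper are made up for illustration.

```python
import mmap
import os
import tempfile

import numpy as np


def array_is_from_shared_memory(arr):
    # The check from the answer: is the array's base a builtin mmap?
    return isinstance(arr.base, mmap.mmap)


def private_mapping_demo():
    """Return (check_result, bytes_on_disk) after writing through the array."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(8))  # eight zero bytes on disk
        # ACCESS_COPY is a private mapping: writes stay in this process only.
        mm = mmap.mmap(fd, 8, access=mmap.ACCESS_COPY)
        arr = np.ndarray((8,), dtype=np.uint8, buffer=mm)
        arr[:] = 1
        looks_shared = array_is_from_shared_memory(arr)
        del arr  # drop the array before closing the mapping
        mm.close()
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 8)
    finally:
        os.close(fd)
        os.remove(path)
    return looks_shared, on_disk
```

Here the check answers yes, yet the file still holds zeros after the write, which is the "expect a change to show up somewhere else" trap.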
This does not work. If a is a NumPy array, then b = a; b.base is None returns True. The .base attribute is only used if the array is a view over another array. - Detrude
@CrisLuengo It's true that b.base is None in that case, but that's not the same thing as the original question. "Normal" assignment, sharedArr = arr, was not used to create the array; it was the ndarray constructor that specifically sets the buffer for sharedArr to be the shared memory segment. Also, the expression sharedArr[:] = arr[:] was used to copy data. That doesn't rebind the variable sharedArr, it literally copies the elements. - Blavatsky
Sure, it’s not the context of the question. Still, your first paragraph makes it sound like base always points to the original array, but it doesn’t. I thought it would be important to clarify that for future visitors, hence my comment (I came here through a search for shared data in general). - Detrude
Ah I see, I interpreted “does not work” as “does not solve the question”! Yes, the base attribute is only non-None in cases where the array takes its memory from “something else”. It’s a bit fraught in general, which I hopefully made clear in my answer. - Blavatsky
