NumPy Array Copy-On-Write

I have a class that returns large NumPy arrays, which are cached within the class. I would like the returned arrays to be copy-on-write: if the caller only reads from the array, no copy is ever made, so no extra memory is used. The array would still be "modifiable", but writes would not modify the internal cached arrays.

My solution at the moment is to make any cached arrays read-only (a.flags.writeable = False). This means the caller may have to make their own copy of the array if they want to modify it. Of course, if the array did not come from the cache and was already writable, they would duplicate the data unnecessarily.
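
A minimal sketch of the read-only approach described above. The ArrayCache class, its get method, and the stand-in data are hypothetical names used only for illustration:

```python
import numpy as np

class ArrayCache:
    """Hypothetical cache that hands out read-only arrays."""

    def __init__(self):
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            # Stand-in for data that is expensive to build.
            arr = np.arange(1000, dtype=np.float64)
            arr.flags.writeable = False  # protect the cached data
            self._cache[key] = arr
        return self._cache[key]

cache = ArrayCache()
a = cache.get("x")

write_blocked = False
try:
    a[0] = 1.0  # read-only arrays reject assignment
except ValueError:
    write_blocked = True

mine = a.copy()  # a copy is writable and independent of the cache
mine[0] = 42.0
```

Callers that only read pay nothing extra; callers that want to write take an explicit copy.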

So, optimally, I would love something like a.view(flag=copy_on_write). There seems to be a flag for the reverse of this, UPDATEIFCOPY, which causes a copy to update the original once it is deallocated.

Thanks!

Bumbailiff answered 20/2, 2014 at 1:6 Comment(0)
6

Copy-on-write is a nice concept, but explicit copying seems to be "the NumPy philosophy". So personally I would keep the "readonly" solution if it isn't too clumsy.

But I admit having written my own copy-on-write wrapper class. I don't try to detect write access to the array. Instead the class has a method "get_array(readonly)" returning its (otherwise private) numpy array. The first time you call it with "readonly=False" it makes a copy. This is very explicit, easy to read and quickly understood.
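
A rough sketch of what such a wrapper could look like. This is not the answerer's actual class: CowArray and the _owns_data flag are assumptions, and only the get_array(readonly) method name comes from the description above:

```python
import numpy as np

class CowArray:
    """Hypothetical wrapper that copies the shared array on the first write request."""

    def __init__(self, shared):
        self._array = shared     # may be shared with a cache
        self._owns_data = False  # True once a private copy exists

    def get_array(self, readonly=True):
        if not readonly and not self._owns_data:
            # First write request: take a private copy.
            self._array = self._array.copy()
            self._owns_data = True
        if readonly:
            # Hand out a read-only view so readers cannot write by accident.
            view = self._array.view()
            view.flags.writeable = False
            return view
        return self._array

shared = np.arange(5.0)
wrapper = CowArray(shared)
ro = wrapper.get_array()                # no copy yet
rw = wrapper.get_array(readonly=False)  # first write request copies
rw[0] = 99.0                            # leaves `shared` untouched
```

The copy happens at an explicit, visible call rather than inside an overridden indexing operation.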

If your copy-on-write numpy array looks like a classical numpy array, the reader of your code (possibly you in 2 years) may have a hard time.

Pedicab answered 20/2, 2014 at 7:7 Comment(1)
I have gone with this method, with the exception that the array is always read-only; if the caller wants it read-write, they can copy it themselves. Bumbailiff
4

To implement copy-on-write, we need to modify the base, data, and strides of the ndarray object. I think this can't be done in pure Python code, so I use some Cython code to modify these attributes.

Here is the code in an IPython notebook:

%load_ext cythonmagic

Use Cython to define copy_view():

%%cython
cimport numpy as np

np.import_array()
np.import_ufunc()

def copy_view(np.ndarray a):
    cdef np.ndarray b
    cdef object base
    cdef int i
    base = np.get_array_base(a)
    if base is None or isinstance(base, a.__class__):
        return a
    else:
        print "copy"
        b = a.copy()
        np.set_array_base(a, b)
        a.data = b.data
        for i in range(b.ndim):
            a.strides[i] = b.strides[i]

Define a subclass of ndarray:

import numpy as np

class cowarray(np.ndarray):
    def __setitem__(self, key, value):
        copy_view(self)
        np.ndarray.__setitem__(self, key, value)

    def __array_prepare__(self, array, context=None):
        if self is array:
            copy_view(self)
        return array

    def __array__(self):
        copy_view(self)
        return self

Some tests:

a = np.array([1.0, 2, 3, 4])
b = a.view(cowarray)
b[1] = 100 #copy 
print a, b
b[2] = 200 #no copy
print a, b

c = a[::2].view(cowarray)
c[0] = 1000 #copy
print a, c

d = a.view(cowarray)
np.sin(d, d) #copy
print a, d           

The output:

copy
[ 1.  2.  3.  4.] [   1.  100.    3.    4.]
[ 1.  2.  3.  4.] [   1.  100.  200.    4.]
copy
[ 1.  2.  3.  4.] [ 1000.     3.]
copy
[ 1.  2.  3.  4.] [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
Cumquat answered 20/2, 2014 at 4:4 Comment(1)
I have been trying this out and it looks fairly good, but there are a few "problems". First, I had to add a while loop before the if statement in the Cython code to keep searching the base chain until a match or None was found; otherwise something like b=a.view(cowarray); c=b[:2]; c[0]=1000 would not copy. There are probably other problems too; for example, with the loop fix it might make more copies than necessary. Bumbailiff
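
The chained-view case in this comment can be inspected from pure Python. The sketch below is not the Cython loop itself: Sub is a hypothetical stand-in for cowarray, and base_chain is a hypothetical helper that only follows .base links. Depending on the NumPy version, the immediate base of c may be the subclass view b rather than the original array a, which would explain why a one-level check can miss the real owner of the data:

```python
import numpy as np

class Sub(np.ndarray):
    """Hypothetical stand-in for the cowarray subclass."""

def base_chain(arr):
    """Return the objects reached by following .base from arr, nearest first."""
    chain = []
    base = arr.base
    while base is not None:
        chain.append(base)
        # Non-ndarray bases (e.g. buffers) may have no .base attribute.
        base = getattr(base, "base", None)
    return chain

a = np.arange(4.0)     # owns its data, so a.base is None
b = a.view(Sub)        # view of a as the subclass
c = b[:2]              # view of a view
chain = base_chain(c)  # the last entry is the array that owns the memory
```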
