Subclassing numpy ndarray problem

I would like to subclass numpy ndarray. However, I cannot change the array from inside a method. Why does self = ... not change the array? Thanks.

import numpy as np

class Data(np.ndarray):

    def __new__(cls, inputarr):
        obj = np.asarray(inputarr).view(cls)
        return obj

    def remove_some(self, t):
        test_cols, test_vals = zip(*t)
        test_cols = self[list(test_cols)]
        test_vals = np.array(test_vals, test_cols.dtype)

        self = self[test_cols != test_vals] # Is this part correct?

        print(len(self))  # correct result

z = np.array([(1,2,3), (4,5,6), (7,8,9)],
    dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
d.remove_some([('a',4)])

print(len(d))  # outputs the same size as the original. Why?
Villareal answered 1/3, 2011 at 0:38 Comment(4)
Please provide your expected output; it is not clear what you want to achieve. - Glee
I want to remove the rows from the Data instance. - Villareal
OK, you might use a mask, but it would be better to ask another question, as this does not have much to do with subclassing ndarray. - Glee
Another question has been posted with the same issue when subclassing an ndarray (#16049937). - Twoway

Perhaps make this a function, rather than a method:

import numpy as np

def remove_row(arr,col,val):
    return arr[arr[col]!=val]

z = np.array([(1,2,3), (4,5,6), (7,8,9)],
    dtype=[('a', int), ('b', int), ('c', int)])

z=remove_row(z,'a',4)
print(repr(z))

# array([(1, 2, 3), (7, 8, 9)], 
#       dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])

Or, if you want it as a method,

import numpy as np

class Data(np.ndarray):

    def __new__(cls, inputarr):
        obj = np.asarray(inputarr).view(cls)
        return obj

    def remove_some(self, col, val):
        return self[self[col] != val]

z = np.array([(1,2,3), (4,5,6), (7,8,9)],
    dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
d = d.remove_some('a', 4)
print(d)

The key difference here is that remove_some does not try to modify self; it merely returns a new instance of Data, which the caller assigns back to d.
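
To check that claim, here is a small sketch that reuses the Data class from above. The variable name filtered is only for illustration; it shows that the result is still a Data instance and that the original d keeps all of its rows.

```python
import numpy as np

class Data(np.ndarray):

    def __new__(cls, inputarr):
        return np.asarray(inputarr).view(cls)

    def remove_some(self, col, val):
        # Boolean-mask indexing builds a new array; self is left alone.
        return self[self[col] != val]

z = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
             dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
filtered = d.remove_some('a', 4)

print(type(filtered).__name__)  # Data
print(len(d), len(filtered))    # 3 2
```

Rebinding the name with d = d.remove_some('a', 4) is what makes the change visible to the caller.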

Unsure answered 1/3, 2011 at 13:5 Comment(3)
This is perhaps a helpful response to the question, but is not an answer. Why does self = ... not change the value? Maybe the answer is below? I'll repost otherwise. - Communalism
Imagine d, your Data instance. It is pointing to a block of memory which holds the underlying data. To remove a row in-place, you'd have to move the other rows together and then resize the array. When you say self = some_other_array you are redirecting the variable name self to another block of memory. Outside the remove_some method, however, the variable name d is still pointing to the original block of memory. So it fails to modify d. - Unsure
All the advice I've ever read about numpy says one should not try to resize numpy arrays. It is possible to do, but all the copying makes it slow. It is better to use slices to create views, or fancy indexing to create new arrays with the desired data. Yes, making a new array also involves copying, but at least you are saved from the complexity of shifting data around in-place. - Unsure

You are not getting the result you expect because you are re-assigning self within the method remove_some. That only rebinds the local name self to a new array; the object the caller holds (d) is untouched. If your array shape were not to change, you could simply do self[:] = ... and you could keep the reference to self and all would be well, but you are trying to change the shape of self, which means we need to allocate new memory and change where self points.
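
A minimal sketch of the difference, using plain functions instead of methods (self is just an ordinary parameter, so the same rule applies). The function names rebind and overwrite are made up for this illustration.

```python
import numpy as np

def rebind(arr):
    # Rebinds the local name only; the caller's array is not affected.
    arr = arr[arr != 4]
    return len(arr)

def overwrite(arr):
    # Slice assignment writes into the existing buffer.
    # This only works because the shape stays the same.
    arr[:] = arr * 10

a = np.array([1, 4, 7])

n = rebind(a)
print(n, a)   # 2 [1 4 7]

overwrite(a)
print(a)      # [10 40 70]
```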

I don't know how to do this. I thought it could be achieved by __array_finalize__ or __array__ or __array_wrap__. But everything I've tried is falling short.

Now, there's another way to go about this that doesn't subclass ndarray. You can make a new class that keeps an ndarray as an attribute and then override all the usual operators (__add__, __mul__, etc.). Something like this:

import numpy as np

class Data(object):
    def __init__(self, inarr):
        self._array = np.array(inarr)
    def remove_some(self, x):
        # x is a boolean mask or index array selecting the rows to keep
        self._array = self._array[x]
    def __add__(self, other):
        return np.add(self._array, other)

Well, you get the picture. It's a pain to override all the operators, but in the long run I think it is more flexible.

If you do subclass, you'll have to read this thoroughly to do it right. There are methods like __array_finalize__ that need to be called at the right time to do "cleanup".
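
For reference, here is a rough sketch of how __array_finalize__ is typically used, assuming a hypothetical extra attribute called label (it is not part of the question's code). As far as I understand it, NumPy invokes the hook itself whenever a new instance is created by view casting or slicing, so it is the place to carry attributes over. It does not help with resizing self in place.

```python
import numpy as np

class Data(np.ndarray):

    def __new__(cls, inputarr, label=None):
        obj = np.asarray(inputarr).view(cls)
        obj.label = label  # hypothetical extra attribute
        return obj

    def __array_finalize__(self, obj):
        # obj is None only for explicit construction, e.g. Data.__new__ paths
        # that bypass view casting; otherwise it is the source array.
        if obj is None:
            return
        # Copy the attribute from the array this one was derived from.
        self.label = getattr(obj, 'label', None)

d = Data([1, 4, 7], label='demo')
subset = d[d != 4]

print(type(subset).__name__, subset.label)  # Data demo
```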

Caravette answered 1/3, 2011 at 0:56 Comment(2)
I thought that __array_finalize__ is used when a new instance is being initiated, for example in adding extra attributes. - Villareal
I thought it had to be called any time you re-allocated the array like the OP is doing. But to be honest, this is over my head. __array_wrap__ seems maybe closer to what is wanted, but only returns when called by a ufunc. - Caravette

I tried to do the same, but it is really very complex to subclass ndarray.

If you only have to add some functionality, I would suggest creating a class that stores the array as an attribute.

class Data(object):

    def __init__(self, array):
        self.array = array

    def remove_some(self, t):
        # operate on self.array
        pass

d = Data(z)
print(d.array)
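
One possible way to fill in remove_some, as a sketch only. It assumes t is a list of (column, value) pairs as in the question, and that a row should be dropped when it matches any of the pairs; the helper variable keep is made up for the example.

```python
import numpy as np

class Data(object):

    def __init__(self, array):
        self.array = np.asarray(array)

    def remove_some(self, t):
        # Start by keeping every row, then drop rows matching any pair.
        keep = np.ones(len(self.array), dtype=bool)
        for col, val in t:
            keep &= self.array[col] != val
        # Rebinding an attribute on self is visible to the caller,
        # unlike rebinding the name self itself.
        self.array = self.array[keep]

z = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
             dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
d.remove_some([('a', 4)])

print(len(d.array))  # 2
```

Because the wrapper object d stays the same and only its array attribute is replaced, the change shows up outside the method.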
Glee answered 1/3, 2011 at 2:12 Comment(1)
Documentation on how to subclass ndarray might help make it easier. - Yoder
