Prevent numpy from creating a multidimensional array
NumPy is really helpful when creating arrays. If the first argument for numpy.array has a __getitem__ and __len__ method these are used on the basis that it might be a valid sequence.

Unfortunately, I want to create an array with dtype=object without NumPy being "helpful".

Broken down to a minimal example, the class would look like this:

import numpy as np

class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.data)

and if the "iterables" have different lengths everything is fine and I get exactly the result I want to have:

>>> np.array([Test([1,2,3]), Test([3,2])], dtype=object)
array([Test([1, 2, 3]), Test([3, 2])], dtype=object)

but NumPy creates a multidimensional array if these happen to have the same length:

>>> np.array([Test([1,2,3]), Test([3,2,1])], dtype=object)
array([[1, 2, 3],
       [3, 2, 1]], dtype=object)

Unfortunately there is only an ndmin argument, so I was wondering if there is a way to enforce an ndmax, or to somehow prevent NumPy from interpreting the custom class instances as another dimension (without deleting __len__ or __getitem__)?

Danieladaniele answered 4/8, 2016 at 18:33 Comment(0)

A workaround is of course to create an array of the desired shape and then copy the data:

In [19]: lst = [Test([1, 2, 3]), Test([3, 2, 1])]

In [20]: arr = np.empty(len(lst), dtype=object)

In [21]: arr[:] = lst[:]

In [22]: arr
Out[22]: array([Test([1, 2, 3]), Test([3, 2, 1])], dtype=object)

Note that, in any case, I would not be surprised if NumPy's behavior with respect to interpreting iterable objects (which is what you want to use, right?) were version dependent. And possibly buggy, or maybe some of those bugs are actually features. Either way, I'd be wary of breakage when the NumPy version changes.

By contrast, copying into a pre-created array should be far more robust.
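The same pattern extends to higher dimensions. A minimal sketch (the nested list here is an illustrative assumption, not data from the question):

```python
import numpy as np

# Pre-allocate a 2-d object array, then copy a nested list into it.
# Because the shape is fixed up front, NumPy cannot "helpfully" drill
# deeper into the ragged inner lists.
nested = [[[1, 2], [3, 4, 5]],
          [[6], [7, 8]]]
arr = np.empty((2, 2), dtype=object)
arr[:] = nested
print(arr.shape)   # (2, 2)
print(arr[0, 1])   # [3, 4, 5]
```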

Actomyosin answered 4/8, 2016 at 20:10 Comment(0)

This behavior has been discussed a number of times before (e.g. Override a dict with numpy support). np.array tries to make as high-dimensional an array as it can. The model case is nested lists: if it can iterate and the sublists are equal in length, it will 'drill' on down.

Here it went down 2 levels before encountering lists of different length:

In [250]: np.array([[[1,2],[3]],[1,2]],dtype=object)
Out[250]: 
array([[[1, 2], [3]],
       [1, 2]], dtype=object)
In [251]: _.shape
Out[251]: (2, 2)

Without a shape or ndmax parameter it has no way of knowing whether I want it to be (2,) or (2,2). Both of those would work with the dtype.

It's compiled code, so it isn't easy to see exactly what tests it uses. It tries to iterate on lists and tuples, but not on sets or dictionaries.
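A quick illustration of that asymmetry (my own example, not from the original post): lists are iterated into a dimension, while sets are kept whole as single objects.

```python
import numpy as np

# Lists are treated as sequences, so they become a dimension...
a = np.array([[1, 2], [3, 4]], dtype=object)
print(a.shape)   # (2, 2)

# ...but sets are not iterated; each one stays a single object element.
b = np.array([{1, 2}, {3, 4}], dtype=object)
print(b.shape)   # (2,)
```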

The surest way to make an object array with a given dimension is to start with an empty one and fill it:

In [266]: A=np.empty((2,3),object)
In [267]: A.fill([[1,'one']])
In [276]: A[:]={1,2}
In [277]: A[:]=[1,2]   # broadcast error

Another way is to start with at least one different element (e.g. a None), and then replace that.
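A sketch of that trick with plain lists (an assumed example): the None makes the outer list irregular, so np.array stops at one dimension, and the placeholder is replaced afterwards.

```python
import numpy as np

# The None breaks the regularity, so NumPy keeps the array 1-d...
arr = np.array([[1, 2, 3], None], dtype=object)
# ...and the placeholder can then be overwritten with the real element.
arr[1] = [3, 2, 1]
print(arr.shape)   # (2,)
```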

There is a more primitive creator, np.ndarray, which takes a shape:

In [280]: np.ndarray((2,3),dtype=object)
Out[280]: 
array([[None, None, None],
       [None, None, None]], dtype=object)

But that's basically the same as np.empty (unless I give it a buffer).

These are fudges, but they aren't expensive (time wise).

================ (edit)

https://github.com/numpy/numpy/issues/5933 ("Enh: Object array creation function") is an enhancement request. See also https://github.com/numpy/numpy/issues/5303, "the error message for accidentally irregular arrays is confusing".

The developer sentiment seems to favor a separate function to create dtype=object arrays, one with more control over the initial dimensions and depth of iteration. They might even strengthen the error checking to keep np.array from creating 'irregular' arrays.

Such a function could detect the shape of a regular nested iterable down to a specified depth, and build an object type array to be filled.

def objarray(alist, depth=1):
    # walk down `depth` levels to collect the target shape
    shape = []
    l = alist
    for _ in range(depth):
        shape.append(len(l))
        l = l[0]
    # pre-create the object array and copy the data in
    arr = np.empty(shape, dtype=object)
    arr[:] = alist
    return arr

With various depths:

In [528]: alist=[[Test([1,2,3])], [Test([3,2,1])]]
In [529]: objarray(alist,1)
Out[529]: array([[Test([1, 2, 3])], [Test([3, 2, 1])]], dtype=object)
In [530]: objarray(alist,2)
Out[530]: 
array([[Test([1, 2, 3])],
       [Test([3, 2, 1])]], dtype=object)
In [531]: objarray(alist,3)  
Out[531]: 
array([[[1, 2, 3]],

       [[3, 2, 1]]], dtype=object)
In [532]: objarray(alist,4)
...
TypeError: object of type 'int' has no len()
Dyslalia answered 4/8, 2016 at 20:15 Comment(4)
I tried looking for similar questions but couldn't find any; maybe I just searched for the wrong phrases, so any references to earlier questions would be great. Thank you for the answer, but I'm actually not looking for a workaround. I'm more interested in a general approach: how do I define the maximum depth (dimensions) of an array without knowing the exact lengths beforehand, or disable NumPy's interpretation of custom class instances as sequences?Danieladaniele
By changing your class to subclass dict I can stop np.array from iterating on your instances. That's an indication that np.array is testing for more than __getitem__, but I haven't been able to find the code that does that kind of checking.Dyslalia
#36664419 - struggles with the same issue: controlling whether np.array iterates on your custom class or not. Same sort of workarounds.Dyslalia
github.com/numpy/numpy/issues/5933, Enh: Object array creation function - a request for a function that makes an array of objects without iteration.Dyslalia

This workaround may not be the most efficient, but I like it for its clarity:

test_list = [Test([1,2,3]), Test([3,2,1])]
test_list.append(None)
test_array = np.array(test_list, dtype=object)[:-1]

Summary: you take your list, append None, then convert it to a NumPy array, which prevents NumPy from converting it to a multidimensional array. Finally you just remove the last entry to get the structure you want.

Herwick answered 16/8, 2018 at 14:50 Comment(1)
Very clever. I had the same issue with a collection of lists of lists of tuples that I wanted represented as an ndarray of size 2 containing tuples (and not an ndarray of size 3). I needed NumPy for its advanced indexing facilities to work on this collection.Repay

Workaround using pandas

This might not be what the OP is looking for, but in case anyone needs a way to prevent NumPy from constructing multidimensional arrays, this might be useful.
Pass your list to pd.Series and then get the elements as a numpy array using .values.

import pandas as pd

pd.Series([Test([1,2,3]), Test([3,2,1])]).values
# array([Test([1, 2, 3]), Test([3, 2, 1])], dtype=object)

Or, if dealing with numpy arrays:

np.array([np.random.randn(2,2), np.random.randn(2,2)]).shape
# (2, 2, 2)

Using pd.Series:

pd.Series([np.random.randn(2,2), np.random.randn(2,2)]).values.shape
#(2,)
Bohlen answered 23/7, 2019 at 5:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.