Is there a "bounding box" function (slice with non-zero values) for a ndarray in NumPy?
I

3

24

I am dealing with arrays created via numpy.array(), and I need to draw points on a canvas simulating an image. Since there are a lot of zero values around the central part of the array, which contains the meaningful data, I would like to "trim" the array, removing the columns and rows that contain only zeros.

So, I would like to know of some native numpy function or even a code snippet to "trim" or find a "bounding box" to slice only the data-containing part of the array.

(since it is a conceptual question, I did not put any code, sorry if I should, I'm very fresh to posting at SO.)

Thanks for reading

Infusive answered 26/1, 2011 at 18:14 Comment(1)
#31401269 see bbox2 function... MUCH faster, if there are many rows / columns entirely filled with zeros and only a small amount of clustered data.Borrego
I
17

The code below, from this answer, runs fastest in my tests:

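import numpy as np

def bbox2(img):
    # Boolean masks: which rows / columns contain at least one nonzero value
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    # First and last index with data along each axis
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return img[ymin:ymax+1, xmin:xmax+1]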

The accepted answer using argwhere worked but ran slower. My guess is, it's because argwhere allocates a giant output array of indices. I tested on a large 2D array (a 1024 x 1024 image, with roughly a 50x100 nonzero region).
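As the comments on this answer point out, bbox2 raises an IndexError when the image has no nonzero values at all. A minimal sketch of a guarded variant follows; the name bbox2_safe and the choice to return an empty slice (rather than raise or return None) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def bbox2_safe(img):
    """Crop a 2D array to its nonzero bounding box (hypothetical helper name)."""
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    if not rows.any():
        # No nonzero values anywhere. Returning an empty slice is an
        # assumption; raising an error would be just as reasonable.
        return img[0:0, 0:0]
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return img[ymin:ymax + 1, xmin:xmax + 1]

# Usage: an all-zero image yields an empty result instead of an exception.
blank = np.zeros((6, 7), dtype=int)
cropped_blank = bbox2_safe(blank)

image = np.zeros((6, 7), dtype=int)
image[2:4, 3:6] = 1
cropped = bbox2_safe(image)
```

If any row contains data then some column does too, so checking rows alone is enough.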

Imagine answered 24/6, 2017 at 8:12 Comment(4)
I found this answer way more pythonic! Thanks!Infusive
Caution, this code may generate an error in the edge case of a completely black image. You must verify that neither of the two np.where() calls returns an empty array.Lucchesi
This is great! Any idea on how to extend it with periodic boundary conditions?Amortizement
@Amortizement I'm not sure I understand what you mean by periodic boundary conditions. But if you're looking to find multiple "blobs" of contiguous True values, an approach like this answer probably won't work. Instead, to find arbitrary blobs, I'd use the OpenCV connectedComponents() function: docs.opencv.org/3.4/d3/dc0/…Imagine
S
24

This should do it:

from numpy import array, argwhere

A = array([[0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0],
           [0, 0, 1, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0]])

B = argwhere(A)  # (row, col) indices of every nonzero element
(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1  # stop indices are exclusive
Atrim = A[ystart:ystop, xstart:xstop]
Snowfield answered 26/1, 2011 at 19:29 Comment(4)
Nice! Just on a readability note, you could do (ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1 and then simply index A with Atrim = A[ystart:ystop, xstart:xstop]. Of course, it's entirely equivalent, but I find it more readable, at any rate.Norfolk
This one was fine, the example you used is exactly the typical array I would be using (just larger). I didn't know the function argwhere, will do my homework now. Thanks!Infusive
is there a way to do it for any array dimension ?Starchy
@Naomi Sure. Just extend the patterns in this example by adding a zstart after the ystart and xstart for 3 dims and keep adding more for higher dimensions.Snowfield
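The extension to any number of dimensions mentioned in the last comment could be sketched roughly as below. This is only an illustration: trim_zeros_nd is a hypothetical name, and returning an empty slice for an all-zero input is an assumption. It builds one slice per axis from the argwhere result instead of unpacking named start/stop variables.

```python
import numpy as np

def trim_zeros_nd(a):
    """Crop an N-dimensional array to its nonzero bounding box (hypothetical helper)."""
    a = np.asarray(a)
    idx = np.argwhere(a)  # shape (number_of_nonzeros, a.ndim)
    if idx.size == 0:
        # All zeros: return an empty slice (an assumption, not from the answer)
        return a[(slice(0, 0),) * a.ndim]
    start = idx.min(axis=0)
    stop = idx.max(axis=0) + 1  # exclusive upper bounds
    return a[tuple(slice(lo, hi) for lo, hi in zip(start, stop))]

# Usage on a 3D volume with two nonzero voxels
vol = np.zeros((4, 5, 6), dtype=int)
vol[1, 2, 3] = 7
vol[2, 3, 5] = 1
trimmed = trim_zeros_nd(vol)
```

For a 2D input this should give the same result as the ystart/xstart version above.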
G
0

Something like:

import numpy as np

empty_cols = np.all(array == 0, axis=0)
empty_rows = np.all(array == 0, axis=1)

The resulting arrays will be 1D boolean arrays. Loop over them from both ends to find the 'bounding box'.
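For what it's worth, the bounds can also be read off those 1D masks without an explicit Python loop. The sketch below is one possible way, not the answerer's code: bounds_from_masks is a hypothetical name, it uses NumPy's np.all, and it relies on np.argmax returning the index of the first True in a boolean array.

```python
import numpy as np

def bounds_from_masks(arr):
    """Return (top, bottom, left, right) with exclusive bottom/right, or None if all zero.

    Hypothetical helper for illustration only.
    """
    empty_cols = np.all(arr == 0, axis=0)
    empty_rows = np.all(arr == 0, axis=1)
    if empty_rows.all():
        return None  # no data at all (returning None is an assumption)
    has_row = ~empty_rows
    has_col = ~empty_cols
    # argmax on a boolean array gives the position of the first True;
    # scanning the reversed mask gives the distance from the far end.
    top = int(np.argmax(has_row))
    bottom = len(has_row) - int(np.argmax(has_row[::-1]))
    left = int(np.argmax(has_col))
    right = len(has_col) - int(np.argmax(has_col[::-1]))
    return top, bottom, left, right

# Usage with the 7x7 example array from the accepted answer
A = np.zeros((7, 7), dtype=int)
A[2, 2] = A[3, 2] = A[3, 3] = A[4, 4] = 1
top, bottom, left, right = bounds_from_masks(A)
trimmed = A[top:bottom, left:right]
```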

Giselle answered 26/1, 2011 at 19:11 Comment(3)
looping over numpy arrays should be avoidedSnowfield
The loop is only 1D, so order n, not n^2. Not that big of a deal.Giselle
You are right about the order, and you don't even need to loop over the entire array width, but the Python loop involves all kinds of extra steps like type-checking. In this 1D example: scipy.org/… The Python loop runs 25X slower to accomplish the same task! Without knowing the size or quantity of the images or the application of the algorithm (computer vision?), I can't say how big a deal that kind of speedup is.Snowfield
