m Smallest values from upper triangular matrix with their indices as a list of tuples

Asked 1/2, 2017 at 1:25 Answered 1/2, 2017 at 2:40

I have a np.ndarray as follows:

[[ inf   1.   3.   2.   1.]
 [ inf  inf   2.   3.   2.]
 [ inf  inf  inf   5.   4.]
 [ inf  inf  inf  inf   1.]
 [ inf  inf  inf  inf  inf]]

Is there a way to get the indices and values of the m smallest items in that nd array? So, if I wanted the 4 smallest it would be

[(0,1,1),(0,4,1),(3,4,1),(0,3,2)]

where (row,col,val) is the notation above.

If there are multiple values then one of them is just randomly chosen. For instance, there were 3 ones and then next smallest is a value 2 but (0,3,2), (1,2,2),(1,4,2) were all possible choices.

Essentially, Can I extract the k smallest values in that format from the upper triangular matrix efficiently (the matrix is much larger than the example above). I tried flattening it, using square form, nsmallest, but am having trouble getting the indices and values to align. Thanks!

See answered 1/2, 2017 at 1:25 Comment(4)

Possible duplicate of #30577875 np.dstack(np.unravel_index(np.argsort(tri.ravel()), arr.shape)) All that's left is zipping the values on. – Confessedly 1/2, 2017 at 1:58

This might help: https://mcmap.net/q/55121/-a-fast-way-to-find-the-largest-n-elements-in-an-numpy-array ... though it's finding the largest K items rather than the smallest. Another, fairly crude, approach would be to use numpy.ndenumerate() to generate a flat list of co-ordinates and values which you feed into a heap before taking the heapq.nsmallest() items. – Vanda 1/2, 2017 at 2:26

Did either of the posted solutions work for you? – Overuse 2/2, 2017 at 6:29

yes just tried yours, nice! – See 2/2, 2017 at 6:36

For an Inf filled array -

r,c = np.unravel_index(a.ravel().argsort()[:4], a.shape)
out = zip(r,c,a[r,c])

For performance, consider using np.argpartition. So, replace a.ravel().argsort()[:4] with np.argpartition(a.ravel(), range(4))[:4].

Sample run -

In [285]: a
Out[285]: 
array([[ inf,   1.,   3.,   2.,   1.],
       [ inf,  inf,   2.,   3.,   2.],
       [ inf,  inf,  inf,   5.,   4.],
       [ inf,  inf,  inf,  inf,   1.],
       [ inf,  inf,  inf,  inf,  inf]])

In [286]: out
Out[286]: [(0, 1, 1.0), (0, 4, 1.0), (3, 4, 1.0), (0, 3, 2.0)]

For a generic case -

R,C = np.triu_indices(a.shape[1],1)
idx = a[R,C].argsort()[:4]
r,c = R[idx], C[idx]
out = zip(r,c,a[r,c])

Sample run -

In [351]: a
Out[351]: 
array([[ 68.,  67.,  81.,  23.,  16.],
       [ 84.,  83.,  20.,  66.,  48.],
       [ 58.,  72.,  98.,  63.,  30.],
       [ 61.,  40.,   1.,  86.,  22.],
       [ 29.,  95.,  38.,  22.,  95.]])
In [352]: out
Out[352]: [(0, 4, 16.0), (1, 2, 20.0), (3, 4, 22.0), (0, 3, 23.0)]

For performance, consider using np.argpartition. So, replace a[R,C].argsort()[:4] with np.argpartition(a[R,C], range(4))[:4].

Overuse answered 1/2, 2017 at 1:54 Comment(1)

list(zip(...)) in python3 – Platyhelminth 24/8, 2021 at 10:12

Something like this works:

import numpy as np
a = np.random.rand(4,4)
tuples = [(ix,iy, a[ix,iy]) for ix, row in enumerate(a) for iy, i in enumerate(row)]
sorted(tuples,key=lambda x: x[2])[:10]

Where k=10 ([:10]) from your question.

If you only want the upper triangular elements you can add a condition to the list comprehension:

a = np.random.rand(4,4)
tuples = [(ix,iy, a[ix,iy]) for ix, row in enumerate(a) for iy, i in enumerate(row) if ix<=iy]
sorted(tuples,key=lambda x: x[2])

Tinhorn answered 1/2, 2017 at 1:44 Comment(0)

If my np.array() is n I could get the n smallest values from it by flattening it (with *np.ndenumerate()), and using the heapq module's .heapify() and .smallest() methods like so:

#!python
flattened = [(y,x) for x,y in np.ndenumerate(n)]
# tuples reversed for natural sorting on values rather than co-ords
heapq.heapify(flattened)
results = heapq.nsmallest(4, flattened)

But this will use plenty of extra memory and will extract the data and co-ordinates out of Numpy's efficient arrays into Python's native lists. So there's probably much better ways to do it more natively in Python.

Vanda answered 1/2, 2017 at 2:40 Comment(2)

I tried this but if the matrix is huge its really slow because of the loop – See 1/2, 2017 at 3:16

Exactly as I said, Thus my other suggestion, https://mcmap.net/q/53753/-how-do-i-get-indices-of-n-maximum-values-in-a-numpy-array ... bottleneck is a compiled extension to Numpy for partial sorting. – Vanda 2/2, 2017 at 5:53

Recommended topics

Hot tags