Is there a way to get the top k values per row of a numpy array (Python)?

Given a numpy array of the form below:

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]

Is there a way to retain the top-3 values in each row and set the others to zero in Python, without an explicit loop? For the example above, the result would be:

x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]

My code for a single 1-D example:

import numpy as np
arr = np.array([1.2, 3.1, 0., 9.2, 5.5, 3.2])
indexes = arr.argsort()[-3:][::-1]    # indices of the 3 largest values
a = list(range(6))
A = set(indexes); B = set(a)
zero_ind = B.difference(A)            # every other index
arr[list(zero_ind)] = 0

The output:

array([0. , 0. , 0. , 9.2, 5.5, 3.2])

Above is my sample code (with many lines) for a 1-D numpy array. Looping through each row of a numpy array and performing this same computation repeatedly would be quite expensive. Is there a simpler way?

Albacore answered 19/12, 2019 at 2:29 Comment(2)
What is the problem? Where is your code? - Semitic
Does the following help? #13070961 - Java

Here is a fully vectorized solution that needs nothing beyond numpy. It uses numpy's argpartition to find each row's k-th largest value efficiently, without a full sort. See for instance this answer for other use cases.

import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # set the masked values to 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
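As a quick check, the same argpartition idea can be run on the array from the question. The sketch below is self-contained; the name keep_top_k is only for illustration and is not part of the answer. It also shows that values tied with the k-th largest are all kept.

```python
import numpy as np

def keep_top_k(x, k):
    """Return a copy of 2-D x with everything below each row's k-th largest value set to 0."""
    # the last k columns of argpartition hold the (unsorted) indices of the k largest entries
    topk_indices = np.argpartition(x, -k, axis=1)[:, -k:]
    # the smallest of those k values is the per-row threshold
    kth_vals = np.take_along_axis(x, topk_indices, axis=1).min(axis=1)
    return np.where(x < kth_vals[:, None], 0, x)

x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])
result = keep_top_k(x, 3)
print(result)

# values tied with the k-th largest are all retained, so this row keeps three entries for k=2
tied = keep_top_k(np.array([[1., 3., 3., 2., 4.]]), 2)
print(tied)
```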
Nicolnicola answered 19/12, 2019 at 7:28 Comment(0)

Use np.apply_along_axis to apply a function to each 1-D slice along a given axis:

import numpy as np

def top_k_values(array):
    indexes = array.argsort()[-3:][::-1]
    A = set(indexes)
    B = set(list(range(array.shape[0])))
    array[list(B.difference(A))]=0
    return array

arr = np.array([[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]])
result = np.apply_along_axis(top_k_values, 1, arr)
print(result)

Output

[[4.  3.  0.  0.  8. ]
 [0.  3.1 0.  9.2 5.5]
 [0.  7.  4.4 0.  1.3]]
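Note that np.apply_along_axis still calls the Python function once per row, so it is a convenience rather than a speed-up over a loop. A minimal sketch of the same idea with k passed as a parameter and without modifying the input (the helper name keep_top_k_1d is an assumption for illustration):

```python
import numpy as np

def keep_top_k_1d(row, k):
    """Return a copy of a 1-D array with all but its k largest entries set to 0."""
    out = np.zeros_like(row)
    top = np.argsort(row)[-k:]      # positions of the k largest values
    out[top] = row[top]
    return out

arr = np.array([[4., 3., 2., 1., 8.],
                [1.2, 3.1, 0., 9.2, 5.5],
                [0.2, 7.0, 4.4, 0.2, 1.3]])
# extra positional arguments after the array are forwarded to the function
result = np.apply_along_axis(keep_top_k_1d, 1, arr, 3)
print(result)
```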
Imogen answered 19/12, 2019 at 3:32 Comment(0)
import numpy as np

def top_k(arr, k, axis=0):
    n = arr.shape[axis]
    # indices of the top k values along axis (unsorted)
    top_k_idx = np.take(np.argpartition(arr, -k, axis=axis),
                        np.arange(n - k, n),
                        axis=axis)
    out = np.zeros_like(arr)                        # create zero array
    np.put_along_axis(out, top_k_idx,               # put idx values of arr in out
                      np.take_along_axis(arr, top_k_idx, axis=axis),
                      axis=axis)
    return out

This should work for an arbitrary axis and k, but it does not operate in place. An in-place version is a bit simpler:

def top_k(arr, k, axis=0):
    # indices of everything outside the top k along axis
    remove_idx = np.take(np.argpartition(arr, -k, axis=axis),
                         np.arange(arr.shape[axis] - k),
                         axis=axis)
    np.put_along_axis(arr, remove_idx, 0, axis=axis)  # put 0 at those indices, in place
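To illustrate how the axis argument changes the result, here is a self-contained sketch of the in-place idea applied to the question's array, once along rows and once along columns. The function name zero_all_but_top_k is hypothetical, and np.take is used here as one way to slice the partitioned indices along the chosen axis.

```python
import numpy as np

def zero_all_but_top_k(arr, k, axis=0):
    """Set everything except the k largest entries along `axis` to 0, in place."""
    part = np.argpartition(arr, -k, axis=axis)
    # the first n - k positions along `axis` hold the indices of the values to drop
    remove_idx = np.take(part, np.arange(arr.shape[axis] - k), axis=axis)
    np.put_along_axis(arr, remove_idx, 0, axis=axis)

x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])

by_row = x.copy()
zero_all_but_top_k(by_row, 3, axis=1)   # top 3 within each row

by_col = x.copy()
zero_all_but_top_k(by_col, 1, axis=0)   # largest value within each column
print(by_row)
print(by_col)
```

Working on copies keeps the original x intact, since the function modifies its argument.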
Ladin answered 19/12, 2019 at 8:43 Comment(0)

Here is an alternative that uses a list comprehension to loop through the rows of your array, applying the keep_top_3 function to each:

import numpy as np
import heapq

def keep_top_3(arr): 
    smallest = heapq.nlargest(3, arr)[-1]  # find the top 3 and use the smallest as cut off
    arr[arr < smallest] = 0 # replace anything lower than the cut off with 0
    return arr 

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
result = [keep_top_3(np.array(arr)) for arr in x]
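The list comprehension returns a list of 1-D arrays. Assuming a single 2-D array is wanted, the rows can be stacked with np.vstack. A minimal self-contained sketch (the variable name cutoff is ours):

```python
import heapq
import numpy as np

def keep_top_3(arr):
    cutoff = heapq.nlargest(3, arr)[-1]   # third-largest value in the row
    arr[arr < cutoff] = 0                 # zero everything below the cutoff
    return arr

x = [[4., 3., 2., 1., 8.], [1.2, 3.1, 0., 9.2, 5.5], [0.2, 7.0, 4.4, 0.2, 1.3]]
# build one array per row, then stack them into a single 2-D array
result = np.vstack([keep_top_3(np.array(row)) for row in x])
print(result)
```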

I hope this helps :)

Adscititious answered 19/12, 2019 at 3:15 Comment(0)

I was led here looking for a function that breaks ties, so that exactly k values are kept per row even when several are equal. The top-2 result for the input below would be:

input: [[1, 4, 0, 4, 0],
        [1, 0, 2, 1, 3],
        [1, 0, 1, 2, 1],
        [1, 3, 3, 2, 4],
        [4, 0, 3, 1, 0],
        [2, 3, 1, 4, 3],
        [4, 1, 0, 0, 4],
        [0, 2, 0, 1, 0],
        [2, 2, 1, 3, 2],
        [0, 2, 0, 1, 1]]

output: [[0., 4., 0., 4., 0.],
         [0., 0., 2., 0., 3.],
         [0., 0., 0., 2., 1.], # <- notice 1's set to 0
         [0., 3., 0., 0., 4.],
         [4., 0., 3., 0., 0.],
         [0., 3., 0., 4., 0.],
         [4., 0., 0., 0., 4.],
         [0., 2., 0., 1., 0.],
         [0., 0., 0., 3., 2.],
         [0., 2., 0., 0., 1.]]

I slightly modified the solution from Emile, which gives:

import numpy

def truncate_top_k(x, k):
    m, n = x.shape
    # get (unsorted) indices of top-k values; ties are broken by argpartition
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # row index for each selected column index
    rows, _ = numpy.indices((m, k))
    # mask out the unselected indices
    mask = numpy.zeros((m, n))
    mask[rows, topk_indices] = 1
    # multiplying by the float mask returns a float array
    x = x * mask
    return x
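A quick way to see the difference from the threshold-based version: with ties broken, each row keeps exactly k positions. Which of the tied entries survives depends on argpartition and should be treated as arbitrary. The sketch below is self-contained; the name keep_exactly_k is hypothetical, and it uses a boolean mask with np.where instead of multiplication, which is a swapped-in variation rather than the code above.

```python
import numpy as np

def keep_exactly_k(x, k):
    """Keep exactly k entries per row (ties broken arbitrarily) and zero the rest."""
    topk_indices = np.argpartition(x, -k, axis=1)[:, -k:]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, topk_indices, True, axis=1)   # mark the k selected positions
    return np.where(mask, x, 0)

x = np.array([[1, 3, 3, 2, 4],
              [2, 2, 1, 3, 2]])
out = keep_exactly_k(x, 2)
kept = (out != 0).sum(axis=1)   # number of surviving entries per row
print(out)
print(kept)
```

Counting nonzero entries only works as a check here because the kept values in this example are all nonzero.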
Binnie answered 22/4 at 19:24 Comment(0)
