Efficient way of computing second min value
Asked Answered
D

4

6

Given a matrix, it's easy to compute the value and index of the min value:

A = rand(10);
[value, index] = min(A(:));

However I would also like to recover the second min value (idem for max).

I can of course take any of this two approaches:

  1. Converting A to a vector and sorting it.

    PROS: I can then recover the second, third... n minimum value

    CONS: If A is large, sorting is expensive

  2. Once the min location of A is located, I can replace this value by a large one (eg: Inf) and then run min again.

    PROS: Cheaper than sort

    CONS: I must modify my matrix (and save the modified value in an aux variable). Also re-running min is costly on a large matrix.

I'm wondering if there is a better solution:

When computing min the algorithm has to keep track of the min value found so far, until a new value has a lower value (then we update the value). If instead we keep track of the last n min values found so far will allow to recover the minimum n values.

I can implement this, but I'm wondering if it's the best approach or if it's already implemented.

Diaz answered 27/2, 2017 at 10:58 Comment(5)
if you use sort you can have your 2x lowest min and 2x largest max in one single expensive sort operation. Otherwise you have to run 2x min and 2x max. Make a test and find which approach is less expensive.Evangelicalism
True, but if min is able to keep track of the n lower values it won't be expensive at all. I think this would be a nice feature. You can give how many min values you want to recover as a optional parameter.Diaz
How big are we talking? sort is quite fast. Also, you can always (if you have it) gpuarray it. Reduces are very fast in GPUsTrap
Have a look at thisSweet
Also check Bruno's solutions (one is mex): mathworks.com/matlabcentral/newsreader/view_thread/309300Disallow
R
3

I don't know in which case it would be less expensive than sorting, but an easy, but not so fast way would be to use the following code. I may be wrong, but I don't think you can get faster with build-in functions if you just want the first and the second min.

A = rand(10);
[firstMin, firstMinIndex] = min(A(:));
secondMin = min(A(A~=firstMin));
secondMinIndex = find(A==secondMin); % slow, but use only if you need the index

Here, you go through the matrix two times more, one for the boolean operation, and one for the second min.

After some testing on 2000x2000 and 4000x4000 random matrix, it seems that this code snipset is around 3.5 time faster than the sort function applied on the same matrix.

If you really need more efficiency, you'd have to write your own mex routine, with which you can theoretically get the two values in n+log n-2 comparison, as explained in the link provided by @luismendotomas.

Hope this help !

Ruel answered 27/2, 2017 at 13:10 Comment(3)
If you want only the 2nd min, you can use the index from the first call without going again through the matrix, like this: [2nd_min,2nd_ind]=min(A(setdiff(1:numel(A),1st_ind)));. but if you want n minimum values it gets more complicated, and you better use an appropriate algorithm...Elevenses
Adiel, In principle you should be right. But in practice matlab is way faster with logical indexing.... setdiff is also rather slow. On my computer logical indexing around 6 times faster than the code you propose !Ruel
Ok, nice to know.Elevenses
T
0

In a single pass:

a = [53 53 49 49 97 75 4 22 4 37];

first = Inf;
second = Inf;

for i = 1:1:numel(a)
    if (a(i) < first)
        second = first;
        first = a(i);
    elseif (a(i) < second && a(i) ~= first)
        second = a(i);
    end
end

fprintf('First smallest %d\n', first);
fprintf('Second smallest %d\n', second);

You can remove the a(i) ~= first condition if you rather have 4, 4 as output instead of 4, 23

Also, see this SO question

Tanbark answered 27/2, 2017 at 14:17 Comment(1)
Yes, this is what I proposed at the end of the question, but I wanted to avoid using loops since they are slow.Diaz
F
0

As already mentioned I suppose the best (read: "most efficient") method is to implement the methods from @luismendotomas link.

However, if you want to avoid doing too much programming yourself, then you could apply some k-nearest neighbours algorithm, given you have a lower bound on your data, e.g. if all your data points are positive, you can find the 2 nearest neighbours to 0. Though I am not sure whether this is faster than your initial suggestions or not.

For one k-nearest neighbour algorithm see e.g. this

Fungi answered 27/2, 2017 at 14:22 Comment(0)
T
0

beesleep has already pointed out that method 2 (by computing the minimum twice) is more efficient that method 1 (by sorting). However the implementation provided in the answer to compute the index of the second minimum via find is, as mentioned, very inefficient.

In fact, to get the index of the second minimum, it is ca. 10x faster to set the first minimum value to inf (as suggested in the question) and then get the index of the second minimum from the min function (as opposed to using find)

[firstMin, firstMinIndex] = min(A(:));
A(firstMinIndex) = inf;
[secondMin, secondMinIndex] = min(A(:));

Here is the code which I used to compare this implementation to the one suggested by beesleep:

for i = 1:10

    A = rand(10000);

    tic
    [firstMin, firstMinIndex] = min(A(:));
    secondMin = min(A(A~=firstMin));
    secondMinIndex = find(A==secondMin); % slow, but use only if you need the index
    t1(i) = toc;

    tic
    [firstMin, firstMinIndex] = min(A(:));
    A(firstMinIndex) = inf;
    [secondMin, secondMinIndex] = min(A(:));
    t2(i) = toc;

end

disp(mean(t1) / mean(t2))
Telethermometer answered 3/3, 2017 at 2:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.