Find largest and second largest element in a range
Asked Answered
B

12

8

How do I find the above without removing the largest element and searching again? Is there a more efficient way to do this? It does not matter if the these elements are duplicates.

Besom answered 11/9, 2009 at 19:7 Comment(0)
R
23
for (e: all elements) {
 if (e > largest) {
   second = largest;
   largest = e;
 } else if (e > second) {
   second = e;
 }
}

You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check if the list has at least two items)

Ridgepole answered 11/9, 2009 at 19:13 Comment(4)
Simple to code ... what more can you ask for? Admittedly doing a search is less "complex" but will almost certainly be slower due to memory access fun.Grogshop
sorting is per definition slower, because you have to inspect elements at least O(n log n) times. This is guaranteed to only inspect elements O(n) times (where n is #elements of course). Only if the operation is to be repeated on the same list sorting becomes more efficient.Ridgepole
This is what I ended up implementing (should've probably mentioned that in the OP) but I was looking for something different (and possibly faster).Besom
Sorting can be done in less lines, and retrieving largest/smallest elements is very efficient on an already sorted list, but you'll have to pay a larger cost up front to get it sorted. This however, is the most efficient possible solution if you only ever search for a set (in this case 2) number of largest items.Ridgepole
P
23

using partial_sort ?

std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), Functor);

An Example:

std::vector<int> aTest;

    aTest.push_back(3);
    aTest.push_back(2);
    aTest.push_back(4);
    aTest.push_back(1);


    std::partial_sort(aTest.begin(), aTest.begin()+2,aTest.end(), std::greater<int>());

    int Max = aTest[0];
int SecMax = aTest[1];
Phage answered 11/9, 2009 at 19:12 Comment(8)
+1 I didn't know about partial_Sort. BUT doesn't work if he wants to keep the order of the original list.Eddra
You can use partial_sort_copy for thatPhage
won't this get you the smallest elements? It's easy enough to fix by using a different comparison operator. +1Conference
And if you're streaming a large number of values through, say from an input_iterator and don't want to store the entire range, only the top elements, then the heap operations -- make_heap, push_heap and sort_heap -- are the way to go.Amerigo
What's the difference from just std::sort(aTest.begin(), aTest.end());? aTest[0] - largest, aTest[1] - second largest.Grus
partial_sort has the special ability to stop sorting when only the first n elements need to be sorted. (Second argument)Phage
Any idea about the time complexity?Besom
From Josuttis --- Partial_sort() guarantees n * log(n) complexity in any case. Normally between linear and n-log-n (approximately numberOfElements*log(numberOfSortedElements) comparisons).Phage
R
23
for (e: all elements) {
 if (e > largest) {
   second = largest;
   largest = e;
 } else if (e > second) {
   second = e;
 }
}

You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check if the list has at least two items)

Ridgepole answered 11/9, 2009 at 19:13 Comment(4)
Simple to code ... what more can you ask for? Admittedly doing a search is less "complex" but will almost certainly be slower due to memory access fun.Grogshop
sorting is per definition slower, because you have to inspect elements at least O(n log n) times. This is guaranteed to only inspect elements O(n) times (where n is #elements of course). Only if the operation is to be repeated on the same list sorting becomes more efficient.Ridgepole
This is what I ended up implementing (should've probably mentioned that in the OP) but I was looking for something different (and possibly faster).Besom
Sorting can be done in less lines, and retrieving largest/smallest elements is very efficient on an already sorted list, but you'll have to pay a larger cost up front to get it sorted. This however, is the most efficient possible solution if you only ever search for a set (in this case 2) number of largest items.Ridgepole
L
7

nth_element(begin, begin+n,end,Compare) places the element that would be nth (where "first" is "0th") if the range [begin, end) were sorted at position begin+n and makes sure that everything from [begin,begin+n) would appear before the nth element in the sorted list. So the code you want is:

nth_element(container.begin(),
            container.begin()+1,
            container.end(),
            appropriateCompare);

This will work well in your case, since you're only looking for the two largest. Assuming your appropriateCompare sorts things from largest to smallest, the second largest element with be at position 1 and the largest will be at position 0.

Lee answered 11/9, 2009 at 19:27 Comment(0)
B
2

Lets assume you mean to find the two largest unique values in the list.

If the list is already sorted, then just look at the second last element (or rather, iterate from the end looking for the second last value).

If the list is unsorted, then don't bother to sort it. Sorting is at best O(n lg n). Simple linear iteration is O(n), so just loop over the elements keeping track:

v::value_type second_best = 0, best = 0;
for(v::const_iterator i=v.begin(); i!=v.end(); ++i)
   if(*i > best) {
     second_best = best;
     best = *i;
   } else if(*i > second_best) {
     second_best = *i;
   }

There are of course other criteria, and these could all be put into the test inside the loop. However, should you mean that two elements that both have the same largest value should be found, you have to consider what happens should three or more elements all have this largest value, or if two or more elements have the second largest.

Baal answered 11/9, 2009 at 19:13 Comment(3)
Except that this does not handle duplicates in the listTorques
Need an else block if second_best < *i < bestVermicelli
I don't need to find unique values, duplicates are fine - I've updated this in the question.Besom
C
2

The optimal algorithm shouldn't need more than 1.5 * N - 2 comparisons. (Once we've decided that it's O(n), what's the coefficient in front of N? 2 * N comparisons is less than optimal).

So, first determine the "winner" and the "loser" in each pair - that's 0.5 * N comparisons.

Then determine the largest element by comparing winners - that's another 0.5 * N - 1 comparisons.

Then determine the second-largest element by comparing the loser of the pair where the largest element came from against the winners of all other pairs - another 0.5 * N - 1 comparisons.

Total comparisons = 1.5 N - 2.

Conciseness answered 11/9, 2009 at 19:39 Comment(0)
V
1

The answer depends if you just want the values, or also iterators pointing at the values.

Minor modification of @will answer.

v::value_type second_best = 0, best = 0;
for(v::const_iterator i=v.begin(); i!=v.end(); ++i)
{
   if(*i > best)
   {
     second_best = best;
     best = *i;
   }
   else if (*i > second_best)
   {
     second_best = *i;
   }
}
Vermicelli answered 11/9, 2009 at 19:25 Comment(0)
E
0

Create a sublist from n..m, sort it descending. Then grab the first two elements. Delete these elements from the orginal list.

Eddra answered 11/9, 2009 at 19:11 Comment(0)
E
0

You can scan the list in one pass and save the 1st and 2nd values, that has a O(n) efficiency while sorting is O(n log n).

EDIT:
I think that partial sort is O(n log k)

Excusatory answered 11/9, 2009 at 19:18 Comment(0)
M
0

Untested but fun:

template <typename T, int n>
class top_n_functor : public unary_function<T, void>
{

  void operator() (const T& x) {
     auto f = lower_bound(values_.begin(), values_.end(), x);

     if(values_.size() < n) {
         values_.insert(f, x);
         return;
     }

     if(values_.begin() == f)
          return;

     auto removed = values_.begin();
     values_.splice(removed, values_, removed+1, f);

     *removed = x;
  }

  std::list<T> values() {
     return values_;
  }
private:
   std::list<T> values_;
};

int main()
{
  int A[] = {1, 4, 2, 8, 5, 7};
  const int N = sizeof(A) / sizeof(int);

  auto vals = for_each(A, A + N, top_n_functor<int,2>()).values();

  cout << "The top is " << vals.front()
       << " with second place being " << *(vals.begin()+1) << endl;
}
Minuscule answered 11/9, 2009 at 19:40 Comment(0)
T
0

If the largest is the first element, search for the second largest in [largest+1,end). Otherwise search in [begin,largest) and [largest+1,end) and take the maximum of the two. Of course, this has O(2n), so it's not optimal.

If you have random-access iterators, you could do as quick sort does and use the ever-elegant recursion:

template< typename T >
std::pair<T,T> find_two_largest(const std::pair<T,T>& lhs, const std::pair<T,T>& rhs)
{
  // implementation finding the two largest of the four values left as an exercise :) 
}

template< typename RAIter >
std::pair< typename std::iterator_traits<RAIter>::value_type
         , typename std::iterator_traits<RAIter>::value_type > 
find_two_largest(RAIter begin, RAIter end)
{
  const ptr_diff_t diff = end-begin;
  if( diff < 2 )
    return std::make_pair(*begin, *begin);
  if( diff < 3 )
    return std::make_pair(*begin, *begin+1);
  const RAIter middle = begin + (diff)/2;
  typedef std::pair< typename std::iterator_traits<RAIter>::value_type
                   , typename std::iterator_traits<RAIter>::value_type > 
    result_t;
  const result_t left = find_two_largest(begin,middle);
  const result_t right = find_two_largest(middle,end);

  return find_two_largest(left,right);
}

This has O(n) and shouldn't make more comparisons than NomeN's implementation.

Trinitrophenol answered 11/9, 2009 at 20:1 Comment(0)
R
0

top k is usually a bit better than n(log k)

 template <class t,class ordering>
 class TopK {
 public:
    typedef std::multiset<t,ordering,special_allocator> BEST_t;
    BEST_t best;
    const size_t K;
    TopK(const size_t k)
        : K(k){
    } 
    const BEST_t& insert(const t& item){
        if(best.size()<k){
            best.insert(item);
            return best;
        }
        //k items in multiset now
        //and here is why its better - because if the distribution is random then
        //this and comparison above are usually the comparisons that is done; 
        if(compare(*best.begin(),item){//item better than worst
           erase(begin());//the worst
           best.insert(item); //log k-1 average as only k-1 items in best
        } 
        return best;
    } 
    template <class it>
    const BEST_t& insert(it i,const it last){
        for(;i!=last;++i){
            insert(*i);    
        }
        return best;
    }
  };

Of course the special_allocator can in essence be just an array of k multiset value_types and a list of those nodes (which typically has nothing on it as the other k are in use in the multiset until its time to put a new one in and we erase and then immediate ly reuse it. Good to have this or else the memory alloc/free in std::multiset and the cache line crap kills ya. Its a (very) tiny bit of work to give it static state without violating STL allocator rules.

Not as good as a specialized algo for exactly 2 but for fixed k<<n, I would GUESS (2n+delta*n) where delta is small - my DEK ACP vol3 S&S is packed away and an estimate on delta is a bit more work that I want to do.

average worst is I would guess n(log(k-1) + 2) when in opposite order and all distinct.

best is 2n + k(log k) for the k best being the first

Romanaromanas answered 25/9, 2009 at 20:50 Comment(0)
P
0

I think you could implement a custom array and overload the indexed get/set methods of elements. Then on every set call, compare the new value with two fields for the result. While this makes setter slower, it benefits from caching or even registers. Then its a no op to get the result. This must be faster if you populate array only once per finding maximums. But if array is modified frequently, then it is slower.

If array is used in vectorized loops, then it gets harder to implement as you have to use avx/sse optimized max methods inside setter.

Pupil answered 1/3, 2022 at 13:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.