Performance of std::partial_sort() versus std::sort() when sorting the whole range?

Asked 2/8, 2017 at 8:21 Answered 2/8, 2017 at 10:8

Solved c++performance sorting partial-sort

Is there a significant difference between the following two approaches? Way 1 uses sort or partial_sort, depending on the size of the vector while way 2 always uses partial_sort. I find way 2 more appealing because my predicate is a bit more complicated than in the example, so I don't want to repeat it. But I wonder if partial_sort performs worse than sort because it is not meant to be used to sort the whole range, which is why I tend to use way 1.

int main()
{
  std::vector<double> vec;
  vec.push_back(1.0);
  vec.push_back(3.0);
  vec.push_back(2.0);
  vec.push_back(5.0);
  vec.push_back(4.0);
  vec.push_back(9.0);
  const size_t numBest = 3;
  const size_t numTotal= vec.size();

#if WAY1
  if (numTotal < numBest)
  {
    std::sort(vec.begin(), vec.end(), std::not2(std::less<double>()));
  }
  else
  {
    std::partial_sort(vec.begin(), vec.begin() + numBest, vec.end(), std::not2(std::less<double>()));
    vec.resize(numBest);
  }
#elif WAY2
  {
    const size_t numMiddle = numTotal < numBest ? numTotal : numBest;
    std::partial_sort(vec.begin(), vec.begin() + numMiddle, vec.end(), std::not2(std::less<double>()));
    vec.resize(numMiddle);
  }
#endif

  // now vec contains the largest numBest results.
  return 0;
}

Some testing yielded that partial_sort is significantly worse (factor of 4 in my usecase) than sort if if has to sort the whole range. This indicates that way 1 is to be preferred. It seems that partial_sort is only meant for sorting a small fraction of the whole range. I tested in Visual Studio 2010.

Koss answered 2/8, 2017 at 8:21 Comment(1)

1) You don't have to repeat the predicate, just store it in some variable. 2) Use the simpler, more readable version and complain to your compiler vendor if that's slower. 3) visual studio 2010 is not really recent... – Lowermost 2/8, 2017 at 8:35

According to sgi doc, partial_sort uses heapsort, sort uses introsort:

partial_sort(first, last, last) has the effect of sorting the entire range [first, last), just like sort(first, last). They use different algorithms, however: sort uses the introsort algorithm (a variant of quicksort), and partial_sort uses heapsort. See section 5.2.3 of Knuth (D. E. Knuth, The Art of Computer Programming. Volume 3: Sorting and Searching. Addison-Wesley, 1975.), and J. W. J. Williams (CACM 7, 347, 1964). Both heapsort and introsort have complexity of order N log(N), but introsort is usually faster by a factor of 2 to 5.

So, it is normal partial_sort is 4 times slower than sort.

I have checked my VS2017 library, and found the implementation of partial_sort and sort. And it is similar with SGI.

partial_sort

template<class _RanIt,
    class _Pr> inline
void _Partial_sort_unchecked(_RanIt _First, _RanIt _Mid, _RanIt _Last,
        _Pr& _Pred)
{       // order [_First, _Last) up to _Mid, using _Pred
    if (_First == _Mid)
        return; // nothing to do, avoid violating _Pop_heap_hole_unchecked preconditions
    _Make_heap_unchecked(_First, _Mid, _Pred);
    for (_RanIt _Next = _Mid; _Next < _Last; ++_Next)
        if (_DEBUG_LT_PRED(_Pred, *_Next, *_First))
        {       // replace top with new largest
            _Iter_value_t<_RanIt> _Val = _STD move(*_Next);
            _Pop_heap_hole_unchecked(_First, _Mid, _Next, _STD move(_Val), _Pred);
        }
    _Sort_heap_unchecked(_First, _Mid, _Pred);
}

sort

template<class _RanIt,
    class _Diff,
    class _Pr> inline
void _Sort_unchecked1(_RanIt _First, _RanIt _Last, _Diff _Ideal, _Pr& _Pred)
{       // order [_First, _Last), using _Pred
    _Diff _Count;
    while (_ISORT_MAX < (_Count = _Last - _First) && 0 < _Ideal)
    {   // divide and conquer by quicksort
        pair<_RanIt, _RanIt> _Mid =
            _Partition_by_median_guess_unchecked(_First, _Last, _Pred);
        _Ideal /= 2, _Ideal += _Ideal / 2;      // allow 1.5 log2(N) divisions

        if (_Mid.first - _First < _Last - _Mid.second)
        {       // loop on second half
            _Sort_unchecked1(_First, _Mid.first, _Ideal, _Pred);
            _First = _Mid.second;
        }
        else
        {       // loop on first half
            _Sort_unchecked1(_Mid.second, _Last, _Ideal, _Pred);
            _Last = _Mid.first;
        }
    }

    if (_ISORT_MAX < _Count)
    {   // heap sort if too many divisions
        _Make_heap_unchecked(_First, _Last, _Pred);
        _Sort_heap_unchecked(_First, _Last, _Pred);
    }
    else if (2 <= _Count)
        _Insertion_sort_unchecked(_First, _Last, _Pred);        // small
}

Dulcinea answered 2/8, 2017 at 8:25 Comment(5)

Interesting - I wonder if this is true across the board (i.e. all implementations). I recall a talk about variations between sort's performance between compilers. – Pecoraro 2/8, 2017 at 8:31

Except whatever the old SGI implementation of the STL does isn't necessarily the same as standard library used by OP's implementation. – Expeditionary 2/8, 2017 at 8:45

I find it strange that a compiler vendor does not check for partial_sort(first, last, last) and uses sort in this case. Presumably, they want to avoid the branching. – Koss 2/8, 2017 at 8:48

Actually, heap-sort is a suboptimal choice for partial_sort: It's inner loop is roughly twice as long as the inner loop of quicksort, and half of its work is expended into building the heap, which includes all the elements in the entire range. I. e., the time a heap-sort base partial_sort takes is roughly (1 + n/N)*T where T is the time quicksort takes to sort the entire range, n is the length of the output range, and N is the length of the input range. It would have been better to just sort the entire range with quicksort instead. – Billi 2/8, 2017 at 11:23

sgi doc link takes me to HP site? – Loaf 18/1, 2023 at 4:22

Nothing requires partial_sort to be implemented in a certain way, except the guarantees of complexity

25.4.1.3 partial_sort [partial.sort]

template void partial_sort(RandomAccessIterator first, RandomAccessIterator middle, RandomAccessIterator last); template void partial_sort(RandomAccessIterator first, RandomAccessIterator middle, RandomAccessIterator last, Compare comp);

1 Effects: Places the first middle - first sorted elements from the range [first,last) into the range [first,middle). The rest of the elements in the range [middle,last) are placed in an unspecified order.

2 Requires: RandomAccessIterator shall satisfy the requirements of ValueSwappable (17.6.3.2). The type of *first shall satisfy the requirements of MoveConstructible (Table 20) and of MoveAssignable (Table 22).

3 Complexity: It takes approximately (last - first) * log(middle - first) comparisons

An alternative implementation could be

std::nth_element - average linear time
followed by
std::sort - on the reduced range begin()-nth (n log n)

Besmear answered 2/8, 2017 at 10:8 Comment(2)

Why do you suppose heap sort is chosen over this alternative? If std::nth_element is fast, it seems that this would be simpler. Also this shows that std::partial_sort is not orthogonal. – Loaf 18/1, 2023 at 4:19

@Loaf heap_sort is slower than intro_sort, but if you just want to get the K highest sorted heap_sort would be faster. There is a cppcon(?) video on this topic that is worth seeing. – Besmear 19/1, 2023 at 17:27

partial_sort

sort

Recommended topics

Hot tags