The other answers are good. I wanted to add a little about how minmax_element necessarily works, which also helps to explain why it (usually) performs better than running min_element and max_element separately, and to talk about some specific cases where it doesn't perform better.
If we think about a naive implementation, you would maintain a min-value and a max-value (and their corresponding iterators) and simply iterate through the range, comparing each value with both and adjusting either as necessary. However, this gives a total of 2N comparisons; while it might well perform better than iterating through the range twice (due to better locality), the specification calls for roughly 3/2 N comparisons (precisely, at most max(⌊3(N−1)/2⌋, 0)). How is that possible, then?
It works by processing pairs rather than individual items. Taking the first two items in the range (#0 and #1), we compare them and assign the larger to max-value and the smaller to min-value. Then we compare the next two items (#2 and #3) to decide which of them is larger; we compare the larger one to max-value and the smaller one to min-value, updating max-value/min-value as necessary. We then repeat this process with each additional pair (#4 and #5, then #6 and #7, and so on).
So, each pair requires three comparisons: with each other, then the larger with the current maximum, and the smaller with the current minimum. This cuts the number of comparisons required down to roughly 3/2 N.
As per comments below, however, it should be noted that this "improved" algorithm tends to perform worse on modern processors than the naive version when comparison is cheap for the element type (or comparator) - notably, with a range over a vector<int> or similar: the comparison between the two elements of each pair has an unpredictable result, leading to branch-prediction failures in the processor (though this only applies if the data is more-or-less randomly ordered); current compilers will not always convert such branches into conditional transfers, as they potentially could. The more complex algorithm is also harder for a compiler to vectorise.
In theory, I think, a C++ library implementation could provide an overload of the minmax_element function that used the naive algorithm for primitive element types (int etc.) with the default comparator. While the standard mandates a bound on the number of comparisons, if the effect of those comparisons cannot be observed then the number actually performed is unimportant, so long as the time complexity is the same (which it is - O(N) in both cases). However, while this might give better performance with randomly ordered data, it might produce worse performance when the data is ordered.
With all the above in mind, a simple test case (below) shows an interesting result: for randomly ordered data, using min_element and max_element separately can in fact be slightly faster than using minmax_element. However, for sorted data, minmax_element is much faster than using min_element and max_element separately. On my system (Haswell processor), a sample run of the program below (compiled with gcc -O3 -std=c++11 -march=native, GCC version 5.4) shows 692 milliseconds for min/max separately and 848 for minmax combined. There is of course some variation between runs, but these values seem typical.
Note that:
- the difference in performance is small enough that it is unlikely to be a dominating factor in a real-world program;
- the difference depends on compiler optimisation, and in the future the results may well reverse;
- for more complex data types (or rather, for more complex comparators) the results will likely reverse, since in that case performing fewer comparisons becomes a significant improvement;
- when the sample data is ordered instead of random (replace v.push_back(r(gen)) with v.push_back(i) in the program below) the performance is very different: separate min/max takes around 728 milliseconds, whereas combined minmax goes down to 246 milliseconds.
The code:
#include <iostream>
#include <vector>
#include <algorithm>
#include <random>
#include <chrono>
constexpr int numEls = 100000000;
void recresult(std::vector<int> *v, int min, int max)
{
    // Make sure the compiler doesn't optimize out the values:
    __asm__ volatile (
        ""
        :
        : "rm"(v), "rm"(min), "rm"(max)
    );
}

int main(int argc, char **argv)
{
    using namespace std;

    std::mt19937 gen(0);
    uniform_int_distribution<> r(0, 100000);

    vector<int> v;
    for (int i = 0; i < numEls; i++) {
        v.push_back(r(gen));
    }

    // run once for warmup
    int min = *min_element(v.begin(), v.end());
    int max = *max_element(v.begin(), v.end());
    recresult(&v, min, max);

    // min/max separately:
    {
        auto starttime = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 5; i++) {
            int min = *min_element(v.begin(), v.end());
            int max = *max_element(v.begin(), v.end());
            recresult(&v, min, max);
        }
        auto endtime = std::chrono::high_resolution_clock::now();
        auto millis = std::chrono::duration_cast<std::chrono::milliseconds>(endtime - starttime).count();
        cout << "min/max element: " << millis << " milliseconds." << endl;
    }

    // run once for warmup
    auto minmaxi = minmax_element(v.begin(), v.end());
    recresult(&v, *(minmaxi.first), *(minmaxi.second));

    // minmax together:
    {
        auto starttime = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 5; i++) {
            minmaxi = minmax_element(v.begin(), v.end());
            recresult(&v, *(minmaxi.first), *(minmaxi.second));
        }
        auto endtime = std::chrono::high_resolution_clock::now();
        auto millis = std::chrono::duration_cast<std::chrono::milliseconds>(endtime - starttime).count();
        cout << "minmax element: " << millis << " milliseconds." << endl;
    }

    return 0;
}