Selecting 2 largest elements out of 4, preserving order

Asked 6/8, 2024 at 14:14 Answered 11/8, 2024 at 20:39

I need an efficient way to select 2 largest integers from an array of 4, preserving the order.
E.g. 1,4,3,5 -> 4,5; 1,5,3,4 -> 5,4

What's the efficient way to do it (C/C++)?

I.e., with minimum conditionals (comparisons), or using min/max (assuming min/max have dedicated HW instructions and are not conditionals).

In ambiguous cases like 1,3,4,3 both 3,4 and 4,3 are acceptable answers.

Photographic answered 6/8, 2024 at 14:14 Comment(10)

Do you have a specific language in mind? What about 1,3,4,3? – Weighted 6/8, 2024 at 14:27

@Weighted c/c++ (CUDA in case of GPU). 3,4 and 4,3 are acceptable answers. – Photographic 6/8, 2024 at 14:36

Do not tag both C and C++ except when asking about differences or interactions between the two languages. They are different languages, and the answers for them may be different. Pick one language and delete the other tag. – Radiothermy 6/8, 2024 at 15:14

I need an efficient way - do you have own solution which works? Did you write performance test to verify which solution is best? – Ing 6/8, 2024 at 15:25

Do you need to preserve the input array? Or is an in-place algorithm acceptable? – Carlocarload 6/8, 2024 at 16:8

In-place acceptable – Photographic 6/8, 2024 at 17:7

@Photographic Are all integer inputs known to be non-negative, or perhaps even all greater than zero? – Yablon 6/8, 2024 at 19:1

I don't get the problem... FOUR elements to iterate and store two indexes of the highest and second highest numbers? Or use just if-elses to compare four elements? Why would you even bother optimize such thing? That is no critical code... or am I missing something? Is this some kind of homework or job-interview question? – Nullification 7/8, 2024 at 1:38

@Nullification "Why would you even bother optimize such thing?" - if this is something the program does a lot and it has shown up as a bottleneck when profiling, then optimizing it seems reasonable. – Ser 7/8, 2024 at 9:48

Could you clarify what is required when the largest value appears more than once in the list. The solutions below appear to assume that the maximum value should be returned in both positions, but it wasn't 100% clear to me that was what you meant. – Straightway 7/8, 2024 at 14:7

This is summary of some others answers. I've changed code so it all have common API.

Basically when trying to provide optimal code there are two basic rules:

Code must work (so must be tested) https://godbolt.org/z/8hMKoGh5s

TEMPLATE_TEST_CASE("stable_largest_two", "[template]", 
    SimonGoater,
    trincot2,
    trincot1,
    chqrlie,
    isrnick,
    lastchance,
    TedLyngmo_1,
    TedLyngmo_2)
    /*, nicky_eyes fails) */ 
{
    SECTION("single solution")
    {
        auto [in, expected] = GENERATE(table<ArgType, ResultType>({
            { {}, {} },
            { { 0, 0, 1, 1 }, { 1, 1 } },
            { { 1, 0, 0, 1 }, { 1, 1 } },

            { { 1, 2, 3, 4 }, { 3, 4 } },
            { { 1, 2, 4, 3 }, { 4, 3 } },
            { { 1, 3, 2, 4 }, { 3, 4 } },
            { { 1, 3, 4, 2 }, { 3, 4 } },
            { { 1, 4, 2, 3 }, { 4, 3 } },
            { { 1, 4, 3, 2 }, { 4, 3 } },
            { { 2, 1, 3, 4 }, { 3, 4 } },
            { { 2, 1, 4, 3 }, { 4, 3 } },
            { { 2, 3, 1, 4 }, { 3, 4 } },
            { { 2, 3, 4, 1 }, { 3, 4 } },
            { { 2, 4, 1, 3 }, { 4, 3 } },
            { { 2, 4, 3, 1 }, { 4, 3 } },
            { { 3, 1, 2, 4 }, { 3, 4 } },
            { { 3, 1, 4, 2 }, { 3, 4 } },
            { { 3, 2, 1, 4 }, { 3, 4 } },
            { { 3, 2, 4, 1 }, { 3, 4 } },
            { { 3, 4, 1, 2 }, { 3, 4 } },
            { { 3, 4, 2, 1 }, { 3, 4 } },
            { { 4, 1, 2, 3 }, { 4, 3 } },
            { { 4, 1, 3, 2 }, { 4, 3 } },
            { { 4, 2, 1, 3 }, { 4, 3 } },
            { { 4, 2, 3, 1 }, { 4, 3 } },
            { { 4, 3, 1, 2 }, { 4, 3 } },
            { { 4, 3, 2, 1 }, { 4, 3 } },
        }));
        REQUIRE(TestType::stable_largest_two(in) == expected);
    }
    SECTION("mutiple solutions")
    {
        using Catch::Matchers::UnorderedEquals;
        auto [in, expected] = GENERATE(table<ArgType, ResultType>({
            { { 1, 3, 4, 3 }, { 3, 4 } },
            { { 3, 3, 4, 3 }, { 3, 4 } },
            { { 3, 4, 3, 1 }, { 3, 4 } },
            { { 3, 4, 1, 3 }, { 3, 4 } },
        }));
        auto result = TestType::stable_largest_two(in);
        REQUIRE_THAT((std::vector<int>{result.begin(), result.end()}), 
            UnorderedEquals(std::vector<int>{expected.begin(), expected.end()}));
    }
}

You need some measurement to make responsible decision which solution is fast:

constexpr auto TestData = std::to_array<std::array<int, 4>>({
    {},
    { 0, 0, 1, 1 },
    { 1, 0, 0, 1 },

    { 1, 2, 3, 4 },
    { 1, 2, 4, 3 },
    { 1, 3, 2, 4 },
    { 1, 3, 4, 2 },
    { 1, 4, 2, 3 },
    { 1, 4, 3, 2 },
    { 2, 1, 3, 4 },
    { 2, 1, 4, 3 },
    { 2, 3, 1, 4 },
    { 2, 3, 4, 1 },
    { 2, 4, 1, 3 },
    { 2, 4, 3, 1 },
    { 3, 1, 2, 4 },
    { 3, 1, 4, 2 },
    { 3, 2, 1, 4 },
    { 3, 2, 4, 1 },
    { 3, 4, 1, 2 },
    { 3, 4, 2, 1 },
    { 4, 1, 2, 3 },
    { 4, 1, 3, 2 },
    { 4, 2, 1, 3 },
    { 4, 2, 3, 1 },
    { 4, 3, 1, 2 },
    { 4, 3, 2, 1 },
});

template <typename Solution>
static void PerfCheck(benchmark::State& state)
{
    for (auto _ : state) {
        for (const auto& arr : TestData) {
            benchmark::DoNotOptimize(arr);
            auto result = Solution::stable_larget_two(arr);
            benchmark::DoNotOptimize(result);
        }
    }
}
BENCHMARK(PerfCheck<trincot>);
BENCHMARK(PerfCheck<lastchance>);
BENCHMARK(PerfCheck<TedLyngmo>);

Now you have tools to do more experiments and select best solution.

Ing answered 6/8, 2024 at 16:4 Comment(14)

Oups ... my solution sucks :-D Oh well, at least it's easy to maintain :-) – Ser 6/8, 2024 at 16:13

@TedLyngmo It might be result how I've adopted API of your solution. Also your solution assumes arbitrary size of input data and others used fixed size data. – Ing 6/8, 2024 at 16:21

True - I was expecting it to be a little slower than the highly specialized versions, but I wasn't expecting it to be that slow :-) Performance measurements are always key when performance matters. – Ser 6/8, 2024 at 16:24

@MarekR: can you please benchmark my solution(s)? – Headmaster 6/8, 2024 at 17:57

I've added my solution to the benchmarks: Clang-> quick-bench.com/q/03JW3Bjf_WeYGjStPYHs29NJQiI - - GCC-> quick-bench.com/q/FeFHguErPGOVwgO0fj0Tl5A4BDA - - It's slower than trincot's solution in Clang but faster in GCC. I'm not sure why... – Liquefacient 6/8, 2024 at 19:19

And here is the tests link with my solution added: godbolt.org/z/rdM3zcK6c – Liquefacient 6/8, 2024 at 20:47

I added my updated solution to the benchmark: quick-bench.com/q/-oUWjTF-OR0iQUHRvfImEaUmcSs - it's still slower than those not using standard algorithms but not as horribly as before :-) – Ser 7/8, 2024 at 5:29

@Headmaster I added your version to the benchmark too: quick-bench.com/q/D_YBWdWmauUK6aAQtuiwKkqzi6M - it wins in gcc and is on par with trincot's version in clang. – Ser 7/8, 2024 at 8:56

@MarekR I wonder if the DoNotOptimize(arr) thingy actually manages to "hide" the test data from the algorithms that is supposed to use it? If I replace the fixed data set with a set that is generated randomly at program start, the results are slightly different. Demo – Ser 7/8, 2024 at 9:30

I have added the C++ version I'd use to my C-based answer. Could you replace the body in your benchmark tests? – Adinaadine 7/8, 2024 at 10:40

Will update answer in future (will try provide own implementation). Meanwhile clang gcc. – Ing 7/8, 2024 at 13:43

I added an alternative version and a benchmark for that: quick-bench.com/q/S5gY0CPga_zzPVbCOJXD5-4L2Oc – Ser 7/8, 2024 at 17:10

Could you include the compile commands you used, Marek? Always important for benchmarking, and may help others reproduce your results. Thanks. – Annulose 8/8, 2024 at 11:47

@TobySpeight for details you have to check documentation of qiuck-bench (or its sources). Basic optimization flags are exposed by UI, but there are probably more of them. – Ing 8/8, 2024 at 12:53

Here is a simple approach to get the largest 2 elements n1 and n2 of an array of at least 2 values, preserving order:

start with the first 2 elements.
for each value x of the remaining elements:
- if x >= n2, the new pair is { max(n1, n2), x }
- otherwise, if x > n1, the new pair is { n2, x }.

typedef struct pair {
    int n1, n2;
} pair;

pair max_pair(int arr[static 2], size_t len) {
    pair res = { arr[0], arr[1] };
    for (size_t i = 2; i < len; i++) {
        int x = arr[i];
        if (x >= res.n2) {
            if (res.n1 < res.n2)
                res.n1 = res.n2;
            res.n2 = x;
        } else {
            if (x > res.n1) {
                res.n1 = res.n2;
                res.n2 = x;
            }
        }
    }
    return res;
}

If you can assume dedicated HW for the min and max functions, this translates to an even simpler loop:

start with the first 2 elements.
for each value x of the remaining elements:
- if x > min(n1, n2), the new pair is { max(n1, n2), x }

typedef struct pair {
    int n1, n2;
} pair;

static inline int min(int a, int b) { return a <= b ? a : b; }
static inline int max(int a, int b) { return a >= b ? a : b; }

pair max_pair(int arr[static 2], size_t len) {
    pair res = { arr[0], arr[1] };
    for (size_t i = 2; i < len; i++) {
        int x = arr[i];
        if (x > min(res.n1, res.n2)) {
            res.n1 = max(res.n1, res.n2);
            res.n2 = x;
        }
    }
    return res;
}

For a set of 4 argument values, this simplifies into:

typedef struct pair {
    int n1, n2;
} pair;

static inline int min(int a, int b) { return a <= b ? a : b; }
static inline int max(int a, int b) { return a >= b ? a : b; }

pair max_pair4(int a, int b, int c, int d) {
    if (c > min(a, b)) {
        a = max(a, b);
        b = c;
    }
    if (d > min(a, b)) {
        a = max(a, b);
        b = d;
    }
    return (pair){ a, b };
}

Alternately, here are in place solutions:

static inline int min(int a, int b) { return a <= b ? a : b; }
static inline int max(int a, int b) { return a >= b ? a : b; }

void max_pair(int arr[static 2], size_t len) {
    int a = arr[0];
    int b = arr[1];
    for (size_t i = 2; i < len; i++) {
        int x = arr[i];
        if (x > min(a, b)) {
            a = max(a, b);
            b = x;
        }
    }
    arr[0] = a;
    arr[1] = b;
}

void max_pair4(int arr[static 4]) {
    if (arr[2] > min(arr[0], arr[1])) {
        arr[0] = max(arr[0], arr[1]);
        arr[1] = arr[2];
    }
    if (arr[3] > min(arr[0], arr[1])) {
        arr[0] = max(arr[0], arr[1]);
        arr[1] = arr[3];
    }
}

Headmaster answered 6/8, 2024 at 17:36 Comment(2)

From what I can see so far, this is the most concise solution. It maps to a short branchless code sequence on common processor architectures (x86-64, AArch64, RISC-V with zicond extension). – Yablon 6/8, 2024 at 21:8

@Yablon Yes, as I just mentioned under Marek's benchmarking post, chqrlie's version wins in gcc and is on par with trincot's version in clang. – Ser 7/8, 2024 at 9:1

For getting the two greatest values (in order) from an array of exactly four values, you'll need 4 comparisons in the worst case.

Here is a naive function that just performs the comparisons necessary to drill down to a state where those two greatest are determined. I have added a main function which executes the function for all permutations of {1,2,3,4}:

#include <stdio.h>

void larger_two(int *arr, int res[2]) {
    int i, j;
    if (arr[0] > arr[1]) {
        if (arr[2] > arr[3]) {
            if (arr[1] > arr[2]) {
                i = 0, j = 1;
            } else if (arr[0] < arr[3]) {
                i = 2, j = 3;
            } else {
                i = 0, j = 2;
            }
        } else if (arr[1] > arr[3]) {
            i = 0, j = 1;
        } else if (arr[0] < arr[2]) {
            i = 2, j = 3;
        } else {
            i = 0, j = 3;
        }
    } else if (arr[2] > arr[3]) {
        if (arr[0] > arr[2]) {
            i = 0, j = 1;
        } else if (arr[1] < arr[3]) {
            i = 2, j = 3;
        } else {
            i = 1, j = 2;
        }
    } else if (arr[0] > arr[3]) {
        i = 0, j = 1;
    } else if (arr[1] < arr[2]) {
        i = 2, j = 3;
    } else {
        i = 1, j = 3;
    }
    res[0] = arr[i];
    res[1] = arr[j];
}

int main(void) {
    int tests[][2][4] = {
        { { 0, 0, 1, 1 }, { 1, 1 } },
        { { 1, 0, 0, 1 }, { 1, 1 } },
        { { 0, 0, 1, 0 }, { 0, 1 } },
        { { 0, 1, 0, 2 }, { 1, 2 } },
    
        { { 1, 2, 3, 4 }, { 3, 4 } },
        { { 1, 2, 4, 3 }, { 4, 3 } },
        { { 1, 3, 2, 4 }, { 3, 4 } },
        { { 1, 3, 4, 2 }, { 3, 4 } },
        { { 1, 4, 2, 3 }, { 4, 3 } },
        { { 1, 4, 3, 2 }, { 4, 3 } },
        { { 2, 1, 3, 4 }, { 3, 4 } },
        { { 2, 1, 4, 3 }, { 4, 3 } },
        { { 2, 3, 1, 4 }, { 3, 4 } },
        { { 2, 3, 4, 1 }, { 3, 4 } },
        { { 2, 4, 1, 3 }, { 4, 3 } },
        { { 2, 4, 3, 1 }, { 4, 3 } },
        { { 3, 1, 2, 4 }, { 3, 4 } },
        { { 3, 1, 4, 2 }, { 3, 4 } },
        { { 3, 2, 1, 4 }, { 3, 4 } },
        { { 3, 2, 4, 1 }, { 3, 4 } },
        { { 3, 4, 1, 2 }, { 3, 4 } },
        { { 3, 4, 2, 1 }, { 3, 4 } },
        { { 4, 1, 2, 3 }, { 4, 3 } },
        { { 4, 1, 3, 2 }, { 4, 3 } },
        { { 4, 2, 1, 3 }, { 4, 3 } },
        { { 4, 2, 3, 1 }, { 4, 3 } },
        { { 4, 3, 1, 2 }, { 4, 3 } },
        { { 4, 3, 2, 1 }, { 4, 3 } },
    };
    int num_tests = sizeof(tests) / sizeof(tests[0]);
    int res[2];
    for (int i = 0; i < num_tests; i++) {
        printf("%d\n", i);
        int *arr = tests[i][0];
        int *expected = tests[i][1];
        larger_two(arr, res);
        if (res[0] != expected[0] || res[1] != expected[1]) {
            printf("test failed!\n");
        }
    }
    printf("%d tests executed.\n", num_tests);
}

The tests verify the expected output and they pass.

C++

When using C++ std::array:

    static std::array<int, 2> larger_two(const std::array<int, 4>& arr)
    {
        if (arr[0] > arr[1]) {
            if (arr[2] > arr[3]) {
                if (arr[1] > arr[2]) {
                    return { arr[0], arr[1] };
                } else if (arr[0] < arr[3]) {
                    return { arr[2], arr[3] };
                } else {
                    return { arr[0], arr[2] };
                }
            } else if (arr[1] > arr[3]) {
                return { arr[0], arr[1] };
            } else if (arr[0] < arr[2]) {
                return { arr[2], arr[3] };
            } else {
                return { arr[0], arr[3] };
            }
        } else if (arr[2] > arr[3]) {
            if (arr[0] > arr[2]) {
                return { arr[0], arr[1] };
            } else if (arr[1] < arr[3]) {
                return { arr[2], arr[3] };
            } else {
                return { arr[1], arr[2] };
            }
        } else if (arr[0] > arr[3]) {
            return { arr[0], arr[1] };
        } else if (arr[1] < arr[2]) {
            return { arr[2], arr[3] };
        } else {
            return { arr[1], arr[3] };
        }
    }

Adinaadine answered 6/8, 2024 at 15:21 Comment(7)

I think fewer comparisons should be enough, max 3 in a row and only 8 cases... was working out the details, but have to go now... – Weighted 6/8, 2024 at 15:30

@tobias, looking forward to your post. – Adinaadine 6/8, 2024 at 15:34

It would be nice to shake this solution to generate branchless code :) – Headmaster 6/8, 2024 at 18:1

@Adinaadine Na, I was barely out of the door when I realized that while the first condition always eliminates 2 candidate solutions, the second may eliminate another 2 or just 1 solution, depending on the outcome, but never 2 in both cases. – Weighted 6/8, 2024 at 18:30

Not what I originally had in mind, but I think I now found a solution that has way fewer comparisons... – Weighted 6/8, 2024 at 18:54

It has way fewer lines of code, but can have more executed comparisons (5). – Adinaadine 6/8, 2024 at 19:57

@chux, I just added some test cases that have duplicate values. – Adinaadine 13/8, 2024 at 16:26

Here's another solution for exactly four numbers, that's very simple and easy to follow, has no loops and should use only 4 to at most 6 individual comparisons. Basically, assume the first two numbers to be the biggest, then check those against the third and fourth and swap those in if they are bigger.

def max2(arr):
    a, b, c, d = arr
    x, y = a, b
    if c > x and x < y:
        x, y = y, c
    elif c > y:
        y = c
    if d > x and x < y:
        x, y = y, d
    elif d > y:
        y = d
    return [x, y]


from itertools import permutations
nums = [1, 2, 3, 4]
for arr in permutations(nums):
    ref = [x for x in arr if x >= 3]
    res = max2(arr)
    assert res == ref

Almost too simple to be correct, please let me know if you see a flaw... (I am aware that the question is tagged C, but my C skills are not so good. Please see this as "executable pseudo-code" and not Python...

Or in C, using a loop for testing the 3rd and 4th element, otherwise the same logic:

void max2(int arr[4], int res[2]) {
    res[0] = arr[0];
    res[1] = arr[1];
    for (int i = 2; i <= 3; i++) {
        if (arr[i] > res[0] && res[0] < res[1]) {
            res[0] = res[1];
            res[1] = arr[i];
        } else if (arr[i] > res[1]) {
            res[1] = arr[i];
        }
    }
}

Weighted answered 6/8, 2024 at 18:51 Comment(0)

This should work:


#include <iostream>

int max(int a, int b){return a>b?a:b;}
int min(int a, int b){return a<b?a:b;}

int main() {
    int arr[4] = {1,2,4,3};
    int hi1 = arr[0], hi2=arr[1]; // highest two in order
    int peak = max(hi1, hi2);
    int next = min(hi1, hi2);

    for(int i = 2; i<4; ++i){
    std::cout << "pek,next " << peak << ", " << next << std::endl;
        int a = arr[i];
        if(a>peak){
            hi1=max(hi1, hi2); hi2=a;
            peak=a;
            next=hi1;
        }
        else if(a>next)
        {
            if(hi1==next){hi1=hi2;}
            hi2=a;
            next=a;
            
        }
    std::cout << "h1, h2 " << hi1 << ", " << hi2 << std::endl;
    }

    std::cout << "pek,next " << peak << ", " << next << std::endl;
    std::cout << "h1, h2 " << hi1 << ", " << hi2 << std::endl;

    return 0;
}

Think of walking along a mountain and remember the current peak and the highest point below that peak. When you travel, and either peak or next change, the new point has to be the right one, so you either must move hi2 to hi1 or just overwrite hi2.

Electrobiology answered 6/8, 2024 at 15:28 Comment(2)

Why not use std::min and std::max? – Mashie 6/8, 2024 at 17:8

It was just a quick shot after work. Yes, using std::min/max from <algorithm> or even a hi1 = hi2>hi1 ? hi2 : hi1 might be better. Also you should unroll the 'i' loop. – Electrobiology 7/8, 2024 at 11:21

You could use an index array and std::partial_sort to make sure that the N indices for the largest elements are placed at the N first positions in the array.

This returns a std::array with the N largest elements in the order they appeared in the original array.

template <size_t N, class T, std::size_t S>
auto get_the_biggest(const std::array<T, S>& in) {
    static_assert(N <= S, "N can't be larger than the array");

    auto idx = []<std::size_t... Is>(std::index_sequence<Is...>) {
        return std::array{Is...};
    }(std::make_index_sequence<S>{});

    if constexpr (N == 0) return std::array<T, N>{};

    auto comp = [&](size_t lhs, size_t rhs) { return in[rhs] < in[lhs]; };
    std::partial_sort(idx.begin(), idx.begin() + N, idx.end(), comp);
    std::sort(idx.begin(), idx.begin() + N); // restore index order

    return [&]<std::size_t... Is>(std::index_sequence<Is...>) {
        return std::array{in[idx[Is]]...};
    }(std::make_index_sequence<N>{});
}

Example:

int main() {
    std::array arr{1, 4, 3, 5};
    auto [first, second] = get_the_biggest<2>(arr);
    std::cout << first << ',' << second << '\n';  // 4,5
}

Demo

An alternative that is much faster than the above (at least for small arrays), that doesn't use std::partial_sort + std::sort (so it's a little bit harder to follow), could look like this:

template <size_t N, class T, std::size_t S>
static auto get_the_biggest(const std::array<T, S>& in) {
    static_assert(N <= S, "N can't be larger than the array");

    // populate the result array with the N first elements
    auto res = [&]<std::size_t... Is>(std::index_sequence<Is...>) {
        return std::array<T, N>{in[Is]...};
    }(std::make_index_sequence<N>{});

    if constexpr (N == 0) return res;

    auto min = std::min_element(res.begin(), res.end());
    for (size_t i = N; i < S; ++i) {
        if (*min < in[i]) {  // current is bigger than the min element           
            // move everything after the min element down:
            std::move(std::next(min), res.end(), min);

            res[N - 1] = in[i];  // put the new value last
            min = std::min_element(res.begin(), res.end());
        }
    }
    return res;
}

Demo

Ser answered 6/8, 2024 at 15:32 Comment(0)

Here is my solution:

#include <stdio.h>

void larger_two(int arr[4], int res[2]) {
    //At first assume that the first two are the 2 largest numbers:
    res[0] = arr[0];
    res[1] = arr[1];
    
    //Then for the remaining numbers in the array:
    for (int i = 2; i < 4; i++) { 
        
        //If the new number is larger than any of the 2 largest until now:
        if (arr[i] > res[0] || arr[i] > res[1]) { 
            
            //If the first result until now is smaller than the second result
            //until now, replace the first for the second. This way the largest
            //of the two stays as the first result:
            if (res[0] < res[1]) {
                res[0] = res[1];
            }
            
            //And then store the new largest as the second result:
            res[1] = arr[i];
        }
    }
}

struct testsample {
    int arr[4];
    int res[2];
};

int main()
{
    struct testsample table[] ={
        { { 0, 0, 1, 1 }, { 1, 1 } },
        { { 1, 0, 0, 1 }, { 1, 1 } },
        { { 1, 2, 3, 4 }, { 3, 4 } },
        { { 1, 2, 4, 3 }, { 4, 3 } },
        { { 1, 3, 2, 4 }, { 3, 4 } },
        { { 1, 3, 4, 2 }, { 3, 4 } },
        { { 1, 4, 2, 3 }, { 4, 3 } },
        { { 1, 4, 3, 2 }, { 4, 3 } },
        { { 2, 1, 3, 4 }, { 3, 4 } },
        { { 2, 1, 4, 3 }, { 4, 3 } },
        { { 2, 3, 1, 4 }, { 3, 4 } },
        { { 2, 3, 4, 1 }, { 3, 4 } },
        { { 2, 4, 1, 3 }, { 4, 3 } },
        { { 2, 4, 3, 1 }, { 4, 3 } },
        { { 3, 1, 2, 4 }, { 3, 4 } },
        { { 3, 1, 4, 2 }, { 3, 4 } },
        { { 3, 2, 1, 4 }, { 3, 4 } },
        { { 3, 2, 4, 1 }, { 3, 4 } },
        { { 3, 4, 1, 2 }, { 3, 4 } },
        { { 3, 4, 2, 1 }, { 3, 4 } },
        { { 4, 1, 2, 3 }, { 4, 3 } },
        { { 4, 1, 3, 2 }, { 4, 3 } },
        { { 4, 2, 1, 3 }, { 4, 3 } },
        { { 4, 2, 3, 1 }, { 4, 3 } },
        { { 4, 3, 1, 2 }, { 4, 3 } },
        { { 4, 3, 2, 1 }, { 4, 3 } },
    };
    
    int len = sizeof(table) / sizeof(struct testsample);
    
    int res[2];
    
    for (int i = 0; i < len; i++) {
        larger_two(table[i].arr, res);
        if (res[0] != table[i].res[0] || res[1] != table[i].res[1])
            printf("Error-> ");
        printf("Array: %d,%d,%d,%d - Expected Result: %d,%d - Actual Result: %d,%d\n",
            table[i].arr[0],
            table[i].arr[1],
            table[i].arr[2],
            table[i].arr[3],
            table[i].res[0],
            table[i].res[1],
            res[0],
            res[1]
        );
    }

    return 0;
}

It's very simple, and I added comments to explain the logic in detail.

And the main function tests several examples.

Liquefacient answered 6/8, 2024 at 18:25 Comment(0)

Since you mentioned CUDA, I made a SIMD version. I chose the EVE SIMD library because it's the first hit result in godbolt 😂 and because I like modern C++ 😎. But the solution is easily adaptable to any SIMD framework.

#include <utility>
#include <limits>
#include <algorithm>

#include <eve/wide.hpp>
#include <eve/conditional.hpp>
#include <eve/module/core.hpp>

using i4 = eve::wide<int, eve::fixed<4>>;

auto get_two_largest(i4 a) -> std::pair<int, int>
{
    // get largest integer
    int largest = eve::maximum(a);
    auto largest_idx = *eve::first_true(a == largest);

    // make a copy without the largest integer
    i4 a_without_largest = a;
    a_without_largest.set(largest_idx, std::numeric_limits<int>::min());

    // get the 2nd largest integer
    int second_largest = eve::maximum(a_without_largest);
    auto second_largest_idx = *eve::first_true(a_without_largest == second_largest);

    // get them preserving order
    auto [first_idx, second_idx] = std::minmax(largest_idx, second_largest_idx);
    return {a.get(first_idx), a.get(second_idx)};
}

Both gcc and clang generate completely branchless code. (-mavx2 gives better code and -march=native gives even better code). Unfortunately I couldn't quickly profile it because quick-bench.com doesn't support external libraries 😢. Also I haven't really done any SIMD coding since university so I am sure there are more optimal versions.

https://godbolt.org/z/93Pq6a3TG

Molybdous answered 6/8, 2024 at 20:39 Comment(9)

TBH none of these instructions look particularly SIMD-efficient. It's very much linear code with cross-lane comparisons. I've been thinking about a SIMD solution, and the first thing I noticed was that you can create 3 rotations of the input and use those to compare every input against every other input in parallel. – Carlocarload 8/8, 2024 at 7:17

I didn't benchmark this, but even as it is I would expect to be efficient because as far as I've checked it's the only solution without any branches whatsoever. Also since the op mentions CUDA, it could be that he has a lot of these to do at the same time and then using SIMDs either in CPU or in GPU it will scale insanely efficiently. – Molybdous 8/8, 2024 at 19:4

As for the 3 rotations alg, it looks premising. The trick is to get them in the original order. I am very rusty on my SIMD algorithms, so I am sure there are better way of doing it then my solution. I'll give your idea a try if only to challenge myself. – Molybdous 8/8, 2024 at 19:7

if you have a lot of them you put them in an array and then instead of my trick of copying the original, you can use compress on the array and you get all of them in one go. – Molybdous 8/8, 2024 at 19:8

Hi, EVE co-author here. A better EVE version maybe to use eve::sort on a wide : godbolt.org/z/4es19Tecc Now, it is still a lot of stuff going one for only 4 element. I think a partial swap network on scalars woudl be better. – Shakespeare 9/8, 2024 at 13:38

@JoelFalcou from my quick encounter with EVE it looks awesome. I really appreciate the modern C++ in it. The conditional stuff is ingenious. Congratulations for it! – Molybdous 9/8, 2024 at 17:54

@JoelFalcou as for the eve::sort solution it gives wrong result for {1, 5, 3, 4}. It gives {4, 5} but it should be {5, 4} (it should preserve their order in the original array. – Molybdous 9/8, 2024 at 17:56

@JoelFalcou one solutions I tried was to get the biggest two values (i got that) and then conceptually compress[a >= second_largest] would give the correct answer. However I couldn't get compress to works. on just one wide. It looks like it needs an array for either input or output and that seems overly complicated for this task. Is there an operation in EVE like this: wide res = my_compress(a >= largest_int)? – Molybdous 9/8, 2024 at 18:0

compress is a bit complicated. I think I can have a look and tell you which compress_* is OK to use. – Shakespeare 9/8, 2024 at 19:54

Walk the list, recording the index of the 2 highest values.

// Psuedo-code
size_t high_index[2] = { 0, 1 }; // Highest value's index stored in high_index[0]
if (list[high_index[1]] > list[high_index[0]]) {
  swap(high_index[1], high_index[0]);
}

for (i = 2 i < n; i++) {
  if (list[i] > list[high_index[1]]) { // More than 2nd highest?
    high_index[1] = i;
    if (list[high_index[1]] > list[high_index[0]]) {
      swap(high_index[1], high_index[0]);
    }
  }
}

Form the result by sorting the indexes

if (high_index[0] < high_index[1])
  reuslt = { list[high_index[0]], list[high_index[1]] };
else
  reuslt = { list[high_index[1]], list[high_index[0]] };

On 2nd thought, One could use pointers rather than indexes.
The algorithm remains the same.

Talbot answered 6/8, 2024 at 15:26 Comment(0)

Straight loops (which might be replaceable by stuff from <algorithm>).

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

void bestTwo( const vector<int> &V, int &p, int &q )    // assume V has at least two elements
{
   int N = V.size();
   int a = 0, b = 0;
   for ( int i = a + 1; i < N; i++ ) if ( V[i] > V[a] ) a = i;
   if ( a == 0 ) b = 1;
   for ( int i = 0    ; i < a; i++ ) if ( V[i] > V[b] ) b = i;
   for ( int i = a + 1; i < N; i++ ) if ( V[i] > V[b] ) b = i;
   if ( a > b ) swap( a, b );
   p = V[a];   q = V[b];
}


int main()
{
    vector< vector<int> > tests = { 
                                    { 1, 4, 3, 5 },
                                    { 1, 5, 3, 4 },
                                    { 1, 3, 4, 3 },
                                    { 1, 2, 3, 6, 5, 4 },
                                    { 1, 2, 5, 4 }
                                  };
    int p, q;

    for ( auto &V : tests ) 
    {
       bestTwo( V, p, q );
       cout << p << " " << q << '\n';
    }
}

Alternative Method

Just to be different, here's another one that works on the principle that the two largest must also be the pair with the largest sum. No idea how doing 6 sums compares with simple logical comparisons for timing though.

#include <iostream>
#include <vector>
#include <algorithm>
#include <array>
using namespace std;

array<int,2> bestTwo( int V[4] )           // tried to put it in Marek's format for testing
{
   const static int out1[] = { 0, 0, 0, 1, 1, 2 };
   const static int out2[] = { 1, 2, 3, 2, 3, 3 };
   static int sums[6];
   sums[0] = V[0]+V[1];   sums[1] = V[0]+V[2];   sums[2] = V[0]+V[3];   sums[3] = V[1]+V[2];   sums[4] = V[1]+V[3];  sums[5] = V[2]+V[3];
   int i = std::max_element( sums, sums+6 ) - sums;
   return { V[out1[i]], V[out2[i]] };
}

int main()
{
    vector< vector<int> > tests = { 
                                    { 1, 4, 3, 5 },
                                    { 1, 5, 3, 4 },
                                    { 1, 3, 4, 3 },
                                    { 1, 2, 5, 4 },
                                    { 4, 4, 4, 4 },
                                    { 3, 3, 4, 4 },
                                    { 4, 3, 4, 4 },
                                    { 4, 3, 4, 3 }
                                  };
    for ( auto &V : tests ) 
    {
       auto result = bestTwo( V.data() );
       cout << result[0] << " " << result[1] << '\n';
    }
}

Harrod answered 6/8, 2024 at 15:41 Comment(7)

This works but takes twice as long as the best of the rest posted here according to my tests. – Straightway 7/8, 2024 at 11:4

Yes, there are some great solutions posted by others in this topic, and a good test system. I was erring on the side of safety (at the time of writing there were several suggestions that gave the wrong answer) and "future-proofing" (not fixing the size of the array). – Harrod 7/8, 2024 at 11:33

I've added a second approach (relies on fixed-size 4). It uses the premise that the two largest must also give the largest sum of pairs. – Harrod 7/8, 2024 at 12:12

(For the 1st alternative, iterate the full range and skip a. In the 2nd why convert the result of max_element to an index?) – Dimaggio 8/8, 2024 at 4:4

@greybeard, in the first code in order to skip ‘a’ I would have to do an extra logical comparison (for equality/inequality) on every pass of the loop. In the second code I’m not after the sum but the pair of values giving rise to maximum sum, so I need the index to identify them. Unexpectedly, (to me anyway), the second code is slightly faster, despite having to do some arithmetic work. – Harrod 8/8, 2024 at 5:19

I need the index to identify them an iterator should do nicely. – Dimaggio 8/8, 2024 at 6:33

But an iterator here would be pointing to the wrong array for the output. I don't need to return one sum: I need to return two individual values from a different array. So I use an index. – Harrod 8/8, 2024 at 8:55

This is heavily based on chqrlie's answer. I found making his function branchless reduces the time it takes in my single threaded CPU based test by almost a third. I expect that on a GPU, the improvement could be much greater, assuming the architecture supports the conditional moves needed for efficient MAX and MIN evaluations.

typedef struct pair {
    int32_t n1, n2;
} pair;

#define MAX(x,y) ((x) < (y) ? (y) : (x))
#define MIN(x,y) ((x) > (y) ? (y) : (x))

pair max_pair4_gpu(int32_t input[4]) {
  _Bool test = (input[2] > MIN(input[0], input[1]));
  input[0] = test*MAX(input[0], input[1]) + (!test)*input[0];
  input[1] = test*input[2] + (!test)*input[1];
  test = (input[3] > MIN(input[0], input[1]));
  input[0] = test*MAX(input[0], input[1]) + (!test)*input[0];
  input[1] = test*input[3] + (!test)*input[1];
  return (pair){ input[0], input[1]};
}

My testing program is below.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <immintrin.h> 

// gcc largest2of4test.c -o largest2of4test.bin -O3 -march=native -Wall 

// See https://mcmap.net/q/842058/-selecting-2-largest-elements-out-of-4-preserving-order
uint64_t  get_cycles () {
  uint32_t lo,hi;
  asm  volatile("rdtsc":"=a"(lo),"=d"(hi));
  return  (( uint64_t)hi<<32 | lo);
}

typedef struct pair {
    int32_t n1, n2;
} pair;

static inline int32_t min(int32_t a, int32_t b) { return a <= b ? a : b; }
static inline int32_t max(int32_t a, int32_t b) { return a >= b ? a : b; }

void max2(int arr[4], pair *res) {
    res->n1 = arr[0];
    res->n2 = arr[1];
    for (int i = 2; i <= 3; i++) {
        if (arr[i] > res->n1 && res->n1 < res->n2) {
            res->n1 = res->n2;
            res->n2 = arr[i];
        } else if (arr[i] > res->n2) {
            res->n2 = arr[i];
        }
    }
}

pair max_pair4_sse(int32_t input[4]) {
  __m128i vin = _mm_lddqu_si128((void*)input);
  __m128i temp = _mm_shuffle_epi32(vin, 177); // 0b10110001
  __m128i vinmax = _mm_max_epi32(temp, vin);
  temp = _mm_shuffle_epi32(vinmax, 90); // 0b01011010
  vinmax = _mm_max_epi32(temp, vinmax);
  //int32_t max = _mm_extract_epi32(vinmax, 0);
  temp = _mm_cmpeq_epi32(vin, vinmax);
  int32_t maxix = __builtin_ctz(_mm_movemask_epi8(temp))/4;
  __m128i vin2 = _mm_or_si128(_mm_andnot_si128(temp, vin), _mm_and_si128(temp, _mm_set1_epi32(0x80000000)));
  __m128i temp2 = _mm_shuffle_epi32(vin2, 177); 
  __m128i vinmax2 = _mm_max_epi32(temp2, vin2);
  temp2 = _mm_shuffle_epi32(vinmax2, 90); 
  vinmax2 = _mm_max_epi32(temp2, vinmax2);
  //int32_t max2 = _mm_extract_epi32(vinmax2, 0);
  temp2 = _mm_cmpeq_epi32(vin2, vinmax2);
  int32_t maxix2 = __builtin_ctz(_mm_movemask_epi8(temp2))/4;
  //printf("Input = %i %i %i %i\nMax1 = %i @ %i\nMax2 = %i @ %i\n", input[0], input[1], input[2], input[3], max, maxix, max2, maxix2);
  if (maxix > maxix2) {
    return (pair){input[maxix2], input[maxix]};
  } else {
    return (pair){input[maxix], input[maxix2]};
  }
}
#define MAX(x,y) ((x) < (y) ? (y) : (x))
#define MIN(x,y) ((x) > (y) ? (y) : (x))

pair max_pair4_gpu(int32_t input[4]) {
  _Bool test = (input[2] > MIN(input[0], input[1]));
  input[0] = test*MAX(input[0], input[1]) + (!test)*input[0];
  input[1] = test*input[2] + (!test)*input[1];
  test = (input[3] > MIN(input[0], input[1]));
  input[0] = test*MAX(input[0], input[1]) + (!test)*input[0];
  input[1] = test*input[3] + (!test)*input[1];
  return (pair){ input[0], input[1]};
}

pair max_pair4_gpu2(int32_t input[4]) {
  _Bool test = (input[2] > MIN(input[0], input[1]));
  input[0] = (test ? MAX(input[0], input[1]) : input[0]);
  input[1] = (test ? input[2] : input[1]);
  test = (input[3] > MIN(input[0], input[1]));
  input[0] = (test ? MAX(input[0], input[1]) : input[0]);
  input[1] = (test ? input[3] : input[1]);
  return (pair){ input[0], input[1]};
}

pair max_pair4(int32_t input[4]) {
    if (input[2] > min(input[0], input[1])) {
        input[0] = max(input[0], input[1]);
        input[1] = input[2];
    }
    if (input[3] > min(input[0], input[1])) {
        input[0] = max(input[0], input[1]);
        input[1] = input[3];
    }
    return (pair){ input[0], input[1] };
}
pair larger_two(int32_t arr[4]) {
    int i, j;
    if (arr[0] > arr[1]) {
        if (arr[2] > arr[3]) {
            if (arr[1] > arr[2]) {
                i = 0, j = 1;
            } else if (arr[0] < arr[3]) {
                i = 2, j = 3;
            } else {
                i = 0, j = 2;
            }
        } else if (arr[1] > arr[3]) {
            i = 0, j = 1;
        } else if (arr[0] < arr[2]) {
            i = 2, j = 3;
        } else {
            i = 0, j = 3;
        }
    } else if (arr[2] > arr[3]) {
        if (arr[0] > arr[2]) {
            i = 0, j = 1;
        } else if (arr[1] < arr[3]) {
            i = 2, j = 3;
        } else {
            i = 1, j = 2;
        }
    } else if (arr[0] > arr[3]) {
        i = 0, j = 1;
    } else if (arr[1] < arr[2]) {
        i = 2, j = 3;
    } else {
        i = 1, j = 3;
    }
    return (pair){ arr[i], arr[j] };
    //res[0] = arr[i];
    //res[1] = arr[j];
}

void larger_two_ismick(int arr[4], pair *res) {
    //At first assume that the first two are the 2 largest numbers:
    res->n1 = arr[0];
    res->n2 = arr[1];
    
    //Then for the remaining numbers in the array:
    for (int i = 2; i < 4; i++) { 
        
        //If the new number is larger than any of the 2 largest until now:
        if (arr[i] > res->n1 || arr[i] > res->n2) { 
            
            //If the first result until now is smaller than the second result
            //until now, replace the first for the second. This way the largest
            //of the two stays as the first result:
            if (res->n1 < res->n2) {
                res->n1 = res->n2;
            }
            
            //And then store the new largest as the second result:
            res->n2 = arr[i];
        }
    }
}

void bestTwo(int32_t *V, pair *res)    // assume V has at least two elements
{
   int N = 4;
   int a = 0, b = 0, temp;
   for ( int i = a + 1; i < N; i++ ) if ( V[i] > V[a] ) a = i;
   if ( a == 0 ) b = 1;
   for ( int i = 0    ; i < a; i++ ) if ( V[i] > V[b] ) b = i;
   for ( int i = a + 1; i < N; i++ ) if ( V[i] > V[b] ) b = i;
   if ( a > b ) {
     temp = a;
     a = b;
     b = temp;
   }
   res->n1 = V[a];   res->n2 = V[b];
}

int main(int argc, char*argv[]) {
  uint64_t cyclesstart, cyclesend;
  uint64_t cycles = 0;
  int64_t sum = 0;
  uint32_t itermax = 10000000;
  int32_t input[4];
  pair res;
  srand(4321);
  for (uint32_t i=0; i<itermax; i++) {
    input[0] = rand() % 0x10000;
    input[1] = rand() % 0x10000;
    input[2] = rand() % 0x10000;
    input[3] = rand() % 0x10000;
    cyclesstart = get_cycles(); // Westmere Core i5 rdtsc cycles gcc 9.4 -O3
    //res = max_pair4_sse(input); // 47 cycles - Doesn't return same integers when largest element appears more than once. 
    //res = max_pair4_gpu(input); // 23-24 cycles
    //res = max_pair4(input); // 36 cycles
    //res = larger_two(input); // 37-38 cycles
    larger_two_ismick(input, &res); // 38-39 cycles
    //bestTwo(input, &res); // 76-77 cycles
    //max2(input, &res); // 39-43 cycles
    cyclesend = get_cycles();
    sum += res.n1 + 3*res.n2;
    cycles += cyclesend - cyclesstart;
  }
  printf("Sum is %li, (%f cycles/call)\n", sum, (float)cycles/(itermax));
}

Straightway answered 7/8, 2024 at 14:47 Comment(3)

quick-bench.com/q/YvKgyraAtItr9N2AVMyhAtsCnZI – Ing 7/8, 2024 at 16:13

@MarekR Thanks for the link. It paints a very different picture. I don't know how much I trust a browser based benchmarking tool for accurate micro-benchmarking. – Straightway 7/8, 2024 at 17:10

@Simon Goater NVIDIA's GPUs provide integer min/max instructions (IMNMX in recent architectures), no need to emulate it. – Yablon 8/8, 2024 at 0:18

From the utility library the std::pair can be used to select two largest integers from an array of 4 elements.

Overall, the aim is to minimize conditionals and min/max operations can serve this purpose.

for example:

#include <iostream>
#include <utility> // For std::pair

std::pair<int, int> findTwoLargest(int arr[4]) {
  // Initialize the two largest values
  int first = std::max(arr[0], arr[1]);
  int second = std::min(arr[0], arr[1]);

  // Compare with the third element
  if (arr[2] > first) {
    second = first;
    first = arr[2];
  } else if (arr[2] > second) {
    second = arr[2];
  }

  // Compare with the fourth element
  if (arr[3] > first) {
    second = first;
    first = arr[3];
  } else if (arr[3] > second) {
    second = arr[3];
  } else if (arr[3] > second) {
    second = arr[3];
  }

  return {first, second};
}

int main() {
  int arr[4] = {1, 5, 3, 4};
  auto [first, second] = findTwoLargest(arr);

  std::cout << "The two largest numbers are: " << first << " and " << second
            << std::endl;

  return 0;
}

To avoid the overhead of using min/max we can use pointers

#include <iostream>

void findTwoLargest(int arr[4], int &first, int &second) {
  if (arr[0] > arr[1]) {
    first = arr[0];
    second = arr[1];
  } else {
    first = arr[1];
    second = arr[0];
  }

  if (arr[2] > first) {
    second = first;
    first = arr[2];
  } else if (arr[2] > second) {
    second = arr[2];
  }

  if (arr[3] > first) {
    second = first;
    first = arr[3];
  } else if (arr[3] > second) {
    second = arr[3];
  }
}

int main() {
  int arr[4] = {1, 4, 3, 5};
  int first, second;
  findTwoLargest(arr, first, second);

  std::cout << "The two largest numbers are: " << first << " and " << second
            << std::endl;

  return 0;
}

Designer answered 11/8, 2024 at 20:39 Comment(0)

My First Instinct is to use the two-pointer approach.

std::pair<int,int> getTopTwoInOrder(int data[], int length) {
    int p1 = 0;
    int p2 = length - 1;

    int h1 = 0;
    int h2 = 0;

    while(p1 < p2) {
        if(data[p1] > h1) {
            h1 = data[p1];
        }
        if(data[p2] > h2) {
            h2 = data[p2];
        }
        p1++;
        p2--;
    }
    return {h1,h2};
}

Ridenhour answered 6/8, 2024 at 14:58 Comment(3)

What about 1,2,5,4? Won't your code return 2,5 instead of 5,4? – Liquefacient 6/8, 2024 at 15:9

at least have the decency to use std::span<int> this is not "C" and data[], length is fragile "C" style code – Endospore 6/8, 2024 at 15:33

godbolt.org/z/n6YPaW5Tv – Ing 6/8, 2024 at 16:24

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

C++

Recommended topics

Hot tags