Why is sorting a std::vector of std::tuple's faster than sorting a vector of std::arrays?
Asked Answered
G

1

7

I was curious to see whether sorting a vector <vector<int>> would be slower than sorting a vector <array <int, 3>>. The dimensions of the vector is 1000000 by 3, and below is my driver code implementing this:

#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>

using namespace std;

int main()
{
    vector <vector<int>> v(1000000, vector <int> (3));

    srand(time(nullptr));
    for(int i = 0; i < 1000000; ++i){
        for(int j = 0; j < 3; ++j){
            v[i][j] = rand();
        }
    }

    double start = clock();
    sort(v.begin(), v.end());
    cout << (clock()-start)/(CLOCKS_PER_SEC/1000) << endl;

    return 0;
}

Compiling with g++ -O3 sorting_test.cxx with gcc 7.5.0, I get a runtime of around 300 ms. Declaring v as a vector <array <int, 3>> halved the runtime to around 149 ms.

However, declaring v as a vector <tuple<int, int, int>> beat out both of the above options, with an average runtime of approximately 100 ms.

I can somewhat understand why the array option is faster than the vector option (array size is a constant expression, unlike the vector), but I have no idea why the tuple would beat both of them. Can somebody please explain this to me?

The code that fills the tuple <int, int, int>s is

srand(time(nullptr));
for(int i = 0; i < 1000000; ++i){
    get <0> (v[i]) = rand();
    get <1> (v[i]) = rand();
    get <2> (v[i]) = rand();
}
Ground answered 6/10, 2020 at 19:1 Comment(8)
My guess it has to do with how operator < is defined for vector, array, and tuple. With vector and array you need a loop. tuple probably uses a fold operation which while it has the same number of comparisons, does not have the loop overhead.Doody
Show the code that fills the vector of tuples. Also it might be better to srand(0) to have repeatable results.Stephenson
@Doody The reducing loop overhead makes sense, but I don't know what a fold operation is. Can you post a link or two that explains these operations?Ground
See this for what a fold expression is.Doody
Also, a vector points into a dynamically allocated memory which is much worse for cache utilization. Vector of arrays stores all the data contiguously. Moreover, swapping two vectors involve 48 bytes (on 64bit arch), while swapping of arrays only a half in this case.Show
Internally swaps are performed. Maybe the amount of memory to be swapped is just lower in the tuple caseAlkahest
@Doody You need a loop for an array, but I would also expect this loop to be unrolled in this case.Show
@DanielLangr I'm talking in the more general case. For larger arrays you wouldn't be able to, or the compiler wont, unroll the loop. For the tuple, there is never a loop.Doody
W
8

While the disassembly for the whole programs are too large, this demonstrates the core difference between operator< for array and tuple: https://godbolt.org/z/h1Y33e

Essentially, in the tuple version, you have a fixed comparison of 3 elements whereas in the array version, you have a loop.

Though I'm surprised that the compiler did not unroll the loop.

Edit: looks like clang does optimize them to both, non-loop code: https://godbolt.org/z/cMExTb (I did not fully read it, but I only see forward jumps)

Wagon answered 6/10, 2020 at 19:12 Comment(3)
Might be interesting also to compare assembly for swap operations: godbolt.org/z/sGsK7Y.Show
array should be equal to tuple in perf in this specific case (small array) but may lose if the array is larger.Ibson
For the clang asm, the array and tuple versions have the exact same instructions, but because libstdc++'s std::tuple's layout is backwards, they read the three ints in a different order.Ixtle

© 2022 - 2024 — McMap. All rights reserved.