Why is sorting a std::vector of std::tuple's faster than sorting a vector of std::arrays? - McMap

About

Why is sorting a std::vector of std::tuple's faster than sorting a vector of std::arrays?

Asked 6/10, 2020 at 19:1 Answered 6/10, 2020 at 19:12

Solved c++arrays sorting vector tuples

G

1

7

I was curious to see whether sorting a vector <vector<int>> would be slower than sorting a vector <array <int, 3>>. The dimensions of the vector is 1000000 by 3, and below is my driver code implementing this:

#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>

using namespace std;

int main()
{
    vector <vector<int>> v(1000000, vector <int> (3));

    srand(time(nullptr));
    for(int i = 0; i < 1000000; ++i){
        for(int j = 0; j < 3; ++j){
            v[i][j] = rand();
        }
    }

    double start = clock();
    sort(v.begin(), v.end());
    cout << (clock()-start)/(CLOCKS_PER_SEC/1000) << endl;

    return 0;
}

Compiling with g++ -O3 sorting_test.cxx with gcc 7.5.0, I get a runtime of around 300 ms. Declaring v as a vector <array <int, 3>> halved the runtime to around 149 ms.

However, declaring v as a vector <tuple<int, int, int>> beat out both of the above options, with an average runtime of approximately 100 ms.

I can somewhat understand why the array option is faster than the vector option (array size is a constant expression, unlike the vector), but I have no idea why the tuple would beat both of them. Can somebody please explain this to me?

The code that fills the tuple <int, int, int>s is

srand(time(nullptr));
for(int i = 0; i < 1000000; ++i){
    get <0> (v[i]) = rand();
    get <1> (v[i]) = rand();
    get <2> (v[i]) = rand();
}

Ground answered 6/10, 2020 at 19:1 Comment(8)

My guess it has to do with how operator < is defined for vector, array, and tuple. With vector and array you need a loop. tuple probably uses a fold operation which while it has the same number of comparisons, does not have the loop overhead. – Doody 6/10, 2020 at 19:7

Show the code that fills the vector of tuples. Also it might be better to srand(0) to have repeatable results. – Stephenson 6/10, 2020 at 19:8

@Doody The reducing loop overhead makes sense, but I don't know what a fold operation is. Can you post a link or two that explains these operations? – Ground 6/10, 2020 at 19:9

See this for what a fold expression is. – Doody 6/10, 2020 at 19:10

Also, a vector points into a dynamically allocated memory which is much worse for cache utilization. Vector of arrays stores all the data contiguously. Moreover, swapping two vectors involve 48 bytes (on 64bit arch), while swapping of arrays only a half in this case. – Show 6/10, 2020 at 19:10

Internally swaps are performed. Maybe the amount of memory to be swapped is just lower in the tuple case – Alkahest 6/10, 2020 at 19:12

@Doody You need a loop for an array, but I would also expect this loop to be unrolled in this case. – Show 6/10, 2020 at 19:19

@DanielLangr I'm talking in the more general case. For larger arrays you wouldn't be able to, or the compiler wont, unroll the loop. For the tuple, there is never a loop. – Doody 6/10, 2020 at 19:20

W

8

While the disassembly for the whole programs are too large, this demonstrates the core difference between operator< for array and tuple: https://godbolt.org/z/h1Y33e

Essentially, in the tuple version, you have a fixed comparison of 3 elements whereas in the array version, you have a loop.

Though I'm surprised that the compiler did not unroll the loop.

Edit: looks like clang does optimize them to both, non-loop code: https://godbolt.org/z/cMExTb (I did not fully read it, but I only see forward jumps)

Wagon answered 6/10, 2020 at 19:12 Comment(3)

Might be interesting also to compare assembly for swap operations: godbolt.org/z/sGsK7Y. – Show 6/10, 2020 at 19:15

array should be equal to tuple in perf in this specific case (small array) but may lose if the array is larger. – Ibson 6/10, 2020 at 19:28

For the clang asm, the array and tuple versions have the exact same instructions, but because libstdc++'s std::tuple's layout is backwards, they read the three ints in a different order. – Ixtle 6/10, 2020 at 20:47

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.