Why is iterating an std::array much faster than iterating an std::vector?
Editor's note:
Followup question with optimization enabled that times only the loop:
Why is iterating though `std::vector` faster than iterating though `std::array`?
where we can see the effect of lazy-allocation page faults in reading uninitialized BSS memory vs. dynamically-allocated + written memory that's initialized outside the timed loop.


I tried profiling this code:

#include <vector>
#include <array>
#include <stdio.h>

using namespace std;

constexpr int n = 400'000'000;
//vector<int> v(n);
array<int, n> v;

int main()
{
    int res = 0;
    for(int x : v)
        res += x;
    printf("%d\n", res);
}

On my machine, the array version is faster than the vector version.

The memory allocation is irrelevant in this case as it happens only once.

$ g++ arrVsVec.cpp -O3        # array version
$ time ./a.out
0

real    0m0,445s
user    0m0,203s
sys     0m0,238s
$ g++ arrVsVec.cpp -O3        # vector version (with the commented lines swapped)
$ time ./a.out
0

real    0m0,749s
user    0m0,273s
sys     0m0,476s

I have found the disassembly is much more complicated for std::vector: https://godbolt.org/z/111L5G

Beating answered 20/7, 2019 at 12:10 Comment(8)
What OS, compiler version? What do you mean by "release mode", what compile flags did you use? Most importantly, how are you measuring the time? – Jessen
You are measuring the time needed to allocate the vector, which is not insignificant. Allocating a std::array is free (takes zero time). – Callender
Compile with g++ -O3 arrVsVec.cpp. Non-optimized "benchmarks" are completely worthless. – Kristoforo
Have you even read the question? I mentioned memory allocation is irrelevant. It's only once. reserve doesn't make any sense in this example as I'm allocating all the memory at once already. – Beating
This is a good benchmark. – Me
Performing bench tests can be tricky. It is better to time the specific code in the program itself and you need to make sure you use the results (or the result of some test that relies on the tested code) so the compiler doesn't optimize the code away. You also need to compile with optimizations enabled. Here is an example stackoverflow.com/questions/33333363/… – Callender
@Beating How do you know memory allocation is irrelevant? Did you measure how long it takes? – Callender
@Callender Yes, I did. – Beating

Answer for optimization on (g++ -O2):

You're not using the end result, so the compiler is free to optimize the entire loop out.

Which is what happens in the array case.

main:
        xor     eax, eax
        ret

But the vector allocates and deallocates heap memory, which complicates the optimization, so compilers tend to play it safe and leave it in place:

main:
        xor     eax, eax
        ret
_GLOBAL__sub_I_v:
        sub     rsp, 8
        mov     edi, 400000000
        mov     QWORD PTR v[rip], 0
        mov     QWORD PTR v[rip+8], 0
        mov     QWORD PTR v[rip+16], 0
        call    operator new(unsigned long)
        lea     rdx, [rax+400000000]
        mov     QWORD PTR v[rip], rax
        mov     QWORD PTR v[rip+16], rdx
.L6:
        mov     DWORD PTR [rax], 0
        add     rax, 4
        cmp     rdx, rax
        jne     .L6
        mov     QWORD PTR v[rip+8], rdx
        mov     esi, OFFSET FLAT:v
        mov     edx, OFFSET FLAT:__dso_handle
        mov     edi, OFFSET FLAT:_ZNSt6vectorIiSaIiEED1Ev
        add     rsp, 8
        jmp     __cxa_atexit

So that's why the array version is faster in this particular case. In a more real-life application the difference wouldn't be that dramatic, and would mostly come down to the heap allocation/deallocation time in the vector case.

Answer for optimization off (g++):

Don't benchmark stuff compiled without optimization.

The difference is probably due to the vector iterator not being inlined, so accessing vector elements in debug mode incurs an extra indirection compared to array access.

Jessen answered 20/7, 2019 at 12:24 Comment(4)
The point being that the version with std::array is actually undefined behavior. You (the OP) are summing a bunch of uninitialized values. As a result, this benchmark is completely useless. – Me
@BiagioFesta actually not. It has static storage and is therefore zero-initialised. – Yunyunfei
You are right, the disassembly was showing that the functions were not inlined. – Beating
@Beating and rustyx: clang++ with -stdlib=libc++ instead of the default libstdc++ can optimize away new/delete or construction/destruction of a std::vector. Not sure if it will for a vector at global scope, though. It's a tricky optimization anyway because new is "replaceable" in ISO C++, so new can be considered a visible side-effect unless language in the standard removes that possibility. But yeah, +1 for not benchmarking debug-mode. C++ standard libraries heavily depend on aggressive inlining from optimizing compilers. – Superabundant

How I compile:

g++ arrVsVec.cpp

Why is iterating an std::array much faster than iterating an std::vector?

Because you did not compile with optimisations enabled.

Furthermore, you don't use the result of the iteration for anything, and the entire computation uses compile time constant inputs, so even if you did enable optimisation, then the loop would probably be optimised away, and you would then only be measuring the dynamic allocation versus not dynamic allocation. Hint: Performing a dynamic allocation is infinitely slower than doing nothing.

So, to conclude:

  • Unoptimised binaries are slow because of lack of optimisation; Measure optimised binaries
  • If you intend to measure the speed of iteration, then measure the iteration only; Don't measure memory allocation.
  • Prefer to avoid compile time constant input.
  • Use the result of the measured code so that it cannot be optimised away.
Yunyunfei answered 20/7, 2019 at 12:24 Comment(0)

You're not using the result, you're initializing the vector to zeroes, and you didn't enable optimizations. Once you do:

#include <cstdlib>   // rand
#include <iostream>

int main()
{
    unsigned int res = 0;
    for(int &x : v)
        x = rand();          // fill with runtime values

    for(int i = 0; i < 128; ++i) {
        for(int x : v)
            res += (unsigned int)x;
    }
    std::cout << res;
}

times are identical:

Success #stdin #stdout 0.62s 54296KB
-2043861760
Success #stdin #stdout 0.62s 54296KB
-2043861760

EDIT: changed res type to be unsigned int to avoid undefined behaviour.

Isreal answered 20/7, 2019 at 12:29 Comment(2)
Signed arithmetic overflow has undefined behaviour. That's a possibility with your random values. – Yunyunfei
You're right about undefined behaviour, but when changed to unsigned int the results stay the same. – Hydr

© 2022 - 2024 — McMap. All rights reserved.