C++11 vector<bool> performance issue (with code example)

I notice that vector<bool> is much slower than a plain bool array when running the following code.

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int count = 0;
    int n = 1500000;
    // slower with c++ vector<bool>
    /*vector<bool> isPrime;
    isPrime.reserve(n);
    isPrime.assign(n, true);
    */
    // faster with bool array 
    bool* isPrime = new bool[n];

    for (int i = 0; i < n; ++i)
        isPrime[i] = true;


    for (int i = 2; i< n; ++i) {
        if (isPrime[i])
            count++;
    for (int j = 2; i*j < n; ++j)
            isPrime[i*j] = false;
    }

    cout << count << endl;
    return 0;
}

Is there something I can do to make vector<bool> faster? Btw, both std::vector::push_back and std::vector::emplace_back are even slower than std::vector::assign.

Goings answered 29/4, 2016 at 7:52 Comment(4)
you are accessing isPrime beyond its end, it should be new bool[n]Zeba
Don't use vector<bool> if you're super-concerned about performance. It's required by the standard to be very space efficient, and that has a performance cost.Baloney
How much of a slowdown are you talking about? You might want to add some timing examples to make this question more appealing.Iatric
You did compile with optimisation enabled?Irvinirvine

std::vector<bool> can have various performance issues (e.g. take a look at https://isocpp.org/blog/2012/11/on-vectorbool).

In general you can:

  • use std::vector<std::uint8_t> instead of std::vector<bool> (give std::valarray<bool> a try as well).

    This requires more memory and is less cache-friendly, but there is no per-access overhead (in the form of bit manipulation) when reading or writing a single value, so there are situations in which it works better (after all, it's just like your array of bool, but without the nuisance of manual memory management)

  • use std::bitset if you know at compile time how large your boolean array is going to be (or if you can at least establish a reasonable upper bound)
  • if Boost is an option try boost::dynamic_bitset (the size can be specified at runtime)

But for speed optimizations you have to test...

With your specific example I can confirm a performance difference only when optimizations are turned off (of course this isn't the way to go).

Some tests with g++ v4.8.3 and clang++ v3.4.5 on an Intel Xeon system (-O3 optimization level) give a different picture:

                    time (ms)
                 G++      CLANG++
array of bool    3103     3010
vector<bool>     2835     2420    // not bad!
vector<char>     3136     3031    // same as array of bool
bitset           2742     2388    // marginally better

(time elapsed for 100 runs of the code in the answer)

std::vector<bool> doesn't look so bad (source code here).

Hoke answered 29/4, 2016 at 8:29 Comment(7)
"Xeon" can be anything from P4 to Skylake. Saying which Xeon (e.g. Haswell Xeon, or Exxxx v3) would be much more informative. Those are pretty old compiler versions for modern hardware, too (not as big a deal if you aren't auto-vectorizing or using -march=native).Hardandfast
@PeterCordes you're right. It's a Xeon e3-1230v3 (using the -march=native switch). In my defence I would add that it wasn't meant to be a complete examination, but I'll add test code and some more results as soon as I can.Hoke
You don't have to make a big deal out of timing this. Just any time you post any perf numbers, the specific microarch and compiler version + options are essential. e.g. clang++ 3.4.5 on a Haswell Xeon, -O3 -march=native would cover it. BTW, clang 3.7.1 (current stable) auto-vectorizes for AVX2 significantly better than 3.4, in terms of the quality of the asm. IDK how often that makes a perf diff, since memory bottlenecks often hide CPU bottlenecks. llvm.org/apt has current versions of clang for debian-based Linux distros.Hardandfast
I confirm that the performance benefit of std::vector<bool> is not only observed on x86 processors, but also on my raspberry pi with gcc 8.3, with just a simple -O2 switch. It's like 0.5s vs. 2.5s difference, with std::vector<bool> being 0.5s.Grating
Just a benchmark that shows this performance discrepancy between std::vector<bool>, std::vector<char>, and std::vector<uint8_t>: quick-bench.com/q/xQOGMNDPvzT1dU2OF_MQAZBXNzUPentalpha
@Pentalpha Thank you for sharing this tool, I wasn't aware of it. The code you tested manipulates smaller vectors and has a different memory access pattern. I tried with the original code and vector<bool> seems good: quick-bench.com/q/iJYcweMEPr69QTvSqEQcUGd8dnMHoke
@Hoke vector<bool> seems faster than vector<char> in your example probably due to caching. vector<bool> is a (slower but) "space-efficient" specialization of vector. In your example with n=1500000, vector<bool> (~190KB) probably fits in cache while vector<char> (~1.5MB) does not. Thus, even though vector<bool> is slower per access, it uses much faster memory and in the end comes out ahead. Rerun your benchmark with n=1500, 15000, 150000, ... and you'll see that when both containers fit in cache, vector<bool> is slower.Pentalpha

vector<bool> may have a template specialization and may be implemented using a bit array to save space. Extracting and saving a bit, and converting it from/to bool, may cause the performance drop you are observing. If you use std::vector::push_back, you are resizing the vector, which will cause even worse performance. The next performance killer may be assign (worst-case complexity: linear in the first argument); instead use operator[] (complexity: constant).

On the other hand, bool[] is guaranteed to be an array of bool.

And you should resize to n instead of n-1 to avoid undefined behaviour.

Swerve answered 29/4, 2016 at 7:54 Comment(5)
It not only "may have" such a specialization, this is actually in the standard!Iatric
One workaround is to use deque<bool>. Or vector<My_bool>. :)Rochkind
@Mohit Jain: template specialization happens at compile time. It shouldn't affect run-time performance, should it?Goings
@Goings Implication of specialization would affect run time performance (for good or bad). In this case it may be good for space requirement and bad for execution time.Swerve
@Goings You choose between vector and list before compile time, don't you? Yet it affects runtime performance.Blowout

vector<bool> can be high performance, but isn't required to be. For vector<bool> to be efficient, it needs to operate on many bools at a time (e.g. isPrime.assign(n, true)), and the implementor has had to put loving care into it. Indexing individual bools in a vector<bool> is slow.

Here is a prime finder that I wrote a while back using vector<bool> and clang + libc++ (the libc++ part is important):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

std::vector<bool>
init_primes()
{
    std::vector<bool> primes(0x80000000, true);
    primes[0] = false;
    primes[1] = false;
    const auto pb = primes.begin();
    const auto pe = primes.end();
    const auto sz = primes.size();
    size_t i = 2;
    while (true)
    {
        size_t j = i*i;
        if (j >= sz)
            break;
        do
        {
            primes[j] = false;
            j += i;
        } while (j < sz);
        i = std::find(pb + (i+1), pe, true) - pb;
    }
    return primes;
}

int
main()
{
    using namespace std::chrono;
    using dsec = duration<double>;
    auto t0 = steady_clock::now();
    auto p = init_primes();
    auto t1 = steady_clock::now();
    std::cout << dsec(t1-t0).count() << "\n";
}

This executes for me in about 28s (-O3). When I change it to return a vector<char> instead, the execution time goes up to about 44s.

If you run this using some other std::lib, you probably won't see this trend. On libc++, algorithms such as std::find have been optimized to search a word of bits at a time, instead of a bit at a time.

See http://howardhinnant.github.io/onvectorbool.html for more details on what std algorithms could be optimized by your vendor.

Cocainism answered 29/4, 2016 at 18:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.