Is using a vector of boolean values slower than a dynamic bitset?
Asked Answered
C

5

25

Is using a vector of boolean values slower than a dynamic bitset?

I just heard about boost's dynamic bitset, and I was wondering is it worth the trouble. Can I just use vector of boolean values instead?

Churchy answered 24/5, 2013 at 15:29 Comment(1)
Since it depends on how you use it, the relative size of your container and CPU cache(s), and probably various other factors ... what happened when you benchmarked it?Frasco
F
50

A great deal here depends on how many Boolean values you're working with.

Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.

On one hand, that imposes some overhead in the form of bit manipulation to access a single value.

On the other hand, that also means many more of your Booleans will fit in your cache.

If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.

Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.

If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.

Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.

Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:

#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>

unsigned long primes = 0;

template <class bool_t>
unsigned long sieve(unsigned max) {
    std::vector<bool_t> sieve(max, false);
    sieve[0] = sieve[1] = true;

    for (int i = 2; i < max; i++) {
        if (!sieve[i]) {
            ++primes;
            for (int temp = 2 * i; temp < max; temp += i)
                sieve[temp] = true;
        }
    }
    return primes;
}

// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
    auto start = std::chrono::high_resolution_clock::now();
    primes += f(max);
    auto stop = std::chrono::high_resolution_clock::now();

    return stop - start;
}

int main() {
    using namespace std::chrono;

    unsigned number = 100000000;

    auto using_bool = timer(sieve<bool>, number);
    auto using_char = timer(sieve<char>, number);

    std::cout << "ignore: " << primes << "\n";
    std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "\n";
    std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "\n";
}

We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:

ignore: 34568730
Time using bool: 2623
Time using char: 3108

...then the time using g++ 5.1:

ignore: 34568730
Time using bool: 2359
Time using char: 3116

Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:

ignore: 3987474
Time using bool: 72
Time using char: 249

Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.

Fulmination answered 24/5, 2013 at 15:46 Comment(14)
I'd say that if you're using a std::vector<bool> correctly and you're aware of how it differs from other std::vectors, then it's a good idea to at least add a typedef to give it some more appropriate name. This could avoid confusion for future readers who may not be aware of this. Do you agree? (+1 for bringing up a very good alternate view)Spang
@Agentlien: Whether a typedef makes sense will depend on whether you really have a better name to give to the type. For a sieve of Eratosthenes, for example, I think just vector<bool> is fine.Fulmination
@JeffreyCoffin: Yes, for something like that I wouldn't bother either.Spang
The first link shows for multiple algorithms, that when written in single access fashion, they are significantly faster with booleans than bits. Here is Hinnant's concluding quote: "If such attention to detail is not given, then the space optimization leads to a very significant speed pessimization." The "attention to detail" he is referring to is quite specifically about writing these algorithms so that they use bit operations and not single access. I don't know how this could be more clear. The second link I did not look at carefully, so you may well be right.Gereld
Nice. The exact ratio between 1 byte vs. 1 bit elements is very dependent on the problem size vs. cache size, but also on exactly how the compiler implements the bit updates and on the CPU microarchitecture. (e.g. x86's bts reg, reg is an efficient way to set a bit in a register, but bts [mem], reg sucks because of crazy CISC semantics. or [mem], reg is efficient too, but requires a shift to set up the source.) I like that your answer highlights that a different compiler makes a big difference.Mcgray
@NirFriedman: The main point of the article you linked (Howard Hinnant's "On vector<bool>") is that a dynamic bitset / bitmap is a useful data structure, especially when the standard library provides an implementation with platform-specific optimizations. The only mistake was in naming it vector<bool>. I totally agree with both of these points, but we're kind of stuck with vector<bool> being a bitmap now, so at this point it's just one of C++'s warts. Use it when it's the right choice, and don't avoid it because of the name.Mcgray
and they often react negatively to the fact that vector<bool> isn't a container. vector<bool> is a real problem for generic code that creates a vector<T> and breaks when instantiated with T=bool. Rare, but definitely a wart.Mcgray
The question was about the difference between vector<bool> and bitset, not vector<char>. As much as your considerations are extremely relevant for applications, as much they are completely irrelevant for the precise question about the difference between vector<bool> and bitset.Kassel
@Max: The question talks about a "dynamic bitset". std::bitset isn't dynamic (its size is a template parameter), leaving std::vector<bool> as the only dynamic buttery type available. At least in my testing, Body dynamic bitset is fairly similar to vector<bool> other than the name. The same basic considerations apply to both.Fulmination
@JerryCoffin The question explicitly talks about "boost's dynamic bitset". Admittedly, adding two colons and an underscore would have made that clearer: boost::dynamic_bitset . This answer does not mention boost::dynamic_bitset (or any dynamic bitset) a single time, so I'm having trouble understanding why it was accepted. I came here because I have the same question. I stay uneducated.Inborn
@sebrockm: As the very beginning of its documentation says: "The dynamic_bitset class is nearly identical to the std::bitset class." In terms of speed (which is what the question was really about) substantial differences between the two would be quite unexpected (though implementations of std::bitset could vary, of course). And as noted in the comment immediately above yours, testing indicates that vector<bool> is similar in speed as well.Fulmination
@JerryCoffin Don't get me wrong, your answer is great in terms of detail and analysis. The issue is that it answers the wrong question. The question is not "Which one is faster, std::vector<bool> or std::vector<char>?". The question is "Which one is faster, std::vector<bool> or boost::dynamic_bitset?" If you are saying "substantial differences between the two would be quite unexpected", great, add that to your answer and it technically becomes a valid answer to the OP.Inborn
@sebrockm: Well, no. What wasked was: "Is using a vector of boolean values slower than a dynamic bitset?" std::vector<bool> is, in fact, a dynamic bitset, so I answered precisely the question he asked. He did mention Boost's dynamic bitset as well, but never actually asked how it compared to anything.Fulmination
@JerryCoffin Well, then we are interpreting the question differently. For me, it's obvious that OP was not aware that vector<bool> is a dynamic bitset and hence asked for a comparison. In particular, he also asked "Can I just use vector of boolean values instead [of boost::dynamic_bitset]?". I don't see how you answered that question. Also, I don't see how a lengthy comparison of vector<bool> and vector<char> (which is not mentioned in the question at all) is "precisely" answering the question. But then, OP accepted your answer, so apparently your interpretation is correct.Inborn
S
14

You should usually avoid std::vector<bool> because it is not a standard container. It's a packed version, so it breaks some valuable guarantees usually given by a vector. A valid alternative would be to use std::vector<char> which is what Herb Sutter recommends.

You can read more about it in his GotW on the subject.

Update:

As has been pointed out, vector<bool> can be used to good effect, as a packed representation improves locality on large data sets. It may very well be the fastest alternative depending on circumstances. However, I would still not recommend it by default since it breaks many of the promises established by std::vector and the packing is a speed/memory tradeoff which may be beneficial in both speed and memory.

If you choose to use it, I would do so after measuring it against vector<char> for your application. Even then, I'd recommend using a typedef to refer to it via a name which does not seem to make the guarantees which it does not hold.

Spang answered 24/5, 2013 at 15:32 Comment(3)
There are some warts to vector<bool>, e.g. that a reference to an element isn't a simple bool&. There was a debate about deprecating vector<bool> and/or moving the dynamic-bitmap functionality to another name. One interesting idea was to add vector<std::real_bool> for cases where you really want a vector of proper bool elements. (groups.google.com/a/isocpp.org/d/msg/std-proposals/8kQzWI61ROU/…), which would allow us to have that type without de-optimizing existing code that uses vector<bool> for what it's good for.Mcgray
The question was to compare vector<bool> to bitset.Kassel
@Kassel boost::dynamic_bitset*Coldblooded
A
2
#include "boost/dynamic_bitset.hpp"
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main(int, char*[])
{
    auto gen = std::bind(std::uniform_int_distribution<>(0, 1), std::default_random_engine());
    std::vector<char> randomValues(1000000);
    for (char & randomValue : randomValues)
    {
        randomValue = static_cast<char>(gen());
    }

    // many accesses, few initializations
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 500; ++i)
    {
        std::vector<bool> test(1000000, false);
        for (int j = 0; j < test.size(); ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken1: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " milliseconds" << std::endl;
    auto start2 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 500; ++i)
    {
        boost::dynamic_bitset<> test(1000000, false);
        for (int j = 0; j < test.size(); ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end2 = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken2: " << std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2).count()
              << " milliseconds" << std::endl;
    auto start3 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 500; ++i)
    {
        std::vector<char> test(1000000, false);
        for (int j = 0; j < test.size(); ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end3 = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken3: " << std::chrono::duration_cast<std::chrono::milliseconds>(end3 - start3).count()
              << " milliseconds" << std::endl;

    // few accesses, many initializations
    auto start4 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 1000000; ++i)
    {
        std::vector<bool> test(1000000, false);
        for (int j = 0; j < 500; ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end4 = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken4: " << std::chrono::duration_cast<std::chrono::milliseconds>(end4 - start4).count()
              << " milliseconds" << std::endl;
    auto start5 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 1000000; ++i)
    {
        boost::dynamic_bitset<> test(1000000, false);
        for (int j = 0; j < 500; ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end5 = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken5: " << std::chrono::duration_cast<std::chrono::milliseconds>(end5 - start5).count()
              << " milliseconds" << std::endl;
    auto start6 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 1000000; ++i)
    {
        std::vector<char> test(1000000, false);
        for (int j = 0; j < 500; ++j)
        {
            test[j] = static_cast<bool>(randomValues[j]);
        }
    }
    auto end6 = std::chrono::high_resolution_clock::now();
    std::cout << "Time taken6: " << std::chrono::duration_cast<std::chrono::milliseconds>(end6 - start6).count()
              << " milliseconds" << std::endl;
    return EXIT_SUCCESS;
}

Time taken1: 1821 milliseconds

Time taken2: 1722 milliseconds

Time taken3: 25 milliseconds

Time taken4: 1987 milliseconds

Time taken5: 1993 milliseconds

Time taken6: 10970 milliseconds

dynamic_bitset = std::vector<bool>

if you allocate many times but you only access the array that you created few times, go for std::vector<bool> because it has lower allocation/initialization time.

if you allocate once and access many times, go for std::vector<char>, because of faster access

Also keep in mind that std::vector<bool> is NOT safe to be used is in multithreading because you might write to different bits but it might be the same byte.

Alkalinity answered 15/4, 2022 at 17:14 Comment(1)
For multithreading it should be an atomic type anyway. It's not like std::vector<char> for instance is thread-safe.Undulation
B
1

It appears that the size of a dynamic bitset cannot be changed: "The dynamic_bitset class is nearly identical to the std::bitset class. The difference is that the size of the dynamic_bitset (the number of bits) is specified at run-time during the construction of a dynamic_bitset object, whereas the size of a std::bitset is specified at compile-time through an integer template parameter." (from http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html) As such, it should be slightly faster since it will have slightly less overhead than a vector, but you lose the ability to insert elements.

Bolen answered 24/5, 2013 at 15:37 Comment(2)
The statement in the library description is misleading, boost::dynamic_bitset can be resized.Radicle
...and you can also push_back or append elements to it.Radicle
S
0

UPDATE: I just realize that OP was asking about vector<bool> vs bitset, and my answer does not answer the question, but I think I should leave it, if you search for c++ vector bool slow, you end up here.

vector<bool> is terribly slow. At least on my Arch Linux system (you can probably get a better implementation or something... but I was really surprised). If anybody has any suggestions why this is so slow, I'm all ears! (Sorry for the blunt beginning, here's the more professional part.)

I've written two implementations of the SOE, and the 'close to metal' C implementation is 10 times faster. sievec.c is the C implementation, and sievestl.cpp is the C++ implementation. I just compiled with make (implicit rules only, no makefile): and the results were 1.4 sec for the C version, and 12 sec for the C++/STL version:

sievecmp % make -B sievec && time ./sievec 27
cc     sievec.c   -o sievec
aa 1056282
./sievec 27  1.44s user 0.01s system 100% cpu 1.455 total

and

sievecmp % make -B sievestl && time ./sievestl 27
g++     sievestl.cpp   -o sievestl
1056282./sievestl 27  12.12s user 0.01s system 100% cpu 12.114 total

sievec.c is as follows:

#include <stdio.h>
#include <stdlib.h>

typedef unsigned long prime_t;
typedef unsigned long word_t;
#define LOG_WORD_SIZE 6

#define INDEX(i) ((i)>>(LOG_WORD_SIZE))
#define MASK(i) ((word_t)(1) << ((i)&(((word_t)(1)<<LOG_WORD_SIZE)-1)))
#define GET(p,i) (p[INDEX(i)]&MASK(i))
#define SET(p,i) (p[INDEX(i)]|=MASK(i))
#define RESET(p,i) (p[INDEX(i)]&=~MASK(i))
#define p2i(p) ((p)>>1) // (((p-2)>>1)) 
#define i2p(i) (((i)<<1)+1) // ((i)*2+3)

unsigned long find_next_zero(unsigned long from,
                    unsigned long *v,
                    size_t N){
  size_t i;
  for (i = from+1; i < N; i++) {
    if(GET(v,i)==0) return i;
  }

  return -1;
}

int main(int argc, char *argv[])
{

  size_t N = atoi(argv[1]);
  N = 1lu<<N;
  // printf("%u\n",N);
  unsigned long *v = malloc(N/8);
  for(size_t i = 0; i < N/64; i++) v[i]=0;

  unsigned long p = 3;
  unsigned long pp = p2i(p * p);

  while( pp <= N){

    for(unsigned long q = pp; q < N; q += p ){
      SET(v,q);
    }
    p = p2i(p);
    p = find_next_zero(p,v,N);
    p = i2p(p);
    pp = p2i(p * p);
  }

  unsigned long sum = 0;
  for(unsigned long i = 0; i+2 < N; i++)
    if(GET(v,i)==0 && GET(v,i+1)==0) {
      unsigned long p = i2p(i);
      // cout << p << ", " << p+2 << endl;
      sum++;
    }
  printf("aa %lu\n",sum);
  // free(v);
  return 0;
}

sievestl.cpp is as follows:

#include <iostream>
#include <vector>
#include <sstream>

using namespace std;

inline unsigned long i2p(unsigned long i){return (i<<1)+1; }
inline unsigned long p2i(unsigned long p){return (p>>1); }
inline unsigned long find_next_zero(unsigned long from, vector<bool> v){
  size_t N = v.size();
  for (size_t i = from+1; i < N; i++) {
    if(v[i]==0) return i;
  }

  return -1;
}

int main(int argc, char *argv[])
{
  stringstream ss;
  ss << argv[1];
  size_t N;
  ss >> N;
  N = 1lu<<N;
  // cout << N << endl;
  vector<bool> v(N);

  unsigned long p = 3;
  unsigned long pp = p2i(p * p);

  while( pp <= N){

    for(unsigned long q = pp; q < N; q += p ){
      v[q] = 1;
    }
    p = p2i(p);
    p = find_next_zero(p,v);
    p = i2p(p);
    pp = p2i(p * p);
  }

  unsigned sum = 0;
  for(unsigned long i = 0; i+2 < N; i++)
    if(v[i]==0 and v[i+1]==0) {
      unsigned long p = i2p(i);
      // cout << p << ", " << p+2 << endl;
      sum++;
    }
  cout << sum;
  return 0;
}
Shortly answered 9/2, 2016 at 21:51 Comment(4)
We're also facing possible performance issues with libstdc++'s vector<bool>. The profile build + callgrind suggests copying them is very slow. Looking at the source code, it seems that it's not optimized and the copying is done "bit by bit". I don't know if GCC is smart enough to optimize this at -O3, though (our release setting). We're currently looking into that. I certainly understand if the lib devs don't want to invest a lot of effort into a type that is considered – under this name – deprecated by many well-known community figures.Officialism
vector<bool> is slow if you compile without optimization. Nothing to see here. g++ sievestl.cpp -o sievestl defaults to -O0: compile fast, no function inlining, and spill to memory after every statement to support modifying variables with a debugger. -O0 hurts some code more than others, so comparing without optimization tells you very little about what will be faster when compiled properly.Mcgray
I just checked with -O3... didn't help much. Any suggestions? (Profile pic of @PeterCordes relevant)Shortly
Profiling (perf record and perf report) showed that your STL version spent 74.3% of its time in memmove. Passing the vector by reference instead of by value made the run-times of C and STL the same, as expected, with gcc7.1 -O3 on skylake. Change to find_next_zero(unsigned long from, const vector<bool> &v)Mcgray

© 2022 - 2024 — McMap. All rights reserved.