edit: I am specifically comparing `std::vector`'s linear-search operations to `std::map`'s binary-search operations, because that is what Herb's claim seemed to relate to. I know that using a binary search would move the performance from O(N) to O(log N), but that would not be testing Herb's claim.
Bjarne Stroustrup and Herb Sutter have both recently talked about how awesome `std::vector` is in situations where one would expect `std::list` to be used, owing to the cost of cache misses during linked-list traversal (see http://channel9.msdn.com/Events/Build/2014/2-661 at the 48 minute mark).

Herb made a further claim, however, that operations on an ordered vector are even faster than on `std::map` (see https://i.sstatic.net/FDjpi.png taken from the 51:30 mark of the above channel9 video), which I found difficult to fathom. So I created a small test to demonstrate this and had difficulty reproducing these results: https://ideone.com/MN7DYK

This is the test code:
```cpp
template <typename C>
void test(std::string name, std::vector<int> shuffledInputValues, C & c)
{
    // fill container 'c' with values from 'shuffledInputValues' then erase them all
    {
        std::cout << "testing " << name << "..." << std::endl;
        timer t;
        for (auto val : shuffledInputValues) insert(c, val);
        for (auto val : shuffledInputValues) remove(c, val);
    }
}

// output:
// testing vector...99.189ms
// testing deque...120.848ms
// testing set...4.077ms
```
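(The `timer` class itself isn't shown in the question; a minimal RAII sketch using `std::chrono`, with names of my own choosing, could look like this:)

```cpp
#include <chrono>
#include <iostream>

// Hypothetical stand-in for the question's `timer`: measures wall time
// from construction and prints the elapsed milliseconds on destruction.
struct timer {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(
                   std::chrono::steady_clock::now() - start).count();
    }

    ~timer() { std::cout << elapsed_ms() << "ms" << std::endl; }
};
```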
Notice how the `std::vector` performs an order of magnitude slower than the `std::set`. Of course this is the result I expected, but I am confused about the claim that Herb was trying to make.

What am I doing wrong? Or am I misunderstanding Herb's claim?
Notes on my test app:
- it must use linear operations - the whole point of the exercise is to demonstrate CPU cache magic, these are the constraints Herb and Bjarne put on the exercise
- I didn't try any tricky loop-unrolling for my vector iteration, but I believe that the iteration isn't affecting performance much anyway
- I limited the loop to 10K items because ideone times out on larger sets, but increasing the size doesn't change the results
edit: see https://ideone.com/916fVd for a modified example that only compares the performance of lookups. Linear searching exhibits the same performance.
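As a side note on what the per-lookup work actually is, here is a sketch (function names are mine, not from the test) contrasting the O(N) linear scan the test deliberately uses with the O(log N) binary search that a sorted vector also supports:

```cpp
#include <algorithm>
#include <vector>

// Linear scan, O(N): what std::find (and the test above) does.
bool linear_contains(const std::vector<int>& v, int target) {
    return std::find(v.begin(), v.end(), target) != v.end();
}

// Binary search, O(log N): only valid because the vector is sorted.
// flat_map-style containers rely on this form internally.
bool binary_contains(const std::vector<int>& sorted_v, int target) {
    return std::binary_search(sorted_v.begin(), sorted_v.end(), target);
}
```

The comments below point out that sorted-vector containers get their speed from the second form; the test sticks to the first by design.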
`insert` & `remove` is not going to out-scale `map` for a vector; that's impossible due to the packing behavior. But `find` (which you don't seem to time) is going to be a lot faster. You can't use `std::find` however, since that will do a linear search; you have to do a binary search. If you have boost, profile `boost::container::flat_map` against `std::map` and you'll see the results. Loki also has `Loki::AssocVector`, which is pretty much the same, if you have that available. – Clostridium

…`std::binary_search` instead. – Question

…`std::find` for `vector` when trying the linear search comparison. For one, you're computing `end()` every iteration through the loop even if it can't change. Cont. due to length restrictions... – Clostridium

…`std::map` is cache locality; it's reasonable to assume that your map in this simple example will have terrific locality, since the whole map is filled straight up with no allocations in between. A more plausible case would perhaps be that there's "stuff" happening between every insertion into the map. In a test case such as this I would suggest filling the map with, let's say, 20 times as many elements as you're going to use, then randomly removing elements until you're at the same size as the vector, to more closely simulate the expected "real life" usage. – Clostridium

…`boost::container::flat_map` against `std::map` out of curiosity, and did find, as expected, that the `flat_map` was faster, and of course it used less memory. As there's a constant penalty paid for every cache miss, however, I will say that for large sets `std::map` will always beat `std::vector` when the vector uses a linear search, which might be what you're experiencing. – Clostridium

…`malloc` overhead). I was programming in C and the tree implementation was mine. These were small records, ~16 bytes plus the identifier string, which was also the comparison key. Symbol table operations are interspersed with lots of code and data activity that ought to challenge the cache. – Tachymetry

`std::list` should never be used; in practice I've found absolutely zero cases where its characteristics have been preferable. If you care about order and don't update the collections often, go for `boost::container::flat_map`/`flat_set`. If you don't care about order, go for `std::vector` and use the swap trick on the last element. – Clostridium

`boost::container::flat_map` and its ilk aren't always a drop-in replacement. In my experience it's rather seldom that you rely on iterator validity in the presence of `insert` & `erase`. Often you fill your collection, and after construction it stays fairly stable; in such cases vector-like storage is often a big net win. If iterator validity is important, `std::map`/`set` might be the best fit. – Clostridium

…`list` 1 time in my life, so I guess it has its uses haha. – Exurb

`std::list` doesn't invalidate iterators; that's why `boost::stable_vector` was made! All the pros of `std::list`, with few of the cons! – Lovesick

…`vector` beating the `set` by a landslide: ideone.com/sSh2vg – Rives