With CPU caches becoming better and better std::vector
usually outperforms std::list
even when it comes to testing the strengths of a std::list
. For this reason, even for situations where I need to delete/insert in the middle of the container I usually pick std::vector
but I realized I had never tested this to make sure assumptions were correct. So I set up some test code:
#include <iostream>
#include <chrono>
#include <list>
#include <vector>
#include <random>
void TraversedDeletion()
{
std::random_device dv;
std::mt19937 mt{ dv() };
std::uniform_int_distribution<> dis(0, 100000000);
std::vector<int> vec;
for (int i = 0; i < 100000; ++i)
{
vec.emplace_back(dis(mt));
}
std::list<int> lis;
for (int i = 0; i < 100000; ++i)
{
lis.emplace_back(dis(mt));
}
{
std::cout << "Traversed deletion...\n";
std::cout << "Starting vector measurement...\n";
auto now = std::chrono::system_clock::now();
auto index = vec.size() / 2;
auto itr = vec.begin() + index;
for (int i = 0; i < 10000; ++i)
{
itr = vec.erase(itr);
}
std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
}
{
std::cout << "Starting list measurement...\n";
auto now = std::chrono::system_clock::now();
auto index = lis.size() / 2;
auto itr = lis.begin();
std::advance(itr, index);
for (int i = 0; i < 10000; ++i)
{
auto it = itr;
std::advance(itr, 1);
lis.erase(it);
}
std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
}
}
void RandomAccessDeletion()
{
std::random_device dv;
std::mt19937 mt{ dv() };
std::uniform_int_distribution<> dis(0, 100000000);
std::vector<int> vec;
for (int i = 0; i < 100000; ++i)
{
vec.emplace_back(dis(mt));
}
std::list<int> lis;
for (int i = 0; i < 100000; ++i)
{
lis.emplace_back(dis(mt));
}
std::cout << "Random access deletion...\n";
std::cout << "Starting vector measurement...\n";
std::uniform_int_distribution<> vect_dist(0, vec.size() - 10000);
auto now = std::chrono::system_clock::now();
for (int i = 0; i < 10000; ++i)
{
auto rand_index = vect_dist(mt);
auto itr = vec.begin();
std::advance(itr, rand_index);
vec.erase(itr);
}
std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
std::cout << "Starting list measurement...\n";
now = std::chrono::system_clock::now();
for (int i = 0; i < 10000; ++i)
{
auto rand_index = vect_dist(mt);
auto itr = lis.begin();
std::advance(itr, rand_index);
lis.erase(itr);
}
std::cout << "Took " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now() - now).count() << " μs\n";
}
int main()
{
RandomAccessDeletion();
TraversedDeletion();
std::cin.get();
}
All results are compiled with /02 (Maximize speed)
.
The first, RandomAccessDeletion()
, generates a random index and erases this index 10.000 times. My assumptions were right and the vector is indeed a lot faster than the list:
Random access deletion...
Starting vector measurement...
Took 240299 μs
Starting list measurement...
Took 1368205 μs
The vector is about 5.6x faster than the list. We can most likely thank our cache overlords for this performance benefit, even though we need to shift the elements in the vector on every deletion it's impact is less than the lookup time of the list as we can see in the benchmark.
So then I added another test, seen in the TraversedDeletion()
. It doesn't use randomized positions to delete but rather it picks an index in the middle of the container and uses that as base iterator, then traverse the container to erase 10.000 times.
My assumptions were that the list would outperform the vector only slightly or as fast as the vector.
The results for the same execution:
Traversed deletion...
Starting vector measurement....
Took 195477 μs
Starting list measurement...
Took 581 μs
Wow. The list is about 336x faster. This is really far off from my expectations. So having a few cache misses in the list doesn't seem to matter at all here as cutting the lookup time for the list weighs in way more.
So the list apparently still has a really strong position when it comes to performance for corner/unusual cases, or are my test cases flawed in some way?
Does this mean that the list nowadays is only a reasonable option for lots of insertions/deletions in the middle of a container when traversing or are there other cases?
Is there a way I could change the vector access & erasure in TraversedDeletion()
to make it at least a bit more competition vs the list?
In response to @BoPersson's comment:
vec.erase(it, it+10000)
would perform a lot better than doing 10000 separate deletes.
Changing:
for (int i = 0; i < 10000; ++i)
{
itr = vec.erase(itr);
}
To:
vec.erase(itr, itr + 10000);
Gave me:
Starting vector measurement...
Took 19 μs
This is a major improvement already.
vec.erase(it)
invalidatesitr
. You wantitr = vec.erase(itr);
– Sinistervec.erase(it)
does ivalidateitr
and he is using that in each subsequent iteration. I was looking atRandomAccessDeletion
notTraversedDeletion
. – Bolducvector::erase
to get a valid itr – Loanloandavec.erase(it, it+10000)
would perform a lot better than doing 10000 separate deletes. – Pigeonhole