When I allocate large space with std::vector it takes significant time because std::vector default-initializes the objects in it.
I see that a long time ago people discussed the topic in the How can I avoid std::vector<> to initialize all its elements?, but most of the recommendations were to write own memory allocator (which is quite risky, hacky and expensive) or inherit to std::vector, which is not very good solution, as well.
Have something changes since that times? Do we have any reasonable way to avoid initialization? I create some 13 Gb vectors and each takes about 2 seconds to initialize to default values which will be rewritten by the next command. I can’t believe developers didn’t provide a flexible way to avoid this initialization.
I can’t use std::vector::reserve
here, since parallel execution policies (more precisely, non-ranges algorithms) require forward iterators or stronger and I can’t use std::back_inserter
to fill the container. See the Update section below.
With the long discussion in comments if 'std::vector' is suitable for my needs and what are the requirements for underlying elements, I want to sum up with a very simple requirement here:
I am totally fine with std::vector and it works in most of the cases. What I need is the ability to change its size without initializing the underlying elements, just an option to make std::vector::size = std::vector::capacity
not causing _Uninitialized_fill_n
in std::vector which could be done with my own allocator and which one I still want to avoid.
To show in code:
#include <chrono>
#include <vector>
#include <iostream>
int main()
{
std::chrono::high_resolution_clock::time_point start;
std::chrono::high_resolution_clock::time_point end;
const std::size_t size = 13'000'000'000 / sizeof(int);
start = std::chrono::high_resolution_clock::now();
int *arr = new int[size];
end = std::chrono::high_resolution_clock::now();
std::cout << "Allocation with new took: " << std::chrono::duration<double>(end - start) << "\n";
start = std::chrono::high_resolution_clock::now();
std::vector<int> v(size);
end = std::chrono::high_resolution_clock::now();
std::cout << "Allocation with std::vector took: " << std::chrono::duration<double>(end - start) << "\n";
// Prevent "over" optimization
for (std::size_t i = 1; i < size; ++i) {
v[i]+=v[i-1];
arr[i]+=arr[i-1];
}
std::cout << v[size-1] << arr[size-1];
}
Less memory allocated on live demo to make it runnable.
On my PC (Winx64, MSVC++ 2022 latest, Release, Fully Optimized) it shows:
Allocation with new took: 0.50399s
Allocation with std::vector took: 2.04084s
And these 1.5 seconds of difference for every vector I don’t want to pay for.
Update
Since this caused misunderstanding in comments, here is why std::vector::reserve
doesn't work for me. The code
#include <execution>
#include <vector>
int main()
{
std::vector source_data = { 1,2,3,4,5 }; // Gigabytes of real data
std::vector<int> v;
v.reserve(source_data.size());
std::transform(std::execution::par_unseq,
source_data.begin(), source_data.end(), std::back_inserter(v),
[](auto in) {
return in+1; // Some transformation
});
}
won't compile with the error code like
Error C2338 static_assert failed: 'Non-ranges algorithms require that mutable iterators be Cpp17ForwardIterators or stronger.
which is highly reasonable, since std::back_inserter
is output (sequential) iterator.
Update 2
In the discussion Change the size of C++ vector without initializing added elements people consider the same task, but solutions focused on two topics. The latter them is implementing own allocator, which, as I already stated above is expensive and risky.
The former solution to provide the underlying type without default initialization is more interesting, but in my circumstances (Winx64, MSVC++ 2022 latest, Release, Fully Optimized) this live demo leads only to three hundred times increase of execution time for vector allocation (to 600 seconds from 2 seconds above). Of course, the cause is this loop with emplace_back
which works fine when your goal is to avoid initialization at any cost, by fails when it comes to performance which is the key in my question. The key Please correct me, if I implemented the test in a wrong way.
The same goes to the question C++ vector that doesn't initialize its members?, where similar solutions works just fine to avoid initialization but with high cost when it comes to performance. So, the formally solve the task of initialization avoidance, but the root problem of performance is not solved; the performance results in most cases worse.
Update 3
Since I don't need to change the content (add/remove elements) on the later stages, most likely, the interface I need is std::span which mostly mimics the std::vector
interface except ownership and modifications. Any thoughts?
high_resolution_clock
prizes resolution over stability. On some systems you can watch it going backward in time, and that really <expletive deleted>s up the measurements. One of the creators ofhigh_resolution_clock
recommends not using it at all and preferring a more tightly-specified clock that'll give fewer surprises. – Shrinestd::vector
" isn't the right choice and you have to roll your own. Depending on how much ofvector
you need, this can take anywhere from a few minutes to hack out or a few weeks to write and bulletproof. That said, even though it's rare it comes up often enough that someone's going to have this written and bombproofed. – Shrinestd::chrono::high_resolution_clock::is_steady
. Then, if you care enough, look to see ifhigh_resolution_clock
is defined with something likeusing high_resolution_clock = system_clock;
. My experience has been you can "Just usesteady_clock
" and get great resolution and portability across mainstream systems, but if you've got some specialized timing hardware... – Shrinevector
?int *arr = new int[size];
doesn't use the memory it asked for, so on a modern system it probably doesn't actually get any of the memory, and you'll pay for mapping real storage to the virtual addresses you got page-by-page as you start storing stuff in it.std::vector<int> v(size);
is doing whatnew
did and immediately starts using the memory, forcing the work finding and mapping real storage immediately. Unless you have a sparse structure you may wind up paying the same either way over time. – Shrinestd
and people from support will grasp it in seconds, etc., etc. Again, I just asked if this is possible to do. – Desultorystd::vector
is different than plain memory, but it could at least provide a simple way to avoid unnecessary (in many cases) expensive initialization. – Desultoryarr[i]+=arr[i-1];
-- I believe this is undefined behavior due toarr
not being initialized. So your example code has a flaw -- in a real world program that usesnew[]
, at the very least the data has to be initialized (or set to some value) to do any actual work. – Bathystd::vector<IntWrapper> v(size);
is undefined behavior because it default createsIntWrapper
with not-initializedvalue
field, and then copies it to all new vector elements. Some compilers are able to optimize it, but it is still UB. – RosabellaresizeNoInit
function template. When I added it, the situation became even worse, it takes so long to resize the vector that I had to decrease the desired size 100 times and only after that I got 4 seconds of execution. Of course, the cause is this loop withemplace_back
which works fine when your goal is to avoid initialization at any cost, by fails when it comes to performance which is the key in my question. – Desultoryvector
that can have "gaps" anywhere in it's allocation, rather than just at the end thatvector
has. You'd write an iterator that placement-news in the same way thatback_inserter
push_back
s on assignment. – Contrabandiststd::vector
promises not to do – Contrabandistreserve
my vector, then have one thread append ~1000 elements at a time until it was fully in RAM, and a second thread that assigned real values, making sure not to go past the current length. – Beheadstd::execution::par_unseq
there? This is exactly the case whystd::transform
doesn't takeoutput_iterator
, see number (2) at the en.cppreference.com/. It requires ForwardIt3, but not OutputIt as numbers (1) and (3). – Desultoryvector::data
"Returns: A pointer such that [data(), data() + size()) is a valid range. For a non-empty vector, data() == addressof(front()) is true." plus the rules for object lifetimes – Contrabandistnew
it works just fine, the same array, the same allocation, just without this_Uninitialized_fill_n
invoke. Can you please share some sources where you used your approach with the reservation and threading? I am not sure I caught your idea well enough too implement well. – Desultorynew
andstd::vector::reserve
will commit virtual memory, but not actual RAM, and so are fast. But then the first time you try to write to each memory page, then the thread "pauses", and the OS will dump something else from RAM, and then give that page to your process, and then resume your thread. That is usually the slow part, rather than the initialization. – Beheadvector
is incompatible with your requirements – Contrabandiststd::vector
's guarantee that objects are always constructed is too strong for your specific use case, and you'll need a separate structure. Luckily, that's pretty straightforward. – Beheadint
andlong long
. What does is say? For me, the same, memory writing bottleneck. – Desultoryreserve()
method which allocates memory internally without initialization and this is still valid range according to C++ standard, although internal, range. C++ does not explicitly require the element behind the valid range to be initialized. Please prove otherwise, if you have the quote. – Desultoryboost::transform_iterator
, which will construct the vector with the final values in the first place. Are you allowed to useboost
? – Beheadstd::back_inserter
? The problem is that when it comes to add something to the range, it tries a kind ofemplace
method which doesn't work for parallel execution. Even in case I transform the incoming data, I won't be able to add it to the vector with parallel execution. If only I could move-construct std::vector from plain data array... – Desultorystd::span
since I don't plan to change the content somehow (add/remove elements). Seems that I could allocate the array in memory, wrap instd::unique_ptr
to handle ownership later and pass the allocated array tostd::span
. Any thoughts on that? – Desultoryunique_ptr
+span
is probably the easiest solution. – Beheadstd::ranges::views::transform
might work for your case, if you have C++20 – Beheadidx = 1 - idx;
for theunsigned idx
in the line 63? Is it buffer idx flipping between 0 and 1? – Desultorystd::span
which perfectly works for my tasks except some tiny extra work on ownership and resource management. – Desultoryint
s aren'tint
s yet.std::vector
is specified in such a way that you can't get at the uninitialized allocation. – Contrabandistnew
, objects are created. If you allocate withstd::allocator::allocate
, they aren't. A valid range is a range in which objects exist. It's not initialisation, it's the creation that you are lacking – Contrabandistnew
creates object, but doesn't initialize them. Object could exist in uninitialized state, this is totally fine in C++ unless you want to use it for read and some other stuff. Lots of object exist in C++ which are not initialized, like stack objects, etc. So, again, I never saw the requirement for objects in std::vector to be initialized. They should exist, of course, but show me a requirement for initialization. – Desultoryreserve
does not create objects. every member ofstd::vector
that increases size does so by initialising the new objects as it creates them, because everything goes through allocator_traits::construct – Contrabandiststd::vector
to have initialized objects? This is totally up to allocator and has nothing to do with requirements tostd::vector
. So, you statement "vector requires underlying objects to be initialized" is unsupported. – Desultoryallocator_traits::construct
always initialises objects, so there's simply no way ofvector
having uninitialized objects. It can have an allocation without objects in it, or it can have initialised objects. And we weren't saying that vector requires it, but that it ensures it. – Contrabandistallocator_traits::allocate
doesn't construct elements. But also the requirements on allocator aware containers areallocator_traits::construct(args...)
being well formed forargs...
appropriate to the various situations. – Contrabandiststd::vector
must useallocator_traits::construct
for new elements? I see that allocator_traits::construct must be in allocator. I see thatstd::vector
uses allocator. But I never see thatstd::vector
must useallocator_traits::construct
method of allocator. Can you pinpoint the place in standard or this? – Desultory