How to avoid default initialization of objects in std::vector? [duplicate]
Asked Answered
D

0

7

When I allocate large space with std::vector it takes significant time because std::vector default-initializes the objects in it.

I see that a long time ago people discussed the topic in the How can I avoid std::vector<> to initialize all its elements?, but most of the recommendations were to write own memory allocator (which is quite risky, hacky and expensive) or inherit to std::vector, which is not very good solution, as well.

Have something changes since that times? Do we have any reasonable way to avoid initialization? I create some 13 Gb vectors and each takes about 2 seconds to initialize to default values which will be rewritten by the next command. I can’t believe developers didn’t provide a flexible way to avoid this initialization.

I can’t use std::vector::reserve here, since parallel execution policies (more precisely, non-ranges algorithms) require forward iterators or stronger and I can’t use std::back_inserter to fill the container. See the Update section below.

With the long discussion in comments if 'std::vector' is suitable for my needs and what are the requirements for underlying elements, I want to sum up with a very simple requirement here:

I am totally fine with std::vector and it works in most of the cases. What I need is the ability to change its size without initializing the underlying elements, just an option to make std::vector::size = std::vector::capacity not causing _Uninitialized_fill_n in std::vector which could be done with my own allocator and which one I still want to avoid.

To show in code:

#include <chrono>
#include <vector>
#include <iostream>

int main()
{
    std::chrono::high_resolution_clock::time_point start;
    std::chrono::high_resolution_clock::time_point end;
       
    const std::size_t size = 13'000'000'000 / sizeof(int);

    start = std::chrono::high_resolution_clock::now();
    int *arr = new int[size];
    end = std::chrono::high_resolution_clock::now();
    std::cout << "Allocation with new took: " << std::chrono::duration<double>(end - start) << "\n";
    
    start = std::chrono::high_resolution_clock::now();
    std::vector<int> v(size);
    end = std::chrono::high_resolution_clock::now();
   
    std::cout << "Allocation with std::vector took: " << std::chrono::duration<double>(end - start) << "\n";

    // Prevent "over" optimization
    for (std::size_t i = 1; i < size; ++i) {
        v[i]+=v[i-1];
        arr[i]+=arr[i-1];
    }

    std::cout << v[size-1] << arr[size-1];
}

Less memory allocated on live demo to make it runnable.

On my PC (Winx64, MSVC++ 2022 latest, Release, Fully Optimized) it shows:

Allocation with new took: 0.50399s
Allocation with std::vector took: 2.04084s

And these 1.5 seconds of difference for every vector I don’t want to pay for.

Update

Since this caused misunderstanding in comments, here is why std::vector::reserve doesn't work for me. The code

#include <execution>
#include <vector>

int main()
{
    std::vector source_data = { 1,2,3,4,5 }; // Gigabytes of real data

    std::vector<int> v;
    v.reserve(source_data.size());

    std::transform(std::execution::par_unseq, 
        source_data.begin(), source_data.end(), std::back_inserter(v),
        [](auto in) {
            return in+1; // Some transformation
        });
}

won't compile with the error code like

Error C2338 static_assert failed: 'Non-ranges algorithms require that mutable iterators be Cpp17ForwardIterators or stronger.

which is highly reasonable, since std::back_inserter is output (sequential) iterator.

Update 2

In the discussion Change the size of C++ vector without initializing added elements people consider the same task, but solutions focused on two topics. The latter them is implementing own allocator, which, as I already stated above is expensive and risky.

The former solution to provide the underlying type without default initialization is more interesting, but in my circumstances (Winx64, MSVC++ 2022 latest, Release, Fully Optimized) this live demo leads only to three hundred times increase of execution time for vector allocation (to 600 seconds from 2 seconds above). Of course, the cause is this loop with emplace_back which works fine when your goal is to avoid initialization at any cost, by fails when it comes to performance which is the key in my question. The key Please correct me, if I implemented the test in a wrong way.

The same goes to the question C++ vector that doesn't initialize its members?, where similar solutions works just fine to avoid initialization but with high cost when it comes to performance. So, the formally solve the task of initialization avoidance, but the root problem of performance is not solved; the performance results in most cases worse.

Update 3

Since I don't need to change the content (add/remove elements) on the later stages, most likely, the interface I need is std::span which mostly mimics the std::vector interface except ownership and modifications. Any thoughts?

Desultory answered 21/3, 2024 at 22:8 Comment(74)
What are your exact requirements and limitations? What is the underlying problem you need to solve that needs 13 GB of memory? Perhaps there are other algorithms or data structures that can be used to solve the original and underlying problem? Like memory-mapped files? Sliding windows? Simply create your own vector-like wrapper around a raw non-owning pointer? Perhaps something else? I'm sorry, but unless this is just for plain curiosity (which you should state in the question itself) then this is an XY problem.Bother
Tactical note: high_resolution_clock prizes resolution over stability. On some systems you can watch it going backward in time, and that really <expletive deleted>s up the measurements. One of the creators of high_resolution_clock recommends not using it at all and preferring a more tightly-specified clock that'll give fewer surprises.Shrine
@Someprogrammerdude, I extracted very specific and simple problem: std::vector allocation mechanisms are not enough when it comes to work with large vectors filling them with std::transform-like algorithms with parallel executions policies. I just don't have a way to avoid expensive initialization. This comes at every point in the program where you have to deal with large data. No, this is not only curiosity, no this is not a premature optimization, no reserving doesn't help, no microseconds spent to std::cout don't change things (to be continued).Desultory
@Someprogrammerdude, well I work on LLM with large texts and build indexes to speedup some algorithms like finding duplicates, calculating distances, contexts, etc. Both texts and indexes could be quite large (even bulks); it comes to 64Gb and I would use more. Of course, I can do all you noticed, but why if a simple solution for std::vector exists? I asked very direct and clear question and I can't imagine I could do better putting the whole task here. I often need uninitialized vectors, why not? At the end of the day, C arrays on stack and on alloc don't initialize...Desultory
Sometimes you find cases where the usual wisdom of "Use std::vector" isn't the right choice and you have to roll your own. Depending on how much of vector you need, this can take anywhere from a few minutes to hack out or a few weeks to write and bulletproof. That said, even though it's rare it comes up often enough that someone's going to have this written and bombproofed.Shrine
Start by checking std::chrono::high_resolution_clock::is_steady. Then, if you care enough, look to see if high_resolution_clock is defined with something like using high_resolution_clock = system_clock;. My experience has been you can "Just use steady_clock" and get great resolution and portability across mainstream systems, but if you've got some specialized timing hardware...Shrine
@user4581301, I see. I asked because of the following points: (1) I was really surprised that in C++ which inherits C with its basis "we don't pay for anything extra" we got a container that forces as to pay for initialization when it is not needed; just surprised and was sure that there should be a solution; (2) std::vector works for me well, so far expect this, so I don't want to reinvent wheels; (3) I asked to know if I missed something, of course I can do all the stuff above, but it would be not nice to do so, missing a simple solution had it existed.Desultory
@xaxxon, for sure. See my answer to user4581301 above. The question is not how to do this, the question is if I missed a way to still use std::vector which works fine for me so far. I love its debugging features and I don't want to reinvent the wheels.Desultory
While I'm not experienced in LLM's, it sound more like you want some kind of graph or tree structure, rather than a flat contiguous array (checking for duplications in an array will have really bad performance). Alternatively, a forward list or a deque. Perhaps even multiple different data-structures, like one for the actual data, one for indexing, one for duplication handling, etc.Bother
"we got a container that forces as to pay for initialization when it is not needed" - it doesn't unless you create additional requirements on how it has to be used. Reserve works great for lots of people in lots of situations. You have a VERY specific use case that you've backed yourself into and now vector isn't great for you. std::vector always has defined elements from 0 to size-1 and that's not what you want. So you want a different data structure.Fem
One thing to watch out for is how much of this really is vector? int *arr = new int[size]; doesn't use the memory it asked for, so on a modern system it probably doesn't actually get any of the memory, and you'll pay for mapping real storage to the virtual addresses you got page-by-page as you start storing stuff in it. std::vector<int> v(size); is doing what new did and immediately starts using the memory, forcing the work finding and mapping real storage immediately. Unless you have a sparse structure you may wind up paying the same either way over time.Shrine
@xaxxon, following your logic, C malloc and C++ new should also initialize memory, but they don't and this is the language ideology, we don't pay for things we don't need. We allow the programmer to decide to pay for this or not. And even if std::vector doesn't follow this principle, if could at least have provided a simple way to avoid initialization. My task is not specific, I just care about performance. All people who use std::vectors pay extra sometimes not realizing this. Even for small vectors.Desultory
@user4581301, don't get me wrong, I am not a newbie. Of course, I can use plain memory or another container (not sure from std), but this approach has its own drawbacks. First to come: (1) I need the size of array to be passed to functions, so I, anyway will have to make a wrapper or end up with iterators and ranges; (2) std::vector has very good build-in debugging features I don't want to replicate; (3) it works expect this tiny feature; (4) It is std and people from support will grasp it in seconds, etc., etc. Again, I just asked if this is possible to do.Desultory
Is the transformation from a source-vector to a destination-vector crucial? Have you tried copying the source-vector to the destination-vector, and then use the destination vector for the transformation? Yes it's not as effective as your example using doing only a single transformation pass, but might be "Good Enough"?Bother
@DamirTenishev std::vector makes guarantees and provides a low cost abstraction to obtain those guarantees. If you don't want those guarantees then it's not the write data structure. That doesn't mean the data structure is wrong.Fem
@Someprogrammerdude, in real code these two vectors have different types of objects (I decided not to overcomplicate the example here, keeping only the key issue), so copying is not possible. And, I guess (will check) that copying will be even worse from performance perspective than initialization (at least because of doubling the loading on cache), so this won't work.Desultory
@xaxxon, I never said it is wrong. I said that I was surprised by the things I mentioned above. I agree that std::vector is different than plain memory, but it could at least provide a simple way to avoid unnecessary (in many cases) expensive initialization.Desultory
I was pointing out that your surprise was unwarranted. The guarantees come first. A vector guarantees contiguous constructed objects. It then does a good job to implement those guarantees given the way C++ is generally designed. It does not violate those principles. You want a different data structure.Fem
@DamirTenishev arr[i]+=arr[i-1]; -- I believe this is undefined behavior due to arr not being initialized. So your example code has a flaw -- in a real world program that uses new[], at the very least the data has to be initialized (or set to some value) to do any actual work.Bathy
@xaxxon, can you please quote the standard where it says that the objects must be constructed? I see that "elements are stored contiguously", period. The initialization is totally the responsibility of the allocator. The default one conducts the initialization, but this have nothing to std::vector, there are no requirements to the allocator that it must construct the values by default initialization. Correct me with the standard quote if I am wrong.Desultory
@Fedor, thank you for the link, it helps a lot. As you can see, so far we end up with "there is no good and simple solution here", so it is hard to say if it answers the question, since we are not sure that the "good" answer exists. Unfortunately, I can't reproduce this on live demo, but on my PC (see details in question) the accepted in the linked question approach leads only to increasing time from 2 seconds to 6 seconds. All other answers there go to implementing allocators, which is quite expensive.Desultory
@Someprogrammerdude, since you asked for the "original and underlying problem", here is the code review for one of the "a little bit more high level tasks I am dealing with. It provides the reason for use large indexes, allocating memory for them and using the allocated memory to write results at a first stage which illuminates the need for default initialization. If you are able to step in there with improvements, I will be really thankful.Desultory
Please note that in your live demo std::vector<IntWrapper> v(size); is undefined behavior because it default creates IntWrapper with not-initialized value field, and then copies it to all new vector elements. Some compilers are able to optimize it, but it is still UB.Rosabella
@Fedor, I see, but nothing changes in MSVC++ latest when I compile in Release with full optimization without this loop at the end. I just can't reproduce this on godbolt.org since it doesn't execute MSVC code, just compiles it. On my PC this code shows 6 seconds for vector allocation.Desultory
@Fedor, I just noticed that I missed the key part from your link, namely the resizeNoInit function template. When I added it, the situation became even worse, it takes so long to resize the vector that I had to decrease the desired size 100 times and only after that I got 4 seconds of execution. Of course, the cause is this loop with emplace_back which works fine when your goal is to avoid initialization at any cost, by fails when it comes to performance which is the key in my question.Desultory
C++ vector that doesn't initialize its members?Zebrass
You might be able to write a similar container to vector that can have "gaps" anywhere in it's allocation, rather than just at the end that vector has. You'd write an iterator that placement-news in the same way that back_inserter push_backs on assignment.Contrabandist
@Caleth, of course, I can. The question is how to avoid reinventing the wheels. Please see my answers in comments above, I explained this in a great detail. Every time when I implement something new people start screaming "RTFM!!! Stop reinventing the wheels! Reuse existing solutions! Just adjust them!". When after that I ask how to adjust std::vector, am getting the opposite proposals to write my own. The circle loops.Desultory
@user12002570, thank you so much for the link. I will try the approach with the inherited allocator, it seems to be not so risky and expensive. I will post the update here after tests. The main accepted answer there, however strictly aligned with my test suggested by Fedor and, although I can't reproduce the timing on godbolt.org, it doesn't work with MSVC++.Desultory
"What I need is the ability to change its size without initializing the underlying elements" That's specifically a thing that std::vector promises not to doContrabandist
(B) I once thought I had this issue, but eventually discovered that the default initialization wasn't really the source of the slow performance. The source of the slow performance was actually the OS committing pages to my process in memory. If that's also the case for you, then skipping the default initialization doesn't actually improve the performance. The best solution I found was to reserve my vector, then have one thread append ~1000 elements at a time until it was fully in RAM, and a second thread that assigned real values, making sure not to go past the current length.Behead
@MooingDuck the ExecutionPolicy overloads require ForwardIteratorContrabandist
@MooingDuck, can you please read my text and code carefully and pay a special attention to std::execution::par_unseq there? This is exactly the case why std::transform doesn't take output_iterator, see number (2) at the en.cppreference.com/. It requires ForwardIt3, but not OutputIt as numbers (1) and (3).Desultory
And for a standard reference, we have vector::data "Returns: A pointer such that [data(), data() + size()) is a valid range. For a non-empty vector, data() == addressof(front()) is true." plus the rules for object lifetimesContrabandist
@Caleth: if you run the program twice in a row, is the second time FAR faster than the first time? If so, then it's almost certainly the memory paging issue I mentioned, and not the construction. Another way to check is to carefully measure the CPU usage during this slow step. If it's mostly "user" CPU, then its initialization. If it's mostly "system" CPU, then it's memory paging.Behead
@MooingDuck, thank you for the hint on the OS committing pages. I will take a look, but so far I see precisely that the only difference is default initialization, since in case with new it works just fine, the same array, the same allocation, just without this _Uninitialized_fill_n invoke. Can you please share some sources where you used your approach with the reservation and threading? I am not sure I caught your idea well enough too implement well.Desultory
@MooingDuck, no my results is absolutely stable between runs.Desultory
@DamirTenishev: depending on your OS, new and std::vector::reserve will commit virtual memory, but not actual RAM, and so are fast. But then the first time you try to write to each memory page, then the thread "pauses", and the OS will dump something else from RAM, and then give that page to your process, and then resume your thread. That is usually the slow part, rather than the initialization.Behead
@Caleth, sorry, just to be on the same page, is your reference to standard related to my request on reference to standard where is says that the underlying objects my be default-initialized or this is the reference for another topic? If this is the answer to my request, I can't see how it answers to it. If not, please specify what question you are addressing (too many comments, easy to misunderstand).Desultory
@MooingDuck, are your really sure that this could cause 1.5 seconds delay? I have 1.5 seconds for 13 gigabytes which gives about 10 Gigabytes per second which more or less fits to memory bandwidth for writing operation. This way or another, writing 13 Gb couldn't be for free, you will have to pay and I am sure that these 1.5 seconds is exactly the price for this writing.Desultory
Yes, that was a reference for how vector is incompatible with your requirementsContrabandist
@DamirTenishev: I just noticed you have a vector of int, which has a trivial constructor. So yes, ~100% of the time here is definitely OS paging. When I encountered this in Windows, the problem was that it was paging other process' memory to disk, so the limiting factor then was the hard drive write speed. At 30-150 MB/second, 13Gb, that's 86 to 433 seconds. I eventually found the issue when I realized that the second run was far faster (because the RAM was empty, so nothing had to be paged to disk). During the slow performance, are you seeing disk I/O by the OS system?Behead
Oh, if you're seeing 1.5s for 13GB, then you're right that that roughly matches RAM bandwidth. That's an excellent observation, disproving my memory paging theory. So the answer is that std::vector's guarantee that objects are always constructed is too strong for your specific use case, and you'll need a separate structure. Luckily, that's pretty straightforward.Behead
@MooingDuck, just tested with long long (code only for checking, can't run MSVC there), on my local PC it gives the same time for int and long long. What does is say? For me, the same, memory writing bottleneck.Desultory
@Caleth, I can't see in which way it provides this information. According to 27.2.1 (see, for example, here ) says only that pointers must be in the same container and must be reachable. Nothing is said about the content, at all. Please, correct me with the standard quote on valid range if I am wrong here.Desultory
Pointers point to objects. If you don't initialise, you don't have an objectContrabandist
@Caleth, can you follow up this with standard quote, please? Who told that I care about object? Where it was said? We discussed, per your quote, pointers range only. No word was said about objects. Why you change to them suddenly? At the end of the day there is a reserve() method which allocates memory internally without initialization and this is still valid range according to C++ standard, although internal, range. C++ does not explicitly require the element behind the valid range to be initialized. Please prove otherwise, if you have the quote.Desultory
@DamirTenishev: I just realized that the best answer is probably boost::transform_iterator, which will construct the vector with the final values in the first place. Are you allowed to use boost?Behead
@MooingDuck, thank you for the hint. I'd like to avoid boost for the moment. Anyway, if I use it, in which way transform_iterator could help, if the main problem, anyway comes with the std::back_inserter? The problem is that when it comes to add something to the range, it tries a kind of emplace method which doesn't work for parallel execution. Even in case I transform the incoming data, I won't be able to add it to the vector with parallel execution. If only I could move-construct std::vector from plain data array...Desultory
@MooingDuck, and this "If only I could move-construct std::vector from plain data array..." led me to the idea to use std::span since I don't plan to change the content somehow (add/remove elements). Seems that I could allocate the array in memory, wrap in std::unique_ptr to handle ownership later and pass the allocated array to std::span. Any thoughts on that?Desultory
How important is that parallel execution? Given the memory bottleneck, it would appear that a single threaded constructor transform and a multi-threaded solution are probably going to be close to the same speed. But yes, if parallelism is faster, then unique_ptr + span is probably the easiest solution.Behead
Oh, std::ranges::views::transform might work for your case, if you have C++20Behead
@MooingDuck, ranges don't support parallel execution so far. Yes, transformation function does some work which benefits from vectorization and parallel execution.Desultory
@DamirTenishev: I tried to see if I can make a simple thing where one thread transforms, and a separate thread can treat the results like an input, which can then go directly into the vector, but I appear to have several bugs coliru.stacked-crooked.com/a/372a486c74696751Behead
@MooingDuck, thank you, trying to grasp the logic. What's the idea behind the idx = 1 - idx; for the unsigned idx in the line 63? Is it buffer idx flipping between 0 and 1?Desultory
yea, that is me toggling between index 0 and 1. I am currently rewriting the whole thing to have much more readable accesses of the atomic, and also have a buffer size of 30-60 items instead of 2, because at least on the machine I'm testing on, it seems to be running on a single physical core, and isn't switching threads when I call yield. a buffer size of 30-60 will improve the performance drastically for that scenario. and presumably the simpler code will help me find the bug.Behead
@MooingDuck, thank you so much. I will postpone the digging into your code till the new version, then. Eager to hear new from you here.Desultory
@DamirTenishev: I got it to compile and run. coliru.stacked-crooked.com/a/ae2a30b1e8c46675Behead
@MooingDuck, thank you again. I've compiled it with MSVC++ latest in Release with full optimization. With async_transformer it runs 0.92 sec and with default initialization (#if false in line 269) it runs 0.015 sec. If I extend the data size to 0xFFFFFF (adding one more 'F' in line 267) it runs 13.34 seconds with async_transformer and 0.24 seconds with default initialization. So, what is the profit?Desultory
@DamirTenishev: One thread is calculating, and the other thread is writing the results through the memory bus. If the calculation is slow and CPU-bound, then this can be useful in theory. But you're right that my test transform isn't slow enough to be worth it. The threading overhead is slower than just doing the transforms single-threaded. So I think there's no easy answer here. You'll have to make a custom data structure instead of vector.Behead
@MooingDuck, thank you for your help, anyway. I learned a lot from your code. At the moment I decided to stay with std::span which perfectly works for my tasks except some tiny extra work on ownership and resource management.Desultory
Yeah, that's probably easier than implementing a custom version of vectorBehead
By objects, I mean what C++ means: everything that has a type that isn't a reference or a function. An allocation where you intend to put some ints aren't ints yet. std::vector is specified in such a way that you can't get at the uninitialized allocation.Contrabandist
@Caleth, that's exactly the statement I am asking to be followed with the reference to standard. I can't see where it is required to have the underlying values initialized. I see that the range much be valid, but it never requires to be a range of initialized values. For instance, if I allocate a range with new and set begin and end of it, it still will be a valid range. And in std::vector specification I can see nothing additional to a valid range that would require initialization of the underlying values.Desultory
If you allocate with new, objects are created. If you allocate with std::allocator::allocate, they aren't. A valid range is a range in which objects exist. It's not initialisation, it's the creation that you are lackingContrabandist
@Caleth, you are confusing two distinct things here: object creation and object initialization. Operator new creates object, but doesn't initialize them. Object could exist in uninitialized state, this is totally fine in C++ unless you want to use it for read and some other stuff. Lots of object exist in C++ which are not initialized, like stack objects, etc. So, again, I never saw the requirement for objects in std::vector to be initialized. They should exist, of course, but show me a requirement for initialization.Desultory
reserve does not create objects. every member of std::vector that increases size does so by initialising the new objects as it creates them, because everything goes through allocator_traits::constructContrabandist
@Caleth, exactly. How is that related to the long discussed requirement std::vector to have initialized objects? This is totally up to allocator and has nothing to do with requirements to std::vector. So, you statement "vector requires underlying objects to be initialized" is unsupported.Desultory
allocator_traits::construct always initialises objects, so there's simply no way of vector having uninitialized objects. It can have an allocation without objects in it, or it can have initialised objects. And we weren't saying that vector requires it, but that it ensures it.Contrabandist
@Caleth, of course, it does by its name. My question in not about a particular implementation. My question about the requirements in standard to do so. Namely, where exactly it is said that std::vector must use allocator_traits::construct for its elements, but not allocator_traits::allocate? I understand the difference and how this works for resize/reserve. My question is where is it stated and required in the standard?Desultory
allocator_traits::allocate doesn't construct elements. But also the requirements on allocator aware containers are allocator_traits::construct(args...) being well formed for args... appropriate to the various situations.Contrabandist
@Caleth, don't get me wrong, I am not arguing that it is implemented in this way. I am asking where is it documented. Namely, where is it explicitly stated that std::vector must use allocator_traits::construct for new elements? I see that allocator_traits::construct must be in allocator. I see that std::vector uses allocator. But I never see that std::vector must use allocator_traits::construct method of allocator. Can you pinpoint the place in standard or this?Desultory
The container allocator requirements, and the members of vector which mention specific ones. It's only spelled out explicitly in a note.Contrabandist
@Caleth, thank you. You made the point. It is really funny how the requirement to construct the value comes only in the note while the whole section devoted to construction. Seems they decided "name speaks for itself".Desultory

© 2022 - 2025 — McMap. All rights reserved.