Vector: initialization or reserve?
Asked Answered
M

10

90

I know the size of a vector, which is the best way to initialize it?

Option 1:

vector<int> vec(3); //in .h
vec.at(0)=var1;     //in .cpp
vec.at(1)=var2;     //in .cpp
vec.at(2)=var3;     //in .cpp

Option 2:

vector<int> vec;     //in .h
vec.reserve(3);      //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp

I guess, Option2 is better than Option1. Is it? Any other options?

Mcgray answered 19/1, 2012 at 15:19 Comment(6)
Define "better".Angi
vector<int> vec(3); You should not initialize variables in header files.Centeno
@Mcgray both approaches, reserve and initialise, are equally professional.Highhat
@SigTerm: yes, i see. Is it only to avoid trivial mistakesMcgray
5 years later and I wish this question had a real answer... The accepted answer is just wrong (unless the question was edited after the answer was given).Yearling
@Apollys Could you comment on my answer (from 2015) what you think is missing and/or would make it a real answer =]Halves
S
54

Both variants have different semantics, i.e. you are comparing apples and oranges.

The first gives you a vector of n default-initialized values, the second variant reserves the memory, but does not initialize them.

Choose what better fits your needs, i.e. what is "better" in a certain situation.

Sibyls answered 19/1, 2012 at 15:38 Comment(6)
No, both give you a vector containing {var1, var2, var3} by slightly different routes. The question is, is either route better than the other?Fauve
So, i think initialization of n default values is useless if I don't use them.Mcgray
@Ale: initialization to default values is indeed useless in that case, however for types like int default initializing them and then overwriting is really not that expansive, and push_back is typically more expensive then operator[], so in that case there probably is no clear answer for efficency. In the end I would say you really shouldn't worry about that for most casesLivvy
@Mike Seymour: I've thought about this variant. However, the title state 'initialization or reserve?', which is now what I am focusing my answer on.Sibyls
@Grizzly: push_back more expensive? i guess the opposite but i get the point, thanksMcgray
@Mcgray push_back() et al must check capacity (regardless of whether you reserve()) & increment size, per element. To me it's very valid to say the resulting branch/load/store/whatever in "option 2" must be weighed against the saving of not default-initialisation/reassignment from "option 1". Conceptually, it's easy to think you're doing less with #2, but the more I think about it, the less convinced I am. Obviously #2 is needed (or at least vastly preferable) for some classes, but for basic types it's much less clear. The real answer seems to be, as always: benchmark, or don't speculate.Czarra
Y
70

Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.

For a simple rule to follow:

Option #1 is faster... enter image description here enter image description here

...but this probably shouldn't be your biggest concern.

Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running level 3 compiler optimization (-O3): enter image description here

So in general, I would recommending using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.


These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.

main.cpp:

/* 
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back
 * We collect dummy sums to prevent the compiler from optimizing out computation
 */

#include <iostream>
#include <vector>

#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0); 
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);
    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();
    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();
    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }

  return 0;
}

The header files I used are located here:

Yearling answered 21/12, 2018 at 23:58 Comment(8)
Not a single word in case default contructor is missing?Korwin
@Korwin The default constructor of int will never be missing. If the OP wanted a broader discussion, they should have asked a far broader question. This is a great answer to what they asked, which addresses basic element types and scales to sensible numbers of them.Czarra
@underscore_d, it's a very good answer in respect of performance. But if there is no default constructor, the only thing left will be initialisation. You could argue that "option 2" implies that there has to be one, but perhaps the OP was just not yet aware of this issue.Korwin
so what is the optimization level for the first two pictures? What I know it could be -O0 and thus have no relevance...Enterprising
This is an excellent answer, and surprising since option 1 supposedly sets all the values to something, which seems more costly that simply allocating the memory without modifying it. How can modifying something -- especially a lot of something -- be faster than just putting a fence around it? Nonetheless, thank you.Kliman
@Kliman That's probably because with option#2 push_back you are modifying the vector (size increase, check if enough reserved memory etc). So it surely does "more work" for each inserted element than option#1Lamentable
Also: Your RNG header has a non-inline function definition. That's no good.Sibyls
I wonder why you start your answer with a hostile and uncalled for intro about how the accepted answer is not an answer. The question is not specifically about what is faster - only about what is better. But what is better depends on what you actually want, which is a lacking criterion in the question. From a pure logical POV, a definite answer is simply not possible, therefore your hostility is logically out of place. // edit: Oh, and look how you still did not fix your RNG-header, spreading bad C++, yet your cockiness is still in place.Sibyls
S
54

Both variants have different semantics, i.e. you are comparing apples and oranges.

The first gives you a vector of n default-initialized values, the second variant reserves the memory, but does not initialize them.

Choose what better fits your needs, i.e. what is "better" in a certain situation.

Sibyls answered 19/1, 2012 at 15:38 Comment(6)
No, both give you a vector containing {var1, var2, var3} by slightly different routes. The question is, is either route better than the other?Fauve
So, i think initialization of n default values is useless if I don't use them.Mcgray
@Ale: initialization to default values is indeed useless in that case, however for types like int default initializing them and then overwriting is really not that expansive, and push_back is typically more expensive then operator[], so in that case there probably is no clear answer for efficency. In the end I would say you really shouldn't worry about that for most casesLivvy
@Mike Seymour: I've thought about this variant. However, the title state 'initialization or reserve?', which is now what I am focusing my answer on.Sibyls
@Grizzly: push_back more expensive? i guess the opposite but i get the point, thanksMcgray
@Mcgray push_back() et al must check capacity (regardless of whether you reserve()) & increment size, per element. To me it's very valid to say the resulting branch/load/store/whatever in "option 2" must be weighed against the saving of not default-initialisation/reassignment from "option 1". Conceptually, it's easy to think you're doing less with #2, but the more I think about it, the less convinced I am. Obviously #2 is needed (or at least vastly preferable) for some classes, but for basic types it's much less clear. The real answer seems to be, as always: benchmark, or don't speculate.Czarra
C
53

The "best" way would be:

vector<int> vec = {var1, var2, var3};

available with a C++11 capable compiler.

Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.

Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.

In case of option 2, generally it makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. Unless perhaps if the vector contains types that are very expensive to copy (and don't provide fast moving in C++11), or the size of the vector is going to be enormous.


* From Stroustrups C++ Style and Technique FAQ:

People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize.

Consumptive answered 19/1, 2012 at 15:57 Comment(5)
I had my first iterator invalidation because I didn't use reserve. Thanks Stroustrups'quote!Mcgray
This is a very good answer. The only thing that should be added is part of Troyseph's answer: the fill constructor will not work with types that are not default-constructible, unless you provide a default value to be used for all of them. In many cases, this can be wildly inefficient. In such cases reserve() and push_back() are the better optionCowles
This option allows the vector to be constant as well: vector<int> const vec = {var1, var2, var3};Mckinzie
IMO, Stroustrup is simply wrong on this one. If you know how many elements you're going to have, or have a very good guess, then you should reserve(). At worst, it changes nothing. Otherwise, it avoids the pointless fanfare of incrementally getting up to a sufficient capacity. It only takes one line. Why not just do it? One argument I imagine is it makes code less generic, but if you're using a vector, it's because you know it's the best container in the absence of evidence for others, and you probably never need to change it; then, writing generically for its own sake is an antipattern.Czarra
Another thing is worth pointing out: while the initializer_list constructor is obviously the prettiest and obviously correct in such a trivial case as a few ints (and many others), it does require copying from list to container, which might be a significant overhead for larger class types, or simply impossible. So sometimes emplaceing is the only option, or might be faster even if it's not strictly needed.Czarra
H
14

While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then the operator[] for initialisation!


Constructor

std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();

In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.

Think of myVec as a vector containing a number of items of MyType which can be accessed and modified, push_back(anotherInstanceOfMyType) will append it the the end of the vector.


Reserve

std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();

When using the reserve function, size will be 0 until you add an element to the array and capacity will be equal to or greater than numberOfElementsToStart.

Think of myVec as an empty vector which can have new items appended to it using push_back or emplace_back with no memory allocation for at least the first numberOfElementsToStart elements.

Note that push_back() and emplace_back() still require an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.


List initialisation

std::vector<MyType> myVec{ var1, var2, var3 };

This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.

Halves answered 13/10, 2015 at 10:47 Comment(4)
"with no overhead" is just wrong. There is obviously overhead for each push_back() in checking whether there is enough capacity and incrementing the size. Having reserve()d just means the check on capacity will always succeed, but it can't stop the library from having to check and obviously to upsize. It is then a very valid question whether those checks/increments will counterbalance the time saved not initially default-constructing the elements.Czarra
@Czarra Not all elements can be default constructed, and for some default construction may be costly but yes, I shall amend my answer. For future reference "is just wrong" sounds more aggressive than "is wrong". For some people that difference may mean they become defensive rather than cooperative.Halves
You have some useful information and analysis here, but it's a bit unfocused and doesn't directly address the comparison given in the question. Perhaps the answer you want to give, specifically for the code in the original question, is that they are the same. In that case, emphasize it more and give justification/explanation.Yearling
@ApollyssupportsMonica The most upvoted answer already covers the int example given in the question, however the title of the question is more general, so a lot of people may end up here looking for an answer to the title, not the question bodyHalves
C
8

Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the constructor of the base type for each cell inside the container.

For C-like types it will probably be the same.

Coverture answered 19/1, 2012 at 15:26 Comment(12)
Option 2 is better, even for scalar types. The first will set each element to zero, while the second will leave the memory uninitialised.Fauve
What makes you think that reserve doesn't call constructors?Radarman
For which definition of "better"?Sibyls
@SigTerm: The definition of reserve is that it increases the capacity (i.e. the amount of allocated memory) if necessary, but does not change the size (i.e. the number of objects). Therefore, it cannot construct any objects, except to move them if the vector wasn't empty to start with.Fauve
@phresnel: Since both are correct, and neither particularly more readable than the other, the only sensible definition of "better" would be "more efficient".Fauve
@MikeSeymour: reserve() might do less work in not setting the elements to zero, but push_back is typically more expensive then operator[], so when it comes to performance it's really not that clear which is more efficient for easy types and therefore "better" (though for most cases it is obviously silly to worry about that)Livvy
@Coverture For how many elements? How did you measure? (I once measured for std::vector<double>, g++, a while back, and I found constructing the complete vector, then using [] to initialize each element, was the fastest solution.)Footie
@JamesKanze I wrote a Custom Type 'S' which logs calls to its constructors/destructor and tried both scenarios (with 3 and 1000 elements). reserve() did not call them. Don't forget to log inside the copy constructor if you try it out.Coverture
@MikeSeymour: Guessed so, but "more efficient" is as unsharp as "better" is. Maybe you need 100 default initialized Foobars to let your algorithm go off (maybe think of typed memory pools). In that case, the vector(100) might be more efficient.Sibyls
@Coverture And... We know that reserve doesn't call constructors. If constructing an object and then assigning to it is significantly slower than just copy constructing them, then the solution resulting in the least execution code will be different than for simply cases like double.Footie
@nob: "I tried it out" is not a good argument. If standard does not guarantee that reserve never calls a constructor, you'll be relying on undocumented feature which will backfire on different compiler. As far as I can tell, standard tells nothing about it.Radarman
reserve() is documented as increasing capacity() but not size(), and it's the size() that determines how many living, usable objects exist in the vector, so by definition reserve() can never construct any new elements, in the notional sense of "new", i.e. that it can never call any 'normal' constructors but only, possibly, copy or move constructors if it needs to reallocate.Czarra
G
4

How it Works

This is implementation specific however in general Vector data structure internally will have pointer to the memory block where the elements would actually resides. Both GCC and VC++ allocate for 0 elements by default. So you can think of Vector's internal memory pointer to be nullptr by default.

When you call vector<int> vec(N); as in your Option 1, the N objects are created using default constructor. This is called fill constructor.

When you do vec.reserve(N); after default constructor as in Option 2, you get data block to hold 3 elements but no objects are created unlike in option 1.

Why to Select Option 1

If you know the number of elements vector will hold and you might leave most of the elements to its default values then you might want to use this option.

Why to Select Option 2

This option is generally better of the two as it only allocates data block for the future use and not actually filling up with objects created from default constructor.

Gilbertina answered 6/6, 2016 at 0:38 Comment(7)
You clearly missed where OP's 2nd option calls .reserve(), which exists entirely to "eliminate unnecessary expansion of vector as you add elements." You also seem to imply .reserve()ing less than the implementation's default will reallocate, but that isn't required and would signal a very foolish implementation... And they know exactly the size. The Q is whether it's cheaper to default-initialise then reassign basic types - or to repeatedly push_back(), taking branches to check the .capacity() and so forth for every element. This seems to totally miss that and talk about other thingsCzarra
I think you misunderstood my answer. I just rewrote it which hopefully solves your confusion.Gilbertina
I don't find it convincing. I don't think a chance of weird implementations means it's always best to redundantly default-construct then reassign, without attention to (a) how it's just not possible for types that can only be constructed and not assigned, e.g. due to const or reference members and (b) benchmarks showing the overhead of reallocating (as we're ignoring what else pushing must do) in such weird libraries to outweigh that of default-constructing then reassigning. I haven't measured (b), but it seems neither has anyone else in the whole thread... So it's all just idle speculation.Czarra
Who do you mean by "it's always best to redundantly default-construct then reassign"? Not sure how you are interpreting things. If you know the number of elements that will go in vector then its always recommended to specify it in constructor. The reason some other systems like .Net does default non-zero allocation is because statistics over large number of program suggests that non-zero allocation would perform bit better.Gilbertina
"Recommended" where, by who? This isn't my interpretation but a plain fact: "option 1" is to initialise a vector filled with default values, then to reassign each element with the value it was always meant to have. What I'm trying to emphasise is that (a) doing so is not possible for all types & (b) we need more than speculation about whether doing that is reliably cheaper than "option 2", which itself means having to think about reallocation, the unmentioned overhead of pushing back (check capacity, increment size), &c. It may well be! But I don't see any convincing argument/evidence ITT.Czarra
I see what you are saying. I was too focused on example given in question which uses int and fill constructor would typically initialize the block with single assembly instruction. However for vector of complex objects, this would be really troublesome. I was also too focused on reallocations which however might be less worrysome than calling expensive default constructors. I've updated the answer. BTW, AFAIK, you can't put objects without public constructors in vector: #19826876Gilbertina
Requiring copy constructors is only needed by push_back/insert type functions if we can't move instead, and never by emplace type functions. IMO emplace_back() with arguments is the best to use wherever possible (where we're definitely adding elements, separate from the question here).Czarra
P
2

In the long run, it depends on the usage and numbers of the elements.

Run the program below to understand how the compiler reserves space:

vector<int> vec;
for(int i=0; i<50; i++)
{
  cout << "size=" << vec.size()  << "capacity=" << vec.capacity() << endl;
  vec.push_back(i);
}

size is the number of actual elements and capacity is the actual size of the array to imlement vector. In my computer, till 10, both are the same. But, when size is 43 the capacity is 63. depending on the number of elements, either may be better. For example, increasing the capacity may be expensive.

Pang answered 19/1, 2012 at 15:38 Comment(1)
I don't see how this is relevant. You talk about reallocation as size increases. The OP talked about a known size and in both cases avoided reallocating (by initialising or reserving space)Czarra
Y
2

Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.

Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.

Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.

* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.


Conclusion: use method #2.

Yearling answered 3/11, 2017 at 8:58 Comment(3)
And what is wrong with the accepted answer? It also says that the first method creates n entries with default value and the second only reserves the memory. Beside that the first one has the overhead of default initalisation and the boundary check the second one has the overhead in the push_back logic (chaning the current size and the check if the reserved memory has to be increased).Pietje
'Method #2 is better i.e. never worse' - this would be contradicted by your own other, far better answer a year later! I really liked that one. I'm not so sure about this one. It's possible that a not-so-smart compiler will not realise it doesn't have to check capacity vs size every time and will do so, which then wastes time loading, testing, and branching; whereas initialising then assigning never has to check capacity. As your other answer showed, benchmarking with realistic numbers of elements is the only way to assess this.Czarra
Probably this answer should be deleted by the author, in favour of their own other, far better answer.Lights
S
1

Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?

It is "better" to have simpler code that does the same job.

Semibreve answered 19/1, 2012 at 17:31 Comment(3)
Your idea would contradict the actual. See example given at: en.cppreference.com/w/cpp/container/vector/reserveEmergence
That's an example. Other implementations might use a first allocation of 16 bytes, because they know that is the minimum heap allocation size anyway. And that would then work for 4 push_backs.Semibreve
Thanks for another example in you comment reply.Emergence
D
0

I think answer may depend on situation. For instance: Lets try to copy simple vector to another vector. Vector hold example class which has only integer. In first example lets use reserve.

#include <iostream>
#include <vector>
#include <algorithm>

class example
{
    public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout<<"copy"<<std::endl;
        this->a = p1.a;

    }

    example(example&& o) noexcept
    {
        std::cout<<"move"<<std::endl;
        std::swap(o.a, this->a);
    }

    example(int a_)
    {
        std::cout<<"const"<<std::endl;
        a = a_;
    }

    example()
    {
        std::cout<<"Def const"<<std::endl;
    }
    
    int a;
};

int main()
{
    auto vec = std::vector<example>{1,2,3};

    auto vec2 = std::vector<example>{};
    vec2.reserve(vec.size());   

    auto dst_vec2 = std::back_inserter(vec2);
    std::cout<<"transform"<<std::endl;

    std::transform(vec.begin(), vec.end(), 
    dst_vec2, [](const example& ex){ return ex; });


}

For this case, transform will call copy and move constructors. The output of the transform part:

copy
move
copy
move
copy
move

Now lets remove the reserve and use the constructor.

#include <iostream>
#include <vector>
#include <algorithm>

class example
{
    public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout<<"copy"<<std::endl;
        this->a = p1.a;

    }

    example(example&& o) noexcept
    {
        std::cout<<"move"<<std::endl;
        std::swap(o.a, this->a);
    }

    example(int a_)
    {
        std::cout<<"const"<<std::endl;
        a = a_;
    }

    example()
    {
        std::cout<<"Def const"<<std::endl;
    }
    
    int a;
};

int main()
{
    auto vec = std::vector<example>{1,2,3};

    std::vector<example> vec2(vec.size());

    auto dst_vec2 = std::back_inserter(vec2);
    std::cout<<"transform"<<std::endl;

    std::transform(vec.begin(), vec.end(), 
    dst_vec2, [](const example& ex){ return ex; });


}

And in this case transform part produces:

copy
move
move
move
move
copy
move
copy
move

As it is seen, for this specific case, reserve prevents extra move operations because there is no initialized object to move.

Dochandorrach answered 13/9, 2022 at 23:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.