How does the capacity of std::vector grow automatically? What is the rate?

I have been going through the book C++ Primer, Third Edition by Stanley B. Lippman and Josée Lajoie, and found a mistake in the program given under Article 6.3, "How a vector Grows Itself": the program is missing a < in each of the cout statements:

#include <vector>
#include <iostream>

using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: "  < ivec.capacity() < endl;
    
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
        < " capacity: "  < ivec.capacity() < endl;
    }    
}

Later within that article:

"Under the Rogue Wave implementation, both the size and the capacity of ivec after its definition are 0. On inserting the first element, however, ivec's capacity is 256 and its size is 1."

But on correcting and running the code, I get the following output:


ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32

Is the capacity increasing according to the formula 2^N, i.e. doubling each time it is exceeded? Please explain.

Rarotonga answered 8/3, 2011 at 12:6 Comment(0)
99

The rate at which the capacity of a vector grows is required by the standard to be exponential (which, IMHO, is over-specification). The standard specifies this in order to meet the amortized constant time requirement for the push_back operation. What amortized constant time means and how exponential growth achieves it is interesting.

Every time a vector's capacity is grown, the existing elements need to be copied. If you 'amortize' this cost over the lifetime of the vector, it turns out that increasing the capacity by an exponential factor gives you an amortized constant cost.

This probably seems a bit odd, so let me explain how it works...

  • size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
  • size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
  • size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
  • size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
  • size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
  • ...
  • size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
  • size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67
  • ...
  • size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
  • size 17 capacity 32 - Average copies per element is 31 / 17 = 1.82

As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
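If you want to see this amortization empirically, here is a minimal sketch (my addition, not from the original answer; Counted is just a hypothetical instrumented element type) that counts how many times existing elements are relocated when the vector reallocates, and prints the running relocations-per-element figure. With a doubling implementation it should stay below 2:

#include <iostream>
#include <vector>

// Hypothetical instrumented element type: counts how often instances are
// copied or moved. With emplace_back below, that only happens when the
// vector reallocates and relocates its existing elements.
struct Counted {
    static long relocations;
    Counted() = default;
    Counted(const Counted&) { ++relocations; }
    Counted(Counted&&) noexcept { ++relocations; }
};
long Counted::relocations = 0;

int main() {
    std::vector<Counted> v;
    for (long i = 1; i <= (1L << 20); ++i) {
        v.emplace_back();                      // construct in place, no copy of the new element
        if ((i & (i - 1)) == 0)                // report at powers of two
            std::cout << "size " << i << "  relocations per element "
                      << static_cast<double>(Counted::relocations) / i << '\n';
    }
}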

If you increased the capacity to 1.5 * N instead of to 2 * N, you would end up with a very similar effect, except the upper bound on the copies per element would be higher (I think it would be 3).

I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the Fibonacci sequence) will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.

Ferocity answered 8/3, 2011 at 12:19 Comment(11)
+1 for explaining what amortized constant time means. It is important to note that implementations do not choose exponential growth; that is required by the standard (push_back has to be amortized constant time). They only choose the factor K by which they grow, which the standard requires to be greater than 1.Underwing
@David Rodríguez - dribeas: I incorporated your comments into my answer.Ferocity
@Omnifarious: your intuition wrt the golden ratio is right. Alexandrescu published measurements on comp.lang.c++.moderated backing it with data, if I recall correctly. Though his argument that you could sometimes grow "in place" because allocators allocate by powers of 2 always seemed weird to me (what is a growth in place if not a missed opportunity to actually grow more the first time without any space lost?)Kester
@Matthieu M.: "what's a growth in place if not a missed opportunity to actually grow more the first time without any space lost?". It's a tradeoff. If you grow the first time and later don't use the additional capacity, you have just wasted memory.Bobine
One advantage of being on, or slightly below, the golden ratio is that the discarded memory blocks will eventually add up to be large enough to be reused. If you always double the size, you need a fresh block each time.Austria
@Rafal, @Bo: Ah! I finally got it. I had thought that the allocator would reserve 2**n bytes anyway (which is the case for a BiBOP allocator), but I had forgotten that the traditional approach of reusing blocks of different sizes at the same place could benefit from this. So my comment only applies to BiBOP allocators, sorry :pKester
@MatthieuM. do you have a link or source to that data and presentation?Telamon
@CMCDragonkai: No sorry :xKester
@DavidRodríguez-dribeas: The amortized constant time requirement does not require exponential growth. For example, on a 64-bit implementation you could just put each vector in your program on a 2^30 byte boundary and add pages as they grow. This would work for almost every vector. Then you would have 0 copies when a vector grew, easily meet the requirement, and each time it would only be growing by a page-size increment.Ferocity
@Omnifarious: I fear the allocator protocol does not allow for growth in the way that you say. std::vector uses the allocator, and the allocator does not support growing: it needs to allocate a different buffer and copy, which requires exponential growth for the total cost to stay linear (amortized constant per push_back). You could conceive an allocator (extension) that supported this, but the allocator is an extension point for std::vector, so it would still need to support exponential growth for user-provided allocators.Underwing
@DavidRodríguez-dribeas - That's a good point. Perhaps a future standard could address that issue. Hmm...Ferocity
14

To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the capacity of the vector (when needed) by a factor K > 1 (*), such that when trying to append to a full vector of size N, the vector grows to capacity K*N.

Different implementations use different constants K that provide different benefits; in particular, most implementations go for either K = 2 or K = 1.5. A higher K makes growing faster, since it requires fewer reallocations, but at the same time it has a greater memory impact. As an example, in gcc K = 2, while in VS (Dinkumware) K = 1.5.

(*) If the vector grew by a constant quantity instead, the complexity of push_back would become linear rather than amortized constant. For example, if the vector grew by 10 elements when needed, the cost of growing (copying all elements to the new memory address) would be incurred every 10 elements, with each such copy costing O(N), so the cost per insertion works out to O(N).
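As a quick way to check which factor your own standard library uses, here is a small sketch (my addition, not part of this answer) that prints each capacity jump and the observed ratio; the answer above reports a factor of 2 for gcc and 1.5 for VS (Dinkumware):

#include <cstddef>
#include <iostream>
#include <vector>

// Prints old capacity -> new capacity every time the vector reallocates,
// so you can observe which growth factor K your implementation uses.
int main() {
    std::vector<int> v;
    std::size_t old_cap = v.capacity();
    for (int i = 0; i < 2000; ++i) {
        v.push_back(i);
        if (v.capacity() != old_cap) {
            if (old_cap != 0)
                std::cout << old_cap << " -> " << v.capacity() << "  (factor "
                          << static_cast<double>(v.capacity()) / old_cap << ")\n";
            old_cap = v.capacity();
        }
    }
}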

Spermogonium answered 8/3, 2011 at 12:30 Comment(0)
2

Just to add a mathematical proof of the time complexity of vector::push_back: say the size of the vector is n; what we care about here is the number of copies that have happened so far, say y. Note that copies happen every time you grow the vector.

Grow by factor of K

  y = K^1 + K^2 + K^3 + ... + K^log(K, n)
K*y =       K^2 + K^3 + ... + K^log(K, n) + K*K^log(K, n)

K*y - y = K*K^log(K, n) - K = K*n - K
      y = K*(n - 1)/(K - 1) = (K/(K - 1))*(n - 1)

T(n) = y/n = (K/(K - 1)) * (n - 1)/n < K/(K - 1) = O(1)

K/(K-1) is a constant; consider the most common cases:

  • K=2, T(n) = 2/(2-1) = 2
  • K=1.5, T(n) = 1.5/(1.5-1) = 3

And there is actually a reason for choosing K as 1.5 or 2 in different implementations: plotting K/(K - 1) shows that the bound drops steeply at first and then flattens out once K is around 2, so there is not much benefit in using a larger K, while the cost of over-allocated memory keeps growing.

Grow by a constant quantity C

y = C + 2*C + 3*C + ... + (m - 1)*C,   with m = n/C
  = C * (1 + 2 + 3 + ... + (m - 1))
  = C * m*(m - 1)/2
  = n*(m - 1)/2

T(n) = y/n = (n*(m - 1)/2) / n = (m - 1)/2 = n/(2*C) - 1/2 = O(n)

As we can see, it is linear.
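To make the difference concrete, here is a small simulation (my own sketch, not actual std::vector code) of the two policies analyzed above. It counts the total element copies needed to push n elements and reports the per-element average, which stays bounded for geometric growth but grows with n for constant growth:

#include <algorithm>
#include <iostream>

// Grow by factor K: when full, reallocate and copy all current elements.
long long copies_geometric(long long n, double k) {
    long long cap = 1, copies = 0;
    for (long long size = 0; size < n; ++size) {
        if (size == cap) {
            copies += size;
            cap = std::max(cap + 1, static_cast<long long>(cap * k));
        }
    }
    return copies;
}

// Grow by a constant quantity C: when full, reallocate and copy all current elements.
long long copies_constant(long long n, long long c) {
    long long cap = c, copies = 0;
    for (long long size = 0; size < n; ++size) {
        if (size == cap) {
            copies += size;
            cap += c;
        }
    }
    return copies;
}

int main() {
    for (long long n : {1000LL, 10000LL, 100000LL})
        std::cout << "n=" << n
                  << "  K=2: "   << copies_geometric(n, 2.0) / static_cast<double>(n)
                  << "  K=1.5: " << copies_geometric(n, 1.5) / static_cast<double>(n)
                  << "  C=10: "  << copies_constant(n, 10) / static_cast<double>(n)
                  << "  copies per element\n";
}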

Favoritism answered 4/4, 2019 at 20:56 Comment(1)
This is not correct. K^log(K,n) = n for all K, so your sum is not a geometric progression. The actual formula for the total number of copies made is approximately T(n) = (b^ceil( log(n)/log(b) ) - 1)/(b-1) - (1/2)*(ceil(b) - floor(b))*ceil( log(n)/log(b) ), where n is the number of insertions and b is the value after which the vector is resized (K in your notation). This formula is exact for integer b. With some basic algebra you can show that T(n)/n < b/(b-1) for integer b, and in the limit of large n for rational b.Bride
1

The capacity of the vector is completely implementation-dependent; no one can tell how it's growing.

Whited answered 8/3, 2011 at 12:9 Comment(1)
It is implementation-dependent, but not completely so: there is a requirement that it grow by a factor K > 1, or else the amortized constant time cost of push_back would not be achieved.Underwing
1

Are you using the "Rogue Wave" implementation?

How capacity grows is up to the implementation. Yours uses powers of 2.

Edenedens answered 8/3, 2011 at 12:9 Comment(3)
-1 : doesn't tell the OP something (s)he doesn't already know.Ferocity
@Omnifarious: Tells him that capacity growth is implementation dependent, which explains why he doesn't get the results he expects.Edenedens
The precise base is up to the implementation. Exponential growth is not; it's how you can achieve the mandated amortized constant time push_back.Vertebra
0

Yes, the capacity doubles each time it is exceeded. This is implementation-dependent.

Misconstruction answered 8/3, 2011 at 12:9 Comment(0)
0

Before pushing back an element, the vector checks whether its size has reached its capacity and, if so, grows, roughly like below.

I will explain it with the reserve function:


void push_back(const value_type &val)        // push_back actual prototype
{
    if (_size < 10)
        reserve(_size + 1);                  // small vector: grow one element at a time
    else if (_size > (_capacity / 4 * 3))
        reserve(_capacity + _capacity / 4);  // more than 3/4 full: grow capacity by 25%
    // then the new value is stored at the end and _size is incremented
}

_size: the current number of elements in the vector.

_capacity: the vector's current capacity.

Leandroleaning answered 4/11, 2022 at 10:7 Comment(0)
