STL vectors with uninitialized storage?
Asked Answered
P

17

55

I'm writing an inner loop that needs to place structs in contiguous storage. I don't know how many of these structs there will be ahead of time. My problem is that STL's vector initializes its values to 0, so no matter what I do, I incur the cost of the initialization plus the cost of setting the struct's members to their values.

Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?

(I'm certain that this part of the code needs to be optimized, and I'm certain that the initialization is a significant cost.)

Also, see my comments below for a clarification about when the initialization occurs.

SOME CODE:

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size()
    memberVector.resize(mvSize + count); // causes 0-initialization

    for (int i = 0; i < count; ++i) {
        memberVector[mvSize + i].d1 = data1[i];
        memberVector[mvSize + i].d2 = data2[i];
    }
}
Pronghorn answered 18/9, 2008 at 20:31 Comment(7)
Note - using reserve() is not a solution, since you can't legally access the data that's at locations end() and above.Pronghorn
Another clarification: it's not that the constructor initializes the values to 0. It's that resize calls insert, which does.Pronghorn
Could you give us the struct declaration, too? Thanks... :-)Uralaltaic
NOTE: you can't access data that's uninitialized anyway. Same issue for vector<T> past .end() and uninitialized members of a T[]. But with a vector, chances are the debugging code tells you that now. The array code will fail silently on the customers PC.Emee
This is a good question. For some applications, it is important to realize that std::vector always initializes its elements, even if they are plain-old-data (POD).Gaul
Also see https://mcmap.net/q/16765/-avoiding-default-construction-of-elements-in-standard-containers/1969455 for an allocator approach. (good for POD types)Decrypt
The crux of the problem is that vector does "zero" initialization by default, because std::allocator only has value and zero initialization, but has no method for default initialization.Mllly
M
29

std::vector must initialize the values in the array somehow, which means some constructor (or copy-constructor) must be called. The behavior of vector (or any container class) is undefined if you were to access the uninitialized section of the array as if it were initialized.

The best way is to use reserve() and push_back(), so that the copy-constructor is used, avoiding default-construction.

Using your example code:

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();

    // Does not initialize the extra elements
    memberVector.reserve(mvSize + count);

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a temporary.
        memberVector.push_back(YourData(data1[i], data2[i]));
    }
}

The only problem with calling reserve() (or resize()) like this is that you may end up invoking the copy-constructor more often than you need to. If you can make a good prediction as to the final size of the array, it's better to reserve() the space once at the beginning. If you don't know the final size though, at least the number of copies will be minimal on average.

In the current version of C++, the inner loop is a bit inefficient as a temporary value is constructed on the stack, copy-constructed to the vectors memory, and finally the temporary is destroyed. However the next version of C++ has a feature called R-Value references (T&&) which will help.

The interface supplied by std::vector does not allow for another option, which is to use some factory-like class to construct values other than the default. Here is a rough example of what this pattern would look like implemented in C++:

template <typename T>
class my_vector_replacement {

    // ...

    template <typename F>
    my_vector::push_back_using_factory(F factory) {
        // ... check size of array, and resize if needed.

        // Copy construct using placement new,
        new(arrayData+end) T(factory())
        end += sizeof(T);
    }

    char* arrayData;
    size_t end; // Of initialized data in arrayData
};

// One of many possible implementations
struct MyFactory {
    MyFactory(int* p1, int* p2) : d1(p1), d2(p2) {}
    YourData operator()() const {
        return YourData(*d1,*d2);
    }
    int* d1;
    int* d2;
};

void GetsCalledALot(int* data1, int* data2, int count) {
    // ... Still will need the same call to a reserve() type function.

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a factory
        memberVector.push_back_using_factory(MyFactory(data1+i, data2+i));
    }
}

Doing this does mean you have to create your own vector class. In this case it also complicates what should have been a simple example. But there may be times where using a factory function like this is better, for instance if the insert is conditional on some other value, and you would have to otherwise unconditionally construct some expensive temporary even if it wasn't actually needed.

Microreader answered 18/9, 2008 at 22:12 Comment(0)
I
17

In C++11 (and boost) you can use the array version of unique_ptr to allocate an uninitialized array. This isn't quite an stl container, but is still memory managed and C++-ish which will be good enough for many applications.

auto my_uninit_array = std::unique_ptr<mystruct[]>(new mystruct[count]);
Inverse answered 27/8, 2013 at 14:43 Comment(0)
P
11

C++0x adds a new member function template emplace_back to vector (which relies on variadic templates and perfect forwarding) that gets rid of any temporaries entirely:

memberVector.emplace_back(data1[i], data2[i]);
Phosgene answered 9/5, 2010 at 18:48 Comment(0)
D
8

To clarify on reserve() responses: you need to use reserve() in conjunction with push_back(). This way, the default constructor is not called for each element, but rather the copy constructor. You still incur the penalty of setting up your struct on stack, and then copying it to the vector. On the other hand, it's possible that if you use

vect.push_back(MyStruct(fieldValue1, fieldValue2))

the compiler will construct the new instance directly in the memory thatbelongs to the vector. It depends on how smart the optimizer is. You need to check the generated code to find out.

Disillusionize answered 18/9, 2008 at 20:56 Comment(1)
It turns out that the optimizer for gcc, at level O3, is not smart enough to avoid the copy.Pronghorn
P
8

You can use boost::noinit_adaptor to default initialize new elements (which is no initialization for built-in types):

std::vector<T, boost::noinit_adaptor<std::allocator<T>> memberVector;

As long as you don't pass an initializer into resize, it default initializes the new elements.

Palaeogene answered 6/5, 2020 at 17:44 Comment(0)
P
5

So here's the problem, resize is calling insert, which is doing a copy construction from a default constructed element for each of the newly added elements. To get this to 0 cost you need to write your own default constructor AND your own copy constructor as empty functions. Doing this to your copy constructor is a very bad idea because it will break std::vector's internal reallocation algorithms.

Summary: You're not going to be able to do this with std::vector.

Ply answered 18/9, 2008 at 20:55 Comment(1)
This is the real issue. std::vector ought to realize it doesn't have to do any initialization if T has a trivial default constructor. Thanks for pointing out that the copy constructor is what's doing the unnecessary work here.Zayin
A
4

You can use a wrapper type around your element type, with a default constructor that does nothing. E.g.:

template <typename T>
struct no_init
{
    T value;

    no_init() { static_assert(std::is_standard_layout<no_init<T>>::value && sizeof(T) == sizeof(no_init<T>), "T does not have standard layout"); }

    no_init(T& v) { value = v; }
    T& operator=(T& v) { value = v; return value; }

    no_init(no_init<T>& n) { value = n.value; }
    no_init(no_init<T>&& n) { value = std::move(n.value); }
    T& operator=(no_init<T>& n) { value = n.value; return this; }
    T& operator=(no_init<T>&& n) { value = std::move(n.value); return this; }

    T* operator&() { return &value; } // So you can use &(vec[0]) etc.
};

To use:

std::vector<no_init<char>> vec;
vec.resize(2ul * 1024ul * 1024ul * 1024ul);
Antemeridian answered 28/2, 2017 at 13:52 Comment(1)
Will it be slow when program is compiled with no optimizations? May it slow down access in debug mode?Mireyamiriam
R
4

I tested a few of the approaches suggested here. I allocated a huge set of data (200GB) in one container/pointer:

Compiler/OS:

g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Settings: (c++-17, -O3 optimizations)

g++ --std=c++17 -O3

I timed the total program runtime with linux-time

1.) std::vector:

#include <vector>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t> vec(size);
}
real    0m36.246s
user    0m4.549s
sys     0m31.604s

That is 36 seconds.

2.) std::vector with boost::noinit_adaptor

#include <vector>
#include <boost/core/noinit_adaptor.hpp>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t,boost::noinit_adaptor<std::allocator<size_t>>> vec(size);
}

real    0m0.002s
user    0m0.001s
sys     0m0.000s

So this solves the problem. Just allocating without initializing costs basically nothing (at least for large arrays).

3.) std::unique_ptr<T[]>:

#include <memory>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  auto data = std::unique_ptr<size_t[]>(new size_t[size]);
}

real    0m0.002s
user    0m0.002s
sys     0m0.000s

So basically the same performance as 2.), but does not require boost. I also tested simple new/delete and malloc/free with the same performance as 2.) and 3.).

So the default-construction can have a huge performance penalty if you deal with large data sets. In practice you want to actually initialize the allocated data afterwards. However, some of the performance penalty still remains, especially if the later initialization is performed in parallel. E.g., I initialize a huge vector with a set of (pseudo)random numbers:

(now I use fopenmp for parallelization on a 24 core AMD Threadripper 3960X)

g++ --std=c++17-fopenmp -O3

1.) std::vector:

#include <vector>
#include <random>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t> vec(size);
  #pragma omp parallel
  {
    std::minstd_rand0 gen(42);
    #pragma omp for schedule(static)
    for (size_t i = 0; i < size; ++i) vec[i] = gen();
  }
}
real    0m41.958s
user    4m37.495s
sys     0m31.348s

That is 42s, only 6s more than the default initialization. The problem is, that the initialization of std::vector is sequential.

2.) std::vector with boost::noinit_adaptor:

#include <vector>
#include <random>
#include <boost/core/noinit_adaptor.hpp>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t,boost::noinit_adaptor<std::allocator<size_t>>> vec(size);
  #pragma omp parallel
  {
    std::minstd_rand0 gen(42);
    #pragma omp for schedule(static)
    for (size_t i = 0; i < size; ++i) vec[i] = gen();
  }
}
real    0m10.508s
user    1m37.665s
sys     3m14.951s

So even with the random-initialization, the code is 4 times faster because we can skip the sequential initialization of std::vector.

So if you deal with huge data sets and plan to initialize them afterwards in parallel, you should avoid using the default std::vector.

Rhines answered 26/4, 2022 at 13:50 Comment(0)
U
3

Err...

try the method:

std::vector<T>::reserve(x)

It will enable you to reserve enough memory for x items without initializing any (your vector is still empty). Thus, there won't be reallocation until to go over x.

The second point is that vector won't initialize the values to zero. Are you testing your code in debug ?

After verification on g++, the following code:

#include <iostream>
#include <vector>

struct MyStruct
{
   int m_iValue00 ;
   int m_iValue01 ;
} ;

int main()
{
   MyStruct aaa, bbb, ccc ;

   std::vector<MyStruct> aMyStruct ;

   aMyStruct.push_back(aaa) ;
   aMyStruct.push_back(bbb) ;
   aMyStruct.push_back(ccc) ;

   aMyStruct.resize(6) ; // [EDIT] double the size

   for(std::vector<MyStruct>::size_type i = 0, iMax = aMyStruct.size(); i < iMax; ++i)
   {
      std::cout << "[" << i << "] : " << aMyStruct[i].m_iValue00 << ", " << aMyStruct[0].m_iValue01 << "\n" ;
   }

   return 0 ;
}

gives the following results:

[0] : 134515780, -16121856
[1] : 134554052, -16121856
[2] : 134544501, -16121856
[3] : 0, -16121856
[4] : 0, -16121856
[5] : 0, -16121856

The initialization you saw was probably an artifact.

[EDIT] After the comment on resize, I modified the code to add the resize line. The resize effectively calls the default constructor of the object inside the vector, but if the default constructor does nothing, then nothing is initialized... I still believe it was an artifact (I managed the first time to have the whole vector zerooed with the following code:

aMyStruct.push_back(MyStruct()) ;
aMyStruct.push_back(MyStruct()) ;
aMyStruct.push_back(MyStruct()) ;

So... :-/

[EDIT 2] Like already offered by Arkadiy, the solution is to use an inline constructor taking the desired parameters. Something like

struct MyStruct
{
   MyStruct(int p_d1, int p_d2) : d1(p_d1), d2(p_d2) {}
   int d1, d2 ;
} ;

This will probably get inlined in your code.

But you should anyway study your code with a profiler to be sure this piece of code is the bottleneck of your application.

Uralaltaic answered 18/9, 2008 at 20:34 Comment(7)
I wrote a note above. It's not the constructor of vector that initializes to 0. It's resize() that does.Pronghorn
In this case MyStruct has a trivial constructor so nothing is initialized. This may be different than the OP's situation.Popelka
I think you're on the right track. I have no constructor defined in the struct, so its default constructor (I believe) zero-initializes. I'll check if adding a default constructor that does nothing solves the problem.Pronghorn
If you have no constructor defined and all elements are POD types, then the constructor does nothing. If elements aren't POD's then it would simply call their default constructors.Popelka
Greg Rogers is right. My guess is that the memory was "zero" because of some process initialization independant of the code I did write. In C++, you do not pay for something that you don't use. So if you're writting C-like code, you should not have overhead. And vectors are quite good at that.Uralaltaic
It appears that vector has let us down in this case. We do pay for initialization, even if we don't need or want it. This is guaranteed by the semantics of insert() which is called by resize(). The value used for initialization is based on the whatever happens to be in the MyStruct passed to resize(). Since you didn't specify anything when you called resize(), the default constructor was used. Since the default constructor does nothing in this case, you might get zeros or you might get something else. Either way, you pay for the initialization performed by resize().Gaul
@nobar : It depends on the MyStruct constructor. If it is empty, and inlined, and MyStruct members have a zero cost constructors, then the C++ compiler will optimise it to nothing. Then, we won't pay for it. Only for the resize.Uralaltaic
M
1

From your comments to other posters, it looks like you're left with malloc() and friends. Vector won't let you have unconstructed elements.

Merlynmermaid answered 18/9, 2008 at 20:57 Comment(0)
M
1

From your code, it looks like you have a vector of structs each of which comprises 2 ints. Could you instead use 2 vectors of ints? Then

copy(data1, data1 + count, back_inserter(v1));
copy(data2, data2 + count, back_inserter(v2));

Now you don't pay for copying a struct each time.

Merlynmermaid answered 18/9, 2008 at 21:15 Comment(1)
Interesting. This might just work -- it seems like it would avoid the construction of an intermediate object.Gaul
A
1

If you really insist on having the elements uninitialized and sacrifice some methods like front(), back(), push_back(), use boost vector from numeric . It allows you even not to preserve existing elements when calling resize()...

Aeneous answered 7/9, 2011 at 16:56 Comment(0)
U
1

I'm not sure about all those answers that says it is impossible or tell us about undefined behavior.

Sometime, you need to use an std::vector. But sometime, you know the final size of it. And you also know that your elements will be constructed later. Example : When you serialize the vector contents into a binary file, then read it back later. Unreal Engine has its TArray::setNumUninitialized, why not std::vector ?

To answer the initial question "Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?"

yes and no. No, because STL doesn't expose a way to do so.

Yes because we're coding in C++, and C++ allows to do a lot of thing. If you're ready to be a bad guy (and if you really know what you are doing). You can hijack the vector.

Here a sample code that works only for the Windows's STL implementation, for another platform, look how std::vector is implemented to use its internal members :

// This macro is to be defined before including VectorHijacker.h. Then you will be able to reuse the VectorHijacker.h with different objects.
#define HIJACKED_TYPE SomeStruct

// VectorHijacker.h
#ifndef VECTOR_HIJACKER_STRUCT
#define VECTOR_HIJACKER_STRUCT

struct VectorHijacker
{
    std::size_t _newSize;
};

#endif


template<>
template<>
inline decltype(auto) std::vector<HIJACKED_TYPE, std::allocator<HIJACKED_TYPE>>::emplace_back<const VectorHijacker &>(const VectorHijacker &hijacker)
{
    // We're modifying directly the size of the vector without passing by the extra initialization. This is the part that relies on how the STL was implemented.
    _Mypair._Myval2._Mylast = _Mypair._Myval2._Myfirst + hijacker._newSize;
}

inline void setNumUninitialized_hijack(std::vector<HIJACKED_TYPE> &hijackedVector, const VectorHijacker &hijacker)
{
    hijackedVector.reserve(hijacker._newSize);
    hijackedVector.emplace_back<const VectorHijacker &>(hijacker);
}

But beware, this is hijacking we're speaking about. This is really dirty code, and this is only to be used if you really know what you are doing. Besides, it is not portable and relies heavily on how the STL implementation was done.

I won't advise you to use it because everyone here (me included) is a good person. But I wanted to let you know that it is possible contrary to all previous answers that stated it wasn't.

Ultimo answered 5/10, 2020 at 14:51 Comment(1)
Note that I made the code on Visual Studio 2019 Community 16.7.5. And I used the latest draft... And conformance mode disabled (permissive-) (because sadly, template and conformance mode is not a really good combination in a lot of cases). Finally, it relies on the way the platform implemented the STL, therefore, you'll need to adapt the solution if you want to use it (but you won't dirty your hands, will you?).Ultimo
B
0

Use the std::vector::reserve() method. It won't resize the vector, but it will allocate the space.

Bondmaid answered 18/9, 2008 at 20:32 Comment(0)
E
0

Do the structs themselves need to be in contiguous memory, or can you get away with having a vector of struct*?

Vectors make a copy of whatever you add to them, so using vectors of pointers rather than objects is one way to improve performance.

Epididymis answered 18/9, 2008 at 20:44 Comment(1)
They have to be contiguous. They're in a buffer that's about to be sent over the network as one huge chunk.Pronghorn
C
0

I don't think STL is your answer. You're going to need to roll your own sort of solution using realloc(). You'll have to store a pointer and either the size, or number of elements, and use that to find where to start adding elements after a realloc().

int *memberArray;
int arrayCount;
void GetsCalledALot(int* data1, int* data2, int count) {
    memberArray = realloc(memberArray, sizeof(int) * (arrayCount + count);
    for (int i = 0; i < count; ++i) {
        memberArray[arrayCount + i].d1 = data1[i];
        memberArray[arrayCount + i].d2 = data2[i];
    }
    arrayCount += count;
}
Complexity answered 18/9, 2008 at 20:53 Comment(0)
D
0

I would do something like:

void GetsCalledALot(int* data1, int* data2, int count)
{
  const size_t mvSize = memberVector.size();
  memberVector.reserve(mvSize + count);

  for (int i = 0; i < count; ++i) {
    memberVector.push_back(MyType(data1[i], data2[i]));
  }
}

You need to define a ctor for the type that is stored in the memberVector, but that's a small cost as it will give you the best of both worlds; no unnecessary initialization is done and no reallocation will occur during the loop.

Dun answered 18/9, 2008 at 21:59 Comment(1)
This doesn't seem to solve the problem since it uses a temporary MyType() and copies that into the vector. There is still a double initialization.Gaul

© 2022 - 2024 — McMap. All rights reserved.