Is it possible to allocate an uninitialized array in a way that does not result in UB?

When implementing certain data structures in C++, one needs to be able to create an array that has uninitialized elements. Because of that, having

buffer = new T[capacity];

is not suitable, as new T[capacity] constructs every array element, which is not always possible (if T does not have a default constructor) or desirable (as constructing objects might take time). The typical solution is to allocate raw memory and construct the elements with placement new.
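A small illustration of both objections, assuming C++17 (for the inline static member and the `_v` type trait). `Counted`, `NoDefault`, and `constructions_for_array` are hypothetical names: the first shows that `new T[n]` runs one constructor per element, the second that it cannot be used at all without a default constructor.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type that counts its default constructions (C++17 inline static).
struct Counted {
    static inline int constructions = 0;
    Counted() { ++constructions; }
};

// Hypothetical type without a default constructor: new NoDefault[n] is ill-formed.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};
static_assert(!std::is_default_constructible_v<NoDefault>,
              "new NoDefault[n] would not compile");

// Returns how many constructors ran for new Counted[n].
int constructions_for_array(std::size_t n) {
    Counted::constructions = 0;
    Counted* buffer = new Counted[n];  // default-initializes all n elements
    const int result = Counted::constructions;
    delete[] buffer;
    return result;
}
```

For example, `constructions_for_array(5)` returns 5: the cost of construction is paid up front, even if the container only needs the capacity.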

For that, if the number of elements is known (or we at least have an upper bound) and we allocate on the stack, then, as far as I am aware, one can use an aligned array of bytes or chars and then use std::launder to access the elements.

alignas(T) std::byte buffer[sizeof(T) * capacity];
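A minimal sketch of the stack-based approach, assuming C++17 (`std::byte`, `std::launder`). `StackBuffer` is a hypothetical name; it has no copy/move support and no capacity check, so treat it as an illustration of the placement-new/launder pattern rather than a finished container.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity container backed by an aligned byte array.
template <typename T, std::size_t Capacity>
class StackBuffer {
public:
    StackBuffer() = default;
    StackBuffer(const StackBuffer&) = delete;
    StackBuffer& operator=(const StackBuffer&) = delete;

    // Destroys only the elements that were actually constructed.
    ~StackBuffer() {
        for (std::size_t i = 0; i < size_; ++i) at(i).~T();
    }

    // Constructs a new element in the next free slot (caller checks capacity).
    void push(const T& value) {
        ::new (static_cast<void*>(storage_ + size_ * sizeof(T))) T(value);
        ++size_;
    }

    // The cast pointer is passed through std::launder before it is used
    // to access the object created by placement new.
    T& at(std::size_t i) {
        return *std::launder(reinterpret_cast<T*>(storage_ + i * sizeof(T)));
    }

    std::size_t size() const { return size_; }

private:
    alignas(T) std::byte storage_[Capacity * sizeof(T)];  // raw, unconstructed
    std::size_t size_ = 0;
};
```

Usage: `StackBuffer<std::string, 4> b; b.push("hello");` constructs exactly one `std::string`; the other three slots stay raw bytes.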

However, this only solves the problem for stack allocations, not for heap allocations. For those, I assume one needs to use aligned new and write something like this:

auto memory = ::operator new(sizeof(T) * capacity, std::align_val_t{alignof(T)});

and then cast it either to std::byte* or unsigned char* or T*.

// not sure what the right target type for the cast should be (T*, std::byte* or unsigned char*)
buffer = reinterpret_cast<T*>(memory);
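A sketch of the heap allocation step on its own, assuming C++17 aligned allocation. Since `::operator new` returns `void*`, this sketch uses `static_cast`, which is sufficient for converting from `void*`; no `T` object exists in the storage afterwards. `Wide`, `allocate_raw`, `deallocate_raw`, and `is_suitably_aligned` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned element type used only for illustration.
struct alignas(64) Wide {
    double values[4];
};

// Allocates raw, suitably aligned storage for `capacity` objects of type T
// without constructing any of them.
template <typename T>
std::byte* allocate_raw(std::size_t capacity) {
    void* memory = ::operator new(sizeof(T) * capacity,
                                  std::align_val_t{alignof(T)});
    return static_cast<std::byte*>(memory);
}

// Releases storage obtained from allocate_raw<T>(capacity), passing the same
// size and alignment that were used for the allocation.
template <typename T>
void deallocate_raw(std::byte* buffer, std::size_t capacity) {
    ::operator delete(static_cast<void*>(buffer), sizeof(T) * capacity,
                      std::align_val_t{alignof(T)});
}

// True if the storage returned for T satisfies alignof(T).
template <typename T>
bool is_suitably_aligned(std::size_t capacity) {
    std::byte* buffer = allocate_raw<T>(capacity);
    const bool ok =
        reinterpret_cast<std::uintptr_t>(buffer) % alignof(T) == 0;
    deallocate_raw<T>(buffer, capacity);
    return ok;
}
```

The same call works for ordinary and over-aligned types, which is convenient in a generic container that cannot assume anything about `alignof(T)`.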

However, there are several things that are not clear to me.

  1. The result of reinterpret_cast<T*>(ptr) is well defined if ptr points to an object that is pointer-interconvertible with an object of type T (see this answer or https://eel.is/c++draft/basic.types#basic.compound-3 for more detail). I assume that converting it to T* is not valid, as T is not necessarily pointer-interconvertible with the result of new. However, is it well defined for char* or std::byte*?
  2. When converting the result of new to a valid pointer type (assuming it is not implementation-defined), is it treated as a pointer to the first element of an array, or just as a pointer to a single object? As far as I know, it rarely (if ever) matters in practice, but there is a semantic difference: an expression of the form pointer + integer is well defined only if the pointer points to an array element and the result points to another element of the same array (see https://eel.is/c++draft/expr.add#4).
  3. As far as lifetimes are concerned, an array of unsigned char or std::byte can provide storage for an object created by placement new (https://eel.is/c++draft/basic.memobj#intro.object-3). Is the same defined for arrays of other types?
  4. As far as I know, new T and new T[] expressions call ::operator new or ::operator new[] behind the scenes. Since the result of the built-in ::operator new is void*, how is the conversion to the right type done? Is this implementation-specific, or are there well-defined rules for it?
  5. When freeing the memory, should one use
::operator delete(static_cast<void*>(buffer), sizeof(T) * capacity, std::align_val_t{alignof(T)});

or is there another way?

PS: In real code I would probably use the standard library for this, but I am trying to understand how things work behind the scenes.
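For comparison, the standard-library route separates allocation from construction through `std::allocator` and `std::allocator_traits`. This is a sketch only: `total_length` is a hypothetical helper, and it has no exception safety (a throwing constructor would leak the storage and the already-built strings).

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: builds `count` strings in raw storage, sums their
// lengths, then destroys them and releases the storage.
std::size_t total_length(const char* const* words, std::size_t count) {
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;
    Alloc alloc;

    // Storage for `count` strings; no std::string is constructed yet.
    std::string* buffer = Traits::allocate(alloc, count);

    for (std::size_t i = 0; i < count; ++i)
        Traits::construct(alloc, buffer + i, words[i]);  // constructs in place

    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) total += buffer[i].size();

    for (std::size_t i = 0; i < count; ++i)
        Traits::destroy(alloc, buffer + i);  // runs the destructor only
    Traits::deallocate(alloc, buffer, count);
    return total;
}
```

This is the same allocate / construct / destroy / deallocate sequence that `std::vector` follows, just spelled out by hand.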

Thanks.

Kakaaba asked 14/3, 2021 at 11:47 Comment(2)
"as new T[] initializes the array elements" No, it doesn't. new T[]() would, but not new T[]. I mean, it will default initialize them, so if a default constructor exists, it will be called. But if T is a trivial type, it will be left uninitialized. So what exactly do you mean by "uninitialized" here? Do you mean that there are no actual Ts, or do you want Ts to exist but have uninitialized values?Exosphere
I am interested in having space for instances of T without constructing them. Since they might be destroyed later, "no actual Ts" is the correct description. I corrected the new T statement.Kakaaba

pointer-interconvertibility

Regarding pointer-interconvertibility, it doesn't matter if you use T * or {[unsigned] char|std::byte} *. You will have to cast it to T * to use it anyway.

Note that you must call std::launder (on the result of the cast) to access the pointed-to T objects. The only exception is the placement-new call that creates the objects, because they don't exist yet. The manual destructor call is not an exception.
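A small sketch of the two ways to reach an object created in raw storage, assuming C++17. Besides laundering a recomputed pointer, the pointer returned by the placement-new expression itself already points to the new object and can be used directly. `Point` and `sum_via_both_pointers` are hypothetical names.

```cpp
#include <cstddef>
#include <new>

struct Point {
    int x;
    int y;
};

// Hypothetical demo: reaches the same object through two different pointers.
int sum_via_both_pointers() {
    alignas(Point) std::byte storage[sizeof(Point)];

    // 1) The pointer returned by placement new points to the new object,
    //    so it can be used without std::launder.
    Point* direct = ::new (static_cast<void*>(storage)) Point{3, 4};

    // 2) A pointer recomputed from the storage address goes through
    //    std::launder before it is used to access the object.
    Point* laundered = std::launder(reinterpret_cast<Point*>(storage));

    const int sum = direct->x + laundered->y;
    laundered->~Point();  // trivial here, shown for symmetry
    return sum;
}
```

In a container, keeping every pointer returned by placement new is usually impractical, which is why the laundered, recomputed pointer is the common pattern.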

The lack of pointer-interconvertibility would only be a problem if you didn't use std::launder.

When converting the result of new to a valid pointer type (assuming it is not implementation defined), is it treated as a pointer to first element of array, or just a pointer to a single object?

If you want to be extra safe, store the pointer as {[unsigned] char|std::byte} * and reinterpret_cast it after performing any pointer arithmetic.
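A sketch of that suggestion, assuming C++17: the buffer stays a `std::byte*`, offsets are computed in bytes, and only the final address is cast to `T*` (and laundered before access). `slot`, `element`, and `build_and_measure` are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Address of slot i, computed with byte arithmetic and cast only at the end.
template <typename T>
T* slot(std::byte* buffer, std::size_t i) {
    return reinterpret_cast<T*>(buffer + i * sizeof(T));
}

// Access to an element that has already been constructed in slot i.
template <typename T>
T& element(std::byte* buffer, std::size_t i) {
    return *std::launder(slot<T>(buffer, i));
}

// Builds n strings of lengths 0..n-1 in aligned heap storage, sums their
// sizes, then destroys them and frees the storage.
std::size_t build_and_measure(std::size_t n) {
    using T = std::string;
    auto* buffer = static_cast<std::byte*>(
        ::operator new(sizeof(T) * n, std::align_val_t{alignof(T)}));

    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void*>(slot<T>(buffer, i))) T(i, 'x');

    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += element<T>(buffer, i).size();

    for (std::size_t i = 0; i < n; ++i) std::destroy_at(&element<T>(buffer, i));
    ::operator delete(buffer, sizeof(T) * n, std::align_val_t{alignof(T)});
    return total;
}
```

Because no arithmetic is ever done on a `T*`, the question of whether an array of `T` exists in the storage does not arise for the offset computation.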

an object of type array unsigned char or std::byte can provide storage for result of placement new

The standard doesn't say anywhere that "providing storage" is required for placement-new to work. I think this term is defined solely to be used in definitions of other terms in the standard.

Consider [basic.life]/example-2 where operator= uses placement-new to reconstruct an object in place, even though type T doesn't "provide storage" for the same type T.
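A sketch modeled on the pattern in that example: an assignment operator that destroys `*this` and re-creates it in the same storage with placement new. `Named` is a hypothetical type, and this version is not exception safe (if the copy constructor throws, `*this` is left destroyed), so ordinary copy assignment is usually preferable in real code.

```cpp
#include <new>
#include <string>
#include <utility>

struct Named {
    std::string name;

    explicit Named(std::string n) : name(std::move(n)) {}
    Named(const Named&) = default;

    // Ends the lifetime of *this, then constructs a new Named in place.
    Named& operator=(const Named& other) {
        if (this != &other) {
            this->~Named();
            ::new (static_cast<void*>(this)) Named(other);
        }
        return *this;
    }
};
```

After `a = b;`, `a.name` refers to the newly constructed object, since it transparently replaces the old one in the same storage.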

Since the result of builtin new is void, how conversion to the right type is done?

Not sure what the standard has to say about it, but what else can it be other than reinterpret_cast?

freeing the memory

Your approach looks correct, but I think you don't have to pass the size.
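Both aligned deallocation overloads exist since C++17: `operator delete(void*, std::align_val_t)` and `operator delete(void*, std::size_t, std::align_val_t)`. The sketch below pairs one allocation with each; if the size is passed, it should equal the size given to `operator new`. `allocate_and_free_both_ways` is a hypothetical name, and `alignment` is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: allocates twice with the same alignment, frees one
// block with the unsized overload and one with the sized overload.
bool allocate_and_free_both_ways(std::size_t bytes, std::size_t alignment) {
    const std::align_val_t al{alignment};

    void* a = ::operator new(bytes, al);
    void* b = ::operator new(bytes, al);
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(a) % alignment == 0 &&
        reinterpret_cast<std::uintptr_t>(b) % alignment == 0;

    ::operator delete(a, al);         // size omitted
    ::operator delete(b, bytes, al);  // size matches the size passed to new
    return aligned;
}
```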

Browse answered 14/3, 2021 at 15:21 Comment(11)
I agree that I need std::launder on the result, but my theoretical concern is pointer arithmetic.Kakaaba
I am not sure that I interpret <eel.is/c++draft/intro.object#10> correctly, but new(...) implicitly constructs an array of bytes or chars, while it does not implicitly construct an array of T. As for "providing storage", my wording was confusing. What I mean is that using placement new or explicit destruction will not "mess up" an array of unsigned ints, while it might "mess up" an array of T.Kakaaba
After calling (TArray + i)->~T(), I have an array in an invalid state, which is no longer an array, so pointer arithmetic is theoretically undefined, and computing the position for placement new may be undefined as well.Kakaaba
@RazielMagius That's why I suggest doing arithmetic on char * instead.Browse
"Not sure what the standard has to say about it, but what else can it be other than reinterpret_cast." A very clumsy solution would be to allocate char array at the location created by operator new as it would guarantee the proper alignment, but I am sure it is not the way it is supposed to be.Kakaaba
There are two delete operators that take alignment as a parameter, one with size and one without. Since I passed the size when allocating with new, I thought the correct way to delete the memory would also require providing the size?Kakaaba
@RazielMagius "allocate char array at the location created by operator new as it would guarantee the proper alignment" What do you mean by "allocating an array" there, and what does it have to do with alignment? If the type is overaligned, the overload of the allocation function with the alignment parameter is used, so the returned pointer must already be properly aligned.Browse
@RazielMagius "Since I used size when allocating" There's no way to not use it. "thought that correct way to delete memory would also require providing size" Cppreference is not entirely clear on this, but it says that delete-expression calls the size-less overload, so you doing it should be fine too.Browse
If I am writing a generic collection, it is a fair assumption that I do not know anything about alignment. I am not sure: is ::operator new(sizeof(T) * capacity, std::align_val_t{alignof(T)}) supposed to work for both over-aligned and "normal" types? I did provide both size and alignment when allocating, so what do you mean by "There is no way to use it"?Kakaaba
@RazielMagius Yes, it should work for non-overaligned types too. "what do you mean by ..." I said there's no way to not use it. How would you allocate memory without specifying size?Browse
Missed the not part, my bad.Kakaaba

I think your premise may be incorrect. If T is a class, its default constructor will be called. However, that constructor can be empty, and if your class contains only POD (plain old data) members, nothing will be initialized. I actually rely on this all the time because I often don't want things initialized, for performance reasons.

I believe there are a few caveats with this for global data and so forth, where some things are zero-initialized. But in general, heap memory isn't. You can test it and you will find a bunch of garbage in memory, at least when compiled in release mode. Some compilers initialize memory in debug mode, but that's done outside constructors.

For instance you can set data in a custom placement new function and if it's POD it will still be there in the constructor. Some people will argue this is UB but I think the standard says "nothing is done" for POD, which implies no initialization.
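A sketch of the distinction this answer relies on, for a hypothetical POD-style `Sample` type: `new Sample[n]` default-initializes (members are indeterminate, and reading them before assignment is undefined behavior), while `new Sample[n]()` value-initializes (members are zeroed). Note that both forms still create `Sample` objects; they differ only in whether the members get values, which is not the same as having no `T` objects at all.

```cpp
#include <cstddef>

// Hypothetical POD-style type with no user-provided constructor.
struct Sample {
    int id;
    double value;
};

// Returns true if every member of a value-initialized array is zero.
bool value_initialized_is_zero(std::size_t n) {
    Sample* uninit = new Sample[n];    // default-init: never read here
    Sample* zeroed = new Sample[n]();  // value-init: all members zero

    bool all_zero = true;
    for (std::size_t i = 0; i < n; ++i)
        all_zero = all_zero && zeroed[i].id == 0 && zeroed[i].value == 0.0;

    delete[] uninit;
    delete[] zeroed;
    return all_zero;
}
```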

Solmization answered 14/3, 2021 at 14:52 Comment(0)
