treating memory returned by operator new(sizeof(T) * N) as an array

In C one can allocate dynamic arrays using malloc(sizeof(T) * N) and then use pointer arithmetic to get elements at i offset in this dynamic array.

In C++ one can do similar using operator new() in the same way as malloc() and then placement new (for an example one can see solution for item 13 in a book "Exceptional C++: 47 engineering puzzles, programming problems, and solutions" by Herb Sutter). If you don't have one, the summary of the solution for this question would be:

T* storage = operator new(sizeof(T)*size);

// insert element    
T* p = storage + i;
new (p) T(element);

// get element
T* element = storage[i];

To me this looked legitimate: I am asking for a chunk of memory large enough to hold N elements of size sizeof(T). Since sizeof(T) includes any padding needed to keep consecutive elements aligned, and the elements are laid out one after another in that chunk, using pointer arithmetic should be OK here.

However, I was then pointed to links like http://eel.is/c++draft/expr.add#4 or http://eel.is/c++draft/intro.object#def:object, with the claim that in C++ operator new() does not return an array object, so doing pointer arithmetic on what it returns and using it as an array is undefined behavior, as opposed to ANSI C.

I'm not this good at such low level stuff and I'm really trying to understand by reading this: https://www.ibm.com/developerworks/library/pa-dalign/ or this: http://jrruethe.github.io/blog/2015/08/23/placement-new/ but I still fail to understand if Sutter was just plain wrong?

I do understand that alignas makes sense in constructions such as:

alignas(double) char array[sizeof(double)];

(c) http://georgeflanagin.com/alignas.php

That matters if the array might not start on a double boundary (for example, when it follows a char member in a structure on a processor that reads memory in 2-byte units).

But this is different: I have requested memory from the heap/free store, and specifically asked operator new for memory which will hold elements aligned to sizeof(T).

To summarize in case this was TL;DR:

  • Is it possible to use malloc() for dynamic arrays in C++?
  • Is it possible to use operator new() and placement new for dynamic arrays in older C++ which has no alignas keyword?
  • Is pointer arithmetic undefined behavior when used over memory returned by operator new()?
  • Is Sutter advising code which might break on some antique machine?

Sorry if this is dumb.

Iy answered 23/11, 2018 at 18:59 Comment(13)
Is it possible to use malloc() for dynamic arrays in C? – you wanted to write C++?Belanger
Yes. And I wanted to understand what the difference is between malloc in C and C++ in that case. Because I've seen operator new() implemented in terms of malloc() in some C++ headers in some version of GCC long ago, so operator new() would be just equivalent to malloc(), so can I use pointer arithmetic over it in C++ without alignas? I'm totally confused. :(Iy
@Iy I suppose the implementation is free to make use of undefined behavior if it gives more guarantees for itself. So whether GCC uses something in its library implementation (even assuming it is free of bugs and standard adherent) doesn't tell you anything about whether it is defined behavior in the standard.Cobia
Can you put the whole code here? Because the code you've presented would not even compile (at the first line, there is a missing cast at least).Lashley
For pointer arithmetic you don't need an array, the only thing you have to be careful of to avoid undefined behavior is to not move past the end of the pointer (one past your allocated size).Lanford
I am having trouble locating what I think is a relevant post. However I think this is fine because I am pretty sure the standard says a single object is considered as an array of one element with respect to pointer arithmetic.Berkshire
@Berkshire It says that in eel.is/c++draft/expr.add#footnote-85. I don't think the pointer is considered pointing to single object though. No object was constructed in the allocated memory.Cobia
@eukaryota Yes I think I misunderstood the question. Well do you not suppose some mileage can be gotten out of the expression (possibly-hypothetical) from the standard wording?Berkshire
Yes, as I see, that code has UB indeed. But, in my opinion, it's the standard which needs to be fixed, not Herb's code. It would be interesting to know why we have such a restrictive rule about pointer arithmetic.Lashley
@Lashley Maybe it is because the definition of object was recently changed to be more restrictive? Previously even uninitialized memory was an object.Berkshire
@Berkshire I don't really know, but in my amateur reading possibly-hypothetical refers to the hypothetical x[n] after the actual array and the qualifying if of 4.2 does not use it either.Cobia
@eukaryota I think it means for the purposes of arithmetic as long as the memory is allocated and could theoretically become an object then the arithmetic works. But it's hard to be sureBerkshire
@Galik: As far as I remember, C++98 had the same rule, and it meant the same (in the sense that Herb's code was never well-defined).Lashley
H
4

The C++ standard has an open issue (CWG 1701) noting that the underlying representation of an object is not an "array" but a "sequence" of unsigned char objects. Still, everyone treats it as an array (which is intended), so it is safe to write code like:

char* storage = static_cast<char*>(operator new(sizeof(T)*size));
// ...
char* p = storage + sizeof(T)*i;  // precondition: 0 <= i < size
new (p) T(element);

as long as void* operator new(size_t) returns a properly aligned value. Using sizeof-multiplied offsets to keep the alignment is safe.
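The byte-offset approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Point` and `fill_and_sum` are hypothetical names that do not appear in the answer, and the sketch assumes `alignof(Point)` does not exceed what plain `operator new` guarantees. It uses the pointer returned by placement new instead of casting the `char*` back to the element type. Whether even the `char*` arithmetic is strictly covered by the standard's wording is disputed in the comments below.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Constructs `count` Points at sizeof-multiplied byte offsets inside raw
// storage, sums their fields, destroys them, and releases the storage.
int fill_and_sum(std::size_t count) {
    assert(count > 0);
    char* storage = static_cast<char*>(operator new(sizeof(Point) * count));
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        char* p = storage + sizeof(Point) * i;  // byte arithmetic only
        // Keep the pointer that placement new returns and access the object through it.
        Point* elem = new (p) Point{static_cast<int>(i), 1};
        sum += elem->x + elem->y;
        elem->~Point();  // trivial for Point, shown for symmetry
    }
    operator delete(storage);
    return sum;
}
```

For example, `fill_and_sum(3)` constructs the points (0,1), (1,1), (2,1) one after another in the same allocation and adds up their fields.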

In C++17, there is a macro __STDCPP_DEFAULT_NEW_ALIGNMENT__, which specifies the maximum safe alignment for "normal" void* operator new(size_t), and void* operator new(std::size_t size, std::align_val_t alignment) should be used if a larger alignment is required.
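As a rough sketch of that distinction, assuming a C++17 compiler: the type `Wide` and the function `aligned_allocation_is_aligned` below are hypothetical names made up for this example. `Wide` is deliberately given an alignment larger than the default, so its storage is requested through the `std::align_val_t` overload and released through the matching aligned `operator delete`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned type: its alignment is chosen to exceed the default.
struct alignas(2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__) Wide {
    char bytes[2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__];
};

// Allocates storage for one Wide with the aligned form of operator new and
// reports whether the returned address is a multiple of alignof(Wide).
bool aligned_allocation_is_aligned() {
    static_assert(alignof(Wide) > __STDCPP_DEFAULT_NEW_ALIGNMENT__,
                  "Wide must be over-aligned for this example");
    void* raw = operator new(sizeof(Wide), std::align_val_t(alignof(Wide)));
    assert(raw != nullptr);
    const bool ok = reinterpret_cast<std::uintptr_t>(raw) % alignof(Wide) == 0;
    // Storage from the aligned form is released with the matching aligned form.
    operator delete(raw, std::align_val_t(alignof(Wide)));
    return ok;
}
```

A plain `new Wide` expression in C++17 selects the aligned overload automatically; the explicit call is spelled out here only to make the mechanism visible.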

In earlier versions of C++, there is no such distinction, which means that void* operator new(size_t) needs to be implemented in a way that is compatible with the alignment of any object.

As to being able to do pointer arithmetic directly on T*, I am not sure it needs to be required by the standard. However, it is hard to implement the C++ memory model in such a way that it would not work.

Habiliment answered 23/11, 2018 at 20:52 Comment(12)
CWG 1701 has nothing related to the problem in the question. CWG 1701 is about object representation. The problem with allocation functions that they do not create objects. How resolution of the issue should help here?Sudan
everyone treats it as an array (which is intended) Where this "which is intended" is coming from? Note from the issue: An additional point of concern has been raised as to whether it is appropriate to refer to the constituent bytes of an object as being “objects” themselves. How this is intended to be an array when its elements should not be objects?Sudan
@LanguageLawyer, it's not true that allocation functions don't create objects. See the standard. Intended by the authors of the language, which follows from how they (and everyone else) use such constructs; if in doubt, you can ask them directly, their emails are not secret.Habiliment
@Habiliment See the standard. It does not say that object is created, it says its lifetime is started. See the standard, when object is created.Sudan
@LanguageLawyer, then "when the object is created" is a straw man. Semantically, there is no difference if an allocation function "creates" or "refers to" an array of bytes.Habiliment
@Habiliment No, it is not a straw man. It is intended. Your (very popular among ppl.) misinterpretation of the lifetime start rule that a myriad of objects magically appear in a storage of appropriate size and alignment clearly contradicts several rules such as when objects during their lifetime can haz the same address, which shows that such interpretation was not intended by the Committee.Sudan
@LanguageLawyer, no, it's your interpretation that they appear "magically". The standard explicitly says that their lifetime starts when the storage with the proper alignment and size is obtained. If you think that it contradicts something else in the standard, file a defect report.Habiliment
@Habiliment The standard explicitly says that their lifetime starts when the storage with the proper alignment and size is obtained. Yep. When an object is created, the first thing is that the storage obtained for it. And if there is no non-vacuous initialization, the lifetime of the object being created is started. This is the correct interpretation of the rule.Sudan
@Habiliment Anyway, here is a proposal from a Committee member saying "this maintains the status quo that malloc alone is not sufficient to create an object". You told "it's not true that allocation functions don't create objects. Intended by the authors of the language". As we can see, not intended.Sudan
@LanguageLawyer, C++ was created and is evolving as a language with one of its strongest selling points being the ability to work with POD objects not created by language constructs (from hardware registers to data in memory mapped files to objects created in the same process by code written in another language). Had the Committee once decided to disallow this ability, such a foolish decision would create a massive outcry in the industry, which would be impossible to miss.Habiliment
@Habiliment If objects magically appeared in any suitable storage, then the code in the OP post wouldn't have problems, because an array of unsigned char spanning the whole piece of allocated storage would have appeared there. But this is not the case. See the proposal in the top-rated answer.Sudan
@LanguageLawyer, the code in the OP post has no problems, except for those possibly relating to __STDCPP_DEFAULT_NEW_ALIGNMENT__. What has problems with this code is the interpretation that the objects in the current C++ can start their lifetimes only as a result of object-creating language constructs. While changing the language in a way that this interpretation becomes correct could seem to be a nice idea, it may unnecessarily break a lot of existing code, especially in freestanding implementations, giving nothing useful in return.Habiliment
S
10

The issue of pointer arithmetic on allocated memory, as in your example:

T* storage = static_cast<T*>(operator new(sizeof(T)*size));
// ...
T* p = storage + i;  // precondition: 0 <= i < size
new (p) T(element);

being technically undefined behaviour has been known for a long time. It implies that std::vector can't be implemented with well-defined behaviour purely as a library, but requires additional guarantees from the implementation beyond those found in the standard.

It was definitely not the intention of the standards committee to make std::vector unimplementable. Sutter is, of course, right that such code is intended to be well-defined. The wording of the standard needs to reflect that.

P0593 is a proposal that, if accepted into the standard, may be able to solve this problem. In the meantime, it is fine to keep writing code like the above; no major compiler will treat it as UB.
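For context, here is a minimal sketch of the kind of vector-like code this discussion is about. It is not Sutter's implementation: `RawBuffer` and its members are hypothetical names, the capacity is fixed, and the sketch assumes `alignof(T)` does not exceed what plain `operator new` guarantees. The expression `storage_ + size_` is exactly the pointer arithmetic whose formal status is in question.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity buffer, for illustration only.
template <typename T>
class RawBuffer {
public:
    explicit RawBuffer(std::size_t capacity)
        : storage_(static_cast<T*>(operator new(sizeof(T) * capacity))),
          capacity_(capacity),
          size_(0) {}

    RawBuffer(const RawBuffer&) = delete;
    RawBuffer& operator=(const RawBuffer&) = delete;

    ~RawBuffer() {
        // Destroy the constructed elements in reverse order, then free the storage.
        while (size_ > 0) {
            storage_[--size_].~T();
        }
        operator delete(storage_);
    }

    void push_back(const T& value) {
        assert(size_ < capacity_);
        // Pointer arithmetic on allocated storage, followed by placement new.
        ::new (static_cast<void*>(storage_ + size_)) T(value);
        ++size_;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return storage_[i];
    }

    std::size_t size() const { return size_; }

private:
    T* storage_;
    std::size_t capacity_;
    std::size_t size_;
};
```

A usage such as `RawBuffer<std::string> buf(3); buf.push_back("a");` behaves as expected on the major compilers, which is the practical point made above.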

Edit: As pointed out in the comments, I should have stated that when I said storage + i will be well-defined under P0593, I was assuming that the elements storage[0], storage[1], ..., storage[i-1] have already been constructed. Although I'm not sure I understand P0593 well enough to conclude that it wouldn't also cover the case where those elements hadn't already been constructed.

Sidonius answered 23/11, 2018 at 19:28 Comment(13)
Hmm, why is P0593 relevant here? T can be any type. I think that proposal won't solve this problem.Lashley
std::vector is not unimplementable. It is arguably unimplementable in user code. The standard library can't be implemented in portable code, much less user-written code. That's one of the reasons that it comes with the compiler -- it can take advantage of known behavior of that compiler and the target OS.Reorganization
@PeteBecker That's what I meant. std::vector was not meant to be unimplementable in user code.Sidonius
@Lashley I made the assumption (which I probably should have stated) that storage[0], storage[1], ..., storage[i-1] have all been constructed already. In this case, P0593 implies that an array object with i elements is implicitly created, and storage + i is past the end, and is therefore well-defined. P0593 points out that it intends this to work for arrays of any type.Sidonius
What I meant is that P0593 is about types which the author calls "implicit lifetime types". So this proposal doesn't handle all types, it cannot be a general solution to this problem. But I'm not 100% sure about this either :)Lashley
But what is the reason of the UB? It's not pointer arithmetic but this std::bless whatever that is? The placement new? Can you boil this down to elementary blocks?Iy
But what about arithmetic on (uint8_t *)storage? Wouldn't it be well-defined, allowing a well-defined vector implementation?Beta
@HolyBlackCat: No. As far as I know, you need to reinterpret_cast it to uintptr_t, and do the arithmetic there. Which is of course implementation-defined behaviorLashley
@Lashley It says that an array type of any element type is an implicit lifetime type (regardless of whether the element type is an implicit lifetime type). Because if you already have a bunch of objects of type T lined up in memory, then it takes no additional code to create an array of T.Sidonius
Thanks, that actually makes sense! Now it's time to re-read that proposal for the 35th time :)Lashley
I'm not sure if I should create a new question, let me ask it here. Suppose we don't use the value returned by new. Can we use storage or p to access elements? Or we have to std::launder it first?Crow
@Crow That's a good question. I think std::launder is required - otherwise the original pointer remains an invalid pointer value, as there is no provision in the standard for it to automatically begin pointing to the newly created object. But I am not sure about this.Sidonius
Although I'm not sure I understand P0593 well enough to conclude that it wouldn't also cover the case where those elements hadn't already been constructed. It would be impossible to do this otherwise: buf_end_size = newbuf + sizeof(T) * size();. Here the pointer arithmetic is used to get a pointer that jumps to the end of an array of objects that don't exist yet. Am I wrong?Sural
F
1

On all of the widely used recent POSIX-compatible systems, that is, Windows, Linux (and Android, of course), and Mac OS X, the following applies:

Is it possible to use malloc() for dynamic arrays in C++?

Yes, it is. Using reinterpret_cast to convert the resulting void* to the desired pointer type is the best practice, and it yields a dynamically allocated array like this: type *array = reinterpret_cast<type*>(malloc(sizeof(type)*array_size)); Be careful: in this case constructors are not called on the array elements, so it is still uninitialized storage, no matter what type is. Nor are destructors called when free is used for deallocation.
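The point about constructors and destructors can be made visible with a small sketch. `Counted` and `alive_after_malloc` are hypothetical names invented for this example; the type merely counts how many instances are currently alive. Elements are constructed one at a time with placement new and destroyed explicitly before `free` is called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical type that counts how many instances are currently alive.
struct Counted {
    static int alive;
    Counted() { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;

// Returns the number of live objects observed right after malloc and stores
// the number observed after the placement-new loop in `after_construct`.
int alive_after_malloc(int n, int& after_construct) {
    assert(n > 0);
    void* raw = std::malloc(sizeof(Counted) * static_cast<std::size_t>(n));
    if (raw == nullptr) throw std::bad_alloc();
    const int after_malloc = Counted::alive;  // malloc has run no constructor

    std::vector<Counted*> elems;
    for (int i = 0; i < n; ++i) {
        void* slot = static_cast<unsigned char*>(raw) + sizeof(Counted) * i;
        elems.push_back(new (slot) Counted());  // the constructor runs here
    }
    after_construct = Counted::alive;

    // free() would not run destructors, so call them explicitly first.
    for (Counted* e : elems) e->~Counted();
    std::free(raw);
    return after_malloc;
}
```

Calling `alive_after_malloc(3, after)` reports zero live objects straight after `malloc` and three once the placement-new loop has finished.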


Is it possible to use operator new() and placement new for dynamic arrays in older C++ which has no alignas keyword?

Yes, but you need to be aware of alignment in the case of placement new if you feed it custom locations (i.e. ones that do not come from malloc/new). Normal operator new, as well as malloc, will provide native-word-aligned memory areas (at least whenever the allocation size >= word size). Given this, and the fact that structure layouts and sizes are determined so that alignment is properly considered, you don't need to worry about the alignment of dynamic arrays if malloc or new is used. One might notice that the word size is sometimes significantly smaller than the greatest built-in data type (which is typically long double), but it must be aligned the same way, since alignment is not about data size but about the bit width of addresses on the memory bus for different access sizes.
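One way to observe the alignment that malloc actually provides is to compare the returned address against `alignof(std::max_align_t)`, the strictest fundamental alignment in C++11 and later. The helper name `malloc_is_max_aligned` is hypothetical, and the check is only an observation on a given platform, not a proof of what the standard requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// long double is a fundamental type, so its alignment cannot exceed max_align_t's.
static_assert(alignof(long double) <= alignof(std::max_align_t),
              "max_align_t covers every fundamental type");

// Reports whether a malloc'ed block of `bytes` bytes starts at an address
// that is a multiple of alignof(std::max_align_t).
bool malloc_is_max_aligned(std::size_t bytes) {
    assert(bytes > 0);
    void* p = std::malloc(bytes);
    if (p == nullptr) return false;
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return ok;
}
```

Types declared with a larger `alignas` than `std::max_align_t` fall outside this guarantee and need an aligned allocation function instead.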


Is pointer arithmetic undefined behavior when used over memory returned by operator new()?

Nope, as long as you respect the process' memory boundaries -- from this point of view new basically works the same way as malloc, moreover, new actually calls malloc in the vast majority of implementations in order to acquire the required area. As a matter of fact, pointer arithmetic as such is never invalid. However, result of an arithmetic expression that evaluates to a pointer might point to a location outside of permitted areas, but this is not the fault of pointer arithmetic, but of the flawed expression.


Is Sutter advising code which might break on some antique machine?

I don't think so, provided the right compiler is used. (don't compile avr instructions or 128-bit-wide memory mov's into a binary that's intended to run on a 80386) Of course, on different machines with different memory sizes and layouts the same literal address may access areas of different purpose/status/existence, but why would you use literal addresses unless you write driver code to a specific hardware?... :)

Flocculate answered 23/11, 2018 at 19:47 Comment(12)
But why in the @Brian answer link for the exact same example as Sutters (except for the implementation details) of a home-brewed vector it states that: "In practice, this code works across a range of existing implementations, but according to the C++ object model, undefined behavior occurs at points #a, #b, #c, #d, and #e, because they attempt to perform pointer arithmetic on a region of allocated storage that does not contain an array object."? (c) open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0593r2.htmlIy
What exactly do you mean with widely used posix-based systems?. Are you basing your answers on some standardization or implementation's guarantee?Cobia
@eukaryota none of them, but sheer experience. Applies to at least recent Windows, Linux (and Android ofc), and Mac OSX versions.Fernanda
@Iy It's simply not true... every allocation will eventually lead to a malloc call, and malloc implementation is provided by the platform's C library which has to solve these problems itself, otherwise you wouldn't be able to use struct arrays in C either...Fernanda
Omg, people... haven't you ever seen any assembly code? Pointers are simple unsigned integers, this fear using them in arithmetic is insane...Fernanda
@GézaTörök it's not fear of pointers or arithmetic on them, it's the fear of the complexity of C++, it's changing all the time and inability to make any sense of what is written in the standard (just like with tax regulations). There should be an issue at least for compiler writers since it exists, so while from the point of view of C++ user the answers to my original question are totally satisfying, from the point of view of a curious person I still don't understand what the problem is (maybe it's that C++ abstract machine cannot work on memory represented by french fries with pepper?)Iy
@Iy I still think it's totally overreacted. The good old storage classes that each of us has got used to long ago in the C era work exactly the same way in C++. Therefore everything about pointers works the exact same way. The only tricky part is object construction/destruction, but the single-object placement new/delete provides the opportunity to technically call the constructor and the destructor as a function. Of course you can shoot yourself in the leg if you try to, but we still have the necessary toolset and it can be used reliably.Fernanda
@Iy So I don't really understand either what the problem in fact is...Fernanda
@GézaTörök: The problem is that compilers will assume that if two pointers or lvalues "can't" identify the same storage, operations on them may be safely reordered relative to each other. The fact that a pointer is formed from another using pointer arithmetic is not always sufficient to convince some compilers that they might identify the same storage, if they think the object-typing rules would forbid the pointers from accessing the same storage.Recollection
@Recollection I think you are talking about very rare cases, for compiler versions before the introduction of strict aliasing. If this kind of misinterpretation was common, that would, for instance, render reinterpret_cast completely useless.Fernanda
@GézaTörök: To the contrary, the problem isn't with pre-standard compiler versions, but is instead with compilers that--rather than interpreting the "strict aliasing" rules as merely saying that compilers may assume that seemingly-unrelated pointers don't alias--instead interpret the rules as an invitation to ignore obvious relationships between pointers. The Standard doesn't specify that given float *p;, a compiler must treat *(uint32_t*)p += 0x08000000; as a potential access to any object of type float which might be identified by p, because the authors of the Standard...Recollection
...thought it obvious that such constructs should be processed "In a documented fashion characteristic of the environment" in situations where that would be useful and practical, without regard for whether the Standard required it or not. The question of when to support such constructs was left as a Quality of Implementation issue, which the authors of the Standard thought the marketplace could judge better than the Committee. With regard to compilers people would actually pay for, I think they were right, but the bundling of gcc with Linux sheltered it from market forces.Recollection
N
0

You can do it with "old-fashioned" malloc, which gives you a block of memory that fulfils the most restrictive alignment on the respective platform (e.g. that of a long double). So you will be able to place any object into such a buffer without violating any alignment requirements.

Given that, you can use placement new for arrays of your type based on such a memory block:

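#include <cstdlib>
#include <iostream>
#include <new>

struct MyType {
    MyType() {
        std::cout << "in constructor of MyType" << std::endl;
    }
    ~MyType() {
        std::cout << "in destructor of MyType" << std::endl;
    }
    int x;
    int y;
};

int main() {
    // Raw storage for three MyType objects; malloc runs no constructors.
    char* buffer = (char*)std::malloc(sizeof(MyType)*3);
    // Array placement new constructs the three elements in the buffer.
    MyType *mt = new (buffer)MyType[3];

    // Destroy each element explicitly before releasing the storage.
    for (int i=0; i<3; i++)  {
        mt[i].~MyType();
    }
    // Free the pointer that malloc returned.
    std::free(buffer);
}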

Note that, as always with placement new, you have to call the destructors explicitly and free the memory in a separate step. You must not use delete or delete[] expressions, which combine these two steps and would thereby free memory that they don't own.

Neuralgia answered 23/11, 2018 at 19:32 Comment(4)
But can I use operator new() in place of malloc() in your example?Iy
As far as I understand the array placement-new may require an unspecified memory overhead. So whether this code has defined behavior is implementation-dependent. See #8720925Cobia
@eukaryota note that both Sutter and open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0593r2.html examples do not use array placement new. They only use placement new for each element which they create inside of a memory chunk.Iy
@Iy Yes, I am referring specifically to the code example in this answer. The non-array placement-new is not allowed to have this overhead.Cobia
