Is it defined behavior to place exotically aligned objects in the coroutine state?

Edit: Thanks for everyone's answer and replies. Language Lawyer's answer is technically the correct one so that's accepted, but Human-Compiler's answer is the only one that meets the criteria (getting 2+ points) for the bounty, or that is elaborated enough on the question's specific topic.


Full question

Is it defined behavior to have an object b placed in the coroutine state (by e.g. having it as a parameter, or preserving it across a suspension point), where alignof(b) > __STDCPP_DEFAULT_NEW_ALIGNMENT__?

Example:

inline constexpr size_t large_alignment =
    __STDCPP_DEFAULT_NEW_ALIGNMENT__ * 2;

struct alignas(large_alignment) behemoth {
  void attack();
  unsigned char data[large_alignment];
};

task<void> invade(task_queue &q) {
  behemoth b{};
  co_await submit_to(q);
  b.attack();
}

Explanation

When a coroutine is called, heap memory for the coroutine state is allocated via operator new.

This call to operator new may take one of the following forms:

  1. passing all arguments passed to the coroutine following the size requested, or if no such overloads can be found,
  2. passing just the size requested.

Whichever form the call takes, note that it doesn't use the overloads accepting a std::align_val_t, which are necessary to allocate memory that must be aligned more than __STDCPP_DEFAULT_NEW_ALIGNMENT__. Therefore, if an object whose alignment is larger than __STDCPP_DEFAULT_NEW_ALIGNMENT__ must be saved in the coroutine state, there should be no way to guarantee that the object will end up properly aligned in memory.
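
To make the distinction concrete, here is a minimal sketch. The helper names is_aligned, new_expression_is_aligned and plain_operator_new_is_aligned are hypothetical, and behemoth is trimmed down from the example above. A new-expression for an over-aligned type selects the std::align_val_t overload on its own, whereas a direct size-only call to ::operator new is only required to return storage aligned for types without new-extended alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Same shape as the question's type: twice the default new alignment.
inline constexpr std::size_t large_alignment =
    __STDCPP_DEFAULT_NEW_ALIGNMENT__ * 2;

struct alignas(large_alignment) behemoth {
  unsigned char data[large_alignment];
};

// Hypothetical helper: true if p is a multiple of alignment (a power of two).
inline bool is_aligned(const void *p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// A new-expression for an over-aligned type passes std::align_val_t for us,
// so the resulting object is suitably aligned.
inline bool new_expression_is_aligned() {
  behemoth *p = new behemoth{};
  const bool ok = is_aligned(p, alignof(behemoth));
  delete p;
  return ok;
}

// A direct size-only call, which is the form used for the coroutine state.
// Only the default new alignment is promised; anything stricter is luck.
inline bool plain_operator_new_is_aligned(std::size_t alignment) {
  void *p = ::operator new(sizeof(behemoth));
  const bool ok = is_aligned(p, alignment);
  ::operator delete(p);
  return ok;
}
```

Calling plain_operator_new_is_aligned(large_alignment) may return either true or false depending on the allocator, which is exactly the uncertainty described above.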


Experimentation

Godbolt

async f(): Assertion `reinterpret_cast<uintptr_t>(&b) % 32ull == 0' failed.

so it definitely doesn't work on GCC trunk (11.0.1 20210307). Replacing 32 with 16 (which equals __STDCPP_DEFAULT_NEW_ALIGNMENT__) eliminates this assertion failure.

godbolt.org cannot run Windows binaries, but the assertion fires with MSVC on my computer as well.

Nullifidian answered 9/3, 2021 at 12:29 Comment(7)
Related: I'd like to know the same for things like std::variant/any/optional?Parrott
@Parrott optional and variant don't seem to have this problem. [optional.optional.1]: "The contained value shall be allocated in a region... suitably aligned for the type T." [variant.variant.1]: "The contained value shall be allocated in a region... suitably aligned for all types in Types." It is easy to meet the alignment requirement for optional and variant by using either a union or aligned_storage_t<sizeof(T), alignof(T)> as the buffer, and their alignments automatically propagate to the enclosing types.Nullifidian
@Parrott Conversely, I cannot find any alignment requirement for any in [any.class]. However, in practice it would be quite difficult not to meet the right alignment anyway, as the new-expression automatically uses the align_val_t-accepting overloads for types that have extended alignment. MSVC's, libstdc++'s and libc++'s impls all seem to handle extended alignment correctly (Godbolt).Nullifidian
Dup of anything from stackoverflow.com/…Semitrailer
@LanguageLawyer Just because two questions have the same answer doesn't mean they're duplicates. Example: Why are perpetual motion machines impossible? vs How does a bomb calorimeter work?Nullifidian
So your point is: one can create 10 questions about different contexts: Are over-aligned types allowed in function bodies?, Are over-aligned types allowed in function parameters?, Are over-aligned types allowed as class members?, Are over-aligned types allowed in for loop initializer? etc. and none of these questions should be closed as a dup?Semitrailer
@LanguageLawyer Short answer: No, they shouldn't. Long answer: No, because [c++] questions do not equal [language-lawyer] questions. Limitations of real-world impls are as important to the topic as the overarching standard, if not more so. This question in particular should not be closed as a dup because it illustrates a unique corner case of compiler support from major vendors, and highlights a potential standard defect (unless a rationale can be given against permitting the extended-alignment allocation functions for allocating coroutine states).Nullifidian
S
0

From [basic.align]/3:

It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported.

So the answer is: implementation-defined. The second part of the sentence suggests that an implementation may support creating objects of over-aligned types in ordinary functions but not in coroutines.

[basic.align]/9:

If a request for a specific extended alignment in a specific context is not supported by an implementation, the program is ill-formed.

Implementations which do not emit a diagnostic message when they don't support extended alignment are not conforming.

Semitrailer answered 13/3, 2021 at 17:8 Comment(6)
Another interesting context is "passed by value to a function".Isobaric
I disagree with this answer. The paragraphs defining how a coroutine obtains its storage are clear that it comes from the ::operator new(std::size_t) overload, which does not handle over-alignment. This would make it undefined behavior and not implementation-defined, because there is no way for an implementation to support this without deviating from the standard.Pistareen
@Human-Compiler I've already addressed your comment in the comment under your answer. I don't see where the standard says that an implementation shall allocate no more than sum of sizeofs of objects with automatic storage duration or smth like this. An implementation can allocate sizeof(over-aligned) + alignof(over-aligned) and place the over-aligned object at suitably aligned offset within the storage.Semitrailer
There is no place in the C++ language as far as I'm aware where this is, or has been, done -- even though it could be. Additionally this has to happen for the entire state of the stack that may be resumed, and you may have more than one over-aligned object. I'm not sure if you're actually suggesting that each one goes through this transformation, or that the compiler does one large transformation for all stack objects up to that point -- but either case seems ridiculous IMOPistareen
@Human-Compiler I'm not sure if you're actually suggesting that each one goes through this transformation, or that the compiler does one large transformation for all stack objects up to that point Is there any essential difference? I think this only affects necessary buffer size.Semitrailer
It's not that ridiculous. Just allocate an additional max(alignof(captured-things)) + sizeof(void*) bytes, and position the entire frame at a suitable offset within the allocation. The extra void* is to remember the start of the allocation so it can be freed.Isobaric
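
The over-allocation idea from these comments can be sketched as follows. This is only an illustration of the technique, not what any compiler is known to emit for coroutine frames; allocate_overaligned and free_overaligned are hypothetical names, and the overflow check on the size computation is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical helper: obtain `size` bytes aligned to `alignment` (a power of
// two) while calling only the size-only ::operator new.
inline void *allocate_overaligned(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

  // Room for the payload, worst-case padding, and the saved start pointer.
  void *raw = ::operator new(size + alignment + sizeof(void *));

  const auto base = reinterpret_cast<std::uintptr_t>(raw);
  const auto mask = static_cast<std::uintptr_t>(alignment) - 1;
  // First suitably aligned address that leaves sizeof(void*) bytes before it.
  const std::uintptr_t aligned = (base + sizeof(void *) + mask) & ~mask;

  unsigned char *result = static_cast<unsigned char *>(raw) + (aligned - base);
  // Remember the start of the allocation just before the aligned block.
  std::memcpy(result - sizeof(void *), &raw, sizeof(void *));
  return result;
}

// Hypothetical counterpart: recover the original pointer and release it.
inline void free_overaligned(void *p) noexcept {
  if (p == nullptr) return;
  void *raw = nullptr;
  std::memcpy(&raw, static_cast<unsigned char *>(p) - sizeof(void *),
              sizeof(void *));
  ::operator delete(raw);
}
```

The cost is alignment + sizeof(void*) extra bytes per allocation, which matches the estimate given in the last comment.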
P
1

From my reading, this would be undefined behavior.

dcl.fct.def.coroutine/9 covers the lookup order for determining the allocation function that will be used should the coroutine need additional storage. The lookup order is quite clear:

An implementation may need to allocate additional storage for a coroutine. This storage is known as the coroutine state and is obtained by calling a non-array allocation function ([basic.stc.dynamic.allocation]).

The allocation function's name is looked up in the scope of the promise type. If this lookup fails, the allocation function's name is looked up in the global scope. If the lookup finds an allocation function in the scope of the promise type, overload resolution is performed on a function call created by assembling an argument list. The first argument is the amount of space requested, and has type std::size_t. The lvalues p1 ... pn are the succeeding arguments.

If no viable function is found ([over.match.viable]), overload resolution is performed again on a function call created by passing just the amount of space required as an argument of type std::size_t.

(Emphasis mine)

This explicitly mentions that the new overload it will call must start with a std::size_t argument, and may optionally operate on a list of lvalue references p1, p2, ..., pn (if it is found in the scope of the promise).

Since in your above example there is no custom operator new defined for the promise type, that means it must select ::operator new(std::size_t) as the overload.

As you already know, ::operator new is only guaranteed to be aligned to __STDCPP_DEFAULT_NEW_ALIGNMENT__ -- which is below the extended alignment required for the coroutine storage. This effectively makes any extended-aligned type in a coroutine be undefined behavior due to misalignment.

Because the wording is so strict that it must call ::operator new(std::size_t), this should be consistent on any system that implements C++20 correctly. If an implementation chose to support extended-aligned types, it would technically be violating the standard by calling the wrong new overload (which would be an observable deviation).


Judging by the wording on the overload resolution for the allocation function, I think in a case where you require extended-alignment, you should be defining a member-based operator new for your promise that is aware of the possible alignment requirement.
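
A minimal sketch of what such a member operator new could look like is below. The mixin name aligned_frame_allocation is hypothetical, and the Alignment bound is an assumption the promise author has to pick up front, since the alignment needs of a particular coroutine body are not exposed to the promise type. It also only controls the alignment of the start of the coroutine state; whether a compiler then keeps over-aligned locals at suitably aligned offsets inside that state is up to the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical mixin: a promise type deriving from this has its coroutine
// state allocated with at least `Alignment` bytes of alignment, because the
// allocation function is looked up in the scope of the promise type first.
template <std::size_t Alignment>
struct aligned_frame_allocation {
  static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                "Alignment must be a power of two");

  // Matches the size-only form of the call made for the coroutine state.
  static void *operator new(std::size_t size) {
    void *p = ::operator new(size, std::align_val_t{Alignment});
    // Sanity check on the postcondition we rely on.
    assert(reinterpret_cast<std::uintptr_t>(p) % Alignment == 0);
    return p;
  }

  // Storage from the aligned overload must go back through the aligned one.
  static void operator delete(void *p) noexcept {
    ::operator delete(p, std::align_val_t{Alignment});
  }
};
```

A promise type would opt in with something like struct promise_type : aligned_frame_allocation<64> { ... }, where 64 is purely an example value.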

Pistareen answered 13/3, 2021 at 5:43 Comment(5)
The problem is that the alignment needs of the coroutine state is determined by the coroutine function body, which I have no knowledge or control of even if I define class-scope operator new for my promise type. I can compute the maximum alignment among the coroutine parameters, but not for local variables.Nullifidian
list of lvalue references p1, p2, ..., pn Could you show the word «references» in the standard text?Semitrailer
How the fact that ::operator new(std::size_t) is used proves that over-aligned types are not supported? One can always allocate enough memory to place an over-aligned object at some offset.Semitrailer
@LanguageLawyer Yes, I thought about that solution some hours ago as well.Nullifidian
I also think that "just the amount of space required" (vs. "requested") oddly leaves some room for interpretation here.Cult
N
0

Just to recap on the responses this question has received so far:

  1. As @LanguageLawyer has pointed out, compilers have no obligation to support extended-alignment types in any context, and the extent to which those types are supported is allowed to vary among different contexts. Therefore, it is conformant for compilers to not guarantee that b is correctly aligned.

  2. Even though the standard mandates that the non-extended-alignment versions of operator new should always be used, a compiler could still manage to place the coroutine state at the correct alignment by overallocating, and choosing a sufficiently aligned address at run time. But again, it technically does not have to support anything.

  3. In today's status quo, OP's code sample definitely isn't correct, though in a more rigorous sense it is only undefined behavior when at run time a b actually gets misplaced and leads to some form of misaligned access (UB is a runtime property).

Nullifidian answered 18/3, 2021 at 10:49 Comment(0)
