Symmetric transfer does not prevent stack-overflow for C++20 coroutines

According to the blog post "C++ Coroutines: Understanding Symmetric Transfer", symmetric transfer allows you to suspend one coroutine and resume another without consuming any additional stack space. This prevents the stack overflows that can otherwise occur when a coroutine contains a loop and co_awaits tasks that may complete synchronously within the body of that loop.

Even though the following code example uses symmetric transfer, it crashes with a stack overflow. Note that the code below is a minimal example that reproduces the problem: for instance, if I include the definition of Type's destructor in the header file, the stack overflow no longer occurs.

// type.h
#pragma once

struct Type {
  ~Type();
};
// type.cc
#include "type.h"

Type::~Type() {}
// main.cc
#include <cstdint>
#include <exception>
#include <type_traits>
#include <utility>

#include "type.h"

#if __has_include(<coroutine>)  // when using g++
#include <coroutine>
namespace coro {
using std::coroutine_handle;
using std::noop_coroutine;
using std::suspend_always;
}  // namespace coro
#elif __has_include(<experimental/coroutine>)  // when using clang++
#include <experimental/coroutine>
namespace coro {
using std::experimental::coroutine_handle;
using std::experimental::noop_coroutine;
using std::experimental::suspend_always;
}  // namespace coro
#endif

template <typename T = void>
class Task {
 public:
  struct PromiseBase {
    friend struct final_awaitable;

    struct final_awaitable {
      bool await_ready() const noexcept { return false; }

      template <typename PROMISE>
      coro::coroutine_handle<> await_suspend(
          coro::coroutine_handle<PROMISE> coro) noexcept {
        if (coro.promise().m_continuation) {
          return coro.promise().m_continuation;
        } else {
          // The top-level task started from within main() does not have a
          // continuation. This will give control back to the main function.
          return coro::noop_coroutine();
        }
      }

      void await_resume() noexcept {}
    };

    coro::suspend_always initial_suspend() noexcept { return {}; }

    auto final_suspend() noexcept { return final_awaitable{}; }

    void unhandled_exception() noexcept { std::terminate(); }

    void set_continuation(coro::coroutine_handle<> continuation) noexcept {
      m_continuation = continuation;
    }

   private:
    coro::coroutine_handle<> m_continuation;
  };

  struct PromiseVoid : public PromiseBase {
    auto get_return_object() { return coroutine_handle_t::from_promise(*this); }

    void return_void() noexcept {}

    void result() {}
  };

  struct PromiseT : public PromiseBase {
    auto get_return_object() { return coroutine_handle_t::from_promise(*this); }

    void return_value(T&& v) { value = std::move(v); }

    T&& result() && { return std::move(value); }

    T value;
  };

  using promise_type =
      std::conditional_t<std::is_same_v<T, void>, PromiseVoid, PromiseT>;

  using coroutine_handle_t = coro::coroutine_handle<promise_type>;

  Task(coroutine_handle_t coroutine) : m_coroutine(coroutine) {}

  ~Task() {
    if (m_coroutine) {
      m_coroutine.destroy();
    }
  }

  void start() noexcept { m_coroutine.resume(); }

  auto operator co_await() const noexcept { return awaitable{m_coroutine}; }

 private:
  struct awaitable {
    coroutine_handle_t m_coroutine;

    awaitable(coroutine_handle_t coroutine) noexcept : m_coroutine(coroutine) {}

    bool await_ready() const noexcept { return false; }

    coro::coroutine_handle<> await_suspend(
        coro::coroutine_handle<> awaitingCoroutine) noexcept {
      m_coroutine.promise().set_continuation(awaitingCoroutine);
      return m_coroutine;
    }

    auto await_resume() { return std::move(m_coroutine.promise()).result(); }
  };
  coroutine_handle_t m_coroutine;
};

Task<Type> coro2() { co_return Type{}; }

Task<> coro1() { auto s = co_await coro2(); }

Task<> test() {
  for (std::uint64_t i = 0; i != 50000000; ++i) {
    co_await coro1();
  }
}

int main() {
  auto task = test();
  task.start();
}

I compile the code using clang++ version 12.0.1 and g++ version 11.1.0:

clang++-12 main.cc type.cc -std=c++20 -stdlib=libc++ -O3 -fsanitize=address
g++-11 main.cc type.cc -std=c++20 -O3 -fsanitize=address

Here is the truncated output for clang++:

$ ./a.out 

AddressSanitizer:DEADLYSIGNAL
=================================================================
==20846==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc76b1aff8 (pc 0x0000004cb7ab bp 0x7ffc76b1b050 sp 0x7ffc76b1afa0 T0)
    #0 0x4cb7ab in coro1() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cb7ab)
    #1 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #2 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #3 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #4 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #5 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #6 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #7 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #8 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #9 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #10 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #11 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #12 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #13 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #14 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #15 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #16 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #17 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #18 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #19 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #20 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #21 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #22 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #23 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #24 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
    #25 0x4cbe4a in test() (.resume) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x4cbe4a)
...

Here is the truncated output for g++:

$ ./a.out

AddressSanitizer:DEADLYSIGNAL
=================================================================
==21434==ERROR: AddressSanitizer: stack-overflow on address 0x7fff2904dff8 (pc 0x7fd5f7825180 bp 0x7fff2904e880 sp 0x7fff2904dff0 T0)
    #0 0x7fd5f7825180 in __sanitizer::BufferedStackTrace::UnwindImpl(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/asan/asan_stack.cpp:57
    #1 0x7fd5f781b0eb in __sanitizer::BufferedStackTrace::Unwind(unsigned long, unsigned long, void*, bool, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_stacktrace.h:122
    #2 0x7fd5f781b0eb in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:160
    #3 0x5643118400b7 in _Z5coro2v.destroy(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x20b7)
    #4 0x564311840e36 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2e36)
    #5 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)
    #6 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #7 0x564311841741 in _Z4testv.actor(test()::_Z4testv.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x3741)
    #8 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #9 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)
    #10 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #11 0x564311841741 in _Z4testv.actor(test()::_Z4testv.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x3741)
    #12 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #13 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)
    #14 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #15 0x564311841741 in _Z4testv.actor(test()::_Z4testv.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x3741)
    #16 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #17 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)
    #18 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #19 0x564311841741 in _Z4testv.actor(test()::_Z4testv.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x3741)
    #20 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #21 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)
    #22 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #23 0x564311841741 in _Z4testv.actor(test()::_Z4testv.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x3741)
    #24 0x564311840f15 in _Z5coro1v.actor(coro1()::_Z5coro1v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x2f15)
    #25 0x56431183fe20 in _Z5coro2v.actor(coro2()::_Z5coro2v.frame*) (/home/leonard/Desktop/hiwi/async_io_uring/stack_overflow/a.out+0x1e20)

Interestingly, if I compile with clang++-12 main.cc type.cc -std=c++20 -stdlib=libc++ -O0 -fsanitize=address the program does not trigger a stack-overflow and exits without any errors. Furthermore, if I omit -fsanitize=address, then I get a segmentation fault when using -O3 and no error at all when using -O0.

Can anyone tell me what I am doing wrong?

Lenni answered 8/5, 2021 at 10:22 Comment(0)

I faced a similar issue when playing around with coroutines. I am not 100% certain of the reason why the stack builds up but this is what I think might happen.

First of all, I don't think symmetric transfer is a given: it depends on compiler optimization, and in some cases it might be difficult for the compiler to perform this tail-call transformation. One reason could be the non-trivial destructor of Type, which is defined in another translation unit (this is just a guess).
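
One way to check whether frames are actually piling up, without waiting for a crash, is to sample the address of a local variable from inside the loop and watch how far the samples drift apart. This is only a minimal sketch: StackProbe is a hypothetical helper, not part of the question's code, and its readings are only meaningful when locals live on the real stack (sanitizer builds may relocate them).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration: tracks the lowest and highest stack
// addresses observed across calls to sample().
struct StackProbe {
  std::uintptr_t lowest = UINTPTR_MAX;
  std::uintptr_t highest = 0;

  void sample() noexcept {
    volatile char marker = 0;  // a local on the current thread's stack
    const auto address = reinterpret_cast<std::uintptr_t>(&marker);
    if (address < lowest) lowest = address;
    if (address > highest) highest = address;
  }

  // Distance in bytes between the shallowest and the deepest sample so far;
  // 0 while fewer than two distinct addresses have been seen.
  std::size_t spread() const noexcept {
    return highest >= lowest ? static_cast<std::size_t>(highest - lowest) : 0;
  }
};
```

Calling probe.sample() after each co_await in test() and printing probe.spread() every so often should show a roughly constant value if the tail call happens, and a steadily growing one if every resumption is a nested call.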

The blog post you mentioned says: "The bool-returning version, however, can have a slight win in terms of optimisability in some cases compared to the symmetric-transfer form." So the cause might be that compiler support is not fully mature yet (?), and the bool-returning form might be a good alternative to try instead.

I would love to see a definitive answer to this problem as well. I am only giving my opinion based on my current findings, so please don't take this answer as absolute truth.


Edit:

Here is a workaround that prevents the stack overflow. It uses the bool-returning version of the await_suspend() function. Unfortunately, the workaround introduces other problems. For example, the Task type is no longer thread-safe. For further information, see the section "The Coroutines TS solution" of the blog post "C++ Coroutines: Understanding Symmetric Transfer".

// in main.cc
struct PromiseBase {
  // ...
  struct final_awaitable {
    // ...
    template <typename PROMISE>
    void await_suspend(coro::coroutine_handle<PROMISE> coro) noexcept {
      if (coro.promise().m_continuation &&
          std::exchange(coro.promise().ready, true)) {
        // coro did not complete synchronously, therefore we need to resume
        // the continuation
        coro.promise().m_continuation.resume();
      }
    }
    // ...
  };

  bool ready{false};
  // ...
};

// in main.cc
struct awaitable {
  // ...
  // The bool-returning version of await_suspend resumes awaitingCoroutine
  // without consuming any additional stack space if the value false is
  // returned. Otherwise, it returns control to the caller/resumer of
  // awaitingCoroutine.
  bool await_suspend(coro::coroutine_handle<> awaitingCoroutine) noexcept {
    m_coroutine.promise().set_continuation(awaitingCoroutine);
    m_coroutine.resume();
    // resume awaitingCoroutine if m_coroutine completed synchronously
    return !std::exchange(m_coroutine.promise().ready, true);
  }
  // ...
};
Mertiemerton answered 16/5, 2021 at 14:20 Comment(1)
The blog post states that "the compiler guarantees that this will always be a tail-call, regardless of whether optimisations are enabled or not". Therefore, I also believe that compiler support is just not fully mature yet. I tried your suggestion of using the bool-returning version, and it does prevent the stack overflow. I will edit your answer to include the necessary modifications to my code.
Lenni
