Async/await recursion in .NET Framework 4.8 causes StackOverflowException (not in .NET Core 3.1!)

Why does the following code cause a StackOverflowException in .NET 4.8 at a recursion depth of only 17? This does not happen in .NET Core 3.1 (I can set the count to 10_000 and it still works).

using System;
using System.Threading.Tasks;

class Program
{
  static async Task Main(string[] args)
  {
    try
    {
      await TestAsync(17);
    }
    catch(Exception e)
    {
      Console.WriteLine("Exception caught: " + e);
    }
  }

  static async Task TestAsync(int count)
  {
    await Task.Run(() =>
    {
      if (count <= 0)
        throw new Exception("ex");
    });

    Console.WriteLine(count);
    await TestAsync2(count);
  }

  static async Task TestAsync2(int count) => await TestAsync3(count);
  static async Task TestAsync3(int count) => await TestAsync4(count);
  static async Task TestAsync4(int count) => await TestAsync5(count);
  static async Task TestAsync5(int count) => await TestAsync6(count);
  static async Task TestAsync6(int count) => await TestAsync(count - 1);
}

Is this a known bug in .NET 4.8? I would expect a lot more than 17 levels of recursion in such a function. Does this effectively mean that writing recursive code with async/await is not recommended?

Update: Simplified version

using System;
using System.Threading.Tasks;

class Program
{
  // Compile as Any CPU with "Prefer 32-bit" disabled (so it runs as 64-bit)
  static async Task Main(string[] args)
  {
    try
    {
      await TestAsync(97); // 96 still works
    }
    catch(Exception e)
    {
      Console.WriteLine("Exception caught: " + e);
    }
  }

  static async Task TestAsync(int count)
  {
    await Task.Run(() =>
    {
      if (count <= 0)
        throw new Exception("ex");
    });

    Console.WriteLine(count);
    await TestAsync(count-1);
  }
}

The overflow only occurs this early when compiling as Any CPU with Prefer 32-bit disabled, but it is reproducible on multiple machines (Windows 1903 and 1909) and on multiple .NET versions (.NET 4.7.2 and .NET 4.8).

Lodhia answered 8/4, 2020 at 14:32 Comment(10)
It throws when count < 0. But in TestAsync6 you call TestAsync with count-1. No matter if you start with 17 or 2 or whatever, sooner or later you call it with -1 and then get the exception. I'd say it's expected. - Strenuous
Well yes, it throws when count <= 0, and this is fine when calling it with, say, 2: it will throw, the exception will be caught, and everything is fine. However, when calling it with 17 or higher, the exception will be thrown but will in turn cause a stack overflow exception, which is what I did not expect. - Lodhia
There appears to be a related question where the OP is running into the same issue with a visitor pattern on .NET Framework 4.7.2, but has a hypothesis on why the exception is a StackOverflowException. Check it out: #61102665 - Internationalism
I just double-checked. Ran it under 4.7.2 in both Debug and Release for 20000. Works like a charm. - Strenuous
I was able to reproduce it with await TestAsync(235); in .NET Framework 4.8. But at this point it's kind of expected. - Octangular
Well, I compiled the above code with .NET 4.8 (Console Application) on my Windows 1903 machine; here is the compiled version: gofile.io/?c=5cvb3L It crashes on my Windows 1909 and Windows 1903 machines with a stack overflow exception. - Lodhia
I don't really understand why there would be different recursion limits depending on the PC it is running on. It would be nice to know whether the difference occurs at runtime or at compile time. However, if the limit can be as low as 17 on some machines (maybe even lower?), that does not sit well with me and suggests it could very well be a bug. - Lodhia
Are you by chance running this on a 32-bit machine? - Strenuous
No, both machines are 64-bit, and I compiled it with default settings. ILSpy says: // Architecture: AnyCPU (64-bit preferred). However, I was wrong about the OS versions; both machines have the exact same version: Windows 1903 (OS Build 18362.720). - Lodhia
The gofile.io version crashes on my OS, Windows 1909. - Schmid

I suspect you're seeing the stack overflow on the completions - i.e., every number is printed out all the way down to 1 before the stack overflow message.

My guess is that this behavior is because await uses synchronous continuations. There's supposed to be code that prevents synchronous continuations from overflowing the stack, but it's heuristic and doesn't always work.
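
To illustrate what a synchronous continuation means here, the following is a minimal sketch, not taken from the original code. The class name SyncContinuationDemo and the method WaitAsync are hypothetical, and it assumes a console app with no SynchronizationContext. The code after the await runs inside the call that completes the task, on top of the completer's stack frames, which is how a long chain of completions can nest deeply.

```csharp
using System;
using System.Threading.Tasks;

class SyncContinuationDemo
{
  static async Task Main()
  {
    // A task we complete by hand, so we control when continuations run.
    var tcs = new TaskCompletionSource<int>();

    // Runs WaitAsync up to its await, then returns an incomplete task.
    Task waiter = WaitAsync(tcs.Task);

    Console.WriteLine("Before SetResult");
    // With default options and no SynchronizationContext, the continuation
    // of WaitAsync is invoked inline, inside this SetResult call.
    tcs.SetResult(42);
    Console.WriteLine("After SetResult");

    await waiter;
  }

  // Hypothetical helper: awaits the task and reports when it resumes.
  static async Task WaitAsync(Task<int> task)
  {
    int value = await task;
    Console.WriteLine("Continuation: " + value);
  }
}
```

In this setup the "Continuation: 42" line should appear between the other two lines, because the continuation is executed by SetResult itself rather than being queued to the thread pool.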

I suspect this behavior doesn't happen on .NET Core because a lot of optimization work has gone into .NET Core's async support, likely meaning that continuations on that platform take up less stack space, making the heuristic check work. It is also possible that the heuristic itself was fixed in .NET Core. Either way, I wouldn't hold my breath expecting .NET Framework to get those updates.

I would expect a lot more than 17 levels of recursion in such a function...

Not really 17. You've got 102 levels of recursion (17 * 6). The actual stack space taken up is proportional to 17 * 6 * (number of stack frames needed to resume each continuation). On my machine, 17 works; it fails somewhere over 200 (1200 calls deep).

Bear in mind that this only happens for long sequences of tail-recursive asynchronous functions - i.e., none of them have any more asynchronous work to do after their await. If you change any of the functions to do some other asynchronous work after their recursive await, that will avoid the stack overflow:

static async Task TestAsync(int count)
{
  await Task.Run(() =>
  {
    if (count <= 0)
      throw new Exception("ex");
  });

  Console.WriteLine(count);
  try
  {
    await TestAsync2(count);
  }
  finally
  {
    await Task.Yield(); // some other async work
  }
}
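
Where the recursion really is a simple tail call, as in the simplified example from the question, another option is to replace it with a loop so that no chain of nested completions builds up. This is only a sketch under that assumption: TestIterativeAsync is a hypothetical name, and the approach does not carry over directly to a visitor with many distinct methods.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
  static async Task Main()
  {
    try
    {
      await TestIterativeAsync(10_000);
    }
    catch (Exception e)
    {
      Console.WriteLine("Exception caught: " + e.Message);
    }
  }

  // Hypothetical loop-based equivalent of the simplified TestAsync:
  // prints count down to 1, then throws once count reaches 0.
  static async Task TestIterativeAsync(int count)
  {
    while (true)
    {
      int current = count; // per-iteration copy captured by the lambda
      await Task.Run(() =>
      {
        if (current <= 0)
          throw new Exception("ex");
      });

      Console.WriteLine(count);
      count--; // takes the place of the recursive call with count - 1
    }
  }
}
```

Each iteration awaits a single task and then continues in the same method, so only one async method is waiting at any time instead of one per level.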
Tiffanytiffi answered 9/4, 2020 at 13:10 Comment(4)
Yes, you are right. I updated my question with a simpler example showing that 97 direct recursion levels is the critical point. However, it is only that low when compiling for Any CPU with Prefer 32-bit disabled, so there must be something more going on there. I also tested the version with the yield and it does resolve the problem. However, in our real-world scenario (a visitor) we have some visit methods without any work, so I would still consider this a bug. - Lodhia
I agree that it's a bug. I just don't think it'll be fixed. - Tiffanytiffi
We will open a support call with Microsoft in the coming weeks. I will update this issue as soon as I have a response. - Lodhia
It works on .NET Core because of #23152. - Convolvulaceous
