How to eliminate time spent in JIT_TailCall for functions that are genuinely non-recursive

I am writing a 64-bit F# solution, and profiling has revealed an unexpectedly large amount of time spent in JIT_TailCall. It in fact dominates the runtime (circa 80%), and it appears together with its evil cousin JIT_TailCallHelperStub_ReturnAddress.

I have traced the source to passing a struct type (a custom value type) in a method or property call across an assembly boundary. I am certain of this because if I bypass the method call and assign my struct directly to the property (the one the offending method was using), the runtime drops by a factor of 4-5.

The calling assembly is using F# 3.1 because it is being dynamically compiled with the latest stable release of FSharp.Compiler.Services.

The assembly being called is using F# 4.0 / .NET 4.6 (VS 2015).

UPDATE

A simplification of what I am trying to do is to assign a custom struct value to a position in an array from a dynamically generated assembly...

Runtime is fast and no extraneous tail calls are generated when calling:

  1. A property exposing the private array in the type

However, the runtime is slow due to extraneous tail calls being generated when calling:

  1. An indexer property exposing the array (Item)

  2. A member method acting as a setter for the array

The reason I need to call the member method is that I need to perform a few checks prior to insertion of the item in the array.
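A minimal sketch of the three access paths described above. The names `Sample`, `Store`, `Data`, and `Set` are hypothetical (the real types are not shown in the question), and the snippet only illustrates the call shapes; it is not claimed to reproduce the profile on its own.

```fsharp
// Hypothetical stand-in for the custom value type.
[<Struct>]
type Sample =
    val Value : float
    new (value: float) = { Value = value }

// Hypothetical container owning the private array.
type Store(size: int) =
    let data : Sample[] = Array.zeroCreate size

    // Path 1: property exposing the private array (reported as fast).
    member this.Data = data

    // Path 2: indexer property (reported as slow).
    member this.Item
        with get (i: int) = data.[i]
        and set (i: int) (v: Sample) = data.[i] <- v

    // Path 3: member method acting as a checked setter (reported as slow).
    member this.Set(i: int, v: Sample) =
        if i < 0 || i >= data.Length then invalidArg "i" "index out of range"
        data.[i] <- v

let store = Store(4)
store.Data.[0] <- Sample(1.0)   // direct array access through the property
store.[1] <- Sample(2.0)        // indexer setter
store.Set(2, Sample(3.0))       // checked setter method
```

In the slow cases the struct is passed as an argument to a call in another assembly, which is where the question locates the extra tail-call helper time.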

PRACTICAL

Over and above understanding the root of the issue, I would like to know whether F# 4.0 and by implication the coming release of FSharp.Compiler.Services would solve this issue. Given that the updated FSharp.Compiler.Services is relatively imminent, it may then just be best to wait.

Overhaul answered 15/7, 2015 at 14:44 Comment(6)
What I have read about the 64-bit JIT compiler is that it does not generate very optimized code. That's why the focus is on RyuJIT in .NET 4.6. Can you try .NET 4.6 (VS 2015) and see whether the performance issues you see are reduced? – Wichita
The client assembly is compiled on .NET 4.6 & F# 4.0, BUT I am using the latest stable release of FSharp.Compiler.Services to generate a "server" assembly that will be based on F# 3.1. – Overhaul
@GaneshR. Note that RyuJIT, as it currently ships in CTP form, generally generates worse code. I have not tested recursion, though. – Debarath
What performance do you see when you completely disable tail calls in the F# compiler? Does the application get significantly faster? It might be the case that the profiler shows time being spent in tail calls, but that time actually includes some work that would have to be done anyway; it would just appear differently in the logs... – Eucharis
@TomasPetricek [see updated info in question] I disabled tail calls in all the assemblies other than the one being dynamically generated by FSharp.Compiler.Services. This did improve the performance noticeably, BUT not to the same extent as (1) I would expect based on other baselines or (2) the scenario where I don't pass the struct via a method but rather set the property directly. – Overhaul
I have a case where mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases by exactly 45%. Thanks for the suggestion to turn it off. Starting to rewrite the recursion as a loop... – Samoyed

I posted this on your GitHub question, but I am cross-posting it here so it is easier to find:

I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%.
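For reference, tail call generation is controlled by the F# compiler switch `--tailcalls-` (or `<Tailcalls>false</Tailcalls>` in a project file). For a dynamically compiled assembly, the same switch would go into the argument array handed to the compiler. The sketch below only builds that array; the file names are hypothetical, and how the array is passed to FSharp.Compiler.Services depends on the version in use, so that call is deliberately not shown.

```fsharp
// Hypothetical file names; only the --tailcalls- switch is the point here.
let compilerArgs =
    [| "fsc.exe"             // placeholder for the executable name
       "--tailcalls-"        // do not emit the .tail IL prefix
       "--optimize+"
       "--target:library"
       "--out:Generated.dll"
       "Generated.fs" |]
```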

I haven't profiled this snippet, but this is how my real code is structured:

#time "on"
type MyRecType() = 

  let list = System.Collections.Generic.List()

  member this.DoWork() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1

    let rec outerLoop() =
      if local < 1000000 then
        innerLoop(1)
      else
        tcs.SetResult(local)
        ()

    and innerLoop(inc:int) =
      if local % 2 = 0 then
        local <- local + inc
        outerLoop()
      else
        list.Add(local) // just fake access to a field to illustrate the pattern
        local <- local + 1
        innerLoop(inc)

    outerLoop()

    returnTask


let instance = MyRecType()

instance.DoWork().Result

> Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

.NET 4.6 and F# 4.0 don't help at all.

I tried to rewrite this as methods, but got a StackOverflowException. However, I do not understand why I do not get a stack overflow when I run a very large number of iterations without tail call generation.

Update: rewriting the method as:

  member this.DoWork2() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1
    let rec loop(isOuter:bool, inc:int) =
      if isOuter then
        if local < 1000000 then
          loop(false,1)
        else
          tcs.SetResult(local)
          ()
      else
        if local % 2 = 0 then
          local <- local + inc
          loop(true,1)
        else
          list.Add(local) // just fake access to a field to illustrate the pattern
          local <- local + 1
          loop(false,1)

    loop(true,1)

    returnTask


> Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

reduces the JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% and 2% of an execution time that is itself 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.
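A further step, not profiled here, would be to drop the recursion entirely and drive the same two states with a `while` loop, so that no tail calls can be emitted for the loop body at all. This is a sketch under that assumption; `MyLoopType`, `DoWork3`, `isOuter`, and `finished` are names introduced for illustration.

```fsharp
type MyLoopType() =

  let list = System.Collections.Generic.List<int>()

  member this.DoWork3() =
    let mutable tcs = System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create()
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1
    let mutable isOuter = true    // which of the two former functions is "running"
    let mutable finished = false

    while not finished do
      if isOuter then
        if local < 1000000 then
          isOuter <- false        // was: innerLoop(1)
        else
          tcs.SetResult(local)
          finished <- true
      else
        if local % 2 = 0 then
          local <- local + 1      // inc was always 1 in the recursive version
          isOuter <- true         // was: outerLoop()
        else
          list.Add(local)         // same fake field access as above
          local <- local + 1      // stay in the inner state

    returnTask

let loopInstance = MyLoopType()
let loopResult = loopInstance.DoWork3().Result
```

The state transitions mirror `loop(isOuter, inc)` above, so `loopResult` should equal the 1000001 returned by the recursive versions.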

Samoyed answered 3/9, 2015 at 21:25 Comment(0)
