Crash in GC finalizer thread, what's the problem with "DestroyScout"?

Asked 15/6, 2022 at 13:3 Answered 29/8, 2022 at 18:57

Solved c# multithreading garbage-collection dump windows-server-2016

I have a .NET server application that crashes on an almost weekly basis with a problem in the "GC Finalizer Thread", more precisely at line 798 of "mscorlib.dll ...~DestroyScout()", according to Visual Studio.

Visual Studio also tries to open the file "DynamicILGenerator.cs". I don't have this file, but I've found a version of it in which line 798 is indeed inside the destructor of DestroyScout (whatever this might mean).

I have the following information in my Visual Studio environment:

Threads :

Not Flagged >   5892    0   Worker Thread   GC Finalizer Thread mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout

Call stack:

    [Managed to Native Transition]
>   mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout() Line 798   C#
    [Native to Managed Transition]
    kernel32.dll!@BaseThreadInitThunk@12()  Unknown
    ntdll.dll!__RtlUserThreadStart()    Unknown
    ntdll.dll!__RtlUserThreadStart@8()  Unknown

Locals (no way to be sure if that $exception object is correct):

+       $exception  {"Exception of type 'System.ExecutionEngineException' was thrown."} System.ExecutionEngineException
    this    Cannot obtain value of the local variable or argument because it is not available at this instruction pointer,
            possibly because it has been optimized away.    System.Reflection.Emit.DynamicResolver.DestroyScout
    Stack objects   No CLR objects were found in the stack memory range of the current frame.

Source code of "DynamicILGenerator.cs", mentioning the DestroyScout class (line 798 is mentioned in comment):

    private class DestroyScout
    {
        internal RuntimeMethodHandleInternal m_methodHandle;

        [System.Security.SecuritySafeCritical]  // auto-generated
        ~DestroyScout()
        {
            if (m_methodHandle.IsNullHandle())
                return;

            // It is not safe to destroy the method if the managed resolver is alive.
            if (RuntimeMethodHandle.GetResolver(m_methodHandle) != null)
            {
                if (!Environment.HasShutdownStarted &&
                    !AppDomain.CurrentDomain.IsFinalizingForUnload())
                {
                    // Somebody might have been holding a reference on us via weak handle.
                    // We will keep trying. It will be hopefully released eventually.
                    GC.ReRegisterForFinalize(this);
                }
                return;
            }

            RuntimeMethodHandle.Destroy(m_methodHandle); // <===== line 798
        }
    }

Watch window (m_methodHandle):

m_methodHandle  Cannot obtain value of the local variable or argument because 
                it is not available at this instruction pointer,
                possibly because it has been optimized away.
                System.RuntimeMethodHandleInternal

General dump module information:

Dump Summary
------------
Dump File:  Application_Server2.0.exe.5296.dmp : C:\Temp_Folder\Application_Server2.0.exe.5296.dmp
Last Write Time:    14/06/2022 19:08:30
Process Name:   Application_Server2.0.exe : C:\Runtime\Application_Server2.0.exe
Process Architecture:   x86
Exception Code: 0xC0000005
Exception Information:  The thread tried to read from or write to a virtual address
                        for which it does not have the appropriate access.
Heap Information:   Present

System Information
------------------
OS Version: 10.0.14393
CLR Version(s): 4.7.3920.0

Modules
-------
Module Name                                           Module Path   Module Version
-----------                                           -----------   --------------
...
clr.dll     C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll       4.7.3920.0
...

Be aware: the dump was created on a Windows Server 2016 computer, and I'm investigating it in my Windows 10 environment (don't be misled by the OS Version in the dump summary)!

Edit

What might the DestroyScout be trying to destroy? Knowing that might be very interesting.

Hawsehole asked 15/6, 2022 at 13:3 Comment(27)

to me it seems like a race condition in a multithreaded scenario, where multiple threads dispose of the same object handle – Autography 15/6, 2022 at 13:15

A race condition? In a piece of source I don't have access to? Any way to get this solved (is this a known bug, is it possible to follow the progress of it, ...)? – Hawsehole 15/6, 2022 at 13:17

Looking again, I think it has nothing to do with multithreading, but it obviously seems like a bug, yes. Re-registering this for finalization within the destructor might result in the GC calling the destructor again. What is this? That's strange logic for destroying objects in C#: waiting until all weak references have given up their handle. I have no idea, sorry for commenting – Autography 15/6, 2022 at 13:18

Have you tried using a newer version of .NET Framework? ExecutionEngineException indicates to me probably some kind of corrupted memory, which happens to only manifest at finalization. Are you using unsafe or PInvoke? – Provo 15/6, 2022 at 13:21

@MichaelSchönbauer: don't feel sorry for trying :-) – Hawsehole 15/6, 2022 at 13:46

@Charlieface: I just checked all source code. I have found three instances of unsafe but all of them are inside a piece of code which is not used here. PInvoke is never used. You mention upgrading my .NET framework. Imagine I would do that, how can I know which .NET framework solves this issue? – Hawsehole 15/6, 2022 at 13:48

The fact that unsafe is not used here doesn't mean it doesn't have a bug, it may be overwriting memory it shouldn't, but the effect only appears here. I'm not aware of this bug (if it is a bug) and can't find any documentation on it, just suggesting you try upgrade framework – Provo 15/6, 2022 at 13:50

You might have a reason to thoroughly review System.Reflection.Emit code in the codebase. But this is a memory corruption problem that can strike anywhere, anytime. Clearly the CLR version is badly outdated; one thing you never want to do with trouble like this is prevent stability and security updates from being deployed. – Gallipot 15/6, 2022 at 13:55

@HansPassant: sorry for the long delay, but I currently have a similar problem again. Again the CLR version is reported as 4.7.something, in this case 4.7.3946.0. You mention it being badly outdated, but I believe the CLR version is not part of the dump file but of my own system, so it can't be related to the crash I'm facing. Am I correct? (Sorry for my ignorance) – Hawsehole 22/8, 2022 at 8:14

Look at the following code: referencesource.microsoft.com/#mscorlib/system/reflection/emit/… I would say you've gone a bit overboard with dynamic IL generation. Try to ensure there's no new IL stuff during shutdown. – Snuff 22/8, 2022 at 16:49

@Dominique: Did you already try different GC modes like server or workstation? Maybe this helps to narrow down the problem. – Muoimuon 26/8, 2022 at 9:11

@Fabian: Sorry, but I have never heard of any garbage collection configuration. Do you have any idea which configuration setting might influence the behaviour I'm describing in my question? In my system I have found the following entries: <gcServer enabled="true"/> and <gcConcurrent enabled="false" />. – Hawsehole 26/8, 2022 at 9:17

@Fabian: the answer of stackoverflow.com/users/16587692/teodor-mihail mentions GC optimization. Is there a setting which suppresses GC optimization? – Hawsehole 26/8, 2022 at 9:28

@Dominique: I am really no expert on the topic, but since the idea of a race condition floated around, I remembered that there is a concurrent and a non-concurrent GC mode. From here I see, however, that your settings already select non-concurrent server garbage collection. Then again, this article states that the machine configuration file overrides the application config. – Muoimuon 26/8, 2022 at 9:29

Concerning the "optimized away": I do not think that the GC optimizes it away. This message refers to optimizations in release-mode DLLs that prevent the debugger from finding the value of the variable. – Muoimuon 26/8, 2022 at 9:33

@Fabian: do you have any idea where I might find the machine's configuration? (I tried doing a search for the setting in all files of the machine, but seems to be a bad idea :-) ). Or is it somewhere in the registry? – Hawsehole 26/8, 2022 at 9:37

Concerning the GC settings: please check whether the Machine.config has a gcConcurrent setting. This will override the application config settings. – Muoimuon 26/8, 2022 at 9:38

@Fabian: I have found four Machine.config files. None of them contained any "gcCon..." entry. – Hawsehole 26/8, 2022 at 9:41

@Fabian: I think we can conclude that my machine is set NOT to be GC-concurrent. Any way this might cause the issue I'm having here? (Sorry for my ignorance but as stated before I never heard of GC configuration before) – Hawsehole 26/8, 2022 at 9:46

@Charlieface: what do you mean by using unsafe or PInvoke? (Sorry for my ignorance, but I have no idea what you're talking about.) – Hawsehole 26/8, 2022 at 9:49

@Dominique: You could try the other combinations of the gcServer and gcConcurrent. But the problem may very well be unrelated to the GC Settings. – Muoimuon 26/8, 2022 at 9:54

@Fabian: Hmm, I can't do trial-and-error: the issue happens on a customer system and the problem seems to occur randomly: the customer won't agree and even if the customer would agree, I would not know when I can decide that a trial is successful or not (the last crash happened more than two months ago). – Hawsehole 26/8, 2022 at 10:2

unsafe is a C# keyword, and means you get to muck around with pointers. Using PInvoke means you are calling into native APIs using the [DllImport] attribute. If you are using either of these, you could be open to memory corruption if it is not done correctly. Once you get memory corruption, it could manifest anywhere; the exact location is probably not actually relevant. The .NET version is also a concern – Provo 26/8, 2022 at 10:27

@Charlieface: I've investigated the entire code, the words unsafe and PInvoke are not present in the source code. – Hawsehole 26/8, 2022 at 11:24

You would be looking for DllImport not PInvoke. Again: have you tried upgrading the .NET version? A race condition is also a concern: be aware that a race condition that you create could corrupt memory you don't know about, for example if you access a function that is not thread-safe and cause a torn read/write. – Provo 26/8, 2022 at 11:39

@Charlieface: upgrading the .NET version is not an option (the customer is very reluctant towards updates), and the only DllImport inside the source code is the following line: [DllImport("user32.dll")]. – Hawsehole 26/8, 2022 at 13:56

After that, the external function ShutdownBlockReasonCreate(...) is declared. – Hawsehole 26/8, 2022 at 14:3

I don't know what exactly is causing this crash, but I can tell you what DestroyScout does.

It's related to creating dynamic methods. The class DynamicResolver needs to clean up related unmanaged memory, which is not tracked by GC. But it cannot be cleaned up until there are definitely no references to the method anymore.
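For context, here is a minimal sketch of what "creating a dynamic method" looks like in user code. The class and method names (`DynamicMethodDemo`, `BuildAdder`) are made up for illustration and are not taken from the question. Libraries that compile code at run time may create dynamic methods indirectly, so the absence of `System.Reflection.Emit` in your own source does not necessarily mean none are created.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical demo type: emits a tiny method at run time and wraps it in a delegate.
static class DynamicMethodDemo
{
    public static Func<int, int, int> BuildAdder()
    {
        // A DynamicMethod is a standalone method that does not belong to any assembly.
        var method = new DynamicMethod(
            "Add", typeof(int), new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push first argument
        il.Emit(OpCodes.Ldarg_1); // push second argument
        il.Emit(OpCodes.Add);     // add them
        il.Emit(OpCodes.Ret);     // return the sum

        // The runtime compiles the IL when the delegate is created.
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Calling `DynamicMethodDemo.BuildAdder()(2, 3)` yields 5. Every such method has unmanaged state behind it that has to be released once nothing can invoke the method anymore.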

However, malicious (or outright weird) code can use a long WeakReference, which keeps tracking its target after finalization, and thereby resurrect the reference to the dynamic method after its finalizer has run. Hence DestroyScout comes along with its strange GC.ReRegisterForFinalize code, in order to ensure that it's the last reference to be destroyed.

It's explained in a comment in the source code:

    // We can destroy the unmanaged part of dynamic method only after the managed part is definitely gone and thus
    // nobody can call the dynamic method anymore. A call to finalizer alone does not guarantee that the managed
    // part is gone. A malicious code can keep a reference to DynamicMethod in long weak reference that survives finalization,
    // or we can be running during shutdown where everything is finalized.
    //
    // The unmanaged resolver keeps a reference to the managed resolver in long weak handle. If the long weak handle
    // is null, we can be sure that the managed part of the dynamic method is definitely gone and that it is safe to
    // destroy the unmanaged part. (Note that the managed finalizer has to be on the same object that the long weak handle
    // points to in order for this to work.) Unfortunately, we can not perform the above check when out finalizer
    // is called - the long weak handle won't be cleared yet. Instead, we create a helper scout object that will attempt
    // to do the destruction after next GC.
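To make the "long weak reference" idea more concrete, here is a small sketch that contrasts a short and a long `WeakReference` to the same finalizable object. The names `Tracked` and `LongWeakReferenceDemo` are hypothetical and exist only for this illustration; the sketch does not touch `DynamicResolver` itself.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type for illustration: any class with a finalizer will do.
sealed class Tracked
{
    ~Tracked() { /* the finalizer body is irrelevant for this demo */ }
}

static class LongWeakReferenceDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference[] CreateReferences()
    {
        var target = new Tracked();
        return new[]
        {
            new WeakReference(target, trackResurrection: false), // short weak reference
            new WeakReference(target, trackResurrection: true),  // long weak reference
        };
    }

    public static void Run()
    {
        WeakReference[] refs = CreateReferences();

        // First collection: the object is unreachable and gets queued for finalization.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("after finalization: short={0}, long={1}",
            refs[0].IsAlive, refs[1].IsAlive);

        // A later collection can finally reclaim the finalized object.
        GC.Collect();
        Console.WriteLine("after next GC:      short={0}, long={1}",
            refs[0].IsAlive, refs[1].IsAlive);
    }
}
```

Call `LongWeakReferenceDemo.Run()` from a console application's `Main`. On a typical run the long reference is expected to still see the object after its finalizer has run and to be cleared only by a later collection, which is the window the scout object has to wait out. The exact timing depends on the runtime and build configuration, so treat the printed values as an observation rather than a guarantee.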

As to your crash, this is happening in internal code, and is causing an ExecutionEngineException. This most likely happens when there is memory corruption, when memory is used in a way it wasn't supposed to be.

Memory corruption can happen for a number of reasons. In order of likelihood:

1. Incorrect use of PInvoke to native Win32 functions (DllImport and associated marshalling).
2. Incorrect use of unsafe (including library classes such as Unsafe and Buffer which do the same thing).
3. Multi-threaded race conditions on objects which the runtime does not expect to be used from multiple threads. This can cause such problems as torn reads and memory-barrier violations.
4. A bug in .NET itself. This can be the easiest to exclude: just upgrade to the latest build.
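Regarding the first item: the comments mention a single `[DllImport("user32.dll")]` for `ShutdownBlockReasonCreate`. The sketch below shows one careful way such a declaration could be written, based on the documented Win32 signatures; it is an assumption for comparison purposes, not the declaration from the question. The `ShutdownBlocker` wrapper is hypothetical.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
    // Win32: BOOL ShutdownBlockReasonCreate(HWND hWnd, LPCWSTR pwszReason);
    // CharSet.Unicode matches LPCWSTR; IntPtr matches the pointer-sized HWND.
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern bool ShutdownBlockReasonCreate(IntPtr hWnd, string pwszReason);

    // Win32: BOOL ShutdownBlockReasonDestroy(HWND hWnd);
    [DllImport("user32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern bool ShutdownBlockReasonDestroy(IntPtr hWnd);
}

// Hypothetical wrapper that turns a failed call into a managed exception.
internal static class ShutdownBlocker
{
    public static void Block(IntPtr windowHandle, string reason)
    {
        if (!NativeMethods.ShutdownBlockReasonCreate(windowHandle, reason))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}
```

If the real declaration uses, say, `int` for the window handle or a different string marshalling, that kind of mismatch is worth fixing, although a single call like this is probably a less likely culprit than heavier interop would be.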

Consider submitting the crash report to Microsoft for investigation.

Edit from the author:
In order to submit a crash report to Microsoft, the following URL can be used: https://www.microsoft.com/en-us/unifiedsupport. Take into account that this is a paid service and that you might need to deliver your entire source code to Microsoft in order to get a full analysis of your crash dump.

Provo answered 29/8, 2022 at 18:57 Comment(1)

I particularly love the idea where you propose to send the crash dump to Microsoft. Maybe they'll see something I didn't. – Hawsehole 30/8, 2022 at 10:13
