Troubleshooting .NET "Fatal Execution Engine Error"
H

5

45

Summary:

I periodically get a .NET Fatal Execution Engine Error on an application which I cannot seem to debug. The dialog that comes up only offers to close the program or send information about the error to Microsoft. I've tried looking at the more detailed information but I don't know how to make use of it.

Error:

The error is visible in Event Viewer under Applications and is as follows:

.NET Runtime version 2.0.50727.3607 - Fatal Execution Engine Error (7A09795E) (80131506)

The computer running it is Windows XP Professional SP 3. (Intel Core2Quad Q6600 2.4GHz w/ 2.0 GB of RAM) Other .NET-based projects that lack multi-threaded downloading (see below) seem to run just fine.

Application:

The application is written in C#/.NET 3.5 using VS2008, and installed via a setup project.

The app is multi-threaded and downloads data from multiple web servers using System.Net.HttpWebRequest and its methods. I've determined that the .NET error has something to do with either threading or HttpWebRequest but I haven't been able to get any closer as this particular error seems impossible to debug.

I've tried handling errors on many levels, including the following in Program.cs:

// handle UI thread exceptions
Application.ThreadException += Application_ThreadException;

// handle non-UI thread exceptions
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;

Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);

// force all windows forms errors to go through our handler
Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);

More Notes and What I've Tried...

  • Installed Visual Studio 2008 on the target machine and tried running in debug mode, but the error still occurs, with no hint as to where in source code it occurred.
  • When running the program from its installed version (Release) the error occurs more frequently, usually within minutes of launching the application. When running the program in debug mode inside of VS2008, it can run for hours or days before generating the error.
  • Reinstalled .NET 3.5 and made sure all updates are applied.
  • Broke random cubicle objects in frustration.
  • Rewrote parts of the code that deal with threading and downloading in an attempt to catch and log exceptions, though logging seemed to aggravate the problem (and never provided any data).

Question:

What steps can I take to troubleshoot or debug this kind of error? Memory dumps and the like seem to be the next step, but I'm not experienced at interpreting them. Perhaps there's something more I can do in the code to try and catch errors... It would be nice if the "Fatal Execution Engine Error" was more informative, but internet searches have only told me that it's a common error for a lot of .NET-related items.

Hoon answered 12/5, 2010 at 23:15 Comment(7)
Have you tried running under .NET 4.0? Though not a solution, it's another data point. - Murrell
@Eamon: thanks, that's a good idea that I will try. - Hoon
Can you narrow the problem down to certain components/classes? If yes, you might at least be able to find the problematic code by logging the current position to a debug log. - Naiad
Assuming the problem is memory corruption, you could try downloading pageheap.exe from Microsoft and seeing if it shows any problems. - Froehlich
The furthest I've been able to isolate it, by systematically commenting out portions of code, is that the error occurs only when calls to HttpWebRequest are executed. If I retain all threading and parsing functionality but do not actually download new data, the error does not occur. Basically I've tried to determine whether the problem was due to my thread class (such as cross-thread calls) or the data parsing class. The error only happens if downloading is enabled, and even then only about once per week (the program runs 24/7). It's difficult to reproduce! - Hoon
Have you tried adding more logging around your possible trouble areas via MS's EnterpriseLibrary logging classes? I had a similar issue a while back, and adding logging before and after every possible point of failure brought me to the solution (which was nothing like what I thought it was or what the exception seemed to be). - Jilljillana
@Jason M: Not using that particular logging class, no; thanks for the suggestion. - Hoon
S
46

Well, you've got a Big Problem. That exception is raised by the CLR when it detects that the garbage collected heap integrity is compromised. Heap corruption, the bane of any programmer that ever wrote code in an unmanaged language like C or C++.

Those languages make it very easy to corrupt the heap, all it takes is to write past the end of an array that's allocated on the heap. Or using memory after it has been released. Or having a bad value for a pointer. The kind of bugz that managed code was invented to solve.

But you are using managed code, judging from your question. Well, mostly, your code is managed. But you are executing lots of unmanaged code. All the low-level code that actually makes an HttpWebRequest work is unmanaged. And so is the CLR: it was written in C++, so it is technically just as likely to corrupt the heap. But after over four thousand revisions of it, and millions of programs using it, the odds that it still suffers from heap cooties are very small.

The same isn't true for all the other unmanaged code that wants a piece of HttpWebRequest. The code you don't know about because you didn't write it and isn't documented by Microsoft. Your firewall. Your virus scanner. Your company's Internet usage monitor. Lord knows whose "download accelerator".

Isolate the problem, assume it is neither your code nor Microsoft's code that causes the problem. Assume it is environmental first and get rid of the crapware.

For an epic environmental FEEE story, read this thread.

Samalla answered 20/6, 2010 at 23:17 Comment(4)
A great analysis of the "big picture" and provides insight to the reasons that this error may be occurring. Still, it would be nice to have some guidelines or methods as to how to replicate the error, work around it, or avoid it altogether. - Hoon
Sorry, there aren't any. The exception is raised long after the damage was done. Work from the assumption that the cause is environmental, change the environment. - Samalla
For example I am thinking of using a third-party library such as /n software's IPWorks, but I don't know if it uses the same unmanaged code blocks or effectively avoids it. (nsoftware.com/ipworks/v8/default.aspx) - Hoon
Given its age, it is highly likely to contain unmanaged code with a managed wrapper. I seriously doubt it solves the root cause of your problem. If you can get a reliable repro for the crash, you'd be better off spending money on Microsoft Support. - Samalla
R
10

Since the previous suggestions are fairly generic in nature, I thought it might be of use to post my own battle against this exception with specific code examples, the background changes I implemented to cause this exception to occur, and how I solved it.

First, the TL;DR version: I was using an in-house dll that was written in C++ (unmanaged). I passed in an array of a specific size from my .NET executable. The unmanaged code attempted to write to an array location that had not been allocated by the managed code. This corrupted memory that was later due to be garbage collected. When the garbage collector prepares to collect memory, it first checks the status of the memory (and bounds). When it finds the corruption, BOOM.

Now the detailed version:

I am using an unmanaged dll developed in-house, written in C++. My own GUI is developed in C# on .NET 4.0. I call a variety of the dll's unmanaged methods, so the dll effectively acts as my data source. An example extern definition from the dll:

    [DllImport(@"C:\Program Files\MyCompany\dataSource.dll",
        EntryPoint = "get_sel_list",
        CallingConvention = CallingConvention.Winapi)]
    private static extern int ExternGetSelectionList(
        uint parameterNumber,
        uint[] list,
        uint[] limits,
        ref int size);

I then wrap the methods in my own interface for use throughout my project:

    /// <summary>
    /// Get the data for a ComboBox (Drop down selection).
    /// </summary>
    /// <param name="parameterNumber"> The parameter number</param>
    /// <param name="messageList"> Message number </param>
    /// <param name="valueLimits"> The limits </param>
    /// <param name="size"> The maximum size of the memory buffer to 
    /// allocate for the data </param>
    /// <returns> 0 - If successful, something else otherwise. </returns>
    public int GetSelectionList(uint parameterNumber, 
           ref uint[] messageList, 
           ref uint[] valueLimits, 
           int size)
    {
        int returnValue = -1;
        returnValue = ExternGetSelectionList(parameterNumber,
                                         messageList, 
                                         valueLimits, 
                                         ref size);
        return returnValue;
    }

An example call of this method:

            uint[] messageList = new uint[3];
            uint[] valueLimits = new uint[3];
            uint dataReferenceParameter = 1; // uint, to match the wrapper's parameterNumber
            
            // BUFFERSIZE = 255.
            MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
                          dataReferenceParameter, 
                          ref messageList, 
                          ref valueLimits, 
                          BUFFERSIZE);

In the GUI, one navigates through different pages containing a variety of graphics and user inputs. The previous method allowed me to get the data to populate ComboBoxes. Here is how my navigation was set up and called before this exception appeared.

In my host window, I set up a property:

    /// <summary>
    /// Gets or sets the User interface page
    /// </summary>
    internal UserInterfacePage UserInterfacePageProperty
    {
        get
        {
            if (this.userInterfacePage == null)
            {
                this.userInterfacePage = new UserInterfacePage();
            }

            return this.userInterfacePage;
        }

        set { this.userInterfacePage = value; }
    }

Then, when needed, I navigate to the page:

MainNavigationWindow.MainNavigationProperty.Navigate(
        MainNavigation.MainNavigationProperty.UserInterfacePageProperty);

Everything worked well enough, though I did have some serious creeping issues. When navigating using the object (NavigationService.Navigate Method (Object)), the page is kept alive by default, as if its KeepAlive property (Page.KeepAlive) were true. But the issue is more nefarious than that. Even if you explicitly set KeepAlive to false in the constructor of that page, the garbage collector still leaves the page alone as if the value were true.

Now for many of my pages, this was no big deal. They had small memory footprints with not all that much going on. But many others had large, highly detailed graphics on them for illustration purposes. It wasn't too long before normal usage of this interface by operators of our equipment caused huge allocations of memory that never cleared and eventually clogged up all the processes on the machine.

After the rush of initial development subsided from a tsunami to more of a tidal bore, I finally decided to tackle the memory leaks once and for all. I won't go into the details of all the tricks I implemented to clean up the memory (WeakReferences to images, unhooking event handlers on Unload(), using a custom timer implementing the IWeakEventListener interface, etc.). The key change I made was to navigate to the pages using the Uri instead of the object (NavigationService.Navigate Method (Uri)). There are two important differences when using this type of navigation:

  1. KeepAlive is false by default.
  2. The garbage collector will now try to clean up the navigation object as if KeepAlive were set to false.

So now my navigation looks like:

MainNavigation.MainNavigationProperty.Navigate(
    new Uri("/Pages/UserInterfacePage.xaml", UriKind.Relative));

Something else to note here: this not only affects how the objects are cleaned up by the garbage collector, it also affects how they are initially allocated in memory, as I would soon find out.

Everything seemed to work great. My memory would quickly get cleaned up to near my initial state as I navigated through the graphics-intensive pages, until I hit this particular page with that particular call to the dataSource dll to fill in some ComboBoxes. Then I got this nasty FatalExecutionEngineError. After days of research and finding vague suggestions, or highly specific solutions that didn't apply to me, as well as unleashing just about every debugging weapon in my personal programming arsenal, I finally decided that the only way I was really going to nail this down was the extreme measure of rebuilding an exact copy of this particular page, element by element, method by method, line by line, until I finally came across the code that threw this exception. It was as tedious and painful as I'm implying, but I finally tracked it down.

It turned out to be in the way the unmanaged dll was allocating memory to write data into the arrays I was sending in for populating. That particular method would actually look at the parameter number and, from that information, allocate an array of a particular size based on the amount of data it expected to write into the array I sent in. The code that crashed:

            uint[] messageList = new uint[2];
            uint[] valueLimits = new uint[2];
            uint dataReferenceParameter = 1; // uint, to match the wrapper's parameterNumber
            
            // BUFFERSIZE = 255.
            MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
                           dataReferenceParameter, 
                           ref messageList, 
                           ref valueLimits, 
                           BUFFERSIZE);

This code might seem identical to the sample above, but it has one tiny difference: the array size I allocate is 2, not 3. I did this because I knew that this particular ComboBox would only have two selection items, as opposed to the other ComboBoxes on the page, which all had three. However, the unmanaged code didn't see things the way I did. It took the array I handed in and tried to write three elements into my two-element allocation, and that was it. Bang! Crash! I changed the allocation size to 3, and the error went away.
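The failure above comes from the managed caller and the native routine disagreeing about the buffer size. A minimal sketch of a guard that could run before the P/Invoke call is shown here. The names NativeBufferGuard and MaxNativeEntries are hypothetical, and the value 3 is only an assumption taken from this answer; the real worst-case size has to come from the dll's own source or documentation.

```csharp
using System;

public static class NativeBufferGuard
{
    // Assumption for illustration: the native routine may write up to this
    // many entries, no matter how many the caller expects to receive.
    public const int MaxNativeEntries = 3;

    // Throws before the native call if the buffer cannot hold the worst case,
    // turning silent heap corruption into an ordinary managed exception.
    public static void EnsureCapacity(uint[] buffer, string paramName)
    {
        if (buffer == null)
        {
            throw new ArgumentNullException(paramName);
        }

        if (buffer.Length < MaxNativeEntries)
        {
            throw new ArgumentException(
                string.Format(
                    "Buffer holds {0} entries but the native call may write {1}.",
                    buffer.Length,
                    MaxNativeEntries),
                paramName);
        }
    }
}
```

In the wrapper shown earlier, calling NativeBufferGuard.EnsureCapacity(messageList, "messageList") and the same for valueLimits just before ExternGetSelectionList would have rejected the two-element arrays immediately.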

Now this particular code had already been running without this error for at least a year. But the simple act of navigating to this page via a Uri as opposed to an Object caused the crash to appear. This implies that the initial object must be allocated differently because of the navigation method I used. With my old navigation method, the memory was just piled into place and left for me to do with as I saw fit for eternity, so it didn't seem to matter if it was a bit corrupted in one or two small locations. Once the garbage collector had to actually do something with that memory (such as clean it up), it detected the memory corruption and threw the exception. Ironically, my major memory leak was covering up a fatal memory error!
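Since the crash only shows up when the garbage collector next walks the damaged memory, one debugging trick is to force a collection right after each suspect native call, so that any failure lands close to its cause. The sketch below is a debugging aid under that assumption, not a guaranteed detector: HeapCheck and InvokeThenCollect are hypothetical names, and a forced collection can still miss damage that falls outside what the collector inspects.

```csharp
using System;

public static class HeapCheck
{
    // Runs the suspect call, then forces a full collection so the GC walks
    // the managed heap immediately instead of at some later, unrelated time.
    public static T InvokeThenCollect<T>(Func<T> suspectCall)
    {
        if (suspectCall == null)
        {
            throw new ArgumentNullException("suspectCall");
        }

        T result = suspectCall();

        // Collect, let finalizers run, then collect what they released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return result;
    }
}
```

Wrapping one native call at a time, for example HeapCheck.InvokeThenCollect(() => wrapper.GetSelectionList(...)), narrows the search far faster than rebuilding a page line by line. The gcUnmanagedToManaged Managed Debugging Assistant applies a similar idea at the runtime level. Forced collections are slow, so this belongs in diagnostic builds only.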

Obviously we are going to review this interface to avoid such simple assumptions causing such crashes in the future. Hope this helps guide some others to find out what's going on in their own code.

Ruin answered 27/8, 2014 at 9:52 Comment(1)
Thanks for the effort and detailed investigation. It may very well be useful to others. - Hoon
M
3

A presentation that might be a nice tutorial on where to start with this kind of issue is this: Hardcore production debugging in .NET by Ingo Rammer.

I do a bit of C++/CLI coding, and heap corruption doesn't usually result in this error; usually heap corruption causes either data corruption and a subsequent normal exception, or a memory protection error, which probably doesn't mean anything.

In addition to trying .NET 4.0 (which loads unmanaged code differently), you should compare the x86 and x64 editions of the CLR if possible. The x64 version has a larger address space and thus completely different malloc (and fragmentation) behavior, so you just might get lucky and have a different (more debuggable) error there, if it occurs at all.
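When comparing runs like this, it helps to record which runtime and bitness each run actually used, since a build targeting "Any CPU" picks its bitness at launch. A small sketch follows; RuntimeInfo and Describe are hypothetical names, and where the string gets logged is left to the application.

```csharp
using System;

public static class RuntimeInfo
{
    // Builds a one-line description of the CLR version and process bitness.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        string bitness = IntPtr.Size == 8 ? "64-bit" : "32-bit";

        return string.Format(
            "CLR {0}, {1} process", Environment.Version, bitness);
    }
}
```

Writing RuntimeInfo.Describe() to the log at startup makes it possible to match each crash report to the configuration that produced it.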

Also, have you turned on unmanaged code debugging (a project option) when you run under Visual Studio? And do you have Managed Debugging Assistants on?

Murrell answered 23/6, 2010 at 6:46 Comment(2)
I haven't turned on unmanaged code debugging yet; that is something I will try thanks to your suggestion. Regarding Managed Debug Assistants, I did not know about them until now, so this is something I will immediately look into. At the moment some MDAs are checked and others are cleared (on thrown exceptions). I'll have to research which ones to enable and how to discern information from them. - Hoon
I wanted to award you the question bounty because you provided a number of good options to try. Thanks for jumping in. - Hoon
C
2

In my case I had installed an exception handler with AppDomain.CurrentDomain.FirstChanceException. This handler was logging some exceptions, and all was fine for a few years (actually this debugging code should not have stayed in production).

But following a configuration error, the logger started to fail, and the handler itself was throwing, which apparently resulted in a FatalExecutionEngineError seemingly coming from nowhere.

So anyone encountering this error could spend a few seconds searching for occurrences of FirstChanceException anywhere in the code and maybe save a few hours of head scratching :)
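If such a handler has to stay, it should never let an exception escape, because an exception thrown inside a FirstChanceException handler raises the event again. A minimal sketch of a defensive handler follows. SafeFirstChanceLogging and its Log hook are hypothetical names standing in for whatever logger is really used, and FirstChanceException requires .NET 4.0 or later.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class SafeFirstChanceLogging
{
    // Per-thread flag that stops the handler from re-entering itself when
    // the logging code raises its own first-chance exceptions.
    [ThreadStatic]
    private static bool insideHandler;

    // Hypothetical logging hook; replace with the real logger.
    public static Action<string> Log =
        delegate(string message) { Console.Error.WriteLine(message); };

    public static void Install()
    {
        AppDomain.CurrentDomain.FirstChanceException += OnFirstChance;
    }

    public static void Uninstall()
    {
        AppDomain.CurrentDomain.FirstChanceException -= OnFirstChance;
    }

    private static void OnFirstChance(
        object sender, FirstChanceExceptionEventArgs e)
    {
        if (insideHandler)
        {
            return; // An exception from our own logging; ignore it.
        }

        insideHandler = true;
        try
        {
            Log(e.Exception.GetType().FullName + ": " + e.Exception.Message);
        }
        catch
        {
            // A failing logger must not take the process down with it.
        }
        finally
        {
            insideHandler = false;
        }
    }
}
```

With this shape, a misconfigured logger costs some lost log lines instead of the whole process.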

Clayson answered 18/9, 2017 at 14:36 Comment(0)
E
-2

If you are using Thread.Sleep(), that can be the reason. Unmanaged code can only be put to sleep with the kernel32 Sleep() function.

Elsieelsinore answered 20/1, 2013 at 19:54 Comment(0)
