Mixed-mode C++/CLI crashing: heap corruption in atexit (static destructor registration)

I am working on deploying a program and the codebase is a mixture of C++/CLI and C#. The C++/CLI comes in all flavors: native, mixed (/clr), and safe (/clr:safe). In my development environment I create a DLL of all the C++/CLI code and reference that from the C# code (EXE). This method works flawlessly.

For releases, I want to ship a single executable (simply saying "why not just have a separate DLL and EXE?" is not an acceptable answer).

So far I have succeeded in compiling the EXE with all the different sources. However, when I run it I get the "XXXX has stopped working" dialog with options to Check online, Close and Debug. The problem details are as follows:

Problem Event Name:       APPCRASH
Fault Module Name:        StackHash_8d25
Fault Module Version:     6.1.7600.16559
Fault Module Timestamp:   4ba9b29c
Exception Code:           c0000374
Exception Offset:         000cdc9b
OS Version:               6.1.7600.2.0.0.256.48
Locale ID:                1033
Additional Information 1: 8d25
Additional Information 2: 8d25552d834e8c143c43cf1d7f83abb8
Additional Information 3: 7450
Additional Information 4: 74509ce510cd821216ce477edd86119c

If I debug and send it to Visual Studio, it reports:

Unhandled exception at 0x77d2dc9b in XXX.exe: A heap has been corrupted

Choosing Break stops execution at ntdll.dll!77d2dc9b() with no additional information. If I tell Visual Studio to continue, the program starts up fine and seems to work without incident, probably because a debugger is now attached.

What do you make of this? How do I avoid this heap corruption? The program seems to work fine except for this.

My abridged compilation script is as follows (I have omitted my error checking for brevity):

@set TARGET=x86
@set TARGETX=x86
@set OUT=%TARGETX%
@call "%VS90COMNTOOLS%\..\..\VC\vcvarsall.bat" %TARGET%

@set WIMGAPI=C:\Program Files\Windows AIK\SDKs\WIMGAPI\%TARGET%

set CL=/Zi /nologo /W4 /O2 /GS /EHa /MD /MP /D NDEBUG /D _UNICODE /D UNICODE /D INTEGRATED /Fd%OUT%\ /Fo%OUT%\
set INCLUDE=%WIMGAPI%;%INCLUDE%
set LINK=/nologo /LTCG /CLRIMAGETYPE:IJW /MANIFEST:NO /MACHINE:%TARGETX% /SUBSYSTEM:WINDOWS,6.0 /OPT:REF /OPT:ICF /DEFAULTLIB:msvcmrt.lib
set LIB=%WIMGAPI%;%LIB%
set CSC=/nologo /w:4 /d:INTEGRATED /o+ /target:module

:: Compiling resources omitted

@set CL_NATIVE=/c /FI"stdafx-native.h"
@set CL_MIXED=/c /clr /LN /FI"stdafx-mixed.h"
@set CL_PURE=/c /clr:safe /LN /GL /FI"stdafx-pure.h"

@set NATIVE=...
@set MIXED=...
@set PURE=...

cl %CL_NATIVE% %NATIVE%
cl %CL_MIXED% %MIXED%
cl %CL_PURE% %PURE%
link /LTCG /NOASSEMBLY /DLL /OUT:%OUT%\core.netmodule %OUT%\*.obj

csc %CSC% /addmodule:%OUT%\core.netmodule /out:%OUT%\GUI.netmodule /recurse:*.cs

link /FIXED /ENTRY:GUI.Program.Main /OUT:%OUT%\XXX.exe ^
/ASSEMBLYRESOURCE:%OUT%\core.resources,XXX.resources,PRIVATE /ASSEMBLYRESOURCE:%OUT%\GUI.resources,GUI.resources,PRIVATE ^
/ASSEMBLYMODULE:%OUT%\core.netmodule %OUT%\gui.res %OUT%\*.obj %OUT%\GUI.netmodule

Update 1

Upon compiling this with debug symbols and trying again, I do in fact get more information. The call stack is:

msvcr90d.dll!_msize_dbg(void * pUserData, int nBlockUse)  Line 1511 + 0x30 bytes
msvcr90d.dll!_dllonexit_nolock(int (void)* func, void (void)* * * pbegin, void (void)* * * pend)  Line 295 + 0xd bytes
msvcr90d.dll!__dllonexit(int (void)* func, void (void)* * * pbegin, void (void)* * * pend)  Line 273 + 0x11 bytes
XXX.exe!_onexit(int (void)* func)  Line 110 + 0x1b bytes
XXX.exe!atexit(void (void)* func)  Line 127 + 0x9 bytes
XXX.exe!`dynamic initializer for 'Bytes::Null''()  Line 7 + 0xa bytes
mscorwks.dll!6cbd1b5c()
[Frames below may be incorrect and/or missing, no symbols loaded for mscorwks.dll]
...

The line of my code that 'causes' this (dynamic initializer for Bytes::Null) is:

Bytes Bytes::Null;

In the header that is declared as:

class Bytes { public: static Bytes Null; };
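To make the call stack concrete, here is a minimal standard C++ sketch of what `Bytes Bytes::Null;` asks the compiler to do. The question never shows the body of `Bytes`, so the `data_` member, the `constructed` counter and `size()` are hypothetical. Because the class has a user-provided constructor and a non-trivial destructor, the definition compiles to a dynamic initializer that constructs the object before `main` and then registers its destructor to run at process exit. MSVC does that registration through `atexit` (the `XXX.exe!atexit` frame in the stack above); other toolchains use equivalents such as `__cxa_atexit`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal Bytes class: only the static member comes from the
// question; the data member and the counter are illustrative assumptions.
class Bytes {
public:
    static Bytes Null;        // declaration (header)
    static int constructed;   // counts constructor calls, for illustration only

    Bytes() { ++constructed; }
    ~Bytes() {}               // user-provided, so the destructor is non-trivial
    std::size_t size() const { return data_.size(); }

private:
    std::vector<unsigned char> data_;
};

// Constant-initialized to zero before any dynamic initializer runs.
int Bytes::constructed = 0;

// Definition (.cpp file). The compiler emits a "dynamic initializer" for this
// line: it runs the constructor before main and then registers a thunk that
// will call ~Bytes() at exit (via atexit on MSVC).
Bytes Bytes::Null;
```

None of this is wrong in an ordinary program; the failure in the question only appears when that registration runs before the CRT has set up its exit list.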

I also tried doing a global extern in the header like so:

extern Bytes Null; // header
Bytes Null; // cpp file

That failed in the same way.

It seems that the CRT atexit function is responsible: the static initializer calls it implicitly to register the object's destructor, even though my code never calls atexit directly.
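Why that registration can corrupt the heap is easier to see with a toy model. The sketch below is a hypothetical, heavily simplified stand-in for the CRT's exit list: the names `OnExitTable`, `crt_startup_init`, `model_atexit` and `crt_run_exit_handlers` are made up, and the real CRT tracks begin/end pointers (the `pbegin`/`pend` parameters in the call stack) rather than a `std::vector`. What it illustrates is the assumption that the table only exists once startup code has run. The model reports failure when used too early, whereas the real code path in the call stack goes on to call `_msize_dbg` on the table pointer, which is consistent with the heap-corruption report.

```cpp
#include <memory>
#include <vector>

using ExitFn = void (*)();

// Hypothetical model of the CRT exit list; it does not exist until startup runs.
struct OnExitTable {
    std::unique_ptr<std::vector<ExitFn>> entries;
};

OnExitTable g_table;

// Stand-in for the setup that mainCRTStartup / WinMainCRTStartup /
// _DllMainCRTStartup perform before user code. Per the answer, this is what
// gets skipped when /ENTRY points straight at a managed Main.
void crt_startup_init() {
    if (!g_table.entries) {
        g_table.entries = std::make_unique<std::vector<ExitFn>>();
    }
}

// Stand-in for atexit: returns 0 on success, non-zero on failure.
int model_atexit(ExitFn fn) {
    if (!g_table.entries) {
        // Used before startup. The model refuses; per the call stack, the real
        // CRT instead went on to call _msize_dbg on the table pointer.
        return -1;
    }
    g_table.entries->push_back(fn);
    return 0;
}

// Stand-in for the exit path: run handlers in reverse registration order.
void crt_run_exit_handlers() {
    if (!g_table.entries) {
        return;
    }
    for (auto it = g_table.entries->rbegin(); it != g_table.entries->rend(); ++it) {
        (*it)();
    }
    g_table.entries->clear();
}
```

In the model, a static initializer that calls `model_atexit` before `crt_startup_init` is exactly the situation the custom `/ENTRY` created.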


Fix

As Ben Voigt pointed out, any use of the CRT (including the atexit registration performed by native static initializers) requires the CRT to be properly initialized, which happens in mainCRTStartup, WinMainCRTStartup, or _DllMainCRTStartup. I have added a mixed C++/CLI file that has a C++ main or WinMain:

using namespace System;
[STAThread] // required if using STA COM objects (such as drag-and-drop or file dialogs)
int main() { // or "int __stdcall wWinMain(void*, void*, wchar_t*, int)" for GUI applications
    array<String^> ^args_orig = Environment::GetCommandLineArgs();
    int l = args_orig->Length - 1; // required to remove first argument (program name)
    array<String^> ^args = gcnew array<String^>(l);
    if (l > 0) Array::Copy(args_orig, 1, args, 0, l);
    return XXX::CUI::Program::Main(args); // return XXX::GUI::Program::Main(args);
}

After doing this, the program now gets a little further, but still has issues (which will be addressed elsewhere):

  • Everything works while the program stays in C#, and also when it merely calls C++/CLI methods, reads C++/CLI properties, or creates managed C++/CLI objects
  • Events added by C# to the C++/CLI code never fire (even though they should)
  • Another odd error is an InvalidCastException saying it can't cast from X to X (where both X are the same type...)

However, since the heap corruption is fixed (by getting the CRT initialized), this question is resolved.

Tirol answered 8/2, 2011 at 1:40

EDIT: Spotted the problem, leaving the suggested debugging steps below in case they help anyone in the future.

The problem is that you've changed the entry point. You should be using the C++/CLI standard library-provided entry point, which sets up internal resources like the onexit list.

Remove the /ENTRY switch and write a simple main function that calls your desired startup routine.
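In plain native C++, the recommended shape (no `/ENTRY`, a normal `main` that forwards to the real startup routine) might look like the sketch below. `program_main` is a hypothetical stand-in for `GUI.Program.Main`; the question's own fix is the C++/CLI `main` shown earlier, which does the same argument trimming with `Environment::GetCommandLineArgs`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the real startup routine (GUI.Program.Main in the
// question). Here it only reports how many arguments it received.
int program_main(const std::vector<std::string>& args) {
    return static_cast<int>(args.size());
}

// Copies argv[1..argc-1], dropping argv[0] (the program name), mirroring the
// Array::Copy(args_orig, 1, args, 0, l) step in the C++/CLI main.
std::vector<std::string> forwarded_args(int argc, char* argv[]) {
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) {
        args.emplace_back(argv[i]);
    }
    return args;
}
```

The entry point itself is then just `int main(int argc, char* argv[]) { return program_main(forwarded_args(argc, argv)); }`. Because `main` is a normal entry point, the linker can use the CRT's own startup routine, so the exit list exists before any static initializer calls `atexit`.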


Although using a separate EXE and DLL may not be acceptable for the end product, it would be good to test this simpler configuration and see if you get the same problem.

If you can reproduce the heap corruption with a separate .DLL, you know it's somewhere in your native C++ code and it will be much easier to debug without the C# mixed into the same file.

If you can't reproduce the problem with separate DLL and EXE, then it could be related to the integration process (or it could just be less evident because the layout changes depending on what gets linked).

After you find and squash the heap corruption bug, then you can go back to the single .EXE.

Another approach would be to build the debug database so you can get better stack traces when it does crash. Even release builds (or maybe especially release builds) should be built with debugging info.

Florindaflorine answered 8/2, 2011 at 3:7
I was testing everything with a DLL/EXE system (within Visual Studio). I was compiling with a PDB for the C code; however, I didn't include it during the link step. I have posted the results using full debug information above. What are your thoughts? - Tirol
@thaimin: I think I see the problem. Replacing the CRT entry point function is a no-no if you intend to also use any standard library functions, including atexit. - Florindaflorine
I didn't intentionally use atexit! (Although there are other CRT functions used later on.) This fix causes new problems to crop up! See the edited question. - Tirol
@thaimin: Since we've solved your heap corruption, would you accept an answer to this question and start a new one (or quite possibly one for each of the issues you're now seeing)? I don't think your remaining problems come from integrating C# and C++ per se, but from .NET initialization code initializing the parts of COM it uses and your DragDrop code not accounting for this. I know you'd like to just keep hunting bugs, but anyone reading this in the future will appreciate having the atexit fix neatly separated and explained on its own. - Florindaflorine
Thanks for the input and for keeping me on track. Your comment about COM got me thinking, and the fix was adding [STAThread] to the new main functions. I will look into the event issue. - Tirol
