Up front: as of March 1st, 2017, Microsoft has confirmed this as a bug. See the comments at the end.
Short description:
I have random crashes in a larger application using MFC and ATL. In all such cases, after ATL subclassing was used for a window, simple actions on that window (moving, resizing, setting the focus, painting, etc.) cause a crash at a random execution address.
At first it looked like a wild pointer or heap corruption, but I narrowed the complete scenario down to a very simple application using pure ATL and only the Windows API.
Requirements / the scenarios I used:
- The application was created with VS 2015 Enterprise Update 3.
- The program should be compiled as 32bit.
- Test application uses CRT as a shared DLL.
- The application runs under Windows 10 Build 14393.693 64bit (but we have repros under Windows 8.1 and Windows Server 2012 R2, all 64bit)
- atlthunk.dll has version 10.0.14393.0
What the application does:
It simply creates a frame window and then creates many static windows with the Windows API. After each static window is created, it is subclassed with the ATL CWindowImpl::SubclassWindow method. After the subclass operation a simple window message is sent.
What happens:
Not on every run, but very often, the application crashes upon SendMessage to the subclassed window. On the 257th window (or another multiple of 256, plus 1) the subclassing fails in some way. The ATL thunk that is created is invalid. It seems that the stored execution address of the new subclass function isn't correct. Sending any message to the window causes a crash. The call stack is always the same. The last visible and known address in the call stack is in atlthunk.dll:
atlthunk.dll!AtlThunk_Call(unsigned int,unsigned int,unsigned int,long) Unknown
atlthunk.dll!AtlThunk_0x00(struct HWND__ *,unsigned int,unsigned int,long) Unknown
user32.dll!__InternalCallWinProc@20() Unknown
user32.dll!UserCallWinProcCheckWow() Unknown
user32.dll!SendMessageWorker() Unknown
user32.dll!SendMessageW() Unknown
CrashAtlThunk.exe!WindowCheck() Line 52 C++
The thrown exception in the debugger is shown as:
Exception thrown at 0x0BF67000 in CrashAtlThunk.exe:
0xC0000005: Access violation executing location 0x0BF67000.
or another sample
Exception thrown at 0x2D75E06D in CrashAtlThunk.exe:
0xC0000005: Access violation executing location 0x2D75E06D.
What I know about atlthunk.dll:
atlthunk.dll seems to be part of 64-bit Windows only. I found it on Win 8.1 and Win 10 systems.
If atlthunk.dll is available (as on all Windows 10 machines), this DLL takes care of the thunking. If the DLL isn't present, thunking is done in the standard way: allocating a block on the heap, marking it as executable, and writing a load and a jump instruction into it.
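To illustrate what "a load and a jump" means here, the following is a minimal sketch of the bytes such a thunk contains on 32-bit x86, assuming the classic ATL layout as I understand it: a `mov` that overwrites the HWND argument on the stack with the object pointer, followed by a relative `jmp` to the real window procedure. `BuildX86Thunk`, `PutU32` and `kThunkSize` are hypothetical names for illustration. The sketch only fills a byte buffer; it does not allocate executable memory or run the thunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size of the sketched thunk: 8-byte mov + 5-byte jmp (assumed classic layout).
constexpr std::size_t kThunkSize = 13;

using ThunkBytes = std::array<std::uint8_t, kThunkSize>;

// Writes a 32-bit value at offset 'at' in little-endian byte order.
inline void PutU32(ThunkBytes& code, std::size_t at, std::uint32_t value)
{
    for (std::size_t i = 0; i < 4; ++i)
        code[at + i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Builds the machine code for:
//   mov dword ptr [esp+4], thisPtr   ; replace the HWND argument with the object pointer
//   jmp procAddr                     ; relative jump to the real window procedure
// thunkAddr is the address where the 13 bytes will live; the jump displacement
// is relative to the instruction that follows the jmp.
inline ThunkBytes BuildX86Thunk(std::uint32_t thunkAddr,
                                std::uint32_t thisPtr,
                                std::uint32_t procAddr)
{
    ThunkBytes code{};
    code[0] = 0xC7; code[1] = 0x44; code[2] = 0x24; code[3] = 0x04; // mov [esp+4], imm32
    PutU32(code, 4, thisPtr);
    code[8] = 0xE9;                                                 // jmp rel32
    const std::uint32_t nextInstr = thunkAddr + static_cast<std::uint32_t>(kThunkSize);
    PutU32(code, 9, procAddr - nextInstr);                          // wraps correctly for backward jumps
    return code;
}
```

If the stored jump target or the thunk address itself is wrong, the CPU ends up executing at an arbitrary address, which fits the "access violation executing location" symptom shown above.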
If the DLL is present, it contains 256 predefined slots for subclassing. Once 256 subclasses have been made, the DLL loads itself into memory a second time and uses the next 256 available slots.
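The slot arithmetic can be made concrete with a minimal sketch, assuming 256 slots per loaded copy of the DLL (as described above) and assuming that slots are handed out in order. `LocateSlot` and `StartsNewBlock` are hypothetical helpers for illustration only; they are not part of ATL or atlthunk.dll.

```cpp
#include <cassert>

// Assumption taken from the description above: each loaded copy of the DLL
// provides 256 thunk slots, and slots are handed out in order.
constexpr int kSlotsPerBlock = 256;

struct SlotPosition
{
    int block; // 0 = first copy of the DLL, 1 = second copy, ...
    int slot;  // index inside that copy, 0..255
};

// Maps the 1-based number of a subclassed window to its (block, slot) pair.
// Hypothetical helper, not an ATL or atlthunk.dll function.
inline SlotPosition LocateSlot(int windowNumber)
{
    assert(windowNumber >= 1);
    const int index = windowNumber - 1;
    return { index / kSlotsPerBlock, index % kSlotsPerBlock };
}

// True when this window is the first one that needs a further block of slots,
// i.e. window 257, 513, 769, ...
inline bool StartsNewBlock(int windowNumber)
{
    const SlotPosition pos = LocateSlot(windowNumber);
    return pos.block > 0 && pos.slot == 0;
}
```

With NUM_WINDOWS = 1000, `StartsNewBlock` is true for windows 257, 513 and 769, which matches the "multiple of 256, plus 1" pattern of the failures.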
As far as I can see, atlthunk.dll belongs to Windows 10 and isn't replaceable or redistributable.
Things checked:
- Antivirus system was turned off or on; no change.
- Data execution prevention doesn't matter. (With /NXCOMPAT:NO and the EXE defined as an exclusion in the system settings, it crashes too.)
- Additional calls to FlushInstructionCache or Sleep after the subclass don't have any effect.
- Heap integrity isn't a problem here; I rechecked it with more than one tool.
- And a thousand more things (I may already have forgotten what I tested)... ;)
Reproducibility:
The problem is somewhat reproducible. It doesn't crash all the time; it crashes randomly. I have a machine where the code crashes on every third execution.
I can repro it on two desktop stations, one with an i7-4770 and one with an i7-6700.
Other machines seem not to be affected at all (it always works on a laptop with an i3-3217 and on a desktop with an i7-870).
About the sample:
For simplicity I use an SEH handler to catch the error. If you debug the application, the debugger will show the call stack mentioned above. The program can be launched with an integer on the command line. In this case the program launches itself again with the count decremented by 1. So if you launch CrashAtlThunk 100, it will relaunch the application 100 times. Upon an error the SEH handler catches it and shows the text "Crash" in a message box. If the application runs without errors, it shows "Succeeded" in a message box. If the application is started without a parameter, it is executed just once.
Questions:
- Can anybody else repro this?
- Has anybody seen similar effects?
- Does anybody know or can imagine a reason for this?
- Does anybody know how to get around this problem?
Notes:
2017-01-20 Support case at Microsoft opened.
The code
// CrashAtlThunk.cpp : Defines the entry point for the application.
//
// Windows Header Files:
#include <windows.h>
// C RunTime Header Files
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <tchar.h>
#define _ATL_CSTRING_EXPLICIT_CONSTRUCTORS // some CString constructors will be explicit
#include <atlbase.h>
#include <atlstr.h>
#include <atlwin.h>
// Global Variables:
HINSTANCE hInst; // current instance
const int NUM_WINDOWS = 1000;
//------------------------------------------------------
// The problematic code
// After the 256th subclass the application randomly crashes.
class CMyWindow : public CWindowImpl<CMyWindow>
{
public:
    virtual BOOL ProcessWindowMessage(_In_ HWND hWnd, _In_ UINT uMsg, _In_ WPARAM wParam, _In_ LPARAM lParam, _Inout_ LRESULT& lResult, _In_ DWORD dwMsgMapID) override
    {
        return FALSE;
    }
};
void WindowCheck()
{
    HWND ahwnd[NUM_WINDOWS];
    CMyWindow subclass[_countof(ahwnd)];
    HWND hwndFrame;
    ATLVERIFY(hwndFrame = ::CreateWindow(_T("Static"), _T("Frame"), SS_SIMPLE, 0, 0, 10, 10, NULL, NULL, hInst, NULL));
    for (int i = 0; i<_countof(ahwnd); ++i)
    {
        ATLVERIFY(ahwnd[i] = ::CreateWindow(_T("Static"), _T("DummyWindow"), SS_SIMPLE|WS_CHILD, 0, 0, 10, 10, hwndFrame, NULL, hInst, NULL));
        if (ahwnd[i])
        {
            subclass[i].SubclassWindow(ahwnd[i]);
            ATLVERIFY(SendMessage(ahwnd[i], WM_GETTEXTLENGTH, 0, 0)!=0);
        }
    }
    for (int i = 0; i<_countof(ahwnd); ++i)
    {
        if (ahwnd[i])
            ::DestroyWindow(ahwnd[i]);
    }
    ::DestroyWindow(hwndFrame);
}
//------------------------------------------------------
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
                      _In_opt_ HINSTANCE hPrevInstance,
                      _In_ LPWSTR lpCmdLine,
                      _In_ int nCmdShow)
{
    hInst = hInstance;
    int iCount = _tcstol(lpCmdLine, nullptr, 10);
    __try
    {
        WindowCheck();
        if (iCount==0)
        {
            ::MessageBox(NULL, _T("Succeeded"), _T("CrashAtlThunk"), MB_OK|MB_ICONINFORMATION);
        }
        else
        {
            TCHAR szFileName[_MAX_PATH];
            TCHAR szCount[16];
            _itot_s(--iCount, szCount, 10);
            ::GetModuleFileName(NULL, szFileName, _countof(szFileName));
            ::ShellExecute(NULL, _T("open"), szFileName, szCount, nullptr, SW_SHOW);
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        ::MessageBox(NULL, _T("Crash"), _T("CrashAtlThunk"), MB_OK|MB_ICONWARNING);
        return FALSE;
    }
    return 0;
}
Comment after Eugene's answer (Feb. 24th, 2017):
I don't want to change my original question, but I want to add some information on how to turn this into a 100% repro.
1. Change the main function to:
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
                      _In_opt_ HINSTANCE hPrevInstance,
                      _In_ LPWSTR lpCmdLine,
                      _In_ int nCmdShow)
{
    // Get the load address of ATLTHUNK.DLL
    // HMODULE hMod = LoadLibrary(_T("atlThunk.dll"));
    // Now allocate a page at the preferred start address
    void* pMem = VirtualAlloc(reinterpret_cast<void*>(0x0f370000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    DWORD dwLastError = ::GetLastError();
    hInst = hInstance;
    WindowCheck();
    return 0;
}
2. Uncomment the LoadLibrary call. Compile.
3. Run the program once and stop in the debugger. Note the address where the library was loaded (hMod).
4. Stop the program. Now comment out the LoadLibrary call again and change the VirtualAlloc call to use the previous hMod value as its address; this is the preferred load address in this Windows session. Recompile and run. CRASH!
Thanks to Eugene.
Up to now, Microsoft is still investigating this. They have dumps and all the code, but I don't have a final answer. The fact is that we have a fatal bug in some 64-bit Windows versions.
I have currently made the following changes to get around this:
Open atlstdthunk.h of VS-2015.
Comment out the entire #ifdef block that defines USE_ATL_THUNK2 (code lines 25 to 27).
Recompile your program.
This enables the old thunking mechanism well known from VC-2010, VC-2013... and it works crash-free for me, as long as no other already compiled libraries are involved that may subclass or use 256 windows via ATL in any way.
Comment (Mar. 1st 2017):
- Microsoft confirmed that this is a bug. It should be fixed in Windows 10 RS2.
- Microsoft agrees that editing atlstdthunk.h is a workaround for the problem.
In effect this means: as long as there is no stable patch, I can never use the normal ATL thunking again, because I will never know which Windows versions out in the world will run my program. Windows 8, Windows 8.1, and Windows 10 prior to RS2 all suffer from this bug.
Final Comment (Mar. 9th 2017):
- Builds with VS-2017 are affected too; there is no difference between VS-2015 and VS-2017.
- Microsoft decided that there will be no fix for older OS versions regarding this case.
- Neither Windows 8.1, Windows Server 2012 R2, nor other Windows 10 builds will get a patch to fix this issue.
- The issue is too rare and the impact for our company is too small. Also, the fix from our side is too simple. Other reports of this bug are not known.
- The case is closed.
My advice for all programmers: change atlstdthunk.h in your Visual Studio version, VS-2015 or VS-2017 (see above). I don't understand Microsoft. This bug is a serious problem in the ATL thunking. It may hit every programmer who uses a large number of windows and/or subclassing.
We only know of a fix in Windows 10 RS2, so all older OS versions are affected! I therefore recommend disabling the use of atlthunk.dll by commenting out the define noted above.
ShellExecute on a thread that never initialized COM. That's not entirely prudent either. – Esker
::DestroyWindow) - which will post messages to the window - and then letting your subclass array immediately go out of scope. This will mean that window destruction messages will have nowhere valid to be processed. Also, if there are any pending messages, these will have the same problem. – Furan
DestroyWindow is strictly serialized. When it returns, all messages have been sent (they aren't posted) and processed. And if there are indeed pending messages, DispatchMessage won't be able to find the destination window, and nothing will happen. – Esker