CreateThread() fails on 64 bit Windows, works on 32 bit Windows. Why?

Operating System: Windows XP 64 bit, SP2.

I have an unusual problem. I am porting some code from 32 bit to 64 bit. The 32 bit code works just fine, but when I call CreateThread() in the 64 bit version the call fails. There are three places where this fails: two call CreateThread() directly, and one calls _beginthreadex(), which calls CreateThread().

All three calls fail with error code 0x3E6, "Invalid access to memory location".

The puzzling part is that all the input parameters are correct.

HANDLE  h;
DWORD   threadID;

h = CreateThread(0,            // default security
                 0,            // default stack size
                 myThreadFunc, // valid function to call
                 myParam,      // my param
                 0,            // no flags, start thread immediately
                 &threadID);

All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of the program execution (this is before the program has got to the start of main()/WinMain()). If I call CreateThread() from the target program with the same parameters (via, say, a menu), it works. Bizarre.

If I pass NULL instead of &threadID, it still fails.

If I pass NULL as myParam, it still fails.

I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.

If anyone has seen this before or has any ideas, please let me know.

Thanks for reading.

ANSWER

Short answer: Stack Frames on x64 need to be 16 byte aligned.

Longer answer: After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.

Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16 byte stack frame alignment. At that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought were the correct values (I had been pushing return addresses to fake up a call frame). It still worked, so the data isn't important; it must be the actual stack addresses.

I quickly realised it was 16 byte alignment for the stack. Previously I was only aware of 8 byte alignment for data. This Microsoft document explains all the alignment requirements.

If the stack frame is not 16 byte aligned on x64, the compiler may put large (8 byte or more) data on the wrong alignment boundaries when it pushes data onto the stack.

Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
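
To make the alignment condition concrete, here is a minimal sketch in portable C. The helper names is_aligned and align_down are hypothetical (they are not from the original hooking code), and in a real injection stub the same rounding would be applied to RSP in assembly, for example with `and rsp, -16`, before making a call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of alignment.
 * alignment must be a power of two (16 for the x64 stack). */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Hypothetical helper: the stack grows downwards, so a stack pointer is
 * realigned by rounding the address down to the previous boundary. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

For example, a stack pointer of 0x1008 is only 8 byte aligned; align_down(0x1008, 16) gives 0x1000, which satisfies the 16 byte requirement.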

Quick summary of alignment requirements, expressed as size : alignment

  • 1 : 1
  • 2 : 2
  • 4 : 4
  • 8 : 8
  • 10 : 16
  • 16 : 16

Anything larger than 8 bytes is aligned on the next power of 2 boundary.
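
The rule in this summary can be sketched as a small function. This is only an illustration of the table as I have stated it (the name required_alignment is hypothetical), not something taken from the Microsoft documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: alignment that the summary above assigns to an
 * object of `size` bytes, i.e. the smallest power of two >= size. */
size_t required_alignment(size_t size)
{
    assert(size > 0);        /* zero-sized objects are not in the table */
    size_t alignment = 1;
    while (alignment < size)
        alignment <<= 1;     /* 1, 2, 4, 8, 16, ... */
    return alignment;
}
```

With this, a 10 byte object lands on a 16 byte boundary, matching the "10 : 16" row.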

I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as a STATUS_STACK_MISALIGNMENT which would be more helpful. But then turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS - that actually disguises and misleads as to what the problem is. Very unhelpful.
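
To show why the Win32 error hides the cause, here is a minimal sketch of the collapse from NTSTATUS to Win32 error code. The function name and the SKETCH_ prefixed constants are hypothetical; the numeric values are the ones defined in the Windows SDK headers, and only the mappings mentioned in this thread are included (the real table used by Windows is much larger).

```c
#include <assert.h>
#include <stdint.h>

/* Numeric values as defined in the Windows SDK; names prefixed with
 * SKETCH_ because this is an illustration, not the SDK headers. */
#define SKETCH_STATUS_SUCCESS               0x00000000u
#define SKETCH_STATUS_DATATYPE_MISALIGNMENT 0x80000002u
#define SKETCH_STATUS_ACCESS_VIOLATION      0xC0000005u
#define SKETCH_ERROR_SUCCESS                0u
#define SKETCH_ERROR_NOACCESS               998u
#define SKETCH_UNKNOWN                      0xFFFFFFFFu /* hypothetical sentinel */

/* 998 decimal is the 0x3E6 reported by GetLastError() in the question. */
static_assert(SKETCH_ERROR_NOACCESS == 0x3E6, "ERROR_NOACCESS is 0x3E6");

/* Hypothetical lookup: two distinct kernel status codes become the same
 * Win32 error, so the caller cannot tell them apart. */
uint32_t sketch_status_to_win32(uint32_t status)
{
    switch (status) {
    case SKETCH_STATUS_SUCCESS:               return SKETCH_ERROR_SUCCESS;
    case SKETCH_STATUS_DATATYPE_MISALIGNMENT: return SKETCH_ERROR_NOACCESS;
    case SKETCH_STATUS_ACCESS_VIOLATION:      return SKETCH_ERROR_NOACCESS;
    default:                                  return SKETCH_UNKNOWN;
    }
}
```

Both a misalignment and an access violation come back as 998, which is why the raw value in RAX after the syscall was more informative than GetLastError().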

Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).

I have written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.

Chaudoin answered 15/6, 2010 at 15:6 Comment(25)
What exactly is myParam (ie., the declaration and initialization of it)?Mastin
myParam in all cases is a pointer to some memory (typically it is "this"). The pointer is always valid when it is passed. Why should that matter? It won't be accessing the contents of it; it's just a value to be passed into the thread function.Chaudoin
@Michael, I've tried passing NULL as myParam. Still fails.Chaudoin
can you see what is the error message? You can use this function for this purpose pastebin.com/h72GM9fJObject
@Daniel, the error code 0x3E6 means "Invalid access to memory location". I put that in the main article, perhaps you missed it?Chaudoin
@Stephen: I wasn't sure what might matter about myParam; the whole thing seemed weird until I read a bit more carefully and noticed that the DLL is being 'injected'. Can you explain a bit more about how that works? It seems like that's the key to the problem (not that I know exactly how).Mastin
@Stephen: oh, sorry. You are right.Object
@Michael I'm porting a tool that injects into the entry point of the target application. It's a software engineering tool, not virus/malware. The tool copies the first few bytes, then rewrites them to point to its own memory, which has some simple assembly that loads a DLL, finds a function in the DLL and calls that function. After the function finishes it restores the entry point of the exe and calls back to the start of the app. The app starts with my DLL loaded and the app being monitored. The CreateThread calls fail inside the call into my DLL.Chaudoin
I've just explored __security_init_cookie() but that is not the cause of the failure.Chaudoin
Run it in the debugger and break on first-chance exceptions; it should show you where the access violation is occurring and what address it is trying to read/write.Riga
@Luke. It isn't access violating. What has been written that makes you think it is?Chaudoin
0x3E6 = 998 = ERROR_NOACCESS According to support.microsoft.com/kb/113996/en-us, this can map to STATUS_DATATYPE_MISALIGNMENT, STATUS_ACCESS_VIOLATION, or STATUS_DATATYPE_MISALIGNMENT_ERROR. So it could be an access violation, but probably more likely to be alignment issue since it only happens on 64-bit machines. In any case, if you run it in a debugger and break on first-chance exceptions you should be able to figure out what is happening.Riga
@Luke. It isn't AV'ing. I've stepped through the whole call. CreateThread internally calls CreateRemoteThread (which surprised me) and does various things which I can't tell from assembly. It doesn't AV, or if it does, it's in the kernel section hidden from me by a syscall instruction. I know it's not a misalignment as I can run the very same function call (with the same data and alignments) successfully from later in the program.Chaudoin
I've just stepped through the code again and immediately after the syscall inside NtCreateThread(), RAX has 0xffffffff80000002 which means EAX has 0x80000002. This error code is "out of memory". Any ideas why? I'm guessing this "out of memory" translates into "invalid access to memory location". Why would it say out of memory? The app has just started, not allocated any memory (save for the DLLs and a tiny bit of VM I've asked for - Task Manager confirms this). The machine has 8GB. Any ideas?Chaudoin
Correction: For 32 bit systems 80000002 is STATUS_DATATYPE_MISALIGNMENT. But that doesn't make sense - this code works (with the identical input values, called from the same functions) if called later in the program lifetime.Chaudoin
Are you sure the values are identical? I mean the parameters to CreateThread at the assembly level. It is probably related to whatever you're doing in the injection process.Riga
@Luke, Values: Identical. Absolutely. I've now extended checks. Even if I pass in a procedure that is aligned on a 16 byte boundary, a param that is 16 byte aligned and a DWORD pointer that points to a DWORD on a 16 byte boundary I still get a failure. I thought 8 byte alignment was enough but found a reference to certain structures (for example: CONTEXT) needing 16 byte alignment, so I've gone overboard doing that. No joy, still fails.Chaudoin
It sounds like the DLL injection happens during the earliest stages; maybe it is happening before some required initialization, or maybe the assembly code you are using is wrong somehow for x64. How exactly are you injecting the DLL, and is it necessary to do it before the program even starts up?Riga
@Luke, the DLL is injected - see my reply to @Michael above. The x64 assembly is valid. I keep it simple and do not strive for optimizations. Yes, the DLL has to be injected as early as possible. Prior to getting to the entry point, Windows has already executed a load of code which we never get to see (in the debugger). The entry point is just the first thing a user mode program gets to see.Chaudoin
My current thinking is that MS have modified CreateThread so that it examines its callstack and ensures that all locations are traceable back to kernel32.dll/ntdll.dll. If it can't find a valid callstack it assumes the app has been hijacked and denies calls to CreateThread/CreateRemoteThread. In the case that fails for me, the callstack ends at my injected dynamic assembly code that loads my DLL then calls my code. What does anyone think? Is this bonkers or a valid idea to pursue? I'm running out of things to try... it's going to be very hard to keep a valid callstack doing this stuff.Chaudoin
I don't even understand what you're trying to say. Obviously kernel32 is going to be in the callstack because that's where CreateThread is implemented. The symbolic name for the status code you're getting is STATUS_DATATYPE_MISALIGNMENT; it is named that way for a reason. Obviously something is off with your code when run on 64-bit machines. I would try to eliminate variables. Simplify the DLL and host application to the bare essentials and see if the problem still happens.Riga
@Luke. You are missing an important part here. The same code runs OK if run at a later point in the program with the same input values, so it isn't an alignment issue, or if it is, that error code is referring to something other than the input parameters. I'm talking about the stacktrace from the calling code (when you walk up the callchain, normally one of the last entries would be for kernel32.dll just prior to the program entry point), not the fact that CreateThread is in kernel32.dll (which is at the start of the stacktrace, not the end).Chaudoin
I have it working. I will modify the question with the answer when I have a suitable way of explaining it. Nothing to do with alignments of the inputs, everything to do with the alignment of stack backtrace. It seems at some point RSP needs to be 16 byte aligned. However this needs to be sometime prior to the CreateThread call. Just setting this up prior to calling CreateThread does not cut it. Very odd. Can't say I've seen this in the calling convention documentation. More reading required. I'll update here when I understand this more.Chaudoin
I was very much aware of the 16 byte alignment before I tried to use CreateThread. I cannot run a fairly simple assembler routine that installs its own stacks in 64 bits; a mere return does work, however. The 32 bit version works flawlessly to count the primes up till 2,000,000,000 in 4 threads within a second.Doubletime
Do not put the answer into the question. Please take the time to answer your own question, and keep the problem neatly separated. @Stephen (regards hijacking) I've been debugging my attempts at running 64 bit threads, including debugging the assembler trampoline on 64 bit Linux. I tend now to believe that there is some extra protection in place that kills my threads. AlbertDoubletime
F
1

The only reason that 64bit would make a difference is that threading on 64bit requires 64bit aligned values. If threadID isn't 64bit aligned, you could cause this problem.


Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.

In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.


Post the thread routine's code (or just a few lines).


It suddenly occurs to me: Are you sure that you're injecting your 64bit code into a 64bit process? Because if you had a 64bit CreateThread call and tried to inject that into a 32bit process running under WOW64, bad things could happen.


Starting to seriously run out of ideas. Does the compiler report any warnings?


Could the bug be due to a bug in the host program, rather than the DLL? There's some other code that runs before main/WinMain, such as loading a DLL if you used __declspec(dllimport/dllexport). That DLL's DllMain, for example, could have a bug in it.

Fillian answered 15/6, 2010 at 15:11 Comment(9)
I've just re-tested this. On one of the calls threadID is 64 bit aligned. It still fails. threadID should not be an issue anyway as it is a DWORD value and the parameter pointing to it is a pointer (a 64 bit value).Chaudoin
Alignment has a minimum of sizeof, but in some scenarios it must be extended. Many threading values must have an alignment of (address width). msdn.microsoft.com/en-us/library/ms684122(v=VS.85).aspx Edit: Didn't see that you re-tested it.Fillian
@Stephen - for a quick test try passing NULL for the thread ID pointer and see if the problem goes away.Mastin
If threadID param is set to NULL rather than the address of threadID it still fails.Chaudoin
myParam is valid. Absolutely sure. It's valid for the lifetime of the target application. It can't go out of scope or be deleted. Before main/WinMain? I don't know. It's valid for 32 bit and I see no logical reason why it would not be for 64 bit. The WINAPI is letting me do more dangerous things than CreateThread without problems (such as loading my DLL to do the work I need to do). Thus I doubt that is the issue. I do wonder if Windows has added some extra security checks and it is those that are failing for me.Chaudoin
No point posting the thread routine's code - it never gets called, hence why I'm trying to find out why the CreateThread() call fails. The routine will only ever get called if it succeeds. All functions are aligned 64 bits by the compiler, so that isn't the problem (and I have checked that in the debugger).Chaudoin
@DeadMG. Yes I am injecting into a 64 bit process. You can't load a 64 bit DLL into a 32 bit process. Anyway, my injector checks the type of PE file beforehand and will refuse if you offer it a process that is 32 bit.Chaudoin
@DeadMG. No compiler warnings. The same code works if called later on in the program lifetime.Chaudoin
Have you actually run a 64 bit parallel program successfully under Windows? Apparently not, otherwise you wouldn't start with "the only reason .. would .." One reason could be that Microsoft uses a different mechanism in 64 bit and doesn't document it properly. I think this is more speculation than an answer. GroetjesDoubletime
U
0

I ran into this issue today, and I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).

API Monitor Screenshot


So, where does STATUS_DATATYPE_MISALIGNMENT come from?

The first few lines of NtCreateThread validate parameters passed from user mode.

ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);

for i386

#define CONTEXT_ALIGN   (sizeof(ULONG))

for amd64

#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN

On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.

CreateThread (actually CreateRemoteThread) allocates ThreadContext on the stack and does nothing special to guarantee that the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which was unfortunately not true for me.
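
As a rough user-mode model of the alignment part of that probe, consider the sketch below. The function sketch_probe_alignment and the SKETCH_ constants are hypothetical names for illustration only; the real ProbeForReadSmallStructure runs in the kernel and also validates the address range, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STATUS_SUCCESS               0x00000000u
#define SKETCH_STATUS_DATATYPE_MISALIGNMENT 0x80000002u
#define SKETCH_CONTEXT_ALIGN                16u /* amd64 value quoted above */

/* Hypothetical model of the check: a pointer that is not a multiple of
 * `alignment` is rejected with STATUS_DATATYPE_MISALIGNMENT. */
uint32_t sketch_probe_alignment(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (((uintptr_t)ptr & (alignment - 1)) != 0)
        return SKETCH_STATUS_DATATYPE_MISALIGNMENT;
    return SKETCH_STATUS_SUCCESS;
}
```

Because ThreadContext is a stack local, its address inherits whatever misalignment the caller's stack had: a CONTEXT at an address ending in 8 fails this check even though every argument visible in API Monitor looks fine.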

PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.

Uniformitarian answered 27/8, 2020 at 9:42 Comment(0)
L
0

I had the same problem (maybe): CreateThread returned zero and GetLastError gave 0x3E6 (ERROR_NOACCESS). But it would work if I used it like this:

Invoke Win64.Code.CreateThread(0, 0, PrintSomeThing.Address, 0, 0, ThreadID)

As soon as I tried to put something in the 4th and 5th parameters, it would not work. So I stumbled on this post, but I was confused about exactly what the final solution was, and my stack was always 16 byte aligned.

After much trial and error, I found that the 1st parameter was the troublemaker (for me). It is what I call a ByRef Immediate. The compiler that I am writing has growing pains, and this was one of them: it generated a location on the stack to write a zero, then passed that address into the CreateThread function, and that stack address was NOT 16 byte aligned. When I finally supplied a 16 byte aligned address as the 1st parameter, it worked.

As far as other programming languages are concerned, I don't know what the address of a ByRef 0 is. But that might be something to look at...

Luxuriance answered 22/1, 2024 at 0:2 Comment(0)
D
-1

I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32 bit Windows. I set up the stack for my calculation, well within the area reserved for my program. All relevant data about areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed this routine finds the address to return to on the stack, then the address of the data structure. There is no reason to go far into this. It just works and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads or in 20 threads.

Now CreateThread in 64 bits doesn't push the address of the data structure. That seems implausible, so I show you the smoking gun, a dump of a debug session.

[Image: dump of a debug session]

In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes. The mechanism I use to fill in parameters is portable between 32 and 64 bits. No other call exhibits a difference between word-sizes. Moreover why would the code address work but not the data address?

The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++, that would be very nasty.

P.S. No there are no 16 byte alignment problems. That lies ages behind me.

Doubletime answered 3/9, 2018 at 19:44 Comment(0)
T
-2

Try using _beginthread() or _beginthreadex() instead, you shouldn't be using CreateThread directly.

See this previous question.

Timeous answered 24/7, 2010 at 19:49 Comment(4)
Chris, this is the wrong answer. I know whether I should be using CreateThread or _beginthreadex(), and the answer is CreateThread(). If you read the question, you will see that eventually I answered the question itself (stack misalignment). Finally, calling _beginthread() would still fail because, as I indicate in the answer, the stack is misaligned (and would still be misaligned by the time the call to CreateThread is made from the call to _beginthread()).Chaudoin
Post your solution comment and set it as an answer then, so people can actually find it (and my comment still holds: don't use CreateThread, you can google why ;-)). Many thanks.Timeous
"You shouldn't". You can't use that unqualified. CreateThread is a primitive call that is well documented by Microsoft to use in any language. _beginthread() is nice to the C run-time library. To implement non-C stuff it is merely cruft and ends up calling CreateThread under the hood.Doubletime
I ran into this issue, _beginthread just returns EINVAL without any detail. This answer is not helpful.Uniformitarian
