Thread-local storage in kernel mode?

Is there a Thread-Local Storage (TLS) equivalent for kernel-mode drivers in Windows (Win32 to be exact)?

What I'm trying to achieve:

My driver's dispatch routine may end up calling many other functions (the call stack can get deep). I want to supply some context information specific to the request being processed. That is, I have a structure, and a pointer to it should be visible in all the called functions without being passed explicitly as a parameter to each of them.

Using a static/global variable is not a perfect option (multithreading, synchronization objects, etc.).

If this were user-mode code, one would obviously use TLS in such a situation. But AFAIK there are no kernel-mode functions like TlsGetValue/TlsSetValue. And this makes sense: for those functions to work, one has to allocate a process-wide TLS index first. OTOH driver code may be invoked on an arbitrary thread, not limited to a specific process.

However I don't actually need a persistent thread-specific storage. I just need a thread-specific storage for my top-level function invocation.

I think I know how to "implement" the TLS, though in a hackish way. Instead of allocating the TLS index I will always use a predefined index (say, index=0). At the top-level function I'll save the stored TLS value, and overwrite it with the needed value. Upon completion the saved value will be restored.
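
The save/overwrite/restore idea can be sketched independently of where the slot actually lives. The following is only a user-mode model, assuming a C++ `thread_local` variable as a stand-in for the fixed TLS slot (kernel code has no such facility); the names `g_request_slot`, `ScopedRequestContext` and `CurrentRequestContext` are made up for illustration.

```cpp
#include <cassert>

// Stand-in for the fixed TLS slot (index 0 in the text above).
// Hypothetical: a real driver cannot use thread_local.
thread_local void* g_request_slot = nullptr;

// Saves the slot's current value, installs a new one,
// and restores the saved value when the scope ends.
class ScopedRequestContext {
public:
    explicit ScopedRequestContext(void* ctx) : saved_(g_request_slot) {
        g_request_slot = ctx;
    }
    ~ScopedRequestContext() { g_request_slot = saved_; }

    ScopedRequestContext(const ScopedRequestContext&) = delete;
    ScopedRequestContext& operator=(const ScopedRequestContext&) = delete;

private:
    void* saved_;  // value that was in the slot before this scope
};

// Functions deep in the call stack (destructors included) read the slot
// instead of receiving the context as a parameter.
inline void* CurrentRequestContext() { return g_request_slot; }

// Usage: nested top-level calls put the outer value back on the way out.
inline void Demo() {
    int outer = 1, inner = 2;
    ScopedRequestContext a(&outer);
    {
        ScopedRequestContext b(&inner);
        assert(CurrentRequestContext() == &inner);
    }
    assert(CurrentRequestContext() == &outer);
}
```

Because the previous value is restored on scope exit, nested invocations on the same thread behave like a stack. This does not help if code outside the scope reads the slot while it is overwritten, which is the re-entrance concern raised below.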

Luckily I know how TLS is implemented in Win32. There's a TIB structure (thread information block) for each thread. Each thread can reach its own TIB through FS:[18h]. The TIB contains (among other things) an array used by TLS. The rest is pretty straightforward.

However I'd prefer to use an official API to achieve something similar.

  • Is there an official kernel-mode API to achieve what I need?
  • Are there reasons to avoid what I'm planning to do? I know there may potentially be a problem with re-entrance (i.e. some code invokes me, I overwrite the TLS value and then eventually call back into the originating code, which may rely on the TLS). But this is not possible in my specific case.
  • Are there less dirty ways to solve this?

Thanks in advance.

P.S. One may theoretically use SEH (which also has per-thread information stored). That is, wrap the top-level code in __try/__except, then, where the context information is needed, raise a continuable exception with some parameter, fill the parameter with the context information in the __except block, and then resume execution. This is a 100% valid program flow that uses no undocumented features. Nevertheless it seems an ugly hack to me, not to mention the performance implications.

Melody answered 20/3, 2012 at 23:38 Comment(0)
B
8

Rather than using FS:[18h] you should probably use PsGetCurrentThreadTeb. Even then, I think you'd be relying on details that might change in future OS releases (potentially including service packs).

Instead, couldn't you use KeGetCurrentProcessorNumber as an index into an array where you can store a pointer to your context information? (Provided that you're running at DISPATCH_LEVEL or higher, of course, so that you can't be switched to a different processor unexpectedly.)

If you're not guaranteed to be running at DISPATCH_LEVEL, you could use a table or linked list, with each entry (representing a thread that is currently running your code) labelled with the value of PsGetCurrentThread.
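
A minimal sketch of such a thread-keyed table follows. It is a portable user-mode model, not driver code: `std::this_thread::get_id()` stands in for PsGetCurrentThread, an atomic flag stands in for a kernel spin lock, and the table size and all function names are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Stand-in for the value returned by PsGetCurrentThread().
using ThreadKey = std::thread::id;
inline ThreadKey CurrentThreadKey() { return std::this_thread::get_id(); }

struct Slot {
    ThreadKey owner;
    void* context = nullptr;
    bool used = false;
};

constexpr std::size_t kMaxSlots = 64;   // assumed upper bound on concurrent callers
Slot g_slots[kMaxSlots];
std::atomic<bool> g_lock{false};        // stand-in for a kernel spin lock

struct SpinGuard {
    SpinGuard() { while (g_lock.exchange(true, std::memory_order_acquire)) {} }
    ~SpinGuard() { g_lock.store(false, std::memory_order_release); }
};

// Registers (or updates) the context for the calling thread.
// Returns false if the table is full.
bool SetThreadContext(void* ctx) {
    const ThreadKey me = CurrentThreadKey();
    SpinGuard guard;
    Slot* free_slot = nullptr;
    for (Slot& s : g_slots) {
        if (s.used && s.owner == me) { s.context = ctx; return true; }
        if (!s.used && free_slot == nullptr) free_slot = &s;
    }
    if (free_slot == nullptr) return false;
    free_slot->owner = me;
    free_slot->context = ctx;
    free_slot->used = true;
    return true;
}

// Returns the calling thread's context, or nullptr if none is registered.
void* GetThreadContext() {
    const ThreadKey me = CurrentThreadKey();
    SpinGuard guard;
    for (const Slot& s : g_slots) {
        if (s.used && s.owner == me) return s.context;
    }
    return nullptr;
}

// Removes the calling thread's entry, if any.
void ClearThreadContext() {
    const ThreadKey me = CurrentThreadKey();
    SpinGuard guard;
    for (Slot& s : g_slots) {
        if (s.used && s.owner == me) { s.used = false; s.context = nullptr; return; }
    }
}

// Ties registration to a scope so the entry is removed on every exit path.
struct ScopedThreadContext {
    explicit ScopedThreadContext(void* ctx) {
        const bool ok = SetThreadContext(ctx);
        assert(ok);
        (void)ok;
    }
    ~ScopedThreadContext() { ClearThreadContext(); }
};
```

The linear scan keeps the model short; every lookup costs a lock acquisition plus a scan, which is the overhead a per-processor array avoids.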

Breakthrough answered 21/3, 2012 at 1:4 Comment(6)
Excellent answer, very precise. I agree that relying on a particular undocumented feature is not good practice. However I find it extremely hard to imagine that the TIB layout or the way it's accessed may change. In addition, drivers have less strict portability demands IMHO. Using a global array and accessing it via KeGetCurrentProcessorNumber is a very good idea. In my particular case requests may arrive at arbitrary IRQL, and I'm not sure I'm allowed to raise it (I call other routines there), but it's definitely worth checking out.Melody
P.S. A "table of linked lists" is exactly a thing I'm trying to get rid of. Thanks a lot.Melody
Why do you find it hard to imagine the structure of the TIB changing? It must have changed for Win64, right? And it is probably different again for ARM. And even if the fields stay the same, what's to say the kernel doesn't reuse them for something else and blow away your data (or worse, you blow away its data).Barringer
@Stewart: You're not really into driver development, are you? Driver code is usually much less portable, you don't really expect the driver written for x86 to compile and work for ARM! And, believe it or not, digging into OS internals is also much more popular in driver development. And this is also the reason why it's difficult for MS to change layout of their internal structures, once they're hijacked.Melody
@Melody - No. Not really driver development. Just kernels. I still think that depending on the TIB is a bad idea, whether it changes or not. Do you really think that preventing Microsoft from changing things in the kernel by not following the rules is a good thing?Barringer
@Stewart: no, of course not. But sometimes you're forced to a compromise, choosing between two evils. Which one is worse - this depends on a particular case, there're no universal rules.Melody
I
5

Don't do this with the TEB! The TIB and TEB are user-mode structures. A user-mode application can modify them at will from another thread/processor while your driver is running. This would be a privilege escalation vulnerability in your driver.

I would recommend passing down a context structure for ephemeral context related to your request. If you need something more permanent, you could use an AVL table or a Hash table which you clean up when threads exit.

Interlining answered 6/5, 2012 at 17:7 Comment(1)
+1 for spotting the vulnerability. Note, however, that the OP already explained why he can't use a context structure (he needs the context from an object destructor, which can't be passed a parameter) and that he's already using a table and wants something more efficient.Breakthrough
O
4

You can create a structure that holds the incoming request, and you pass this around instead of the actual request, then you just put in any fields you need. Obviously this doesn't completely remove the need to pass an object around, but normally you are passing the request around anyway.

From most of the driver stuff I've seen (which admittedly isn't a huge amount) everything was always request centric. So they always tied things to the request instead of trying to keep it in some other location.
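
As a rough illustration of the request-centric approach, here is a small sketch. It assumes a made-up `Request` type in place of the real I/O request, and the fields in `RequestContext` are invented for the example; none of it is real driver API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the incoming I/O request.
struct Request {
    std::uint32_t major_function;
    std::uint32_t length;        // bytes the caller asked for
};

// Wrapper that is passed around instead of the bare request.
// Extra per-request state is added here as fields.
struct RequestContext {
    Request* request;
    std::uint32_t bytes_done;
    bool cancelled;
};

// A helper deep in the call chain: it needs only the wrapper.
void CopyChunk(RequestContext& rc, std::uint32_t chunk) {
    assert(rc.request != nullptr);
    const std::uint32_t remaining = rc.request->length - rc.bytes_done;
    rc.bytes_done += (chunk < remaining) ? chunk : remaining;
}

// Top-level routine: builds the wrapper on the stack and hands it down.
std::uint32_t Dispatch(Request* req) {
    RequestContext rc{req, 0, false};
    while (rc.bytes_done < req->length) {
        CopyChunk(rc, 4096);
    }
    return rc.bytes_done;
}
```

The wrapper lives on the dispatch routine's stack, so it needs no locking or cleanup, but every function that wants the context still has to take it as a parameter.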

Oatmeal answered 21/3, 2012 at 0:5 Comment(3)
My case is kinda more problematic. I use C++ (don't tell me I'm a pervert), and I have some code executed in object destructors, where I can't pass extra parameters.Melody
+1 for avoiding odd thingies like TLS and like horrors. So there's some context. It's just one object instance pointer. If there's an IO request, load it up with the context instance ptr as well as all the other gunge, just as you say.Samarskite
@Martin James: I have no idea why TLS is considered "odd". It's a very useful thing in user-mode code.Melody
