How can you find the processor number a thread is running on?
H

5

16

I have a memory heap manager which partitions the heap into different segments based on the number of processors on the system. Memory can only be allocated on the partition that goes with the currently running thread's processor. I believe this will let different processors keep running even when two of them want to allocate memory at the same time.

I have found the function GetCurrentProcessorNumber() for Windows, but this only works on Windows Vista and later. Is there a method that works on Windows XP?

Also, can this be done with pthreads on a POSIX system?

Horsewhip answered 6/2, 2010 at 22:55 Comment(1)
You are aware that most OSs will schedule the same thread on different cores over time?Evalynevan
D
8

For XP, a quick Google search revealed this:

https://www.cs.tcd.ie/Jeremy.Jones/GetCurrentProcessorNumberXP.htm Does this help?

Dennisedennison answered 6/2, 2010 at 22:59 Comment(4)
Yes, thank you. This appears to work on both Linux and Windows, as long as it is running on an x86 platform.Horsewhip
@Patrick I don't think this works on Linux, just XP in that form anyway.Dennisedennison
The assembly language itself is not dependent upon the operating systems. As for the difference between _asm, __asm__, asm, etc. on different platforms and compilers, that I can deal with.Horsewhip
@Patrick Ok yep, have just looked it up in the assembly docs, it is an actual instruction, not an API call like I first thought... works fine for me on x86-64 Linux too!Dennisedennison
L
9

From output of man sched_getcpu:

NAME
       sched_getcpu - determine CPU on which the calling thread is running

SYNOPSIS
       #define _GNU_SOURCE
       #include <utmpx.h>

       int sched_getcpu(void);

DESCRIPTION
       sched_getcpu() returns the number of the CPU
       on which the calling thread is currently executing.

RETURN VALUE
       On success, sched_getcpu() returns a non-negative CPU number.
       On error, -1 is returned and errno is set to indicate the error.

SEE ALSO
       getcpu(2)

Unfortunately, this is Linux specific. I doubt there is a portable way to do this.
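Since there is no portable call, one option is to hide the per-platform functions mentioned in this thread behind a small wrapper. This is only a sketch: `current_cpu` and `pick_partition` are made-up names, and it assumes a glibc recent enough to declare `sched_getcpu()` in `<sched.h>` (which is what current man pages list).

```cpp
// Sketch only: per-platform wrapper around the calls named in this thread.
#if defined(_WIN32)
#  include <windows.h>   // GetCurrentProcessorNumber (Vista and later)
#elif defined(__linux__)
#  ifndef _GNU_SOURCE
#    define _GNU_SOURCE  // needed for sched_getcpu when not already defined
#  endif
#  include <sched.h>     // sched_getcpu
#endif
#include <cstddef>

// Returns the CPU the calling thread is on right now, or -1 if unknown.
// The value can be stale as soon as it is returned, because the
// scheduler may migrate the thread at any time.
inline int current_cpu() {
#if defined(_WIN32)
    return static_cast<int>(GetCurrentProcessorNumber());
#elif defined(__linux__)
    return sched_getcpu();   // -1 on error
#else
    return -1;               // no known call on this platform
#endif
}

// Hypothetical helper: map the CPU number onto one of `partitions` heaps.
inline std::size_t pick_partition(std::size_t partitions) {
    int cpu = current_cpu();
    if (cpu < 0 || partitions == 0) return 0;   // fall back to partition 0
    return static_cast<std::size_t>(cpu) % partitions;
}
```

Treat the result as a hint for spreading load, not as a guarantee about where the thread runs afterwards.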

Lodestone answered 6/2, 2010 at 23:10 Comment(2)
A quick perusal of the pthread documentation doesn't reveal any calls in the pthread API that do this.Recaption
Thanks Ilia. Though this only works on Linux, it is a nice and clean function call. If/when I need to port to another kernel, I can just change this function call to a modified version of the assembler above.Horsewhip
M
3

In addition to Antony Vennard's answer and the code on the cited site, here is code that will work for Visual C++ x64 as well (no inline assembler):

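#include <windows.h>
#include <intrin.h>   // __cpuid

DWORD GetCurrentProcessorNumberXP() {
   int CPUInfo[4];   // filled with EAX, EBX, ECX, EDX
   __cpuid(CPUInfo, 1);
   // CPUInfo[3] is EDX; bit 9 is set when an on-chip APIC is present
   if ((CPUInfo[3] & (1 << 9)) == 0) return (DWORD)-1;  // no APIC on chip
   // CPUInfo[1] is EBX; bits 24-31 are the APIC ID
   return (unsigned)CPUInfo[1] >> 24;
}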

A short look at the implementation of GetCurrentProcessorNumber() on Win7 x64 shows that it uses a different mechanism to get the processor number, but in my (few) tests the results were the same for my home-brewed function and the official one.
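For GCC or Clang, the same CPUID read can be sketched with the `__get_cpuid` helper from `<cpuid.h>` instead of the MSVC intrinsic. The function name `apic_id_from_cpuid` and the `SKETCH_HAVE_CPUID` macro are made up for this example, and it returns -1 on anything that is not x86. Keep in mind that an APIC ID is not guaranteed to match the processor number the OS uses.

```cpp
// Sketch only: GCC/Clang counterpart of the CPUID approach, x86 only.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  include <cpuid.h>           // __get_cpuid, shipped with GCC and Clang
#  define SKETCH_HAVE_CPUID 1  // hypothetical macro for this example
#endif

// Returns the initial APIC ID (0..255) of the core executing the call,
// or -1 when CPUID leaf 1 is unavailable or no on-chip APIC is reported.
inline int apic_id_from_cpuid() {
#if defined(SKETCH_HAVE_CPUID)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return -1;
    if ((edx & (1u << 9)) == 0) return -1;   // EDX bit 9: APIC present
    return static_cast<int>(ebx >> 24);      // EBX bits 24-31: APIC ID
#else
    return -1;                               // not an x86 build
#endif
}
```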

Mcniel answered 18/9, 2013 at 7:53 Comment(1)
CPUID is a serializing and extremely expensive instruction (think 1000 cycles). It is certainly not a suitable choice for the purpose discussed here. Picking a heap at random would be better, assuming you don't spend 1000 cycles in the number generator :-)Loot
T
1

This design smells bad to me. You seem to be making the assumption that a thread will stay associated with a specific CPU. That is not guaranteed. Yes, a thread may normally stay on a single CPU, but it doesn't have to, and eventually your program will have a thread that switches CPUs. It may not happen often, but eventually it will. If your design doesn't take this into account, then you will most likely hit some sort of hard-to-trace bug sooner or later.

Let me ask this question: what happens if memory is allocated on one CPU and freed on another? How will your heap handle that?

Tejada answered 6/2, 2010 at 23:59 Comment(4)
The freeing processor does not matter. In each block, I save a pointer to the correct partition. I just call the function once per allocation, so this is not a problem. While it is true that the current thread may change processors, this also would not result in any problems with my design (in theory :P). The heap itself is still a locked heap. So, if two different threads want to allocate on the same partition, one will be locked until the other finishes. This design just minimises the chance that one processor will lock the execution of another.Horsewhip
The problem is presumably that a thread might migrate while allocating memory. This may cause a thread to determine it runs on CPU #0, get a pointer to heap #0, then migrate to CPU #1, then try to allocate from heap #0.Robins
That is fine. My heap is a locked heap itself, so even without this processor number black magic, it would work fine. I am optimising it so as not to lock out other processors that could be doing something more useful. So in the case both of you pointed out, another processor will be locked out of allocation. The main point of my design, though, is that this is less likely to happen, so it is worth the effort.Horsewhip
The design is perfectly fine, it just needs to assume the memory is shared (i.e. accessing it via CAS) while in fact it would almost always be exclusive. Hence, no shared writes, and the algorithm scales perfectly fine.Rattrap
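The scheme described in these comments (pick a partition once per allocation, keep a lock per partition, and store a pointer to the owning partition in each block) could look roughly like the sketch below. It is not the asker's actual code: `Partition`, `BlockHeader`, `partition_alloc`, `partition_free` and the fixed count `kPartitions` are all hypothetical, and `malloc` stands in for carving memory out of a real segment.

```cpp
// Sketch only: locked partitions with an owner pointer stored per block.
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>

struct Partition {
    std::mutex lock;              // each partition is still a locked heap
    std::size_t live_blocks = 0;  // bookkeeping for the example
};

// Header placed in front of every block; aligned so that the user
// pointer that follows it stays suitably aligned.
struct alignas(std::max_align_t) BlockHeader {
    Partition* owner;             // recorded once, at allocation time
};

constexpr std::size_t kPartitions = 4;   // assumption: fixed count
static Partition g_partitions[kPartitions];

// `index` would come from the current processor number, read once.
void* partition_alloc(std::size_t index, std::size_t size) {
    Partition& p = g_partitions[index % kPartitions];
    std::lock_guard<std::mutex> guard(p.lock);
    // Stand-in for taking memory from the partition's own segment.
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (!raw) return nullptr;
    BlockHeader* hdr = new (raw) BlockHeader{&p};
    ++p.live_blocks;
    return hdr + 1;               // user memory starts after the header
}

void partition_free(void* ptr) {
    if (!ptr) return;
    BlockHeader* hdr = static_cast<BlockHeader*>(ptr) - 1;
    Partition& p = *hdr->owner;   // taken from the block, not the current CPU
    std::lock_guard<std::mutex> guard(p.lock);
    --p.live_blocks;
    std::free(hdr);
}
```

Because the index is read once and the lock is taken afterwards, a thread that migrates in between only causes extra contention on that partition, not corruption.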
S
1

If all you want to do is avoid contention, you don't need to know the current CPU. You could just randomly pick a heap. Or you could have a heap per thread. Although you may get more or less contention that way, you would avoid the overhead of polling the current CPU, which may or may not be significant. Also check out Intel Threading Building Blocks' scalable_allocator, which may have already solved that problem better than you will.
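One way to sketch the heap-per-thread idea without querying the CPU at all is to hand each thread a fixed heap index the first time it allocates. The names `kHeaps` and `heap_for_this_thread` are made up for this example, and round-robin assignment is just one possible policy.

```cpp
// Sketch only: give every thread a sticky heap index, no CPU query.
#include <atomic>
#include <cstddef>

constexpr std::size_t kHeaps = 8;   // assumption: number of heaps

inline std::size_t heap_for_this_thread() {
    // Shared counter, bumped once per thread on its first call.
    static std::atomic<std::size_t> next{0};
    // Each thread keeps the index it was handed for its whole lifetime.
    thread_local const std::size_t mine =
        next.fetch_add(1, std::memory_order_relaxed) % kHeaps;
    return mine;
}
```

After the first call this costs a thread-local read, which should be cheaper than polling the processor number, though two threads can still land on the same heap once there are more threads than heaps.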

Stanleigh answered 30/4, 2012 at 14:31 Comment(0)
