Thread affinity with Windows, MSVC, and OpenMP

I want to bind the threads in my code to each physical core. With GCC I have successfully done this using sched_setaffinity, so I no longer have to set export OMP_PROC_BIND=true. I want to do the same thing in Windows with MSVC. Windows and Linux use different thread topologies: Linux scatters the threads while Windows uses a compact form. In other words, in Linux with four cores and eight hyper-threads I only need to bind the threads to the first four processing units. In Windows I bind them to every other processing unit.

I have successfully done this using SetProcessAffinityMask. I can see from Windows Task Manager, when I right-click on the process and click "Set Affinity", that every other CPU is set (0, 2, 4, 6 on my eight hyper-thread system). The problem is that the efficiency of my code is unstable from run to run. Sometimes it's nearly constant, but most of the time it fluctuates widely. I changed the priority to high but it makes no difference. In Linux the efficiency is stable. Maybe Windows is still migrating the threads? Is there something else I need to do to bind the threads in Windows?

Here is the code I'm using

#ifdef _WIN32   
HANDLE process;
DWORD_PTR processAffinityMask = 0;
//Windows uses a compact thread topology.  Set mask to every other thread
for(int i=0; i<ncores; i++) processAffinityMask |= 1<<(2*i);        
//processAffinityMask = 0x55;
process = GetCurrentProcess();
SetProcessAffinityMask(process, processAffinityMask);
#else
cpu_set_t  mask;
CPU_ZERO(&mask);
for(int i=0; i<ncores; i++) CPU_SET(i, &mask);      
sched_setaffinity(0, sizeof(mask), &mask);       
#endif
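
As a side note on the mask construction above, here is a minimal sketch of the "every other logical processor" mask as a standalone function. The name every_other_mask and the 64-bit return type are assumptions for illustration, not part of the original code. It shifts a 64-bit one rather than the int literal 1, since with a 32-bit int the shift in the original stops being valid once 2*i reaches 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: set bits 0, 2, 4, ... for `ncores` physical cores,
// assuming two hyper-threads per core laid out next to each other.
std::uint64_t every_other_mask(int ncores) {
    assert(ncores >= 0 && ncores <= 32);  // 2*(ncores-1) must stay below 64
    std::uint64_t mask = 0;
    for (int i = 0; i < ncores; ++i) {
        mask |= std::uint64_t{1} << (2 * i);  // 64-bit shift, not int
    }
    return mask;
}
```

For four cores this gives 0x55, the value in the commented-out line above. On Windows the result would be cast to DWORD_PTR before being passed to SetProcessAffinityMask.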

Edit: here is the code I use now, which seems to be stable on Linux and Windows

    #ifdef _WIN32   
    HANDLE process;
    DWORD_PTR processAffinityMask = 0;  // must start at zero before bits are OR-ed in
    //Windows uses a compact thread topology.  Set mask to every other thread
    for(int i=0; i<ncores; i++) processAffinityMask |= (DWORD_PTR)1<<(2*i);
    process = GetCurrentProcess();
    SetProcessAffinityMask(process, processAffinityMask);
    #pragma omp parallel 
    {
        HANDLE thread = GetCurrentThread();
        DWORD_PTR threadAffinityMask = (DWORD_PTR)1<<(2*omp_get_thread_num());
        SetThreadAffinityMask(thread, threadAffinityMask);
    }
    #else
    cpu_set_t  mask;
    CPU_ZERO(&mask);
    for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    #pragma omp parallel 
    {
       cpu_set_t  mask;
       CPU_ZERO(&mask);
       CPU_SET(omp_get_thread_num(),&mask);
       pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask); 
    }
    #endif
Strow answered 21/7, 2014 at 10:6 Comment(9)
On both platforms your code sets the process affinity mask and not the affinity mask for each individual thread, therefore the scheduler is still free to move the threads among the CPUs allowed by the process affinity mask.Snelling
@HristoIliev, I understand what you mean. Do you know how to get the thread handle for each thread created by OpenMP?Strow
Inside the parallel region use GetCurrentThread() to obtain the handle of the current thread and assign it an affinity mask with a single bit set based on the result from omp_get_thread_num().Snelling
@HristoIliev, you mean inside a parallel section? I tried that and it returns the same handle for each thread. I'll keep trying...Strow
I mean to call it from inside a parallel region so that all OpenMP threads make the call. It should not return the same thread handle in that case unless you are assigning to a shared variable.Snelling
@HristoIliev, I added some code to my question showing what I'm trying to do. The handle is private for each thread. It's always -2.Strow
I should read more carefully the pages that I link to :) GetCurrentThread returns a constant pseudo-handle which appears to be (HANDLE)(-2). SetThreadAffinityMask(GetCurrentThread(), ...) should work as expected.Snelling
@HristoIliev, okay, I think I got it now. It's stable currently but I need to test it a few more times. With Linux I guess I use pthread_setaffinity_np? That seems to work so far (and pthread_self() does return a different value for each thread in the parallel region).Strow
I am trying to do a similar thing, but I'm having difficulty compiling it on Linux. What should I include in the header? I have #define _GNU_SOURCE #include<sched.h>Confiture
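
Regarding the Linux side discussed in the comments above, the following is a minimal sketch of pinning the calling thread, with the headers it needs. It uses sched_setaffinity with pid 0, which on Linux applies to the calling thread, so inside a parallel region it should behave like the pthread_setaffinity_np call in the question (which additionally needs <pthread.h>). The function names pin_current_thread and first_allowed_cpu are hypothetical. _GNU_SOURCE generally has to be defined before the first system header is included; g++ usually defines it already.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes cpu_set_t, CPU_SET and sched_setaffinity
#endif
#include <sched.h>
#include <cassert>

// Pin the calling thread to one logical CPU.
// Returns 0 on success, -1 on failure (errno is set by the kernel call).
int pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);  // pid 0 = calling thread
}

// First CPU the calling thread is currently allowed to run on, or -1.
int first_allowed_cpu() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) return cpu;
    }
    return -1;
}
```

Inside `#pragma omp parallel`, each thread would call something like pin_current_thread(omp_get_thread_num()), assuming those CPU numbers are in the process's allowed set.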

You should use the SetThreadAffinityMask function (see MSDN reference). You are setting the process's mask.

You can obtain a thread ID in OpenMP with this code:

int tid = omp_get_thread_num();

However the code above provides OpenMP's internal thread ID, and not the system thread ID. This article explains more on the subject:

http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

If you need to work with those threads explicitly, use the explicit affinity type as explained in this Intel documentation:

https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm
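
To make the SetThreadAffinityMask suggestion concrete, here is a rough sketch of binding the calling thread. The helper names single_cpu_mask and bind_current_thread, and the stride parameter (2 for "every other logical processor"), are assumptions for illustration. The Windows call is guarded so the file also compiles elsewhere, where it simply reports failure.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Mask with exactly one bit set, for logical processor stride * thread_num.
std::uint64_t single_cpu_mask(int thread_num, int stride) {
    assert(thread_num >= 0 && stride > 0 && stride * thread_num < 64);
    return std::uint64_t{1} << (stride * thread_num);
}

// Bind the calling thread to a single logical processor.
// Returns true on success; on non-Windows builds it does nothing and returns false.
bool bind_current_thread(int thread_num, int stride) {
#ifdef _WIN32
    DWORD_PTR mask = static_cast<DWORD_PTR>(single_cpu_mask(thread_num, stride));
    // GetCurrentThread() returns a pseudo-handle that always refers to the caller.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)thread_num;
    (void)stride;
    return false;
#endif
}
```

The intended use is a call such as bind_current_thread(omp_get_thread_num(), 2) from inside a `#pragma omp parallel` region, so that every OpenMP thread binds itself.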

Girl answered 21/7, 2014 at 10:48 Comment(16)
Do you have some example code? I mean one which sets the mask for each thread?Strow
It is not much different from what you have -just use thread IDs instead of process ID when calling the function. You get the thread ID when you create a threadGirl
I understand that, but how do I loop through the threads? I mean how do I get the handle for each thread? I only know GetCurrentThread()Strow
The very last parameter to Windows's CreateThread() is a long pointer to a thread ID - that is the way to get the needed info for you. MSDN reference: msdn.microsoft.com/en-us/library/windows/desktop/…Girl
A cursory look on Google provides this link to OpenMP doc (computing.llnl.gov/tutorials/openMP) which lists this, under Run-time Library Routines: section: "Querying a thread's unique identifier (thread ID), a thread's ancestor's identifier, the thread team size"Girl
From the link above - this is how you get the thread ID: tid = omp_get_thread_num();Girl
I'm not using the Intel compiler otherwise I would use KMP_AFFINITY. So I still don't know how to get the Windows thread ID.Strow
There is probably a way to get the threads associated with a process and loop over them.Strow
msdn.microsoft.com/en-us/library/windows/desktop/…Strow
Microsoft's OpenMP implementation uses a system-managed thread pool. Enumerating and binding all process threads would affect some that do not belong to the thread pool (e.g. the manager thread, possibly hidden window message loop threads, etc.). I would instead perform the binding inside the parallel region.Snelling
BTW, there are lots of ways of creating bad affinity masks that cause problems. If you are going to set thread affinity manually, you should take a look at the Core Detection sample.Irretrievable
@ChuckWalbourn, thanks for the link! I agree with you. I'm hard-coding the topology right now. The topology is different for AMD and Intel and for Linux and Windows, so there are at least four permutations to handle. Additionally, the topology could change with different versions of the OS and maybe even hardware versions. So I need a better way to do this at some point. But at least I know how to bind the threads now in my code :-)Strow
There are some intensely bad assumptions people make when creating these affinity masks, so definitely look at the sample.Irretrievable
Found a mirror of the core detection sample here.Kalakalaazar
Oh. Guess that's your repo, @ChuckWalbourn :)Kalakalaazar
Yes, sorry the MSDN Code Gallery is offline these days... I posted it to GitHub.Irretrievable
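
As a follow-up to the comments about hard-coded topology, here is one possible sketch of deriving the mask from what the OS reports instead of assuming "every other processor". It is not the Core Detection sample itself. The helpers lowest_bit, one_per_core and query_core_masks are hypothetical names. The Windows part uses GetLogicalProcessorInformation, which only covers the current processor group (up to 64 logical processors).

```cpp
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Isolate the lowest set bit of a mask (0 stays 0).
std::uint64_t lowest_bit(std::uint64_t m) { return m & (~m + 1); }

// Given one mask per physical core (bits = that core's logical processors),
// build a mask containing one logical processor per core.
std::uint64_t one_per_core(const std::vector<std::uint64_t>& core_masks) {
    std::uint64_t out = 0;
    for (std::uint64_t m : core_masks) out |= lowest_bit(m);
    return out;
}

#ifdef _WIN32
// Ask Windows for the logical-processor mask of each physical core.
std::vector<std::uint64_t> query_core_masks() {
    std::vector<std::uint64_t> masks;
    DWORD len = 0;
    GetLogicalProcessorInformation(nullptr, &len);  // first call reports the size
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (info.empty() || !GetLogicalProcessorInformation(info.data(), &len)) {
        return masks;
    }
    const size_t count = len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (size_t i = 0; i < count && i < info.size(); ++i) {
        if (info[i].Relationship == RelationProcessorCore) {
            masks.push_back(info[i].ProcessorMask);
        }
    }
    return masks;
}
#endif
```

On a four-core machine with two adjacent hyper-threads per core, the per-core masks would be 0x3, 0xC, 0x30 and 0xC0, and one_per_core returns 0x55, matching the hard-coded value in the question.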
