Thread affinity with Windows, MSVC, and OpenMP

I want to bind the threads in my code to each physical core. With GCC I have successfully done this using sched_setaffinity so I no longer have to set export OMP_PROC_BIND=true. I want to do the same thing in Windows with MSVC. Windows and Linux using a different thread topology. Linux scatters the threads while windows uses a compact form. In other words in Linux with four cores and eight hyper-threads I only need to bind the threads to the first four processing units. In windows I set them to to every other processing unit.

I have successfully done this using SetProcessAffinityMask. I can see from Windows Task Manger when I right click on the processes and click "Set Affinity" that every other CPU is set (0, 2, 4, 6 on my eight hyper thread system). The problem is that the efficiency of my code is unstable when I run. Sometimes it's nearly constant but most of the time it has big changes. I changed the priority to high but it makes no difference. In Linux the efficiency is stable. Maybe Windows is still migrating the threads? Is there something else I need to do to bind the threads in Windows?

Here is the code I'm using

#ifdef _WIN32   
HANDLE process;
DWORD_PTR processAffinityMask = 0;
//Windows uses a compact thread topology.  Set mask to every other thread
for(int i=0; i<ncores; i++) processAffinityMask |= 1<<(2*i);        
//processAffinityMask = 0x55;
process = GetCurrentProcess();
SetProcessAffinityMask(process, processAffinityMask);
#else
cpu_set_t  mask;
CPU_ZERO(&mask);
for(int i=0; i<ncores; i++) CPU_SET(i, &mask);      
sched_setaffinity(0, sizeof(mask), &mask);       
#endif

Edit: here is the code I used now which seems to be stable on Linux and Windows

    #ifdef _WIN32   
    HANDLE process;
    DWORD_PTR processAffinityMask;
    //Windows uses a compact thread topology.  Set mask to every other thread
    for(int i=0; i<ncores; i++) processAffinityMask |= 1<<(2*i);
    process = GetCurrentProcess();
    SetProcessAffinityMask(process, processAffinityMask);
    #pragma omp parallel 
    {
        HANDLE thread = GetCurrentThread();
        DWORD_PTR threadAffinityMask = 1<<(2*omp_get_thread_num());
        SetThreadAffinityMask(thread, threadAffinityMask);
    }
    #else
    cpu_set_t  mask;
    CPU_ZERO(&mask);
    for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    #pragma omp parallel 
    {
       cpu_set_t  mask;
       CPU_ZERO(&mask);
       CPU_SET(omp_get_thread_num(),&mask);
       pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask); 
    }
    #endif

Recommended topics

Hot tags