I want to accurately pin my MPI processes to a list of (physical) cores. I refer to the following points of the mpirun --help output:
-cpu-set|--cpu-set <arg0>
Comma-separated list of ranges specifying logical
cpus allocated to this job [default: none]
...
-rf|--rankfile <arg0>
Provide a rankfile file
The topology of my processor is as follows:
-------------------------------------------------------------
CPU type: Intel Core Bloomfield processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets: 1
Cores per socket: 4
Threads per core: 2
-------------------------------------------------------------
HWThread Thread Core Socket
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 1 0 0
5 1 1 0
6 1 2 0
7 1 3 0
-------------------------------------------------------------
Socket 0: ( 0 4 1 5 2 6 3 7 )
-------------------------------------------------------------
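To make explicit how I read this table: hardware threads 0-3 sit on four distinct physical cores, and 4-7 are their Hyper-Threading siblings. The following Python sketch groups the rows by core. The helper names are my own and are not part of LIKWID or Open MPI, and whether mpirun numbers its logical CPUs the same way as likwid-topology is an assumption on my part.

```python
# Rows copied from the likwid-topology table above:
# (HWThread, Thread, Core, Socket)
ROWS = [
    (0, 0, 0, 0),
    (1, 0, 1, 0),
    (2, 0, 2, 0),
    (3, 0, 3, 0),
    (4, 1, 0, 0),
    (5, 1, 1, 0),
    (6, 1, 2, 0),
    (7, 1, 3, 0),
]


def siblings(rows):
    """Map each physical core (socket, core) to its hardware thread IDs."""
    groups = {}
    for hwthread, _thread, core, socket in rows:
        groups.setdefault((socket, core), []).append(hwthread)
    return {key: sorted(ids) for key, ids in groups.items()}


def first_hwthread_per_core(rows):
    """Pick the lowest-numbered hardware thread of every physical core."""
    groups = siblings(rows)
    return [groups[key][0] for key in sorted(groups)]


if __name__ == "__main__":
    print(siblings(ROWS))
    print(first_hwthread_per_core(ROWS))  # [0, 1, 2, 3]
```

So pinning two ranks to 0 and 1 should, by this reading, place them on two different physical cores rather than on two threads of the same core.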
Now, if I start my program using mpirun -np 2 --cpu-set 0,1 --report-bindings ./solver, the program starts normally but does not take the --cpu-set argument into account. On the other hand, starting my program with mpirun -np 2 --rankfile rankfile --report-bindings ./solver gives me the following output:
[neptun:14781] [[16333,0],0] odls:default:fork binding child [[16333,1],0] to slot_list 0
[neptun:14781] [[16333,0],0] odls:default:fork binding child [[16333,1],1] to slot_list 1
Indeed, checking with top shows me that mpirun actually uses the specified cores. But how should I interpret this output? Apart from the host (neptun) and the specified slots (0,1), I cannot make sense of it. The same goes for the other commands I tried:
$ mpirun --np 2 --bind-to-core --report-bindings ./solver
[neptun:15166] [[15694,0],0] odls:default:fork binding child [[15694,1],0] to cpus 0001
[neptun:15166] [[15694,0],0] odls:default:fork binding child [[15694,1],1] to cpus 0002
and
$ mpirun --np 2 --bind-to-socket --report-bindings ./solver
[neptun:15188] [[15652,0],0] odls:default:fork binding child [[15652,1],0] to socket 0 cpus 000f
[neptun:15188] [[15652,0],0] odls:default:fork binding child [[15652,1],1] to socket 0 cpus 000f
With --bind-to-core, top once again shows me that cores 0 and 1 are used, but why does the output say cpus 0001 and 0002? --bind-to-socket causes even more confusion: why is it 000f for both ranks?
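My working guess is that the cpus field is a hexadecimal bitmask in which bit i stands for logical CPU i. I have not confirmed this against the Open MPI documentation, so treat it as an assumption. A minimal Python sketch that decodes the three values above under that assumption (the function name cpus_from_mask is my own):

```python
def cpus_from_mask(mask_hex):
    """Return the indices of the bits set in a hexadecimal mask string.

    Assumption: bit i of the mask corresponds to logical CPU i.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    for field in ("0001", "0002", "000f"):
        print(field, "->", cpus_from_mask(field))
    # 0001 -> [0]
    # 0002 -> [1]
    # 000f -> [0, 1, 2, 3]
```

If that reading is right, 0001 and 0002 would simply be CPUs 0 and 1, and 000f would mean that both ranks may run on any of CPUs 0-3.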
I use the last paragraph to summarize the questions that arose from my experiments:
- Why isn't my --cpu-set option working?
- How am I supposed to interpret the output produced by --report-bindings?
References
The CPU topology was read out using the LIKWID Performance Tools, more precisely using likwid-topology.
LIKWID is licensed under the GPL-3.0 license, see their GitHub for more info.