I'm not sure how to use the physical IDs that I've retrieved from hwloc. For example, I see the following, but if I place rank 0=10.0.2.4 slot=1:8 in the rankfile I get an error that 8 does not exist. However, rank 0=10.0.2.4 slot=1:p8 runs with no problem, but I don't know whether I've referenced PU P#8 or Core P#8. How do I bind to either a specific core or a specific hardware thread using the rankfile? Is there some way that I can debug this?
[hamiltont@4 latency]$ hwloc-ls -p
Machine (36GB)
  NUMANode P#0 (18GB) + Socket P#1 + L3 (12MB)
    L2 (256KB) + L1 (32KB) + Core P#0
      PU P#0
      PU P#12
    L2 (256KB) + L1 (32KB) + Core P#1
      PU P#2
      PU P#14
    L2 (256KB) + L1 (32KB) + Core P#2
      PU P#4
      PU P#16
    L2 (256KB) + L1 (32KB) + Core P#8
      PU P#6
      PU P#18
    L2 (256KB) + L1 (32KB) + Core P#9
      PU P#8
      PU P#20
    L2 (256KB) + L1 (32KB) + Core P#10
      PU P#10
      PU P#22
  NUMANode P#1 (18GB) + Socket P#0 + L3 (12MB)
    L2 (256KB) + L1 (32KB) + Core P#0
      PU P#1
      PU P#13
    L2 (256KB) + L1 (32KB) + Core P#1
      PU P#3
      PU P#15
    L2 (256KB) + L1 (32KB) + Core P#2
      PU P#5
      PU P#17
    L2 (256KB) + L1 (32KB) + Core P#8
      PU P#7
      PU P#19
    L2 (256KB) + L1 (32KB) + Core P#9
      PU P#9
      PU P#21
    L2 (256KB) + L1 (32KB) + Core P#10
      PU P#11
      PU P#23
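To make the ambiguity concrete, here is a rough sketch of how the listing above could be turned into a lookup table from PU physical ID to the socket and core that contain it. The function name map_pus_to_cores and the SAMPLE excerpt are my own illustration, not part of hwloc or Open MPI; the sketch only assumes the "Socket P#n", "Core P#n" and "PU P#n" labels that hwloc-ls -p prints above.

```python
import re


def map_pus_to_cores(hwloc_text):
    """Return {pu_physical_id: (socket_physical_id, core_physical_id)}.

    Hypothetical helper: walks the text output of `hwloc-ls -p` top to
    bottom and remembers the most recent Socket and Core seen.
    """
    mapping = {}
    socket = core = None
    for line in hwloc_text.splitlines():
        m = re.search(r"Socket P#(\d+)", line)
        if m:
            socket = int(m.group(1))
        m = re.search(r"Core P#(\d+)", line)
        if m:
            core = int(m.group(1))
        m = re.search(r"PU P#(\d+)", line)
        if m:
            mapping[int(m.group(1))] = (socket, core)
    return mapping


# Excerpt copied from the listing above (indentation does not matter here).
SAMPLE = """\
NUMANode P#0 (18GB) + Socket P#1 + L3 (12MB)
L2 (256KB) + L1 (32KB) + Core P#9
PU P#8
PU P#20
NUMANode P#1 (18GB) + Socket P#0 + L3 (12MB)
L2 (256KB) + L1 (32KB) + Core P#8
PU P#7
PU P#19
"""

if __name__ == "__main__":
    for pu, (sock, core) in sorted(map_pus_to_cores(SAMPLE).items()):
        print(f"PU P#{pu}: socket P#{sock}, core P#{core}")
```

On this machine the table shows that PU P#8 lives on Core P#9 of Socket P#1, while Core P#8 exists on both sockets, which is why I can't tell from the rankfile syntax alone which object p8 refers to.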
I see this question as pretty close to what I'm asking, but not quite the same.
rank 0=10.0.2.4 slot=1:p23 runs with no errors. I've verified that it's referencing the hardware thread ID and not the core, so I don't know how to bind to a core using the physical ID – Enticement
rank 0=10.0.2.4 slot=1:4 and rank 0=10.0.2.4 slot=1:5 work, even though there are no cores with that physical ID – Enticement
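One way to check which hardware threads a rankfile entry actually resolved to is to have each rank print its own CPU affinity. The following is a minimal sketch, assuming Linux (os.sched_getaffinity is not available elsewhere) and assuming the OS CPU numbers it reports correspond to the PU P# values shown by hwloc-ls -p. The helper name bound_pus is made up for this example; OMPI_COMM_WORLD_RANK is the environment variable Open MPI sets for launched processes.

```python
import os


def bound_pus():
    """Return the sorted OS CPU indices this process is allowed to run on."""
    if not hasattr(os, "sched_getaffinity"):
        # Assumption: Linux only; other platforms lack this call.
        raise RuntimeError("os.sched_getaffinity is not available here")
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    # Falls back to "?" when not launched through Open MPI's mpirun.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    print(f"rank {rank} pid {os.getpid()} allowed PUs: {bound_pus()}")
```

Saved under a hypothetical name such as show_affinity.py and launched with mpirun -rf rankfile python3 show_affinity.py, each rank would print the PUs it was bound to, which can then be compared against the hwloc listing. If the installed Open MPI supports it, mpirun --report-bindings should give similar information without a helper script.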