How does hyperthreading affect parallelization?
I'm using an OpenMP code on a HyperThreaded CPU.

If all else were equal, how would performance change on a non-HyperThreaded CPU?

I notice a 100% processor utilization, no matter how many threads I run, but changing the number of threads does increase performance. How can this be?

Is the story the same for non-INTEL multithreaded CPUs?

Negris answered 12/2, 2016 at 6:9 Comment(0)
The performance improvement (if any) from hyperthreading is difficult to predict.

Hyperthreading means that if one thread stalls for (almost) any reason, the CPU will have a pool of instructions from another thread to (attempt to) execute. Even without an actual stall, if two threads are scheduled that use different execution resources, instructions from both can execute simultaneously on the same core. So if, for example, the code is heavily dependent on main-memory latency (e.g., unpredictable read patterns with no prefetching), hyperthreading might increase performance substantially.

In the other direction, if the code is carefully written to cover latency via careful cache use, prefetching, etc., it may gain little or nothing from hyperthreading. Especially with older OSes that don't try to account for hyperthreading in their thread scheduling, the extra threads can actually result in extra context switches, thus slowing overall execution.

Assuming you're starting with completely single-threaded code and adding some OpenMP directives, my own experience is that hyperthreading is typically good for improving performance by something on the order of 10%. If the code makes almost any attempt at prefetching or anything similar, most (if not all) of that advantage evaporates almost immediately.

Henceforward answered 12/2, 2016 at 6:35 Comment(4)
I just found out you can disable hyperthreading in the BIOS. Is this how you're determining the improvement? – Negris
@roro: I've done tests both by disabling hyperthreading in the BIOS, and by manually limiting the thread count to the number of physical cores I knew were available while setting thread affinity by hand. – Henceforward
I agree that the performance improvement of HT is difficult to predict. However, it's not that execution resources sit idle until a stall; rather, execution resources (ports, cache, ...) are shared between hyper-threads. So an add μop from HT 1 can execute on Haswell port 0 while an lea μop from HT 2 executes on port 5 in the same clock cycle. Quoting from Intel, "From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources." cs.virginia.edu/~mc2zk/cs451/vol6iss1_art01.pdf – Disproportion
I think the point of HT is to increase resource utilization at the execution-resource level. – Disproportion