Single-CPU programs running on Hyper-Threading-enabled quadcore CPU
I'm a researcher in statistical pattern recognition, and I often run simulations that run for many days. I'm running Ubuntu 12.04 with Linux 3.2.0-24-generic, which, as I understand, supports multicore and hyper-threading. With my Intel Core i7 Sandy Bridge Quadcore with HTT, I often run 4 simulations (programs that take a long time) at the same time. Before I ask my question, here are the things that I already (think I) know.

  • My OS (Ubuntu 12.04) detects 8 CPUs due to hyper-threading.
  • The scheduler in my OS is clever enough never to schedule two programs onto two logical (virtual) cores belonging to the same physical core, because the OS scheduler is SMT-aware (SMT = Simultaneous Multi-Threading).
  • I have read the Wikipedia page on Hyper-Threading.
  • I have read the HowStuffWorks page on Sandy Bridge.
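
The first two bullet points can be sanity-checked from software. Here is a minimal Python sketch; the sysfs path is my assumption for a modern Linux kernel and may be absent in containers or on other systems, so it is read defensively:

```python
import os

# Count the logical CPUs the OS exposes (8 on a quad-core with Hyper-Threading).
logical_cpus = os.cpu_count()
print("logical CPUs:", logical_cpus)

# On Linux, the logical siblings sharing a physical core are listed under sysfs.
sibling_file = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
try:
    with open(sibling_file) as f:
        print("cpu0 shares its physical core with logical CPU(s):", f.read().strip())
except OSError:
    print("topology information not available on this system")
```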

OK, my question is as follows. When I run 4 simulations (programs) on my computer at the same time, they each run on a separate physical core. However, due to hyper-threading, each physical core is split into two logical cores. Therefore, is it true that each of the physical cores is only using half of its full capacity to run each of my simulations?

Thank you very much in advance. If any part of my question is not clear, please let me know.

Triangulate answered 22/5, 2012 at 20:10 Comment(4)
Typically you can turn hyper-threading off in the BIOS. To say 1/2 is not correct, as the threads will probably queue up on each core. You need to run some tests with hyper-threading on and off. I do some parsing and matching, and I found hyper-threading off, with as many processes as cores, to be optimal, but you need to test on your own calculations.Randolf
This link you said you read clearly describes HT and performance. It clearly states that a core does not use 1/2 of its capacity in HT mode. en.wikipedia.org/wiki/Hyper_threading -1Randolf
Thanks Blam for the help. Regarding your second comment, could you point out where on the Wikipedia page it says that the core does not use 1/2 of its capacity in HT mode?Triangulate
Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources.Randolf
This answer is probably late, but I see that nobody offered an accurate description of what's going on under the hood.

To answer your question: no, one thread will not use half a core. Only one thread can work inside the core at a time, but that one thread can saturate the whole core's processing power.

Assume thread 1 and thread 2 belong to core #0. Thread 1 can saturate the whole core's processing power while thread 2 waits for it to finish executing. It's serialized execution, not parallel.

At a glance, that extra thread looks useless. I mean, the core can only process 1 thread at once, right?

Correct, but there are situations in which the core is actually idling because of 2 important factors:

  • cache miss
  • branch misprediction

Cache miss

When it receives a task, the CPU searches inside its own cache for the memory addresses it needs to work with. In many scenarios the memory data is so scattered that it is physically impossible to keep all the required address ranges inside the cache (since the cache does have a limited capacity).

When the CPU doesn't find what it needs inside the cache, it has to access the RAM. The RAM itself is fast, but it pales compared to the CPU's on-die cache. The RAM's latency is the main issue here.

While the RAM is being accessed, the core is stalled. It's not doing anything. This is not noticeable because all these components work at a ridiculous speed anyway and you wouldn't notice it through some CPU load software, but it stacks additively. One cache miss after another and another hampers the overall performance quite noticeably. This is where the second thread comes into play. While the core is stalled waiting for data, the second thread moves in to keep the core busy. Thus, you mostly negate the performance impact of core stalls.

I say mostly because the second thread can also stall the core if it hits a cache miss of its own, but the likelihood of 2 threads missing the cache in a row is much lower than that of a single thread.
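
The access-pattern effect described above can be sketched even from a high-level language. This is an illustrative toy (sizes and names are mine, not from the answer), and in CPython the interpreter overhead masks much of the hardware effect; the gap is far more dramatic in compiled code:

```python
import array
import random
import time

# The same data, summed twice. A sequential traversal streams through memory
# and is cache-friendly; visiting the same elements in a random order defeats
# the prefetcher and causes far more cache misses.
N = 1_000_000
data = array.array("q", range(N))

indices = list(range(N))
random.shuffle(indices)

t0 = time.perf_counter()
seq_sum = sum(data[i] for i in range(N))
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
rnd_sum = sum(data[i] for i in indices)
t_rnd = time.perf_counter() - t0

# Both orders visit every element exactly once, so the sums must match...
assert seq_sum == rnd_sum
# ...but the random order is typically slower, and the gap widens as the
# working set outgrows the caches.
print(f"sequential: {t_seq:.3f}s  random: {t_rnd:.3f}s")
```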

Branch misprediction

Branching is when a piece of code has more than one possible execution path; the most basic branching construct is an if statement. Modern CPUs have branch prediction logic built into the core which tries to predict the execution path of a piece of code before the outcome is known. These predictors are actually quite sophisticated, and although I don't have solid data on prediction rates, I do recall reading some articles a while back stating that Intel's Sandy Bridge architecture has an average successful branch prediction rate of over 90%.
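
To illustrate the kind of logic involved, here is a toy 2-bit saturating-counter predictor, one of the classic textbook schemes that real front ends build on (the code and the branch pattern are mine, not Intel's; real predictors are far more elaborate):

```python
# A 2-bit saturating counter: states 0-1 predict "not taken",
# states 2-3 predict "taken". Each actual outcome nudges the counter
# one step toward that outcome, saturating at 0 and 3.
def simulate(outcomes):
    state = 2  # start weakly "taken"
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        if predicted == taken:
            correct += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A loop branch: taken 9 times, then falls through once, repeated.
# The predictor only mispredicts the loop exit, so it scores 90%.
loop_branch = ([True] * 9 + [False]) * 100
print(f"accuracy on a loop-like branch: {simulate(loop_branch):.0%}")  # → 90%
```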

When the CPU hits a piece of branching code, it practically chooses one path (path which the predictor thinks is the right one) and executes it. Meanwhile, another part of the core evaluates the branching expression to see if the branch predictor was indeed right or not. This is called speculative execution. This works similarly to 2 different threads: one evaluates the expression, and the other executes one of the possible paths in advance.

From here we have 2 possible scenarios:

  1. The predictor was correct. Execution continues normally from the speculative branch which was already being executed while the code path was being decided upon.
  2. The predictor was wrong. The entire pipeline which was processing the wrong branch has to be flushed and start over from the correct branch. OR, the readily available thread can come in and simply execute while the mess caused by the misprediction is resolved. This is the second use of hyperthreading. Branch prediction on average speeds up execution considerably since it has a very high rate of success. But performance does incur quite a penalty when the prediction is wrong.

Branch misprediction is not a major source of performance degradation since, like I said, the correct prediction rate is quite high. But cache misses are a problem and will continue to be a problem in certain scenarios.

From my experience hyperthreading does help out quite a bit with 3D rendering (which I do as a hobby). I've noticed improvements of 20-30% depending on the size of the scenes and materials/textures required. Huge scenes use huge amounts of RAM making cache misses far more likely. Hyperthreading helps a lot in overcoming these misses.

Flytrap answered 1/4, 2015 at 10:26 Comment(3)
Oh wow thank you for your essay! I'll have to read that again in more detail later.Triangulate
I cannot guarantee that everything is 100% accurate since it's off the top of my head, but that's the general idea. You're very welcome, my friend :).Flytrap
Regarding cache miss: you claim that the likelihood of two threads missing the cache in a row is lower than one. While this may be theoretically correct it may also be a tad misleading. Due to cache coherence thread 1 may have access to what thread 2 just accessed since the data may be located on the same physical core. For the same reason thread 1 might not cache everything it would otherwise do since thread 2 have "eaten up" part of the cache. You also claim that huge scenes plus lots of RAM makes cache misses far more likely, yes but mostly due to linearity of the rendering algorithm.Piceous
Since you are running a Linux kernel you are in luck, because the scheduler is smart enough to make sure your tasks are divided between your physical cores.

Linux became hyper-threading aware in kernel 2.4.17 (ref: http://kerneltrap.org/node/391)

Note that the reference is from the old O(1) scheduler. Linux now uses the CFS scheduling algorithm which was introduced in kernel 2.6.23 and should be even better.

But as already suggested, you can experiment by disabling hyper-threading in the BIOS and seeing whether your particular workload runs faster or slower with it enabled. If you start 8 tasks instead of 4, you will probably find that the total execution time for 8 tasks with hyper-threading enabled is shorter than two separate runs of 4 tasks, but again, the best thing to do is to experiment. Good luck!
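
For such experiments, the affinity APIs let you see, and control, which logical CPUs the scheduler may use for a process. A small Python sketch (os.sched_getaffinity and os.sched_setaffinity are Linux-only; the fallback branch is my addition):

```python
import os

# Inspect which logical CPUs the scheduler may place this process on.
if hasattr(os, "sched_getaffinity"):  # Linux-only API
    allowed = os.sched_getaffinity(0)  # 0 = the calling process
else:
    allowed = set(range(os.cpu_count() or 1))
print("schedulable on logical CPUs:", sorted(allowed))

# To repeat the "4 simulations on 4 physical cores" run explicitly, you could
# pin each worker to one logical CPU per physical core, e.g.:
#   os.sched_setaffinity(pid, {0})   # pin process `pid` to logical CPU 0
# Which logical CPUs pair up into a physical core is machine-specific; check
# /sys/devices/system/cpu/cpu*/topology/core_id before choosing.
```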

Piceous answered 19/1, 2013 at 9:45 Comment(1)
On a side note. I wonder if the time will come that someone writes Linux became self aware in x.x.x ;)Piceous
If you really want just 4 dedicated cores, you should be able to disable hyper-threading in your BIOS setup page. Also, and this part I'm less clear on, I believe that the processor is smart enough to do more work on a single thread if its second logical core is idle.
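
As an alternative to rebooting into the BIOS just to check, recent Linux kernels also expose the SMT state under sysfs. A hedged sketch (the path is my assumption for modern kernels and is missing on older systems, hence the "unknown" fallback):

```python
# Report whether SMT (Hyper-Threading) is currently active, when the
# kernel exposes that information.
def smt_active():
    try:
        with open("/sys/devices/system/cpu/smt/active") as f:
            return "on" if f.read().strip() == "1" else "off"
    except OSError:
        return "unknown"

print("SMT:", smt_active())
```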

Truong answered 22/5, 2012 at 20:16 Comment(4)
Thank you for your reply! Regarding your last sentence, that is my suspicion too. However, how much "more work on a single thread if its second logical core is idle" will it do? Will it still be slower than if the physical core were not hyper-threaded?Triangulate
I could research the theoretical aspects, but really your best bet is to run with hyperthreading enabled and see how long it takes, then try it with hyperthreading disabled.Truong
Yep I guess it's best to do that. Thanks again. I just thought this sort of thing would have been explicitly stated by Intel. Surely I'm not the only person who wants to know this sort of thing?Triangulate
So Ray, what are your results?Tutto
No, it's not exactly true. A hyperthreaded core is not two cores. Some things can run in parallel, but not as much as on two separate cores.

Glennaglennie answered 22/5, 2012 at 20:13 Comment(1)
Hi thank you very much for your reply. What do you mean "not exactly true"? Do you mean not true? Do you mean to say that a hyper-threaded physical core will be using more than half its capacity on a program that is running on only one of its logical cores? If so, how much more? Will it be using nearly all of its capacity on that program? The latter sentence is what I'm hoping to be true.Triangulate
