Multithreading: What is the point of more threads than cores?

I thought the point of a multi-core computer is that it can run multiple threads simultaneously. In that case, if you have a quad-core machine, what's the point of having more than 4 threads running at a time? Wouldn't they just be stealing time (CPU resources) from each other?

Armlet answered 27/6, 2010 at 2:18 Comment(3)
When was the last time you had Firefox, MS Word, Winamp, Eclipse and a download manager (more than four programs/processes) running simultaneously on your quad-core machine? Also, a single application might sometimes spawn more than four threads - how about that?Derry
Stealing is not necessarily bad. You may have a thread with a higher priority for important tasks that need to steal time.Carreon
@Derry I guess that was the question: why a single application may want to spawn more threads than cores if it doesn't seem to bring any performance benefit. And your example with more than four programs is not quite relevant here. As you correctly noted, those are processes. The OS multitasking feature (process multiplexing) has very little to do with threads within one process.Foliation
135

The answer revolves around the purpose of threads, which is parallelism: to run several separate lines of execution at once. In an 'ideal' system, you would have one thread executing per core: no interruption. In reality this isn't the case. Even if you have four cores and four working threads, your process and its threads will constantly be switched out for other processes and threads. If you are running any modern OS, every process has at least one thread, and many have more. All these processes are running at once. You probably have several hundred threads running on your machine right now. You won't ever get a situation where a thread runs without having time 'stolen' from it. (Well, you might if it's running in real time: if you're using a real-time OS or, even on Windows, a real-time thread priority. But it's rare.)

With that as background, the answer: Yes, more than four threads on a true four-core machine may give you a situation where they 'steal time from each other', but only if each individual thread needs 100% CPU. If a thread is not working 100% (as a UI thread might not be, or a thread doing a small amount of work or waiting on something else) then another thread being scheduled is actually a good situation.

It's actually more complicated than that:

  • What if you have five bits of work that all need to be done at once? It makes more sense to run them all at once, than to run four of them and then run the fifth later.

  • It's rare for a thread to genuinely need 100% CPU. The moment it uses disk or network I/O, for example, it may spend time waiting, doing nothing useful. This is a very common situation.

  • If you have work that needs to be run, one common mechanism is to use a threadpool. It might seem to make sense to have the same number of threads as cores, yet the .Net threadpool has up to 250 threads available per processor. I'm not certain why they do this, but my guess is that it has to do with the size of the tasks that are given to the threads to run.

So: stealing time isn't a bad thing (and isn't really theft, either: it's how the system is supposed to work.) Write your multithreaded programs based on the kind of work the threads will do, which may not be CPU-bound. Figure out the number of threads you need based on profiling and measurement. You may find it more useful to think in terms of tasks or jobs, rather than threads: write objects of work and give them to a pool to be run. Finally, unless your program is truly performance-critical, don't worry too much :)
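
To make the "tasks or jobs" idea concrete, here is a minimal Python sketch using the standard concurrent.futures thread pool. The URLs and the pool size of 8 are placeholder assumptions for the example, not recommendations:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # Mostly blocked on the network, so this thread rarely needs the CPU.
    with urllib.request.urlopen(url) as resp:
        return url, len(resp.read())

# Hypothetical work items; the pool is deliberately larger than a typical
# core count because the tasks are I/O-bound, not CPU-bound.
urls = ["https://example.com/a", "https://example.com/b",
        "https://example.com/c"]

with ThreadPoolExecutor(max_workers=8) as pool:
    for url, size in pool.map(fetch, urls):
        print(url, size, "bytes")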

Clapperclaw answered 27/6, 2010 at 4:37 Comment(2)
+1 for "but only if each individual thread needs 100% CPU". That was the assumption I didn't realize I was making.Armlet
Great answer overall. One thing I'm missing is the mention of the terms "Interrupt signal" and "context switch". Both are fundamental to understanding the above, in my honest opinion.Luminesce
58

Just because a thread exists doesn't always mean it's actively running. Many applications of threads involve some of the threads going to sleep until it's time for them to do something - for instance, user input triggering threads to wake up, do some processing, and go back to sleep.

Essentially, threads are individual tasks that can operate independently of one another, with no need to be aware of the progress of another task. It's quite possible to have more of these than you have the ability to run simultaneously; they're still useful for convenience even if they sometimes have to wait in line behind one another.
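
A minimal sketch of that sleep-until-needed pattern in Python; the event strings are made up, and a real application would receive them from the OS or a UI toolkit:

import queue
import threading

events = queue.Queue()

def worker():
    while True:
        item = events.get()        # thread sleeps here, using no CPU
        if item is None:           # sentinel value: shut down
            return
        print("handling:", item)   # brief burst of work, then back to sleep

t = threading.Thread(target=worker)
t.start()

events.put("key pressed")          # hypothetical input events
events.put("mouse clicked")
events.put(None)                   # wake the worker one last time to stop it
t.join()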

Sizzle answered 27/6, 2010 at 2:25 Comment(1)
Well said. The 'one thread per CPU' argument only applies to CPU-bound code. Asynchronous programming is another reason to use threads.Quant
29

The point is that, despite not getting any real speedup when thread count exceeds core count, you can use threads to disentangle pieces of logic that should not have to be interdependent.

In even a moderately complex application, using a single thread to try to do everything quickly makes a hash of the 'flow' of your code. The single thread spends most of its time polling this, checking on that, conditionally calling routines as needed, and it becomes hard to see anything but a morass of minutiae.

Contrast this with the case where you can dedicate threads to tasks so that, looking at any individual thread, you can see what that thread is doing. For instance, one thread might block waiting on input from a socket, parse the stream into messages, filter messages, and when a valid message comes along, pass it off to some other worker thread. The worker thread can work on inputs from a number of other sources. The code for each of these will exhibit a clean, purposeful flow, without having to make explicit checks that there isn't something else to do.

Partitioning the work this way allows your application to rely on the operating system to schedule what to do next with the CPU, so you don't have to make explicit conditional checks everywhere in your application about what might block and what's ready to process.
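
Here's a rough Python sketch of that reader/worker split, assuming a simple newline-delimited protocol and a made-up endpoint (localhost:9000). Each thread reads as a clean, purposeful loop:

import queue
import socket
import threading

messages = queue.Queue()

def reader(sock):
    # This thread's whole job: block on the socket, split the byte stream
    # into newline-delimited messages, and hand valid ones to the worker.
    buf = b""
    while True:
        data = sock.recv(4096)     # blocks; other threads run meanwhile
        if not data:
            messages.put(None)     # connection closed: signal the worker
            return
        buf += data
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line:               # crude message filter: drop empties
                messages.put(line)

def worker():
    # Could just as well be fed by several reader threads.
    while True:
        msg = messages.get()
        if msg is None:
            return
        print("processing:", msg)

sock = socket.create_connection(("localhost", 9000))  # assumed endpoint
threads = [threading.Thread(target=reader, args=(sock,)),
           threading.Thread(target=worker)]
for t in threads:
    t.start()
for t in threads:
    t.join()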

Taphouse answered 27/6, 2010 at 3:53 Comment(5)
This is an interesting thought... I'd always heard that multithreading an app is a net addition of complexity, but what you're saying makes sense.Armlet
Multithreading an app adds complexity if its concerns are not adequately separated. If it's designed with minimal overlap of concerns (and thus shared state), it is a net saving in complexity.Steer
There are ways to structure single-threaded applications so that control-flow is clearer at the level you write programs. OTOH, if you can structure your threads so that they only pass messages to each other (instead of having shared resources) then it's pretty simple to work out what's going on and make everything work.Perpend
Should point out, though, that using threads can only simplify things up to a certain point. All too often the attempt is made to make two threads do the work that should rightly be done by one, at which point the complexity comes back in spades. The symptoms of this are excessive needs for communication and synchronization in order to coordinate some desired outcome.Taphouse
I think it's misleading to say we do not get "any real speedup" if # of threads > # of cores. It's simply not true. As other answers stated, thanks to smart context switching during threads' idle times (waiting for I/O or whatever else), a significant performance improvement can be achieved by using more threads than cores.Luminesce
24

If a thread is waiting for a resource (such as loading a value from RAM into a register, performing disk I/O or network access, launching a new process, querying a database, or waiting for user input), the processor can work on a different thread and return to the first thread once the resource is available. This reduces the time the CPU spends idle: it can perform millions of useful operations instead of waiting.

Consider a thread that needs to read data off a hard drive. In 2014, a typical processor core operates at 2.5 GHz and may be able to execute 4 instructions per cycle. With a cycle time of 0.4 ns, the processor can execute 10 instructions per nanosecond. With typical mechanical hard drive seek times of around 10 milliseconds, the processor can execute 100 million instructions in the time it takes to read a value from the hard drive. Hard drives with a small cache (a 4 MB buffer, say) and hybrid drives with a few GB of flash storage may do significantly better, as data latency for sequential reads or reads from the hybrid section can be several orders of magnitude lower.

A processor core can switch between threads (the cost of pausing and resuming a thread is around 100 clock cycles) while the first thread waits for a high-latency input (anything more expensive than registers (1 clock) or RAM (around 5 nanoseconds)). High-latency inputs include disk I/O, network access (latency around 250 ms), reading data off a CD or a slow bus, or a database call. Having more threads than cores means useful work can be done while high-latency tasks are resolved.

The operating system has a thread scheduler that assigns a priority to each thread and allows a thread to sleep, then resume after a predetermined time. It is the thread scheduler's job to reduce thrashing, which would occur if each thread executed just 100 instructions before being put to sleep again. The overhead of switching threads would reduce the total useful throughput of the processor core.

For this reason, you may want to break up your problem into a reasonable number of threads. If you were writing code to perform matrix multiplication, creating one thread per cell in the output matrix might be excessive, whereas one thread per row or per n rows of the output matrix might reduce the overhead cost of creating, pausing, and resuming threads.
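
A rough sketch of the per-chunk-of-rows approach in Python. Note that this only illustrates how to size the work per thread: CPython's global interpreter lock prevents pure-Python threads from running this arithmetic in parallel (libraries such as NumPy release the lock and do benefit), so treat it as a structural sketch rather than a speedup recipe:

import threading

def multiply_rows(a, b, out, lo, hi):
    # Each thread owns a contiguous band of output rows: one thread per
    # chunk of rows, not one thread per cell.
    for i in range(lo, hi):
        for j in range(len(b[0])):
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul(a, b, n_threads=4):
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    chunk = (rows + n_threads - 1) // n_threads   # ceiling division
    threads = [threading.Thread(target=multiply_rows,
                                args=(a, b, out, lo, min(lo + chunk, rows)))
               for lo in range(0, rows, chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]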

This is also why branch prediction and speculative execution matter. If you have an if statement whose condition requires loading a value from RAM, but the bodies of the if and else branches use values already loaded into registers, the processor may execute one or both branches before the condition has been evaluated. Once the condition is resolved, the processor applies the result of the correct branch and discards the other. Performing potentially useless work here is usually still better than switching to a different thread, which could lead to thrashing.

As we have moved away from high-clock-speed single-core processors to multi-core processors, chip design has focused on cramming more cores per die, improving on-chip resource sharing between cores, better branch prediction algorithms, lower thread-switching overhead, and better thread scheduling.

Abulia answered 12/7, 2014 at 19:1 Comment(1)
the same can be done with a single thread and a queue though :\ is there really any benefit to having 80 threads on 2-4 cores, over just having 2-4 cores that just eat up tasks off a queue as soon as they arrive and they have nothing to do?Hofmann
11

Most of the answers above talk about performance and simultaneous operation. I'm going to approach this from a different angle.

Let's take the case of, say, a simplistic terminal emulation program. You have to do the following things:

  • watch for incoming characters from the remote system and display them
  • watch for stuff coming from the keyboard and send them to the remote system

(Real terminal emulators do more, including potentially echoing the stuff you type onto the display as well, but we'll pass over that for now.)

Now the loop for reading from the remote is simple, as per the following pseudocode:

while get-character-from-remote:
    print-to-screen character

The loop for monitoring the keyboard and sending is also simple:

while get-character-from-keyboard:
    send-to-remote character

The problem, though, is that you have to do this simultaneously. The code now has to look more like this if you don't have threading:

loop:
    check-for-remote-character
    if remote-character-is-ready:
        print-to-screen character
    check-for-keyboard-entry
    if keyboard-is-ready:
        send-to-remote character

The logic, even in this deliberately simplified example that doesn't take into account real-world complexity of communications, is quite obfuscated. With threading, however, even on a single core, the two pseudocode loops can exist independently without interlacing their logic. Since both threads will be mostly I/O-bound, they don't put a heavy load on the CPU, even though they are, strictly speaking, more wasteful of CPU resources than the integrated loop would be.
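
For instance, a rough Python rendering of the two independent loops, assuming the remote end is a plain TCP connection on a made-up host/port. A real terminal emulator would also need raw, unbuffered tty input, which this sketch skips:

import socket
import sys
import threading

def remote_to_screen(sock):
    # The first pseudocode loop, verbatim: block on the remote, print.
    while True:
        data = sock.recv(1)
        if not data:
            break
        sys.stdout.write(data.decode(errors="replace"))
        sys.stdout.flush()

def keyboard_to_remote(sock):
    # The second loop: block on the keyboard, send.
    while True:
        ch = sys.stdin.read(1)
        if not ch:
            break
        sock.sendall(ch.encode())

sock = socket.create_connection(("localhost", 2323))  # assumed host/port
for target in (remote_to_screen, keyboard_to_remote):
    threading.Thread(target=target, args=(sock,)).start()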

Now of course real-world usage is more complicated than the above. But the complexity of the integrated loop goes up exponentially as you add more concerns to the application. The logic gets ever more fragmented, and you have to start using techniques like state machines and coroutines to keep things manageable. Manageable, but not readable. Threading keeps the code more readable.

So why would you not use threading?

Well, if your tasks are CPU-bound instead of I/O-bound, threading actually slows your system down. Performance will suffer. A lot, in many cases. ("Thrashing" is a common problem if you pile on too many CPU-bound threads: you wind up spending more time switching between threads than running the threads themselves.) Also, one of the reasons the logic above is so simple is that I've very deliberately chosen a simplistic (and unrealistic) example. If you wanted to echo what was typed to the screen, you'd be in a new world of hurt as you introduce locking of shared resources. With only one shared resource this isn't so much of a problem, but it does start to become a bigger and bigger problem as you have more resources to share.

So in the end, threading is about many things. For example, it's about making I/O-bound processes more responsive (even if less efficient overall), as some have already said. It's also about making logic easier to follow (but only if you minimize shared state). It's about a lot of stuff, and you have to decide if its advantages outweigh its disadvantages on a case-by-case basis.

Steer answered 27/6, 2010 at 5:10 Comment(0)
7

I strongly disagree with @kyoryu's assertion that the ideal number is one thread per CPU.

Think about it this way: why do we have multi-processing operating systems? For most of computer history, nearly all computers had one CPU. Yet from the 1960s on, all "real" computers had multi-processing (aka multi-tasking) operating systems.

You run multiple programs so that one can run while others are blocked for things like IO.

Let's set aside arguments about whether Windows versions before NT were multi-tasking. Since then, every real OS has had multi-tasking. Some don't expose it to users, but it's there anyway, doing things like listening to the cellphone radio, talking to the GPS chip, accepting mouse input, etc.

Threads are just tasks that are a bit more efficient. There is no fundamental difference between a task, process, and thread.

A CPU is a terrible thing to waste, so have lots of things ready to use it when you can.

I will agree that with most procedural languages, C, C++, Java, etc., writing proper thread-safe code is a lot of work. With 6-core CPUs on the market today, and 16-core CPUs not far away, I expect that folks will move away from these old languages, as multi-threading becomes more and more of a critical requirement.

Disagreement with @kyoryu is just IMHO, the rest is fact.

Yezd answered 27/6, 2010 at 4:53 Comment(3)
If you've got lots of processor-bound threads, then the ideal number is one per CPU (or perhaps one less, to leave one to manage all the I/O and the OS and all that stuff). If you've got IO-bound threads, you can stack quite a lot on a single CPU. Different apps have different mixes of processor-bound and IO-bound tasks; that's totally natural, but why you have to be careful with universal declarations.Perpend
Of course, the most important difference between threads and processes is that on Windows there is no fork(), so process creation is really expensive, leading to overuse of threads.Hydromechanics
Except for protein folding, SETI, etc. there are no practical user tasks that are compute bound for very long. There is always a need to get info from the user, talk to the disk, talk to the DBMS, etc. Yeah, the expense of fork() is one of the many things that Cutler cursed NT with that others at DEC knew.Yezd
6

Although you can certainly use threads for speeding up calculations depending on your hardware, one of their main uses is to do more than one thing at a time for user-friendliness reasons.

For example, if you have to do some processing in the background and also remain responsive to UI input, you can use threads. Without threads, the user interface would hang every time you tried to do any heavy processing.
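
A minimal sketch of that pattern, with a sleep standing in for the heavy processing:

import threading
import time

def heavy_processing():
    time.sleep(5)                      # stand-in for a long computation
    print("\nbackground work finished")

# daemon=True so the background task dies with the program on 'quit'
threading.Thread(target=heavy_processing, daemon=True).start()

# Meanwhile the main ('UI') thread keeps responding instead of freezing.
while True:
    if input("still responsive> ") == "quit":
        break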

Also see this related question: Practical uses for threads

Amphisbaena answered 27/6, 2010 at 2:20 Comment(1)
UI handling is a classic example of an IO-bound task. It's not good to have a single CPU core doing both processing and IO tasks.Perpend
5

Imagine a Web server that has to serve an arbitrary number of requests. You have to serve the requests in parallel, because otherwise each new request has to wait until all the other requests have been completed (including sending the response over the Internet). In this case, most web servers have far fewer cores than the number of requests they usually serve.

It also makes things easier for the developer of the server: you only have to write the thread code that serves one request; you don't have to think about storing multiple requests, the order you serve them in, and so on.
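
As an illustration, Python's standard library (3.7+) includes a thread-per-connection HTTP server; the port below is an arbitrary choice:

from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Runs in its own thread, so one slow client can't stall the rest.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello\n")

# A thread per connection: under load, far more threads than cores,
# most of them blocked sending bytes over the network.
ThreadingHTTPServer(("localhost", 8080), Handler).serve_forever()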

Spirant answered 27/6, 2010 at 2:25 Comment(1)
You are writing software for an operating system that supports threading but has no capacity for multiplexing I/O? I think the web server is probably a bad example, since in this case multiplexing I/O is almost always going to be more efficient than spawning more threads than cores.Teriteria
4

Many threads will be asleep, waiting for user input, I/O, and other events.

Quimby answered 27/6, 2010 at 9:54 Comment(1)
For sure. Just use Task Manager on Windows or top on a real OS, and see how many tasks/processes are asleep. It's always 90% or more.Yezd
2

Threads can help with responsiveness in UI applications. Additionally, you can use threads to get more work out of your cores. For instance, on a single core, you can have one thread doing I/O and another doing some computation. If it were single-threaded, the core could essentially sit idle waiting for the I/O to complete. That's a pretty high-level example, but threads can definitely be used to pound your CPU a bit harder.

Despinadespise answered 27/6, 2010 at 2:26 Comment(1)
More specifically, one thread can be waiting on I/O while another does computation. If I/O took (significant) CPU cycles, there'd be no benefit to running it in a separate thread. The benefit is that your computation thread can run while your I/O thread is twiddling its thumbs waiting for a big aluminum cylinder to spin into place, or for packets to arrive over the wire from Iceland, or whatever.Telemachus
2

A processor, or CPU, is the physical chip that is plugged into the system. A processor can have multiple cores (a core is the part of the chip that is capable of executing instructions). A core can appear to the operating system as multiple virtual processors if it is capable of simultaneously executing multiple threads (a thread is a single sequence of instructions).

A process is another name for an application. Generally, processes are independent of each other. If one process dies, it does not cause another process to also die. It is possible for processes to communicate, or share resources such as memory or I/O.

Each process has a separate address space and stack. A process can contain multiple threads, each able to execute instructions simultaneously. All the threads in a process share the same address space, but each thread will have its own stack.

Hopefully these definitions, and further research using these fundamentals, will help your understanding.

Swansea answered 27/6, 2010 at 2:50 Comment(2)
I don't see how this addresses his question at all. My interpretation of his question is about thread usage of cores and optimal use of the available resources, or about behaviour of threads as you increase their number, or something along those lines anyway.Clapperclaw
@Clapperclaw perhaps it wasn't a direct answer to my question, but I still feel I learned by reading it.Armlet
2

The way some APIs are designed, you have no choice but to run them in a separate thread (anything with blocking operations). An example would be Python's HTTP libraries (AFAIK).

Usually this isn't much of a problem, though (if it is, the OS or API should ship with an alternative asynchronous operating mode, e.g. select(2)), because it probably means the thread will be sleeping while waiting for I/O completion. On the other hand, if something is doing heavy computation, you have to put it in a separate thread from, say, the GUI thread (unless you enjoy manual multiplexing).
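
A small sketch of wrapping one such blocking call in its own thread; the URL is a placeholder:

import threading
import urllib.request

result = {}

def fetch():
    # urlopen blocks until the response arrives; run in its own thread,
    # the wait costs nothing but a sleeping thread.
    with urllib.request.urlopen("https://example.com") as resp:
        result["body"] = resp.read()

t = threading.Thread(target=fetch)
t.start()
# ... the main thread (e.g. a GUI loop) stays free to do other work ...
t.join()
print(len(result["body"]), "bytes fetched")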

Lott answered 27/6, 2010 at 3:42 Comment(0)
1

The ideal usage of threads is, indeed, one per core.

However, unless you exclusively use asynchronous/non-blocking IO, there's a good chance that you will have threads blocked on IO at some point, which will not use your CPU.

Also, typical programming languages make it somewhat difficult to use 1 thread per CPU. Languages designed around concurrency (such as Erlang) can make it easier to not use extra threads.

Dimity answered 27/6, 2010 at 2:25 Comment(5)
Using threads for periodic tasks is a very common and welcome workflow, and it would be much less than ideal if they stole a core.Appointment
@Nick Bastin: Yes, but it's more efficient to stick those tasks into a task queue and execute them from that queue (or a similar strategy). For optimal efficiency, 1 thread per core beats all, as it prevents overhead from unnecessary context switching and extra stacks being allocated. No matter what, the periodic task must steal a core while 'active,' as the cpu can only actually perform one task per core (plus stuff like hyperthreading if available).Dimity
@Nick Bastin: Unfortunately, as I said in the main answer, most modern languages don't lend themselves well to this - implementing a system that does it effectively is not trivial, and you end up fighting the typical usage of the language.Dimity
My point isn't that one thread per core isn't optimal, it's that one thread per core is a pipe dream (unless you're embedded) and designing to try to hit it is a waste of time, so you might as well do what makes it easy for you (and isn't any less efficient on a modern scheduler anyhow), rather than trying to optimize the number of threads you're using. Should we spin up threads for no good reason? Of course not, but whether you're unnecessarily wasting computer resources is a concern regardless of threading.Appointment
@Nick Bastin: So, to summarize, one thread per core is ideal, but actually achieving that isn't very likely. I probably should have been stronger than 'somewhat difficult' when talking about how likely it is to actually achieve such a thing.Dimity
1

In response to your first conjecture: multi-core machines can simultaneously run multiple processes, not just the multiple threads of a single process.

In response to your first question: the point of multiple threads is usually to simultaneously perform multiple tasks within one application. The classic examples on the net are an email program sending and receiving mail, and a web server receiving and sending page requests. (Note that it's essentially impossible to reduce a system like Windows to running only one thread or even only one process. Run the Windows Task Manager and you'll typically see a long list of active processes, many of which will be running multiple threads.)

In response to your second question: most processes/threads are not CPU-bound (i.e., not running continuously and uninterrupted), but instead stop and wait frequently for I/O to finish. During that wait, other processes/threads can run without "stealing" from the waiting code (even on a single-core machine).

Inch answered 27/6, 2010 at 5:21 Comment(0)
1

I know this is a super old question with plenty of good answers, but I am here to point out something that is important in the current environment:

If you want to design an application for multi-threading, you should not design it for a specific hardware configuration. CPU technology has been advancing rapidly for years, and core counts keep increasing. If you deliberately design your application to use only 4 threads, you are potentially restricting yourself on an octa-core system (for example). Now even 20-core systems are commercially available, so such a design is definitely doing more harm than good.

Jell answered 3/11, 2017 at 10:7 Comment(0)
-1

A thread is an abstraction that enables you to write code as a simple sequence of operations, blissfully unaware that the code is executed interlaced with other code, or parked waiting for I/O, or (maybe somewhat more aware) waiting for another thread's events or messages.

Diatonic answered 26/8, 2011 at 14:17 Comment(1)
I might have edited this by adding more examples since the downvotes - but threads (or processes; in this context there's almost no difference) were not invented to scale up performance, but rather to simplify asynchronous code and avoid writing complicated state machines that had to handle all the super-states possible in the program. In fact, there was typically one CPU even in large servers. I'm just curious why my answer is considered anti-helpful?Diatonic
-7

The point is that the vast majority of programmers do not understand how to design a state machine. Being able to put everything in its own thread frees the programmer from having to think about how to efficiently represent the state of different in-progress computations so that they can be interrupted and later resumed.

As an example, consider video compression, a very CPU-intensive task. If you're using a GUI tool, you probably want the interface to remain responsive (show progress, respond to cancel requests, window resizing, etc.). So you design your encoder software to process a large unit (one or more frames) at a time and run it in its own thread, separate from the UI.

Of course, once you realize it would have been nice to be able to save the in-progress encoding state (so you can close the program to reboot, or play a resource-hungry game), you realize you should have learned how to design state machines from the beginning. Either that, or you decide to engineer a whole new problem of process hibernation for your OS, so you can suspend and resume individual apps to disk...

Encephaloma answered 27/6, 2010 at 9:37 Comment(2)
Not (quite!) worth a -1, but seriously, that is about the most stupidly snide thing I've heard anybody say on this subject. I, for example, have no problems in implementing a state machine. None at all. I just don't like to use them when other tools are there that leave behind clearer and easier to maintain code. State machines have their places, and in those places they can't be matched. Interlacing CPU-intensive operations with GUI updates is not one of those places. At the very least coroutines are a better choice there, with threading being even better.Steer
To everyone mod'ing my answer down, this is NOT an argument against using threads! If you can code a state machine that's great, and sure it often makes sense to run state machines in separate threads even if you don't have to. My comment was that often the choice to use threads is made primarily out of the desire to avoid designing state machines, which many programmers consider "too hard", rather than for any other benefit.Encephaloma
