How many threads is too many?
411

I am writing a server, and I send each action off into a separate thread when the request is received. I do this because almost every request makes a database query. I am using a threadpool library to cut down on construction/destruction of threads.

My question is: what is a good cutoff point for I/O threads like these? I know it would just be a rough estimate, but are we talking hundreds? Thousands?

How would I go about figuring out what this cutoff would be?


EDIT:

Thank you all for your responses, it seems like I am just going to have to test it to find out my thread count ceiling. The question is though: how do I know I've hit that ceiling? What exactly should I measure?

Ann answered 27/1, 2009 at 0:46 Comment(8)
@ryeguy: The entire point here is you should not be setting any maximum in the threadpool if there are no performance problems to start with. Most of the advice about limiting a threadpool to ~100 threads is ridiculous; most thread pools have /way/ more threads than that and never have a problem.Literally
ryeguy, see addition to my answer below re what to measure.Norwood
Don't forget that Python is, by nature, not really multi-thread friendly. At any point in time, only a single bytecode opcode is being executed, because Python employs the Global Interpreter Lock.Koine
how do I know I've hit that ceiling? What exactly should I measure? @ryeguy: The moment you "hit that ceiling" is when the kernel starts randomly killing your processes (thus threads).Wendolynwendt
@Jay D: I'd say the moment you've hit the ceiling is when your performance starts to drop.Iconostasis
Based on experience, the answer is more than just memory and CPU constraints; it must also consider what the program is doing. I find that, running Ubuntu, timing (such as when threads become active) becomes much more approximate as busy threads are added.Endocrine
@ASk, he needs threading not to increase CPU performance, but to avoid blocking everything while waiting for IO (a database response). Python threads should be okay for that.Tweedsmuir
@Literally "The entire point here is you should not be setting any maximum in the threadpool" Ummm...say what? Fixed-size thread pools have the benefits of graceful degradation and scalability. E.g. in a network setting, if you're spawning new threads based on client connections, without a fixed pool size you run the very real danger of learning (the hard way) just how many threads your server can handle, and every single connected client will suffer. A fixed-size pool acts like a pipe valve by disallowing your server from trying to bite off more than it can chew.Rogers
283

Some people would say that two threads is too many - I'm not quite in that camp :-)

Here's my advice: measure, don't guess. One suggestion is to make it configurable and initially set it to 100, then release your software to the wild and monitor what happens.

If your thread usage peaks at 3, then 100 is too much. If it remains at 100 for most of the day, bump it up to 200 and see what happens.

You could actually have your code itself monitor usage and adjust the configuration for the next time it starts but that's probably overkill.


For clarification and elaboration:

I'm not advocating rolling your own thread pooling subsystem, by all means use the one you have. But, since you were asking about a good cut-off point for threads, I assume your thread pool implementation has the ability to limit the maximum number of threads created (which is a good thing).

I've written thread and database connection pooling code and they have the following features (which I believe are essential for performance):

  • a minimum number of active threads.
  • a maximum number of threads.
  • shutting down threads that haven't been used for a while.

The first sets a baseline for minimum performance in terms of the thread pool client (this number of threads is always available for use). The second sets a restriction on resource usage by active threads. The third returns you to the baseline in quiet times so as to minimise resource use.
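
For illustration, here's a minimal sketch of such a pool in Python (the OP's language). This is not the OP's threadpool library or any real API; the class name, parameters, and growth policy are all invented to show the three features above:

    import queue
    import threading

    class ElasticThreadPool:
        """Sketch of a pool with a floor of always-ready workers, a
        ceiling on total workers, and reclamation of idle workers."""

        def __init__(self, min_threads=5, max_threads=50, idle_timeout=60.0):
            self.tasks = queue.Queue()
            self.min_threads = min_threads
            self.max_threads = max_threads
            self.idle_timeout = idle_timeout
            self.lock = threading.Lock()
            self.count = 0
            for _ in range(min_threads):          # feature 1: the baseline
                self._spawn()

        def _spawn(self):
            with self.lock:
                if self.count >= self.max_threads:  # feature 2: the cap
                    return
                self.count += 1
            threading.Thread(target=self._worker, daemon=True).start()

        def _worker(self):
            while True:
                try:
                    func, args = self.tasks.get(timeout=self.idle_timeout)
                except queue.Empty:
                    # Feature 3: idle too long; exit unless that would
                    # drop us below the configured floor.
                    with self.lock:
                        if self.count > self.min_threads:
                            self.count -= 1
                            return
                    continue
                try:
                    func(*args)
                finally:
                    self.tasks.task_done()

        def submit(self, func, *args):
            self.tasks.put((func, args))
            if not self.tasks.empty():   # naive growth heuristic
                self._spawn()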

You need to balance the resource usage of having unused threads (A) against the resource usage of not having enough threads to do the work (B).

(A) is generally memory usage (stacks and so on) since a thread doing no work will not be using much of the CPU. (B) will generally be a delay in the processing of requests as they arrive as you need to wait for a thread to become available.

That's why you measure. As you state, the vast majority of your threads will be waiting for a response from the database so they won't be running. There are two factors that affect how many threads you should allow for.

The first is the number of DB connections available. This may be a hard limit unless you can increase it at the DBMS - I'm going to assume your DBMS can take an unlimited number of connections in this case (although you should ideally be measuring that as well).

Then, the number of threads you should have depends on your historical use. The minimum you should have running is the minimum number that you've ever had running + A%, with an absolute minimum of (for example, and make it configurable just like A) 5.

The maximum number of threads should be your historical maximum + B%.

You should also be monitoring for behaviour changes. If, for some reason, your usage goes to 100% of available for a significant time (so that it would affect the performance of clients), you should bump up the maximum allowed until it's once again B% higher.
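
As arithmetic (the numbers below are invented examples, with A and B as the configurable margins just described):

    # Historical minimum 3 and maximum 40, with A = B = 20 (percent):
    A, B = 20, 20
    hist_min, hist_max = 3, 40
    min_threads = max(5, round(hist_min * (1 + A / 100)))   # 5 (the floor wins)
    max_threads = round(hist_max * (1 + B / 100))           # 48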


In response to the "what exactly should I measure?" question:

What you should measure specifically is the maximum number of threads in concurrent use (e.g., waiting on a return from the DB call) under load. Then add a safety factor of 10%, for example (emphasised, since other posters seem to take my examples as fixed recommendations).

In addition, this should be done in the production environment for tuning. It's okay to get an estimate beforehand but you never know what production will throw your way (which is why all these things should be configurable at runtime). This is to catch a situation such as unexpected doubling of the client calls coming in.
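
One way to capture that measurement in Python (my own sketch; PeakGauge and the sleep stand-in for the DB call are invented for illustration):

    import threading
    import time

    class PeakGauge:
        """Tracks the high-water mark of threads concurrently inside a
        section (e.g., blocked on a DB call). Illustrative, not a library."""
        def __init__(self):
            self._lock = threading.Lock()
            self._current = 0
            self.peak = 0

        def __enter__(self):
            with self._lock:
                self._current += 1
                if self._current > self.peak:
                    self.peak = self._current
            return self

        def __exit__(self, *exc):
            with self._lock:
                self._current -= 1

    db_gauge = PeakGauge()

    def handle_request(query):
        with db_gauge:
            time.sleep(0.05)        # stand-in for the blocking DB call

    # After a period under production load, size the pool at roughly
    # the observed peak plus the safety factor: int(db_gauge.peak * 1.10).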

Norwood answered 27/1, 2009 at 0:51 Comment(6)
If threads are spawned on incoming requests then thread-usage will mirror the number of unserviced requests. There's no way to determine the "optimal" number from this. Indeed you will find more threads cause more resource contention and thus the number of active threads will increase.Agnes
@Andrew, thread creation takes time, and you can determine the optimal number based on historical data [+ N%] (hence measure, don't guess). In addition, more threads only cause resource contention when they're doing work, not waiting on a signal/semaphore.Norwood
Where is this data on 'thread creation' causing a performance problem when using a thread pool? A good thread pool would not be creating and destroying threads in between tasks.Literally
@Pax If all your threads are waiting on the same semaphores to run DB queries then that's the very definition of contention. It's also not true to say threads don't cost anything if they're waiting on a semaphore.Agnes
@Andrew, I can't see why you'd semaphore-block the DB queries; any decent DB will allow concurrent access, with many threads waiting on the responses. And threads shouldn't cost any execution time while semaphore-blocked; they should sit in the blocked queue until the semaphore is released.Norwood
Great answer, thanks! For example, ThreadPoolExecutor supports all three mentioned configuration options, and changing them on the fly. Its Javadoc also mentions more configuration trade-offs (around queueing, load-shedding, thread reclamation, etc.).Jaimiejain
46

This question has been discussed quite thoroughly and I didn't get a chance to read all the responses. But here are a few things to take into consideration while looking at the upper limit on the number of simultaneous threads that can co-exist peacefully in a given system.

  1. Thread stack size: on Linux the default thread stack size is 8 MB (you can use ulimit -a to find it out).
  2. Max virtual memory that a given OS variant supports: Linux kernel 2.4 supports a memory address space of 2 GB; with kernel 2.6, it is a bit bigger (3 GB).
  3. [1] shows the calculations for the max number of threads for a given max VM supported. For 2.4 it turns out to be about 255 threads; for 2.6 the number is a bit larger.
  4. What kind of kernel scheduler you have: comparing the Linux 2.4 kernel scheduler with 2.6, the latter gives you O(1) scheduling with no dependence on the number of tasks existing in the system, while the former is more of an O(n). The SMP capabilities of the kernel scheduler also play a role in the max number of sustainable threads in a system.

Now you can tune your stack size to incorporate more threads, but then you have to take into account the overheads of thread management (creation/destruction and scheduling). You can enforce CPU affinity on a given process as well as on a given thread, to tie them down to specific CPUs and avoid thread-migration overheads between the CPUs and cold cache issues.
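
From Python you can tune the per-thread stack before starting threads; a small sketch (the 512 KB figure is just an example), along with the arithmetic behind [1]:

    import threading

    # A 32-bit process with ~2 GB of usable address space and 8 MB
    # default stacks tops out around 2048 MB / 8 MB = 256 threads,
    # which matches the ~255 figure from [1]. Shrinking the stack
    # raises that ceiling, at the risk of overflowing deep recursion.
    threading.stack_size(512 * 1024)   # applies to threads started after this
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()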

Note that one can create thousands of threads at will, but when Linux runs out of VM it just randomly starts killing processes (and thus threads). This is to keep the utility profile from being maxed out. (The utility function describes system-wide utility for a given amount of resources. With constant resources, in this case CPU cycles and memory, the utility curve flattens out as more and more tasks are added.)

I am sure the Windows kernel scheduler also does something of this sort to deal with over-utilization of resources.

[1] http://adywicaksono.wordpress.com/2007/07/10/i-can-not-create-more-than-255-threads-on-linux-what-is-the-solutions/

Wendolynwendt answered 18/11, 2010 at 20:6 Comment(5)
Note that these virtual memory limits only apply to 32-bit systems. On 64 bits you won't run out of virtual memory.Brucite
@JanKanis, that's a good point, I remember seeing some analysis when the first 64bit mainframes arrived and someone had calculated that swapping the entire address space to disk would take a month or two (can't remember the exact time but it was something equally ridiculous).Norwood
@Norwood would be curious to read that. Any link to a white paper etc.? ThanksWendolynwendt
@JayD: Never did find the whitepaper again (but see ibm.com/docs/en/cics-ts/… for more info). Current fastest SSD is Crucial T700, clocking in at about 12G/s sequential write. 64 bits is 16 exabytes, this is about 16 billion gigabytes so let's round up the speed to 16G/s, because I'm inherently lazy :-). That would be a billion or so seconds, which equates to 31+ years. I think those calculations are correct but feel free to correct me if not.Norwood
@JayD: That's actually much worse than the few months I remember. And, in any case, given current T700 costs (about AUD250/TB), that would set you back about four billion dollars, and that's without the cost of the cases needed to hold those drives. Though I suspect you may get a discount if you're buying that many :-)Norwood
20

If your threads are performing any kind of resource-intensive work (CPU/Disk) then you'll rarely see benefits beyond one or two, and too many will kill performance very quickly.

The 'best-case' is that your later threads will stall while the first ones complete, or some will have low-overhead blocks on resources with low contention. Worst-case is that you start thrashing the cache/disk/network and your overall throughput drops through the floor.

A good solution is to place requests in a queue, from which they are then dispatched to worker threads from a thread pool (and yes, avoiding continuous thread creation/destruction is a great first step).

The number of active threads in this pool can then be tweaked and scaled based on the findings of your profiling, the hardware you are running on, and other things that may be occurring on the machine.
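
In Python, the stdlib executor is one concrete shape of this design (assuming it rather than the OP's unnamed pool library; the worker count is a placeholder to be profiled):

    from concurrent.futures import ThreadPoolExecutor

    # The executor keeps a fixed set of workers plus an internal task
    # queue, so bursts of requests wait their turn instead of spawning
    # unbounded threads.
    pool = ThreadPoolExecutor(max_workers=32)

    def handle_request(request):
        return f"handled {request}"          # stand-in for the real work

    def on_request(request):
        return pool.submit(handle_request, request)   # returns a Future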

Agnes answered 27/1, 2009 at 1:3 Comment(5)
Yes, and it should be used in conjunction with a queue or pool of requests.Agnes
@Andrew: Why? It should add a task to the thread pool each time it receives a request. It is up to the thread pool to allocate a thread for the task when there is one available.Literally
So what do you do when you have hundreds of requests coming in and are out of threads? Create more? Block? Return an error? Place your requests in a pool that can be as large as need be, and then feed these queued requests to your thread pool as threads become free.Agnes
"a number of threads are created to perform a number of tasks, which are usually organized in a queue. Typically, there are many more tasks than threads. As soon as a thread completes its task, it will request the next task from the queue until all tasks have been completed."Literally
@Andrew: I am not sure what python thread pool the OP is using, but if you want a real world example of this functionality I am describing: msdn.microsoft.com/en-us/library/…Literally
14

One thing you should keep in mind is that Python (at least the C-based version) uses what's called a global interpreter lock, which can have a huge impact on performance on multi-core machines.

If you really need the most out of multithreaded python, you might want to consider using Jython or something.
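
A quick experiment that makes the GIL visible (my own sketch, not from the answer): the same CPU-bound work gets no speedup from threads, but does from processes:

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        while n:               # pure bytecode work, so the GIL serializes it
            n -= 1

    if __name__ == "__main__":
        for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
            start = time.perf_counter()
            with pool_cls(max_workers=4) as pool:
                list(pool.map(burn, [2_500_000] * 4))
            print(pool_cls.__name__, time.perf_counter() - start)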

Shelba answered 18/9, 2009 at 6:1 Comment(3)
After reading this, I tried running sieve of Eratosthenes tasks on three threads. Sure enough, it was actually 50% slower than running the same tasks in a single thread. Thanks for the heads up. I was running Eclipse Pydev on a virtual machine that was allocated two CPUs. Next, I'll try a scenario that involves some database calls.Gaptoothed
There are two (at least) types of tasks: CPU bound (e.g. image processing) and I/O bound (e.g. downloading from network). Obviously, GIL "problem" won't affect I/O bound tasks too much. If your tasks are CPU bound then you should consider multiprocessing instead of multithreading.Cosy
Yes, Python threads help if you have a lot of network I/O. I changed my code to use threads and it got 10x faster than the ordinary code...Luculent
9

As Pax rightly said, measure, don't guess. That's what I did for DNSwitness and the results were surprising: the ideal number of threads was much higher than I thought, something like 15,000 threads to get the fastest results.

Of course, it depends on many things; that's why you must measure yourself.

Complete measurements (in French only) are in Combien de fils d'exécution ? ("How many threads of execution?").

Hushhush answered 27/1, 2009 at 8:42 Comment(5)
15,000? That's a tad higher than I would have expected as well. Still, if that's what you got, then that's what you got, I can't argue with that.Norwood
For this specific application, most threads are just waiting for a response from the DNS server. So, the more parallelism, the better, in wall-clock time.Hushhush
I think that if you have that 15000 threads which are blocking on some external I/O then a better solution would be massively fewer threads but with an asynchronous model. I speak from experience here.Warily
@Warily I have an asynchronous system, but if using too few threads, it is more likely to hang due to internal implementation (networking, nio2, whatever)Feverish
@MladenAdamovic I agree that too few threads can be a problem, but the question at hand is how many threads is too many. The minimum number of threads can be decided by trial and error to get the best results.Warily
6

I've written a number of heavily multi-threaded apps. I generally allow the number of potential threads to be specified by a configuration file. When I've tuned for specific customers, I've set the number high enough that my utilization of all the CPU cores was pretty high, but not so high that I ran into memory problems (these were 32-bit operating systems at the time).

Put differently, once you reach some bottleneck be it CPU, database throughput, disk throughput, etc, adding more threads won't increase the overall performance. But until you hit that point, add more threads!

Note that this assumes the system(s) in question are dedicated to your app, and you don't have to play nicely (avoid starving) other apps.

Budbudapest answered 8/6, 2011 at 18:12 Comment(1)
Can you mention some of the numbers you've seen for thread count? It'd be helpful to just get a sense of it. Thanks.Stereotomy
5

The "big iron" answer is generally one thread per limited resource -- processor (CPU bound), arm (I/O bound), etc -- but that only works if you can route the work to the correct thread for the resource to be accessed.

Where that's not possible, consider that you have fungible resources (CPUs) and non-fungible resources (arms). For CPUs it's not critical to assign each thread to a specific CPU (though it helps with cache management), but for arms, if you can't assign a thread to the arm, you get into queuing theory and the optimal number to keep arms busy. Generally I'm thinking that if you can't route requests based on the arm used, then having 2-3 threads per arm is going to be about right.
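
One standard rule of thumb from queuing theory here is Little's law, L = λW: the concurrency in service equals the arrival rate times the time each request holds the resource. With invented numbers it lands right in that 2-3 range:

    # Little's law: busy threads = arrival rate * service time.
    arrival_rate = 200     # requests/second hitting one disk arm (invented)
    service_time = 0.012   # seconds the arm is busy per request (invented)
    busy_threads = arrival_rate * service_time
    print(round(busy_threads, 1))   # 2.4 threads needed to keep the arm busy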

A complication comes about when the unit of work passed to the thread doesn't execute a reasonably atomic unit of work. Eg, you may have the thread at one point access the disk, at another point wait on a network. This increases the number of "cracks" where additional threads can get in and do useful work, but it also increases the opportunity for additional threads to pollute each other's caches, etc, and bog the system down.

Of course, you must weigh all this against the "weight" of a thread. Unfortunately, most systems have very heavyweight threads (and what they call "lightweight threads" often aren't threads at all), so it's better to err on the low side.

What I've seen in practice is that very subtle differences can make an enormous difference in how many threads are optimal. In particular, cache issues and lock conflicts can greatly limit the amount of practical concurrency.

Citole answered 15/1, 2012 at 15:13 Comment(0)
3

One thing to consider is how many cores exist on the machine that will be executing the code. That represents a hard limit on how many threads can be proceeding at any given time. However, if, as in your case, threads are expected to be frequently waiting for a database to execute a query, you will probably want to tune your threads based on how many concurrent queries the database can process.
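
For instance (a sketch under the assumption that the database connection cap is the binding constraint; the figure of 40 is invented):

    from concurrent.futures import ThreadPoolExecutor

    # With the DBMS limited to 40 concurrent connections, threads
    # beyond ~40 would only queue behind the connection pool anyway.
    DB_MAX_CONNECTIONS = 40
    pool = ThreadPoolExecutor(max_workers=DB_MAX_CONNECTIONS)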

Horoscope answered 27/1, 2009 at 0:51 Comment(14)
um, no. The whole point of threads (before multicore and multiple processors became prevalent) was to be able to mimic having multiple processors on a machine that has just one. That's how you get responsive user interfaces -- a main thread and ancillary threads.Marceau
@mmr: Um no. The idea of threads is to allow for blocking I/O and other tasks.Literally
The statement I made was that the number of cores on a machine represents a hard limit on the number of threads that can be doing work at a given time, which is a fact. Of course other threads can be waiting on I/O operations to complete, and for this question that is an important consideration.Horoscope
@ElectricDialect: What you are saying has no relevance to a thread pool.Literally
Anyway - you have GIL in Python, which makes threads only theoretically parallel. No more than 1 thread can run simultaneously, so it's only the responsiveness and blocking operations that matter.Harmonicon
@Rich B -- umm, no. OS/2 was sold as a multithreaded OS because you could have a responsive UI with threads. Blocking I/O is just one aspect of not having a responsive UI; basically, you just agreed with what I said, without having the history to understand it.Marceau
@mmr: I would never say what you said because so far you haven't said anything accurate.Literally
@Rich B-- um, what? Have you done any UI coding with serious, hard work done in the background? Blocking I/O is just one task, and there can be many tasks (as you said)-- having a responsive UI means threading when you do those tasks. As for OS/2: os2hq.com/os2faq.htmMarceau
@mmr: Your comments like your answer have nothing at all to do with the topic of thread pools. I am not sure what you don't understand here, but I assure I and most other people here are fully aware of what OS2 is. I must say I am a bit intrigued by the fact you keep referencing it though.Literally
@RichB-- Because I was pointing out the flaw of assuming that the number of threads should be linked to the number of cores (ie, that simultaneous execution is the only value to threads). And I maintain that a thread pool is not the right answer to this problem, but that's just my preference.Marceau
@mmr: If you don't think a thread pool is the answer to handling incoming server requests to external I/O, then I guess I have nothing left to say to you. That says it all.Literally
+1 For actually understanding how computers work. @mmr: You need to understand the difference between appears to have multiple processors, and does have multiple processors. @Rich B: A thread pool is just one of many ways to handle a collection of threads. It is a good one, but certainly not the only one.Fawnia
This is wrong. I had an app that required about 75 threads to fully load an 8-core processor. Why? Because of various blocking operations.Budbudapest
Limiting thread count to core count only makes sense when you're doing some intensive calculations (eg. compression). Which network communication isn't, that's mostly waiting on IO.Corey
3

I think this is a bit of a dodge to your question, but why not fork them into processes? My understanding of networking (from the hazy days of yore, I don't really code networks at all) was that each incoming connection can be handled as a separate process, because then if someone does something nasty in your process, it doesn't nuke the entire program.
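
A sketch of that approach with Python's stdlib (a worker-process pool created once at startup, as a later comment suggests; the handler is a hypothetical stand-in):

    from multiprocessing import Pool

    def handle(request):
        # hypothetical: parse the request, query the DB, build a reply
        return f"handled {request}"

    if __name__ == "__main__":
        with Pool(processes=8) as workers:   # created once, cost amortized
            print(workers.map(handle, ["req1", "req2", "req3"]))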

Marceau answered 27/1, 2009 at 0:55 Comment(6)
For Python that's especially true, as multiple processes can run in parallel while multiple threads don't. The cost, however, is quite high: you have to start a new Python interpreter each time, and connect to the DB with each process (or use some pipe redirection, but it also comes at a price).Harmonicon
Switching between processes is - most of the time - more expensive than switching between threads (a whole context switch instead of some registers). In the end it depends heavily on your threading lib. As the question revolved around threading, I assume that processes are out of the question already.Gussiegussman
Fair enough. I'm not sure why that's why I'm getting a -2 ding to the score, though, unless people really want to see thread-only answers, rather than including other answers that work.Marceau
@mmr: Considering the question was about /thread/ pools, yes, I think people should be expecting an answer about threads.Literally
Process creation can be done once at startup (ie, a process pool instead of a thread pool). Amortized over the application duration, this may be small. They can't share info easily but it DOES buy them the possibility of running on multi-CPUs so this answer is useful. +1.Norwood
@Rich, I've answered many questions about raw JS with the comment "use jQuery" even when they specifically excluded jQuery as an option, the intent being to change their mind. The answer just has to be useful, not 100% relevant.Norwood
2

ryeguy, I am currently developing a similar application and my thread count is set to 15. Unfortunately, if I increase it to 20, it crashes. So, yes, I think the best way to handle this is to measure whether or not your current configuration allows more or less than a number X of threads.

Fley answered 27/1, 2009 at 12:36 Comment(1)
Adding to your thread count should not randomly crash your app. There's some reason. You would do well to figure out the cause because it may affect you even with fewer threads in some circumstances, who knows.Budbudapest
0

My suggestion is to watch the System Load Average if you are on an OS that supports it. The reason is that it will tell you how overloaded (juggling tasks) the CPU is as opposed to how busy (cpu %) it is.

This is the difference between 1 task using 100% of a CPU (efficient) and 400 tasks sharing 100% of a CPU (degraded - all tasks suffer).

But why does this degrade the system? Largely because of context switching where the CPU must pause one of the threads / tasks to work on another. The switching time is small, but not free, so it accumulates small bits of time with an increased number of threads and eventually this results in a substantial amount of time dedicated to switching contexts.

You can generally measure this by timing tasks with a small number of threads and then increasing it. You'll almost always see the time increase logarithmically rather than linearly :)

For example:

  • 1 task can run in 50 milliseconds (baseline - 100%)
  • 4 concurrent tasks can run in 205 milliseconds (103% of the time)
  • 20 concurrent tasks can run in 1,100 milliseconds (110% of the time)
  • 200 concurrent tasks can run in 12,500 milliseconds (125% of the time)
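
On a Unix-ish system the stdlib exposes those load averages directly; a minimal sketch (the 2x threshold is my own placeholder, not a standard):

    import os

    # 1-, 5- and 15-minute load averages (Unix only). Sustained 1-minute
    # load well above the core count means tasks are queuing for CPU,
    # not merely keeping it busy.
    load1, load5, load15 = os.getloadavg()
    cores = os.cpu_count()
    if load1 > 2 * cores:
        print(f"overloaded: load {load1:.1f} on {cores} cores")
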
Philosophize answered 5/10, 2023 at 13:59 Comment(0)
-6

In most cases you should allow the thread pool to handle this. If you post some code or give more details it might be easier to see if there is some reason the default behavior of the thread pool would not be best.

You can find more information on how this should work here: http://en.wikipedia.org/wiki/Thread_pool_pattern

Literally answered 27/1, 2009 at 0:53 Comment(1)
@Pax: This would not be the first time the majority of people didn't want to answer the question at hand (or understand it). I am not worried.Literally
-12

As many threads as the CPU cores is what I've heard very often.

Airiness answered 27/1, 2009 at 0:48 Comment(14)
@Rich, at least explain why:-). This rule-of-thumb only applies when all threads are CPU-bound; they get one 'CPU' each. When many of the threads are I/O bound, it's usually better to have many more threads than 'CPU's (CPU is quoted since it applies to physical threads of execution, eg cores).Norwood
@Pax: This is not at all how threading works. The number of CPUs has no bearing. Either you need a new thread or not.Literally
You can't run multiple threads on different CPUs/cores in parallel, not in Python! Global Interpreter Lock is there to block it!Harmonicon
@Abgan, I wasn't sure about that, thinking maybe Python would create "real" OS threads (run on multiple CPUs). If what you say is true (I have no reason to doubt), then CPU quantity has no bearing - threading is useful then only when most threads are waiting for something (eg DB I/O).Norwood
@Rich: when (real) threading, CPU count DOES have bearing since you can run multiple non-waiting threads truly concurrently. With one CPU, only one runs and the benefit accrues from having many other threads waiting for a non-CPU resource.Norwood
@Pax: You don't understand the concept of thread pools then I guess.Literally
@Rich, I understand thread pools fine; it appears I (and others here) also understand hardware better than you. With one CPU, only one execution thread can run, even if there's others waiting for a CPU. Two CPUs, two can run. Iff all threads are waiting for a CPU, ideal thread count is equal to...Norwood
...the number of CPUs. When threads are waiting for a non-CPU resource, ideal number is higher.Norwood
@Pax: That doesn't handle the problem of a queue for incoming data requests.Literally
@Rich, see my comment to your answer, 'nuff said.Norwood
I've heard the same thing. On one of many of Jeffrey Richter's interviews he states the same.Tupi
lol, I got voted down to oblivion. Obviously I didn't know, so correct me.Airiness
@masfenix: Of course you got downvoted. First of all, your answer makes absolutely /no/ sense in relation to thread pools. Second, your answer makes no sense even in the context of regular threading.Literally
The answer does makes sense in the context of splitting up a CPU-bound task, such as compiling a project with make -j8, but not for threads that are waiting for IO.Tweedsmuir
