Thread Pool vs Many Individual Threads

I'm in the middle of a problem where I am unable to decide which solution to take.

The problem is a bit unusual. Let's put it this way: I am receiving data from the network continuously (2 to 4 times per second). Each piece of data belongs to a different, let's say, group. Let's call these groups group1, group2, and so on.

Each group has a dedicated job queue; data coming from the network is filtered and added to the corresponding group's queue for processing.

At first I created a dedicated thread per group which takes data from the job queue, processes it, and then blocks until the next item arrives (using a LinkedBlockingQueue).
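
To make it concrete, each group currently looks roughly like this (a rough sketch only; GroupWorker and Data are placeholder names, not real classes from my code):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    class Data { /* payload received from the network */ }

    // One dedicated worker per group: it takes jobs from its own queue,
    // processes them in order, and blocks while the queue is empty.
    class GroupWorker implements Runnable {
        private final BlockingQueue<Data> queue = new LinkedBlockingQueue<>();

        void submit(Data data) {
            queue.offer(data);                // called by the network/filter layer
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Data data = queue.take(); // blocks until the next item arrives
                    process(data);            // this group's jobs run strictly in order
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow a clean shutdown
            }
        }

        private void process(Data data) { /* group-specific work */ }
    }

    // one long-lived thread per group, started once at startup:
    // new Thread(new GroupWorker(), "group1-worker").start();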

But my senior suggested that I should use thread pools, because that way threads won't sit blocked and will be usable by other groups for processing.

But here is the thing: the data is arriving fast enough, and the time a thread takes to process it is long enough, that a thread will rarely, if ever, block. Dedicated threads also guarantee that data gets processed sequentially (job 1 finishes before job 2 starts), which with pooling, however unlikely, might not happen.

My senior is also convinced that pooling will save us lots of memory because threads are POOLED (I think he really went for the word ;) ). I don't agree, because I personally think that, pooled or not, each thread gets its own stack memory. Unless there is something about thread pools that I am not aware of.

One last thing: I always thought that pooling helps where a large number of short-lived jobs arrive. That makes sense because spawning a thread per job would kill performance: the time taken to initialize a thread is a lot more than the time spent doing the job. So pooling helps a lot there.

But in my case group1, group2, ..., groupN always remain alive. Whether there is data or not, they will still be there. So thread spawning is not the issue here.

My senior is not convinced and wants me to go with the pooling solution because its memory footprint is supposedly smaller.

So, which path to take?

Thank you.

Blockish answered 28/7, 2012 at 11:50 Comment(4)
In a situation like this there is only one answer: cold hard facts! Prove which is better with proper, controlled models/prototypes. You might both be surprised, as each may be valid under differing circumstances.Highspirited
CYA memo required - detail your issues/concerns in an email, then use the pooled solution, as recommended by your senior. If there are problems with the pooled design, you are shielded :). Looking at your requirement, I would probably go with a dedicated thread per group - the threads are only created at startup, and a thread is easier to debug than an async state-engine that has to be continually issued to the pool when something needs doing. If an async approach is not used with the thread pool, then any blocking calls will block a pool thread, forcing the system to create more threads :(Blanca
If your app creates one thread per group, and each thread takes data from its own queue and runs forever, you effectively have a fixed thread pool. Creating a "real" thread pool with the same number of threads wouldn't change much. What you could gain by using a "real" thread pool is to assign only a small number of threads to pick data from all the queues, and the thread pool could grow and shrink as needed if too many tasks are waiting in the thread pool queue, or if too many threads are idle.Amaty
When using Java's thread pools, you are going to have a real problem when attempting to order your submitted tasks by priority.Doralin

Good question. Pooling indeed saves you initialization time, as you said. But it has another aspect: resource management. So let me ask you this: just how many groups (read: dedicated threads) do you have? Do they grow dynamically during the lifetime of the application?

For example, consider a situation where the answer to this question is yes: new group types are added dynamically. In this case you might not want to dedicate a thread to each one. Since there is technically no restriction on the number of groups that may be created, you would end up creating a lot of threads and the system would spend its time context switching instead of doing real work. Thread pooling to the rescue: a thread pool allows you to put a ceiling on the maximum number of threads that can ever be created, regardless of load. The application may then deny service to certain requests, but the ones that get through are handled properly, without critically depleting system resources. See the sketch below.
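
For instance, Java's ThreadPoolExecutor lets you cap both the thread count and the backlog (a minimal sketch; the numbers are only illustrative, not a recommendation):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPoolExample {
        public static void main(String[] args) {
            // At most 8 threads ever and at most 1000 queued tasks; beyond that,
            // submissions are rejected instead of exhausting system resources.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    2,                                    // core threads kept alive
                    8,                                    // hard ceiling on the thread count
                    60, TimeUnit.SECONDS,                 // idle non-core threads are reclaimed
                    new ArrayBlockingQueue<>(1000),       // bounded backlog of waiting tasks
                    new ThreadPoolExecutor.AbortPolicy()  // reject work when saturated
            );

            pool.execute(() -> System.out.println("processing a job"));
            pool.shutdown();
        }
    }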

Considering the above, it is very possible that in your case it is perfectly OK to have a dedicated thread for each group!

The same goes for your senior's conviction that it will save memory. Indeed, each thread takes up memory (the Thread object plus its stack), but is it really so much if the number of threads is predefined and small, say 5, or even 10? It is probably OK. Anyway, you should not use pooling unless you are a priori and absolutely convinced that you actually have a problem!

Pooling is a design decision, not an architectural one. You can skip pooling at the beginning and optimize later if, after encountering a performance issue, you find pooling to be beneficial.

As for the serialization of requests (in-order execution), it does not matter whether you are using a thread pool or a dedicated thread. Sequential execution is a property of a queue coupled with a single handler thread.
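
One way to get that, sketched here with one single-threaded executor per group (a minimal sketch; the class and method names are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // One single-threaded executor per group: jobs submitted for the same
    // group are executed strictly in submission order.
    class GroupDispatcher {
        private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

        void dispatch(String group, Runnable job) {
            executors
                .computeIfAbsent(group, g -> Executors.newSingleThreadExecutor())
                .execute(job);
        }
    }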

Murine answered 28/7, 2012 at 12:11 Comment(10)
Thanks for the reply. As for dynamic group creation, no, groups are not created dynamically. But there are a lot of groups, about 15-20. They are expected to increase, but they will all be created and set up at the beginning. Execution sequence is an issue where pooling might occasionally prove troublesome. But you haven't told me clearly whether threads in a pool take less memory than threads without pooling. Thanks.Blockish
Threads in a pool are exactly the same as any other thread. They consume the same amount of memory. The advantage of using a pool is that it can avoid constant thread creation and destruction, and that it can keep the number of concurrent threads to a fixed, or reasonable, number.Amaty
As @JBNizet said, threads in a thread pool are the same threads. There is just another mechanism that is responsible for their lifetime. Threads in a thread pool do not take less memory than regular threads. Consider the following: since you have relatively many groups, I can cautiously conclude that you might be better off with a thread pool. As I mentioned, from a resource management perspective, if you have many more threads than cores to run them (and your logic is mostly CPU bound), you will spend too much time in context switching. Once again, thorough performance testing is required!Murine
Thank you so much for getting back. Could you do me one favour and explain how context switching works in the pooling and non-pooling scenarios? I am well aware of the concept but don't know how it applies differently in the two cases.Blockish
Why, context switching works the same in both cases! The thread pool may reduce context switching by reducing the number of threads. Consider 500 threads running on 2 cores: the system will do a lot of context switching because each thread needs to get its quantum on the CPU. A context switch costs you time; multiply that by about 500 and you have got yourself a performance issue. To mitigate this scenario, applications use the thread pool pattern: instead of creating threads uncontrollably, you limit the number of threads that can be spawned. Continued in the next comment...Murine
Limiting the number of threads limits the amount of context switching, so CPU time can be utilized as fully as possible: requests that still have threads to handle them get good service, other requests are postponed, and the time that would otherwise be spent on context switching is saved.Murine
With the type of app outlined by the OP, I suspect that the threads are going to be mostly I/O bound. That, and even with threads that are CPU-bound AND use so much different data that a considerable cache flush is required on a context switch, you cannot just multiply the context-switch overhead by the number of threads! The 500-[number-of-cores] threads that are not scheduled do not incur any overhead!Blanca
Thank you everyone for clearing up my confusion and for teaching me new things I wouldn't have learned without this discussion. I'll probably go with pools because, yes, I can't, and shouldn't, spawn that many threads. Pooling will save me context switching, but Martin is right about the "different data": data belongs to different groups and is processed differently, so even if pooling reduces context switching, the "cache-flush" overhead remains. Still, it will happen less often than full context switches would. That leaves me with the out-of-sequence execution issue, but I'll find a workaround. Unless someone already has a good suggestion?Blockish
@MartinJames can you elaborate, please? I am not sure I understand your claim. Thanks.Murine
@Blockish BTW, don't forget to accept the answer that suited you most. Be a good SO denizen :-)Murine

Creating a thread consumes resources, including the default stack allocated per thread (IIRC 512 KB, but configurable). So the advantage of pooling is that you incur a bounded resource hit. Of course, you need to size your pool according to the work that you have to perform.
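
If the per-thread stack is the worry, it can also be tuned, either JVM-wide with -Xss or per thread via a ThreadFactory. A rough sketch, assuming a fixed pool sized to the core count (the 256 KB figure is only an example, and the JVM treats the stack-size argument as a hint it may adjust or ignore):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;

    public class SmallStackPool {
        public static void main(String[] args) {
            // Thread(group, runnable, name, stackSize): the stack size is only a
            // hint to the JVM.
            ThreadFactory smallStackFactory = r ->
                    new Thread(null, r, "worker", 256 * 1024);

            // A fixed-size pool sized to the number of available cores.
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors(), smallStackFactory);

            pool.execute(() -> System.out.println("hello from a small-stack thread"));
            pool.shutdown();
        }
    }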

For your particular problem, I think the key is to actually measure performance, thread usage, etc. in each scenario. Unless you're running into constraints, I wouldn't worry much either way, other than to make sure that you can swap one implementation for another without a major impact on your application. Remember that premature optimisation is the root of all evil. Note that:

"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing.

Themistocles answered 28/7, 2012 at 11:57 Comment(2)
So is it right that threads in a pool use less memory than individual ones? Also, the pooling implementation caused another issue, sequence skips: sometimes a thread that started earlier takes more time, resulting in a thread that started later finishing first. What's the way around that?Blockish
Sequence numbers and a 'resequencer' class/thread that keeps a list/queue of out-of-order outputs until all earlier ones have come in.Blanca
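
A minimal sketch of such a resequencer, assuming every result carries a monotonically increasing sequence number (all names here are illustrative, for illustration only):

    import java.util.PriorityQueue;
    import java.util.function.Consumer;

    // Buffers out-of-order results and releases them strictly by sequence number.
    class Resequencer<T> {
        private final PriorityQueue<Entry<T>> pending =
                new PriorityQueue<>((a, b) -> Long.compare(a.seq, b.seq));
        private long nextSeq = 0;
        private final Consumer<T> downstream;

        Resequencer(Consumer<T> downstream) {
            this.downstream = downstream;
        }

        synchronized void accept(long seq, T result) {
            pending.add(new Entry<>(seq, result));
            // Release every result whose predecessors have all arrived.
            while (!pending.isEmpty() && pending.peek().seq == nextSeq) {
                downstream.accept(pending.poll().value);
                nextSeq++;
            }
        }

        private static final class Entry<T> {
            final long seq;
            final T value;
            Entry(long seq, T value) { this.seq = seq; this.value = value; }
        }
    }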
