Homegrown workqueue vs Intel TBB
Asked Answered
T

5

8

We are considering which parallel framework for C/C++ to use. We have some very special conditions and are not 100% sure, that e.g. TBB can add something "more".

  • There are N running threads and one synchronized workqueue (using pthread mutex).
  • Our jobs are prioritized (int).
  • Jobs are put into the queue and idle thread takes a job with the highest priority.

This repeats until the queue is empty.

Well, and now, I'd like to know if some framework like TBB (Thread Building Blocks) can offer more for this special scenario from the algorithmic point of view?? (So, internals...)

Tinware answered 5/12, 2011 at 14:17 Comment(7)
Do you really need an int to specify the priority? The only times I've ever needed a priority queue there has been only a very few priorities, (like 3!). Restricting the range of priorities to a small number makes priority queue design much easier.Baumgartner
Yes, we could use radix. But unfortunately we do need more than 3 prios but that's not the problem.Tinware
Radix 10? 10 is OK - in Priority Queue ctor, make an array of 10 queues. When a producer pushes, lock the priority queue, add the job to the private queue-array addressed by priority, unlock the priority queue and signal a semaphore. When a consumer turns up, wait on the semaphore, then lock the priority queue and iterate the private queue-array from the highest prio. end, looking for a non-empty queue. When the consumer finds a non-empty queue in the array, it can pop the job from it, unlock the priority queue and process it.Baumgartner
@Martin: We have already implemented all of that. That's not the question, but thank you.Tinware
in that case, it is difficult to see what aspect of your scenario, not already met by your current design, where TBB or any other framework could offer much improvement. Have you identified any particular bottleneck?Baumgartner
@Martin: That's the question, we don't know if we can do better.Tinware
let us continue this discussion in chatTinware
P
3

In my opinion you could gain by replacing the heavy mutex with something more robust, like the spin_rw_mutex: http://threadingbuildingblocks.org/files/documentation/a00163.html. Since most likely insertion/removal operations are fast, you could benefit more from a non-blocking lock.

Porphyrin answered 13/12, 2011 at 17:59 Comment(2)
But when the time spent in kernel "doing (un)locking" is negligible compared with the real processing time than TBB can't offer anything more, even theoretically, right?Tinware
It still depends here if the lock is blocking or not, i.e. if the thread waiting to acquire it is put to sleep. In this case, if the operation to perform inside the lock is very small, it can lead to performance degradation because a context switch is too slow. The tbb lock I mentioned uses busy spinning, so the thread does not give up the CPU while waiting for the lock.Porphyrin
C
5

TBB 4 provides a concurrent_priority_queue (search 'priority' in the reference manual). Furthermore, using TBB is nice if you can design your program with tasks, not threads, in mind. Indeed, it provides a lot of stuff to describe dependancies between tasks. Also, TBB seems to be fairly portable if it's important for you.

Cribb answered 13/12, 2011 at 13:0 Comment(0)
P
3

In my opinion you could gain by replacing the heavy mutex with something more robust, like the spin_rw_mutex: http://threadingbuildingblocks.org/files/documentation/a00163.html. Since most likely insertion/removal operations are fast, you could benefit more from a non-blocking lock.

Porphyrin answered 13/12, 2011 at 17:59 Comment(2)
But when the time spent in kernel "doing (un)locking" is negligible compared with the real processing time than TBB can't offer anything more, even theoretically, right?Tinware
It still depends here if the lock is blocking or not, i.e. if the thread waiting to acquire it is put to sleep. In this case, if the operation to perform inside the lock is very small, it can lead to performance degradation because a context switch is too slow. The tbb lock I mentioned uses busy spinning, so the thread does not give up the CPU while waiting for the lock.Porphyrin
V
0

I'd suggest taking a look at the TBB module summary pages and see if there is anything there that is useful to you.

For example, under the Containers section, there is not a concurrent_queue< T, A >, "A high-performance thread-safe non-blocking concurrent queue." It isn't a priority queue, so you'd have to build that yourself anyway.

On the other hand, under Synchronization, there are a few mutex variants that might make your life easier.

Bottom line: TBB isn't really all that magic, but it may be helpful.

Viscacha answered 5/12, 2011 at 17:4 Comment(0)
I
0

TBB can provide you:

  • Lock-free concurrent_queue. I don't remember anything about priority queue, but as James suggested in comments, you can build it yourself over cocurrent_queue.
  • Memory allocators tuned for performance in multithreaded environment.
  • Synchronization primitives like atomic variables
  • just good ideas how multithreading stuff can be effectively implemented

Please note that even doing everything right and intensively using TBB it's possible you won't notice any performance gain in comparison with your own impelemntation. It highly depends on your system, especially if interthread communication and especially synchronization is a bottleneck. It is usually if your tasks are small and there're lots of them.

Incomprehension answered 13/12, 2011 at 11:53 Comment(0)
S
0

I have worked with a parallelization frameworks including OpenMP, Cilk. That gives a nice abstraction and make parallelizing comparatively easy. However, I doubt if these support priority queues directly or even if you can modify their task Queue.

you can do priority based queue using this customized TaskQueue.

If you want to use TBB, I have used it with OpenMP and it seems to blend in very nicely. Plus, you don't need to worry about concurrent containers. And it is very reliable compared to other implementations available by people.

Hope this helps.

Sidonius answered 17/12, 2011 at 16:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.