Intel TBB vs Boost

2

34

In my new application I am free to choose the multithreading library. So far I have been using pthreads, and now I want to explore cross-platform libraries. I have narrowed the choice down to TBB and Boost, but I don't understand what benefit TBB offers over Boost. An excerpt about TBB from its wiki article: "Instead the library abstracts access to the multiple processors by allowing the operations to be treated as "tasks", which are allocated to individual cores dynamically by the library's run-time engine, and by automating efficient use of the cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms,"

But does a threading library even need to worry about allocating threads to cores? Isn't this the job of the operating system? So what is the real benefit of using TBB over Boost?

Klingel answered 20/8, 2011 at 6:0 Comment(7)
You can set thread affinities using pthreads as well (e.g. using the pthread_setaffinity_np call). - Fastback
@Foo yes, you are right. My point is: how much practical use is there in doing that? As a programmer you might not want to handle thread scheduling in your application. So why does TBB present it as a differentiator over other libraries? - Klingel
There is an observable advantage to properly choosing cores for applications. Take the simple example of a threaded logger. One thread receives data from a network interface and puts it on a ring; the other thread reads from the ring and writes to a file (this helps relieve the congestion that you see if you use tcpdump). In this case, on a dual-processor system, setting affinities on the same CPU is observably more efficient. If you have hyperthreading enabled, using the virtual core pair is much faster. However, this requires lots of micromanagement, which TBB doesn't require. - Fastback
@Foo thanks for the nice explanation. I agree there are certain cases in which you want to set the affinity of your thread. But how will TBB do this management on the programmer's behalf? In other words, the programmer has to direct TBB, and the same can be done with Boost/pthreads too! The bigger question is: in what scenarios does TBB do better thread scheduling than the operating system? - Klingel
@Foo Even if you use pthread_setaffinity_np() to properly set the affinity of the thread, there is no guarantee that the threads will run in an interleaved manner, especially in a multiprogramming environment. I would still say it can give you some benefit, but there is a cap on it. - Klingel
You have to represent the problem using their primitives. It will manage the workload (if one thread is backed up, a free thread will steal tasks from the delayed thread). - Fastback
V
37

But does a threading library even need to worry about allocating threads to cores? Isn't this the job of the operating system? So what is the real benefit of using TBB over Boost?

You are right, a threading library usually should not care about mapping threads to cores. And TBB does not. TBB operates with tasks, not threads. TBB's scheduler utilizes all cores by allocating a pool of threads and letting it dynamically select which tasks to run. This is the main advantage over Boost, with which you will need to map available work to threads manually. And then TBB offers high-level constructs such as parallel_for, parallel_pipeline, etc. that can be used to express most common parallel patterns, and hide all manipulation with tasks.
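To make "map available work to threads manually" concrete, here is a rough sketch of what that looks like without a task scheduler. It uses std::thread as a stand-in for boost::thread (the interfaces are very similar); process_row and process_rows_manually are hypothetical names invented for this illustration, not part of any library.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-in for real per-row work (e.g. one image row).
inline long process_row(unsigned y) { return static_cast<long>(y) * 2; }

// Manual mapping of work to threads: split [0, rows) into contiguous
// blocks, give each thread one block, then wait for all of them.
std::vector<long> process_rows_manually(unsigned rows, unsigned num_threads)
{
    std::vector<long> out(rows);
    if (num_threads == 0) num_threads = 1;
    const unsigned chunk = (rows + num_threads - 1) / num_threads; // ceiling division

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
    {
        const unsigned begin = std::min(rows, t * chunk);
        const unsigned end   = std::min(rows, begin + chunk);
        if (begin == end) break; // no rows left to hand out
        workers.emplace_back([&out, begin, end] {
            for (unsigned y = begin; y < end; ++y)
                out[y] = process_row(y); // each thread writes distinct elements
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

Note that the split is fixed up front: you have to pick the thread count yourself, and if one block turns out to be more expensive than the others, the remaining threads simply finish early and sit idle.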

For example, let's take a piece of code that calculates points of the Mandelbrot fractal (taken from http://warp.povusers.org/Mandelbrot/, variable initialization omitted):

for(unsigned y=0; y<ImageHeight; ++y)
{
    double c_im = MaxIm - y*Im_factor;
    for(unsigned x=0; x<ImageWidth; ++x)
    {
        double c_re = MinRe + x*Re_factor;

        double Z_re = c_re, Z_im = c_im;
        bool isInside = true;
        for(unsigned n=0; n<MaxIterations; ++n)
        {
            double Z_re2 = Z_re*Z_re, Z_im2 = Z_im*Z_im;
            if(Z_re2 + Z_im2 > 4)
            {
                isInside = false;
                break;
            }
            Z_im = 2*Z_re*Z_im + c_im;
            Z_re = Z_re2 - Z_im2 + c_re;
        }
        if(isInside) { putpixel(x, y); }
    }
}

Now to make it parallel with TBB, all you need is to convert the outermost loop into tbb::parallel_for (I use a C++11 lambda for brevity):

// note: both range bounds must have the same type; write 0u if ImageHeight is unsigned
tbb::parallel_for(0, ImageHeight, [=](unsigned y)
{
    // the rest of code is exactly the same
    double c_im = MaxIm - y*Im_factor;
    for(unsigned x=0; x<ImageWidth; ++x)
    {
        ...
        // if putpixel() is not thread safe, a lock might be needed
        if(isInside) { putpixel(x, y); }
    }
});

TBB will automatically distribute all loop iterations over the available cores (you don't need to care how many there are) and dynamically balance the load, so that if some thread has more work to do, the other threads don't just wait for it but help out, maximizing CPU utilization. Try implementing it with raw threads, and you will feel the difference :)
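For a feel of what "dynamically balance the load" involves if you do it by hand, here is a minimal sketch in standard C++. It is not how TBB works internally (TBB uses a work-stealing task scheduler); it uses a much simpler idea, a shared atomic counter from which idle threads grab the next row. The names expensive_row and process_rows_dynamically are made up for the example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical row cost: later rows are deliberately more expensive,
// so a fixed up-front split would leave some threads idle.
inline unsigned long expensive_row(unsigned y)
{
    unsigned long acc = 0;
    for (unsigned i = 0; i <= y * 1000u; ++i) acc += i % 7;
    return acc;
}

// Every worker repeatedly claims the next unprocessed row, so a thread
// that finishes a cheap row immediately picks up more work.
std::vector<unsigned long> process_rows_dynamically(unsigned rows, unsigned num_threads)
{
    std::vector<unsigned long> out(rows);
    std::atomic<unsigned> next_row{0};
    if (num_threads == 0) num_threads = 1;

    auto worker = [&] {
        for (;;)
        {
            const unsigned y = next_row.fetch_add(1); // claim one row
            if (y >= rows) return;                    // nothing left
            out[y] = expensive_row(y);                // rows are disjoint, no lock needed
        }
    };

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) workers.emplace_back(worker);
    for (std::thread& w : workers) w.join();
    return out;
}
```

Even this simple version still leaves the thread count, thread lifetime, and granularity (one row per claim) for you to manage, which is the bookkeeping that tbb::parallel_for hides.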

Vipul answered 20/8, 2011 at 20:22 Comment(4)
@Alexey thanks for a nice explanation. Now I see that TBB looks promising. Can you please suggest a tutorial for TBB? - Klingel
Can you tell me what [=] captures? - Stocker
@user: whatever from the calling context is used by the lambda function. [=] tells the compiler that it should capture by value everything it needs. For the code above, the capture set could consist of MaxIm, Im_factor, MinRe, Re_factor, ImageWidth, and MaxIterations, except if any of these are global variables. - Vipul
@AlexeyKukanov Global variables or members of *this, I feel compelled to point out. Referring to a class member only causes the this pointer to be captured. Probably not an issue with parallel_for and read-only access, of course. - Nayarit
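The three capture cases from these comments can be checked with a small standalone sketch. All names here (g_scale, Scaler, make_local_scaler, make_global_scaler) are invented for the illustration. The member case is written with an explicit [this], because relying on [=] to capture this implicitly, as C++11 does, is deprecated since C++20.

```cpp
#include <functional>

double g_scale = 2.0; // global: never captured, always read when the lambda runs

struct Scaler
{
    double factor = 3.0;

    // Using a member captures the `this` pointer, not a copy of `factor`.
    // In C++11, [=] does this implicitly; [this] spells it out.
    std::function<double(double)> make()
    {
        return [this](double x) { return x * factor; };
    }
};

// A local variable used in the body is copied into the lambda by [=].
std::function<double(double)> make_local_scaler()
{
    double local = 5.0;
    auto f = [=](double x) { return x * local; };
    local = 100.0; // does not affect the copy stored inside f
    return f;
}

// A global is not part of the capture set, so [=] captures nothing here.
std::function<double(double)> make_global_scaler()
{
    return [=](double x) { return x * g_scale; };
}
```

So later changes to a global or to a member are visible inside the lambda, while a captured local keeps the value it had when the lambda was created. With read-only data in a parallel loop this makes no difference, as noted above.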
D
11

Intel TBB brings in its own thread pool/scheduler and execution model (including stuff like parallel_for constructs) while Boost has only basic thread management functions (create threads and synchronization primitives, that's it.) Writing a good thread pool using Boost is possible, but difficult -- TBB comes with a highly optimized thread pool already. So it totally depends on your requirements: if all you need is "portable pthreads", use Boost; if you need more, use Intel TBB.

Dasha answered 20/8, 2011 at 7:59 Comment(3)
Thanks for the information. boost::threadpool is meant for doing the thread pool stuff in Boost, so it does provide a way to have a thread pool. What I don't know is how efficient or deficient it is compared to TBB. - Klingel
boost::threadpool has been inactive for quite some time and is by far not as powerful as Intel TBB. - Dasha
@Dasha : I assume he meant boost::thread_group, but you're correct -- Boost's class is just a thread container, and does not offer built-in scheduling like TBB does. - Field
