TBB for a workload that keeps changing?
Asked Answered

I'm sorry, but I don't seem to get Intel's TBB. It looks great and well supported, but I can't wrap my head around how to use it, since I'm not used to thinking of parallelism in terms of tasks; I've always seen it as threads.

My current workload has a job that sends work to a queue for continued processing (think of recursion, but instead of calling itself it pushes work onto a queue). The way I got this working in Java was to create a concurrent (non-blocking) queue and a ThreadPoolExecutor whose workers consume the queue and push work back onto it. Now I'm trying to do something similar in C++. I found that TBB can create pools, but its approach is very different: Java threads just keep working as long as there is work in the queue, while TBB seems to partition the work up front.

Here's a simple Java example of what I do (before this I set how many threads I want, etc.):

static class DoWork implements Callable<Void> {
    // queue with contexts to process
    private final Queue<Integer> contexts;

    DoWork(Integer request) {
        contexts = new ArrayDeque<>();
        contexts.add(request);
    }

    public Void call() {
        while (!contexts.isEmpty()) {
            int data = contexts.poll();
            // do work
            contexts.add(data); // if it needs to be sent back to the queue for more work
        }
        return null;
    }
}

I'm sure it's possible to do this in TBB, but I'm just not sure how, because it seems to split up my work at the moment I submit it. So if there are 2 items in the queue, it may only launch 2 threads and won't grow as more work comes in (even if I have 8 cores).

Can someone help me understand how to achieve this with tasks, and maybe suggest a better way to think about TBB coming from Java's threading environment? (I also have no allegiance to TBB, so if there's something easier or better I'm happy to learn it. I just don't like C++ ThreadPool because it doesn't seem actively developed.)

Proa answered 14/5, 2012 at 23:50 Comment(10)
What is the Context data type? Was the class originally DoWork<Context>?Fred
@Fred For simplicity I just made it an int in the example. When I really do it, it's an int and a list. Sorry, I should have just made it an int in the example.Proa
Here's an STL translation of your code: codepad.org/vs4S1UtB . You should go from there.Fred
@Fred Thanks so much. I'm really sorry; I know how to set up the queue. My question was more about having the threads process it using a pool, as the size of the queue is dynamically changing.Proa
threadingbuildingblocks.org/codesamples.php#concurrent_queueFred
Also take a look at parallel_while. Semantics differ a bit from STL, but you'll only have to change at most 3-4 lines from my translation for the queue and while.Fred
@Fred I saw that and have been looking at the scalable_allocator function; I'm just having trouble wrapping my head around how to apply it. Typically in my Java program, I add something to the queue, then send it off to be processed, and it creates/destroys threads as needed. I just don't see how to keep that continuous processing going with TBB.Proa
So is the private queue being used in a manner that doesn't require explicit mutexing / rw-locking? If so, parallel_while should automate away the locking and the thread pooling. (Note: these are naive assumptions, for I haven't read the tbb code itself).Fred
@Fred Yes, you're right, it's just a queue with no mutex/locking: write data in and pop data out. I'll play with parallel_preorder (it's the example that uses parallel_while). rant/ I'm just finding it hard to think of this the way I do with Java threading (you know, it grows/shrinks on demand and you just keep sending it data). I've read TBB is easier, but it abstracts so much that I feel I'm using a cookie cutter and have to reshape my problem to fit their samples /rant Proa
have you looked at task_group?Legume

The approach based on having a queue of items for parallel processing, where each thread just pops one item from the queue and proceeds (and possibly adds a new item to the end of the queue at some point), is fundamentally limiting, because the queue becomes a single point of synchronization: threads have to wait for access to it before getting their next item. In practice this approach works when tasks (each item's processing job) are fairly large and take varying times to complete, which keeps the queue less contended; it breaks down when most of the threads finish at the same time and arrive at the queue together for their next items.

If you're writing a somewhat reusable piece of code, you cannot guarantee that tasks are either large enough or that they vary in size (time to execute).

I assume that your application scales, meaning that you start with a significant number of items in your queue (much larger than the number of threads), and that while the threads do the processing they add enough new tasks to the end that there is work for everyone until the application finishes.

If that's the case, I would rather suggest that you keep two thread-safe vectors of your items (TBB's concurrent_vector, for instance) and alternate between them. You start with one vector (your initial set of items) and enqueue() a task (I think it's described somewhere in chapter 12 of the TBB reference manual) that executes a parallel_for over that initial vector. While the first batch is being processed, you push_back the new items onto the second concurrent_vector; when you're done with the first one, you enqueue() a task with a parallel_for over the second vector and start pushing new items back into the first one. You can overlap the parallel processing of items better by using three vectors instead of two and gradually rotating between them, as long as there is still enough work to keep every thread busy.

Stravinsky answered 15/5, 2012 at 21:0 Comment(0)

What you're trying to do is exactly the sort of thing TBB's parallel_do is designed for. The "body" invoked by parallel_do is passed a "feeder" argument which you can do feeder.add(...some new task...) on while processing tasks to create new tasks for execution before the current parallel_do completes.

Incinerate answered 15/5, 2012 at 23:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.