What are the recommended C++ parallelization libraries for large data processing

Can someone recommend approaches to parallelization in C++ when the data to be acted upon is huge? I have been reading about OpenMP and Intel's TBB for parallelization in C++, but have not experimented with them yet. Which of these is better for parallel data processing? Any other libraries/approaches?

Benco answered 4/10, 2010 at 15:41 Comment(5)
You might consider CUDA/GPUs if the data is of the right type. – Perfumery
Not necessarily: GPU computing shines when you have lots of computation to do with relatively little data I/O, as the cost of transferring data to the GPU can be high. – Ethanethane
@Dirk: it also only works well when you apply the same operation to all the data; if each unit of data follows its own logic, it won't work. – Castrato
I would also suggest looking into FastFlow (calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about); it's similar to TBB and great for pipelining. – Castrato
Right now I'm looking into MongoDB for a similar project, and so far it looks promising: it supports sharding (so you don't have to worry [much] about scaling, load balancing, and failover) and it does map/reduce out of the box (although it will only parallelize jobs if you use sharding, with one thread per shard). – Antiproton

"large" and "data processing" cover a lot of ground here, and it's hard to give a sensible answer without more information.

If the data processing is "embarrassingly parallel" -- if it involves doing lots and lots of calculations that are completely independent of each other -- then there are a million things that will work, and it's just a matter of finding something that matches your code and background.
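For illustration, here is a minimal OpenMP sketch of that embarrassingly parallel case; the `transform` function is just a placeholder for whatever independent per-element work you need:

```cpp
#include <cmath>
#include <vector>

// Placeholder per-element work; each call is independent of the others.
double transform(double x) { return std::sqrt(x) * 2.0; }

void process(std::vector<double>& data) {
    // Each iteration touches only data[i], so a single pragma lets
    // OpenMP split the loop across threads (compile with -fopenmp).
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        data[i] = transform(data[i]);
    }
}
```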

If it isn't embarrassingly parallel, but nearly so -- the computations take a big chunk of data but just distill it into a handful of numbers -- there are fewer, but still plenty of, options.
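As a sketch of that "distill to a handful of numbers" case, OpenMP's `reduction` clause gives each thread a private accumulator and combines them at the end:

```cpp
#include <vector>

double total(const std::vector<double>& data) {
    double sum = 0.0;
    // Each thread sums its share of the range into a private copy of
    // `sum`; OpenMP adds the private copies together when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        sum += data[i];
    }
    return sum;
}
```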

If the calculation is more tightly coupled than this -- where you need the processors to work in tandem on big chunks of data -- then you're probably stuck with the old standbys: the OpenMP features of your compiler if the job will run on a single machine (there's TBB, too, but usually for number crunching OpenMP is faster and easier), or MPI if it needs several machines simultaneously. You mentioned C++; Boost has a very nice MPI layer.
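A minimal Boost.MPI sketch of that style, where the per-rank partial result is a placeholder for a real computation on each process's share of the data:

```cpp
#include <boost/mpi.hpp>
#include <functional>
#include <iostream>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);  // wraps MPI_Init / MPI_Finalize
    mpi::communicator world;

    // Placeholder: each rank would compute a partial result on its
    // share of the data here.
    double partial = static_cast<double>(world.rank());

    // Combine the partial results on rank 0.
    double total = 0.0;
    mpi::reduce(world, partial, total, std::plus<double>(), 0);

    if (world.rank() == 0)
        std::cout << "combined result: " << total << '\n';
    return 0;
}
```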

But thinking about which library to use for parallelization is probably approaching the problem from the wrong end. In many cases, you don't necessarily need to deal with these layers directly. If the number crunching involves lots of linear algebra (for instance), then PLASMA (for multicore machines -- http://icl.cs.utk.edu/plasma/ ) or PETSc, which has support for distributed-memory machines, e.g., multiple computers ( http://www.mcs.anl.gov/petsc/petsc-as/ ), are good choices that can completely hide the details of the parallel implementation from you. Other sorts of techniques have their own libraries, too. It's probably best to think about what sort of analysis you need to do and see whether existing toolkits have the amount of parallelization you need. Only once you've determined that the answer is no should you start to worry about how to roll your own.
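To give a flavour of how such a toolkit hides the parallelism, here is a rough PETSc sketch (assuming a recent PETSc; error checking omitted): the same code runs on one process or a whole cluster, with the vector distribution and communication handled by the library.

```cpp
#include <petscvec.h>

int main(int argc, char** argv) {
    PetscInitialize(&argc, &argv, NULL, NULL);

    // The library decides how to distribute the vector across processes.
    Vec x;
    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, 1000000);
    VecSetFromOptions(x);

    VecSet(x, 1.0);

    // Collective operation; the cross-process reduction is hidden.
    PetscReal norm;
    VecNorm(x, NORM_2, &norm);
    PetscPrintf(PETSC_COMM_WORLD, "norm = %g\n", (double)norm);

    VecDestroy(&x);
    PetscFinalize();
    return 0;
}
```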

Azarcon answered 4/10, 2010 at 17:19 Comment(0)

Both OpenMP and Intel TBB are for use on a single machine: they help you write multithreaded applications.
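As a sketch of the TBB style on a single machine (the `transform` function is a placeholder for your per-element work):

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cmath>
#include <vector>

// Placeholder per-element work.
double transform(double x) { return std::sqrt(x) * 2.0; }

void process(std::vector<double>& data) {
    // TBB splits the index range into chunks and schedules the chunks
    // across its worker threads.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] = transform(data[i]);
        });
}
```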

If you have truly huge datasets, you may need to split the load over several machines -- and then libraries like Open MPI for parallel programming with MPI come into play. Open MPI has a C++ interface, but you now also face a networking component and some administrative issues you do not have with a single computer.
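A minimal skeleton of the MPI side, runnable with Open MPI or any other implementation (build with `mpicxx`, launch with something like `mpirun -np 4 ./a.out`); the per-rank work is a placeholder:

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Placeholder: each process would work on chunk `rank` of `size`
    // of the dataset here.
    std::cout << "process " << rank << " of " << size << '\n';

    MPI_Finalize();
    return 0;
}
```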

Ethanethane answered 4/10, 2010 at 15:47 Comment(1)
I am experimenting with OpenMP as a first step on a single machine, and will try MPI on multiple machines going forward. – Benco

MPI is also useful on a single local machine. It will run a job across multiple cores/CPUs; while this is probably overkill compared to threading, it does mean you can move the job to a cluster with no changes. Most MPI implementations also optimize a local job to use shared memory instead of TCP for data connections.
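With an MPI-3 implementation you can even ask which ranks share a memory domain; on a single machine they typically all do. A small sketch:

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Group the ranks that can share memory directly (an MPI-3 feature).
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_size = 0;
    MPI_Comm_size(node_comm, &node_size);
    if (rank == 0)
        std::cout << node_size << " ranks share this node's memory\n";

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```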

Yockey answered 4/10, 2010 at 16:9 Comment(0)