Cilk or Cilk++ or OpenMP

2

5

I'm creating a multi-threaded application in Linux. Here is the scenario:

Suppose I have x instances of a class BloomFilter and y GB of data (more than the available memory). I need to test membership of this data in each of the Bloom filter instances. Parallel programming will clearly help speed up the task; moreover, since I am only reading the data, it can be shared across all processes or threads.

Now I am confused about which one to use: Cilk, Cilk++ or OpenMP (which one is better?). I am also unsure whether to go for multithreading or multiprocessing.

Ahouh answered 8/6, 2012 at 16:37 Comment(0)
K
5

Cilk Plus is Intel's current implementation of Cilk. Both are multithreaded environments, i.e., multiple threads are spawned during execution.

If you are new to parallel programming, OpenMP is probably the better choice, since it makes it easier to parallelize existing sequential code. Do you already have a sequential version of your code?

OpenMP uses pragmas to tell the compiler which portions of the code have to run in parallel. If I understand your problem correctly, you probably need something like this:

   // data, data_size and check() are placeholders for your own code
   #pragma omp parallel for firstprivate(array_of_bloom_filters)
   for (long i = 0; i < data_size; ++i)
      check(data[i], array_of_bloom_filters);

The Bloom filter instances are replicated in every thread in order to avoid contention, while the data is shared among the threads.
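
As a slightly fuller illustration of the same idea, here is a minimal sketch in C++. Everything in it is an assumption made for the example: the toy `BloomFilter` class (two hash functions built on `std::hash`), the `maybe_contains` method, and the `count_possible_members` helper are hypothetical stand-ins, not the asker's actual code. Unlike the snippet above, the filters are shared read-only instead of being copied per thread, which is only safe if the membership query does not modify the filter. The function handles one chunk of data that fits in memory; with y GB of input you would call it once per chunk.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy Bloom filter, standing in for the asker's BloomFilter class.
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits) : bits_(bits, false) {}

    void insert(const std::string& key) {
        bits_[h1(key)] = true;
        bits_[h2(key)] = true;
    }

    // Read-only query: it never writes, so concurrent calls do not race.
    bool maybe_contains(const std::string& key) const {
        return bits_[h1(key)] && bits_[h2(key)];
    }

private:
    std::size_t h1(const std::string& k) const {
        return std::hash<std::string>{}(k) % bits_.size();
    }
    std::size_t h2(const std::string& k) const {
        return std::hash<std::string>{}(k + "#") % bits_.size();
    }
    std::vector<bool> bits_;
};

// Counts how many (item, filter) pairs of one in-memory chunk test positive.
// The loop over items is split among threads; each thread keeps a private
// partial count that OpenMP sums at the end (reduction).
long count_possible_members(const std::vector<std::string>& chunk,
                            const std::vector<BloomFilter>& filters) {
    long hits = 0;
    const long n = static_cast<long>(chunk.size());
    #pragma omp parallel for reduction(+:hits)
    for (long i = 0; i < n; ++i) {
        for (const BloomFilter& f : filters) {
            if (f.maybe_contains(chunk[i])) ++hits;
        }
    }
    return hits;
}
```

With GCC or Clang this needs `-fopenmp` to actually run in parallel; without the flag the pragma is ignored and the loop runs sequentially with the same result.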

Update: The paper actually considers an application that is very unbalanced, i.e., different tasks (allocated to different threads) may incur very different workloads. Citing the paper that you mentioned: "a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies". Consider that in order to balance computation among threads it is necessary to reduce the task size, and therefore to increase the time spent in synchronization. In other words, good load balancing always comes at a cost. The description of your problem is not very detailed, but it seems to me that your problem is quite balanced. If this is not the case, then go for Cilk; its work-stealing approach is probably the best solution for unbalanced workloads.

Kelleykelli answered 9/6, 2012 at 16:7 Comment(3)
What I am looking for is efficiency, so which one is better in that case? I referenced this paper (www.cs.unc.edu/~prins/RecentPubs/ijpp10.pdf), which observes that OpenMP is slow, so I chose Cilk Plus. Do you have any comments on this? Ahouh
Yes, I also think that the problem I am working on is fairly balanced. Correct me if I am wrong: you are saying that for a fairly balanced workload OpenMP and Cilk will give similar results, so as far as performance is concerned we can choose either of them. Ahouh
Yes, if used correctly they should give similar performance. However, if you are reading the data from disk, I/O will probably be the bottleneck of the computation. If this is the case, you might want to try a framework such as Hadoop, which is explicitly designed to handle streaming computations over big data. Kelleykelli
U
2

At the time this was posted, Intel was putting a lot of effort into boosting Cilk(tm) Plus; more recently, some effort has been diverted toward OpenMP 4.0. It's difficult in general to contrast OpenMP with Cilk(tm) Plus.

If it's not possible to distribute work evenly across threads, one would likely set schedule(runtime) in an OpenMP version, and then at run time try various values of the OMP_SCHEDULE environment variable, such as OMP_SCHEDULE=guided, OMP_SCHEDULE=dynamic,2 or OMP_SCHEDULE=auto. Those are the closest OpenMP analogies to the way Cilk(tm) Plus work stealing works.

Some sparse matrix functions in the Intel MKL library do actually scan the job first and determine how much to allocate to each thread so as to balance work. For this method to be useful, the time spent in serial scanning and allocating has to be of lower order than the time spent in parallel work.

Work stealing, or dynamic scheduling, may lose much of the potential advantage OpenMP offers in promoting cache locality through thread pinning, e.g. with OMP_PROC_BIND=close. Poor cache locality becomes a bigger issue on a NUMA architecture, where it may lead to significant time spent on remote memory access.

Both OpenMP and Cilk(tm) Plus have facilities for switching between serial and parallel execution.
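
To make the schedule(runtime) suggestion concrete, here is a minimal sketch in C++. The `work` and `process_all` functions are hypothetical and exist only to give the loop an uneven cost per iteration; the only OpenMP features used are the `schedule(runtime)` and `reduction` clauses.

```cpp
#include <vector>

// Hypothetical per-item cost: larger values take proportionally longer,
// so the iterations of the loop below are deliberately unbalanced.
static long work(long n) {
    long acc = 0;
    for (long k = 0; k < n; ++k) acc += k % 7;
    return acc;
}

// The schedule is not fixed at compile time: schedule(runtime) makes the
// OpenMP runtime read it from the OMP_SCHEDULE environment variable.
long process_all(const std::vector<long>& items) {
    long total = 0;
    const long n = static_cast<long>(items.size());
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += work(items[i]);
    }
    return total;
}
```

The same binary can then be timed under different schedules without recompiling, for example by setting OMP_SCHEDULE=guided, OMP_SCHEDULE="dynamic,2" or OMP_SCHEDULE=auto in the environment before launching the program. The result is the same in every case; only the distribution of iterations among threads changes.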

Urushiol answered 20/1, 2014 at 14:8 Comment(1)
This answer seems to me a little OT (and badly formatted). You may want to have a look here to have some hints on how to write an answer with better possibilities of being read (and upvoted). Loquacious
