OpenACC vs OpenMP & MPI: differences?

I was wondering what the major differences between OpenACC and OpenMP are. What about MPI, CUDA and OpenCL? I understand the differences between OpenMP and MPI, especially the part about shared and distributed memory. Do any of them allow for a hybrid GPU-CPU processing setup?

Carricarriage answered 21/10, 2013 at 12:39 Comment(0)

OpenMP and OpenACC enable directive-based parallel programming.

OpenMP enables parallel programming on shared-memory computing platforms, for example multi-core CPUs. It is very easy to use: it is sufficient to give the compiler some directives (code annotations, or pragmas) indicating how to extract the parallelism, and the compiler then synthesizes a parallel version of the input source code.

An example of an OpenMP "Hello World" program with pragmas is the following

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) 
{
  int nthreads, tid;

  /* Fork a team of threads giving them their own copies of variables */
  #pragma omp parallel private(nthreads, tid)
  {
     /* Obtain thread number */
     tid = omp_get_thread_num();
     printf("Hello World from thread = %d\n", tid);

     /* Only master thread does this */
     if (tid == 0) 
     {
        nthreads = omp_get_num_threads();
        printf("Number of threads = %d\n", nthreads);
     }

  }  /* All threads join master thread and disband */

}

The source of the above code is the OpenMP Exercise page, where you will find many other examples. In this "Hello World" example, the master thread outputs the number of involved threads, while each thread prints Hello World from thread = xxx.

OpenACC is a collection of compiler directives to specify parts of a C/C++ or Fortran code to be accelerated by an attached accelerator, such as a GPU. It follows pretty much the same philosophy as OpenMP and enables creating high-level host+accelerator programs, again without the need to manage the accelerator programming language. For example, OpenACC lets you simply accelerate existing C/C++ codes without needing to learn CUDA (with some performance penalty, of course).

A typical OpenACC code will resemble the following

#pragma acc kernels loop gang(32), vector(16)
for (int j=1; j<n-1; j++)
{
#pragma acc loop gang(16), vector(32)
    for (int i=1; i<m-1; i++)
    {
       Anew[j][i] = 0.25f * (A[j][i+1] + A[j-1][i]);
       /* ... */
    }
}    

The above source code is taken from the blog An OpenACC Example (Part 1), where you can find more useful material for understanding the differences between OpenMP and OpenACC.

Other sources are the following

How does the OpenACC API relate to the OpenMP API?

OpenACC and OpenMP directives

Shane Cook, CUDA Programming, Morgan Kaufmann (Chapter 10)

Due to its very nature, OpenACC enables hybrid CPU+GPU programming. You can also mix OpenMP and OpenACC directives: for example, on a 4-GPU system, you can create 4 CPU threads to offload computing work to the 4 available GPUs (a sketch of this pattern follows below). This is described in the Shane Cook book. However, it should be mentioned that OpenMP 4.0 also foresees directives for offloading work to attached accelerators; see

OpenMP Technical Report 1 on Directives for Attached Accelerators
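
As a rough, untested sketch of the hybrid OpenMP+OpenACC pattern mentioned above (it assumes an NVIDIA toolchain for acc_device_nvidia; the function scale and the even partitioning of the array are made up for illustration):

#include <openacc.h>
#include <omp.h>

/* One OpenMP thread per GPU: each thread binds itself to a different
   OpenACC device and offloads a contiguous chunk of the array. */
void scale(float *a, int n, int ngpus)
{
    #pragma omp parallel num_threads(ngpus)
    {
        int tid = omp_get_thread_num();
        acc_set_device_num(tid, acc_device_nvidia); /* bind thread tid to GPU tid */

        int chunk = n / ngpus;                      /* assume ngpus divides n */
        float *part = a + tid * chunk;

        #pragma acc parallel loop copy(part[0:chunk])
        for (int i = 0; i < chunk; i++)
            part[i] *= 2.0f;
    }
}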

Nitrobacteria answered 21/10, 2013 at 14:11 Comment(5)
So basically, at this moment, OpenACC and OpenMP complement each other. I don't know much about OpenACC, but what I was led to believe is that OpenACC can produce a program with CPU-GPU hybrid processing, while OpenMP cannot do that (being limited to working only with multicore machines)Carricarriage
@Carricarriage I have extended my answer. You are right that OpenACC enables hybrid CPU+GPU programming. Take into account that OpenMP 4.0 also foresees directives for attached accelerators; see OpenMP Technical Report 1 on Directives for Attached Accelerators.Nitrobacteria
Ah yes, thanks for the extension! I get the hang of it now. I am used to OpenMP, was thinking of working with CUDA, and chanced upon OpenACC.Carricarriage
The link How does the OpenACC API relate to the OpenMP API? is brokenPetroglyph
This is really outdated now. OpenMP also targets GPUs and similar devices.Mossback

OpenACC and OpenMP enable directive-based parallel computing: OpenMP tries to take advantage of multiple CPU cores, while OpenACC tries to utilize GPU cores.

MPI -- the Message Passing Interface -- is a programming model specification for inter-node and intra-node communication in a cluster. Each process of an MPI program has a private address space, which allows the program to run on a distributed-memory system (a cluster). Typically, MPI is used in high-performance computing together with high-bandwidth, low-latency interconnects (like InfiniBand, etc.).
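
A minimal sketch of this model (two ranks exchanging one integer; compile with mpicc and launch with, e.g., mpirun -np 2):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* total number of processes */

    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to rank 1 */
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 of %d received %d\n", nprocs, msg);
    }

    MPI_Finalize();
    return 0;
}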

With recent developments in parallel computing technologies like CUDA and OpenMP, several MPI implementations have added features to take advantage of the parallelism offered by CPU/GPU cores.

CUDA-aware MPI and/or hybrid programming models (MPI + OpenMP) are already in use. With CUDA-aware MPI, the end application programmer can write the same MPI program without explicitly staging GPU buffers through host memory. This reduces the burden on the end user.

For example, without CUDA-aware MPI, the code for MPI_Send and MPI_Recv would look like

//MPI rank 0: stage the device buffer through host memory, then send
cudaMemcpy(s_buf_h, s_buf_d, size, cudaMemcpyDeviceToHost);
MPI_Send(s_buf_h, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD);

//MPI rank 1: receive into host memory, then copy to the device
MPI_Recv(r_buf_h, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, &status);
cudaMemcpy(r_buf_d, r_buf_h, size, cudaMemcpyHostToDevice);

but with CUDA-aware MPI

//MPI rank 0: pass the device buffer directly to MPI
MPI_Send(s_buf_d, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD);

//MPI rank 1: receive directly into the device buffer
MPI_Recv(r_buf_d, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, &status);

The MPI library will take care of moving the data between host and GPU memory buffers.

Disreputable answered 10/2, 2016 at 1:56 Comment(1)
This answer has many errors and horrors: OpenMPI is an implementation of the MPI standard, which IS NOT directive-based. Don't confuse OpenMP with OpenMPI. MPI has not introduced any specification in its standard to cope with GPUs: the CUDA-aware capabilities are choices of particular MPI implementations, not part of the MPI standard. I think you should refine your answer and gain deeper insight into the subject.Bibliotheca

Firstly, I have never programmed using OpenMP/MPI/OpenACC/CUDA. The only API I know is OpenCL, so be careful with what I say below; it needs confirmation :p!

I am more comfortable with OpenCL, but I think there is not much difference between CUDA and OpenCL in their compilation process: the compiler will inline the functions (i.e., the kernels inside your C code). Then, in your OpenCL/CUDA program, you can do CPU operations between two GPU tasks.

For them, there are several memory types:

  • global : read/write by both the CPU and the GPU
  • local : read/write by the GPU only, shared among the work-items of a work-group
  • private : the memory of a single core, where all the variables declared inside a kernel are stored (GPU core only)
  • constant : the memory used for constant definitions (GPU core only)

There would be more to say about it, but you can easily find good guides about it on the net.
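
As a rough illustration of these address spaces (an untested sketch; the kernel name scale and the buffer names are made up for the example), here is a small OpenCL C kernel:

/* Hypothetical kernel: scales each element by a constant, staging values
   through local memory shared by the work-group. */
__constant float factor = 0.5f;                 /* constant memory */

__kernel void scale(__global float *data,       /* global: host + GPU */
                    __local  float *tmp)        /* local: one work-group */
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    float x = data[gid];                        /* x lives in private memory */
    tmp[lid] = x * factor;
    barrier(CLK_LOCAL_MEM_FENCE);               /* sync the work-group */

    data[gid] = tmp[lid];
}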

Then, as their compilation is inline, you can write a GPU/CPU program. You can even use OpenMP with OpenCL in the same program; I don't see any problem with it.

Maintain answered 21/10, 2013 at 13:13 Comment(1)
There's a big difference between CUDA and OpenCL in that the former compiles device code to machine code at build time while the latter stores device code as strings in the resulting binary, only converting them to device-specific machine code at runtime. Among other things, this means you don't get syntax checking of your OpenCL code until you try to run it.Yukoyukon

Read about the shared- and distributed-memory paradigms; your question could be answered in more detail by two grad-level courses. I recommend attending the TACC (Texas Advanced Computing Center) summer training if you are really interested in hands-on learning.

Tinfoil answered 18/5, 2018 at 21:16 Comment(2)
The question is quite old! At the moment I am already heavily involved in working with TACC, Pegasus WMS, Cyverse and other supercomputing resources.Carricarriage
This answer is largely useless to most of the world, geographically speaking.Autotomy
