Have there been any studies comparing OpenCL and OpenMP performance? Specifically, I am interested in the overhead of launching threads with OpenCL. For example, one could decompose the domain into a very large number of individual work items, each run by a thread doing a small job. Alternatively, one could use heavier-weight OpenMP threads, where the domain is decomposed into subdomains whose number equals the number of cores.
It seems that the OpenCL programming model is targeted more towards massively parallel chips (GPUs, for instance) than towards CPUs, which have fewer but more powerful cores.
Can OpenCL be an effective replacement for OpenMP?