OpenCL vs Intel Cilk Plus Issues, Differences and Capabilities


I am curious about the differences between OpenCL and Intel Cilk Plus. Both are parallel programming models that are receiving wide recognition, but technically speaking, is one better than the other, or are they simply different? Also, what yardstick should I use when choosing between the two to solve an embarrassingly parallel problem? Please, I need answers.

Thanks!

Yaknan

Hi Yaknan,

Typically, you should use OpenCL to program the GPU device and Intel Cilk Plus to program the CPU device, so the two are complementary. The Intel(R) Core M processor contains both CPU and GPU devices (they are integrated on the same die), so you can use Intel Cilk Plus to program the CPU part and OpenCL C to program the GPU part. That way you achieve the best possible performance. Note that you can use OpenCL to program the CPU device as well, but then you will need to optimize your OpenCL C code separately for the CPU device and for the GPU device, since different optimizations apply.

Hope that helps!


Hi Robert,
Thanks for the reply! The thing is, I did some research on graphics rendering, an embarrassingly parallel problem, using OpenCL and Intel Cilk Plus. The OpenCL version of the problem outperformed the Intel Cilk Plus version when I ran both on 4 PCs with different specifications, with Intel and AMD processors. So this made me curious: why use Intel Cilk Plus when OpenCL performs better and works on both CPUs and GPUs?


Yaknan,

Did you run the OpenCL version that outperformed the Cilk Plus code on the GPU device? If so, which GPU device? The idea is to use Intel(R) Cilk(TM) Plus to fully load the CPU and simultaneously use OpenCL to fully load the GPU. I doubt that OpenCL code running on the CPU device will outperform Cilk Plus, but you could persuade me otherwise with data :) The best performance is achieved if you are able to utilize the full platform, both CPU and GPU at the same time.


Yaknan,

One more thing: take a look at my article GPU-Quicksort in OpenCL 2.0, available here: https://software.intel.com/en-us/articles/gpu-quicksort-in-opencl-20-usi... . Compare that implementation, which runs on Intel Core M Processor Graphics and is indeed slightly faster, with the following Cilk Plus quicksort for the CPU. Note that the Cilk Plus version is much simpler. That is the advantage of Cilk Plus: a lot of the time you get very good performance for very little work :)

 

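#include <algorithm>  // std::partition, std::sort
#include <functional> // std::less
#include <utility>    // std::swap

#define CUTOFF 32 // ranges of at most CUTOFF elements are sorted serially

// Needs a compiler with Cilk Plus support for _Cilk_spawn / _Cilk_sync.
template <class T>
void parallel_qsort_with_cutoff(T* begin, T* end)
{
    if (begin != end) {
        --end; // Exclude last element (pivot)
        // A lambda replaces std::bind2nd, which was removed in C++17.
        T* middle = std::partition(begin, end,
            [end](const T& x) { return std::less<T>()(x, *end); });

        std::swap(*end, *middle); // pivot to middle
        if (CUTOFF < middle - begin) {
            // Left half may run in parallel with the code below.
            _Cilk_spawn parallel_qsort_with_cutoff(begin, middle);
        } else {
            std::sort(begin, middle);
        }
        if (CUTOFF < end - middle) {
            parallel_qsort_with_cutoff(++middle, ++end); // Exclude pivot
        } else {
            std::sort(++middle, ++end);
        }
        _Cilk_sync; // Wait for the spawned left half before returning
    }
}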
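
For readers without a Cilk Plus compiler, the same fork-join shape can be approximated in standard C++. The sketch below is not Robert's code and not Cilk Plus: it swaps in std::async for _Cilk_spawn and future::get() for _Cilk_sync. The names async_qsort_with_cutoff, kCutoff and kMaxSpawnDepth are hypothetical, and the depth cap is an assumption added because std::async has no work-stealing scheduler and would otherwise create one thread per spawn.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical constants for this sketch (not from the original post).
constexpr std::ptrdiff_t kCutoff = 32; // same serial cutoff as CUTOFF above
constexpr int kMaxSpawnDepth = 4;      // assumed cap on nested asynchronous tasks

template <class T>
void async_qsort_with_cutoff(T* begin, T* end, int depth = 0)
{
    assert(begin <= end);
    if (begin == end) return;

    --end; // exclude last element (pivot)
    T* middle = std::partition(begin, end,
        [end](const T& x) { return std::less<T>()(x, *end); });
    std::swap(*end, *middle); // pivot to middle

    std::future<void> left; // stands in for the spawned child
    if (kCutoff < middle - begin) {
        if (depth < kMaxSpawnDepth) {
            // Stand-in for _Cilk_spawn: sort the left half on another thread.
            left = std::async(std::launch::async, [begin, middle, depth] {
                async_qsort_with_cutoff(begin, middle, depth + 1);
            });
        } else {
            async_qsort_with_cutoff(begin, middle, depth + 1);
        }
    } else {
        std::sort(begin, middle);
    }

    if (kCutoff < end - middle) {
        async_qsort_with_cutoff(middle + 1, end + 1, depth + 1); // exclude pivot
    } else {
        std::sort(middle + 1, end + 1);
    }

    if (left.valid()) left.get(); // stand-in for _Cilk_sync
}
```

Calling async_qsort_with_cutoff(v.data(), v.data() + v.size()) on a std::vector<int> v sorts it in place. Unlike the Cilk Plus runtime, this version does not balance load between threads, so it illustrates the structure rather than the performance.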


Hi Robert,

Sorry for the late reply. I went through the links you provided and I understand your point. I carried out comparative research on OpenCL and Intel Cilk Plus (Intel Parallel Studio Composer) using a ray tracing problem initially designed by Jacco Bikker and modified by Gary Deng in his Master's thesis. You can see my earlier post on the subject here. The CPU-only setup shows that OpenCL starts slowly but gets faster as the rendering depth increases over several runs, especially after the first run, once all the compute devices are set up. I feel this is due to OpenCL's use of kernel code. The Intel Cilk Plus version of the program got slower as the computing demands increased. I checked for race conditions, although I cannot say with certainty that all races were eliminated.

Please find attached the Visual Studio solution files for both the OpenCL and Cilk Plus implementations and see for yourself.

Thanks
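
The first-run effect described above is worth isolating when comparing the two versions, since an OpenCL program's first run typically also pays for one-time setup such as context creation and kernel compilation. A minimal timing sketch, assuming each implementation can be wrapped in a callable (the names TimingResult and time_runs are hypothetical and not part of the attached solutions), might report the first run separately from the later ones:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper types for this sketch (not from the attached projects).
struct TimingResult {
    double first_run_ms;    // first call, including any one-time setup cost
    double steady_state_ms; // mean of all calls after the first
};

// Calls `render` `runs` times and times each call with a monotonic clock.
inline TimingResult time_runs(const std::function<void()>& render, int runs)
{
    assert(runs >= 2); // need at least one run after the warm-up run
    using clock = std::chrono::steady_clock;

    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        render();
        const auto t1 = clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    double sum = 0.0;
    for (std::size_t i = 1; i < ms.size(); ++i) sum += ms[i];
    return {ms.front(), sum / static_cast<double>(ms.size() - 1)};
}
```

Passing the OpenCL renderer and the Cilk Plus renderer through the same helper, at the same rendering depth, gives two pairs of numbers. Comparing steady_state_ms with steady_state_ms keeps setup cost from being mistaken for a difference in compute speed.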


The document "Using Intel C++ with Intel Processor Graphics.pdf" discusses Processor Graphics Offload.
Can you comment on how this fits into things, i.e. OpenCL vs Cilk Plus with GPU offload?

Is Cilk Plus Intel's response to HSA? Will it be possible to have Cilk Plus generate HSAIL so that it supports AMD APU graphics offload?

 


OpenCL* is supported across Windows Desktop, Android, Linux and Mac OS X, and across both CPU and Intel Graphics. OpenCL* is best used for writing C/C++ code for visual computing applications that is portable across these 4 operating systems. It is extended with API extensions and image support, including interoperability with Intel's graphics and media. It is also used to program applications targeting the Intel Xeon Phi coprocessor, although OpenMP* 4.0 is the preferred programming solution for HPC applications on the Intel Xeon Phi coprocessor.

Intel® Cilk Plus is part of Intel's programming tools, which target Windows, Android and Linux and extend the operating systems' software stack and tools where programmers need it. Intel tools target the CPU, Intel Graphics and other processing elements on the SoC, as well as the Intel Xeon Phi coprocessor.

Bottom line: it is really up to you to select the right tool for the job - both OpenCL* and Intel(R) Cilk Plus enable you to take advantage of Intel Processor Graphics and get performance out of Intel hardware.

Intel(R) Cilk Plus is not Intel's response to HSA. Intel does not plan to generate HSAIL, does not endorse the HSA Foundation, and does not plan to join it. The HSA Foundation lacks participation of major OS vendors such as Apple, Google and Microsoft. Even if the OS vendors were to support HSA, it would remain AMD specific.

Intel advocates using OpenCL* and SPIR for cross-platform and cross-OS portable use of GPU hardware, and/or using solutions provided by the OS vendors (like Apple OpenCL* for Mac OS, Microsoft DirectX Compute for client Windows systems, and Google Renderscript for Android devices). These standard solutions are enabled for almost all major GPU implementations, including all modern Intel products. HSA and HSAIL are vendor specific, like Nvidia* CUDA and PTX. Solutions like Intel(R) Cilk Plus may support standards like SPIR to target AMD GPUs, but it is too early to know.


Thank you, I appreciate your detailed reply.
