Workshop: Optimizing OpenCL applications for Intel® Xeon Phi™ Coprocessor

The Intel® Xeon Phi™ Coprocessor is designed for highly parallel applications that demand high performance. The OpenCL* standard for heterogeneous computing targets a variety of highly parallel devices. However, OpenCL* was designed mainly with GPUs in mind and includes GPU-specific features such as local memory, work-groups, and barriers. Mapping the OpenCL* standard to the Intel Xeon Phi coprocessor introduces new design and performance challenges that do not arise on typical GPUs. This tutorial starts with an overview of the Intel Xeon Phi coprocessor hardware, continues with an overview of OpenCL*, and then explains how the OpenCL API maps to the Intel Xeon Phi coprocessor, with a focus on parallelism.

The main part of this workshop introduces a set of key application-level optimizations that can make a significant difference for your OpenCL application. Most of the optimizations center on exposing maximum parallelism to the hardware. Others deal with using OpenCL* constructs more efficiently on the Intel Xeon Phi coprocessor. Some simple application-level optimizations can yield surprisingly large performance improvements.

For more resources and downloads go to the Intel® SDK for OpenCL* Applications XE 2013 web site at:

Download the workshop here
