Tutorial: Unlock Intel® GPU capabilities with Intel OpenCL™ Extensions

By Jeffrey McAllister

Published: 05/23/2017   Last Updated: 05/22/2017

Download tutorial code and GPU compute samples media data:

IWOCL_2017_Tutorial.tgz (110.95 MB)

yuv_samples.tgz (27.94 MB)


Based on the IWOCL 2017 tutorial "Unlock Intel GPUs for High Performance Compute, Media and Computer Vision".



Intel provides many extensions to the Khronos OpenCL™ standard to help you use the full range of hardware capabilities, including:

  • Subgroups
  • Video Motion Estimation (VME)
  • VEBox

These extensions are not standalone.  They build upon each other.


The tutorial code focuses on subgroups, VME, and VEBox.  Image processing and sharing extensions are also used in the tutorial code as solution components.

For more information on Intel extensions: /content/www/us/en/develop/articles/opencl-intel-graphics-extensions.html
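Because these extensions are optional, an application usually checks that the device reports them before relying on them. The sketch below assumes the extension list has already been retrieved as a single space-separated string (for example via `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`). The helper name `has_extension` is hypothetical and is not part of the tutorial code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole, space-delimited token in
 * `ext_list`. A plain strstr() is not enough, because one extension name
 * can be a prefix of another (e.g. "cl_intel_subgroups" and
 * "cl_intel_subgroups_short"). */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1;  /* keep scanning after a partial match */
    }
    return false;
}
```

A caller might test for `"cl_intel_subgroups"` or `"cl_intel_motion_estimation"` and fall back to a standard OpenCL code path when the check fails.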



Subgroups

Intel subgroups are

  • a subset of a work group
  • equal in size to the SIMD width (8, 16, or 32)
  • executed in the same hardware thread of the EU
  • able to share thread resources (including register space)
  • executed together

Intel subgroup functions add

  • barrier, broadcast, reduce, scan 
  • shuffle
  • block read/write

More info: Spec
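To make the semantics of the subgroup functions listed above concrete, here is a minimal CPU reference model in plain C. It is not the OpenCL API: each array stands in for the per-lane values of one subgroup, and the function names (`sg_broadcast`, `sg_reduce_add`, `sg_scan_inclusive_add`, `sg_shuffle`) and the fixed width `SG_SIZE` are assumptions made for illustration. In a kernel, the corresponding built-ins (such as `sub_group_broadcast`, `sub_group_reduce_add`, `sub_group_scan_inclusive_add`, and `intel_sub_group_shuffle`) exchange these values between work-items without going through memory.

```c
#include <assert.h>
#include <stddef.h>

#define SG_SIZE 8  /* assumed subgroup (SIMD) width for this sketch */

/* broadcast: every lane receives the value held by lane `src`. */
void sg_broadcast(const int in[SG_SIZE], int out[SG_SIZE], size_t src)
{
    assert(src < SG_SIZE);
    int value = in[src];
    for (size_t lane = 0; lane < SG_SIZE; ++lane)
        out[lane] = value;
}

/* reduce (add): the sum over all lanes; on a GPU every lane gets it. */
int sg_reduce_add(const int in[SG_SIZE])
{
    int sum = 0;
    for (size_t lane = 0; lane < SG_SIZE; ++lane)
        sum += in[lane];
    return sum;
}

/* inclusive scan (add): lane i receives in[0] + ... + in[i]. */
void sg_scan_inclusive_add(const int in[SG_SIZE], int out[SG_SIZE])
{
    int running = 0;
    for (size_t lane = 0; lane < SG_SIZE; ++lane) {
        running += in[lane];
        out[lane] = running;
    }
}

/* shuffle: lane i reads the value held by lane idx[i].
 * `in` and `out` must not alias. */
void sg_shuffle(const int in[SG_SIZE], const size_t idx[SG_SIZE],
                int out[SG_SIZE])
{
    for (size_t lane = 0; lane < SG_SIZE; ++lane) {
        assert(idx[lane] < SG_SIZE);
        out[lane] = in[idx[lane]];
    }
}
```

The loops here run serially. On the GPU all lanes of a subgroup execute together in one hardware thread, which is why these operations can be served from shared register space.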


Video Motion Estimation (VME)

Intel Gen GPUs accelerate the search for motion in video.  This is a core codec component but can also be used in a wide range of applications from custom bitrate control to computer vision.
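As a rough illustration of what "searching for motion" means, the following plain C sketch performs an exhaustive block-matching search using the sum of absolute differences (SAD). It is a CPU reference under stated assumptions, not the VME extension API and not a description of how the hardware searches. The names `MotionVector`, `block_sad`, and `full_search` are hypothetical, and frames are assumed to be 8-bit luma planes with a stride equal to the frame width.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct {
    int dx, dy;    /* displacement of the best match in the reference frame */
    unsigned sad;  /* SAD cost of that match */
} MotionVector;

/* SAD between the bs x bs block of `cur` at (cx, cy) and of `ref` at (rx, ry). */
static unsigned block_sad(const unsigned char *cur, const unsigned char *ref,
                          int stride, int cx, int cy, int rx, int ry, int bs)
{
    unsigned sad = 0;
    for (int y = 0; y < bs; ++y)
        for (int x = 0; x < bs; ++x)
            sad += (unsigned)abs((int)cur[(cy + y) * stride + cx + x] -
                                 (int)ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Try every displacement within +/- `range` pixels and keep the lowest SAD. */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height,
                         int bx, int by, int bs, int range)
{
    assert(bx >= 0 && by >= 0 && bx + bs <= width && by + bs <= height);

    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + bs > width || ry + bs > height)
                continue;  /* candidate falls outside the reference frame */
            unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry, bs);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The cost of this loop grows with the block size times the square of the search range, for every block of every frame, which is why offloading the search to dedicated hardware pays off in codecs and computer vision pipelines.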



VEBox

Intel GPUs contain a specialized IP block, the VEBox, designed for video enhancement operations.




OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

