Optimizing Simple OpenCL Kernels: Modulate Kernel Optimization

Robert Ioffe describes a consistent series of optimizations that improve OpenCL™ kernel performance on Intel® Iris™ Graphics and Intel® Iris™ Pro Graphics using the Intel® SDK for OpenCL™ Applications 2013. The optimizations are general in nature; developers can apply them to a broad set of OpenCL™ kernels. After studying the optimizations presented here, developers will know the fundamentals of using Intel® Iris™ Graphics for compute purposes. The starting point is a simple Modulate kernel.

About the Author

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics, with deep knowledge of Intel Graphics hardware. He has been heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they run well on Intel architecture. Most recently he has been prototyping the Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and has written a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He has also recorded and released two Optimizing Simple OpenCL Kernels videos and a third video on Nested Parallelism.

You might also be interested in the following:

Optimizing Simple OpenCL Kernels: Sobel Kernel Optimization

Sierpiński Carpet in OpenCL 2.0

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

Download the Code


A laptop or a workstation with the 4th Generation Intel® Core™ Processor
OpenCL™ Drivers and Runtimes for Intel® Architecture
Intel® SDK for OpenCL™ Applications
Intel® VTune™ Amplifier XE 2013
For more info on Intel Processor Graphics

For more complete information about compiler optimizations, see our Optimization Notice.