Using Intel® SDK for OpenCL™ Applications 2015 to Accelerate Your Software

By Robert M Ioffe

Published: 10/15/2014   Last Updated: 10/15/2014


The goal of this article is to give you a brief introduction to developing OpenCL(tm) applications using Intel(r) SDK for OpenCL Applications 2015. We provide you with the starter files for a Sobel filter application. You will create an OpenCL project in Microsoft Visual Studio*, populate it with the starter files, select the device on which you plan to run your OpenCL code, and then build and execute your application. Intel(r) SDK for OpenCL Applications 2015 greatly simplifies the setup of your OpenCL projects: there is no need to set up OpenCL include and library directories. It also provides built-in code highlighting and code completion for OpenCL, as well as code hints for all OpenCL built-in functions and keywords. Finally, it contains the OpenCL Code Builder, which enables you to rapidly develop, debug, and analyze OpenCL kernels. We are going to cover the OpenCL Code Builder in a separate article.

Tutorial Requirements

For this tutorial you will need Microsoft Visual Studio 2012* or later, running on Microsoft Windows* 7 or later, on a personal computer with an Intel(r) processor with Intel(r) Processor Graphics (Intel(r) Iris(tm) Graphics or Intel(r) Iris(tm) Pro Graphics are highly recommended). You need to install Intel(r) SDK for OpenCL Applications 2015 before starting this tutorial. We highly recommend installing a capable .ppm file viewer to visually verify the results of running the sample.

A Brief Introduction to the Sobel Operator

This tutorial works with an OpenCL implementation of the Sobel operator. The operator uses two 3×3 kernels, which are convolved with the original image to calculate approximations of the derivatives: one for horizontal changes and one for vertical changes. If we define A as the source image, and Gx and Gy as two images that contain the horizontal and vertical derivative approximations at each point, the computations are as follows:

$$G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A \qquad \text{and} \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$$

where * here denotes the 2-dimensional convolution operation.
The x-coordinate is defined here as increasing in the "right"-direction, and the y-coordinate is defined as increasing in the "down"-direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude, using:

$$G = \sqrt{G_x^2 + G_y^2}$$

The value G is calculated for every pixel of the image.
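
As a point of reference, the computation of G for a single pixel can be sketched on the CPU in C++. This is a minimal sketch and not code from the tutorial's sources: the function name sobelAt and the unpadded, row-major, 8-bit grayscale layout are assumptions made for illustration. It applies the weights directly to the 3×3 neighborhood, in the same way as the Sobel_v1_uchar kernel in this article, and clamps the result to 255.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the tutorial's sources): gradient magnitude G
// at interior pixel (x, y) of a row-major, unpadded, 8-bit grayscale image.
std::uint8_t sobelAt(const std::vector<std::uint8_t>& img,
                     int width, int height, int x, int y) {
    assert(img.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    assert(x >= 1 && y >= 1 && x < width - 1 && y < height - 1);

    // Pixel at offset (dx, dy) from (x, y), widened to int so that
    // the negative weights below behave as written.
    auto p = [&](int dx, int dy) {
        return static_cast<int>(
            img[static_cast<std::size_t>(y + dy) * width + static_cast<std::size_t>(x + dx)]);
    };

    // Horizontal derivative: left column minus right column, weights 1, 2, 1.
    const int gx = p(-1, -1) - p(1, -1) + 2 * p(-1, 0) - 2 * p(1, 0) + p(-1, 1) - p(1, 1);
    // Vertical derivative: top row minus bottom row, weights 1, 2, 1.
    const int gy = p(-1, -1) + 2 * p(0, -1) + p(1, -1) - p(-1, 1) - 2 * p(0, 1) - p(1, 1);

    const float g = std::sqrt(static_cast<float>(gx * gx + gy * gy));
    return static_cast<std::uint8_t>(std::min(255.0f, g));  // clamp to the uchar range
}
```

Calling sobelAt on a flat region returns 0, while a sharp vertical edge saturates at 255. A function of this kind can also serve as a reference when checking the output of the OpenCL kernels pixel by pixel.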

The naïve implementation of the Sobel kernel operating on a uchar buffer follows:

__kernel void Sobel_v1_uchar (__global uchar *pSrcImage, __global uchar *pDstImage)
{
	// One work-item per output pixel; each source row carries 16 bytes of padding on either side
	uint dstYStride = get_global_size(0);
	uint dstIndex   = get_global_id(1) * dstYStride + get_global_id(0);
	uint srcYStride = dstYStride + 32;
	uint srcIndex   = get_global_id(1) * srcYStride + get_global_id(0) + 16;

	// 3x3 neighborhood, held as signed ints so that the negative weights do not wrap around
	int a,		b,		c;
	int d,	/*center*/  f;
	int g,		h,		i;

	// Read data in	
	a = pSrcImage[srcIndex-1];	b = pSrcImage[srcIndex];	c = pSrcImage[srcIndex+1];
	srcIndex += srcYStride;
	d = pSrcImage[srcIndex-1];	/*center*/	               f = pSrcImage[srcIndex+1];	
	srcIndex += srcYStride;
	g = pSrcImage[srcIndex-1];	h = pSrcImage[srcIndex];	i = pSrcImage[srcIndex+1];	

	int xVal =  	a* 1 +			c*-1	+
			d* 2 +	/*center*/	f*-2	+
			g* 1 +		    	i*-1;
	int yVal =	a* 1 + b* 2 + c* 1 +
			g*-1 + h*-2 + i*-1;	

	// Write data out, clamping the gradient magnitude to the uchar range
	pDstImage[dstIndex] =  min((uint)255, (uint)sqrt((float)(xVal*xVal + yVal*yVal)));	
}
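
The index arithmetic in the kernel implies a padded source buffer: each source row is 32 bytes wider than a destination row, reads are offset by 16 bytes into the row, and every work-item touches three consecutive source rows. The C++ sketch below shows one way a host could lay out such a buffer. The helper names, the zero fill, and the single extra row above and below the image are assumptions made for illustration; the host code shipped with the tutorial may pad differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Padding on each side of a source row, taken from the "+ 16" and "+ 32" in the kernel.
constexpr std::size_t kRowPad = 16;

// Source row stride for a given destination width (mirrors srcYStride in the kernel).
std::size_t srcStride(std::size_t dstWidth) { return dstWidth + 2 * kRowPad; }

// Index of pixel 'b' (top row, middle column of the 3x3 window) for work-item (x, y).
// Mirrors the initial value of srcIndex in the kernel.
std::size_t topRowIndex(std::size_t x, std::size_t y, std::size_t dstWidth) {
    return y * srcStride(dstWidth) + x + kRowPad;
}

// Hypothetical helper: copy an unpadded width x height image into a zero-filled
// buffer with kRowPad bytes on each side of a row and one extra row above and
// below (an assumption), so that work-item (x, y) is centered on image pixel (x, y).
std::vector<std::uint8_t> makePaddedSource(const std::vector<std::uint8_t>& img,
                                           std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    const std::size_t stride = srcStride(width);
    std::vector<std::uint8_t> padded(stride * (height + 2), 0);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            padded[(y + 1) * stride + x + kRowPad] = img[y * width + x];
        }
    }
    return padded;
}
```

With this layout, the work-item in the last destination row reads source rows height - 1, height, and height + 1, which is why the buffer holds height + 2 rows.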

The file provided with this tutorial contains three more kernels, progressively optimized for Intel(r) Processor Graphics. With that introduction, let us start the tutorial.

Setting up the Sobel Project in Microsoft Visual Studio*

Download and unpack the archive provided with this tutorial. In Microsoft Visual Studio, create a new project by selecting OpenCL\Empty OpenCL Project for Windows:

Create an empty OpenCL project. Use Sobel_OCL as the name of the project and point Location to the folder that contains the Sobel_OCL directory. Make sure to uncheck the 'Create directory for solution' checkbox.

Add existing files to the project:


Select the GPU device by selecting Intel SDK for OpenCL Applications in Configuration Properties and then selecting Intel(R) Graphics (-device=GPU) from the drop-down list:

Examine the kernels by selecting the kernel file in Solution Explorer:

Hover over keywords such as __kernel, get_global_id, or uint and examine the pop-up tips. Now that we have created the solution, added the files, and selected the proper device, we are going to build the Release configuration (note that the kernel file is preprocessed with the OpenCL(tm) API Offline Compiler):

1>------ Build started: Project: Sobel_OCL, Configuration: Release Win32 ------
1>  Preprocessing:
1>  OpenCL Intel(R) Graphics device was found!
1>  Device name: Intel(R) Iris(TM) Pro Graphics 5200
1>  Device version: OpenCL 1.2 
1>  Device vendor: Intel(R) Corporation
1>  Device profile: FULL_PROFILE
1>  fcl build 1 succeeded.
1>  fcl build 2 succeeded.
1>  bcl build succeeded.
1>  Build succeeded!
1>  SobelMain.cpp
1>c:\cygwin\home\rioffe\playground\sobel_ocl\OpenCLUtils.h(149): warning C4996: 'clCreateCommandQueue': was declared deprecated
1>          C:\Intel\INDE\code_builder_4.6.0.118\include\cl/cl.h(1358) : see declaration of 'clCreateCommandQueue'
1>  Generating code
1>  Finished generating code
1>  Sobel_OCL.vcxproj -> C:\cygwin\home\rioffe\playground\Sobel_OCL\Release\Sobel_OCL.exe
1>          1 file(s) copied.
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


Now that the project is built, we are ready to set command line arguments of the executable:

and run the application:

Notice that each successive kernel is more optimized than the previous one and therefore faster. You can apply the optimizations shown here to your own kernels.

Sobel_v1_uchar                   Simple scalar implementation
Sobel_v2_uchar16                 Process 16 items in a row by using uchar16
Sobel_v3_uchar16_to_float16      Convert to float16 for 2X math performance
Sobel_v4_uchar16_to_float16_16   Process 16 rows at a time

As a final step, inspect the results contained in four *validation.ppm files.
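
If no .ppm viewer is at hand, the files can also be inspected programmatically. The C++ sketch below reads a binary PPM image into memory. It is a minimal sketch under stated assumptions, not part of the tutorial's sources: it assumes the validation files use the binary "P6" variant with a maximum value of 255 and no comment lines in the header, and the names PpmImage and readBinaryPpm are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for a decoded image.
struct PpmImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgb;  // width * height * 3 bytes, row-major, R G B per pixel
};

// Reads a binary PPM (magic "P6", maxval 255) from `in`.
// Assumption: the header contains no '#' comment lines.
// Returns true and fills `out` on success, false otherwise.
bool readBinaryPpm(std::istream& in, PpmImage& out) {
    std::string magic;
    int maxval = 0;
    if (!(in >> magic >> out.width >> out.height >> maxval)) return false;
    if (magic != "P6" || maxval != 255 || out.width <= 0 || out.height <= 0) return false;

    in.get();  // consume the single whitespace byte that separates header and pixel data

    out.rgb.resize(static_cast<std::size_t>(out.width) *
                   static_cast<std::size_t>(out.height) * 3);
    in.read(reinterpret_cast<char*>(out.rgb.data()),
            static_cast<std::streamsize>(out.rgb.size()));
    // Succeed only if all of the pixel data was present.
    return in.gcount() == static_cast<std::streamsize>(out.rgb.size());
}
```

When reading from disk, open the file as a std::ifstream with std::ios::binary so that pixel bytes are not altered on Windows. Once loaded, the width, the height, and individual pixel values can be printed or compared across the four validation files.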

By downloading or copying all or any part of the sample source code, you agree to the terms of the Intel(r) Sample Source Code License Agreement.
For more complete information about compiler optimizations, see our Optimization Notice.




Attachment Size 8.4 MB
