Chapter 1 – Introduction
This document is designed to help users get started writing code and running OpenCL* 1.2 applications using the Intel® SDK for OpenCL™ Applications for Linux* on a system that includes the Intel Xeon® Phi™ Coprocessor.
More specifically, the Intel SDK for OpenCL Applications XE 2013 for Linux used in this whitepaper is version 3.0.67279. The SDK supports both the Intel® Xeon server and Intel Xeon Phi coprocessor.
1.1 – Overview
Open Computing Language (OpenCL*) is an open standard for general-purpose parallel programming of heterogeneous systems. The OpenCL* specification is ratified by the Khronos* group at http://khronos.org/opencl .
The Intel® SDK for OpenCL* Applications XE 2013 for Linux is based on the published OpenCL* 1.2 Khronos Specification.
OpenCL* Applications can be written in C and C++ and can be compiled using any C/C++ compiler. Users can obtain the SDK from http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe#download . The Release Notes can be found on the website, under the tag “Product Documents”: http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe .
1.2 – Compatibility
The Intel® SDK for OpenCL™ supports the following Linux operating systems:
- Red Hat* Enterprise Linux 6.1, kernel 2.6.32-131 (64-bit version)
- SUSE* Linux Enterprise Server 11 SP1, kernel 2.6.32.12-0.7-default (64-bit version)
If you choose an Intel® compiler, use the version of Intel® Composer XE 2013 for Linux* OS that matches the Intel® Many-core Platform Software Stack (MPSS) version running on the platforms above.
The table below summarizes the versions that are supported.
2.1.4982-15 or 2.1.5889-16
Table 1: Intel OpenCL SDK Compatibility.
Note that the Intel OpenCL SDK for Linux OS supports multiple Intel Xeon Phi coprocessors.
The first part of this whitepaper shows how to install the Intel SDK for OpenCL on a Linux OS. The second part shows how to run an OpenCL* sample code on an Intel Xeon Phi Coprocessor.
Chapter 2 – Installing Intel SDK for OpenCL Applications XE 2013 for Linux
To start, you must have installed the latest version of the Intel C/C++ Compiler as well as the Intel MPSS. In this paper, the Intel C/C++ Composer XE 2013 update 4 and the Intel MPSS Gold Update 3 are used. The Installation Notes document is available in the main page under the tag “Product Documents”: http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe
You can purchase these software development tools from http://software.intel.com/en-us/linux-tool-suites. These instructions assume that you downloaded the Intel® OpenCL* SDK and have the intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64 file.
Also, before installing the Intel SDK for OpenCL, please uninstall previous installations of the SDK that are older than Intel SDK for OpenCL Applications XE 2013 beta. You must have root permissions to successfully install the SDK.
Once you have acquired a copy of the SDK, extract the tar file for Intel SDK for OpenCL. This package contains the OpenCL* C header files, the development tools, and the OpenCL* runtime and compiler for Linux operating systems.
# tar -xvf intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64.tgz
# cd intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64
To install the runtime for CPU as well as the Intel Xeon Phi coprocessor run the install-cpu+mic.sh script as root. The runtime will be installed into the /opt/intel/opencl-1.2-3.0.67279 directory.
# sudo ./install-cpu+mic.sh
Alternately, you can use the RPM package manager to install the runtime. To install the runtime for the CPU as well as the coprocessor, run the following command:
- For RedHat Enterprise Linux OS:
# sudo yum install *base*.rpm *intel-cpu*.rpm *intel-mic*.rpm
- For SUSE Linux Enterprise Server OS:
# sudo zypper install *base*.rpm *intel-cpu*.rpm *intel-mic*.rpm
To install the developer tools along with the runtimes, execute the following command:
- For RedHat Enterprise Linux OS:
# sudo yum install *base*.rpm *intel-cpu*.rpm *intel-mic*.rpm *devel*.rpm
- For SUSE Linux Enterprise Server OS:
# sudo zypper install *base*.rpm *intel-cpu*.rpm *intel-mic*.rpm *devel*.rpm
After successfully installing these two packages, you should see the following directories under opencl-1.2-3.0.67279:
# ls /opt/intel/opencl-1.2-3.0.67279
bin doc etc include lib64 libmic
Note that the lib64 and libmic directories contain the runtime libraries for the CPU and the coprocessors, respectively.
Chapter 3 – Compiling and Running a Sample OpenCL* Program
This section includes a sample OpenCL* program written in C. We will show how to compile and run the program for the Intel Xeon Phi Coprocessor.
The sample program is an implementation of Gaussian Kernel Smoothing or Gaussian Smoothing, which is most commonly used in image processing. Gaussian smoothing tries to reduce the level of noise in an image, thereby making image processing algorithms more robust against noise. This technique finds application in fields such as medical imaging, graphics software and computer vision.
Because Gaussian Kernel Smoothing is an image processing application, it possesses a high degree of parallelism and hence is well suited for the Intel Xeon Phi coprocessor. Also, the operations in this algorithm closely resemble a 5-point 2D stencil operation, which is commonly used in high performance computing. These two traits make this application an excellent candidate as an OpenCL application running on the Intel Xeon Phi coprocessor.
Gaussian smoothing uses a Gaussian function for calculating the transformation to apply to each pixel in an image. A Gaussian function in one dimension has the form:
$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}}$$
Similarly, the Gaussian function in two dimensions has the form:
$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
Here σ is the standard deviation of the Gaussian distribution, x is the distance from the origin along the horizontal axis, and y is the distance from the origin along the vertical axis. Using the values from this distribution, a weight matrix, also called a convolution kernel, is created and then applied to the image. For each pixel in the image, the pixel's new value is calculated as a weighted mean of the pixel's neighborhood.
The weight matrix is such that the weights are inversely proportional to the distance from the center pixel. The original pixel’s value receives the heaviest weight and neighboring pixels receive smaller weights as their distance to the original pixel increases. For this implementation, we use the following weight matrix or convolution kernel.
Table 2: Weight Matrix
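The values of the weight matrix are not reproduced in this text. As an illustration only, the sketch below builds a commonly used 3x3 integer approximation of the Gaussian, the binomial kernel formed as the outer product of (1, 2, 1) with itself. It has the properties described above (heaviest weight at the center, smaller weights farther away), but it is an assumption and not necessarily the matrix used in the sample program. The function name binomial_kernel_3x3 is hypothetical.

```c
#include <assert.h>

/* Build a 3x3 binomial kernel as the outer product of {1, 2, 1} with itself.
 * This is an assumed stand-in for the paper's weight matrix.
 * Returns the sum of all weights, which serves as the normalization divisor. */
int binomial_kernel_3x3(int w[3][3])
{
    static const int taps[3] = {1, 2, 1};
    int total = 0;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            w[row][col] = taps[row] * taps[col];
            total += w[row][col];
        }
    }
    assert(total == 16);  /* (1 + 2 + 1) squared */
    return total;
}
```

With this kernel the center pixel receives weight 4, its four edge neighbors weight 2, and the four diagonal neighbors weight 1; dividing the weighted sum by 16 yields a weighted mean.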
In this implementation, the host divides the work between the available Intel Xeon Phi coprocessors such that each coprocessor works on an equal set of rows in the image. Each coprocessor applies the Gaussian smoothing to each pixel assigned to it. To deal with the boundaries of the input image, the implementation pads the edges of the input image by duplicating the edge pixels. As with any OpenCL application, the host is charged with setting up the OpenCL platform, choosing devices and cleaning up after the execution is completed.
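The row split can be sketched in plain C as follows. The paper only states that each coprocessor receives an equal share of rows; how a remainder is handled when the row count is not a multiple of the device count is an assumption here (the first few devices take one extra row), and the function name rows_for_device is hypothetical.

```c
#include <assert.h>

/* Compute the contiguous block of rows owned by one device.
 * Rows are split as evenly as possible; when n_rows is not a multiple of
 * n_devices, the first (n_rows % n_devices) devices get one extra row
 * (an assumption, the paper only covers the evenly divisible case). */
void rows_for_device(int n_rows, int n_devices, int device,
                     int *first_row, int *row_count)
{
    assert(n_rows >= 0 && n_devices > 0);
    assert(device >= 0 && device < n_devices);

    int base = n_rows / n_devices;   /* rows every device receives */
    int extra = n_rows % n_devices;  /* leftover rows to hand out */

    *row_count = base + (device < extra ? 1 : 0);
    *first_row = device * base + (device < extra ? device : extra);
}
```

For a 512-row image and two coprocessors, device 0 would own rows 0 to 255 and device 1 rows 256 to 511. Because each output pixel depends on its 3x3 neighborhood, a device presumably also needs read access to one row above and one row below its own block.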
The pseudo code is shown below:
Host initializes the OpenCL Platform and selects the OpenCL devices.
Host reads an NxN input image
For each device
    Transfer (N/#devices) rows of input image from host to device
For each device
    For each pixel assigned to the device
        Set sum = 0
        For original pixel and all 8 neighbors
            Calculate sum = sum + weight of pixel * pixel value
        Calculate sum = sum/9
        Store the sum in corresponding pixel in output image.
For each device
    Transfer (N/#devices) rows of output image from device to host
Host writes a NxN output image to file
Host cleans up.
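The per-pixel loop of the pseudo code can be sketched as a serial C reference, which may be useful for checking the output of the OpenCL kernel. This is not the sample's kernel. Two assumptions are made: edge padding is emulated by clamping neighbor coordinates to the image bounds (equivalent to duplicating the edge pixels), and the weighted sum is divided by the total of the weights rather than by the fixed 9 of the pseudo code (the two agree when the weights sum to 9). The names clamp_index and smooth_rows are hypothetical.

```c
#include <assert.h>

/* Clamp an index into [0, n-1]; reading a clamped index is equivalent to
 * padding the image by duplicating its edge pixels. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Smooth rows [first_row, first_row + row_count) of a width x height
 * 8-bit grayscale image stored row by row. weights holds the 3x3 matrix
 * in row-major order and is assumed to contain non-negative values. */
void smooth_rows(const unsigned char *in, unsigned char *out,
                 int width, int height, int first_row, int row_count,
                 const int weights[9])
{
    assert(width > 0 && height > 0);
    assert(first_row >= 0 && row_count >= 0);
    assert(first_row + row_count <= height);

    int weight_sum = 0;
    for (int k = 0; k < 9; ++k)
        weight_sum += weights[k];
    assert(weight_sum > 0);

    for (int y = first_row; y < first_row + row_count; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            /* Visit the original pixel and all 8 neighbors. */
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int yy = clamp_index(y + dy, height);
                    int xx = clamp_index(x + dx, width);
                    sum += weights[(dy + 1) * 3 + (dx + 1)] * in[yy * width + xx];
                }
            }
            /* Weighted mean, rounded to the nearest integer. */
            out[y * width + x] =
                (unsigned char)((sum + weight_sum / 2) / weight_sum);
        }
    }
}
```

Calling smooth_rows once per device with that device's row range reproduces the work split of the pseudo code; calling it with first_row = 0 and row_count = height processes the whole image. A region of constant gray value is left unchanged, since the weighted mean of identical values is that value.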
Before compiling the program, called ocl_sample.c, you need to establish the proper environment settings for the Intel C++ Compiler for the coprocessor.
# source /opt/intel/composerxe/bin/compilervars.sh intel64
Build the application ocl_sample.out for the coprocessor.
# icc ocl_sample.cpp -lOpenCL -oocl_sample.out
To run the application on the coprocessor, simply execute the binary, the way you would execute the binary on a Linux system.
# ./ocl_sample.out
Platform: Intel(R) OpenCL
Number of accelerators found: 2
DEVICE #0:
NAME:Intel(R) Many Integrated Core Acceleration Card
#COMPUTE UNITS:240
DEVICE #1:
NAME:Intel(R) Many Integrated Core Acceleration Card
#COMPUTE UNITS:240
Compilation started
Compilation done
Linking started
Linking done
Build started
Kernel <smoothing_kernel> was successfully vectorized
Done.
OpenCL Initialization Completed
Completed reading Input Image: 'input.pgm'
Transferring Data from Host to Device
Executing Kernel on selected devices
Transferring data from Device to Host
Completed writing Output Image: 'output.pgm'
Completed execution! Cleaning Up.
This binary expects a grayscale input image named 'input.pgm'. On successful execution, an output file named 'output.pgm' is created. As is evident from the file extension, the input image file must be in the Portable GrayMap (PGM) format.
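For readers unfamiliar with PGM, the following C sketch writes and reads the binary ("P5") variant: a short text header with the magic number, width, height and maximum gray value, followed by one byte per pixel. Whether the sample program uses the binary or the ASCII ("P2") variant is not stated in this paper, so the choice of P5 is an assumption. The reader is deliberately minimal: it accepts only a maximum value of 255 and does not handle comment lines in the header. The names write_pgm and read_pgm are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write an 8-bit grayscale image as binary PGM ("P5").
 * Returns 0 on success, -1 on failure. */
int write_pgm(FILE *fp, const unsigned char *pixels, int width, int height)
{
    assert(fp && pixels && width > 0 && height > 0);
    if (fprintf(fp, "P5\n%d %d\n255\n", width, height) < 0)
        return -1;
    size_t n = (size_t)width * (size_t)height;
    return fwrite(pixels, 1, n, fp) == n ? 0 : -1;
}

/* Read a binary PGM with maxval 255 and no header comments.
 * On success returns a malloc'd pixel buffer (the caller frees it) and
 * stores the dimensions; on failure returns NULL. */
unsigned char *read_pgm(FILE *fp, int *width, int *height)
{
    assert(fp && width && height);
    int maxval = 0;
    if (fscanf(fp, "P5 %d %d %d", width, height, &maxval) != 3)
        return NULL;
    if (*width <= 0 || *height <= 0 || maxval != 255)
        return NULL;
    /* Exactly one whitespace byte separates the header from the pixels. */
    if (fgetc(fp) == EOF)
        return NULL;
    size_t n = (size_t)*width * (size_t)*height;
    unsigned char *pixels = malloc(n);
    if (pixels && fread(pixels, 1, n, fp) != n) {
        free(pixels);
        pixels = NULL;
    }
    return pixels;
}
```

A file would typically be opened with fopen("input.pgm", "rb") before calling read_pgm, and with fopen("output.pgm", "wb") before calling write_pgm.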
The input and output images for a sample run are shown below. The edges in the output image have been blurred; however, the regions of the image with constant gray scale values remain unchanged. Thus, as expected, Gaussian Kernel Smoothing blurs edges in the input image.
Figure 1: Input Image (left) and corresponding Output Image (right)
Chapter 4: Tools and Resources
Several tools are available to aid the developer in building OpenCL* applications running on the Intel Xeon Phi coprocessor. For example, the Kernel Builder provided with the Intel SDK for OpenCL* enables you to build and analyze OpenCL* kernels. The tool provides full offline OpenCL* compilation and includes the OpenCL* syntax checker, cross-hardware platform compiler, Low Level Virtual Machine (LLVM) viewer, assembly code viewer, and intermediate program binary generator. To find out more about the Kernel Builder, please visit this article: http://software.intel.com/sites/landingpage/opencl/user-guide-2013/index.htm#Using_the_Intel_SDK_for_OpenCL_Offline_Compiler_Standalone_Tool.htm . Also, Intel® VTune™ Amplifier XE can be used to analyze OpenCL* applications running on the Intel Xeon Phi coprocessor. To find out more about profiling OpenCL* applications running on the Intel Xeon Phi coprocessor, please visit this article: http://software.intel.com/en-us/articles/performance-tuning-of-opencl-applications-on-intel-xeon-phi-coprocessor-using-intel-vtune-amplifier-xe-2013 .
There is a wealth of information available to the developer in the form of guides, articles and blogs. The following is a small subset of documents that might be particularly helpful for developing OpenCL* codes running on the Intel Xeon Phi coprocessor.
- Intel SDK for OpenCL* Applications XE 2013 User's Guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/UG/index.htm
- OpenCL* Design and Programming Guide for the Intel Xeon Phi coprocessor: http://software.intel.com/en-us/articles/opencl-design-and-programming-guide-for-the-intel-xeon-phi-coprocessor
- Tutorial: Optimizing OpenCL applications for Intel® Xeon Phi™ Coprocessor http://software.intel.com/en-us/articles/workshop-optimizing-opencl-applications-for-intel-xeon-phi-coprocessor
- Optimization Guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/index.htm
About the Authors
Sumedh Naik received a Bachelor's degree in Electronics Engineering from Mumbai University, India in 2009 and a Master's degree in Computer Engineering from Clemson University in December 2012. He joined Intel in 2012 and has been working as a Software Engineer, focusing on developing collateral for the Intel® Xeon Phi™ coprocessor.
Loc Q Nguyen received an MBA from University of Dallas, a master's degree in Electrical Engineering from McGill University, and a bachelor's degree in Electrical Engineering from École Polytechnique de Montréal. He is currently a software engineer with Intel Corporation's Software and Services Group. His areas of interest include computer networking, computer graphics, and parallel processing.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
This sample source code is released under the Intel Sample Source Code Agreement located at http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
Intel, the Intel logo, Cilk, Xeon and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2013 Intel Corporation. All rights reserved.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.