Intel® SDK for OpenCL™ Applications 2016, now with GPU debugging, is released!

I am glad to announce the availability of Intel® SDK for OpenCL™ Applications 2016 for Windows*, CentOS*, and Ubuntu*. Please visit the download page to get the version for your platform. For details, check out the Release Notes.

New Features

  • Beta release of Source and Assembly level GPU Kernel Debugging on Windows* operating system

The Best of Both Worlds: Bringing Back the Intel® SDK for OpenCL™ Applications

We are bringing back the standalone Intel® SDK for OpenCL™ Applications. Leah posted a nice article explaining the reasoning behind this decision.

i7-5775c Iris Pro vs CPU performance


I'm benchmarking the i7-5775C's four CPU cores against its Iris Pro 6200 on simple OpenCL kernels. You could say I want to know when it makes sense to offload computation onto the IGP. One experiment has each thread execute many FMA operations on a single input element to measure compute throughput. I'm surprised to see the IGP outperform the CPU by nearly 9x, and by 18x with hyper-threading disabled:

OpenCL kernel (the original post was truncated; the loop body below reconstructs the described FMA chain with illustrative constants):

void kernel fmaKernel(global float *out) {
    float sum = out[get_global_id(0)];
    for (int i = 0; i < 1024; ++i)        // iteration count is illustrative
        sum = fma(sum, 1.000001f, 0.5f);  // one dependent FMA per step
    out[get_global_id(0)] = sum;          // write back so the loop isn't optimized away
}

GPU aperture memory

A few questions about aperture memory:

- Using OpenCL, can I allocate memory in aperture memory?

- Does the concept of "aperture memory" apply to Intel integrated GPUs? My understanding was that aperture memory only exists for memory on a discrete graphics card.

- Does using aperture memory affect performance significantly?

Line-by-line time profiling with an OpenCL kernel

Hi, I am working on a project to optimize some OpenCL code. The kernel is computationally dense, and I'd like to see where the bottleneck is.

I haven't installed Intel's CL libraries yet, but I am wondering if it is possible to do line-by-line profiling of my OpenCL kernel when running on the CPU. We have profiled the code with CodeXL on an AMD GPU, but that profiler only reports abstract metrics, which are not exactly helpful in pinpointing the hotspots.

Execution Model For the Intel GPU

Now I'm confused about the execution model of work-items. There are three compile targets (SIMD8, SIMD16, SIMD32) for the Intel GPU.

1. For SIMD-X, does that mean X work-items execute simultaneously in one hardware thread?

2. Say an OpenCL kernel is compiled to SIMD16, and my work-group is a "square" work-group <8, 8, 1>. Does that mean the first two rows of 8 work-items (<0~7, 0~1, 0>) execute simultaneously?

Code Builder analysis session doesn't show kernel tips


I am working with MS VS 2013 and the latest Intel SDK build. When I create a new Analysis Session, the kernel section indicates that there are 6 performance tips, but when I click on it, it doesn't show any tips; likewise for the "API Calls" and "Memory Commands" sections.
