OpenCL achieves 800% CPU utilization

Hi all, 

I am curious about the CPU implementation of OpenCL for Intel processors.

I ran a small set of benchmarks from clpeak on an i7-4770S (4 cores, Hyper-Threading enabled) under Linux.

top shows that CPU utilization reaches almost 800%, meaning all 8 logical CPUs are fully utilized.

However, when I run the clpeak benchmarks individually, utilization peaks at 400%.

It seems that running the benchmarks consecutively benefits from the OpenCL runtime.

Does that mean that when a workload is issued to the OpenCL CPU runtime, it uses only some of the cores rather than all of them?

Why does the post-build event not pick up the OpenCL files where they have been specified?

I have my OpenCL kernel source files in a separate directory ("cl") on my file system (Windows 7 machine).

This is no different from having my C++ source files available in yet another directory ("src") on my file system.

Adding them through "Solution:Pop Up Menu>Add Existing Item" puts the files in the respective "OpenCL Files" and "Source Files" containers, as expected.

However, the build will fail with an error:

The system cannot find the file specified.

Xeon Phi 64-bit Atomics


The Xeon Phi supports only the following extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_fp64

That means we are not able to use 64-bit atomic operations. Is there any way to use atomics on 64-bit values (cl_khr_int64_base_atomics)?

This is absolutely necessary for many scientific applications, because float does not offer sufficient precision.
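For readers who want to check this programmatically, here is a minimal sketch in plain C. It assumes the extension list has already been fetched as a space-separated string (as returned for CL_DEVICE_EXTENSIONS); the helper name has_extension is hypothetical and not part of the OpenCL API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole,
   space-separated token in `ext_list`. A plain strstr() is not enough,
   because one extension name can be a prefix of another. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

With the extension list quoted above, has_extension(list, "cl_khr_fp64") would be true and has_extension(list, "cl_khr_int64_base_atomics") would be false, so a host program could pick a fallback path before building kernels that need 64-bit atomics.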

Regards, Simon Scholl

Broadwell IGP needs more sub_group functions

OpenCL 2.0 has no support for a "ballot" style sub-group function. A ballot returns a bitmask containing the conditional flag for each "lane" in the sub-group. As long as the sub-group (SIMD) size is 32 or less, this fits in a cl_uint.
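To make the semantics concrete, here is a host-side sketch in plain C that emulates what a ballot would return. It is only an illustration, not a device implementation: the names emulate_ballot and popcount32 are hypothetical, uint32_t stands in for cl_uint, and the per-lane predicates are passed in as an array.

```c
#include <stdint.h>

/* Emulate a ballot: bit i of the result is set when lane i's predicate
   is nonzero. At most 32 lanes fit in the 32-bit mask. */
static uint32_t emulate_ballot(const int predicate[], unsigned lanes)
{
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < lanes && lane < 32; ++lane) {
        if (predicate[lane]) {
            mask |= (uint32_t)1 << lane;
        }
    }
    return mask;
}

/* Count the set bits, i.e. the number of lanes whose predicate was true. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

Given such a mask, any() is equivalent to mask != 0 and all() to the mask having every active lane's bit set, which is why a ballot is strictly more informative than either.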

Presumably sub-group any() and all() are implemented on Broadwell IGP by returning an ARF flag register?

It would be great if Broadwell IGP unofficially implemented sub_group_any() by returning the actual flag bitmask so that developers could apply popcount() and other operations to the mask.

Kernel optimization with oclopt and ico64


I work with the CLI tools of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for CPU or MIC.

Question (1): My current understanding of oclopt is that the tool takes built SPIR code and a set of optimizations, such as prefetching or loop unrolling, and produces an optimized version of the code. Example:

Help! While debugging with Kernel Builder for OpenCL API, it reports that the CPU version is not supported by KDB...

As the image below shows, I set the inputs arrayA and B and the output arrayC, then clicked the debug button, and an error occurred.

Since I am new to OpenCL, I don't know how to solve this problem. Is there anyone who can help me? Thanks a lot!


Why can I see 2 platforms?


My computer has only 1 CPU, with 4 cores (8 threads). But when I call err = clGetPlatformIDs(0, NULL, &numPlatforms); I get 2 platforms.

One platform contains 1 CPU and 1 GPU, and the other contains 1 CPU. Both CPU devices are the same physical CPU.

I don't know why.


The platform number :  2
        PlatformId=0 deviceNums=2 vedor:Intel(R) Corporation
        PlatformId=1 deviceNums=1 vedor:Intel(R) Corporation

Problem with MACROS added with clBuildProgram

Hi to everyone,

I have a problem with the SDK plugin for Visual Studio 2010. In my kernel I add several macros using the -D flag inside the options argument of the clBuildProgram function. However, the Intel OpenCL SDK plugin does not recognize them, so it throws several "use of undeclared identifier" errors and I am not able to run my program.
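For context, this is roughly what defining macros through the options string looks like on the host side. It is a sketch in plain C under stated assumptions: the macro names BLOCK_SIZE and USE_LOCAL_MEM and the helper make_build_options are hypothetical, and only the string formatting is shown.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the -D definitions into `buf`.
   Returns 0 on success, or -1 if the string did not fit. */
static int make_build_options(char *buf, size_t size,
                              int block_size, int use_local_mem)
{
    int n = snprintf(buf, size,
                     "-D BLOCK_SIZE=%d -D USE_LOCAL_MEM=%d",
                     block_size, use_local_mem ? 1 : 0);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

The resulting string would then be passed as the options argument of clBuildProgram, so these macros exist only at run time. A tool that compiles the .cl file on its own, without those options, has no way to see them, which would be consistent with the "use of undeclared identifier" errors described above.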
