OpenCL achieves 800% CPU utilization

Hi all, 

I am curious about the CPU implementation of OpenCL for Intel processors.

I ran a small set of benchmarks from clpeak on an i7-4770S (4 cores, Hyper-Threading enabled) under Linux.

top shows CPU utilization reaching almost 800%, meaning all CPU resources are in use.

However, when I run each clpeak benchmark individually, utilization peaks at 400%.

So running the benchmarks consecutively seems to get more out of the OpenCL runtime.

Does that mean that when a workload is issued to the OpenCL CPU runtime, it uses only some of the cores rather than all of them?
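When reading these numbers it may help to know that top, in its default (Irix) mode, counts 100% per logical CPU, so the ceiling for one process is 100 times the number of online logical CPUs: 800% on this 4-core, 8-thread i7-4770S, with 400% being roughly four hardware threads' worth of work. A minimal sketch, assuming Linux with glibc, where sysconf(_SC_NPROCESSORS_ONLN) reports the online logical CPUs; max_top_percent is a hypothetical helper name. It only shows where the ceiling comes from, not why a single clpeak run stops at 400%.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: the highest %CPU that top (Irix mode) can show
   for a single process on this machine. top counts 100% per logical CPU,
   so 4 cores with Hyper-Threading (8 logical CPUs) give 800%. */
static long max_top_percent(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN); /* online logical CPUs */
    if (n < 1)
        n = 1; /* sysconf returns -1 on error; fall back to one CPU */
    return 100L * n;
}
```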

Why does the post-build event not pick up the OpenCL files from where they have been specified?

I have my OpenCL kernel source files in a separate directory ("cl") on my file system (Windows 7 machine).

This is no different from having my C++ source files available in yet another directory ("src") on my file system.

Adding them through "Solution: Pop Up Menu > Add Existing Item" puts the files in the respective "OpenCL Files" and "Source Files" containers, as expected.

However, the build will fail with an error:

The system cannot find the file specified.
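The error above comes from the Visual Studio build, but the same "file not found" symptom can resurface at run time when the host program opens a kernel file by a relative path, because relative paths are resolved against the process's current working directory (under the VS debugger, the project's Working Directory setting). A minimal C sketch of a loader that makes that failure explicit; load_kernel_source is a hypothetical helper, and the string it returns is what you would hand to clCreateProgramWithSource. It does not fix the post-build event itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole kernel file (e.g. one from the "cl"
   directory) into a NUL-terminated heap buffer. Returns NULL if the file
   cannot be opened or read, which is where a wrong relative path shows up.
   The caller frees the buffer. */
static char *load_kernel_source(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0'; /* OpenCL source strings may be NUL-terminated */
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```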

Xeon Phi 64-bit Atomics


The Xeon Phi supports only the following extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_fp64

That means we cannot use 64-bit atomic operations. Is there any way to use 64-bit atomics (cl_khr_int64_base_atomics)?

This is essential for many scientific applications, because float does not provide enough precision.

Regards, Simon Scholl
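One workaround for the add-only case is to store a 64-bit unsigned counter as two 32-bit words and update it with 32-bit atomics only, which the listed cl_khr_global_int32_base_atomics does provide: atomically add the low half, detect the carry from the value the low word held before, then atomically add the high half plus that carry. The sketch below models this on the host with C11 <stdatomic.h>; split_u64 and its functions are hypothetical names. It covers integer addition only: it gives no 64-bit compare-exchange and no atomic double add, and a read taken while adds are still in flight can be torn.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit unsigned counter kept as two 32-bit words, so that only
   32-bit atomic operations are needed to update it. */
typedef struct {
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
} split_u64;

/* Add-only emulation of a 64-bit atomic add. Each low-word add is atomic,
   so its carry is detected exactly from the old value; the high word then
   receives the high half of delta plus that carry. Because addition
   commutes, the total is exact once all adders have finished. */
static void split_u64_add(split_u64 *c, uint64_t delta)
{
    uint32_t dlo = (uint32_t)delta;
    uint32_t dhi = (uint32_t)(delta >> 32);
    uint32_t old_lo = atomic_fetch_add(&c->lo, dlo);
    uint32_t carry = (uint32_t)(old_lo + dlo) < old_lo ? 1u : 0u;
    if (dhi != 0u || carry != 0u)
        atomic_fetch_add(&c->hi, dhi + carry);
}

/* Only meaningful after all concurrent adds have completed. */
static uint64_t split_u64_read(split_u64 *c)
{
    return ((uint64_t)atomic_load(&c->hi) << 32) | atomic_load(&c->lo);
}
```

In an OpenCL 1.2 kernel the two steps would map onto the built-in atomic_add(volatile __global uint *p, uint val), which also returns the old value.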

Blank debugging windows in Microsoft Visual Studio


I have installed the latest Intel SDK for OpenCL on my Windows 7 machine (Core processor and HD 4000 Graphics).

I had no problem with the plugin for either MVS2010 (Pro version) or MVS2013 (Community version): the API Tracer, the Queue Viewer, and the Object Tree were all giving me information.

Broadwell IGP needs more sub_group functions

OpenCL 2.0 has no support for a "ballot"-style sub-group function. A ballot returns a bitmask containing the conditional flag for each "lane" in the sub-group. As long as the sub-group (SIMD) size is 32 or less, this fits in a cl_uint.

Presumably sub-group any() and all() are implemented on Broadwell IGP by returning an ARF flag register?

It would be great if Broadwell IGP unofficially implemented sub_group_any() by returning the actual flag bitmask so that developers could apply popcount() and other operations to the mask.
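Until such a function exists, a ballot can be emulated with OpenCL 2.0 functions that do exist: each lane contributes (flag ? 1u << lane : 0u) and the sub-group sums those values with sub_group_reduce_add(). Because every lane owns a distinct bit, the sum never carries, so it equals the bitwise OR, i.e. the ballot mask. The C sketch below models that reduction on the host; emulated_ballot and ballot_count are hypothetical names, and nothing here says how fast the reduction is on Broadwell compared with reading a flag register directly.

```c
#include <assert.h>
#include <stdint.h>

/* Host model of a sub-group ballot for up to 32 lanes. Summing distinct
   powers of two equals OR-ing them, which is why a plain add reduction
   (sub_group_reduce_add on the device) can build the mask. */
static uint32_t emulated_ballot(const int *flags, unsigned lanes)
{
    assert(lanes <= 32u); /* mask must fit in one 32-bit word */
    uint32_t sum = 0u;
    for (unsigned lane = 0; lane < lanes; ++lane)
        sum += flags[lane] ? (UINT32_C(1) << lane) : 0u;
    return sum;
}

/* Number of lanes whose flag was set, as popcount() would report. */
static unsigned ballot_count(uint32_t mask)
{
    unsigned n = 0;
    while (mask != 0u) {
        mask &= mask - 1u; /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

On the device, the per-lane value would be (flag ? 1u << sub_group_get_local_id() : 0u), and the reduced result is identical in every lane.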

What's New? OpenCL™ Runtime 15.1 (CPU only)

  • Removed support for the Intel Xeon Phi coprocessors
  • New performance-related environment variables:
    • CL_CONFIG_CPU_RT_LOOP_UNROLL_FACTOR for loop unrolling of loops with non-constant trip count (CPU only)
    • CL_CONFIG_USE_FAST_RELAXED_MATH for enabling floating-point calculation optimizations (forcing -cl-fast-relaxed-math)
  • Improved MS Visual Studio* debugging of OpenCL kernels on CPU device
Kernel optimization with oclopt and ico64


I work with the CLI tool of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for CPU or MIC.

Question (1): As I currently understand it, oclopt takes already-built SPIR code plus some optimizations, such as prefetching or loop unrolling, and produces an optimized version of it. Example:

Help! When debugging with Kernel Builder for OpenCL API, it reports that the CPU version is not supported by KDB...

As the image below shows, I set the input arrays A and B and the output array C, then clicked the debug button, and an error occurred.

Since I am new to OpenCL, I don't know how to solve this problem. Can anyone help me? Thanks a lot!


Why do I see 2 platforms?


My computer has only 1 CPU. It has 4 cores (8 threads). But when I call err = clGetPlatformIDs(0, NULL, &numPlatforms);, I get 2 platforms.

One platform contains 1 CPU and 1 GPU; the other contains only 1 CPU. Both CPUs are the same physical CPU.

I don't know why.


The platform number :  2
        PlatformId=0 deviceNums=2 vedor:Intel(R) Corporation
        PlatformId=1 deviceNums=1 vedor:Intel(R) Corporation
