Performance Promise of OpenCV* 3.0 and Intel® INDE OpenCV


The Intel® Integrated Native Developer Experience (Intel® INDE) is a cross-architecture productivity suite that provides developers with tools, support, and IDE integration to create high-performance C++/Java* applications for Windows* on Intel® architecture and Android* on ARM* and Intel® architecture.

  • Developers
  • Professors
  • Students
  • Android*
  • Microsoft Windows* (XP, Vista, 7)
  • C/C++
  • Java*
  • Python*
  • Intel® Integrated Native Developer Experience (INDE)
  • OpenCL*

    Low throughput: how to diagnose?


    I've only recently started programming OpenCL on my Ivy Bridge GT2 (16 EU) laptop, but the results don't look promising for my use case. To narrow things down, I started with a very basic kernel that traverses a buffer holding 2D image data:

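    __kernel void image_scaling(__global const char* in, __global char* out, int inputStride)
    {
        // Linear index of this work item's pixel: row * stride + column
        unsigned int idx = get_global_id(1) * inputStride + get_global_id(0);

        char input_value = in[idx];

        // Add a constant offset; the int result is implicitly converted back to char
        out[idx] = input_value + 50;
    }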


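When diagnosing low throughput with a kernel this simple, two cheap first checks are:

- confirming that the output is correct against a CPU reference;
- measuring the effective memory bandwidth one launch achieves.

The sketch below is plain C and makes a few assumptions. `image_scaling_ref` and `effective_gbps` are hypothetical helper names. The `seconds` value is assumed to come from your own timing, for example OpenCL event profiling on a queue created with `CL_QUEUE_PROFILING_ENABLE`.

```c
#include <stddef.h>

/* CPU reference for the image_scaling kernel: one loop iteration per work item,
   with get_global_id(0) -> x and get_global_id(1) -> y. unsigned char is used so
   that "+ 50" wraps modulo 256 with well-defined C semantics; this assumes the
   device's char result wraps the same way, so compare the outputs byte for byte. */
static void image_scaling_ref(const unsigned char *in, unsigned char *out,
                              size_t width, size_t height, size_t input_stride)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t idx = y * input_stride + x;
            out[idx] = (unsigned char)(in[idx] + 50);
        }
    }
}

/* Effective bandwidth in GB/s for one launch over width x height work items:
   each work item reads 1 byte and writes 1 byte. `seconds` is the measured
   kernel time, supplied by the caller. */
static double effective_gbps(size_t width, size_t height, double seconds)
{
    return 2.0 * (double)width * (double)height / seconds / 1e9;
}
```

The kernel does almost no arithmetic per byte. Comparing the GB/s figure against the laptop's memory bandwidth is therefore a more informative baseline than comparing against peak compute throughput.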

    Broadwell 64-bit ulong performance regression on min/max/compare

    Intel IGP CL team,

    I'm seeing a huge performance regression between Haswell and Broadwell when comparing 64-bit ulong values or performing min/max operations on them.

    Please check the number of instructions that Broadwell is generating for the two ulong (64-bit) "compare-exchange" sequences below.

    The 64-bit compare-exchange sequences are running half as fast on Broadwell when compared to Haswell. 

    The 32-bit compare-exchanges appear to be correct (they're very fast).
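For readers unfamiliar with the operation, the following C sketch shows what a 64-bit unsigned compare-exchange computes: afterwards the first value is the min and the second is the max.

It also spells out the same comparison using only 32-bit pieces (high words first, then low words). This illustrates why a 64-bit compare can cost several instructions on hardware without a native 64-bit compare. That lowering is an illustrative assumption, not a description of what Intel's compiler actually emits for Haswell or Broadwell. The function names are hypothetical.

```c
#include <stdint.h>

/* Native 64-bit compare-exchange: afterwards *a <= *b. */
static void cmpxchg_u64(uint64_t *a, uint64_t *b)
{
    uint64_t x = *a, y = *b;
    *a = x < y ? x : y;   /* min */
    *b = x < y ? y : x;   /* max */
}

/* 64-bit unsigned less-than built only from 32-bit operations:
   compare the high words, and use the low words only when the high words are equal. */
static int lt_u64_split(uint64_t x, uint64_t y)
{
    uint32_t xh = (uint32_t)(x >> 32), xl = (uint32_t)x;
    uint32_t yh = (uint32_t)(y >> 32), yl = (uint32_t)y;
    return (xh < yh) | ((xh == yh) & (xl < yl));
}

/* The same compare-exchange, with the comparison lowered to 32-bit pieces. */
static void cmpxchg_u64_split(uint64_t *a, uint64_t *b)
{
    uint64_t x = *a, y = *b;
    int swap = lt_u64_split(y, x);   /* y < x means the pair is out of order */
    *a = swap ? y : x;
    *b = swap ? x : y;
}
```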

    Boosting CPU Performance with OpenCL™ Device Fission



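    Device Fission is a feature of the OpenCL™ specification that gives OpenCL programmers more power and control over which compute units run OpenCL commands. Fundamentally, Device Fission allows a device to be subdivided into one or more sub-devices. Used properly, this can deliver significant performance benefits, especially when running on the CPU.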

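    The Intel® SDK for OpenCL™ Applications is a comprehensive software development environment for OpenCL applications on Intel® architecture-based platforms. The SDK lets developers build OpenCL applications that target Intel® CPUs running the Windows* and Linux* operating systems.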

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Intel® SDK for OpenCL™ Applications
  • OpenCL™ Code Builder
  • Intel® Integrated Native Developer Experience (INDE)
  • OpenCL*
  • Parallel Computing

    Kernel debugging to find a failing work item


    While debugging with Code Builder in Visual Studio, clEnqueueNDRangeKernel fails for one of my kernels with a bad memory access. To find out which work item (out of 512) causes the error, I have had to gradually (and awkwardly) reduce the set of participating work items until I find the one(s) making the bad access. I find myself in this position fairly often: is there a more graceful way to do this?
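One way to make the manual narrowing systematic is to bisect the range. Launch the kernel on half of the remaining work items, for example by using the `global_work_offset` argument of `clEnqueueNDRangeKernel`, keep the half that fails, and repeat. With 512 work items this takes at most 9 launches.

The C sketch below makes several assumptions:

- `launch_range` is a hypothetical callback type standing in for "enqueue the kernel over [offset, offset + count) and report whether it succeeded". Here it is simulated.
- The failure is deterministic.
- A range fails exactly when it contains a failing work item.

```c
#include <stddef.h>

/* Hypothetical launcher: runs the kernel on work items [offset, offset + count)
   and returns nonzero if the launch completed without error. */
typedef int (*launch_range)(size_t offset, size_t count, void *ctx);

/* Bisect [0, total) to find the lowest-indexed failing work item.
   Assumes the full range fails and that a sub-range fails iff it contains a
   failing item. Stores the number of launches performed in *launches. */
static size_t find_failing_item(size_t total, launch_range launch, void *ctx,
                                int *launches)
{
    size_t lo = 0, count = total;
    *launches = 0;
    while (count > 1) {
        size_t half = count / 2;
        ++*launches;
        if (!launch(lo, half, ctx)) {
            count = half;          /* failure lies in the lower half */
        } else {
            lo += half;            /* lower half is clean: search the upper part */
            count -= half;
        }
    }
    return lo;
}

/* Simulated launcher for illustration: fails whenever the range
   contains the "bad" index pointed to by ctx. */
static int simulated_launch(size_t offset, size_t count, void *ctx)
{
    size_t bad = *(size_t *)ctx;
    return !(bad >= offset && bad < offset + count);
}
```

In a real host program, the callback would issue the enqueue with the given offset and size and check the returned error code and the event status.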


    GL CL interop on CPU

    I have a question about OpenCL/OpenGL interoperability when running on an Intel CPU.

    I just don't quite understand the purpose of the cl_khr_gl_sharing extension on a CPU.

    When a shared GL-CL context is created, must the GL device context and CL context both be associated with the same device?
    Or am I able to have the GL context associated with my GPU, and the CL context associated with my CPU?

    Create kernels from pre-compiled .ir


    I have a question regarding the creation of kernels from pre-compiled .ir files.

    I create the .ir file using: ioc32 -cmd=compile -input=kernel.cl -device=GPU -ir=kernel.ir

    In my code I:

    a) Load the .ir from file

    b) Create a cl_program from my binary with clCreateProgramWithBinary

    c) Call clBuildProgram on the cl_program created above: clBuildProgram(my_cl_program, 1, &my_cldevice, NULL, NULL, NULL)
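Step a) can be sketched in plain C as reading the whole file into memory. The buffer and its size are what `clCreateProgramWithBinary` takes through its `binaries` and `lengths` arrays (one entry per device). Per the OpenCL specification, `clBuildProgram` must still be called on a program created from a binary. The helper name `load_binary` is hypothetical, and the sketch does not call any OpenCL function itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire binary file (e.g. the kernel.ir produced by ioc32) into memory.
   Returns a malloc'd buffer and stores its length in *size, or returns NULL on
   any I/O or allocation failure. The caller frees the buffer. */
static unsigned char *load_binary(const char *path, size_t *size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }
    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size = (size_t)len;
    return buf;
}
```

The pointer and length would then be passed as single-element arrays, e.g. `const unsigned char *bins[] = { buf }; size_t lens[] = { size };`. Checking the per-device `binary_status` output of `clCreateProgramWithBinary` helps tell a rejected binary apart from a build failure.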


    Buffer created OK, but data not actually written

    I use clCreateBuffer(CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, ...) to create 3 buffers of 300 MB each. With the driver version I'm using, the device's global alloc mem size is 332185 KB. There is no error at run time, but the result is wrong: the calculation using the first buffer is correct, but the other two are not. I found that no data had been written to the other two buffers. When I use …, the device's global alloc mem size is 415744 KB; I create 2 buffers, and the result is correct. What's wrong?
    Does anyone have the driver version for Win7 with HD 4000? Thank you!
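A first sanity check for a problem like this is to compare each requested buffer, and their total, against the device limits reported by `clGetDeviceInfo` with `CL_DEVICE_MAX_MEM_ALLOC_SIZE` (per-buffer limit) and `CL_DEVICE_GLOBAL_MEM_SIZE` (total). It is an assumption that the post's "global alloc mem size" corresponds to either of these. The enum and `check_buffers` below are hypothetical names, not OpenCL API. The sketch only does the arithmetic, with the limits passed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Outcome of check_buffers (hypothetical helper, not part of OpenCL). */
enum buf_check { BUFS_OK = 0, BUF_TOO_LARGE = 1, BUFS_EXCEED_GLOBAL = 2 };

/* Compare requested buffer sizes (in bytes) against a per-allocation limit and
   a total-memory limit; on a real device these would come from clGetDeviceInfo
   with CL_DEVICE_MAX_MEM_ALLOC_SIZE and CL_DEVICE_GLOBAL_MEM_SIZE. */
static enum buf_check check_buffers(const uint64_t *sizes, size_t n,
                                    uint64_t max_alloc, uint64_t global_mem)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (sizes[i] > max_alloc) return BUF_TOO_LARGE;  /* one buffer too big */
        total += sizes[i];
    }
    return total > global_mem ? BUFS_EXCEED_GLOBAL : BUFS_OK;
}
```

Reading "300 MB" as 300 × 2^20 bytes, each buffer is 307200 KB, which is below 332185 KB. So if that figure is the per-allocation limit, each buffer passes on its own, and the remaining question is how the 3-buffer total compares with the device's global memory size.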
