Need help: unexpected results using OpenCL 2.0 atomics on HD5500


I am trying OpenCL 2.0 atomics on HD5500, following the


But the result of the atomic operations is not what I expected. A simplified version of the test is:

kernel void atomics_test(global int *output, volatile global atomic_int*  atomicBuffer, uint iterations, uint offset)
    for (int j = 0; j < MY_INNER_LOOP; j++)

clEnqueueReadBuffer transfers truncated data on HD4600

In certain cases clEnqueueReadBuffer doesn't transfer all the required data when executed on HD4600. System: Win7 x64, driver version, 32-bit application.

It seems that when the destination buffer is page-aligned and the transfer length is not a multiple of 4 KB, only a whole multiple of 4 KB is transferred. Sample code:
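The poster's sample code is not included in this excerpt. As a small illustration of the symptom as described (not of the driver's actual behaviour), the following C sketch computes how many bytes would arrive if only whole 4 KB pages were copied. The names `PAGE_BYTES`, `bytes_copied_if_truncated` and `bytes_missing` are hypothetical and do not come from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed granularity: the 4 KB figure mentioned in the report. */
#define PAGE_BYTES ((size_t)4096)

/* Bytes that would arrive if the transfer were rounded down to whole pages. */
static size_t bytes_copied_if_truncated(size_t requested)
{
    return (requested / PAGE_BYTES) * PAGE_BYTES;
}

/* Bytes at the tail of the destination that would be left unwritten. */
static size_t bytes_missing(size_t requested)
{
    size_t copied = bytes_copied_if_truncated(requested);
    assert(copied <= requested);
    return requested - copied;
}
```

Under this model a 10000-byte read would deliver 8192 bytes and leave the last 1808 untouched. Pre-filling the destination with a sentinel value and checking that tail is one way to confirm the symptom.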

Intel Phi does not write data to buffer when clEnqueueReadBuffer is called on CentOS

Hello everyone!

I'm running into a problem where data is not written to my buffer when the kernels finish. I've tested my kernel in isolation in Eclipse on Ubuntu with an Intel i5 CPU, and it seems to output the correct results. When I move it over to CentOS, printf output from the kernel never appears and my output buffers are never written to. Here is an example of my code:

double * coef_elts = (double *) calloc(p * voxels, sizeof(double));

Performance Promise of OpenCV* 3.0 and Intel® INDE OpenCV


The Intel® Integrated Native Developer Experience (Intel® INDE) is a cross-architecture productivity suite that provides developers with tools, support, and IDE integration to create high-performance C++/Java* applications for Windows* on Intel® architecture and Android* on ARM* and Intel® architecture.

  • Developers
  • Professors
  • Students
  • Android*
  • Microsoft Windows* (XP, Vista, 7)
  • C/C++
  • Java*
  • Python*
  • Intel® INDE
  • OpenCL*
Low throughput - how to diagnose?


    I've only recently started programming OpenCL on my Ivy Bridge GT2 (16 EU) laptop, but the results don't look promising for my use case. To narrow things down, I started with a very basic kernel that traverses a buffer holding 2D image data:

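    __kernel void image_scaling(__global const char* in, __global char* out, int inputStride)
    {
        // Linear index of this work-item's pixel: row * stride + column.
        unsigned int idx = get_global_id(1) * inputStride + get_global_id(0);

        char input_value = in[idx];

        // Add a constant offset of 50 to the pixel value.
        out[idx] = input_value + 50;
    }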
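When measuring a kernel like the one above, it can help to have a host-side reference for checking its output. The following is a minimal C sketch, not taken from the post: the function name `image_scaling_reference` and the `width` and `height` parameters (standing in for the global work size) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Host-side reference for the kernel: one loop iteration per work-item,
 * with x standing in for get_global_id(0) and y for get_global_id(1).
 * width and height are assumed to match the global work size.
 */
static void image_scaling_reference(const char *in, char *out,
                                    int input_stride, size_t width, size_t height)
{
    /* The stride must cover at least one full row of pixels. */
    assert(input_stride >= 0 && (size_t)input_stride >= width);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t idx = y * (size_t)input_stride + x;
            out[idx] = (char)(in[idx] + 50);
        }
    }
}
```

Padding bytes between `width` and `input_stride` are left untouched, which matches a launch whose global size in dimension 0 equals the image width rather than the stride.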



    Broadwell 64-bit ulong performance regression on min/max/compare

    Intel IGP CL team,

    I'm seeing a huge performance regression between Haswell and Broadwell when comparing 64-bit ulong values or performing min/max operations on them.

    Please check the number of instructions that Broadwell is generating for the two ulong (64-bit) "compare-exchange" sequences below.

    The 64-bit compare-exchange sequences run half as fast on Broadwell as on Haswell.

    The 32-bit compare-exchanges appear to be correct (they're very fast).
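The two sequences the poster refers to are not included in this excerpt. For readers unfamiliar with the term, a compare-exchange step of the kind used in sorting networks can be sketched in plain C as below. The helper name `compare_exchange_u64` is hypothetical. In OpenCL C the same effect is typically written with the built-in `min` and `max` on `ulong`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* After the call, *a holds the smaller value and *b the larger one. */
static void compare_exchange_u64(uint64_t *a, uint64_t *b)
{
    assert(a != NULL && b != NULL);
    uint64_t lo = (*a < *b) ? *a : *b;
    uint64_t hi = (*a < *b) ? *b : *a;
    *a = lo;
    *b = hi;
}
```

This is only a reference for what the operation computes. It says nothing about the instruction sequences either GPU generates.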

    OpenCL™ Device Fission for CPU Performance

    Download PDF


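    Device Fission is a feature of the OpenCL™ specification that gives OpenCL programmers more power and control over which compute units run OpenCL commands. Fundamentally, Device Fission allows a device to be subdivided into one or more sub-devices, which, when used properly, can provide a significant performance advantage, especially when running on the CPU.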
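In OpenCL 1.2, sub-devices are created with `clCreateSubDevices`, and one of the partition schemes is `CL_DEVICE_PARTITION_EQUALLY`. The C sketch below models only the arithmetic of that scheme and makes no OpenCL calls. The helper names `sub_device_count` and `unused_compute_units` are hypothetical.

```c
#include <assert.h>

/*
 * Model of equal partitioning (CL_DEVICE_PARTITION_EQUALLY):
 * the device is split into as many sub-devices as fit, each with
 * units_per_sub compute units. Leftover compute units are not used.
 */
static unsigned sub_device_count(unsigned max_compute_units, unsigned units_per_sub)
{
    assert(units_per_sub > 0);
    return max_compute_units / units_per_sub;
}

/* Compute units left over when the division is not exact. */
static unsigned unused_compute_units(unsigned max_compute_units, unsigned units_per_sub)
{
    assert(units_per_sub > 0);
    return max_compute_units % units_per_sub;
}
```

For example, a device reporting 8 compute units, split into sub-devices of 3 units each, would yield 2 sub-devices with 2 compute units left idle under this model.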

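    The Intel® SDK for OpenCL™ Applications is a comprehensive software development environment for OpenCL applications on Intel® architecture-based platforms. The SDK lets developers build OpenCL applications that target Intel® CPUs running Windows* and Linux* operating systems.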

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Intel® SDK for OpenCL™ Applications
  • OpenCL™ Code Builder
  • Intel® INDE
  • OpenCL*
  • Parallel Computing
Kernel debugging to find an erroring work item


      While debugging with Code Builder in Visual Studio, I have a clEnqueueNDRangeKernel call failing for a certain kernel with a bad memory access. To find out which work item (of 512) is causing the error, I have had to gradually (and awkwardly) limit the participating work items until I find the item(s) requesting the bad access. I find myself in this position reasonably often: is there a more graceful way to do this?
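One possible approach, which is not taken from the thread, is to replay the kernel's index arithmetic on the host and scan every global ID for an out-of-range access. The C sketch below assumes the index computation can be expressed as a function of the global ID. The names `index_fn`, `example_index` and `first_out_of_bounds` are hypothetical, and `example_index` contains a deliberate bug for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: the buffer index a given work-item would access. */
typedef long (*index_fn)(size_t global_id);

/* Example index computation with a deliberate bug: it reads 8 elements ahead. */
static long example_index(size_t global_id)
{
    return (long)global_id + 8;
}

/*
 * Returns the first global ID whose index falls outside [0, buffer_len),
 * or global_size if every work-item stays in bounds.
 */
static size_t first_out_of_bounds(index_fn index_of, size_t global_size, size_t buffer_len)
{
    assert(index_of != NULL);
    for (size_t gid = 0; gid < global_size; gid++) {
        long idx = index_of(gid);
        if (idx < 0 || (size_t)idx >= buffer_len) {
            return gid;
        }
    }
    return global_size;
}
```

With 512 work items and a 512-element buffer, `first_out_of_bounds(example_index, 512, 512)` points at work item 504. The same guard can be placed inside the kernel, recording the offending ID in a debug buffer instead of performing the access.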

