HD4400 bitwise and operation on uchar2 data

We are seeing different results when implementing a bitwise AND in an OpenCL kernel that works on uchar2 data. Kernel code like this:
uchar2 val1;
uchar2 val2;
uchar2 res;
res = val1 & val2;

produces wrong results, while code like the following:

res = (uchar2)(val1.x & val2.x, val1.y & val2.y);
produces the correct result.

By the way, the same behaviour was detected for bitwise OR/XOR and for uchar3/uchar4 data, although the attached test case was prepared only for bitwise AND on uchar2 data.
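To check device output independently of the driver, a host-side reference can compute the expected component-wise result. The following is a minimal sketch in plain C; the type `host_uchar2` and the helpers `and_uchar2` and `count_mismatches` are hypothetical names introduced for illustration and are not part of the attached test case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for OpenCL's uchar2 (hypothetical name, not from the post). */
typedef struct { uint8_t x, y; } host_uchar2;

/* Expected result of `val1 & val2`: the AND is applied to each component. */
static host_uchar2 and_uchar2(host_uchar2 a, host_uchar2 b)
{
    host_uchar2 r = { (uint8_t)(a.x & b.x), (uint8_t)(a.y & b.y) };
    return r;
}

/* Count the elements where the device output differs from the host reference. */
static size_t count_mismatches(const host_uchar2 *v1, const host_uchar2 *v2,
                               const host_uchar2 *device_out, size_t n)
{
    assert(v1 && v2 && device_out);
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        host_uchar2 ref = and_uchar2(v1[i], v2[i]);
        if (ref.x != device_out[i].x || ref.y != device_out[i].y)
            ++bad;
    }
    return bad;
}
```

Running `count_mismatches` on the buffer read back from the device should return 0 for both kernel variants if the vector operator behaves as the OpenCL C specification describes.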

setting work_group_size crashes OpenCL on Intel CPU


I am porting the reduction kernel from the AMD APP SDK.

It requires setting work_group_size. When you execute clEnqueueNDRangeKernel with a local_work_size other than 8, it crashes directly in TBB in the Intel OpenCL runtime for Intel CPUs. The clEnqueueNDRangeKernel call itself successfully launches the kernel.

When you request the work-group size from the device, it returns 8192 (it should be 8 in this case), and the kernel work-group size is 2048. It crashes with both settings.

It works only when the work-group size equals the number of cores.

I have an Intel Haswell 4770K.
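While the crash is being investigated, it may help to rule out an invalid local size on the host side. In OpenCL 1.x the global size must be evenly divisible by the local size, and the local size must not exceed the limit reported for the kernel (CL_KERNEL_WORK_GROUP_SIZE from clGetKernelWorkGroupInfo). The sketch below is a hypothetical helper, `pick_local_size`, written under those assumptions; it does not explain the crash, and it does not enforce a power-of-two size, which a reduction kernel may additionally need.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest local size that is <= limit and evenly divides global_size.
   `limit` is assumed to be the value reported for the kernel by the runtime. */
static size_t pick_local_size(size_t global_size, size_t limit)
{
    assert(global_size > 0 && limit > 0);
    size_t local = limit < global_size ? limit : global_size;
    while (global_size % local != 0)
        --local;   /* terminates at 1 in the worst case */
    return local;
}
```

For example, with a global size of 8192 and a kernel limit of 2048, the helper returns 2048; if that value still crashes, the local size itself is not the invalid parameter.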

HD4400 clEnqueueCopyBufferRect issue?


We've detected suspicious behaviour of the clEnqueueWriteBufferRect/clEnqueueCopyBufferRect functions, which is demonstrated by the simple test case attached. The test case depends on the OpenCL API only. It works correctly on AMD Tahiti but not on Intel HD4400 or HD4600.


The problem occurs when copying a rectangle of interest, with some specific parameters, out of a whole image that is kept in a CL buffer.


A short description of the test case:

1. Create an OpenCL buffer for the whole image (not initialized)
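To judge whether the device result is wrong, a host reference of the rectangular copy is useful. The sketch below is a 2D-only approximation of the addressing the OpenCL rect functions use (byte offset = origin_y * row_pitch + origin_x, with the x origin and width given in bytes). The function name `copy_rect_ref` and its parameter layout are assumptions for illustration, not taken from the attached test case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 2D host reference for a rectangular copy between two linear buffers.
   x origins and width are in bytes, y origins and height are in rows. */
static void copy_rect_ref(const unsigned char *src, size_t src_x, size_t src_y,
                          size_t src_row_pitch,
                          unsigned char *dst, size_t dst_x, size_t dst_y,
                          size_t dst_row_pitch,
                          size_t width_bytes, size_t height)
{
    assert(src && dst);
    /* A row pitch must be at least as large as the copied row. */
    assert(src_row_pitch >= width_bytes && dst_row_pitch >= width_bytes);
    for (size_t row = 0; row < height; ++row) {
        const unsigned char *s = src + (src_y + row) * src_row_pitch + src_x;
        unsigned char *d = dst + (dst_y + row) * dst_row_pitch + dst_x;
        memcpy(d, s, width_bytes);
    }
}
```

Comparing this output byte by byte with the buffer read back after clEnqueueCopyBufferRect should show which origin or pitch combination diverges on the HD4400/HD4600.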

GPU HD4600 opencl kernel problem

Hi, I am compiling a SPIR kernel offline.

When I use it on an HD4600 GPU, I get the following when I invoke clBuildProgram:

error: IGILTargetLowering::Call(): unhandled function call!

Call made to: _Z13get_global_idj()
0x7c53480: i64 = GlobalAddress<i64 (i32)* @_Z13get_global_idj> 0 [ORD=1]
error: midlevel compiler failed build.

The same kernel works fine on an AMD GPU and on an Intel CPU. It also works fine if the kernel is compiled as spir64.

msbuild with visual studio and kernels

Hi, I have a kernel that has several includes in it.

Is there an automatic way for the Intel OpenCL SDK to build them? I have not discovered any. I must add them manually in MSBuild. Here is an example.

NVIDIA CUDA, for example, has a GenDepTask inside that outputs dependencies from the compiler.

ioc.exe does not have anything similar to the cl /showIncludes or gcc -E options.
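One possible workaround, pending a built-in option, is a small pre-build tool that scans the kernel source for quoted includes and emits them as dependencies. The sketch below shows only the scanning step. `next_quoted_include` is a hypothetical helper, not part of the Intel SDK; it does not follow nested includes, it ignores `<...>` includes, and it would also match directives inside comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next `#include "name"` in src and copy name into out.
   Returns a pointer just past the directive, or NULL when none remain. */
static const char *next_quoted_include(const char *src, char *out, size_t out_size)
{
    assert(src && out && out_size > 0);
    const char *p = src;
    while ((p = strstr(p, "#include")) != NULL) {
        p += strlen("#include");
        while (*p == ' ' || *p == '\t')
            ++p;
        if (*p != '"')
            continue;               /* skip <...> includes and malformed lines */
        const char *start = ++p;
        const char *end = strchr(start, '"');
        if (end == NULL)
            return NULL;            /* unterminated name: stop scanning */
        size_t len = (size_t)(end - start);
        if (len >= out_size)
            len = out_size - 1;     /* truncate overly long names */
        memcpy(out, start, len);
        out[len] = '\0';
        return end + 1;
    }
    return NULL;
}
```

Calling the function in a loop, feeding each returned pointer back in as `src`, yields every quoted include in the file; those names could then be written out as inputs for the MSBuild item that compiles the kernel.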

