Create kernels from pre-compiled .ir


I have a question regarding the creation of kernels from pre-compiled .ir files.

I create a .ir using ioc32 -cmd=compile -input=kernel.cl -device=GPU -ir=kernel.ir

In my code I:

a) Load the .ir from file

b) Create a cl_program with clCreateProgramWithBinary from my binary

c) Call clBuildProgram on the cl_program created above (clBuildProgram(my_cl_program, 1, &my_cldevice, NULL, NULL, NULL))
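
Step a) can be sketched in plain C as follows. This is a minimal sketch, not the poster's code: the helper name load_binary_file is hypothetical, and it only covers reading the .ir file into memory. The resulting pointer and size are what clCreateProgramWithBinary expects in its binaries and lengths arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc'd buffer.
   Returns NULL on failure; on success *size_out holds the byte count.
   The caller frees the returned buffer. */
unsigned char *load_binary_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    *size_out = 0;

    FILE *f = fopen(path, "rb");   /* "rb": binary mode matters on Windows */
    if (f == NULL)
        return NULL;

    /* Determine the file length by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);

    unsigned char *buf = malloc((size_t)len);
    if (buf == NULL) { fclose(f); return NULL; }

    /* A short read is treated as an error so a truncated binary is never used. */
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = got;
    return buf;
}
```

With the buffer loaded, steps b) and c) would then be clCreateProgramWithBinary(context, 1, &my_cldevice, &size, (const unsigned char **)&buf, &binary_status, &err) followed by clBuildProgram(my_cl_program, 1, &my_cldevice, NULL, NULL, NULL). Checking both binary_status and err after the create call helps tell a rejected binary apart from a build failure.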


Buffers are created OK, but data is not actually written

I use clCreateBuffer with the flags CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR to create 3 buffers of 300 MB each. With one driver version, the device global alloc mem size is 332185 KB. No error is reported while running, but the result is incorrect: the calculation using the first buffer is correct, while the other two are wrong. I found that no data is written to the other two buffers. With another driver version, the device global alloc mem size is 415744 KB; there I create 2 buffers and the result is correct. What's wrong?
Does anyone have that version for Win7 with the HD4000? Thank you!
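
As a sanity check on the sizes involved, the arithmetic can be sketched in C. This is only an illustration under assumptions: the post does not say whether its figure is CL_DEVICE_MAX_MEM_ALLOC_SIZE or CL_DEVICE_GLOBAL_MEM_SIZE (both are real clGetDeviceInfo queries), and the function name check_buffer_budget is hypothetical. It compares one buffer against the per-allocation limit and the sum of all buffers against the global memory size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum {
    BUDGET_OK,
    BUDGET_SINGLE_TOO_LARGE,   /* one buffer exceeds the per-allocation limit */
    BUDGET_TOTAL_TOO_LARGE     /* all buffers together exceed global memory   */
} budget_result;

/* Hypothetical helper. The two limits are assumed to come from
   clGetDeviceInfo (CL_DEVICE_MAX_MEM_ALLOC_SIZE, CL_DEVICE_GLOBAL_MEM_SIZE).
   unsigned long long is used so 32-bit builds do not overflow. */
budget_result check_buffer_budget(size_t count,
                                  unsigned long long bytes_each,
                                  unsigned long long max_alloc_bytes,
                                  unsigned long long global_mem_bytes)
{
    assert(count > 0);
    if (bytes_each > max_alloc_bytes)
        return BUDGET_SINGLE_TOO_LARGE;
    /* Guard the multiplication below against wrap-around. */
    if (bytes_each != 0 && count > ULLONG_MAX / bytes_each)
        return BUDGET_TOTAL_TOO_LARGE;
    if ((unsigned long long)count * bytes_each > global_mem_bytes)
        return BUDGET_TOTAL_TOO_LARGE;
    return BUDGET_OK;
}
```

For the numbers in the post, one 300 MB buffer (314572800 bytes) fits under 332185 KB (340157440 bytes), but three of them total 943718400 bytes. Whether that total is a problem depends on the device's global memory size, which the post does not state, so a passing or failing check here does not by itself explain the missing data.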

OpenCL.lib Missing

I've recently installed everything that I thought would be necessary for developing an OpenCL application, but apparently I was wrong. I have the necessary header files and the .dll files, but the .lib file that is required for linking a C++ application is completely MIA.

This is what I downloaded (free edition) and installed as per the website

More flexibility in kernel analysis (Code Builder, local sizes)

Hi folks,

I'm facing a problem with work-group size definitions during a kernel analysis session. I'd like to know whether it is possible to benchmark all possible combinations of local work sizes.

For example, if I want to test the combinations of the local sizes 1, 23, 50 and 100, I put these values:
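
Independently of the Code Builder settings, the set of combinations in question can be enumerated on the host side. The following is a minimal sketch, assuming a 2D NDRange; the function name enumerate_local_sizes is hypothetical, and max_wg stands for a limit such as CL_DEVICE_MAX_WORK_GROUP_SIZE. It keeps only pairs that OpenCL 1.x would accept, where each local size must evenly divide the matching global size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: enumerate every ordered pair (lx, ly) drawn from
   `cand` that is usable as a 2D local size. Each component must divide the
   matching global size, and lx * ly must not exceed max_wg.
   Pairs are stored in out[0..out_cap-1]; the return value is the number of
   valid pairs found, which may exceed out_cap. */
size_t enumerate_local_sizes(const size_t *cand, size_t n_cand,
                             const size_t global[2], size_t max_wg,
                             size_t (*out)[2], size_t out_cap)
{
    assert(cand != NULL && global != NULL);
    assert(out != NULL || out_cap == 0);

    size_t found = 0;
    for (size_t i = 0; i < n_cand; ++i) {
        size_t lx = cand[i];
        if (lx == 0 || global[0] % lx != 0)
            continue;
        for (size_t j = 0; j < n_cand; ++j) {
            size_t ly = cand[j];
            if (ly == 0 || global[1] % ly != 0)
                continue;
            if (lx > max_wg / ly)   /* same as lx * ly > max_wg, without overflow */
                continue;
            if (found < out_cap) {
                out[found][0] = lx;
                out[found][1] = ly;
            }
            ++found;
        }
    }
    return found;
}
```

Each surviving pair could then be passed as local_work_size to clEnqueueNDRangeKernel and timed. With the candidates 1, 23, 50 and 100, a global size of 2300 x 100 and a work-group limit of 256 (all illustrative values), 6 of the 16 ordered pairs remain.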

VS Memory/Watchlists window content inaccurate during debugging

Using Code Builder under Windows 7 and VS 2010 SP1...

If I run a hack kernel like this...

__kernel void test_local()
{
    __local int s_testValue[10][10];
    __local int* rowStartThisThread = &(s_testValue[0][0]);

    s_testValue[0][0] = 5;
}

and set a breakpoint after the 5 is assigned, then the VS Memory and Watch windows don't show the '5' (they only show zeros); however, the rowStartThisThread pointer does show it correctly. Am I making some bad assumptions here?


GEMM sample program blacks out the screen

I tried the General Matrix Multiply (GEMM) sample on my PC.

But if the matrix size is larger than 2048 x 2048, the program stops and the screen blacks out.

The screen comes back after a few seconds.

I think the program stops at clEnqueueNDRangeKernel.

According to my calculations, there is no problem with the memory size, work-item size, work-group size, etc.

PC specifications

Core i5-4440

memory 4GB

Windows 7 64bit

Why can't I increase the matrix size?

Handle many kernel arguments in OCL Kernel

Dear all,

I am developing an OpenCL kernel for particle simulation and I am facing a problem. I have to transfer a lot of arguments to the device (I have not counted properly, but from the original code it could be more than 20 or 30 float arrays). Could you suggest a way to properly handle this many arguments without having to call clSetKernelArg() and clEnqueueWriteBuffer() more than 20 times?
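
One common approach, sketched here under assumptions rather than as the recommended answer, is to pack the float arrays into a single contiguous host block plus a small offset table, so that one clEnqueueWriteBuffer and two clSetKernelArg calls replace dozens. The helper name pack_float_arrays is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n_arrays float arrays into one contiguous block.
   offsets must have room for n_arrays + 1 entries: offsets[i] receives the
   start index (in floats) of array i, offsets[n_arrays] the total length.
   Returns NULL on allocation failure; the caller frees the block. */
float *pack_float_arrays(const float *const *arrays, const size_t *lengths,
                         size_t n_arrays, size_t *offsets)
{
    assert(arrays != NULL && lengths != NULL && offsets != NULL);

    /* First pass: compute where each array starts in the packed block. */
    size_t total = 0;
    for (size_t i = 0; i < n_arrays; ++i) {
        offsets[i] = total;
        total += lengths[i];
    }
    offsets[n_arrays] = total;

    /* Allocate at least one element so an empty input still yields a valid pointer. */
    float *packed = malloc((total ? total : 1) * sizeof *packed);
    if (packed == NULL)
        return NULL;

    /* Second pass: copy each array to its slot. */
    for (size_t i = 0; i < n_arrays; ++i) {
        if (lengths[i] > 0)
            memcpy(packed + offsets[i], arrays[i], lengths[i] * sizeof *packed);
    }
    return packed;
}
```

The packed block would be written to one cl_mem buffer, and the offsets (converted to a fixed-width type such as cl_uint, since size_t differs between host and device) to a second small buffer. Inside the kernel, array i then starts at data + offsets[i]. The trade-off is that all arrays share one access qualifier and one buffer lifetime.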

Try OpenCL 2.0 on HD5500 iGPUs

Hi Robert, 

From my experience with the recent Intel OpenCL SDK, OpenCL 2.0 appears to be supported only on the CPU.

The OpenCL version for the GPU is still 1.2.

However, according to the Khronos list of OpenCL products,

it seems OpenCL 2.0 now also works on HD5500 and later iGPUs.

I just want to know whether the latest OpenCL SDK (or OpenCL Code Builder) supports OpenCL 2.0 on HD5500 and later iGPUs.
