clCreateKernel error while using Intel motion estimation accelerator

We are using the sample program for the motion estimation accelerator from the following site:

We are using a laptop with Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz, with Intel(R) Iris(TM) Pro Graphics 5200. The system is running Windows 8.1 Professional 64-bit.

The impact of shared memory on workgroup scheduling between Ivy Bridge HD4000 and Haswell HD4600


I have a kernel with local size = 16 and global size = 256, and each workgroup allocates 32KB of shared memory.

I ran my kernel on the Ivy Bridge HD4000 and found that the GPU was idle 75% of the time, which is expected. Each half-slice has 64KB of shared memory, so only two (64KB/32KB) workgroups can be launched per half-slice. Each workgroup is scheduled on one EU, so at most two EUs are active per half-slice, which gives an idle fraction of 1 - (2 active EUs)/(8 EUs per half-slice) = 0.75.
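
The occupancy arithmetic above can be sketched in plain C. This is only a sketch under the post's own assumptions (64KB of shared memory and 8 EUs per half-slice, one workgroup per EU); the function name `idle_fraction` is hypothetical and is not part of any Intel API.

```c
#include <stddef.h>

/* Hypothetical helper: fraction of EUs in a half-slice left idle when
 * shared local memory limits how many workgroups can be resident.
 * Assumes, as in the post, that each workgroup occupies exactly one EU. */
double idle_fraction(size_t slm_per_half_slice, size_t slm_per_workgroup,
                     size_t eus_per_half_slice)
{
    if (slm_per_workgroup == 0 || eus_per_half_slice == 0)
        return 0.0; /* degenerate input; nothing to model */

    /* Number of workgroups whose shared memory fits in one half-slice. */
    size_t resident = slm_per_half_slice / slm_per_workgroup;

    /* There cannot be more active EUs than EUs. */
    if (resident > eus_per_half_slice)
        resident = eus_per_half_slice;

    return 1.0 - (double)resident / (double)eus_per_half_slice;
}
```

With the numbers from the post, `idle_fraction(64 * 1024, 32 * 1024, 8)` evaluates to 0.75.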

Relatively simple kernel (included) fails to compile on GPU, compiles on CPU & other platforms

I was able to reduce the kernel to a very small state in order to make it easier to track down the bug. Logically, this kernel may not be correct; however, the syntax seems OK to me and it compiles on the CPU, as well as on other platforms (AMD CPU and GPU).

This is on an Intel Core i3 32xx CPU; the IGP is an Intel HD 2500. I'm using the latest driver build (3345), Windows 7 x64 and the Kernel Builder x64 application.

Concurrent memory accesses between Intel CPU cores and HD graphics cores


I have tried to use both Intel CPU cores and HD graphics cores simultaneously under the Intel OpenCL SDK. The first thing I tried is a simple memory copy kernel to see whether the transfer from global to private memory (and vice versa) occurs simultaneously for both Intel CPU cores and HD graphics cores. Here are parts of my source code.

#define FLOAT float
__kernel void assign(__global FLOAT *x, __global FLOAT *y)
{
    /* Each work-item copies one element from x to y. */
    size_t idx = get_global_id(0);
    y[idx] = x[idx];
}

Breakpoints in cl files #included by another cl file are not hit


My main cl file does not contain any kernels; instead it #includes other cl files. This works just fine, except that the debugger does not hit breakpoints in the #included files. It seems they are only hit if they are in the top-level cl file, i.e. the file that I provided to clCreateProgramWithSource and as the "-s" value in clBuildProgram's build options.

Stepping into a function that resides in a #included file works fine, but in my case the top-level cl file does not have a kernel to begin with. Can you think of a workaround?


Unknown optimization on x64

I have written a benchmarking application for OpenCL. One of the tests measures the compute capacity (GFLOPS) of the device. When run on 32-bit Windows, it gives the expected results on Sandy Bridge:

Platform: Intel(R) OpenCL
  Device:       Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
    Driver version: 1.2 (Win32)

Max size error on Creating Buffer using Alloc_host_ptr

Hello all,

The max mem alloc size of my CPU device (i5-3470) is 4266006528 (less than 4GB) and that of the GPU (HD 2500) is 425721856 (less than 512MB).

Now I am creating a simple buffer: clInput = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(type) * elements, NULL, &err);
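
One thing worth checking in this situation is whether the requested size exceeds `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a device in the context. The plain-C sketch below compares a request against the two limits quoted above. The helper names `fits_all_devices` and `fits_cpu_and_gpu` are hypothetical, and treating the smallest limit in a shared context as the binding one is an assumption, not something the post or the driver confirms.

```c
#include <stddef.h>
#include <stdint.h>

/* Limits quoted in the post (CL_DEVICE_MAX_MEM_ALLOC_SIZE values). */
static const uint64_t CPU_MAX_ALLOC = 4266006528ULL; /* i5-3470 */
static const uint64_t GPU_MAX_ALLOC = 425721856ULL;  /* HD 2500 */

/* Hypothetical helper: returns 1 if `bytes` is non-zero and no larger
 * than every limit in `limits`, otherwise 0. */
int fits_all_devices(uint64_t bytes, const uint64_t *limits, size_t n)
{
    if (bytes == 0)
        return 0; /* clCreateBuffer rejects a zero size */
    for (size_t i = 0; i < n; ++i)
        if (bytes > limits[i])
            return 0;
    return 1;
}

/* Checks a request against both limits from the post. */
int fits_cpu_and_gpu(uint64_t bytes)
{
    const uint64_t limits[2] = { CPU_MAX_ALLOC, GPU_MAX_ALLOC };
    return fits_all_devices(bytes, limits, 2);
}
```

For example, a 512MB request (536870912 bytes) is below the CPU limit but above the GPU limit, so `fits_cpu_and_gpu` returns 0 for it.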

Visual Studio 2010 OpenCL offline compiler always compiles, even when sources have not been changed

The Visual Studio 2010 plugin from the 2013 OpenCL SDK has proved very helpful so far. However, it looks like it always compiles the cl files, even when the source files have not changed. This is a bit annoying and costs a lot of time. Is this expected behavior, or have I not configured my project correctly?

The same behavior can be seen in the SimpleOptimizations API example, and I guess that's why the cl files are manually excluded from the build.
If this is expected behavior, then I guess this would be my first feature request.

