clEnqueueReadBuffer crashes if using printf and global work size > 1

clEnqueueReadBuffer crashes if using printf and global work size > 1

System: Intel Core 2 Duo SL 9400, Windows 7 64 bits, Intel OpenCL SDK 1.1 Beta

I have a problem I'm unable to debug. My program crashes when calling a kernel with a printf in it.
The program does not crash when I remove the printf statement or when I use global work size = 1 (but in any case it does not print anything).

I could not find any documentation about printf, except that works like in C99 and that requires to enable the extension. Are there any other requirements?

12 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Hello goldmund99,

Could you please provide us with a minimal code sample that reproduces your problem?

It does not have to be your full kernel, if that's impossible for you due to security reasons.

Thanks in advance

The kernel is:

#pragma OPENCL EXTENSION cl_intel_printf : enable
__kernel void sensitivity(__global int * dest)
int tid = get_global_id(0);
dest[tid] = tid*2;

Maybe I did some error during testing, now it crashes also with global_work_size = 1 and local_work_size = NULL.
If I remove the printf statement, it works as espected.

More details:
The reported error is "Stack overflow"

The code I use to run the kernel is as follows:

cl_int errorcode = CL_SUCCESS;
n = 1;
std::vector dst_host(n);

cl_context context = clCreateContextFromType(context_properties, CL_DEVICE_TYPE_CPU, NULL, NULL, &errorcode);
cl_command_queue cmd_queue = clCreateCommandQueue(context, device, properties, &errorcode);
cl_mem dest_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n*sizeof(cl_int), &dst_host[0], &errorcode);
cl_program program = clCreateProgramWithSource(context, 1, (const char**)kernel_source, lengths, &errorcode);
errorcode = clBuildProgram(program, NULL, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "sensitivity", &errorcode);
errorcode = clSetKernelArg(kernel,0,sizeof(cl_mem),(const void *)dest_mem);
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
errorcode = clEnqueueReadBuffer(cmd_queue, dest_mem, CL_TRUE, 0, n*sizeof(cl_int), &dst_host[0],NULL,NULL,NULL); // <- CRASH

Also very strange: when I compile the kernel with the Intel OpenCL SDK Offline Compiler it gives me this warning:

Build succeeded!
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
Build started
Kernel was successfully vectorized


We will check the warning issue, but it's not related to the failure.

Please check the following line:

errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);

According to the spec. the global range parameter should be passed as array and not as constant.
Please use this sequence:

size_t global = 1;
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global, NULL, NULL, NULL);

Actually I'm using a pointer as the specification says, I just changed in the post the sentence to made the sentence clearer...sorry for the mistake.
I reported the API code just for reference but it is the same as in provided examples, I checked that the same code works with different kernels and also for the minimal kerner I wrote, if I comment the printf statement.

We will try your code and update you
Thanks, Shiri

Hello goldmund99,

To reproduce your problem quickly, it would help a lot if you could send us the complete source code file that crashes for you. We need the host C++ code that invokes the OpenCL SDK, as well as the kernel (if that's in a separate file). Getting the complete code, as you yourself run it, is very important.

Is this possible?



Thanks. We'll try to reproduce the problem and will get back to you once we have some conclusions.

I have found a workaround for now.
The problem is not clEnqueueReadBuffer, because the crash happens also by calling clFinish after clEnqueueNDRangeKernel.

I was calling clEnqueueNDRangeKernel using NULL as argument for local_work_size. According to the 1.1 specification: "local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."

If i explicit the local work size, for global_work_size = 10 and work_dim I call clEnqueueNDRangeKernel and get:

  • *local_work_size = 0 -> Error: CL_INVALID_WORK_GROUP_SIZE (this is different than giving NULL pointer as argument), conformant with OpenCL 1.1
  • *local_work_size = 1-2 -> Returns CL_SUCCESS but crashes when calling clFinish
  • *local_work_size = 3-4 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 5 -> Correct results, every 4-5 tries crashes
  • *local_work_size = 6-9 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 10 -> Correct results, seems to never crash

We were able to reproduce the failure. It apears to be bug in the SDK compiler.
We are looking into that.
Thanks, Shiri

Leave a Comment

Please sign in to add a comment. Not a member? Join today