• 2019 Update 4
  • 03/20/2019

Writing to a Shared Resource

According to the OpenCL™ specification, commands that change the content of a shared memory object must complete in their command queue before commands executed in another command queue use that memory object. One way to achieve this is to use events:
    cl_event eventGuard;
    cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_READ_WRITE…);
    //Populating the buffer from the host, queue is regular in-order
    clEnqueueWriteBuffer(cpu_queue, bufferShared,…);
    //Setting the arguments and processing the buffer with a kernel
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&bufferShared);
    …
    clEnqueueNDRangeKernel(cpu_queue, kernel, … &eventGuard);
    //make sure the first device is done
    clWaitForEvents(1, &eventGuard);
    //alternatively you can use clFinish(cpu_queue) if in the same thread
    …
    //Now using the buffer on the second device
    clEnqueueWriteBuffer(gpu_queue, bufferShared,…);
    clEnqueueNDRangeKernel(gpu_queue, kernel, … &eventGuard);
    …
If you want to write data (or output kernel results) to the same buffer simultaneously on two devices, use properly aligned, non-overlapping sub-buffers:
    cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_WRITE_ONLY …);
    //query the base address alignment for the respective devices
    cl_uint gpu_align;
    clGetDeviceInfo(gpuDeviceId, CL_DEVICE_MEM_BASE_ADDR_ALIGN,…&gpu_align);
    gpu_align /= 8; //the value is reported in bits, convert to bytes
    //make sure that cpuPortion is properly aligned first!
    cl_buffer_region cpuBufferRegion = { 0, cpuPortion};
    cl_buffer_region gpuBufferRegion = { cpuPortion, theRest};
    cl_mem subbufferCPU = clCreateSubBuffer(bufferShared, 0, CL_BUFFER_CREATE_TYPE_REGION, &cpuBufferRegion, &err);
    cl_mem subbufferGPU = clCreateSubBuffer(bufferShared, 0, CL_BUFFER_CREATE_TYPE_REGION, &gpuBufferRegion, &err);
    //now work with 2 sub-buffers on 2 devices simultaneously
    //(refer to the previous section)
    ..
    //the sub-resources should be released properly
    clReleaseMemObject(subbufferCPU);
    clReleaseMemObject(subbufferGPU);
    clReleaseMemObject(bufferShared);
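The listing above leaves the alignment of cpuPortion to the reader. One possible way to compute an aligned split is sketched below in plain C, with no OpenCL calls. The names split_regions, round_down, and split_buffer are hypothetical helpers introduced only for this illustration and are not part of the OpenCL API. The sketch assumes that align_bits is the value reported for CL_DEVICE_MEM_BASE_ADDR_ALIGN (in bits) and that the CPU part starts at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: byte ranges for the two sub-buffers. */
typedef struct {
    size_t cpu_origin, cpu_size;   /* region intended for the CPU device */
    size_t gpu_origin, gpu_size;   /* region intended for the GPU device */
} split_regions;

/* Round value down to a multiple of align_bytes (align_bytes must be > 0). */
static size_t round_down(size_t value, size_t align_bytes)
{
    assert(align_bytes > 0);
    return value - (value % align_bytes);
}

/*
 * Split total_bytes into two non-overlapping regions: a CPU region that
 * starts at offset 0 and a GPU region that follows it directly.
 * wanted_cpu_bytes is the requested CPU share; it is rounded down so that
 * the GPU region starts on an aligned offset.
 * align_bits is assumed to come from CL_DEVICE_MEM_BASE_ADDR_ALIGN (bits).
 * Returns 0 on success, -1 if no aligned split leaves both regions non-empty.
 */
static int split_buffer(size_t total_bytes, size_t wanted_cpu_bytes,
                        unsigned align_bits, split_regions *out)
{
    size_t align_bytes = align_bits / 8;   /* bits to bytes */
    size_t cpu_bytes;

    if (align_bytes == 0 || wanted_cpu_bytes >= total_bytes)
        return -1;

    cpu_bytes = round_down(wanted_cpu_bytes, align_bytes);
    if (cpu_bytes == 0)
        return -1;   /* the CPU region would be empty after rounding */

    out->cpu_origin = 0;
    out->cpu_size   = cpu_bytes;
    out->gpu_origin = cpu_bytes;               /* multiple of align_bytes */
    out->gpu_size   = total_bytes - cpu_bytes; /* the rest of the buffer */
    return 0;
}
```

For example, splitting a 10000-byte buffer with a requested CPU share of 5000 bytes and a 1024-bit (128-byte) alignment gives a 4992-byte CPU region and a 5008-byte GPU region that starts at offset 4992. The resulting values could then serve as cpuPortion and theRest in the two cl_buffer_region initializers. If the devices report different alignments, passing the larger value is a reasonable choice as long as both values are powers of two.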
See Also
The OpenCL™ 1.2 Specification at http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
