Writing to a Shared Resource

According to the OpenCL* specification, commands that change the content of a shared memory object must complete in their own command queue before commands executed in another command queue use that memory object. One way to achieve this is to use events:

cl_event eventGuard;
cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_READ_WRITE…);
//Populating the buffer from the host; the queue is a regular in-order queue
clEnqueueWriteBuffer(cpu_queue, bufferShared,…);
//Setting the arguments and processing the buffer with a kernel
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&bufferShared);
…
clEnqueueNDRangeKernel(cpu_queue, kernel, … &eventGuard);
//Make sure the first device is done
clWaitForEvents(1, &eventGuard);
//Alternatively, you can use clFinish(cpu_queue) if in the same thread
//Release the event before the handle is reused below
clReleaseEvent(eventGuard);
…
//Now using the buffer on the second device
clEnqueueWriteBuffer(gpu_queue, bufferShared,…);
clEnqueueNDRangeKernel(gpu_queue, kernel, … &eventGuard);
…

If you want to write data (or output kernel results) to the same buffer simultaneously on two devices, you need to use properly aligned, non-overlapping sub-buffers.

cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_READ_WRITE …);
//Query the base address alignment of the respective devices
cl_uint gpu_align;
clGetDeviceInfo(gpuDeviceId, CL_DEVICE_MEM_BASE_ADDR_ALIGN,…&gpu_align);
gpu_align /= 8; //the value is reported in bits; convert it to bytes
//Make sure that cpuPortion is properly aligned first!
cl_buffer_region cpuBufferRegion = { 0, cpuPortion};
cl_buffer_region gpuBufferRegion = { cpuPortion, theRest};
cl_mem subbufferCPU = clCreateSubBuffer(bufferShared, 0,
	CL_BUFFER_CREATE_TYPE_REGION, &cpuBufferRegion, &err);
cl_mem subbufferGPU = clCreateSubBuffer(bufferShared, 0,
	CL_BUFFER_CREATE_TYPE_REGION, &gpuBufferRegion, &err);
//Now work with 2 sub-buffers on 2 devices simultaneously (refer to the previous section)
…
//the sub-resources should be released properly
clReleaseMemObject(subbufferCPU);
clReleaseMemObject(subbufferGPU);
clReleaseMemObject(bufferShared);
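
The requirement that cpuPortion be properly aligned comes down to a little arithmetic. The following is a minimal sketch rather than SDK code: region_t is a stand-in for cl_buffer_region so that the example compiles without the OpenCL* headers, and align_down and split_buffer are hypothetical helper names. It assumes the alignment is passed in bits, as reported by CL_DEVICE_MEM_BASE_ADDR_ALIGN, and that rounding the CPU share down (rather than up) is acceptable.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cl_buffer_region (origin and size in bytes); an assumption
   made so that the sketch builds without the OpenCL headers. */
typedef struct {
    size_t origin;
    size_t size;
} region_t;

/* Round n down to the nearest multiple of align_bytes. */
size_t align_down(size_t n, size_t align_bytes)
{
    assert(align_bytes > 0);
    return n - (n % align_bytes);
}

/* Split a buffer of total_bytes into a first region starting at offset 0
   and a second region covering the rest. align_bits is the alignment, in
   bits, of the device that receives the second region. Returns 0 on
   success and -1 if the input is invalid or a region would be empty. */
int split_buffer(size_t total_bytes, size_t desired_cpu_bytes,
                 unsigned align_bits, region_t *cpu, region_t *gpu)
{
    size_t align_bytes = align_bits / 8; /* bits to bytes */
    size_t cpu_bytes;

    if (align_bytes == 0 || desired_cpu_bytes > total_bytes)
        return -1;

    /* The second region starts where the first one ends, so that offset
       must be a multiple of the alignment. */
    cpu_bytes = align_down(desired_cpu_bytes, align_bytes);
    if (cpu_bytes == 0 || cpu_bytes == total_bytes)
        return -1;

    cpu->origin = 0;
    cpu->size = cpu_bytes;
    gpu->origin = cpu_bytes;
    gpu->size = total_bytes - cpu_bytes;
    return 0;
}
```

With a 1000-byte buffer, a desired CPU share of 300 bytes, and an alignment of 1024 bits (128 bytes), the sketch yields a first region of 256 bytes and a second region of 744 bytes starting at offset 256. The resulting origins and sizes could then be copied into cpuBufferRegion and gpuBufferRegion.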

See Also


Partitioning the Work (suggested next topic)
Tone Mapping Multi-Device SDK sample available for download at VCSource portal [Online Article]

Related Documents

The OpenCL* 1.1 Specification at http://www.khronos.org portal [PDF]
Overview Presentations of the OpenCL* Standard at http://www.khronos.org portal [Online Article]

Copyright © 2010-2012, Intel Corporation. All rights reserved.