Xeon PHI and enqueueCopyBuffer()


Hi!

I'm having synchronization problems with enqueueCopyBuffer() on Xeon Phi. Consider the following pseudocode:

/* Begin host code */
   cl::Buffer b0, b1;       ///initialize with size N
   cl::CommandQueue c0, c1; ///two independent command queues
   cl::Kernel k;            ///the add1 kernel shown below
   char hB[N];              ///initialize hB[i] to 5

   k.setArg(0, b0);                      ///p
   k.setArg(1, static_cast<cl_int>(N));  ///n

   c0.enqueueWriteBuffer(b0, CL_TRUE, 0, N, static_cast<void *>(hB));

   c0.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1), cl::NDRange(1)); //a single work-item

   c1.enqueueCopyBuffer(b0, b1, 0, 0, N);

   c1.enqueueReadBuffer(b1, CL_TRUE, 0, N, static_cast<void *>(hB));

/* End host code */

/* The kernel code would be: */
__kernel void add1(__global char *p, int n) {
   for(int i = get_global_id(0); i < n; i += 1) 
       p[i]++; 
} 

The question is: after the enqueueReadBuffer() call, what values are stored in hB? The answer is "it depends". With two Nvidia GPUs the values are 6, and with two Xeon Phis they are 5. The problem is that enqueueCopyBuffer() on the Xeon Phi appears to be non-blocking, and it needs a synchronization barrier between the enqueueNDRangeKernel() call and the enqueueCopyBuffer() call. Is this normal behavior, or is it an implementation error?

Thanks a lot in advance and good luck :)

Moisés


Your bug is not in CopyBuffer; it is the absence of synchronization between the two command queues.

In your case, either the WriteBuffer or the NDRange kernel may still be executing on the device when the CopyBuffer starts, because command queues c0 and c1 are completely independent.
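A minimal sketch of the fix, under stated assumptions: with the OpenCL C++ bindings, you can pass a cl::Event as the last argument of c0.enqueueNDRangeKernel() and then pass a std::vector<cl::Event> containing it as the wait list (the events parameter) of c1.enqueueCopyBuffer(). Alternatively, call c0.finish() before enqueuing the copy. Running that requires an OpenCL device, so the block below is only a hypothetical host-only model of the same pattern, not OpenCL code. Each std::thread stands in for a command queue, a std::shared_future stands in for the kernel's cl::Event, and runModel() is an illustrative name.

```cpp
#include <array>
#include <cstddef>
#include <future>
#include <thread>

// Hypothetical host-only model (NOT OpenCL): each std::thread plays the role
// of an independent command queue, and a std::shared_future plays the role
// of the cl::Event returned by the kernel launch on c0.
constexpr std::size_t N = 8;

std::array<char, N> runModel() {
    std::array<char, N> b0{};  // stands in for device buffer b0
    std::array<char, N> b1{};  // stands in for device buffer b1
    std::array<char, N> hB{};  // host buffer
    hB.fill(5);

    // The write is modelled as already complete before either queue runs.
    b0 = hB;

    std::promise<void> kernelDoneP;
    std::shared_future<void> kernelDone = kernelDoneP.get_future().share();

    // "Queue c0": apply add1 (increment every element), then signal the event.
    std::thread c0([&] {
        for (std::size_t i = 0; i < N; ++i) {
            b0[i]++;
        }
        kernelDoneP.set_value();
    });

    // "Queue c1": the copy waits on the kernel's event first, which models
    // passing an event wait list to enqueueCopyBuffer().
    std::thread c1([&] {
        kernelDone.wait();
        b1 = b0;
    });

    c0.join();
    c1.join();

    // Models the blocking read of b1 back into hB.
    hB = b1;
    return hB;
}
```

With the wait in place, every element of the array returned by runModel() is 6, matching the Nvidia result. Removing the kernelDone.wait() line reintroduces the race; in this C++ model that is also a data race, which is undefined behavior.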
