What's the best practice for memcpy using Intel OpenCL for CPU?


I am trying to develop OpenCL code on Intel's CPU, and I have a question about memcpy in OpenCL.
Does OpenCL on the CPU have an efficient way to copy a sub-section of data from a large array into a new buffer?
For example, given an array that stores 1000x1000 image data, I want to copy a 19x19 section of the image into a new array and do some computation on that section. I could not find an efficient way to do that. Copying the data one element at a time is extremely inefficient, and because of alignment problems I cannot use vector types to do the copy. Does anyone know a good practice for memcpy in OpenCL?
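A minimal sketch of the row-wise approach in plain C, assuming the image is a row-major array of `float` (the question does not say which element type is used). Each row of the 19x19 window is contiguous in the source, so the window can be moved with 19 row copies instead of 361 single-element copies. `extract_patch` is a hypothetical helper name, not an OpenCL API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a patch_w x patch_h window whose top-left corner is (x0, y0) out of a
 * row-major image (img_w x img_h, assumed float) into a densely packed dst.
 * Each patch row is contiguous in the source, so it is copied with a single
 * memcpy instead of element by element. */
static void extract_patch(const float *img, size_t img_w, size_t img_h,
                          size_t x0, size_t y0,
                          size_t patch_w, size_t patch_h,
                          float *dst)
{
    /* The window must lie entirely inside the image. */
    assert(x0 + patch_w <= img_w && y0 + patch_h <= img_h);
    for (size_t row = 0; row < patch_h; ++row) {
        const float *src_row = img + (y0 + row) * img_w + x0;
        memcpy(dst + row * patch_w, src_row, patch_w * sizeof(float));
    }
}
```

For example, `extract_patch(img, 1000, 1000, x0, y0, 19, 19, patch)` fills a 361-element `patch` array with the window starting at column `x0`, row `y0`.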




For host-side copying, use clEnqueueReadBufferRect, clEnqueueWriteBufferRect or clEnqueueCopyBufferRect, which copy a rectangular 2D or 3D region out of a larger buffer.
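The rect functions take the origin and region as three-element arrays in which element 0 is measured in bytes, element 1 in rows and element 2 in slices, together with row and slice pitches in bytes. Because this address arithmetic is easy to get wrong, here is a host-only C emulation of that addressing for a 19x19 `float` patch. It does not call OpenCL, and `rect_params`, `rect_copy_emulated` and `float_patch_params` are hypothetical names. In real host code the same `src_origin`, `dst_origin`, `region` and pitch values would be passed to `clEnqueueCopyBufferRect` together with the command queue and the two `cl_mem` buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parameters in the units the rect copy uses:
 * origin[0] and region[0] in bytes, [1] in rows, [2] in slices. */
typedef struct {
    size_t src_origin[3], dst_origin[3], region[3];
    size_t src_row_pitch, src_slice_pitch;
    size_t dst_row_pitch, dst_slice_pitch;
} rect_params;

/* Host-only emulation: byte offset = origin[2]*slice_pitch
 * + origin[1]*row_pitch + origin[0], then region[0] bytes per row. */
static void rect_copy_emulated(const unsigned char *src, unsigned char *dst,
                               const rect_params *p)
{
    for (size_t z = 0; z < p->region[2]; ++z)
        for (size_t y = 0; y < p->region[1]; ++y) {
            size_t s = (p->src_origin[2] + z) * p->src_slice_pitch
                     + (p->src_origin[1] + y) * p->src_row_pitch
                     + p->src_origin[0];
            size_t d = (p->dst_origin[2] + z) * p->dst_slice_pitch
                     + (p->dst_origin[1] + y) * p->dst_row_pitch
                     + p->dst_origin[0];
            memcpy(dst + d, src + s, p->region[0]);
        }
}

/* Parameters for a patch_w x patch_h float patch at (x0, y0) of an image
 * img_w floats wide, packed densely into the destination. With
 * region[2] == 1 the slice pitches never affect the addressing above,
 * so they are left at 0. */
static rect_params float_patch_params(size_t img_w, size_t x0, size_t y0,
                                      size_t patch_w, size_t patch_h)
{
    rect_params p = {0};
    p.src_origin[0] = x0 * sizeof(float);  /* bytes */
    p.src_origin[1] = y0;                  /* rows */
    p.region[0] = patch_w * sizeof(float); /* bytes per row */
    p.region[1] = patch_h;                 /* rows */
    p.region[2] = 1;                       /* a single 2D slice */
    p.src_row_pitch = img_w * sizeof(float);
    p.dst_row_pitch = patch_w * sizeof(float);
    assert(p.src_row_pitch >= p.region[0]);
    return p;
}
```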

Thanks, but I'm afraid my algorithm cannot do that. Let me explain how my algorithm works.
There are multiple kernels, and each kernel depends on the previous kernel's output.

data --> kernel1 --> output1/input for kernel2 --> kernel2 --> output2/input for kernel3 --> kernel3 --> finished

To minimize latency (the whole algorithm is part of a real-time app), I do not call clWaitForEvents until the last kernel has been enqueued. The memcpy happens in kernel3, and the copy positions come from the output of kernel2. I need to copy thousands of small blocks of data into a new array so that I can make use of the cache. But the memcpy turned out to be the problem: the performance is really bad. Can anyone suggest a good way to do the memcpy?
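Inside kernel3, one common way to structure this (a sketch under assumptions, not a verified fix for this application) is to run one work-item per patch and have each work-item copy its patch row by row into its own densely packed slot, so later passes read contiguous memory. The C code below emulates that on the host: `patch_pos` is a hypothetical layout for the positions produced by kernel2, and the loop in `gather_all` stands in for the NDRange. OpenCL C has no `memcpy`, so in a real kernel the row copy would be a loop or `vloadn`/`vstoren` calls; per the OpenCL C specification those only require the address to be aligned to the scalar element type, which may sidestep the alignment problem mentioned in the question. `async_work_group_copy` is another built-in worth trying when a patch is staged in `__local` memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATCH 19 /* patch edge length taken from the question */

/* Hypothetical layout for kernel2's output: one top-left corner per patch. */
typedef struct { int x, y; } patch_pos;

/* Body of one work-item: copy patch `gid` into its own densely packed slot.
 * In an OpenCL kernel, `gid` would come from get_global_id(0) and the
 * memcpy would become a loop or vload/vstore, since OpenCL C has no memcpy. */
static void gather_one(const float *img, int img_w, int img_h,
                       const patch_pos *pos, size_t gid, float *packed)
{
    patch_pos p = pos[gid];
    assert(p.x >= 0 && p.y >= 0 && p.x + PATCH <= img_w && p.y + PATCH <= img_h);
    float *slot = packed + gid * PATCH * PATCH;
    for (int row = 0; row < PATCH; ++row)
        memcpy(slot + row * PATCH,
               img + (size_t)(p.y + row) * (size_t)img_w + (size_t)p.x,
               PATCH * sizeof(float));
}

/* Host-side stand-in for enqueueing one work-item per patch. */
static void gather_all(const float *img, int img_w, int img_h,
                       const patch_pos *pos, size_t n, float *packed)
{
    for (size_t gid = 0; gid < n; ++gid)
        gather_one(img, img_w, img_h, pos, gid, packed);
}
```

Packing each patch into a fixed-size slot keeps the addressing trivial (`gid * PATCH * PATCH`) and means the computation that follows touches one contiguous 361-element block per patch.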

