Interop advice - switching from GPU to CPU ...

djwarder

Hi there
I'm writing some reasonably complex physics code that I'm hoping to release for use on OpenCL-enabled GPUs (both NVIDIA and ATI), but I was wondering what the best way would be to allow this code to run on PCs without GPU capability, i.e. on an Intel multi-core CPU?

Would it be best to write code for each device, or could I have a system that 'falls back' to the CPU if the GPU is not powerful enough or doesn't have enough memory? Also, is it possible to use all the devices on a machine, e.g. use the CPU for one piece of code and the GPU for another?

Thanks in advance
Dan

Brijender Bharti (Intel)

Hi Dan,
It looks like you have multiple questions:
1. Multi-core CPU without GPU capabilities: You can definitely use the CPU OpenCL runtime to run this code. Make sure to give the OpenCL compiler some hints for vectorization (SIMD). Second, buffer objects work best on the CPU; don't use image objects. Many optimization techniques are described in the Optimization Guide for Intel OpenCL, and you may want to take a look. You may also want to look at threading.
2. Intel HD GPU + CPU: It is quite possible to use both and divide the work between the two. If the two are running independent tasks, there is no issue. But if there is a pipeline of work between the Intel HD 4000 and the CPU, make sure you are not doing multiple copies between the two devices. Let one device finish its full part of the task and then let the other device do the rest, e.g. CPU -> GPU. Don't try CPU -> GPU -> CPU -> GPU, where the tasks in the pipeline hop between the two devices; the data copies may hurt performance. Also, use CL_MEM_USE_HOST_PTR to create the buffers, but make sure the allocated buffer is memory-aligned. Finally, if you have a pipeline that uses DX10 or DX9 surfaces, Intel OpenCL lets you share a surface between OpenCL and non-OpenCL tasks without copying. There is an Intel vendor-specific extension for DX9.
3. Intel HD 4000: If your code is running well on other GPUs, I would definitely give it a try on the HD 4000; it can work pretty well. One suggestion: Intel HD has good, fast shared local memory. Make sure to use it for all small tables and coefficients; it will improve performance a lot.

martinn

Regarding 2: Is it really the case that data must be copied back and forth in the CPU -> GPU -> CPU -> GPU case? I had hoped that the CPU and the built-in GPU could share memory areas and thus allow zero-copy operation in this case.
