Hi, I have an OpenCL simulation program that consists of a loop launching 4 kernels per iteration. The execution can last hours.
I've launched this same application on an Nvidia Fermi, an ATI Radeon HD, an Intel X5650 CPU, an Intel E5 CPU... Now I'm launching this application on a Xeon Phi.
I found that the reduction algorithm from the NVidia SDK works on HD Graphics 4400 but doesn't work on an Intel i5 CPU.
I expected the Nvidia algorithm to work everywhere OR to work only on Nvidia hardware, so this difference in behavior between the CPU and the GPU on the SAME machine looks strange to me.
The reduction algorithm and a C# + OpenCL.NET unit test are in the attachments. The unit test fails on the Intel CPU with size = 4.
What differences in kernel execution exist between CPU and GPU? How can I fix the problem?
A question from a newbie.
I am trying to use a reduction algorithm from the NVidia SDK. It works correctly on an Nvidia discrete GPU and on Intel HD Graphics 4400, but doesn't work on an Intel CPU (Haswell i5).
Reduction method source:
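(The attached kernel source is not reproduced here.) For comparison, a barrier-correct variant of the classic tree reduction is sketched below; the kernel name, argument names, and int type are my assumptions, not the attached code. The key point is that barrier(CLK_LOCAL_MEM_FENCE) appears on every stride, including the last ones that the SDK sample unrolls without barriers:

```c
__kernel void reduce_sum(__global const int *in,
                         __global int *out,
                         __local int *scratch)
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    scratch[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);

    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        /* Required on every iteration; do not drop it for the "last warp"
           as the SDK sample does -- CPU devices are not warp-synchronous. */
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
```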
The standard API for 3D graphics on Android is OpenGL ES, which is the most widely used 3D graphics API on all mobile devices today. Android uses OpenGL ES to accelerate both 2D and 3D graphics. In early releases of Android, OpenGL ES acceleration was somewhat optional, but as Android has evolved and screen sizes have grown, accelerated OpenGL ES has become an essential part of the Android graphics system.
Join us for this second part of the OpenCL and Intel Graphics Webinar Series.
In the November 6 webinar we are covering how to write efficient code for Intel Graphics. In this webinar you will learn how to use the Intel® SDK for OpenCL Applications tools to apply the lessons learned in the previous webinar and to create, analyze, and build your OpenCL applications for Intel Graphics faster.
This is Part 2 of a three-part Webinar Series. See the November 6 and December 4 entries for more information about the other webinars in the series.