Is there a way to get reproducible float results from kernels across all devices?
I'm running the same OpenCL kernel on several devices (different CPUs and GPUs), and the floating-point results differ between them.
On a system with an i7-3770 CPU, its integrated Intel HD 4000 GPU, and an AMD Capeverde GPU, every combination of OpenCL platform (AMD or Intel) and device produces bit-for-bit identical results.
On another system with an i3-4010U CPU (with integrated HD 4400 GPU), the Intel OpenCL platform on the GPU produces the same results as the first system, but the results from the CPU device differ.
In every case the kernel is compiled with the build option "-cl-fp32-correctly-rounded-divide-sqrt".
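
For reference, "bit-by-bit identical" above means that the raw 32-bit patterns of the output floats match, not just that the values compare equal. A minimal sketch of such a host-side check is below. The helper names `float_bits` and `first_bit_mismatch` are made up for illustration, and the code assumes `float` is a 32-bit IEEE 754 type on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a float's storage into a 32-bit integer.
   memcpy avoids the aliasing problems of a pointer cast.
   Assumption: sizeof(float) == 4 on the host. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: return the index of the first element whose
   bit pattern differs between the two result buffers, or n if the
   buffers are bit-for-bit identical. */
static size_t first_bit_mismatch(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (float_bits(a[i]) != float_bits(b[i]))
            return i;
    }
    return n;
}
```

Comparing bit patterns is stricter than `==`: it distinguishes `0.0f` from `-0.0f` and treats two NaNs with the same payload as equal, which is what matters when checking reproducibility across devices.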