I have encountered what appears to be a code generation bug in the OpenCL runtime compiler for Intel CPUs on Windows.
Please find attached an archive of the source code (ocltest.zip) that reproduces the bug. It is a CMake project, and you can build it with, for example, the following commands:
$ unzip ocltest.zip
$ mkdir ocltest-build
$ cd ocltest-build/
$ cmake -G "Visual Studio 12 2013" -A "x64" ../ocltest/
$ cmake --build . --config RelWithDebInfo
Note that you need CMake, Visual Studio 2013 (or 2015), and an OpenCL SDK (Intel INDE or CUDA).
If I run the resulting executable (oclellipticpde.exe) on a PC with the following configuration
Intel Core i7 3770K @ 3.50 GHz, 8 GB RAM, Windows 10 x64
it terminates normally with the following output:
CL_PLATFORM_NAME: Intel(R) OpenCL
CL_DEVICE_NAME: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
CL_DRIVER_VERSION: 126.96.36.199
done
However, if I run the same executable on another PC with
Dual Intel Xeon X5650 @ 2.67 GHz, 24 GB RAM, Windows 7 x64
it crashes after the following output:
CL_PLATFORM_NAME: Intel(R) OpenCL
CL_DEVICE_NAME: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
CL_DRIVER_VERSION: 188.8.131.52
I have also attached a WinDbg log of the crash. The faulting instruction is at a rather low address that does not belong to any loaded module, so it is most likely in the JIT-compiled kernel code.
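Since the kernel works on other CPUs (and also on GPUs), I believe the kernel itself is correct. I am not sure exactly which configuration triggers the crash, but I suspect the CPU's instruction-set support matters: the Xeon X5650 (Westmere) supports SSE4.2 but not AVX, whereas the Core i7-3770K (Ivy Bridge) also supports AVX, so the runtime compiler probably generates different code for the two CPUs.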
Can anyone reproduce the problem, point out what is wrong with the code, or suggest a workaround?