just a brief introduction: I'm the head of the linear algebra library ViennaCL (see http://viennacl.sourceforge.net/ ). We are very happy to see that Intel is going for OpenCL as well, since this allows us to run our OpenCL kernels on Intel hardware efficiently.
We played a bit with the current version of the SDK and took some timings. To our surprise, the Intel backend turned out to be better than the "alpha"-status suggests.
Comparisons were carried out with the AMD APP SDK 2.3 on an up-to-date Funtooa Linux, a fairly recent NVIDIA driver (260.19.21) on Windows 7, and the Intel SDK on Windows 7, all on the same machine. We couldn't get all three implementations to run under Windows 7, which is why the timings for the APP SDK on the CPU are from Linux. The timings therefore have to be taken with a grain of salt, but a general tendency can be seen.
Test 1: Kernel compilation times. The scalar, vector, and matrix kernels are compiled and the timings (in seconds) are taken:
AMD: 0.08 / 0.18 / 0.15
NVIDIA: 0.34 / 0.53 / 0.42
INTEL: 0.03 / 0.09 / 0.07
So, kernel compilation seems to be pretty fast in your implementation.
Test 2: OpenCL transfer overhead. A vector with 100,000 entries is set up. Then each entry is read individually, so 100,000 OpenCL read requests have to be handled:
AMD: 21 us / entry
NVIDIA: 91 us / entry (including PCI-Express overhead w.r.t. GPU)
INTEL: 7.1 us / entry
Even though an overhead of 7.1 us per entry is still orders of magnitude larger than, say, reading an entry of an STL vector, we are happy to see improvement on the OpenCL backend side :-)
One more important issue: Since ViennaCL aims at high-performance scientific computing, we wish to emphasize the scientific community's demand for a Linux version of the SDK.
If you require more feedback or benchmarks, feel free to contact us.