I've tested the new 1.5 SDK with some selected OpenCL samples from AMD APP SDK 2.5. These samples allow easy selection of the Platform/Device, so they are very useful for quick-n-dirty tests.
The results are discouraging:
BitonicSort (with -x 16777216 option): 10.85 sec for Intel OpenCL SDK 1.1 and 30.61 sec for version 1.5
EigenValue (-x 20480): 10.75/23.5 sec (1.1/1.5 versions)
Nbody (-x 102400): 6.87/20.0
RadixSort (-x 102400000): 9.08/12.75
SimpleConvolution (-x 8192 -y 8192 -m 16): 9.83/9.99
Good result (the only one of the six samples tested):
Histogram (-x 20480 -y 32768 -i 3): 0.92/0.83
The Histogram sample is the only one that is 'not vectorized' (the Offline Compiler prints a "Kernel ... not vectorized" message for it).
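To make the slowdown easier to see, the timings listed above can be turned into 1.5/1.1 ratios. This is only a minimal Python sketch: the numbers are copied from the results above, while the names `timings`, `slowdown` and `ratios` are made up for illustration.

```python
# Timings in seconds, copied from the results listed above.
# Each entry is: sample -> (Intel OpenCL SDK 1.1, Intel OpenCL SDK 1.5)
timings = {
    "BitonicSort":       (10.85, 30.61),
    "EigenValue":        (10.75, 23.5),
    "Nbody":             (6.87, 20.0),
    "RadixSort":         (9.08, 12.75),
    "SimpleConvolution": (9.83, 9.99),
    "Histogram":         (0.92, 0.83),
}


def slowdown(t_v11, t_v15):
    """Return how many times slower 1.5 is than 1.1 (below 1.0 means faster)."""
    return t_v15 / t_v11


# Hypothetical helper table: sample name -> slowdown factor.
ratios = {name: slowdown(*pair) for name, pair in timings.items()}

# Print the samples from worst to best slowdown.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {ratio:5.2f}x")
```

A ratio above 1.0 means version 1.5 is slower than 1.1 for that sample; only Histogram comes out below 1.0.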
The samples were run with the platform set to Intel (-p 2 on my machine) and with timing (-t on). The times are for kernel + data transfer.
The machine is an i7-2600K (so, AVX) @ 4.5GHz, Windows 7/x64. Three OpenCL SDKs are installed (Nvidia, AMD and Intel).