Hi, I'm doing some simple OpenCL tests and I noticed that my OpenCL code executes much faster (edit: slower) with the AMD APP SDK 2.4. I'm running 64-bit Linux with an Intel Core i5 750 @ 2.67GHz and an NVIDIA GeForce GTX 550 Ti. When I run the following OpenCL program (vector size and global work size are 4096, local work size is 256):
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel __attribute__((vec_type_hint(double)))
void add(__global __read_only double *a,
         __global __read_only double *b,
         __global __write_only double *c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

I get these execution times on average:
INTEL: 1.45s
AMD: 0.45s
NVIDIA: 1.1s (probably due to overhead)
native: 2.35s (single CPU)
What could be the reason?
The times I measured were actually bogus. Correct numbers can be found in post #18. I was also using clWaitForEvents instead of clFinish. According to the Tips and Tricks for Kernel Development, it is better to use clFinish with the Intel OpenCL SDK.
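For anyone who wants to reproduce the "native" number, here is a rough sketch of how a single-CPU baseline could be timed in plain C. This is not the code I actually ran: the helper names (add_native, now_seconds, time_add_native), the input values, and the choice of clock_gettime with CLOCK_MONOTONIC are all assumptions made for illustration. An optimizing compiler may also collapse the repeated runs, so treat the result as a sanity check rather than a benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime on POSIX systems */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Plain C equivalent of the add kernel: c[i] = a[i] + b[i]. */
static void add_native(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Wall-clock time in seconds, read from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: runs the addition `repeats` times on vectors of
 * length n and returns the average seconds per run, or -1.0 if memory
 * allocation fails. The sum of the result vector is written to *checksum
 * so the computation cannot be discarded as unused. */
static double time_add_native(size_t n, int repeats, double *checksum)
{
    assert(n > 0 && repeats > 0);

    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }

    /* Assumed test data: a[i] = i, b[i] = 2i, so c[i] should be 3i. */
    for (size_t i = 0; i < n; ++i) {
        a[i] = (double)i;
        b[i] = 2.0 * (double)i;
    }

    double start = now_seconds();
    for (int r = 0; r < repeats; ++r)
        add_native(a, b, c, n);
    double elapsed = now_seconds() - start;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += c[i];
    *checksum = sum;

    free(a); free(b); free(c);
    return elapsed / repeats;
}
```

Calling time_add_native(4096, 1000, &sum) from main and printing the return value gives an average time per run for the same vector size as above. The important part is the same as on the OpenCL side: read the clock only after the work has really finished, which is what clFinish is meant to guarantee before the end timestamp is taken.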
Thanks



