Kernel returns wrong results

Hello,

I created a kernel for summing up some small matrices. The operation is the same for a large set of such matrices. When compiling the kernel, the compiler generates a kernel object, but it reports that the kernel was not vectorized. When I execute the kernel, the results are just wrong. Running the same code with the AMD OpenCL SDK gives correct results.

The kernel looks like this:

    __kernel void calcAxA(
        const int n,
        const int n0,
        const int m,
        const int nm,
        const __global int* nmMask,
        const __global double* nmJ,
        const __global double* nmE,
        __global double* AxA,
        __global double* AxE)
    {
        int j = get_global_id(0);
        int j0 = j - n0;
        if (j0 < 0) return;

        double axeT[6];
        double axaT[6*6];
        for (int i = 0; i < 6 * 6; ++i) axaT[i] = 0.0;
        for (int i = 0; i < 6; ++i) axeT[i] = 0.0;

        // Sum up in local variables
        for (int i = 0; i < m; ++i) {
            int ij = nmMask[i * n + j];
            if (ij == -1) continue;
            int r0 = ij * nParams;
            int r1 = (nm + ij) * nParams;
            for (int r = 0; r < 6; ++r) {
                for (int c = 0; c < 6; ++c) {
                    axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];
                }
                axeT[r] += nmJ[r0 + r] * e[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];
            }
        }

        // Assign sums to global arrays
        for (int i = 0; i < 6; ++i) {
            for (int k = 0; k < 6; ++k) {
                AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
            }
            AxE[6 * j + i] = axeT[i];
        }
    }

Another topic: when compiling the .cl code, the Intel OpenCL SDK returns the message

    :1:26: warning: expected identifier in '#pragma OPENCL' - ignored

for the line

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

I can't find what causes this warning, although judging by other posts it seems to be pretty common.

Any ideas?

Thanks,
Rasmus
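When a kernel gives different results on two OpenCL implementations, one way to narrow things down is to compare the device output against a plain host-side reference. The following is only a sketch in C of what such a reference could look like for the kernel above, not the poster's code. It rests on three assumptions that the post does not confirm: nParams is taken to be 6 (it is not defined in the posted snippet), the read from e[...] is taken to mean nmE[...], and the kernel is assumed to be launched with a global size of n. The function names calc_axa_ref and count_mismatches are made up for this example.

```c
#include <stddef.h>

/* ASSUMPTION: nParams is undefined in the posted kernel. 6 is assumed here
   because the per-item accumulators are 6 and 6x6. */
#define N_PARAMS 6

/* Host-side version of one work-item j of the posted calcAxA kernel. */
static void calc_axa_ref_item(int j, int n, int n0, int m, int nm,
                              const int *nmMask, const double *nmJ,
                              const double *nmE, double *AxA, double *AxE)
{
    int j0 = j - n0;
    if (j0 < 0) return;

    double axeT[6] = {0.0};
    double axaT[6 * 6] = {0.0};

    /* Sum up in local variables, skipping masked entries (-1). */
    for (int i = 0; i < m; ++i) {
        int ij = nmMask[i * n + j];
        if (ij == -1) continue;
        int r0 = ij * N_PARAMS;
        int r1 = (nm + ij) * N_PARAMS;
        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c) {
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r]
                                 + nmJ[r1 + c] * nmJ[r1 + r];
            }
            /* ASSUMPTION: the post reads e[2 * ij + 0]; nmE is assumed. */
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0]
                     + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }

    /* Same output indexing as the posted kernel. */
    for (int i = 0; i < 6; ++i) {
        for (int k = 0; k < 6; ++k) {
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        }
        AxE[6 * j + i] = axeT[i];
    }
}

/* Runs the reference for every work-item, assuming a global size of n. */
static void calc_axa_ref(int n, int n0, int m, int nm,
                         const int *nmMask, const double *nmJ,
                         const double *nmE, double *AxA, double *AxE)
{
    for (int j = 0; j < n; ++j) {
        calc_axa_ref_item(j, n, n0, m, nm, nmMask, nmJ, nmE, AxA, AxE);
    }
}

/* Counts elements whose absolute difference exceeds tol. */
static size_t count_mismatches(const double *ref, const double *got,
                               size_t len, double tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; ++i) {
        double d = ref[i] - got[i];
        if (d < 0.0) d = -d;
        if (d > tol) ++bad;
    }
    return bad;
}
```

Running calc_axa_ref on the same host arrays that are passed to the device, and then calling count_mismatches on the AxA and AxE buffers read back from the device, would show whether the mismatch is limited to certain indices or affects the whole output. If every element differs, the data transfer is a more likely suspect than the arithmetic.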

I am sure you did this, but just to confirm: you do have "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" at the top of your ".cl" file, right? This is required to enable double-precision support, per the extension spec:

"OpenCL 1.0 adds support for double precision floating-point as an optional extension. An application that wants to use double will need to include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before any double precision data type is declared in the kernel code."

I am guessing you did this, but the compiler seems to be saying it didn't vectorize your code because you are using double-precision support without enabling it.

Yes, the "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" line is at the top of my .cl file. The .cl file contains some more kernels using double data. The compiler vectorizes the other kernels, and executing them gives the expected results. But the kernel shown above is not vectorized and returns wrong results.

Rasmus

OK - found it finally. It was the improper usage of CL_MEM_USE_HOST_PTR. If used correctly, everything works as expected.

Rasmus
