Xeon Phi wrong behavior


I am seeing a new 'strange' behavior of an OpenCL kernel on the Xeon Phi.

In this case, I have a small example written in HPL that executes correctly on the CPU and the GPU, but not on the Xeon Phi.

I've attached the example as a .cpp file. You can download the HPL library to test it, or you can reproduce it with plain OpenCL (if you need the OpenCL code, please ask me). The problem is in the following loop:

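inline Double ipow(Double aa, Int a, Int b)
{
  Double q, qaux;
  Int n, n2;
  Int two_pow = 0;
  q = aa;
  n = a;
  while_(two_pow < 100) {
    n2 = n / 2;
    if_(n2 * 2 == n) {
      // n is even: scale q and halve n
      qaux = 1.5*q;
      q = qaux;
      n = n2;
    }
    else_ {
      // n is odd: final update, then force the loop to exit
      n = n * b;
      two_pow = 200;
    }
//    two_pow++;
  }
  // ... (rest of the function omitted; see the attached .cpp)
}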


If you uncomment the (superfluous) two_pow++ line, the code executes correctly on all three platforms (Intel CPU, NVIDIA GPU, and Xeon Phi), but with the line commented out, the code fails on the Xeon Phi. The global and local work sizes are {1,1,1}.

Why does it fail on the Xeon Phi? Thank you so much.


Moisés Viñas




Hi Moisés,

I can't compile the attached reproducer for Xeon Phi; I get a compilation error for the 'ACCELERATOR' macro. I guess this is because the public version of HPL doesn't support the accelerator device type. Could you please provide a more recent version?

Meanwhile for CPU I got:

The first value fluctuates slightly from run to run. Is the output correct?


Hi Yuri,


Yes, the public version is outdated :-/ But the problem is solved if you replace the following files in the /src subfolder with these ones: CLbinding.cpp, Device.h and Device.cpp


The expected result is a[i] = 1.5^7 x 3 for i != 0; for i = 0 the value is a random number. On the Xeon Phi, however, the value is 1.5 x 3. Why?
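For reference, the expected value can be checked on the host with a plain C++ transcription of the loop. This is only a sketch under stated assumptions: ipow_ref and keep_increment are hypothetical names, double/int stand in for HPL's Double/Int, and the inputs used in the note below (aa = 3.0, a = 128) are chosen only so that the loop performs seven halvings; the actual arguments are in the attached .cpp.

```cpp
// Host-side reference for the ipow loop, with plain C++ types instead of
// HPL's Double/Int (an assumption made for illustration).
// keep_increment mirrors the commented-out "two_pow++" line.
// Note: a must be nonzero when keep_increment is false, because n = 0 stays
// even forever and the loop would then never reach the exit branch.
double ipow_ref(double aa, int a, int b, bool keep_increment) {
    double q = aa;
    int n = a;
    int two_pow = 0;
    while (two_pow < 100) {
        const int n2 = n / 2;
        if (n2 * 2 == n) {
            // n is even: scale q and halve n
            q = 1.5 * q;
            n = n2;
        } else {
            // n is odd: final update, then force the loop to exit
            n = n * b;
            two_pow = 200;
        }
        if (keep_increment) {
            two_pow++;
        }
    }
    return q;
}
```

With these assumed inputs, ipow_ref(3.0, 128, 2, false) and ipow_ref(3.0, 128, 2, true) both return 3 x 1.5^7 = 51.2578125, so the extra increment does not change the result. A single pass through the even branch, for example ipow_ref(3.0, 2, 2, false), gives 4.5, which has the same form as the 1.5 x 3 value reported for the Xeon Phi.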


Thanks in advance,


Moisés Viñas



OK, now it's working for me, and I was able to reproduce the issue using the latest public release (XE R3).
However, it works correctly on our internal development version, so please expect a fix in the next release (no estimate of the date, as usual).
If time permits, I will also try to add this reproducer to our test system to make sure that a regression is not introduced.


Yuri, my Xeon Phi acts strangely with a simple convolution kernel. I believe the results it produces are incorrect. Is there a known problem with OpenCL support for the Xeon Phi?


For kernel enqueue: 

global size set to 1024x1024

local size set to NULL.



__kernel void Convolve(const __global  float * pInput,
                        __constant float * pFilter,
                        __global  float * pOutput,
                        const int nInWidth,
                        const int nFilterWidth)
{
    const int nWidth = get_global_size(0);

    const int xOut = get_global_id(0);
    const int yOut = get_global_id(1);

    const int xInTopLeft = xOut;
    const int yInTopLeft = yOut;

    float sum = 0;
    for (int r = 0; r < nFilterWidth; r++)
    {
        const int idxFtmp = r * nFilterWidth;

        const int yIn = yInTopLeft + r;
        const int idxIntmp = yIn * nInWidth + xInTopLeft;

        for (int c = 0; c < nFilterWidth; c++)
        {
            const int idxF  = idxFtmp  + c;
            const int idxIn = idxIntmp + c;
            sum += pFilter[idxF]*pInput[idxIn];
        }
    }
    const int idxOut = yOut * nWidth + xOut;
    pOutput[idxOut] = sum;
}
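
One way to check whether the device output is really wrong is to compare it against a host-side reference that follows the same indexing as the kernel. The following is a minimal sketch, not part of the original post: convolve_ref and max_abs_diff are hypothetical helper names, outWidth and outHeight stand in for the two global sizes, and the input is assumed to hold at least outHeight + nFilterWidth - 1 rows of nInWidth >= outWidth + nFilterWidth - 1 floats, so that no read goes out of bounds.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference that mirrors the indexing of the Convolve kernel.
// outWidth/outHeight play the role of get_global_size(0)/get_global_size(1).
std::vector<float> convolve_ref(const std::vector<float>& input,
                                const std::vector<float>& filter,
                                int outWidth, int outHeight,
                                int nInWidth, int nFilterWidth) {
    std::vector<float> output(static_cast<std::size_t>(outWidth) * outHeight, 0.0f);
    for (int yOut = 0; yOut < outHeight; ++yOut) {
        for (int xOut = 0; xOut < outWidth; ++xOut) {
            float sum = 0.0f;
            for (int r = 0; r < nFilterWidth; ++r) {
                const int idxFtmp = r * nFilterWidth;
                const int idxIntmp = (yOut + r) * nInWidth + xOut;
                for (int c = 0; c < nFilterWidth; ++c) {
                    sum += filter[idxFtmp + c] * input[idxIntmp + c];
                }
            }
            output[static_cast<std::size_t>(yOut) * outWidth + xOut] = sum;
        }
    }
    return output;
}

// Largest absolute element-wise difference over the common prefix of a and b.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float worst = 0.0f;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

The buffer read back from the device can then be passed to max_abs_diff together with the output of convolve_ref. A small nonzero difference may simply come from different floating-point rounding on the device, so a tolerance is probably more appropriate than an exact comparison.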
