mad() for float always returns 0

sschuberth


After manually converting most multiply-add expressions in my kernel to mad() calls for testing, the Intel OpenCL implementation started returning wrong results, although the implementations from other vendors work fine. Try this code in the offline compiler tool that comes with the SDK (64-bit):

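__kernel void a(int2 in_res,float2 pos)
{
    // Fractional part of pos (t) and its complement (s).
    int2 coord=convert_int2(pos);
    float2 t=pos-convert_float2(coord),s=1.0f-t;

    // Swap the two lines below to reproduce the problem.
    //volatile float value=mad(t.x,t.y,s.x);
    volatile float value=t.x*t.y+s.x;
}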

The above will create correct code, something like

_a: # @a
# BB#0:
sub RSP, 36
movq XMM0, QWORD PTR [RSP + 84]
cvttps2dq XMM1, XMM0
cvtdq2ps XMM1, XMM1
subps XMM0, XMM1
movss XMM1, DWORD PTR [RIP + LCPI3_0]
subss XMM1, XMM0
pshufd XMM2, XMM0, 1
mulss XMM2, XMM0
addss XMM2, XMM1
movss DWORD PTR [RSP + 32], XMM2
add RSP, 36

for the non-vectorized version. With the commented-out mad() call used instead, I get this (again for the non-vectorized version):

_a: # @a
# BB#0:
sub RSP, 36
mov DWORD PTR [RSP + 32], 0
add RSP, 36

So a constant value of 0 is written to "value", which seems quite wrong to me ;-)

The auto-vectorized versions suffer from the same bug, by the way.
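
For anyone who wants to check the expected result independently of any OpenCL compiler, here is a minimal host-side sketch in plain C that mirrors the arithmetic of the kernel above for a single position. The names expected_value and within_tolerance are made up for illustration and are not part of any SDK. The sketch assumes that convert_int2() truncates toward zero (which matches the cvttps2dq in the listing above), and it uses the plain multiply-add as the reference, since mad() is only required to approximate a * b + c.

```c
#include <assert.h>

/* Hypothetical helper: mirrors the kernel's arithmetic for one position.
 * Assumption: convert_int2() truncates toward zero, like a C cast. */
float expected_value(float pos_x, float pos_y)
{
    int coord_x = (int)pos_x;
    int coord_y = (int)pos_y;

    /* t is the fractional part of pos, s.x its complement in x. */
    float t_x = pos_x - (float)coord_x;
    float t_y = pos_y - (float)coord_y;
    float s_x = 1.0f - t_x;

    /* The plain multiply-add that mad(t.x, t.y, s.x) should approximate. */
    return t_x * t_y + s_x;
}

/* Hypothetical helper: returns 1 if got is within tol of expected, else 0. */
int within_tolerance(float got, float expected, float tol)
{
    assert(tol >= 0.0f);
    float diff = got - expected;
    if (diff < 0.0f) {
        diff = -diff;
    }
    return diff <= tol;
}
```

For pos = (2.25, 3.5) this gives t = (0.25, 0.5) and s.x = 0.75, so the reference value is 0.125 + 0.75 = 0.875. A constant 0 written to "value" would fail within_tolerance() against that reference for any sensible tolerance.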

PS: Interestingly, passing "-cl-mad-enable" to the compiler does not change anything; it does not make the issue appear for the first version.

sschuberth

Hm, no reply yet? Can any Intel official reproduce the issue?

Boaz Ouriel (Intel)


Thanks for the detailed report. We will investigate the issue and fix it.

