mad() for float always returns 0

After I manually converted most multiply-add expressions in my kernel to mad() calls for testing, the Intel OpenCL implementation started to return wrong results, although the implementations by other vendors work fine. Try this code in the offline compiler tool that comes with the SDK (64-bit):

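__kernel void a(int2 in_res,float2 pos)
{
    // split pos into integer coordinate and fractional part
    int2 coord=convert_int2(pos);
    float2 t=pos-convert_float2(coord),s=1.0f-t;

    // volatile keeps the otherwise unused result from being optimized away
    //volatile float value=mad(t.x,t.y,s.x);
    volatile float value=t.x*t.y+s.x;
}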

The above will create correct code, something like

_a: # @a
# BB#0:
sub RSP, 36
movq XMM0, QWORD PTR [RSP + 84]
cvttps2dq XMM1, XMM0
cvtdq2ps XMM1, XMM1
subps XMM0, XMM1
movss XMM1, DWORD PTR [RIP + LCPI3_0]
subss XMM1, XMM0
pshufd XMM2, XMM0, 1
mulss XMM2, XMM0
addss XMM2, XMM1
movss DWORD PTR [RSP + 32], XMM2
add RSP, 36

for the non-vectorized version. If I use the commented-out mad() call instead, I get this (again for the non-vectorized version):

_a: # @a
# BB#0:
sub RSP, 36
mov DWORD PTR [RSP + 32], 0
add RSP, 36

So a constant value of 0 is written to "value", which seems quite wrong to me ;-)

The auto-vectorized versions suffer from the same bug, by the way.
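For anyone trying to reproduce this, here is a minimal host-side sketch in plain C of the value the kernel should store. It is not part of the original report: the names reference_value and device_value_plausible and the tolerance parameter are hypothetical. It assumes convert_int2 truncates toward zero like a C cast, which matches the cvttps2dq instruction in the listing above, and that mad() only has to approximate a*b+c, hence a tolerance instead of an exact comparison.

```c
#include <assert.h>

/* Hypothetical reference: mirrors the kernel's arithmetic for one input.
   Assumes pos_x and pos_y fit in an int, so the casts are well defined. */
static float reference_value(float pos_x, float pos_y)
{
    int cx = (int)pos_x;          /* convert_int2: truncate toward zero */
    int cy = (int)pos_y;
    float tx = pos_x - (float)cx; /* t = pos - convert_float2(coord) */
    float ty = pos_y - (float)cy;
    float sx = 1.0f - tx;         /* s = 1.0f - t */
    return tx * ty + sx;          /* what mad(t.x, t.y, s.x) approximates */
}

/* Returns 1 if a value read back from the device is within `tolerance`
   of the host reference, 0 otherwise. */
static int device_value_plausible(float device_value, float pos_x,
                                  float pos_y, float tolerance)
{
    assert(tolerance >= 0.0f);
    float diff = device_value - reference_value(pos_x, pos_y);
    if (diff < 0.0f)
        diff = -diff;
    return diff <= tolerance;
}
```

With pos = (2.5, 3.5), t is (0.5, 0.5) and s.x is 0.5, so the expected result is 0.75. The constant 0 that the mad() build stores would fail this check.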

PS: Interestingly, passing "-cl-mad-enable" to the compiler does not change anything. In particular, it does not make the issue appear for the first version.

Hm, no reply yet? Can any Intel official reproduce the issue?


Thanks for the detailed report. We will investigate the issue and fix it.

