OpenCL compiler generating mad instruction


I have a compute-intensive kernel that performs lots of multiplies and adds.
I am using the OpenCL mad() function with float16 so that the compiler generates AVX mad instructions.
But when I look at the ASM dump from the Intel offline compiler, it shows mul and add instructions on YMM registers and no mad at all.

    vmulps    YMM4, YMM3, YMMWORD PTR [R15 + R10 + 32864]
    vaddps    YMM2, YMM4, YMM2
    vunpckhps    YMM4, YMM0, YMM0
    vpermilps    YMM4, YMM4, 0
    vperm2f128    YMM4, YMM4, YMM0, 0
    vmulps    YMM5, YMM4, YMMWORD PTR [R15 + R10 + 32928]
    vaddps    YMM2, YMM5, YMM2
    vshufps    YMM5, YMM0, YMM0, 3

I even tried -cl-mad-enable (which was already the default) when building, but nothing changed.
Am I missing something here?

Best Reply


The current AVX implementation doesn't have a mad instruction, which is why the compiler emits separate vmulps and vaddps. Fused multiply-add (FMA) instructions will be introduced alongside AVX2.
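
To make the dump in the question a bit more concrete, here is a rough C sketch of what each vmulps/vaddps pair computes across the eight float lanes of a YMM register. The shuffle instructions in between look like a broadcast of one element of YMM0 to all lanes, which is modelled here as a scalar. The name mul_then_add8 and the LANES constant are made up for illustration and are not taken from the original kernel.

```c
#include <stddef.h>

#define LANES 8  /* one 256-bit YMM register holds eight 32-bit floats */

/* Hypothetical helper: acc[i] += scale * src[i], done in two separate
   steps the way a vmulps followed by a vaddps would do it. */
static void mul_then_add8(float acc[LANES], float scale, const float src[LANES])
{
    float prod[LANES];

    /* Step 1 (vmulps): multiply every lane, storing the product as a float. */
    for (size_t i = 0; i < LANES; ++i)
        prod[i] = scale * src[i];

    /* Step 2 (vaddps): add the stored product to the accumulator. */
    for (size_t i = 0; i < LANES; ++i)
        acc[i] = prod[i] + acc[i];
}
```

With an accumulator of all ones, a scale of 2.0f and src holding 0 through 7, lane 3 ends up as 7.0f. A fused multiply-add instruction would do both steps in a single instruction per register, which is what is missing from plain AVX.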


Thanks for that. I saw FMA intrinsics on some Intel page and didn't notice that they belonged to AVX2.
Hope it's coming soon.
