The compiler does not optimize constant operations written with SIMD intrinsics at all

Hi,

I know that SIMD intrinsics in C/C++ are very limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly code. For example, let's say you have the following simple code:

int a = (5 + 2)/2;

In this case, the compiler evaluates the constant expression at compile time and generates just a single move:

movl $3, %eax

However, if you provide this SIMD code:

__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));

The compiler is not able to simplify the code and it generates:

vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16

which is really inefficient. Note that it does not even detect that the divisor is 2, a case that should be optimized into a shift.
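
To make the shift remark concrete, here is a minimal scalar sketch in C of dividing a signed 32-bit value by 2 using shifts only. The function name div2_shift is made up for illustration, and the code assumes that >> on a negative int32_t is an arithmetic shift, which C leaves implementation-defined. A plain shift on its own rounds toward negative infinity, so the sign bit is added first to match the truncation done by /. On 512-bit vectors the same three steps would presumably map to _mm512_srli_epi32, _mm512_add_epi32 and _mm512_srai_epi32, but that is untested here.

```c
#include <stdint.h>

/* Hypothetical helper: signed division by 2 using shifts only.
 * Assumes >> on a negative int32_t is an arithmetic shift
 * (implementation-defined in C, true on mainstream compilers). */
int32_t div2_shift(int32_t x)
{
    /* 1 if x is negative, 0 otherwise (logical shift of the sign bit). */
    int32_t bias = (int32_t)((uint32_t)x >> 31);

    /* Adding the bias makes the arithmetic shift truncate toward zero,
     * which is what x / 2 does in C. */
    return (x + bias) >> 1;
}
```

For the constants in the example above, div2_shift(5 + 2) gives 3, the same as (5 + 2)/2.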

Of course, I'm compiling with -O3, so I would like to know whether it is possible to make the compiler optimize this kind of thing in intrinsics code, since I'm not able to provide better-optimized code by hand.

Kind regards.

Barcelona Supercomputing Center

iliyapolak:

It seems that the compiler decided to call the SVML integer division function, which adds the latency of the call instruction to the latency of the division itself.
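
If the operands really are compile-time constants, one possible workaround (a sketch, not a fix for the compiler) is to do the arithmetic in scalar code, where constant folding works, and broadcast only the result. The names FOLDED and folded_vector are made up for illustration, and the __AVX512F__ guard is an assumption about the target; a different target may need a different macro and header.

```c
#include <stdint.h>

/* Integer constant expression: an enumerator value must be computed
 * by the compiler at translation time, so no division is left at run time. */
enum { FOLDED = (5 + 2) / 2 };

#if defined(__AVX512F__)
#include <immintrin.h>

/* Hypothetical helper: broadcast the pre-computed scalar to all 16 lanes,
 * instead of adding and dividing vectors of constants. */
__m512i folded_vector(void)
{
    return _mm512_set1_epi32(FOLDED);
}
#endif
```

Written this way, only the broadcast of the folded value should remain, so the vector add and the call to __svml_idiv16 should no longer be needed.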

jimdempseyatthecove:

Did you get the instruction sequence from an .ASM listing file produced by the compiler?
Or did you get it from a debugger disassembly window or the VTune disassembly window?

If it came from an .ASM file, I suggest you make a release build (without generating an .ASM file) and look at the resulting code in the debugger (or VTune). You may see additional optimizations.

Jim Dempsey

www.quickthreadprogramming.com