I know that SIMD intrinsics in C/C++ are very limited and that some qualifiers (const, volatile, etc.) are dropped, but I find it really disappointing that this affects the quality of the generated assembly. For example, let's say you have the following simple code:
int a = (5 + 2)/2;
In this case, the compiler evaluates the constant expression at compile time and generates just a single move:
movl $3, %eax
However, if you provide this SIMD code:
__m512i a = _mm512_div_epi32(_mm512_add_epi32(_mm512_set1_epi32(5), _mm512_set1_epi32(2)), _mm512_set1_epi32(2));
The compiler is not able to simplify the code and it generates:
vmovaps .L_2il0floatpacket.3(%rip), %zmm1
vpaddd .L_2il0floatpacket.2(%rip), %zmm1, %zmm0
call __svml_idiv16
which is really inefficient. Note that it is not even able to detect that you are dividing by 2, which could be replaced by a shift.
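To make the point about the shift concrete, here is a minimal sketch of the kind of sequence I would expect for a signed division by 2. The names div2_shift and div2_shift_epi32 are hypothetical helpers of mine, not library functions, and the code assumes that >> on a negative int32_t is an arithmetic shift (ISO C leaves this implementation-defined). The vector version is only compiled when AVX-512F is enabled.

```c
#include <stdint.h>

/* Truncating signed division by 2 without a divide instruction.
   Assumption: >> on a negative int32_t is an arithmetic shift. */
static int32_t div2_shift(int32_t x)
{
    /* bias is 1 for negative x and 0 otherwise, so the result
       rounds toward zero just like x / 2 does. */
    int32_t bias = (int32_t)((uint32_t)x >> 31);
    return (x + bias) >> 1;
}

#ifdef __AVX512F__
#include <immintrin.h>

/* Same idea applied to 16 packed 32-bit lanes (hypothetical helper). */
static __m512i div2_shift_epi32(__m512i x)
{
    __m512i bias = _mm512_srli_epi32(x, 31);                /* logical shift: sign bit -> 0 or 1 */
    return _mm512_srai_epi32(_mm512_add_epi32(x, bias), 1); /* arithmetic shift right by 1 */
}
#endif
```

With this, div2_shift(5 + 2) gives the same value as (5 + 2)/2 in the scalar example, and div2_shift_epi32 would take the place of the _mm512_div_epi32 call when the divisor is known to be 2.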
Of course, I'm compiling with -O3, so I would like to know whether it is possible to make the compiler optimize this kind of thing when intrinsics are used, since I'm not able to write better-optimized code myself.