Intel C++: _mm256_set1_ps suboptimal?

I'm in the process of porting a (huge) piece of code from SSE to AVX. Looking at the ASM generated by the compiler (Intel C++ Pro 11.1 build #38, IA32 / Windows), I just noticed that _mm256_set1_ps produces this convoluted sequence:

movss xmm0, DWORD PTR [edi+eax*4]
unpcklps xmm0, xmm0
movlhps xmm0, xmm0
vinsertf128 ymm1, ymm0, xmm0, 1

instead of the much simpler:

vbroadcastss ymm0, DWORD PTR [edi+eax*4]

Did I miss something, or is this simply something that should be improved in a forthcoming version of the compiler?
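For reference, here is a minimal sketch of the two ways the broadcast can be written at the intrinsic level. _mm256_set1_ps takes a float by value and leaves the instruction selection to the compiler, while _mm256_broadcast_ss takes a pointer and is the intrinsic documented as corresponding to vbroadcastss with a memory operand. The helper names splat_set1 and splat_broadcast and the AVX_FN macro are made up for this illustration, and the target attribute is an assumption that only applies to GCC/Clang. I have not verified which instructions compiler 11.1 emits for the second form.

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang, enable AVX per function so the file
   builds without a global AVX flag. Other compilers get an empty macro. */
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Broadcast through the generic "set1" intrinsic: the compiler is
   free to choose whatever instruction sequence it likes. */
AVX_FN static void splat_set1(const float *src, float out[8])
{
    __m256 v = _mm256_set1_ps(*src);
    _mm256_storeu_ps(out, v);   /* unaligned store of all 8 lanes */
}

/* Broadcast through the load-and-broadcast intrinsic, which takes
   the address of the scalar rather than its value. */
AVX_FN static void splat_broadcast(const float *src, float out[8])
{
    __m256 v = _mm256_broadcast_ss(src);
    _mm256_storeu_ps(out, v);   /* unaligned store of all 8 lanes */
}
```

Both helpers should fill out[0..7] with the same value; any difference would only show up in the generated ASM, so comparing the two listings is one way to check whether the compiler treats the forms differently.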
