I'm in the process of porting a (huge) piece of code from SSE to AVX. Looking at the ASM generated by the compiler (Intel C++ Pro 11.1 build #38, IA32 / Windows), I just noticed that _mm256_set1_ps emits this convoluted sequence:
movss xmm0, DWORD PTR [edi+eax*4]
unpcklps xmm0, xmm0
movlhps xmm0, xmm0
vinsertf128 ymm1, ymm0, xmm0, 1
instead of the much simpler:
vbroadcastss ymm0, DWORD PTR [edi+eax*4]
Did I miss something, or is this simply something that should be improved in a forthcoming version of the compiler?
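In the meantime, one possible workaround is to call the _mm256_broadcast_ss intrinsic directly when the scalar comes from memory, since it corresponds to vbroadcastss with a memory operand. The snippet below is only a minimal sketch: the helper name broadcast8 is made up for illustration, the target attribute is a GCC/Clang assumption (with the Intel compiler you would enable AVX through the usual compiler option instead), and I have not checked what build #38 actually emits for it.

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang, enable AVX for this one function so the file
   builds without a global AVX flag. Other compilers need AVX enabled
   through their own command-line option. */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Hypothetical helper: replicate the float at *src into all 8 lanes of a
   256-bit register, then store the lanes to dst (8 floats, unaligned). */
AVX_TARGET
static void broadcast8(const float *src, float *dst)
{
    /* Explicit broadcast from memory instead of _mm256_set1_ps(*src). */
    __m256 v = _mm256_broadcast_ss(src);
    _mm256_storeu_ps(dst, v);
}
```

Called as broadcast8(&data[i], out), this should fill out[0..7] with data[i]. It needs an AVX-capable CPU (and OS support for the ymm state) at run time.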