Bug in SDE emulation of AVX-512 _mm512_permutevar_ps()?

Hello,

I have an issue with SDE emulating _mm512_permutevar_ps() [aka VPERMPS] in an unexpected way. I understand from the documentation that it should behave as the 512-bit variant of _mm256_permutevar8x32_ps() and be able to shuffle across lanes, so the attached file should reverse the contents of the vector. That works with _mm256_permutevar8x32_ps(), but _mm512_permutevar_ps() clearly doesn't produce the expected result; it performs an intra-lane shuffle instead:

iv:      iv =    0    1    2    3    4    5    6    7 
dv:      dv = 7.000 6.000 5.000 4.000 3.000 2.000 1.000 0.000 
pv:      pv = 0.000 1.000 2.000 3.000 4.000 5.000 6.000 7.000 
iv:      iv =    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
dv:      dv = 15.000 14.000 13.000 12.000 11.000 10.000 9.000 8.000 7.000 6.000 5.000 4.000 3.000 2.000 1.000 0.000 
pv:      pv = 12.000 13.000 14.000 15.000 8.000 9.000 10.000 11.000 4.000 5.000 6.000 7.000 0.000 1.000 2.000 3.000 

Is the emulation wrong, or did I misunderstand something?

Cordially,

Attachment: synt.c (2.64 KB)

I found the problem. The documentation at <https://software.intel.com/en-us/node/485351> is wrong: it claims that "_mm512_permutevar_ps" will "Shuffle float32 elements across lanes." It doesn't (unlike _mm512_permutevar_epi32, which does...), according to <https://software.intel.com/sites/landingpage/IntrinsicsGuide/>. The intrinsic that permutes across lanes is "_mm512_permutexvar_ps".
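
For anyone who hits the same thing, here is a small scalar sketch of the two behaviours as I understand them: an in-lane permute uses only the low 2 bits of each index and stays inside the destination's own 128-bit lane, while a cross-lane permute uses the low 4 bits and can pick any of the 16 elements. The function names are made up for illustration, and the inputs are an assumption (data 15..0 with reversal indices 15 - i), since synt.c is not reproduced here. This is plain C, not the intrinsics themselves, so it runs without AVX-512 hardware or SDE.

```c
#include <stddef.h>

#define N    16  /* 32-bit elements in a 512-bit vector */
#define LANE 4   /* 32-bit elements in a 128-bit lane   */

/* Hypothetical scalar model of an in-lane permute: only the low
 * 2 bits of each index are used, and they select an element from
 * the same 128-bit lane as the destination element. */
void permute_in_lane(const float *src, const int *idx, float *dst)
{
    for (size_t i = 0; i < N; i++) {
        size_t lane_base = (i / LANE) * LANE;
        dst[i] = src[lane_base + (size_t)(idx[i] & (LANE - 1))];
    }
}

/* Hypothetical scalar model of a cross-lane permute: the low
 * 4 bits of each index select any of the 16 source elements. */
void permute_cross_lane(const float *src, const int *idx, float *dst)
{
    for (size_t i = 0; i < N; i++)
        dst[i] = src[(size_t)(idx[i] & (N - 1))];
}

/* Assumed test inputs: data 15, 14, ..., 0 and indices 15 - i,
 * i.e. a request to reverse the whole vector. */
void fill_reversal_inputs(float *src, int *idx)
{
    for (int i = 0; i < N; i++) {
        src[i] = (float)(N - 1 - i);
        idx[i] = N - 1 - i;
    }
}
```

With those inputs, permute_in_lane gives 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3, the same pattern as the pv line in my 512-bit output, and permute_cross_lane gives 0 1 2 ... 15, which is the full reversal I expected. Also note the argument order when switching intrinsics: _mm512_permutevar_ps(a, b) takes the data first, while _mm512_permutexvar_ps(idx, a) takes the index first.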
