AVX: _mm_alignr_epi8 equivalent for YMM registers ?

AVX: _mm_alignr_epi8 equivalent for YMM registers ?

imagem de emmanuel.attia

Hello, I am looking for an equivalent to the most useful intrinsic_mm_alignr_epi8 with AVX registers (I guess its equivalent to PALIGNR or VPALIGNR for the one who are not familiar with C intrinsics). More precisely i would need the equivalent of an hypothetical _mm256_alignr_ps (i need float granularity, not byte one). Since there is no "slri_si256" or "slli_si256", I have though of a solution with _mm256_permute2_ps but this intrinsic does not seem to be available on my compiler (and maybe neither on my Core i7 2600K). I am using Intel XE 12 Update 4 for Windows. Right now I have used extractf128/insertf128 combined with two alignr_epi8 but the performances are as expected very bad (i.e my AVX code is slower than the SSE one) because of the mixing of XMM and YMM instructions. Best regards Emmanuel

3 posts / 0 new
Último post
Para obter mais informações sobre otimizações de compiladores, consulte Aviso sobre otimizações.
imagem de Brijender Bharti

Hi,
one question, are you writing this code in assembly or in intrinsics? if you are writing in intrinsics, compiler should not mix up XMM and YMM in this particular case and should generate right form of the instruction.
However, i think you can use _mm256_permute_ps (not permute2_ps) intrinsic. But again, you may have to use use permute2f128 or extractf128/insertf128 combination to bring part of the upper lane to lower lane. If you can post the code, it may be easy to suggest you the best way to do so. As you know there is no 256bit alignr. it is only 128bit.

imagem de bronxzv

IIRC vpermil2ps (_mm256_permute2_ps) was removed from the AVX specs at the same time than FMA4 was replaced by FMA3

vpermil2ps is now part of AMD's XOP (to be featured inupcoming Bulldozer products along FMA4), AFAIK no Intel CPU has been announced with support for it, though

it looks like you want to shift values between the 2 128-bit lanes and I'm afraid there isn't any fast option available in standard AVX, yet

Faça login para deixar um comentário.