64 bit element "duplication" inside zmm register for complex multiplication


Alastair M.

Dear all,

I am looking for the best known method to implement the following operation, which is part of a complex-number multiplication. The input is a zmm register containing four double-precision complex numbers arranged as {c1.re,c1.im,c2.re,c2.im,c3.re,c3.im,c4.re,c4.im}.

I want to split this into two registers: one holding each of the four real parts duplicated, and one holding each of the four imaginary parts duplicated, i.e.

{c1.re,c1.im,c2.re,c2.im,c3.re,c3.im,c4.re,c4.im} -> {c1.re,c1.re,c2.re,c2.re,c3.re,c3.re,c4.re,c4.re} and {c1.im,c1.im,c2.im,c2.im,c3.im,c3.im,c4.im,c4.im}
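For reference, here is a plain-C sketch (no intrinsics) of what the two target registers should contain and how they are typically consumed in an interleaved complex multiply. Arrays of eight doubles stand in for the zmm register, and the helper names split_dup and cmul_interleaved are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ZMM_DOUBLES 8 /* one 512-bit register holds eight doubles */

/* Hypothetical helper: from {re0,im0,re1,im1,...} build
   re_dup = {re0,re0,re1,re1,...} and im_dup = {im0,im0,im1,im1,...}. */
static void split_dup(const double *in, double *re_dup, double *im_dup, size_t n)
{
    assert(n % 2 == 0); /* interleaved (re, im) pairs */
    for (size_t i = 0; i < n; i += 2) {
        re_dup[i] = re_dup[i + 1] = in[i];
        im_dup[i] = im_dup[i + 1] = in[i + 1];
    }
}

/* Hypothetical helper: out[k] = a[k] * b[k] for interleaved complex numbers,
   using the duplicated parts of a against b and against b with re/im swapped. */
static void cmul_interleaved(const double *a, const double *b, double *out, size_t n)
{
    double ar[ZMM_DOUBLES], ai[ZMM_DOUBLES];
    assert(n <= ZMM_DOUBLES && n % 2 == 0);
    split_dup(a, ar, ai, n);
    for (size_t i = 0; i < n; i += 2) {
        /* real: a.re*b.re - a.im*b.im */
        out[i]     = ar[i]     * b[i]     - ai[i]     * b[i + 1];
        /* imag: a.re*b.im + a.im*b.re */
        out[i + 1] = ar[i + 1] * b[i + 1] + ai[i + 1] * b[i];
    }
}
```

In the vector version, each loop body corresponds to one multiply of the duplicated real register by b and one multiply of the duplicated imaginary register by b with its re/im halves swapped, with the sign flipped on the even positions.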

At present I am using the following pattern:

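// 0x44 selects dwords {0,1,0,1} in every 128-bit lane: duplicates each real part
one_re = (__m512d)_mm512_shuffle_epi32((__m512i)one,0x44);
// 0xEE selects dwords {2,3,2,3} in every 128-bit lane: duplicates each imaginary part
one_im = (__m512d)_mm512_shuffle_epi32((__m512i)one,0xEE);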

It feels like I might be missing something. Is this the most efficient way to do it?
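To see why the immediates 0x44 and 0xEE produce the desired result, the scalar model below applies an 8-bit immediate the way _mm512_shuffle_epi32 does: each 2-bit field selects a 32-bit element from the same 128-bit lane. It is an illustration under that assumption, not Intel's implementation, and the function name is hypothetical. Because a double spans two consecutive dwords, the pattern {0,1,0,1} copies the lower double of each lane (the real part) and {2,3,2,3} copies the upper double (the imaginary part).

```c
#include <stdint.h>

/* Hypothetical model of the per-lane dword selection done by
   _mm512_shuffle_epi32: 16 dwords = 4 lanes of 4 dwords each. */
static void shuffle_epi32_model(const uint32_t src[16], uint32_t dst[16], unsigned imm)
{
    for (int lane = 0; lane < 4; ++lane) {
        for (int j = 0; j < 4; ++j) {
            /* bits [2j+1:2j] of imm pick the source dword within this lane */
            dst[lane * 4 + j] = src[lane * 4 + ((imm >> (2 * j)) & 3u)];
        }
    }
}
```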

Best regards,

Alastair


Alastair M. (marked as best reply)

I found an answer: a masked swizzle, which seems to be very slightly faster than the shuffle version.

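// 170 = 0xAA: mask bits set on the odd (imaginary) positions
__mmask8 real_mask = (__mmask8)_mm512_int2mask(170);
// 85 = 0x55: mask bits set on the even (real) positions
__mmask8 imag_mask = (__mmask8)_mm512_int2mask(85);

// test input: 0,2,4,6 play the real parts, 1,3,5,7 the imaginary parts
__m512d input = {0,1,2,3,4,5,6,7};

// CDAB swaps neighbouring doubles; only masked positions take the swapped value
__m512d real_parts = _mm512_mask_swizzle_pd(input,real_mask,input,_MM_SWIZ_REG_CDAB); // {0,0,2,2,4,4,6,6}
__m512d imag_parts = _mm512_mask_swizzle_pd(input,imag_mask,input,_MM_SWIZ_REG_CDAB); // {1,1,3,3,5,5,7,7}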
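A scalar model of what the two masked swizzles compute may help when checking the masks. It assumes the documented behaviour of _MM_SWIZ_REG_CDAB on doubles (swap elements 0 and 1, 2 and 3, and so on) and of a merge mask (positions with a set bit take the swizzled value, the others keep the first argument); the function name is hypothetical. With the test input {0,...,7}, mask 170 (0xAA) yields {0,0,2,2,4,4,6,6} and mask 85 (0x55) yields {1,1,3,3,5,5,7,7}.

```c
#include <stdint.h>

/* Hypothetical model of _mm512_mask_swizzle_pd(src, k, v, _MM_SWIZ_REG_CDAB). */
static void mask_swizzle_cdab_model(const double src[8], uint8_t k,
                                    const double v[8], double dst[8])
{
    for (int i = 0; i < 8; ++i) {
        double swz = v[i ^ 1];                 /* CDAB: partner within the pair */
        dst[i] = ((k >> i) & 1u) ? swz : src[i]; /* merge-masking */
    }
}
```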


Alastair


Taylor Kidd (Intel)

Alastair,

Thank you for letting the community know.

Regards
--
Taylor
