_mm512_extload_epi32 alignment

Dear Experts,

According to the documentation of _mm512_extload_epi32, the alignment requirement is as follows:

NOTE

This intrinsic requires the memory address mt to be aligned to the data size granularity dictated by the bc and conv parameters. If a conversion is done from a 8-bit type (uint8, sint8) then the required alignment is 1, 4, or 16 bytes depending on the broadcast (1x16, 4x16, none). For a conversion from 16-bit types the alignment must be 2, 8, or 32 bytes depending on the broadcast. If no conversion is used, the alignment must be 4, 16, or 64 bytes.

 

But _mm512_extload_epi32 uses the VMOVDQA32 instruction, and my understanding was that aligned move instructions always need 64-byte alignment. With that in mind, is the above documentation correct? If so, kindly explain.

 

thanks

kvs

 


I am checking with Development. Please stand by.

I hope this helps. Here is the guidance I received from Development.

> "…aligned move instruction always needed 64 byte alignment…"

This is not always true. On KNC, memory operands must be aligned to the number of bytes of memory actually accessed:

From the Xeon Phi™ Instruction Set manual available here, Chapter 2, "Instructions Terminology and State" (near the bottom of page 29):
Each source memory operand must have an address that is aligned to the number of bytes of memory actually accessed by the operand (that is, before conversion or broadcast is performed); otherwise, a #GP fault will result.
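As a rough illustration of that rule, the figures in the documentation note quoted in the question can be reproduced by multiplying the size of one source element by the number of elements actually read. The helper below is a hypothetical sketch in plain C (C99 or later); its name and parameters are my own and are not part of any Intel header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Intel API.
 * src_elem_bytes: size of one element in memory before up-conversion
 *                 (1 for sint8/uint8, 2 for 16-bit types, 4 with no conversion).
 * elems_read:     number of elements actually read from memory
 *                 (1 for a 1x16 broadcast, 4 for 4x16, 16 with no broadcast).
 * Returns the number of bytes accessed, which is the required alignment. */
size_t extload_required_alignment(size_t src_elem_bytes, size_t elems_read)
{
    return src_elem_bytes * elems_read;
}

/* Checks the helper against the values quoted from the documentation note. */
void check_documented_alignments(void)
{
    /* 8-bit source types: 1, 4, or 16 bytes */
    assert(extload_required_alignment(1, 1) == 1);
    assert(extload_required_alignment(1, 4) == 4);
    assert(extload_required_alignment(1, 16) == 16);

    /* 16-bit source types: 2, 8, or 32 bytes */
    assert(extload_required_alignment(2, 1) == 2);
    assert(extload_required_alignment(2, 4) == 8);
    assert(extload_required_alignment(2, 16) == 32);

    /* No conversion (32-bit elements): 4, 16, or 64 bytes */
    assert(extload_required_alignment(4, 1) == 4);
    assert(extload_required_alignment(4, 4) == 16);
    assert(extload_required_alignment(4, 16) == 64);
}
```

Calling check_documented_alignments() from your own main should pass silently, which suggests the documented table is simply "bytes actually accessed" written out case by case.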

> "…But _mm512_extload_epi32 uses VMOVDQA32 instruction…"

This is not always true. The instruction generated for the intrinsic _mm512_extload_epi32 depends on its broadcast parameter: when that parameter is set to something other than the default (_MM_BROADCAST32_NONE), a different instruction may be generated.

Here are two examples:

1. Intrinsic call:

_mm512_extload_epi32(
      addr,
      _MM_UPCONV_EPI32_SINT8,
      _MM_BROADCAST32_NONE, 0
);

Instruction generated:  vmovdqa32 (%rdi){sint8}, %zmm2
Required alignment:    16 bytes, because it accesses 16 bytes of memory due to the sint8 up-conversion (it reads 16 sint8 elements and converts them to int32 in the resulting vector)

2. Intrinsic call:

_mm512_extload_epi32(
       addr,
       _MM_UPCONV_EPI32_SINT8,
       _MM_BROADCAST_1X16, 0
);

Instruction generated:   vpbroadcastd (%rdi){sint8}, %zmm2
Required alignment:    1 byte because it accesses 1 byte of memory due to sint8 up-conversion and 1to16 broadcast.
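If you want to verify an address before issuing the load, a simple check along these lines may help. This is only a sketch in portable C11 (it uses _Alignas for the test buffer); is_aligned and check_example_addresses are hypothetical names of my own, not Intel APIs, and the buffer merely stands in for addr in the examples above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is a multiple of `alignment` bytes.
 * `alignment` must be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Illustrates which addresses would satisfy each requirement discussed above. */
void check_example_addresses(void)
{
    /* 64-byte aligned buffer standing in for the memory behind `addr`. */
    static _Alignas(64) unsigned char buf[128];

    /* No conversion, no broadcast: 64 bytes accessed. */
    assert(is_aligned(buf, 64));

    /* sint8 up-conversion, no broadcast: 16 bytes accessed, so an address
     * that is 16-byte aligned but not 64-byte aligned is acceptable. */
    assert(is_aligned(buf + 16, 16));
    assert(!is_aligned(buf + 16, 64));

    /* sint8 up-conversion with 1x16 broadcast: 1 byte accessed, so any
     * address is acceptable, including one that is not 16-byte aligned. */
    assert(is_aligned(buf + 1, 1));
    assert(!is_aligned(buf + 1, 16));
}
```

In debug builds you could place a call such as assert(is_aligned(addr, 16)) just before the extload in the first example, on the assumption that a failed assertion is easier to diagnose than a #GP fault.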

Hi Kevin,

Thank you for the detailed clarification, it was certainly helpful.

regards

kvs

You're welcome. Glad it helped.
