Developer Guide and Reference

Intrinsics for FP Load and Store Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction
_mm512_load_pd, _mm512_mask_load_pd, _mm512_maskz_load_pd, _mm512_store_pd, _mm512_mask_store_pd | Load/store aligned float64 values from memory. | VMOVAPD
_mm512_load_ps, _mm512_mask_load_ps, _mm512_maskz_load_ps, _mm512_store_ps, _mm512_mask_store_ps | Load/store aligned float32 values from memory. | VMOVAPS
_mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_store_sd | Load/store lower float64 values from memory. | VMOVSD
_mm_mask_load_ss, _mm_maskz_load_ss, _mm_mask_store_ss | Load/store lower float32 values from memory. | VMOVSS
_mm512_loadu_pd, _mm512_mask_loadu_pd, _mm512_maskz_loadu_pd, _mm512_storeu_pd, _mm512_mask_storeu_pd | Load/store unaligned float64 values from memory. | VMOVUPD
_mm512_loadu_ps, _mm512_mask_loadu_ps, _mm512_maskz_loadu_ps, _mm512_storeu_ps, _mm512_mask_storeu_ps | Load/store unaligned float32 values from memory. | VMOVUPS
_mm512_stream_pd | Store float64 values using non-temporal hint. | VMOVNTPD
_mm512_stream_ps | Store float32 values using non-temporal hint. | VMOVNTPS
variable | definition
k | writemask used as a selector
a | first source vector element
src | source element to use based on writemask result
mem_addr | pointer to base address in memory
_mm512_load_pd
extern __m512d __cdecl _mm512_load_pd(void const* mem_addr);
Loads 512 bits (composed of eight packed float64 elements) from mem_addr into destination. mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
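As an illustration of the alignment requirement, the following is a minimal sketch rather than a reference implementation. The helper names copy8_avx512 and copy8_aligned are hypothetical. The sketch assumes a GCC- or Clang-compatible compiler, since it relies on the target function attribute and on __builtin_cpu_supports to fall back to memcpy when the processor does not report AVX-512F.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one aligned 512-bit load followed by one aligned
   512-bit store. Both pointers must be 64-byte aligned. */
__attribute__((target("avx512f")))
static void copy8_avx512(double *dst, const double *src)
{
    __m512d v = _mm512_load_pd(src);  /* requires 64-byte aligned src */
    _mm512_store_pd(dst, v);          /* requires 64-byte aligned dst */
}

/* Hypothetical wrapper: checks the alignment precondition, then takes the
   AVX-512 path only when the CPU reports support for it. */
void copy8_aligned(double *dst, const double *src)
{
    assert((uintptr_t)src % 64 == 0 && (uintptr_t)dst % 64 == 0);
    if (__builtin_cpu_supports("avx512f"))
        copy8_avx512(dst, src);
    else
        memcpy(dst, src, 8 * sizeof(double));  /* scalar fallback */
}
```

Buffers declared with _Alignas(64), or obtained from aligned_alloc(64, size), satisfy the precondition. For data whose alignment is unknown, the _mm512_loadu_pd and _mm512_storeu_pd variants listed in the table above are the safer choice.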
_mm512_mask_load_pd
extern __m512d __cdecl _mm512_mask_load_pd(__m512d src, __mmask8 k, void const* mem_addr);
Loads packed float64 elements from mem_addr into destination using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
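The merging behavior of the writemask can be sketched as follows. The function names merge_load8_avx512 and merge_load8 are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler (target attribute, __builtin_cpu_supports). The scalar loop in the wrapper spells out the same per-element rule and doubles as a fallback on processors without AVX-512F.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: lanes whose bit in k is set come from mem,
   all other lanes keep the value taken from fallback. */
__attribute__((target("avx512f")))
static void merge_load8_avx512(double *out, const double *fallback,
                               unsigned char k, const double *mem)
{
    __m512d src = _mm512_load_pd(fallback);
    __m512d v = _mm512_mask_load_pd(src, (__mmask8)k, mem);
    _mm512_store_pd(out, v);
}

/* Hypothetical wrapper; all three pointers must be 64-byte aligned. */
void merge_load8(double *out, const double *fallback,
                 unsigned char k, const double *mem)
{
    assert((uintptr_t)out % 64 == 0);
    assert((uintptr_t)fallback % 64 == 0);
    assert((uintptr_t)mem % 64 == 0);
    if (__builtin_cpu_supports("avx512f")) {
        merge_load8_avx512(out, fallback, k, mem);
        return;
    }
    /* Scalar equivalent of the writemask rule, one bit per element. */
    for (int i = 0; i < 8; i++)
        out[i] = ((k >> i) & 1) ? mem[i] : fallback[i];
}
```

With k = 0x0F, elements 0 through 3 of the result come from mem and elements 4 through 7 come from fallback.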
_mm512_maskz_load_pd
extern __m512d __cdecl _mm512_maskz_load_pd(__mmask8 k, void const* mem_addr);
Loads packed float64 elements from mem_addr into destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
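One possible use of the zeromask form is loading only the first n elements of a block and zeroing the rest. The sketch below is illustrative only: load_first_n_avx512 and load_first_n are hypothetical names, the mask construction is one of several ways to build k, and a GCC- or Clang-compatible compiler is assumed (target attribute, __builtin_cpu_supports).

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: builds a mask with the low n bits set, so lanes 0..n-1
   are loaded from mem and lanes n..7 are zeroed. */
__attribute__((target("avx512f")))
static void load_first_n_avx512(double *out, const double *mem, unsigned n)
{
    __mmask8 k = (__mmask8)((1u << n) - 1u);
    __m512d v = _mm512_maskz_load_pd(k, mem);
    _mm512_storeu_pd(out, v);  /* out may be unaligned */
}

/* Hypothetical wrapper: writes eight doubles to out. mem must be
   64-byte aligned and n must be in the range 0..8. */
void load_first_n(double *out, const double *mem, unsigned n)
{
    assert(n <= 8);
    assert((uintptr_t)mem % 64 == 0);
    if (__builtin_cpu_supports("avx512f")) {
        load_first_n_avx512(out, mem, n);
        return;
    }
    /* Scalar equivalent of the zeromask rule. */
    for (unsigned i = 0; i < 8; i++)
        out[i] = (i < n) ? mem[i] : 0.0;
}
```

The only difference from the writemask form is what happens to unselected lanes: they become zero instead of being taken from a src vector.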
_mm512_load_ps
extern __m512 __cdecl _mm512_load_ps(void const* mem_addr);
Loads 512 bits (composed of sixteen packed float32 elements) from mem_addr into destination. mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
_mm512_mask_load_ps
extern __m512 __cdecl _mm512_mask_load_ps(__m512 src, __mmask16 k, void const* mem_addr);
Loads packed float32 elements from mem_addr into destination using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
_mm512_maskz_load_ps
extern __m512 __cdecl _mm512_maskz_load_ps(__mmask16 k, void const* mem_addr);
Loads packed float32 elements from mem_addr into destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
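The float32 forms work on sixteen lanes and take a 16-bit mask (__mmask16). As a hedged sketch of how the plain and zeromask loads might be combined, the hypothetical function below copies n floats from a 64-byte-aligned source: full blocks use _mm512_load_ps, and the remaining tail uses _mm512_maskz_load_ps with _mm512_mask_storeu_ps so that nothing past the end of dst is written. The names copy_floats_avx512 and copy_floats are assumptions, as is the use of a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* AVX-512 path: src must be 64-byte aligned; dst may be unaligned. */
__attribute__((target("avx512f")))
static void copy_floats_avx512(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    /* Full blocks of sixteen floats; src + i stays 64-byte aligned. */
    for (; i + 16 <= n; i += 16)
        _mm512_storeu_ps(dst + i, _mm512_load_ps(src + i));

    size_t rem = n - i;  /* 0..15 elements left over */
    if (rem) {
        __mmask16 k = (__mmask16)((1u << rem) - 1u);
        __m512 v = _mm512_maskz_load_ps(k, src + i);
        _mm512_mask_storeu_ps(dst + i, k, v);  /* writes only rem floats */
    }
}

/* Hypothetical wrapper with a memcpy fallback for CPUs without AVX-512F. */
void copy_floats(float *dst, const float *src, size_t n)
{
    assert((uintptr_t)src % 64 == 0);
    if (__builtin_cpu_supports("avx512f"))
        copy_floats_avx512(dst, src, n);
    else
        memcpy(dst, src, n * sizeof(float));
}
```

Handling the tail with a mask avoids a separate scalar cleanup loop; whether that is faster in practice depends on the workload and is not something this sketch measures.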
_mm_mask_load_sd
extern __m128d __cdecl _mm_mask_load_sd(__m128d src, __mmask8 k, const double* mem_addr);
Loads float64 element from mem_addr into lower element of destination using writemask k (the element is copied from src when mask bit 0 is not set), and sets upper destination element to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception will b