Developer Guide and Reference

Intrinsics for FP Expand and Load Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file. To use these intrinsics, include the immintrin.h file as follows:

    #include <immintrin.h>

Intrinsic Name                Operation                                      Corresponding Intel® AVX-512 Instruction
----------------------------  ---------------------------------------------  ----------------------------------------
_mm512_expand_pd,             Load packed float64 values from dense memory.  VEXPANDPD
_mm512_mask_expand_pd,
_mm512_maskz_expand_pd
_mm512_mask_expandloadu_pd,   Load packed float64 values from dense memory.  VEXPANDPD
_mm512_maskz_expandloadu_pd
_mm512_expand_ps,             Load packed float32 values from dense memory.  VEXPANDPS
_mm512_mask_expand_ps,
_mm512_maskz_expand_ps
_mm512_mask_expandloadu_ps,   Load packed float32 values from dense memory.  VEXPANDPS
_mm512_maskz_expandloadu_ps

variable    definition
--------    ----------
k           writemask used as a selector
a           first source vector element
src         source element to use based on writemask result
mem_addr    pointer to memory address

_mm512_expand_pd
extern __m512d __cdecl _mm512_expand_pd(__m512d a);
Loads contiguous active float64 elements from a (those with their respective bit set in mask k) and stores the result.
_mm512_mask_expand_pd
extern __m512d __cdecl _mm512_mask_expand_pd(__m512d src, __mmask8 k, __m512d a);
Loads contiguous float64 elements from the low end of a and stores them, in order, in the result lanes whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).
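
As a rough illustration of the writemask behaviour described above, the sketch below models _mm512_mask_expand_pd with a scalar loop and, when the compiler targets AVX-512F, calls the intrinsic itself. The helper names expand_pd_model and expand_pd are hypothetical and do not come from any header. The __AVX512F__ guard is an assumption about how the build is configured.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif
#include <stddef.h>

/* Hypothetical scalar model: elements of a are consumed from the low end,
   in order, and placed in the lanes whose bit is set in k. Lanes whose bit
   is clear take their value from src. */
void expand_pd_model(double dst[8], const double src[8],
                     unsigned char k, const double a[8])
{
    size_t next = 0;                     /* index of the next unread element of a */
    for (size_t lane = 0; lane < 8; ++lane) {
        if (k & (1u << lane))
            dst[lane] = a[next++];       /* active lane: take the next element */
        else
            dst[lane] = src[lane];       /* inactive lane: copy from src */
    }
}

/* Hypothetical wrapper: uses the intrinsic when AVX-512F is enabled at
   compile time, and falls back to the scalar model otherwise. */
void expand_pd(double dst[8], const double src[8],
               unsigned char k, const double a[8])
{
#if defined(__AVX512F__)
    __m512d r = _mm512_mask_expand_pd(_mm512_loadu_pd(src), (__mmask8)k,
                                      _mm512_loadu_pd(a));
    _mm512_storeu_pd(dst, r);
#else
    expand_pd_model(dst, src, k, a);
#endif
}
```

With k = 0xA5 (lanes 0, 2, 5 and 7 active), a = {1, 2, 3, 4, 5, 6, 7, 8} and every element of src set to -1, both helpers produce {1, -1, 2, -1, -1, 3, -1, 4}: only the first four elements of a are used.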
_mm512_maskz_expand_pd
extern __m512d __cdecl _mm512_maskz_expand_pd(__mmask8 k, __m512d a);
Loads contiguous float64 elements from the low end of a and stores them, in order, in the result lanes whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_expand_ps
extern __m512 __cdecl _mm512_expand_ps(__m512 a);
Loads contiguous active float32 elements from a (those with their respective bit set in mask k) and stores the result.
_mm512_mask_expand_ps
extern __m512 __cdecl _mm512_mask_expand_ps(__m512 src, __mmask16 k, __m512 a);
Loads contiguous float32 elements from the low end of a and stores them, in order, in the result lanes whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_expand_ps
extern __m512 __cdecl _mm512_maskz_expand_ps(__mmask16 k, __m512 a);
Loads contiguous float32 elements from the low end of a and stores them, in order, in the result lanes whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).
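
The float32 forms work on 16 lanes and take a 16-bit mask. The following minimal sketch shows the zeromask case: the hypothetical helper expandz_ps calls _mm512_maskz_expand_ps when AVX-512F is enabled at compile time (an assumed build setting) and otherwise runs a scalar loop with the same behaviour.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif
#include <stddef.h>

/* Hypothetical helper: lanes whose bit is set in k receive the next unread
   element of a, starting from a[0]. All other lanes are set to zero. */
void expandz_ps(float dst[16], unsigned short k, const float a[16])
{
#if defined(__AVX512F__)
    __m512 r = _mm512_maskz_expand_ps((__mmask16)k, _mm512_loadu_ps(a));
    _mm512_storeu_ps(dst, r);
#else
    size_t next = 0;                     /* index of the next unread element of a */
    for (size_t lane = 0; lane < 16; ++lane)
        dst[lane] = (k & (1u << lane)) ? a[next++] : 0.0f;
#endif
}
```

For example, k = 0x5555 selects the even lanes, so the first eight elements of a land in lanes 0, 2, 4, ..., 14 and every odd lane is zero.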
_mm512_mask_expandloadu_pd
extern __m512d __cdecl _mm512_mask_expandloadu_pd(__m512d src, __mmask8 k, void * mem_addr);
Loads contiguous float64 elements from unaligned memory at mem_addr and stores them, in order, in the result lanes whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).
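
With the memory form, the number of elements consumed from mem_addr equals the number of bits set in k, which is what a caller needs in order to advance through a packed buffer. The sketch below is one way to expose that count. The helper name expandload_pd is hypothetical, and the __AVX512F__ guard is an assumption about the build.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif
#include <stddef.h>

/* Hypothetical helper: expand-loads from packed into dst under mask k,
   keeping src in the inactive lanes. Returns how many packed elements were
   consumed, which is the number of bits set in k. */
size_t expandload_pd(double dst[8], const double src[8],
                     unsigned char k, const double *packed)
{
    size_t used = 0;
#if defined(__AVX512F__)
    __m512d r = _mm512_mask_expandloadu_pd(_mm512_loadu_pd(src),
                                           (__mmask8)k, packed);
    _mm512_storeu_pd(dst, r);
    for (unsigned bits = k; bits != 0; bits &= bits - 1)
        ++used;                          /* count the set bits of k */
#else
    for (size_t lane = 0; lane < 8; ++lane) {
        if (k & (1u << lane))
            dst[lane] = packed[used++];  /* active lane: next packed element */
        else
            dst[lane] = src[lane];       /* inactive lane: copy from src */
    }
#endif
    return used;
}
```

With k = 0x4A (lanes 1, 3 and 6 active), packed values 10, 20, 30 and every element of src set to 9, the result is {9, 10, 9, 20, 9, 9, 30, 9} and the helper returns 3.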
_mm512_maskz_expandloadu_pd
extern __m512d __cdecl _mm512_maskz_expandloadu_pd(__mmask8 k, void * mem_addr);
Loads contiguous float64 elements from unaligned memory at mem_addr and stores them, in order, in the result lanes whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).
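
One plausible use of the zeromask memory form is rebuilding a dense array from a packed list of non-zero values plus a per-block bitmap. The sketch below is an illustration under stated assumptions, not a recommended implementation: densify_pd and its parameters are hypothetical names, and the packed buffer is assumed to stay readable for eight doubles from every read position, because this section does not say whether memory beyond the active elements is touched.

```c
#if defined(__AVX512F__)
#include <immintrin.h>
#endif
#include <stddef.h>

/* Hypothetical helper: rebuilds 8 * n_blocks doubles in dense. bitmap[b] has
   one bit per lane of block b: a set bit means the next packed value belongs
   in that lane, a clear bit means the lane is zero. Returns the number of
   packed values consumed. */
size_t densify_pd(double *dense, const double *packed,
                  const unsigned char *bitmap, size_t n_blocks)
{
    size_t used = 0;
    for (size_t b = 0; b < n_blocks; ++b) {
        unsigned k = bitmap[b];
#if defined(__AVX512F__)
        __m512d r = _mm512_maskz_expandloadu_pd((__mmask8)k, packed + used);
        _mm512_storeu_pd(dense + 8 * b, r);
        for (unsigned bits = k; bits != 0; bits &= bits - 1)
            ++used;                      /* advance by the number of set bits */
#else
        for (size_t lane = 0; lane < 8; ++lane)
            dense[8 * b + lane] = (k & (1u << lane)) ? packed[used++] : 0.0;
#endif
    }
    return used;
}
```

For two blocks with bitmaps 0x81 and 0x18 and packed values 1.5, 2.5, 3.5, 4.5, the dense output holds 1.5 at index 0, 2.5 at index 7, 3.5 at index 11 and 4.5 at index 12, with zeros elsewhere, and four packed values are consumed.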
_mm512_mask_expandloadu_ps
extern __m512 __cdecl _mm512_mask_expandloadu_ps(__m512 src, __mmask16 k, void * mem_addr);
Loads contiguous float32 elements from unaligned memory at mem_addr and stores them, in order, in the result lanes whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).