Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public


Intrinsics for Store Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


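variable     definition

base_addr    pointer to base address in memory to begin load or store operation

mem_addr     pointer to base address in memory

k            writemask used as a selector

vindex       vector of 32-bit or 64-bit indices used to compute each store address

a            first source vector element

scale        constant factor applied to each index in vindex; should be 1, 2, 4 or 8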


_mm_mask_compressstoreu_pd

void _mm_mask_compressstoreu_pd(void* base_addr, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm256_mask_compressstoreu_pd

void _mm256_mask_compressstoreu_pd(void* base_addr, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
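As an illustration of the compress-store behavior described above, the following minimal sketch packs the selected lanes of four doubles to the front of an output buffer. The function names (compress4 and its helpers) are hypothetical. The target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so that the sketch also runs on processors without Intel® AVX-512; they are not part of this reference. The scalar helper spells out the same semantics.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper). The target attribute is a GCC/Clang
   extension assumed by this sketch. */
__attribute__((target("avx512f,avx512vl")))
static size_t compress4_avx512(double *dst, const double *src, unsigned keep)
{
    __mmask8 k = (__mmask8)(keep & 0xFu);      /* a __m256d has four lanes */
    __m256d a = _mm256_loadu_pd(src);
    _mm256_mask_compressstoreu_pd(dst, k, a);  /* writes one double per set bit */
    return (size_t)__builtin_popcount(k);
}

/* Scalar reference with the same result. */
static size_t compress4_scalar(double *dst, const double *src, unsigned keep)
{
    size_t n = 0;
    for (int i = 0; i < 4; ++i)
        if (keep & (1u << i))
            dst[n++] = src[i];
    return n;
}

/* Returns the number of doubles written to dst. */
size_t compress4(double *dst, const double *src, unsigned keep)
{
    assert(dst != NULL && src != NULL);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        return compress4_avx512(dst, src, keep);
    return compress4_scalar(dst, src, keep);
}
```

With src = {1.0, 2.0, 3.0, 4.0} and keep = 0xA (lanes 1 and 3), the call writes 2.0 and 4.0 to dst[0] and dst[1] and leaves the rest of dst untouched.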



_mm_mask_compressstoreu_ps

void _mm_mask_compressstoreu_ps(void* base_addr, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm256_mask_compressstoreu_ps

void _mm256_mask_compressstoreu_ps(void* base_addr, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm_mask_store_pd

void _mm_mask_store_pd(void* mem_addr, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovapd

Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.



_mm256_mask_store_pd

void _mm256_mask_store_pd(void* mem_addr, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovapd

Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.



_mm_mask_store_ps

void _mm_mask_store_ps(void* mem_addr, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovaps

Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.



_mm256_mask_store_ps

void _mm256_mask_store_ps(void* mem_addr, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovaps

Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
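The aligned masked stores can be sketched as follows. The example writes only the selected lanes of eight floats to a 32-byte aligned destination. The name store_lanes and its helpers are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here, not part of this reference.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static void store_lanes_avx512(float *dst, const float *src, unsigned lanes)
{
    __m256 a = _mm256_loadu_ps(src);
    /* Lanes whose mask bit is clear keep their previous contents in memory. */
    _mm256_mask_store_ps(dst, (__mmask8)lanes, a);
}

/* Scalar reference with the same result. */
static void store_lanes_scalar(float *dst, const float *src, unsigned lanes)
{
    for (int i = 0; i < 8; ++i)
        if (lanes & (1u << i))
            dst[i] = src[i];
}

void store_lanes(float *dst, const float *src, unsigned lanes)
{
    /* The aligned form requires a 32-byte aligned destination. */
    assert(((uintptr_t)dst % 32) == 0);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        store_lanes_avx512(dst, src, lanes);
    else
        store_lanes_scalar(dst, src, lanes);
}
```

The alignment check is performed by the wrapper so that a misaligned pointer is caught before the store is issued.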



_mm_mask_storeu_pd

void _mm_mask_storeu_pd(void* mem_addr, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovupd

Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_pd

void _mm256_mask_storeu_pd(void* mem_addr, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovupd

Store packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm_mask_storeu_ps

void _mm_mask_storeu_ps(void* mem_addr, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovups

Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_ps

void _mm256_mask_storeu_ps(void* mem_addr, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovups

Store packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
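One common use of the unaligned masked stores is handling the tail of a loop when the element count is not a multiple of the vector width. The sketch below scales an array in place. The name scale_in_place and its helpers are hypothetical, _mm256_maskz_loadu_ps is the matching masked load (documented with the load intrinsics, not in this section), and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static void scale_avx512(float *x, size_t n, float s)
{
    const __m256 vs = _mm256_set1_ps(s);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)                           /* full vectors */
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    if (i < n) {                                         /* 1 to 7 elements left */
        __mmask8 k = (__mmask8)((1u << (n - i)) - 1u);   /* low (n - i) bits set */
        __m256 v = _mm256_maskz_loadu_ps(k, x + i);
        _mm256_mask_storeu_ps(x + i, k, _mm256_mul_ps(v, vs));
    }
}

/* Scalar reference with the same result. */
static void scale_scalar(float *x, size_t n, float s)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= s;
}

void scale_in_place(float *x, size_t n, float s)
{
    assert(x != NULL || n == 0);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        scale_avx512(x, n, s);
    else
        scale_scalar(x, n, s);
}
```

Because the writemask covers only the remaining elements, memory past the end of the array is not written.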



_mm_i32scatter_pd

void _mm_i32scatter_pd(void* base_addr, __m128i vindex, __m128d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i32scatter_pd

void _mm_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m128d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i32scatter_pd

void _mm256_i32scatter_pd(void* base_addr, __m128i vindex, __m256d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
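The addressing described above can be sketched with four doubles and four 32-bit indices. With scale set to 8 (the size of a double), each index counts elements rather than bytes. The name scatter4 and its helpers are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static void scatter4_avx512(double *base, const int *idx, const double *vals)
{
    __m128i vindex = _mm_loadu_si128((const __m128i *)idx);  /* four 32-bit indices */
    __m256d a = _mm256_loadu_pd(vals);                       /* four doubles */
    _mm256_i32scatter_pd(base, vindex, a, 8);                /* address = base + idx * 8 */
}

/* Scalar reference with the same result. */
static void scatter4_scalar(double *base, const int *idx, const double *vals)
{
    for (int i = 0; i < 4; ++i)
        base[idx[i]] = vals[i];
}

void scatter4(double *base, const int *idx, const double *vals)
{
    assert(base != NULL && idx != NULL && vals != NULL);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        scatter4_avx512(base, idx, vals);
    else
        scatter4_scalar(base, idx, vals);
}
```

The scale argument is written as a literal because it must be a compile-time constant.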



_mm256_mask_i32scatter_pd

void _mm256_mask_i32scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m256d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_i32scatter_ps

void _mm_i32scatter_ps(void* base_addr, __m128i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdps

Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i32scatter_ps

void _mm_mask_i32scatter_ps(void* base_addr, __mmask8 k, __m128i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdps

Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i32scatter_ps

void _mm256_i32scatter_ps(void* base_addr, __m256i vindex, __m256 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdps

Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i32scatter_ps

void _mm256_mask_i32scatter_ps(void* base_addr, __mmask8 k, __m256i vindex, __m256 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterdps

Scatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_i64scatter_pd

void _mm_i64scatter_pd(void* base_addr, __m128i vindex, __m128d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i64scatter_pd

void _mm_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m128i vindex, __m128d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i64scatter_pd

void _mm256_i64scatter_pd(void* base_addr, __m256i vindex, __m256d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i64scatter_pd

void _mm256_mask_i64scatter_pd(void* base_addr, __mmask8 k, __m256i vindex, __m256d a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqpd

Scatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_i64scatter_ps

void _mm_i64scatter_ps(void* base_addr, __m128i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqps

Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i64scatter_ps

void _mm_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m128i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqps

Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i64scatter_ps

void _mm256_i64scatter_ps(void* base_addr, __m256i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqps

Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
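Note that this form takes a 256-bit index vector but only a 128-bit data vector, because four 64-bit indices address four 32-bit elements. A minimal sketch, with the hypothetical name scatter4f and with the GCC/Clang target attribute and __builtin_cpu_supports assumed for the runtime check:

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static void scatter4f_avx512(float *base, const long long *idx, const float *vals)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);  /* four 64-bit indices */
    __m128 a = _mm_loadu_ps(vals);                              /* four floats */
    _mm256_i64scatter_ps(base, vindex, a, 4);                   /* address = base + idx * 4 */
}

/* Scalar reference with the same result. */
static void scatter4f_scalar(float *base, const long long *idx, const float *vals)
{
    for (int i = 0; i < 4; ++i)
        base[idx[i]] = vals[i];
}

void scatter4f(float *base, const long long *idx, const float *vals)
{
    assert(base != NULL && idx != NULL && vals != NULL);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        scatter4f_avx512(base, idx, vals);
    else
        scatter4f_scalar(base, idx, vals);
}
```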



_mm256_mask_i64scatter_ps

void _mm256_mask_i64scatter_ps(void* base_addr, __mmask8 k, __m256i vindex, __m128 a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscatterqps

Scatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_mask_store_epi32

void _mm_mask_store_epi32(void* mem_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqa32

Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.



_mm256_mask_store_epi32

void _mm256_mask_store_epi32(void* mem_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqa32

Store packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.



_mm_mask_store_epi64

void _mm_mask_store_epi64(void* mem_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqa64

Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.



_mm256_mask_store_epi64

void _mm256_mask_store_epi64(void* mem_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqa64

Store packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.



_mm_mask_storeu_epi16

void _mm_mask_storeu_epi16(void* mem_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vmovdqu16

Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_epi16

void _mm256_mask_storeu_epi16(void* mem_addr, __mmask16 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vmovdqu16

Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm512_mask_storeu_epi16

void _mm512_mask_storeu_epi16(void* mem_addr, __mmask32 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vmovdqu16

Store packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm_mask_storeu_epi32

void _mm_mask_storeu_epi32(void* mem_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqu32

Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_epi32

void _mm256_mask_storeu_epi32(void* mem_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqu32

Store packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm_mask_storeu_epi64

void _mm_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqu64

Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_epi64

void _mm256_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vmovdqu64

Store packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm_mask_storeu_epi8

void _mm_mask_storeu_epi8(void* mem_addr, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vmovdqu8

Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm256_mask_storeu_epi8

void _mm256_mask_storeu_epi8(void* mem_addr, __mmask32 k, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vmovdqu8

Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.



_mm512_mask_storeu_epi8

void _mm512_mask_storeu_epi8(void* mem_addr, __mmask64 k, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vmovdqu8

Store packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
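With 8-bit elements a 512-bit vector holds 64 lanes, so the writemask is a 64-bit __mmask64. The sketch below copies the first n bytes (n at most 64) of a buffer. The name copy_bytes and its helper are hypothetical, _mm512_maskz_loadu_epi8 is the matching masked load (documented with the load intrinsics), and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512bw")))
static void copy_avx512(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Set the low n bits; shifting a 64-bit value by 64 is undefined,
       so n == 64 is handled separately. */
    __mmask64 k = (n == 64) ? ~0ULL : ((1ULL << n) - 1ULL);
    __m512i v = _mm512_maskz_loadu_epi8(k, src);
    _mm512_mask_storeu_epi8(dst, k, v);          /* writes exactly n bytes */
}

void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 64);
    if (__builtin_cpu_supports("avx512bw"))
        copy_avx512(dst, src, n);
    else
        memcpy(dst, src, n);                     /* scalar reference */
}
```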



_mm_mask_compressstoreu_epi32

void _mm_mask_compressstoreu_epi32(void* base_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm256_mask_compressstoreu_epi32

void _mm256_mask_compressstoreu_epi32(void* base_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
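A typical use of the integer compress-store is stream compaction: keeping only the elements that satisfy a predicate. The sketch below keeps the values greater than a threshold. The name filter_greater and its helpers are hypothetical, _mm256_cmpgt_epi32_mask is a compare intrinsic documented outside this section, and the target attribute, __builtin_popcount, and __builtin_cpu_supports are GCC/Clang extensions assumed here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static size_t filter_avx512(int *dst, const int *src, size_t n, int threshold)
{
    const __m256i t = _mm256_set1_epi32(threshold);
    size_t out = 0, i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(src + i));
        __mmask8 k = _mm256_cmpgt_epi32_mask(v, t);        /* bit set where v > t */
        _mm256_mask_compressstoreu_epi32(dst + out, k, v); /* packed, in lane order */
        out += (size_t)__builtin_popcount(k);
    }
    for (; i < n; ++i)                                     /* scalar tail */
        if (src[i] > threshold)
            dst[out++] = src[i];
    return out;
}

/* Scalar reference with the same result. */
static size_t filter_scalar(int *dst, const int *src, size_t n, int threshold)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ++i)
        if (src[i] > threshold)
            dst[out++] = src[i];
    return out;
}

/* Returns the number of elements written to dst (dst must hold up to n). */
size_t filter_greater(int *dst, const int *src, size_t n, int threshold)
{
    assert(dst != NULL && src != NULL);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        return filter_avx512(dst, src, n, threshold);
    return filter_scalar(dst, src, n, threshold);
}
```

Each iteration advances the output position by the number of set mask bits, so the kept elements stay contiguous and in their original order.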



_mm_mask_compressstoreu_epi64

void _mm_mask_compressstoreu_epi64(void* base_addr, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm256_mask_compressstoreu_epi64

void _mm256_mask_compressstoreu_epi64(void* base_addr, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.



_mm_i32scatter_epi32

void _mm_i32scatter_epi32(void* base_addr, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdd

Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i32scatter_epi32

void _mm_mask_i32scatter_epi32(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdd

Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i32scatter_epi32

void _mm256_i32scatter_epi32(void* base_addr, __m256i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdd

Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i32scatter_epi32

void _mm256_mask_i32scatter_epi32(void* base_addr, __mmask8 k, __m256i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdd

Scatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
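The effect of the mask on a scatter can be sketched as follows: only lanes whose mask bit is set are written, and the target locations of the other lanes are left unchanged. The name scatter8_masked and its helpers are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Vector path (hypothetical helper, GCC/Clang target attribute assumed). */
__attribute__((target("avx512f,avx512vl")))
static void scatter8_avx512(int *base, const int *idx, const int *vals, unsigned lanes)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);  /* eight 32-bit indices */
    __m256i a = _mm256_loadu_si256((const __m256i *)vals);      /* eight 32-bit values */
    /* scale 4 = sizeof(int), so each index counts elements */
    _mm256_mask_i32scatter_epi32(base, (__mmask8)lanes, vindex, a, 4);
}

/* Scalar reference with the same result. */
static void scatter8_scalar(int *base, const int *idx, const int *vals, unsigned lanes)
{
    for (int i = 0; i < 8; ++i)
        if (lanes & (1u << i))
            base[idx[i]] = vals[i];
}

void scatter8_masked(int *base, const int *idx, const int *vals, unsigned lanes)
{
    assert(base != NULL && idx != NULL && vals != NULL);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        scatter8_avx512(base, idx, vals, lanes);
    else
        scatter8_scalar(base, idx, vals, lanes);
}
```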



_mm_i32scatter_epi64

void _mm_i32scatter_epi64(void* base_addr, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdq

Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i32scatter_epi64

void _mm_mask_i32scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdq

Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i32scatter_epi64

void _mm256_i32scatter_epi64(void* base_addr, __m128i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdq

Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i32scatter_epi64

void _mm256_mask_i32scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterdq

Scatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_i64scatter_epi32

void _mm_i64scatter_epi32(void* base_addr, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqd

Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i64scatter_epi32

void _mm_mask_i64scatter_epi32(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqd

Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i64scatter_epi32

void _mm256_i64scatter_epi32(void* base_addr, __m256i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqd

Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i64scatter_epi32

void _mm256_mask_i64scatter_epi32(void* base_addr, __mmask8 k, __m256i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqd

Scatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm_i64scatter_epi64

void _mm_i64scatter_epi64(void* base_addr, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqq

Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm_mask_i64scatter_epi64

void _mm_mask_i64scatter_epi64(void* base_addr, __mmask8 k, __m128i vindex, __m128i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqq

Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.



_mm256_i64scatter_epi64

void _mm256_i64scatter_epi64(void* base_addr, __m256i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqq

Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.



_mm256_mask_i64scatter_epi64

void _mm256_mask_i64scatter_epi64(void* base_addr, __mmask8 k, __m256i vindex, __m256i a, const int scale)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpscatterqq

Scatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.