Intrinsics for Miscellaneous Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


The following variables are used in the intrinsic descriptions; each variable name is followed by its definition.

src

source element to use based on writemask result

k

writemask used as a selector

a

first source vector element

b

second source vector element

c

third source vector element

rounding

Rounding control values; these can be one of the following, optionally combined with the SAE ("suppress all exceptions") flag, _MM_FROUND_NO_EXC:

  • _MM_FROUND_TO_NEAREST_INT - rounds to nearest even
  • _MM_FROUND_TO_NEG_INF - rounds to negative infinity
  • _MM_FROUND_TO_POS_INF - rounds to positive infinity
  • _MM_FROUND_TO_ZERO - rounds to zero
  • _MM_FROUND_CUR_DIRECTION - rounds using default from MXCSR register

interv

Mantissa normalization interval, of type _MM_MANTISSA_NORM_ENUM, which can be one of the following:

  • _MM_MANT_NORM_1_2 - interval [1, 2)
  • _MM_MANT_NORM_p5_2 - interval [0.5, 2)
  • _MM_MANT_NORM_p5_1 - interval [0.5, 1)
  • _MM_MANT_NORM_p75_1p5 - interval [0.75, 1.5)

sc

Sign control, of type _MM_MANTISSA_SIGN_ENUM, which can be one of the following:

  • _MM_MANT_SIGN_src - sign = sign(SRC)
  • _MM_MANT_SIGN_zero - sign = 0
  • _MM_MANT_SIGN_nan - DEST = NaN if sign(SRC) = 1


_mm_broadcast_i32x2

__m128i _mm_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm_mask_broadcast_i32x2

__m128i _mm_mask_broadcast_i32x2(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcast_i32x2

__m128i _mm_maskz_broadcast_i32x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
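The difference between the writemask and zeromask forms can be illustrated with a scalar model in plain C. This is only a sketch of the semantics described above, not the intrinsic itself: the function names are hypothetical, arrays of four int32_t stand in for __m128i, and uint8_t stands in for __mmask8.

```c
#include <stdint.h>

/* Scalar model of _mm_mask_broadcast_i32x2 (hypothetical helper, not an Intel API).
   Element i of the broadcast result is a[i % 2]; the writemask keeps src[i]
   wherever bit i of k is clear. */
static void model_mask_broadcast_i32x2(int32_t dst[4], const int32_t src[4],
                                       uint8_t k, const int32_t a[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = ((k >> i) & 1) ? a[i % 2] : src[i];
    }
}

/* Scalar model of _mm_maskz_broadcast_i32x2 (hypothetical helper).
   The zeromask writes 0 wherever bit i of k is clear. */
static void model_maskz_broadcast_i32x2(int32_t dst[4], uint8_t k,
                                        const int32_t a[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = ((k >> i) & 1) ? a[i % 2] : 0;
    }
}
```

With a = {10, 20, 30, 40} and k = 0x5, the writemask model overwrites elements 0 and 2 with 10 and leaves elements 1 and 3 as they were in src.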



_mm256_broadcast_i32x2

__m256i _mm256_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i32x2

__m256i _mm256_mask_broadcast_i32x2(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i32x2

__m256i _mm256_maskz_broadcast_i32x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_i32x2

__m512i _mm512_broadcast_i32x2(__m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i32x2

__m512i _mm512_mask_broadcast_i32x2(__m512i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i32x2

__m512i _mm512_maskz_broadcast_i32x2(__mmask16 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x2

Broadcast the lower 2 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_i32x4

__m256i _mm256_broadcast_i32x4(__m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i32x4

__m256i _mm256_mask_broadcast_i32x4(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i32x4

__m256i _mm256_maskz_broadcast_i32x4(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcasti32x4

Broadcast the 4 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
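For the x4 variants the whole 128-bit source is repeated, so the result is the 4-element pattern tiled across the wider vector. A minimal scalar sketch of the zeromask form follows. The helper name is hypothetical, and plain arrays stand in for __m128i (4 elements) and __m256i (8 elements).

```c
#include <stdint.h>

/* Scalar model of _mm256_maskz_broadcast_i32x4 (hypothetical helper).
   The 4-element source block is tiled twice to fill 8 elements, then
   the zeromask clears every element whose mask bit is 0. */
static void model_maskz_broadcast_i32x4(int32_t dst[8], uint8_t k,
                                        const int32_t a[4])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = ((k >> i) & 1) ? a[i % 4] : 0;
    }
}
```

With k = 0xFF the result is the source repeated twice; with k = 0x0F only the lower copy survives and the upper four elements are zero.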



_mm512_broadcast_i32x8

__m512i _mm512_broadcast_i32x8(__m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i32x8

__m512i _mm512_mask_broadcast_i32x8(__m512i src, __mmask16 k, __m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i32x8

__m512i _mm512_maskz_broadcast_i32x8(__mmask16 k, __m256i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti32x8

Broadcast the 8 packed 32-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_i64x2

__m256i _mm256_broadcast_i64x2(__m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value.



_mm256_mask_broadcast_i64x2

__m256i _mm256_mask_broadcast_i64x2(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_i64x2

__m256i _mm256_maskz_broadcast_i64x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_i64x2

__m512i _mm512_broadcast_i64x2(__m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value.



_mm512_mask_broadcast_i64x2

__m512i _mm512_mask_broadcast_i64x2(__m512i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_i64x2

__m512i _mm512_maskz_broadcast_i64x2(__mmask8 k, __m128i a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcasti64x2

Broadcast the 2 packed 64-bit integers from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_inserti32x4

__m256i _mm256_inserti32x4(__m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to the return value, then insert 128 bits (composed of 4 packed 32-bit integers) from b into the return value at the location specified by imm.



_mm256_mask_inserti32x4

__m256i _mm256_mask_inserti32x4(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_inserti32x4

__m256i _mm256_maskz_inserti32x4(__mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinserti32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
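The "copy to tmp, insert, then mask" sequence can be written out as a scalar model. This is a sketch only: the helper name is hypothetical, arrays stand in for the vector types, and it assumes that the 256-bit form uses only bit 0 of imm to choose between the lower (0) and upper (1) 128-bit half.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm256_mask_inserti32x4 (hypothetical helper).
   Assumption: only bit 0 of imm is significant for the 256-bit form. */
static void model_mask_inserti32x4(int32_t dst[8], const int32_t src[8],
                                   uint8_t k, const int32_t a[8],
                                   const int32_t b[4], int imm)
{
    int32_t tmp[8];

    /* Step 1: copy a to tmp. */
    memcpy(tmp, a, sizeof tmp);
    /* Step 2: overwrite the selected 128-bit half of tmp with b. */
    memcpy(&tmp[(imm & 1) * 4], b, 4 * sizeof(int32_t));
    /* Step 3: apply the writemask per 32-bit element. */
    for (int i = 0; i < 8; i++) {
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
    }
}
```

Note that the mask is applied after the insertion, so a cleared mask bit hides the corresponding element of tmp regardless of whether it came from a or from b.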



_mm512_inserti32x8

__m512i _mm512_inserti32x8(__m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to the return value, then insert 256 bits (composed of 8 packed 32-bit integers) from b into the return value at the location specified by imm.



_mm512_mask_inserti32x8

__m512i _mm512_mask_inserti32x8(__m512i src, __mmask16 k, __m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_inserti32x8

__m512i _mm512_maskz_inserti32x8(__mmask16 k, __m512i a, __m256i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_inserti64x2

__m256i _mm256_inserti64x2(__m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.



_mm256_mask_inserti64x2

__m256i _mm256_mask_inserti64x2(__m256i src, __mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_inserti64x2

__m256i _mm256_maskz_inserti64x2(__mmask8 k, __m256i a, __m128i b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_inserti64x2

__m512i _mm512_inserti64x2(__m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed 64-bit integers) from b into the return value at the location specified by imm.



_mm512_mask_inserti64x2

__m512i _mm512_mask_inserti64x2(__m512i src, __mmask8 k, __m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_inserti64x2

__m512i _mm512_maskz_inserti64x2(__mmask8 k, __m512i a, __m128i b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinserti64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
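In the 512-bit form there are four possible 128-bit destinations, and the mask has one bit per 64-bit element. The scalar sketch below models the zeromask variant. The helper name is hypothetical, arrays stand in for the vector types, and it assumes the two low bits of imm select the destination lane.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm512_maskz_inserti64x2 (hypothetical helper).
   Assumption: imm bits [1:0] select one of the four 128-bit lanes. */
static void model_maskz_inserti64x2(int64_t dst[8], uint8_t k,
                                    const int64_t a[8], const int64_t b[2],
                                    int imm)
{
    int64_t tmp[8];
    int lane = imm & 3;

    memcpy(tmp, a, sizeof tmp);
    /* Each 128-bit lane holds two 64-bit elements. */
    tmp[2 * lane]     = b[0];
    tmp[2 * lane + 1] = b[1];

    for (int i = 0; i < 8; i++) {
        dst[i] = ((k >> i) & 1) ? tmp[i] : 0;
    }
}
```

For example, imm = 2 places b in elements 4 and 5; a mask of 0x30 then keeps only those two elements and zeroes the rest.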



_mm256_mask_shuffle_i32x4

__m256i _mm256_mask_shuffle_i32x4(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_i32x4

__m256i _mm256_maskz_shuffle_i32x4(__mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_i32x4

__m256i _mm256_shuffle_i32x4(__m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi32x4

Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by imm from a and b, and return the results.
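How imm selects lanes is easiest to see in a scalar model. The sketch below assumes the 256-bit behavior of vshufi32x4: bit 0 of imm picks which 128-bit half of a becomes the lower half of the result, and bit 1 picks which half of b becomes the upper half. The helper name is hypothetical and arrays stand in for __m256i.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm256_shuffle_i32x4 (hypothetical helper).
   Assumption: imm bit 0 selects the half of a, imm bit 1 the half of b. */
static void model_shuffle_i32x4(int32_t dst[8], const int32_t a[8],
                                const int32_t b[8], int imm)
{
    /* Lower 128 bits of the result come from a. */
    memcpy(&dst[0], &a[(imm & 1) * 4], 4 * sizeof(int32_t));
    /* Upper 128 bits of the result come from b. */
    memcpy(&dst[4], &b[((imm >> 1) & 1) * 4], 4 * sizeof(int32_t));
}
```

The masked variants apply a writemask or zeromask to this result element by element, in the same way as the other masked intrinsics on this page.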



_mm256_mask_shuffle_i64x2

__m256i _mm256_mask_shuffle_i64x2(__m256i src, __mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_i64x2

__m256i _mm256_maskz_shuffle_i64x2(__mmask8 k, __m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_i64x2

__m256i _mm256_shuffle_i64x2(__m256i a, __m256i b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufi64x2

Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by imm from a and b, and return the results.



_mm_mask_blend_pd

__m128d _mm_mask_blend_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmpd

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.



_mm256_mask_blend_pd

__m256d _mm256_mask_blend_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmpd

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and return the results.



_mm_mask_blend_ps

__m128 _mm_mask_blend_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmps

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.



_mm256_mask_blend_ps

__m256 _mm256_mask_blend_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vblendmps

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and return the results.
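The blend intrinsics use k as a per-element selector rather than as a writemask over a computed result. A scalar sketch of that selection follows, assuming the vblendmps convention that a set bit takes the element from b and a clear bit takes it from a. The helper name is hypothetical and float arrays stand in for __m256.

```c
#include <stdint.h>

/* Scalar model of _mm256_mask_blend_ps (hypothetical helper).
   Assumption: bit i of k set -> b[i], clear -> a[i]. */
static void model_mask_blend_ps(float dst[8], uint8_t k,
                                const float a[8], const float b[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = ((k >> i) & 1) ? b[i] : a[i];
    }
}
```

With k = 0x81, elements 0 and 7 come from b and the remaining six come from a.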



_mm256_broadcast_f32x2

__m256 _mm256_broadcast_f32x2(__m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f32x2

__m256 _mm256_mask_broadcast_f32x2(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f32x2

__m256 _mm256_maskz_broadcast_f32x2(__mmask8 k, __m128 a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f32x2

__m512 _mm512_broadcast_f32x2(__m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f32x2

__m512 _mm512_mask_broadcast_f32x2(__m512 src, __mmask16 k, __m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f32x2

__m512 _mm512_maskz_broadcast_f32x2(__mmask16 k, __m128 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x2

Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_f32x4

__m256 _mm256_broadcast_f32x4(__m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f32x4

__m256 _mm256_mask_broadcast_f32x4(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f32x4

__m256 _mm256_maskz_broadcast_f32x4(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastf32x4

Broadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f32x8

__m512 _mm512_broadcast_f32x8(__m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f32x8

__m512 _mm512_mask_broadcast_f32x8(__m512 src, __mmask16 k, __m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f32x8

__m512 _mm512_maskz_broadcast_f32x8(__mmask16 k, __m256 a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf32x8

Broadcast the 8 packed single-precision (32-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_broadcast_f64x2

__m256d _mm256_broadcast_f64x2(__m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.



_mm256_mask_broadcast_f64x2

__m256d _mm256_mask_broadcast_f64x2(__m256d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcast_f64x2

__m256d _mm256_maskz_broadcast_f64x2(__mmask8 k, __m128d a)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcast_f64x2

__m512d _mm512_broadcast_f64x2(__m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value.



_mm512_mask_broadcast_f64x2

__m512d _mm512_mask_broadcast_f64x2(__m512d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcast_f64x2

__m512d _mm512_maskz_broadcast_f64x2(__mmask8 k, __m128d a)

CPUID Flags: AVX512DQ

Instruction(s): vbroadcastf64x2

Broadcast the 2 packed double-precision (64-bit) floating-point elements from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastsd_pd

__m256d _mm256_mask_broadcastsd_pd(__m256d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastsd

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastsd_pd

__m256d _mm256_maskz_broadcastsd_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastsd

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_broadcastss_ps

__m128 _mm_mask_broadcastss_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastss_ps

__m128 _mm_maskz_broadcastss_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastss_ps

__m256 _mm256_mask_broadcastss_ps(__m256 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastss_ps

__m256 _mm256_maskz_broadcastss_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vbroadcastss

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_compress_pd

__m128d _mm_mask_compress_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_pd

__m128d _mm_maskz_compress_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_pd

__m256d _mm256_mask_compress_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_pd

__m256d _mm256_maskz_compress_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompresspd

Contiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm_mask_compress_ps

__m128 _mm_mask_compress_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_ps

__m128 _mm_maskz_compress_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_ps

__m256 _mm256_mask_compress_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_ps

__m256 _mm256_maskz_compress_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vcompressps

Contiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
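Compression packs the selected elements toward the low end of the result. The scalar sketch below models the writemask form for eight single-precision elements. The helper name is hypothetical and float arrays stand in for __m256. Elements of the result above the packed run are taken from the same positions in src.

```c
#include <stdint.h>

/* Scalar model of _mm256_mask_compress_ps (hypothetical helper). */
static void model_mask_compress_ps(float dst[8], const float src[8],
                                   uint8_t k, const float a[8])
{
    int n = 0; /* number of active elements packed so far */

    /* Pack the active elements of a contiguously, starting at element 0. */
    for (int i = 0; i < 8; i++) {
        if ((k >> i) & 1) {
            dst[n++] = a[i];
        }
    }
    /* Pass the remaining upper elements through from src. */
    for (int i = n; i < 8; i++) {
        dst[i] = src[i];
    }
}
```

For a = {0, 1, 2, 3, 4, 5, 6, 7} and k = 0xA6 (bits 1, 2, 5, and 7 set), the first four result elements are 1, 2, 5, 7 and the last four are copied from src. The zeromask form would write zeros in place of the src elements.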



_mm_mask_expand_pd

__m128d _mm_mask_expand_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_pd

__m128d _mm_maskz_expand_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_pd

__m256d _mm256_mask_expand_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_pd

__m256d _mm256_maskz_expand_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandpd

Load contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_expand_ps

__m128 _mm_mask_expand_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_ps

__m128 _mm_maskz_expand_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_ps

__m256 _mm256_mask_expand_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_ps

__m256 _mm256_maskz_expand_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vexpandps

Load contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
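Expansion is the inverse of compression: consecutive low elements of a are scattered to the positions whose mask bit is set. A scalar sketch of the writemask form follows. The helper name is hypothetical and float arrays stand in for __m256.

```c
#include <stdint.h>

/* Scalar model of _mm256_mask_expand_ps (hypothetical helper). */
static void model_mask_expand_ps(float dst[8], const float src[8],
                                 uint8_t k, const float a[8])
{
    int n = 0; /* index of the next contiguous element to read from a */

    for (int i = 0; i < 8; i++) {
        /* Active positions consume the next element of a in order;
           inactive positions keep the element from src. */
        dst[i] = ((k >> i) & 1) ? a[n++] : src[i];
    }
}
```

For a = {0, 1, 2, 3, 4, 5, 6, 7} and k = 0xA6, elements 0 through 3 of a land in result positions 1, 2, 5, and 7; all other positions come from src.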



_mm256_extractf32x4_ps

__m128 _mm256_extractf32x4_ps(__m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the result.



_mm256_mask_extractf32x4_ps

__m128 _mm256_mask_extractf32x4_ps(__m128 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extractf32x4_ps

__m128 _mm256_maskz_extractf32x4_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextractf32x4

Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
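Extraction followed by a zeromask can be modeled as below. This is a scalar sketch with a hypothetical helper name. It assumes that the 256-bit form uses bit 0 of imm to choose the lower (0) or upper (1) 128-bit half, and that only the low four bits of k are significant because the result has four elements.

```c
#include <stdint.h>

/* Scalar model of _mm256_maskz_extractf32x4_ps (hypothetical helper).
   Assumption: imm bit 0 selects the 128-bit half of a. */
static void model_maskz_extractf32x4(float dst[4], uint8_t k,
                                     const float a[8], int imm)
{
    const float *lane = &a[(imm & 1) * 4];

    for (int i = 0; i < 4; i++) {
        dst[i] = ((k >> i) & 1) ? lane[i] : 0.0f;
    }
}
```

With a = {0, 1, 2, 3, 4, 5, 6, 7}, imm = 1, and k = 0x5, the upper half {4, 5, 6, 7} is selected and elements 1 and 3 are zeroed.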



_mm512_extractf32x8_ps

__m256 _mm512_extractf32x8_ps(__m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the result.



_mm512_mask_extractf32x8_ps

__m256 _mm512_mask_extractf32x8_ps(__m256 src, __mmask8 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extractf32x8_ps

__m256 _mm512_maskz_extractf32x8_ps(__mmask8 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf32x8

Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extractf64x2_pd

__m128d _mm256_extractf64x2_pd(__m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the result.



_mm256_mask_extractf64x2_pd

__m128d _mm256_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extractf64x2_pd

__m128d _mm256_maskz_extractf64x2_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extractf64x2_pd

__m128d _mm512_extractf64x2_pd(__m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and store the result in the return value.



_mm512_mask_extractf64x2_pd

__m128d _mm512_mask_extractf64x2_pd(__m128d src, __mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extractf64x2_pd

__m128d _mm512_maskz_extractf64x2_pd(__mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextractf64x2

Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
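
For the 512-bit vextractf64x2 forms the source holds four 128-bit chunks, so the selector has four possible values. The following scalar sketch models the zeromask form; model_maskz_extractf64x2 is a hypothetical name, and the model assumes imm is in the range 0 to 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm512_maskz_extractf64x2_pd (not an Intel API).
   a holds the 8 doubles of the 512-bit source; dst holds 2 doubles. */
void model_maskz_extractf64x2(double dst[2], uint8_t k, const double a[8], int imm)
{
    assert(imm >= 0 && imm <= 3);   /* assumption: imm names one of four 128-bit chunks */
    for (int i = 0; i < 2; i++) {
        /* zeromask: lanes whose mask bit is clear become 0.0 */
        dst[i] = ((k >> i) & 1) ? a[2 * imm + i] : 0.0;
    }
}
```

Only the two low bits of k matter here, because the result has just two elements.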



_mm_fixupimm_pd

__m128d _mm_fixupimm_pd(__m128d a, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm controls which exception flags are reported.



_mm_mask_fixupimm_pd

__m128d _mm_mask_fixupimm_pd(__m128d a, __mmask8 k, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm_maskz_fixupimm_pd

__m128d _mm_maskz_fixupimm_pd(__mmask8 k, __m128d a, __m128d b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm256_fixupimm_pd

__m256d _mm256_fixupimm_pd(__m256d a, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results. imm controls which exception flags are reported.



_mm256_mask_fixupimm_pd

__m256d _mm256_mask_fixupimm_pd(__m256d a, __mmask8 k, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm256_maskz_fixupimm_pd

__m256d _mm256_maskz_fixupimm_pd(__mmask8 k, __m256d a, __m256d b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmpd

Fix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm_fixupimm_ps

__m128 _mm_fixupimm_ps(__m128 a, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm controls which exception flags are reported.



_mm_mask_fixupimm_ps

__m128 _mm_mask_fixupimm_ps(__m128 a, __mmask8 k, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm_maskz_fixupimm_ps

__m128 _mm_maskz_fixupimm_ps(__mmask8 k, __m128 a, __m128 b, __m128i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm256_fixupimm_ps

__m256 _mm256_fixupimm_ps(__m256 a, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results. imm controls which exception flags are reported.



_mm256_mask_fixupimm_ps

__m256 _mm256_mask_fixupimm_ps(__m256 a, __mmask8 k, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm256_maskz_fixupimm_ps

__m256 _mm256_maskz_fixupimm_ps(__mmask8 k, __m256 a, __m256 b, __m256i c, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vfixupimmps

Fix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm controls which exception flags are reported.



_mm_getexp_pd

__m128d _mm_getexp_pd(__m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm_mask_getexp_pd

__m128d _mm_mask_getexp_pd(__m128d src, __mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm_maskz_getexp_pd

__m128d _mm_maskz_getexp_pd(__mmask8 k, __m128d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_getexp_pd

__m256d _mm256_getexp_pd(__m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_mask_getexp_pd

__m256d _mm256_mask_getexp_pd(__m256d src, __mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_maskz_getexp_pd

__m256d _mm256_maskz_getexp_pd(__mmask8 k, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexppd

Convert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.
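
What the vgetexppd forms compute can be illustrated per lane by reading the exponent field of the IEEE 754 encoding directly. The sketch below is a scalar model, not the intrinsic itself: model_getexp_f64 and model_mask_getexp_pd are hypothetical names, and the model assumes binary64 doubles and normal, finite, nonzero inputs (zero, denormal, infinite, and NaN inputs are not modeled).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one lane of vgetexppd (not an Intel API). */
double model_getexp_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);             /* reinterpret the bits safely */
    int biased = (int)((bits >> 52) & 0x7FF);   /* 11-bit exponent field; sign ignored */
    assert(biased != 0 && biased != 0x7FF);     /* assumption: normal, finite input */
    return (double)(biased - 1023);             /* remove the exponent bias */
}

/* Model of _mm_mask_getexp_pd on 2 lanes: lanes with a clear mask bit keep src. */
void model_mask_getexp_pd(double dst[2], const double src[2], uint8_t k,
                          const double a[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = ((k >> i) & 1) ? model_getexp_f64(a[i]) : src[i];
}
```

For example, 8.0 maps to 3.0, 0.75 maps to -1.0, and -10.0 maps to 3.0, since the sign of the input does not affect the result.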



_mm_getexp_ps

__m128 _mm_getexp_ps(__m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm_mask_getexp_ps

__m128 _mm_mask_getexp_ps(__m128 src, __mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm_maskz_getexp_ps

__m128 _mm_maskz_getexp_ps(__mmask8 k, __m128 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_getexp_ps

__m256 _mm256_getexp_ps(__m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results. This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_mask_getexp_ps

__m256 _mm256_mask_getexp_ps(__m256 src, __mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm256_maskz_getexp_ps

__m256 _mm256_maskz_getexp_ps(__mmask8 k, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetexpps

Convert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(|x|)) for each element x.



_mm_getmant_pd

__m128d _mm_getmant_pd(__m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_mask_getmant_pd

__m128d _mm_mask_getmant_pd(__m128d src, __mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_maskz_getmant_pd

__m128d _mm_maskz_getmant_pd(__mmask8 k, __m128d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_getmant_pd

__m256d _mm256_getmant_pd(__m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_mask_getmant_pd

__m256d _mm256_mask_getmant_pd(__m256d src, __mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_maskz_getmant_pd

__m256d _mm256_maskz_getmant_pd(__mmask8 k, __m256d a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantpd

Normalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
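
The mantissa normalization performed by the vgetmantpd forms can be sketched per lane by rewriting the exponent field while keeping the stored significand bits. The model below is an illustration under stated assumptions, not the intrinsic: all names are hypothetical, only two interval choices (read here as [1, 2) and [0.5, 1)) and two sign choices (sign of the source, or zero) are modeled, and inputs are assumed to be normal, finite, and nonzero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for two interv and two sc choices (not Intel's enums). */
enum model_interval { MODEL_NORM_1_2, MODEL_NORM_P5_1 };  /* [1, 2) and [0.5, 1) */
enum model_sign     { MODEL_SIGN_SRC, MODEL_SIGN_ZERO };  /* keep sign, or clear it */

/* Scalar model of one lane of vgetmantpd (not an Intel API). */
double model_getmant_f64(double x, enum model_interval interv, enum model_sign sc)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    unsigned biased = (unsigned)((bits >> 52) & 0x7FF);
    assert(biased != 0 && biased != 0x7FF);   /* assumption: normal, finite input */

    uint64_t sign     = (sc == MODEL_SIGN_SRC) ? (bits & 0x8000000000000000ULL) : 0;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;   /* 52 stored significand bits */
    /* Exponent field 1023 encodes 2^0 (result in [1, 2));
       1022 encodes 2^-1 (result in [0.5, 1)). */
    uint64_t exponent = (interv == MODEL_NORM_1_2) ? 1023 : 1022;

    uint64_t out = sign | (exponent << 52) | fraction;
    double result;
    memcpy(&result, &out, sizeof result);
    return result;
}
```

For the input 10.0 (which is 1.25 * 2^3), the model returns 1.25 for the [1, 2) interval and 0.625 for the [0.5, 1) interval.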



_mm_getmant_ps

__m128 _mm_getmant_ps(__m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_mask_getmant_ps

__m128 _mm_mask_getmant_ps(__m128 src, __mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm_maskz_getmant_ps

__m128 _mm_maskz_getmant_ps(__mmask8 k, __m128 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_getmant_ps

__m256 _mm256_getmant_ps(__m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_mask_getmant_ps

__m256 _mm256_mask_getmant_ps(__m256 src, __mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.



_mm256_maskz_getmant_ps

__m256 _mm256_maskz_getmant_ps(__mmask8 k, __m256 a, _MM_MANTISSA_NORM_ENUM interv, _MM_MANTISSA_SIGN_ENUM sc)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vgetmantps

Normalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
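
The getexp and getmant operations are complementary: with the [1, 2) interval and the sign taken from the source, a normal finite value x equals mantissa * 2^exponent. The following scalar sketch checks that relationship for single precision. All function names are hypothetical, and denormal, zero, infinite, and NaN inputs are outside the model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one lane of vgetexpps (not an Intel API). */
float model_getexp_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int biased = (int)((bits >> 23) & 0xFF);   /* 8-bit exponent field */
    assert(biased != 0 && biased != 0xFF);     /* assumption: normal, finite input */
    return (float)(biased - 127);
}

/* Hypothetical scalar model of one lane of vgetmantps for interval [1, 2),
   keeping the sign of the source. */
float model_getmant_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & 0x807FFFFFu) | (127u << 23);   /* keep sign and fraction, force 2^0 */
    float mant;
    memcpy(&mant, &bits, sizeof mant);
    return mant;
}

/* Rebuild x as mantissa * 2^exponent using exact power-of-two scaling. */
float model_recombine_f32(float mant, float exponent)
{
    float scale = 1.0f;
    for (int e = (int)exponent; e > 0; e--) scale *= 2.0f;
    for (int e = (int)exponent; e < 0; e++) scale *= 0.5f;
    return mant * scale;
}
```

Because scaling by a power of two is exact for normal values, model_recombine_f32(model_getmant_f32(x), model_getexp_f32(x)) returns x unchanged for inputs such as 10.0f or -0.375f.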



_mm256_insertf32x4

__m256 _mm256_insertf32x4(__m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to the return value, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm256_mask_insertf32x4

__m256 _mm256_mask_insertf32x4(__m256 src, __mmask8 k, __m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_insertf32x4

__m256 _mm256_maskz_insertf32x4(__mmask8 k, __m256 a, __m128 b, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vinsertf32x4

Copy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
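
The copy, insert, then mask sequence described for the vinsertf32x4 forms can be sketched in scalar C. The name model_mask_insertf32x4 and its array-based signature are hypothetical, and the model assumes imm is 0 or 1 (lower or upper 128 bits of the 256-bit result).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_insertf32x4 (not an Intel API). */
void model_mask_insertf32x4(float dst[8], const float src[8], uint8_t k,
                            const float a[8], const float b[4], int imm)
{
    assert(imm == 0 || imm == 1);   /* assumption: imm picks the low or high half */
    float tmp[8];
    for (int i = 0; i < 8; i++) tmp[i] = a[i];             /* copy a to tmp */
    for (int i = 0; i < 4; i++) tmp[4 * imm + i] = b[i];   /* overwrite one half with b */
    for (int i = 0; i < 8; i++)                            /* writemask against src */
        dst[i] = ((k >> i) & 1) ? tmp[i] : src[i];
}
```

Note that the mask is applied per 32-bit element after the insertion, so a mask such as 0x3C can keep part of a, part of b, and part of src in the same result.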



_mm512_insertf32x8

__m512 _mm512_insertf32x8(__m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to the return value, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm512_mask_insertf32x8

__m512 _mm512_mask_insertf32x8(__m512 src, __mmask16 k, __m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_insertf32x8

__m512 _mm512_maskz_insertf32x8(__mmask16 k, __m512 a, __m256 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf32x8

Copy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_insertf64x2

__m256d _mm256_insertf64x2(__m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm256_mask_insertf64x2

__m256d _mm256_mask_insertf64x2(__m256d src, __mmask8 k, __m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_insertf64x2

__m256d _mm256_maskz_insertf64x2(__mmask8 k, __m256d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_insertf64x2

__m512d _mm512_insertf64x2(__m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to the return value, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into the return value at the location specified by imm.



_mm512_mask_insertf64x2

__m512d _mm512_mask_insertf64x2(__m512d src, __mmask8 k, __m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_insertf64x2

__m512d _mm512_maskz_insertf64x2(__mmask8 k, __m512d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vinsertf64x2

Copy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm. Store tmp to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
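
For the 512-bit vinsertf64x2 forms there are four possible insertion positions. A scalar sketch of the zeromask form follows; model_maskz_insertf64x2 is a hypothetical name, and the model assumes imm is in the range 0 to 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm512_maskz_insertf64x2 (not an Intel API). */
void model_maskz_insertf64x2(double dst[8], uint8_t k, const double a[8],
                             const double b[2], int imm)
{
    assert(imm >= 0 && imm <= 3);   /* assumption: imm names one of four 128-bit slots */
    double tmp[8];
    for (int i = 0; i < 8; i++) tmp[i] = a[i];             /* copy a to tmp */
    for (int i = 0; i < 2; i++) tmp[2 * imm + i] = b[i];   /* insert b at slot imm */
    for (int i = 0; i < 8; i++)                            /* zeromask */
        dst[i] = ((k >> i) & 1) ? tmp[i] : 0.0;
}
```

With imm = 2 the two elements of b land in positions 4 and 5 of the result; any position whose mask bit is clear is returned as 0.0.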



_mm_mask2_permutex2var_pd

__m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx, __mmask8 k, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_pd

__m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx, __mmask8 k, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_pd

__m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_pd

__m128d _mm_permutex2var_pd(__m128d a, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_pd

__m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_pd

__m256d _mm256_permutex2var_pd(__m256d a, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2pd, vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
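
The "selector and index" wording for the two-source permutes can be made concrete with a scalar model of the 256-bit double-precision forms. The sketch assumes that, in each 64-bit element of idx, bits 1:0 are the element index and bit 2 is the source selector (0 for a, 1 for b), with higher bits ignored; this bit layout should be checked against the instruction reference. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_permutex2var_pd (not an Intel API).
   Assumption: idx bits 1:0 pick the element, bit 2 picks the source. */
void model_permutex2var_pd(double dst[4], const double a[4],
                           const uint64_t idx[4], const double b[4])
{
    assert(dst && a && idx && b);
    for (int i = 0; i < 4; i++) {
        unsigned element = (unsigned)(idx[i] & 3);
        int from_b = (int)((idx[i] >> 2) & 1);
        dst[i] = from_b ? b[element] : a[element];
    }
}

/* Model of _mm256_maskz_permutex2var_pd: clear-mask lanes become 0.0. */
void model_maskz_permutex2var_pd(double dst[4], uint8_t k, const double a[4],
                                 const uint64_t idx[4], const double b[4])
{
    double tmp[4];
    model_permutex2var_pd(tmp, a, idx, b);
    for (int i = 0; i < 4; i++)
        dst[i] = ((k >> i) & 1) ? tmp[i] : 0.0;
}
```

Any result element can come from any position of either source, which is what "across lanes" refers to.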



_mm_mask2_permutex2var_ps

__m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx, __mmask8 k, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_ps

__m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx, __mmask8 k, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_ps

__m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_ps

__m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_ps

__m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_ps

__m256 _mm256_permutex2var_ps(__m256 a, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2ps, vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results.
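
The mask2 forms differ from the other masked forms in that unselected elements come from idx rather than from a separate src operand. The scalar sketch below models the 128-bit single-precision case. It assumes that bits 1:0 of each 32-bit idx element are the element index and bit 2 is the source selector (0 for a, 1 for b); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm_mask2_permutex2var_ps (not an Intel API).
   Assumption: idx bits 1:0 pick the element, bit 2 picks the source. */
void model_mask2_permutex2var_ps(float dst[4], const float a[4],
                                 const uint32_t idx[4], uint8_t k,
                                 const float b[4])
{
    assert(dst && a && idx && b);
    for (int i = 0; i < 4; i++) {
        if ((k >> i) & 1) {
            unsigned element = idx[i] & 3;
            dst[i] = ((idx[i] >> 2) & 1) ? b[element] : a[element];
        } else {
            /* mask2 form: the raw bits of idx[i] pass through unchanged */
            memcpy(&dst[i], &idx[i], sizeof dst[i]);
        }
    }
}
```

The pass-through lanes hold integer index bits reinterpreted as float, so they are normally only meaningful if the caller treats them as integers again.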



_mm_mask_permute_pd

__m128d _mm_mask_permute_pd(__m128d src, __mmask8 k, __m128d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_mask_permutevar_pd

__m128d _mm_mask_permutevar_pd(__m128d src, __mmask8 k, __m128d a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permute_pd

__m128d _mm_maskz_permute_pd(__mmask8 k, __m128d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_maskz_permutevar_pd

__m128d _mm_maskz_permutevar_pd(__mmask8 k, __m128d a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_permute_pd

__m256d _mm256_mask_permute_pd(__m256d src, __mmask8 k, __m256d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutevar_pd

__m256d _mm256_mask_permutevar_pd(__m256d src, __mmask8 k, __m256d a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permute_pd

__m256d _mm256_maskz_permute_pd(__mmask8 k, __m256d a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutevar_pd

__m256d _mm256_maskz_permutevar_pd(__mmask8 k, __m256d a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilpd

Shuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
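
The "within 128-bit lanes" restriction of the vpermilpd forms means each result element can only come from one of the two elements in its own lane. The scalar sketch below models the 256-bit writemask forms. It assumes that bit i of imm selects for result element i, and that the variable form reads bit 1 (not bit 0) of each 64-bit element of b; both assumptions should be checked against the instruction reference. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_permute_pd (not an Intel API).
   Assumption: bit i of imm picks the low (0) or high (1) element of the lane. */
void model_mask_permute_pd(double dst[4], const double src[4], uint8_t k,
                           const double a[4], int imm)
{
    assert(dst && src && a);
    for (int i = 0; i < 4; i++) {
        int lane_base = i & ~1;   /* 0 for the low lane, 2 for the high lane */
        int select = (imm >> i) & 1;
        dst[i] = ((k >> i) & 1) ? a[lane_base + select] : src[i];
    }
}

/* Hypothetical scalar model of _mm256_mask_permutevar_pd.
   Assumption: the selector is bit 1 of each 64-bit element of b. */
void model_mask_permutevar_pd(double dst[4], const double src[4], uint8_t k,
                              const double a[4], const uint64_t b[4])
{
    assert(dst && src && a && b);
    for (int i = 0; i < 4; i++) {
        int lane_base = i & ~1;
        int select = (int)((b[i] >> 1) & 1);
        dst[i] = ((k >> i) & 1) ? a[lane_base + select] : src[i];
    }
}
```

In this model imm = 0x5 swaps the two elements inside each lane, and a control vector holding the value 2 in elements 0 and 2 (and 0 elsewhere) produces the same swap.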



_mm_mask_permute_ps

__m128 _mm_mask_permute_ps(__m128 src, __mmask8 k, __m128 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_mask_permutevar_ps

__m128 _mm_mask_permutevar_ps(__m128 src, __mmask8 k, __m128 a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permute_ps

__m128 _mm_maskz_permute_ps(__mmask8 k, __m128 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_maskz_permutevar_ps

__m128 _mm_maskz_permutevar_ps(__mmask8 k, __m128 a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_permute_ps

__m256 _mm256_mask_permute_ps(__m256 src, __mmask8 k, __m256 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutevar_ps

__m256 _mm256_mask_permutevar_ps(__m256 src, __mmask8 k, __m256 a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permute_ps

__m256 _mm256_maskz_permute_ps(__mmask8 k, __m256 a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutevar_ps

__m256 _mm256_maskz_permutevar_ps(__mmask8 k, __m256 a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermilps

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
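
The within-lane behaviour of the imm forms above can be illustrated with a scalar model. This is a minimal sketch, not Intel code: model_permute_ps is a hypothetical helper name, and passing NULL for src is an assumed convention of this sketch that stands in for the zeromask (maskz) variants. Each 2-bit field of imm selects one of the four elements of the same 128-bit lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked permute_ps forms (imm control).
   n is the element count (4 for __m128, 8 for __m256).
   src == NULL models the zeromask variants. */
static void model_permute_ps(float *dst, const float *src, uint8_t k,
                             const float *a, int imm, int n)
{
    assert(n == 4 || n == 8);
    assert(dst != a);
    for (int i = 0; i < n; i++) {
        int lane_base = i & ~3;                  /* first element of this 128-bit lane */
        int sel = (imm >> (2 * (i & 3))) & 3;    /* 2-bit selector for this slot */
        if ((k >> i) & 1)
            dst[i] = a[lane_base + sel];         /* mask bit set: permuted element */
        else
            dst[i] = src ? src[i] : 0.0f;        /* writemask copies src, zeromask writes 0 */
    }
}
```

With imm = 0x1B the four elements of every lane are reversed, because the selectors for slots 0 to 3 are 3, 2, 1, and 0.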



_mm256_mask_permutex_pd

__m256d _mm256_mask_permutex_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutexvar_pd

__m256d _mm256_mask_permutexvar_pd(__m256d src, __mmask8 k, __m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutex_pd

__m256d _mm256_maskz_permutex_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_pd

__m256d _mm256_maskz_permutexvar_pd(__mmask8 k, __m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex_pd

__m256d _mm256_permutex_pd(__m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm, and return the results.



_mm256_permutexvar_pd

__m256d _mm256_permutexvar_pd(__m256i idx, __m256d a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermpd

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.
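
As a rough illustration of the difference between the imm and idx controls, the following scalar sketch models the unmasked 256-bit forms. The helper names are hypothetical. The sketch assumes that each destination element uses a 2-bit selector: consecutive 2-bit fields of imm for permutex, and the low two bits of the matching idx element for permutexvar.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_permutex_pd: element i of the result
   is chosen by bits [2*i+1 : 2*i] of imm, from anywhere in the vector. */
static void model_permutex_pd(double dst[4], const double a[4], int imm)
{
    assert(dst != a);
    for (int i = 0; i < 4; i++)
        dst[i] = a[(imm >> (2 * i)) & 3];
}

/* Hypothetical scalar model of _mm256_permutexvar_pd: element i of the
   result is chosen by the low two bits of idx[i]. */
static void model_permutexvar_pd(double dst[4], const int64_t idx[4],
                                 const double a[4])
{
    assert(dst != a);
    for (int i = 0; i < 4; i++)
        dst[i] = a[idx[i] & 3];
}
```

Unlike the permute_ps and permute_pd forms, the selector here can reach any element of the 256-bit vector, not only elements of the same 128-bit lane.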



_mm256_mask_permutexvar_ps

__m256 _mm256_mask_permutexvar_ps(__m256 src, __mmask8 k, __m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_ps

__m256 _mm256_maskz_permutexvar_ps(__mmask8 k, __m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_ps

__m256 _mm256_permutexvar_ps(__m256i idx, __m256 a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermps

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and return the results.
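
The three permutexvar_ps forms differ only in how the mask is applied, which a scalar sketch can make concrete. The helper name below is hypothetical, NULL for src is an assumed convention of this sketch that stands in for the zeromask form, and k = 0xFF reproduces the unmasked form. The sketch assumes only the low three bits of each index are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 256-bit permutexvar_ps family.
   src == NULL models _mm256_maskz_permutexvar_ps;
   k == 0xFF models the unmasked _mm256_permutexvar_ps. */
static void model_permutexvar_ps(float dst[8], const float *src, uint8_t k,
                                 const int32_t idx[8], const float a[8])
{
    assert(dst != a);
    for (int i = 0; i < 8; i++) {
        if ((k >> i) & 1)
            dst[i] = a[idx[i] & 7];          /* low 3 bits pick any of 8 elements */
        else
            dst[i] = src ? src[i] : 0.0f;    /* masked off: keep src or write zero */
    }
}
```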



_mm_mask_permutex2var_pd

__m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k, __m128i idx, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_pd

__m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k, __m256i idx, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2pd

Shuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutex2var_ps

__m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k, __m128i idx, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_ps

__m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k, __m256i idx, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2ps

Shuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
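
The "selector and index" wording can be read as a lookup into a two-vector table. The sketch below models the 256-bit single-precision form under that reading. The helper name is hypothetical, and the bit layout is an assumption of the sketch: bits [2:0] of each idx element are the index and bit 3 is the selector (0 picks from a, 1 picks from b). Narrower element counts would use fewer index bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm256_mask_permutex2var_ps.
   Masked-off elements are taken from a, as the description states. */
static void model_mask_permutex2var_ps(float dst[8], const float a[8],
                                       uint8_t k, const int32_t idx[8],
                                       const float b[8])
{
    assert(dst != a && dst != b);
    for (int i = 0; i < 8; i++) {
        const float *table = (idx[i] & 8) ? b : a;   /* selector bit */
        float picked = table[idx[i] & 7];            /* index bits */
        dst[i] = ((k >> i) & 1) ? picked : a[i];
    }
}
```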



_mm_mask_range_pd

__m128d _mm_mask_range_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_range_pd

__m128d _mm_maskz_range_pd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_range_pd

__m128d _mm_range_pd(__m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.



_mm256_mask_range_pd

__m256d _mm256_mask_range_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_range_pd

__m256d _mm256_maskz_range_pd(__mmask8 k, __m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_range_pd

__m256d _mm256_range_pd(__m256d a, __m256d b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.



_mm512_mask_range_pd

__m512d _mm512_mask_range_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_range_round_pd

__m512d _mm512_mask_range_round_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_range_pd

__m512d _mm512_maskz_range_pd(__mmask8 k, __m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_range_round_pd

__m512d _mm512_maskz_range_round_pd(__mmask8 k, __m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_range_pd

__m512d _mm512_range_pd(__m512d a, __m512d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results.



_mm512_range_round_pd

__m512d _mm512_range_round_pd(__m512d a, __m512d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangepd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed double-precision (64-bit) floating-point elements in a and b, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.
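
The descriptions above say only that imm selects the operation. The following scalar sketch shows one element of the computation. It assumes the encoding documented for the vrangepd instruction: imm[1:0] selects the operation (0 = min, 1 = max, 2 = absolute min, 3 = absolute max) and imm[3:2] selects the sign of the result (0 = sign of a, 1 = sign of the selected value, 2 = clear the sign bit, 3 = set the sign bit). The helper names are hypothetical, and the model deliberately ignores NaN, denormal, and signed-zero special cases.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }
static double abs_of(double x) { return from_bits(bits_of(x) & ~(1ULL << 63)); }

/* Hypothetical, simplified model of one element of the range_pd family. */
static double model_range_pd_elem(double a, double b, int imm)
{
    assert(imm >= 0 && imm <= 15);
    double r;
    switch (imm & 3) {                       /* operation control */
    case 0:  r = (a <= b) ? a : b; break;                    /* min */
    case 1:  r = (a <= b) ? b : a; break;                    /* max */
    case 2:  r = (abs_of(a) <= abs_of(b)) ? a : b; break;    /* absolute min */
    default: r = (abs_of(a) <= abs_of(b)) ? b : a; break;    /* absolute max */
    }
    const uint64_t sign = 1ULL << 63;
    uint64_t u = bits_of(r);
    switch ((imm >> 2) & 3) {                /* sign control */
    case 0:  u = (u & ~sign) | (bits_of(a) & sign); break;   /* sign of a */
    case 1:  break;                                          /* sign of selected value */
    case 2:  u &= ~sign; break;                              /* clear sign bit */
    default: u |= sign; break;                               /* set sign bit */
    }
    return from_bits(u);
}
```

For example, with a = -3.0 and b = 2.0, imm = 0x5 behaves as a plain max and yields 2.0, while imm = 0xB (absolute max with the sign cleared) yields 3.0.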



_mm_mask_range_ps

__m128 _mm_mask_range_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_range_ps

__m128 _mm_maskz_range_ps(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_range_ps

__m128 _mm_range_ps(__m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm256_mask_range_ps

__m256 _mm256_mask_range_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_range_ps

__m256 _mm256_maskz_range_ps(__mmask8 k, __m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_range_ps

__m256 _mm256_range_ps(__m256 a, __m256 b, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm512_mask_range_ps

__m512 _mm512_mask_range_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_range_round_ps

__m512 _mm512_mask_range_round_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_range_ps

__m512 _mm512_maskz_range_ps(__mmask16 k, __m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_range_round_ps

__m512 _mm512_maskz_range_round_ps(__mmask16 k, __m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_range_ps

__m512 _mm512_range_ps(__m512 a, __m512 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results.



_mm512_range_round_ps

__m512 _mm512_range_round_ps(__m512 a, __m512 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangeps

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for packed single-precision (32-bit) floating-point elements in a and b, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.
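
Because imm packs two independent 2-bit fields, it can be easier to build it from named constants than to write a literal. The enum and function names below are hypothetical and are not defined by immintrin.h. The bit layout assumed is the one documented for the vrangeps instruction: operation in imm[1:0], sign control in imm[3:2].

```c
#include <assert.h>

/* Hypothetical names for the operation field, imm[1:0]. */
enum range_op {
    RANGE_MIN = 0, RANGE_MAX = 1, RANGE_ABS_MIN = 2, RANGE_ABS_MAX = 3
};

/* Hypothetical names for the sign-control field, imm[3:2]. */
enum range_sign {
    RANGE_SIGN_FROM_A = 0, RANGE_SIGN_FROM_RESULT = 1,
    RANGE_SIGN_CLEAR = 2, RANGE_SIGN_SET = 3
};

/* Combine the two fields into the value passed as imm. */
static int range_imm(enum range_op op, enum range_sign sign)
{
    assert((int)op >= 0 && (int)op <= 3);
    assert((int)sign >= 0 && (int)sign <= 3);
    return ((int)sign << 2) | (int)op;
}
```

Since the intrinsics generally require imm to be a compile-time constant, real code would more likely express the same combination as a macro or enum constant. The function form is used here only to keep the sketch checkable.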



_mm_mask_range_round_sd

__m128d _mm_mask_range_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_sd

__m128d _mm_mask_range_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_maskz_range_round_sd

__m128d _mm_maskz_range_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_range_sd

__m128d _mm_maskz_range_sd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_range_round_sd

__m128d _mm_range_round_sd(__m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangesd

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_round_ss

__m128 _mm_mask_range_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_range_ss

__m128 _mm_mask_range_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_maskz_range_round_ss

__m128 _mm_maskz_range_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_range_ss

__m128 _mm_maskz_range_ss(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of the return value.



_mm_range_round_ss

__m128 _mm_range_round_ss(__m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vrangess

Calculate the max, min, absolute max, or absolute min (depending on control in imm) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of the return value, and copy the upper 3 packed elements from a to the upper elements of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.
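
The scalar (ss) forms touch only element 0 and consult only mask bit 0. The sketch below models that data movement. The helper names are hypothetical, the element operation is passed as a function pointer so the sketch stays short, and plain_max stands in for the operation selected by imm = 0x5 under the vrangess encoding (an assumption of this sketch). NULL for src stands in for the zeromask forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef float (*elem_op)(float, float);

/* Stand-in for the operation selected by imm (here: plain max). */
static float plain_max(float x, float y) { return (x <= y) ? y : x; }

/* Hypothetical scalar model of the range_ss family.
   src == NULL models the maskz forms; k = 1 models the unmasked form. */
static void model_range_ss(float dst[4], const float *src, uint8_t k,
                           const float a[4], const float b[4], elem_op op)
{
    assert(op != NULL);
    float low = op(a[0], b[0]);
    if (!(k & 1))                        /* only mask bit 0 matters */
        low = src ? src[0] : 0.0f;
    float up1 = a[1], up2 = a[2], up3 = a[3];
    dst[0] = low;
    dst[1] = up1;                        /* upper 3 elements always come from a */
    dst[2] = up2;
    dst[3] = up3;
}
```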



_mm_mask_reduce_pd

__m128d _mm_mask_reduce_pd(__m128d src, __mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_reduce_pd

__m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_reduce_pd

__m128d _mm_reduce_pd(__m128d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm256_mask_reduce_pd

__m256d _mm256_mask_reduce_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_reduce_pd

__m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_reduce_pd

__m256d _mm256_reduce_pd(__m256d a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_mask_reduce_pd

__m512d _mm512_mask_reduce_pd(__m512d src, __mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_reduce_round_pd

__m512d _mm512_mask_reduce_round_pd(__m512d src, __mmask8 k, __m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_reduce_pd

__m512d _mm512_maskz_reduce_pd(__mmask8 k, __m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_reduce_round_pd

__m512d _mm512_maskz_reduce_round_pd(__mmask8 k, __m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_reduce_pd

__m512d _mm512_reduce_pd(__m512d a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_reduce_round_pd

__m512d _mm512_reduce_round_pd(__m512d a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducepd

Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.
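
"Reduced argument" here means what is left after subtracting the input rounded to a fixed number of fraction bits. The scalar sketch below shows one element. It assumes the vreducepd encoding, where M = imm[7:4] is the number of fraction bits kept and imm[1:0] is the rounding mode (0 = nearest even, 1 = down, 2 = up, 3 = toward zero), so the result is a - round(a * 2^M) / 2^M. The helper names are hypothetical. The sketch does not model imm[2] (use the MXCSR rounding mode) or special values such as NaN and infinity.

```c
#include <assert.h>

/* Round x to an integer value using the 2-bit rounding code rc. */
static double round_to_int(double x, int rc)
{
    /* Values this large are already integers in double precision. */
    if (x >= 4503599627370496.0 || x <= -4503599627370496.0)
        return x;
    double t = (double)(long long)x;            /* truncate toward zero */
    double f = (t > x) ? t - 1.0 : t;           /* floor */
    switch (rc & 3) {
    case 1:  return f;                          /* round down */
    case 2:  return (t < x) ? t + 1.0 : t;      /* round up */
    case 3:  return t;                          /* round toward zero */
    default: {                                  /* round to nearest even */
        double d = x - f;
        if (d < 0.5) return f;
        if (d > 0.5) return f + 1.0;
        return ((long long)f % 2 == 0) ? f : f + 1.0;
    }
    }
}

/* Hypothetical, simplified model of one element of the reduce_pd family. */
static double model_reduce_pd_elem(double a, int imm)
{
    assert((imm & 4) == 0);                     /* MXCSR rounding not modelled */
    int m = (imm >> 4) & 0xF;                   /* fraction bits to keep */
    double scale = (double)(1 << m);            /* 2^M */
    return a - round_to_int(a * scale, imm & 3) / scale;
}
```

For example, a = 2.75 with imm = 0x01 (M = 0, round down) leaves 0.75, and with imm = 0x11 (M = 1, round down) it leaves 0.25, because 2.75 rounded down to one fraction bit is 2.5.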



_mm_mask_reduce_ps

__m128 _mm_mask_reduce_ps(__m128 src, __mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_reduce_ps

__m128 _mm_maskz_reduce_ps(__mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_reduce_ps

__m128 _mm_reduce_ps(__m128 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm256_mask_reduce_ps

__m256 _mm256_mask_reduce_ps(__m256 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_reduce_ps

__m256 _mm256_maskz_reduce_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_reduce_ps

__m256 _mm256_reduce_ps(__m256 a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_mask_reduce_ps

__m512 _mm512_mask_reduce_ps(__m512 src, __mmask16 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_mask_reduce_round_ps

__m512 _mm512_mask_reduce_round_ps(__m512 src, __mmask16 k, __m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_maskz_reduce_ps

__m512 _mm512_maskz_reduce_ps(__mmask16 k, __m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_maskz_reduce_round_ps

__m512 _mm512_maskz_reduce_round_ps(__mmask16 k, __m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm512_reduce_ps

__m512 _mm512_reduce_ps(__m512 a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results.



_mm512_reduce_round_ps

__m512 _mm512_reduce_round_ps(__m512 a, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreduceps

Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm, and return the results. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_round_sd

__m128d _mm_mask_reduce_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_mask_reduce_sd

__m128d _mm_mask_reduce_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_maskz_reduce_round_sd

__m128d _mm_maskz_reduce_round_sd(__mmask8 k, __m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_maskz_reduce_sd

__m128d _mm_maskz_reduce_sd(__mmask8 k, __m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of the return value.



_mm_reduce_round_sd

__m128d _mm_reduce_round_sd(__m128d a, __m128d b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in rounding.



_mm_reduce_sd

__m128d _mm_reduce_sd(__m128d a, __m128d b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducesd

Extract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper element from a to the upper element of the return value.



_mm_mask_reduce_round_ss

__m128 _mm_mask_reduce_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_mask_reduce_ss

__m128 _mm_mask_reduce_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_maskz_reduce_round_ss

__m128 _mm_maskz_reduce_round_ss(__mmask8 k, __m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_maskz_reduce_ss

__m128 _mm_maskz_reduce_ss(__mmask8 k, __m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_reduce_round_ss

__m128 _mm_reduce_round_ss(__m128 a, __m128 b, int imm, int rounding)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_reduce_ss

__m128 _mm_reduce_ss(__m128 a, __m128 b, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vreducess

Extract the reduced argument of the lower single-precision (32-bit) floating-point element in a by the number of bits specified by imm, store the result in the lower element of the return value, and copy the upper 3 packed elements from b to the upper elements of the return value.



_mm_mask_roundscale_pd

__m128d _mm_mask_roundscale_pd(__m128d src, __mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_roundscale_pd

__m128d _mm_maskz_roundscale_pd(__mmask8 k, __m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_roundscale_pd

__m128d _mm_roundscale_pd(__m128d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.
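
The rounding to a given number of fraction bits can be sketched in portable C as follows. This is a scalar model rather than the intrinsic itself, and it assumes the usual vrndscalepd reading of imm (bits 7:4 give the number of fraction bits M, bits 1:0 give the rounding mode, bit 2 is clear). The names round_to_int_model and roundscale_pd_model are hypothetical. Special values and very large inputs are not modeled.

```c
#include <assert.h>

/* Hypothetical helper: round v to an integral value.
   rc: 0 = nearest (ties to even), 1 = toward -inf, 2 = toward +inf, 3 = toward zero. */
double round_to_int_model(double v, int rc)
{
    assert(v > -1e15 && v < 1e15);             /* keep the integer cast well defined */
    double t  = (double)(long long)v;          /* truncate toward zero */
    double lo = (t > v) ? t - 1.0 : t;         /* floor */
    double hi = (lo < v) ? lo + 1.0 : lo;      /* ceiling */
    if (rc == 1) return lo;
    if (rc == 2) return hi;
    if (rc == 3) return t;
    if (v - lo < hi - v) return lo;
    if (hi - v < v - lo) return hi;
    return ((long long)lo % 2 == 0) ? lo : hi; /* tie: pick the even neighbor */
}

/* Hypothetical model of _mm_roundscale_pd: each of the 2 packed elements is
   rounded to M = imm[7:4] fraction bits, i.e. round(a * 2^M) / 2^M. */
void roundscale_pd_model(const double a[2], int imm, double out[2])
{
    int m  = (imm >> 4) & 0xF;
    int rc = imm & 0x3;
    double scale = (double)(1 << m);           /* 2^M */
    for (int i = 0; i < 2; i++)
        out[i] = round_to_int_model(a[i] * scale, rc) / scale;
}
```

For example, with imm = 0x20 the model keeps two fraction bits, so 2.71875 becomes 2.75, while imm = 0x01 rounds toward negative infinity to an integer and gives 2.0.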



_mm256_mask_roundscale_pd

__m256d _mm256_mask_roundscale_pd(__m256d src, __mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_roundscale_pd

__m256d _mm256_maskz_roundscale_pd(__mmask8 k, __m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_roundscale_pd

__m256d _mm256_roundscale_pd(__m256d a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscalepd

Round packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm_mask_roundscale_ps

__m128 _mm_mask_roundscale_ps(__m128 src, __mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_roundscale_ps

__m128 _mm_maskz_roundscale_ps(__mmask8 k, __m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_roundscale_ps

__m128 _mm_roundscale_ps(__m128 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm256_mask_roundscale_ps

__m256 _mm256_mask_roundscale_ps(__m256 src, __mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_roundscale_ps

__m256 _mm256_maskz_roundscale_ps(__mmask8 k, __m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_roundscale_ps

__m256 _mm256_roundscale_ps(__m256 a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vrndscaleps

Round packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm, and return the results.



_mm_mask_scalef_pd

__m128d _mm_mask_scalef_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_scalef_pd

__m128d _mm_maskz_scalef_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_scalef_pd

__m128d _mm_scalef_pd(__m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.
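
The scaling performed by vscalefpd is commonly described as a * 2^floor(b) per element. The sketch below is a portable scalar model under that assumption. The names scalef_model and scalef_pd_model are hypothetical, and special cases (NaN, infinities, denormals, overflow) are deliberately left out.

```c
#include <assert.h>

/* Hypothetical scalar model of one lane: a * 2^floor(b). */
double scalef_model(double a, double b)
{
    assert(b > -1000.0 && b < 1000.0);  /* keep the loops short and the cast valid */
    long long e = (long long)b;         /* truncates toward zero */
    if ((double)e > b)
        e -= 1;                         /* adjust to floor for negative non-integers */
    double r = a;
    for (; e > 0; e--) r *= 2.0;        /* positive exponent: multiply by 2 */
    for (; e < 0; e++) r *= 0.5;        /* negative exponent: divide by 2 */
    return r;
}

/* Hypothetical model of _mm_scalef_pd over the 2 packed elements. */
void scalef_pd_model(const double a[2], const double b[2], double out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = scalef_model(a[i], b[i]);
}
```

With a = {3.0, 3.0} and b = {2.5, -1.5}, the model yields {12.0, 0.75}, because floor(2.5) = 2 and floor(-1.5) = -2.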



_mm256_mask_scalef_pd

__m256d _mm256_mask_scalef_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_scalef_pd

__m256d _mm256_maskz_scalef_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_scalef_pd

__m256d _mm256_scalef_pd(__m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefpd

Scale the packed double-precision (64-bit) floating-point elements in a using values from b, and return the results.



_mm_mask_scalef_ps

__m128 _mm_mask_scalef_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_scalef_ps

__m128 _mm_maskz_scalef_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_scalef_ps

__m128 _mm_scalef_ps(__m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.



_mm256_mask_scalef_ps

__m256 _mm256_mask_scalef_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_scalef_ps

__m256 _mm256_maskz_scalef_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_scalef_ps

__m256 _mm256_scalef_ps(__m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vscalefps

Scale the packed single-precision (32-bit) floating-point elements in a using values from b, and return the results.



_mm256_mask_shuffle_f32x4

__m256 _mm256_mask_shuffle_f32x4(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_f32x4

__m256 _mm256_maskz_shuffle_f32x4(__mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_f32x4

__m256 _mm256_shuffle_f32x4(__m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff32x4

Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by imm from a and b, and return the results.
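
The block selection can be modeled in portable C as shown below. This sketch assumes the 256-bit vshuff32x4 behavior in which imm bit 0 chooses the block of a placed in the low half of the result and imm bit 1 chooses the block of b placed in the high half. The name shuffle_f32x4_model is hypothetical, and a 256-bit vector is represented as an array of 8 floats with element 0 lowest.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of _mm256_shuffle_f32x4.
   Elements 0..3 form the low 128-bit block, elements 4..7 the high block. */
void shuffle_f32x4_model(const float a[8], const float b[8], int imm, float out[8])
{
    assert(imm >= 0 && imm <= 255);
    int sel_a = imm & 1;          /* which block of a goes to the low half */
    int sel_b = (imm >> 1) & 1;   /* which block of b goes to the high half */
    memcpy(out,     a + 4 * sel_a, 4 * sizeof(float));
    memcpy(out + 4, b + 4 * sel_b, 4 * sizeof(float));
}
```

With a = {0..7} and b = {10..17}, imm = 1 yields {4, 5, 6, 7, 10, 11, 12, 13} in this model.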



_mm256_mask_shuffle_f64x2

__m256d _mm256_mask_shuffle_f64x2(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_f64x2

__m256d _mm256_maskz_shuffle_f64x2(__mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_shuffle_f64x2

__m256d _mm256_shuffle_f64x2(__m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshuff64x2

Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by imm from a and b, and return the results.



_mm_mask_shuffle_pd

__m128d _mm_mask_shuffle_pd(__m128d src, __mmask8 k, __m128d a, __m128d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements from a and b using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_pd

__m128d _mm_maskz_shuffle_pd(__mmask8 k, __m128d a, __m128d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements from a and b using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_pd

__m256d _mm256_mask_shuffle_pd(__m256d src, __mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
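
A portable C sketch of the per-lane selection and the writemask step is given below. It assumes the usual vshufpd layout: within each 128-bit lane, the even result element is taken from a and the odd one from b, and imm bit i selects the low (0) or high (1) element of that lane for result element i. The name mask_shuffle_pd256_model is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of _mm256_mask_shuffle_pd.
   A 256-bit vector is an array of 4 doubles; lanes are elements {0,1} and {2,3}. */
void mask_shuffle_pd256_model(const double src[4], unsigned k,
                              const double a[4], const double b[4],
                              int imm, double out[4])
{
    assert(imm >= 0 && imm <= 255);
    for (int i = 0; i < 4; i++) {
        int lane = i / 2;                           /* 128-bit lane index */
        int sel  = (imm >> i) & 1;                  /* low or high element of the lane */
        const double *from = (i % 2 == 0) ? a : b;  /* even slots read a, odd slots read b */
        double v = from[2 * lane + sel];
        out[i] = ((k >> i) & 1) ? v : src[i];       /* writemask: keep src when bit is clear */
    }
}
```

With a = {0, 1, 2, 3}, b = {10, 11, 12, 13}, and imm = 5, a full mask gives {1, 10, 3, 12}; with k = 0x5 the odd elements are taken from src instead.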



_mm256_maskz_shuffle_pd

__m256d _mm256_maskz_shuffle_pd(__mmask8 k, __m256d a, __m256d b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufpd

Shuffle double-precision (64-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_shuffle_ps

__m128 _mm_mask_shuffle_ps(__m128 src, __mmask8 k, __m128 a, __m128 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_ps

__m128 _mm_maskz_shuffle_ps(__mmask8 k, __m128 a, __m128 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
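
The following portable C sketch models the 128-bit form with a zeromask. It assumes the usual vshufps control layout: the two low result elements are picked from a by imm bits 1:0 and 3:2, and the two high result elements are picked from b by imm bits 5:4 and 7:6. The name maskz_shuffle_ps_model is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of _mm_maskz_shuffle_ps; vectors are arrays of 4 floats. */
void maskz_shuffle_ps_model(unsigned k, const float a[4], const float b[4],
                            int imm, float out[4])
{
    assert(imm >= 0 && imm <= 255);
    float v[4];
    v[0] = a[imm & 3];           /* result element 0 comes from a */
    v[1] = a[(imm >> 2) & 3];    /* result element 1 comes from a */
    v[2] = b[(imm >> 4) & 3];    /* result element 2 comes from b */
    v[3] = b[(imm >> 6) & 3];    /* result element 3 comes from b */
    for (int i = 0; i < 4; i++)
        out[i] = ((k >> i) & 1) ? v[i] : 0.0f;   /* zeromask: clear when bit is not set */
}
```

With a = {0, 1, 2, 3}, b = {10, 11, 12, 13}, and imm = 0x1B, a full mask gives {3, 2, 11, 10}; with k = 0x9 the two middle elements are zeroed.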



_mm256_mask_shuffle_ps

__m256 _mm256_mask_shuffle_ps(__m256 src, __mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_ps

__m256 _mm256_maskz_shuffle_ps(__mmask8 k, __m256 a, __m256 b, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vshufps

Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpackhi_pd

__m128d _mm_mask_unpackhi_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_pd

__m128d _mm_maskz_unpackhi_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_pd

__m256d _mm256_mask_unpackhi_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_pd

__m256d _mm256_maskz_unpackhi_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhpd

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpackhi_ps

__m128 _mm_mask_unpackhi_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpackhi_ps

__m128 _mm_maskz_unpackhi_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpackhi_ps

__m256 _mm256_mask_unpackhi_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpackhi_ps

__m256 _mm256_maskz_unpackhi_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpckhps

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
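
The interleaving of the high halves can be sketched in portable C as follows. This is a model of the 256-bit zeromask form, assuming the usual vunpckhps layout in which each 128-bit lane produces {a2, b2, a3, b3} from its own four elements. The name maskz_unpackhi_ps256_model is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of _mm256_maskz_unpackhi_ps; vectors are arrays of 8 floats. */
void maskz_unpackhi_ps256_model(unsigned k, const float a[8], const float b[8],
                                float out[8])
{
    assert(k <= 0xFFu);                     /* __mmask8: one bit per result element */
    for (int lane = 0; lane < 2; lane++) {
        const float *la = a + 4 * lane;     /* this lane of a */
        const float *lb = b + 4 * lane;     /* this lane of b */
        float v[4] = { la[2], lb[2], la[3], lb[3] };  /* high half, interleaved */
        for (int j = 0; j < 4; j++) {
            int i = 4 * lane + j;
            out[i] = ((k >> i) & 1) ? v[j] : 0.0f;
        }
    }
}
```

With a = {0..7} and b = {10..17}, a full mask gives {2, 12, 3, 13, 6, 16, 7, 17}.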



_mm_mask_unpacklo_pd

__m128d _mm_mask_unpacklo_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_pd

__m128d _mm_maskz_unpacklo_pd(__mmask8 k, __m128d a, __m128d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_pd

__m256d _mm256_mask_unpacklo_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
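
A portable C sketch of this operation is shown below. It assumes the usual vunpcklpd layout, in which each 128-bit lane of the result holds the low element of that lane of a followed by the low element of that lane of b. The name mask_unpacklo_pd256_model is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of _mm256_mask_unpacklo_pd; vectors are arrays of 4 doubles. */
void mask_unpacklo_pd256_model(const double src[4], unsigned k,
                               const double a[4], const double b[4],
                               double out[4])
{
    assert(k <= 0xFFu);   /* __mmask8; only the low 4 bits are used here */
    for (int i = 0; i < 4; i++) {
        int lane = i / 2;
        /* even slots take the lane's low element of a, odd slots that of b */
        double v = (i % 2 == 0) ? a[2 * lane] : b[2 * lane];
        out[i] = ((k >> i) & 1) ? v : src[i];
    }
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13}, a full mask gives {0, 10, 2, 12}; with k = 0x6 the first and last elements come from src.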



_mm256_maskz_unpacklo_pd

__m256d _mm256_maskz_unpacklo_pd(__mmask8 k, __m256d a, __m256d b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklpd

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_unpacklo_ps

__m128 _mm_mask_unpacklo_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_unpacklo_ps

__m128 _mm_maskz_unpacklo_ps(__mmask8 k, __m128 a, __m128 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_unpacklo_ps

__m256 _mm256_mask_unpacklo_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_unpacklo_ps

__m256 _mm256_maskz_unpacklo_ps(__mmask8 k, __m256 a, __m256 b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vunpcklps

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_alignr_epi32

__m128i _mm_alignr_epi32(__m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value.
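
The concatenate-and-shift step can be modeled in portable C as shown below. The sketch assumes that a forms the upper half and b the lower half of the concatenation, and that the 128-bit form uses only the low 2 bits of count. The name alignr_epi32_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of _mm_alignr_epi32; vectors are arrays of 4 int32_t. */
void alignr_epi32_model(const int32_t a[4], const int32_t b[4],
                        int count, int32_t out[4])
{
    assert(count >= 0 && count <= 255);
    int32_t cat[8];
    for (int i = 0; i < 4; i++) {
        cat[i]     = b[i];    /* b is the low half of the concatenation */
        cat[4 + i] = a[i];    /* a is the high half */
    }
    int shift = count & 3;    /* assumption: only the low 2 bits are used */
    for (int i = 0; i < 4; i++)
        out[i] = cat[i + shift];   /* right shift by whole elements, keep the low 4 */
}
```

With a = {10, 11, 12, 13} and b = {0, 1, 2, 3}, count = 1 gives {1, 2, 3, 10} in this model.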



_mm_mask_alignr_epi32

__m128i _mm_mask_alignr_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi32

__m128i _mm_maskz_alignr_epi32(__mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 32-bit elements, and store the low 16 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_alignr_epi32

__m256i _mm256_alignr_epi32(__m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value.



_mm256_mask_alignr_epi32

__m256i _mm256_mask_alignr_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi32

__m256i _mm256_maskz_alignr_epi32(__mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignd

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 32-bit elements, and store the low 32 bytes (8 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_alignr_epi64

__m128i _mm_alignr_epi64(__m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value.



_mm_mask_alignr_epi64

__m128i _mm_mask_alignr_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi64

__m128i _mm_maskz_alignr_epi64(__mmask8 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 32-byte intermediate result, shift the result right by count 64-bit elements, and store the low 16 bytes (2 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_alignr_epi64

__m256i _mm256_alignr_epi64(__m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value.



_mm256_mask_alignr_epi64

__m256i _mm256_mask_alignr_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi64

__m256i _mm256_maskz_alignr_epi64(__mmask8 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): valignq

Concatenate a and b into a 64-byte intermediate result, shift the result right by count 64-bit elements, and store the low 32 bytes (4 elements) in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
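
A portable C sketch of the 64-bit element form with a zeromask follows. It assumes that a is the upper half and b the lower half of the concatenation, and that the 256-bit form uses only the low 2 bits of count. The name maskz_alignr_epi64_256_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of _mm256_maskz_alignr_epi64; vectors are arrays of 4 int64_t. */
void maskz_alignr_epi64_256_model(unsigned k, const int64_t a[4], const int64_t b[4],
                                  int count, int64_t out[4])
{
    assert(count >= 0 && count <= 255);
    int64_t cat[8];
    for (int i = 0; i < 4; i++) {
        cat[i]     = b[i];    /* b is the low half of the concatenation */
        cat[4 + i] = a[i];    /* a is the high half */
    }
    int shift = count & 3;    /* assumption: only the low 2 bits are used */
    for (int i = 0; i < 4; i++)
        out[i] = ((k >> i) & 1) ? cat[i + shift] : 0;   /* zeromask */
}
```

With a = {10, 11, 12, 13}, b = {0, 1, 2, 3}, and count = 2, a full mask gives {2, 3, 10, 11}; with k = 0x3 the upper two elements are zeroed.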



_mm_dbsad_epu8

__m128i _mm_dbsad_epu8(__m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value.
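
The description above does not say how imm is used, so the following portable C sketch spells out one reading of vdbpsadbw for a single 128-bit vector. It assumes that the four 32-bit groups of b are first shuffled using 2 bits of imm per group, and that each 64-bit half then produces four SADs: the first two use the lower quadruplet of a and the last two use the upper quadruplet of a, with the window into the shuffled b advancing by one byte per result. The names sad4 and dbsad_epu8_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over 4 consecutive bytes. */
unsigned sad4(const uint8_t *x, const uint8_t *y)
{
    unsigned s = 0;
    for (int i = 0; i < 4; i++)
        s += (unsigned)abs((int)x[i] - (int)y[i]);
    return s;
}

/* Hypothetical model of _mm_dbsad_epu8 (16 input bytes, 8 16-bit results). */
void dbsad_epu8_model(const uint8_t a[16], const uint8_t b[16],
                      int imm, uint16_t out[8])
{
    assert(imm >= 0 && imm <= 255);
    uint8_t tmp[16];
    /* Step 1: shuffle the four 32-bit groups of b, 2 bits of imm per group. */
    for (int d = 0; d < 4; d++) {
        int sel = (imm >> (2 * d)) & 3;
        for (int j = 0; j < 4; j++)
            tmp[4 * d + j] = b[4 * sel + j];
    }
    /* Step 2: four SADs per 64-bit half; the b window slides by one byte. */
    for (int h = 0; h < 2; h++) {
        const uint8_t *ah = a + 8 * h;
        const uint8_t *th = tmp + 8 * h;
        out[4 * h + 0] = (uint16_t)sad4(ah,     th);
        out[4 * h + 1] = (uint16_t)sad4(ah,     th + 1);
        out[4 * h + 2] = (uint16_t)sad4(ah + 4, th + 2);
        out[4 * h + 3] = (uint16_t)sad4(ah + 4, th + 3);
    }
}
```

In this model imm = 0xE4 leaves b unshuffled, and imm = 0x00 replicates the lowest 32-bit group of b into every group before the SADs are taken.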



_mm_mask_dbsad_epu8

__m128i _mm_mask_dbsad_epu8(__m128i src, __mmask8 k, __m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_dbsad_epu8

__m128i _mm_maskz_dbsad_epu8(__mmask8 k, __m128i a, __m128i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_dbsad_epu8

__m256i _mm256_dbsad_epu8(__m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value.



_mm256_mask_dbsad_epu8

__m256i _mm256_mask_dbsad_epu8(__m256i src, __mmask16 k, __m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_dbsad_epu8

__m256i _mm256_maskz_dbsad_epu8(__mmask16 k, __m256i a, __m256i b, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_dbsad_epu8

__m512i _mm512_dbsad_epu8(__m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value.



_mm512_mask_dbsad_epu8

__m512i _mm512_mask_dbsad_epu8(__m512i src, __mmask32 k, __m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_dbsad_epu8

__m512i _mm512_maskz_dbsad_epu8(__mmask32 k, __m512i a, __m512i b, int imm)

CPUID Flags: AVX512BW

Instruction(s): vdbpsadbw

Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extracti32x4_epi32

__m128i _mm256_extracti32x4_epi32(__m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and store the result in the return value.



_mm256_mask_extracti32x4_epi32

__m128i _mm256_mask_extracti32x4_epi32(__m128i src, __mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extracti32x4_epi32

__m128i _mm256_maskz_extracti32x4_epi32(__mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vextracti32x4

Extract 128 bits (composed of 4 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
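
The difference between the writemask and zeromask forms can be sketched in portable C as follows. The model assumes that imm selects the low (0) or high (1) 128 bits of a. The name mask_extracti32x4_model and its zero flag are hypothetical; they are used here only to cover both masked variants in one function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model covering _mm256_mask_extracti32x4_epi32 (zero = 0)
   and _mm256_maskz_extracti32x4_epi32 (zero = 1, src is ignored). */
void mask_extracti32x4_model(const int32_t src[4], unsigned k,
                             const int32_t a[8], int imm, int zero,
                             int32_t out[4])
{
    assert(imm == 0 || imm == 1);
    const int32_t *half = a + 4 * imm;   /* selected 128-bit half of a */
    for (int i = 0; i < 4; i++) {
        if ((k >> i) & 1)
            out[i] = half[i];            /* mask bit set: take the extracted element */
        else
            out[i] = zero ? 0 : src[i];  /* mask bit clear: zero it or keep src */
    }
}
```

With a = {0..7}, src = {-1, -1, -1, -1}, imm = 1, and k = 0x5, the writemask form gives {4, -1, 6, -1} and the zeromask form gives {4, 0, 6, 0}.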



_mm512_extracti32x8_epi32

__m256i _mm512_extracti32x8_epi32(__m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and store the result in the return value.



_mm512_mask_extracti32x8_epi32

__m256i _mm512_mask_extracti32x8_epi32(__m256i src, __mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extracti32x8_epi32

__m256i _mm512_maskz_extracti32x8_epi32(__mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti32x8

Extract 256 bits (composed of 8 packed 32-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_extracti64x2_epi64

__m128i _mm256_extracti64x2_epi64(__m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.



_mm256_mask_extracti64x2_epi64

__m128i _mm256_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_extracti64x2_epi64

__m128i _mm256_maskz_extracti64x2_epi64(__mmask8 k, __m256i a, int imm)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_extracti64x2_epi64

__m128i _mm512_extracti64x2_epi64(__m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and store the result in the return value.



_mm512_mask_extracti64x2_epi64

__m128i _mm512_mask_extracti64x2_epi64(__m128i src, __mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_extracti64x2_epi64

__m128i _mm512_maskz_extracti64x2_epi64(__mmask8 k, __m512i a, int imm)

CPUID Flags: AVX512DQ

Instruction(s): vextracti64x2

Extract 128 bits (composed of 2 packed 64-bit integers) from a, selected with imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_alignr_epi8

__m128i _mm_mask_alignr_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_alignr_epi8

__m128i _mm_maskz_alignr_epi8(__mmask16 k, __m128i a, __m128i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by count bytes, and store the low 16 bytes in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_alignr_epi8

__m256i _mm256_mask_alignr_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result (a in the high half, b in the low half), shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_alignr_epi8

__m256i _mm256_maskz_alignr_epi8(__mmask32 k, __m256i a, __m256i b, const int count)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result (a in the high half, b in the low half), shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_alignr_epi8

__m512i _mm512_alignr_epi8(__m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result (a in the high half, b in the low half), shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value.



_mm512_mask_alignr_epi8

__m512i _mm512_mask_alignr_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result (a in the high half, b in the low half), shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_alignr_epi8

__m512i _mm512_maskz_alignr_epi8(__mmask64 k, __m512i a, __m512i b, const int count)

CPUID Flags: AVX512BW

Instruction(s): vpalignr

Within each 128-bit lane, concatenate the 16-byte blocks of a and b into a 32-byte temporary result (a in the high half, b in the low half), shift the result right by count bytes, and store the low 16 bytes in the corresponding lane of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
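
The byte-align operation is easiest to follow lane by lane. The sketch below is a scalar model of the unmasked operation in C, offered as an illustration rather than as the intrinsic's implementation. The name alignr_epi8_model and the lanes parameter (1, 2, or 4 for 128-, 256-, and 512-bit vectors) are hypothetical. It assumes b supplies the low 16 bytes and a the high 16 bytes of each temporary, and that bytes shifted in from beyond the 32-byte temporary are zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the unmasked alignr operation.
 * a, b, out each hold 16 * lanes bytes; index 0 is the lowest byte. */
void alignr_epi8_model(const uint8_t *a, const uint8_t *b,
                       unsigned count, size_t lanes, uint8_t *out)
{
    for (size_t lane = 0; lane < lanes; lane++) {
        uint8_t tmp[32];
        /* Build the 32-byte temporary: b in the low half, a in the high half. */
        for (size_t j = 0; j < 16; j++) {
            tmp[j]      = b[16 * lane + j];
            tmp[16 + j] = a[16 * lane + j];
        }
        /* Shift right by count bytes and keep the low 16 bytes. */
        for (size_t j = 0; j < 16; j++) {
            size_t s = j + count;
            out[16 * lane + j] = (s < 32) ? tmp[s] : 0;
        }
    }
}
```

The writemask and zeromask forms would then select, per byte, between this result and src or zero.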



_mm_mask_blend_epi8

__m128i _mm_mask_blend_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi8

__m256i _mm256_mask_blend_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm512_mask_blend_epi8

__m512i _mm512_mask_blend_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpblendmb

Blend packed 8-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi32

__m128i _mm_mask_blend_epi32(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmd

Blend packed 32-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi32

__m256i _mm256_mask_blend_epi32(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmd

Blend packed 32-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi64

__m128i _mm_mask_blend_epi64(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmq

Blend packed 64-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi64

__m256i _mm256_mask_blend_epi64(__mmask8 k, __m256i a, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpblendmq

Blend packed 64-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm_mask_blend_epi16

__m128i _mm_mask_blend_epi16(__mmask8 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm256_mask_blend_epi16

__m256i _mm256_mask_blend_epi16(__mmask16 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.



_mm512_mask_blend_epi16

__m512i _mm512_mask_blend_epi16(__mmask32 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpblendmw

Blend packed 16-bit integers from a and b using control mask k (each element is taken from b when the corresponding mask bit is set, and from a otherwise), and return the results.
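
The mask blend intrinsics reduce to a per-element select. As a rough illustration, here is a scalar C sketch for the 16-bit case; the name mask_blend_epi16_model and the element count parameter n (8, 16, or 32) are hypothetical, and the model assumes a set mask bit selects the element from b while a clear bit selects it from a.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 16-bit mask blend.
 * Bit i of k picks the source of element i: set -> b[i], clear -> a[i]. */
void mask_blend_epi16_model(uint32_t k, const uint16_t *a, const uint16_t *b,
                            int n, uint16_t *out)
{
    for (int i = 0; i < n; i++)
        out[i] = ((k >> i) & 1u) ? b[i] : a[i];
}
```

The 8-, 32-, and 64-bit variants would follow the same pattern with a different element type and mask width.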



_mm_mask_broadcastb_epi8

__m128i _mm_mask_broadcastb_epi8(__m128i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastb_epi8

__m128i _mm_maskz_broadcastb_epi8(__mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastb_epi8

__m256i _mm256_mask_broadcastb_epi8(__m256i src, __mmask32 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastb_epi8

__m256i _mm256_maskz_broadcastb_epi8(__mmask32 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcastb_epi8

__m512i _mm512_broadcastb_epi8(__m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value.



_mm512_mask_broadcastb_epi8

__m512i _mm512_mask_broadcastb_epi8(__m512i src, __mmask64 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcastb_epi8

__m512i _mm512_maskz_broadcastb_epi8(__mmask64 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastb

Broadcast the low packed 8-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
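
For the byte broadcasts, the mask decides per output byte whether the broadcast value is written. The following scalar C sketch illustrates this under stated assumptions: the name broadcastb_epi8_model and the parameter n (16, 32, or 64 bytes) are hypothetical, and a NULL src is used here only as a convention to stand for the zeromask form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked byte broadcast.
 * a:   source vector; only its lowest byte a[0] is used
 * src: pass-through bytes (writemask form), or NULL (zeromask form)
 * k:   bit i controls output byte i; n may be up to 64 */
void broadcastb_epi8_model(const uint8_t *src, uint64_t k,
                           const uint8_t *a, int n, uint8_t *out)
{
    const uint8_t low = a[0];   /* read first so out may alias a */
    for (int i = 0; i < n; i++) {
        if ((k >> i) & 1u)
            out[i] = low;
        else
            out[i] = src ? src[i] : 0;
    }
}
```

With all n low bits of k set, the model matches the unmasked broadcast.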



_mm_mask_broadcastd_epi32

__m128i _mm_mask_broadcastd_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastd_epi32

__m128i _mm_maskz_broadcastd_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastd_epi32

__m256i _mm256_mask_broadcastd_epi32(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastd_epi32

__m256i _mm256_maskz_broadcastd_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastd

Broadcast the low packed 32-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_broadcastmb_epi64

__m128i _mm_broadcastmb_epi64(__mmask8 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmb2q

Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.



_mm256_broadcastmb_epi64

__m256i _mm256_broadcastmb_epi64(__mmask8 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmb2q

Broadcast the low 8 bits from input mask k to all 64-bit elements of the return value.



_mm_broadcastmw_epi32

__m128i _mm_broadcastmw_epi32(__mmask16 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmw2d

Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.



_mm256_broadcastmw_epi32

__m256i _mm256_broadcastmw_epi32(__mmask16 k)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vpbroadcastmw2d

Broadcast the low 16 bits from input mask k to all 32-bit elements of the return value.
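
The mask-to-vector broadcasts treat the mask as an integer value rather than as a selector. A minimal scalar C sketch, assuming the mask bits are zero-extended to the element width; the function names and the element count parameter n are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the mask-byte broadcast: the 8 mask bits are
 * zero-extended to 64 bits and copied to each of the n elements
 * (n = 2 for 128-bit vectors, 4 for 256-bit vectors). */
void broadcastmb_epi64_model(uint8_t k, int n, uint64_t *out)
{
    for (int i = 0; i < n; i++)
        out[i] = (uint64_t)k;
}

/* Hypothetical model of the mask-word broadcast: the 16 mask bits are
 * zero-extended to 32 bits and copied to each of the n elements
 * (n = 4 for 128-bit vectors, 8 for 256-bit vectors). */
void broadcastmw_epi32_model(uint16_t k, int n, uint32_t *out)
{
    for (int i = 0; i < n; i++)
        out[i] = (uint32_t)k;
}
```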



_mm_mask_broadcastq_epi64

__m128i _mm_mask_broadcastq_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastq_epi64

__m128i _mm_maskz_broadcastq_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastq_epi64

__m256i _mm256_mask_broadcastq_epi64(__m256i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastq_epi64

__m256i _mm256_maskz_broadcastq_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpbroadcastq

Broadcast the low packed 64-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_broadcastw_epi16

__m128i _mm_mask_broadcastw_epi16(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_broadcastw_epi16

__m128i _mm_maskz_broadcastw_epi16(__mmask8 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_broadcastw_epi16

__m256i _mm256_mask_broadcastw_epi16(__m256i src, __mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_broadcastw_epi16

__m256i _mm256_maskz_broadcastw_epi16(__mmask16 k, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_broadcastw_epi16

__m512i _mm512_broadcastw_epi16(__m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value.



_mm512_mask_broadcastw_epi16

__m512i _mm512_mask_broadcastw_epi16(__m512i src, __mmask32 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_broadcastw_epi16

__m512i _mm512_maskz_broadcastw_epi16(__mmask32 k, __m128i a)

CPUID Flags: AVX512BW

Instruction(s): vpbroadcastw

Broadcast the low packed 16-bit integer from a to all elements of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_compress_epi32

__m128i _mm_mask_compress_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_epi32

__m128i _mm_maskz_compress_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_epi32

__m256i _mm256_mask_compress_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_epi32

__m256i _mm256_maskz_compress_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressd

Contiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm_mask_compress_epi64

__m128i _mm_mask_compress_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm_maskz_compress_epi64

__m128i _mm_maskz_compress_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.



_mm256_mask_compress_epi64

__m256i _mm256_mask_compress_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to the return value, and pass through the remaining elements from src.



_mm256_maskz_compress_epi64

__m256i _mm256_maskz_compress_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpcompressq

Contiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to the return value, and set the remaining elements to zero.
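
Compress packs the selected elements toward the low end of the result and fills the remaining upper positions from src or with zero. The scalar C sketch below illustrates this for 32-bit elements; the name compress_epi32_model, the parameter n (4 or 8), and the NULL-src convention for the zeromask form are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 32-bit compress.
 * Elements of a whose mask bit is set are written contiguously starting at
 * out[0]; the remaining upper positions come from src (writemask form) or
 * are zeroed (zeromask form, src == NULL). */
void compress_epi32_model(const uint32_t *src, uint8_t k,
                          const uint32_t *a, int n, uint32_t *out)
{
    int m = 0;                          /* next free output position */
    for (int i = 0; i < n; i++)
        if ((k >> i) & 1)
            out[m++] = a[i];
    for (; m < n; m++)
        out[m] = src ? src[m] : 0;
}
```

Note that the pass-through elements are taken from the same positions in src that they occupy in the result, not from the positions of the cleared mask bits.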



_mm256_mask_permutexvar_epi32

__m256i _mm256_mask_permutexvar_epi32(__m256i src, __mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi32

__m256i _mm256_maskz_permutexvar_epi32(__mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_epi32

__m256i _mm256_permutexvar_epi32(__m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermd

Shuffle 32-bit integers in a across lanes using the corresponding index in idx, and return the results.
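
The variable permute is a table lookup: each result element is read from the position in a named by the matching element of idx. A scalar C sketch of the unmasked 256-bit case follows; the name permutexvar_epi32_model is hypothetical, and the model assumes only the low 3 bits of each index are used.

```c
#include <stdint.h>

/* Hypothetical scalar model of the unmasked 256-bit variable permute.
 * Argument order (idx, a) follows the intrinsic prototype.
 * out must not alias a, because a is read at arbitrary positions. */
void permutexvar_epi32_model(const uint32_t *idx, const uint32_t *a,
                             uint32_t *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = a[idx[i] & 7u];    /* low 3 bits select one of 8 elements */
}
```

For example, an idx of {7, 6, 5, 4, 3, 2, 1, 0} reverses the element order of a.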



_mm_mask2_permutex2var_epi32

__m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi32

__m256i _mm256_mask2_permutex2var_epi32(__m256i a, __m256i idx, __mmask8 k, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi32

__m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi32

__m128i _mm_permutex2var_epi32(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi32

__m256i _mm256_maskz_permutex2var_epi32(__mmask8 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi32

__m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2d, vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.
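
The "selector and index" wording can be made concrete: the low bits of each idx element give a position, and the next bit up chooses between a and b. The following scalar C sketch models the unmasked form under that assumption; the name permutex2var_epi32_model and the parameter n (4 for 128-bit vectors, 8 for 256-bit vectors) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the unmasked two-source permute.
 * n must be a power of two. Bits above the selector bit are ignored.
 * out must not alias a or b. */
void permutex2var_epi32_model(const uint32_t *a, const uint32_t *idx,
                              const uint32_t *b, int n, uint32_t *out)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (int i = 0; i < n; i++) {
        uint32_t off = idx[i] & (uint32_t)(n - 1);  /* position in the source */
        uint32_t sel = idx[i] & (uint32_t)n;        /* selector: 0 -> a, else b */
        out[i] = sel ? b[off] : a[off];
    }
}
```

Equivalently, a and b can be viewed as one table of 2n entries with a in the lower half, indexed by the low bits of idx.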



_mm_mask2_permutex2var_epi64

__m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi64

__m256i _mm256_mask2_permutex2var_epi64(__m256i a, __m256i idx, __mmask8 k, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi64

__m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi64

__m128i _mm_permutex2var_epi64(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi64

__m256i _mm256_maskz_permutex2var_epi64(__mmask8 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi64

__m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermi2q, vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm_mask2_permutex2var_epi16

__m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx, __mmask8 k, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm256_mask2_permutex2var_epi16

__m256i _mm256_mask2_permutex2var_epi16(__m256i a, __m256i idx, __mmask16 k, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm512_mask2_permutex2var_epi16

__m512i _mm512_mask2_permutex2var_epi16(__m512i a, __m512i idx, __mmask32 k, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from idx when the corresponding mask bit is not set).



_mm_maskz_permutex2var_epi16

__m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutex2var_epi16

__m128i _mm_permutex2var_epi16(__m128i a, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_maskz_permutex2var_epi16

__m256i _mm256_maskz_permutex2var_epi16(__mmask16 k, __m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex2var_epi16

__m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm512_maskz_permutex2var_epi16

__m512i _mm512_maskz_permutex2var_epi16(__mmask32 k, __m512i a, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutex2var_epi16

__m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermi2w, vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results.



_mm256_mask_permutex_epi64

__m256i _mm256_mask_permutex_epi64(__m256i src, __mmask8 k, __m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_mask_permutexvar_epi64

__m256i _mm256_mask_permutexvar_epi64(__m256i src, __mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutex_epi64

__m256i _mm256_maskz_permutex_epi64(__mmask8 k, __m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi64

__m256i _mm256_maskz_permutexvar_epi64(__mmask8 k, __m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutex_epi64

__m256i _mm256_permutex_epi64(__m256i a, const int imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the control in imm, and return the results.



_mm256_permutexvar_epi64

__m256i _mm256_permutexvar_epi64(__m256i idx, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermq

Shuffle 64-bit integers in a across lanes using the corresponding index in idx, and return the results.
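
The immediate form differs from the idx form only in where the indices come from: imm packs four 2-bit indices, one per result element. A scalar C sketch of the unmasked 256-bit case, assuming bits [2i+1:2i] of imm select the source of element i; the name permutex_epi64_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the unmasked 256-bit immediate permute.
 * imm is an 8-bit control holding four 2-bit indices.
 * out must not alias a. */
void permutex_epi64_model(const uint64_t *a, int imm, uint64_t *out)
{
    assert(imm >= 0 && imm <= 255);
    for (int i = 0; i < 4; i++)
        out[i] = a[(imm >> (2 * i)) & 3];
}
```

With imm = 0x1B the four fields are 3, 2, 1, 0 from low to high, which reverses the elements; imm = 0x00 copies a[0] to every position.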



_mm_mask_permutex2var_epi32

__m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi32

__m256i _mm256_mask_permutex2var_epi32(__m256i a, __mmask8 k, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2d

Shuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutex2var_epi64

__m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi64

__m256i _mm256_mask_permutex2var_epi64(__m256i a, __mmask8 k, __m256i idx, __m256i b)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpermt2q

Shuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm_mask_permutex2var_epi16

__m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k, __m128i idx, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm256_mask_permutex2var_epi16

__m256i _mm256_mask_permutex2var_epi16(__m256i a, __mmask16 k, __m256i idx, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).



_mm512_mask_permutex2var_epi16

__m512i _mm512_mask_permutex2var_epi16(__m512i a, __mmask32 k, __m512i idx, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpermt2w

Shuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
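
The mask and mask2 forms of the two-source permute compute the same shuffle and differ only in which operand supplies the elements whose mask bit is clear: a for the mask form, idx for the mask2 form. The scalar C sketch below illustrates that difference for 16-bit elements. The names masked_permutex2var_epi16_model and enum passthrough, and the parameter n (8, 16, or 32), are hypothetical.

```c
#include <stdint.h>

/* Which operand is passed through where the mask bit is clear. */
enum passthrough { PASS_A, PASS_IDX };   /* mask form, mask2 form */

/* Hypothetical scalar model of the writemasked two-source 16-bit permute.
 * a and b are treated as one table of 2n entries with a in the lower half.
 * out must not alias a, b, or idx. */
void masked_permutex2var_epi16_model(const uint16_t *a, const uint16_t *idx,
                                     const uint16_t *b, uint32_t k, int n,
                                     enum passthrough p, uint16_t *out)
{
    for (int i = 0; i < n; i++) {
        if ((k >> i) & 1u) {
            unsigned j = idx[i] & (unsigned)(2 * n - 1);
            out[i] = (j < (unsigned)n) ? a[j] : b[j - (unsigned)n];
        } else {
            out[i] = (p == PASS_A) ? a[i] : idx[i];
        }
    }
}
```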



_mm_mask_permutexvar_epi16

__m128i _mm_mask_permutexvar_epi16(__m128i src, __mmask8 k, __m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_permutexvar_epi16

__m128i _mm_maskz_permutexvar_epi16(__mmask8 k, __m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_permutexvar_epi16

__m128i _mm_permutexvar_epi16(__m128i idx, __m128i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.



_mm256_mask_permutexvar_epi16

__m256i _mm256_mask_permutexvar_epi16(__m256i src, __mmask16 k, __m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_permutexvar_epi16

__m256i _mm256_maskz_permutexvar_epi16(__mmask16 k, __m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_permutexvar_epi16

__m256i _mm256_permutexvar_epi16(__m256i idx, __m256i a)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.



_mm512_mask_permutexvar_epi16

__m512i _mm512_mask_permutexvar_epi16(__m512i src, __mmask32 k, __m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_permutexvar_epi16

__m512i _mm512_maskz_permutexvar_epi16(__mmask32 k, __m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_permutexvar_epi16

__m512i _mm512_permutexvar_epi16(__m512i idx, __m512i a)

CPUID Flags: AVX512BW

Instruction(s): vpermw

Shuffle 16-bit integers in a across lanes using the corresponding index in idx, and return the results.
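
As a rough illustration of the three permutexvar forms above (unmasked, writemask, zeromask), the following scalar model in plain C mimics the 128-bit case. The function names and the zero_masking flag are hypothetical conveniences for this sketch; it assumes 8 words per vector, so only bits 2:0 of each idx element are used.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 128-bit word permute (8 words per vector).
   zero_masking = 0 models the mask form, 1 models the maskz form;
   passing k = 0xFF models the unmasked form. Illustration only. */
static void model_permutexvar_epi16(uint16_t dst[8], const uint16_t src[8],
                                    uint8_t k, int zero_masking,
                                    const uint16_t idx[8], const uint16_t a[8])
{
    uint16_t out[8];
    for (int i = 0; i < 8; i++) {
        if ((k >> i) & 1)
            out[i] = a[idx[i] & 0x7u];        /* only idx bits 2:0 are used */
        else
            out[i] = zero_masking ? 0 : src[i];
    }
    for (int i = 0; i < 8; i++)
        dst[i] = out[i];
}

static void demo_permutexvar_epi16(void)
{
    const uint16_t a[8]   = {10, 11, 12, 13, 14, 15, 16, 17};
    const uint16_t src[8] = {90, 91, 92, 93, 94, 95, 96, 97};
    const uint16_t rev[8] = {7, 6, 5, 4, 3, 2, 1, 0};
    uint16_t r[8];

    model_permutexvar_epi16(r, src, 0xFF, 0, rev, a); /* unmasked: reverse a */
    assert(r[0] == 17 && r[7] == 10);

    model_permutexvar_epi16(r, src, 0x03, 0, rev, a); /* writemask */
    assert(r[0] == 17 && r[1] == 16 && r[2] == 92);

    model_permutexvar_epi16(r, src, 0x03, 1, rev, a); /* zeromask */
    assert(r[0] == 17 && r[1] == 16 && r[2] == 0);
}
```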



_mm_mask_expand_epi32

__m128i _mm_mask_expand_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Load contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_epi32

__m128i _mm_maskz_expand_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Load contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_epi32

__m256i _mm256_mask_expand_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Load contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_epi32

__m256i _mm256_maskz_expand_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandd

Load contiguous 32-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_mask_expand_epi64

__m128i _mm_mask_expand_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Load contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_expand_epi64

__m128i _mm_maskz_expand_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Load contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_expand_epi64

__m256i _mm256_mask_expand_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Load contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_expand_epi64

__m256i _mm256_maskz_expand_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpexpandq

Load contiguous 64-bit integers from the low elements of a and place them, in order, into the elements of the return value whose bit is set in zeromask k (elements are zeroed out when the corresponding mask bit is not set).
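
The expand operation is easier to see in code than in prose. The scalar model below is a minimal sketch for the 128-bit, 32-bit-element case; the function names and the zero_masking flag are hypothetical. It assumes the source elements are consumed in order from the low end of a and written only to positions whose mask bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 128-bit expand on 32-bit elements (4 per vector).
   zero_masking = 0 models the mask form, 1 models the maskz form.
   Illustration only. */
static void model_expand_epi32(uint32_t dst[4], const uint32_t src[4],
                               uint8_t k, int zero_masking,
                               const uint32_t a[4])
{
    uint32_t out[4];
    int next = 0; /* index of the next unread low element of a */
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1)
            out[j] = a[next++];              /* consume a in order */
        else
            out[j] = zero_masking ? 0 : src[j];
    }
    for (int j = 0; j < 4; j++)
        dst[j] = out[j];
}

static void demo_expand_epi32(void)
{
    const uint32_t a[4]   = {1, 2, 3, 4};
    const uint32_t src[4] = {90, 91, 92, 93};
    uint32_t r[4];

    /* k = 0b1010: positions 1 and 3 receive a[0] and a[1]. */
    model_expand_epi32(r, src, 0x0A, 0, a);
    assert(r[0] == 90 && r[1] == 1 && r[2] == 92 && r[3] == 2);

    model_expand_epi32(r, src, 0x0A, 1, a);
    assert(r[0] == 0 && r[1] == 1 && r[2] == 0 && r[3] == 2);
}
```

Note that a[2] and a[3] are never read in this example, because only two mask bits are set.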



_mm_movm_epi8

__m128i _mm_movm_epi8(__mmask16 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi8

__m256i _mm256_movm_epi8(__mmask32 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi8

__m512i _mm512_movm_epi8(__mmask64 k)

CPUID Flags: AVX512BW

Instruction(s): vpmovm2b

Set each packed 8-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi32

__m128i _mm_movm_epi32(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi32

__m256i _mm256_movm_epi32(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi32

__m512i _mm512_movm_epi32(__mmask16 k)

CPUID Flags: AVX512DQ

Instruction(s): vpmovm2d

Set each packed 32-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi64

__m128i _mm_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi64

__m256i _mm256_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ, AVX512VL

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi64

__m512i _mm512_movm_epi64(__mmask8 k)

CPUID Flags: AVX512DQ

Instruction(s): vpmovm2q

Set each packed 64-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm_movm_epi16

__m128i _mm_movm_epi16(__mmask8 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm256_movm_epi16

__m256i _mm256_movm_epi16(__mmask16 k)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.



_mm512_movm_epi16

__m512i _mm512_movm_epi16(__mmask32 k)

CPUID Flags: AVX512BW

Instruction(s): vpmovm2w

Set each packed 16-bit integer in the return value to all ones or all zeros based on the value of the corresponding bit in k.
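
The movm family above turns a bitmask into a vector of all-ones or all-zeros elements. A minimal scalar sketch for two of the 128-bit forms follows; the function names are hypothetical, and the arrays stand in for the vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 128-bit byte form: 16 mask bits, 16 bytes. */
static void model_movm_epi8(uint8_t dst[16], uint16_t k)
{
    for (int i = 0; i < 16; i++)
        dst[i] = ((k >> i) & 1) ? 0xFFu : 0x00u;
}

/* Scalar model of the 128-bit doubleword form: only the low 4 mask bits matter. */
static void model_movm_epi32(uint32_t dst[4], uint8_t k)
{
    for (int i = 0; i < 4; i++)
        dst[i] = ((k >> i) & 1) ? 0xFFFFFFFFu : 0u;
}

static void demo_movm(void)
{
    uint8_t bytes[16];
    uint32_t dwords[4];

    model_movm_epi8(bytes, 0x8001); /* bits 0 and 15 set */
    assert(bytes[0] == 0xFF && bytes[1] == 0x00 && bytes[15] == 0xFF);

    model_movm_epi32(dwords, 0x05); /* bits 0 and 2 set */
    assert(dwords[0] == 0xFFFFFFFFu && dwords[1] == 0u);
    assert(dwords[2] == 0xFFFFFFFFu && dwords[3] == 0u);
}
```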



_mm512_sad_epu8

__m512i _mm512_sad_epu8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpsadbw

Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in the return value.
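
A scalar sketch of the sum-of-absolute-differences computation may help here. It models the 512-bit operands as 64-byte arrays and the result as eight 64-bit elements, one per group of 8 bytes; the function names are hypothetical. Each sum is at most 8 * 255 = 2040, so it always fits in the low 16 bits of its element.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 512-bit SAD: 64 input bytes per operand,
   one 64-bit result element per group of 8 bytes. Illustration only. */
static void model_sad_epu8(uint64_t dst[8], const uint8_t a[64],
                           const uint8_t b[64])
{
    for (int g = 0; g < 8; g++) {
        uint64_t sum = 0;
        for (int j = 0; j < 8; j++) {
            int d = (int)a[8 * g + j] - (int)b[8 * g + j];
            sum += (uint64_t)(d < 0 ? -d : d);
        }
        dst[g] = sum; /* upper 48 bits of the element stay zero */
    }
}

static void demo_sad_epu8(void)
{
    uint8_t a[64], b[64];
    uint64_t r[8];

    for (int i = 0; i < 64; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(63 - i);
    }
    model_sad_epu8(r, a, b);
    /* Group 0 sums |2i - 63| for i = 0..7: 63 + 61 + ... + 49 = 448. */
    assert(r[0] == 448 && r[7] == 448);
    /* Group 3 sums |2i - 63| for i = 24..31: 15 + 13 + ... + 1 = 64. */
    assert(r[3] == 64 && r[4] == 64);
}
```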



_mm_mask_shuffle_epi8

__m128i _mm_mask_shuffle_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_epi8

__m128i _mm_maskz_shuffle_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_epi8

__m256i _mm256_mask_shuffle_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_epi8

__m256i _mm256_maskz_shuffle_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_mask_shuffle_epi8

__m512i _mm512_mask_shuffle_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm512_maskz_shuffle_epi8

__m512i _mm512_maskz_shuffle_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm512_shuffle_epi8

__m512i _mm512_shuffle_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512BW

Instruction(s): vpshufb

Shuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and return the results.
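
The byte shuffle is driven by a per-byte control, which the following scalar model tries to make concrete. It is a sketch under assumptions: the function names are hypothetical, the nbytes parameter (16, 32, or 64) stands in for the vector width, the low 4 bits of each control byte index within the same 128-bit lane, and a set high bit in the control byte zeroes the result byte.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the masked byte shuffle for 16-, 32-, or 64-byte vectors.
   Pass k with all bits set to model the unmasked form. Illustration only. */
static void model_mask_shuffle_epi8(uint8_t *dst, const uint8_t *src,
                                    uint64_t k, const uint8_t *a,
                                    const uint8_t *b, int nbytes)
{
    uint8_t out[64];
    assert(nbytes == 16 || nbytes == 32 || nbytes == 64);
    for (int i = 0; i < nbytes; i++) {
        int lane_base = i & ~15;             /* first byte of this 128-bit lane */
        uint8_t picked = (b[i] & 0x80u)
                             ? 0             /* control high bit set: zero */
                             : a[lane_base + (b[i] & 0x0Fu)];
        out[i] = ((k >> i) & 1) ? picked : src[i];
    }
    for (int i = 0; i < nbytes; i++)
        dst[i] = out[i];
}

static void demo_mask_shuffle_epi8(void)
{
    uint8_t a[32], b[32], src[32], r[32];

    for (int i = 0; i < 32; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)(15 - (i & 15));     /* reverse bytes within each lane */
        src[i] = 200;
    }
    b[1] = 0x80;                             /* force result byte 1 to zero */

    /* All mask bits set except bit 2, so r[2] is copied from src. */
    model_mask_shuffle_epi8(r, src, ~(uint64_t)0 & ~((uint64_t)1 << 2), a, b, 32);
    assert(r[0] == 15 && r[1] == 0 && r[2] == 200 && r[15] == 0);
    assert(r[16] == 31 && r[31] == 16);      /* upper lane stays in its lane */
}
```

The second assertion shows the lane restriction: bytes 16..31 are drawn only from bytes 16..31 of a.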



_mm_mask_shuffle_epi32

__m128i _mm_mask_shuffle_epi32(__m128i src, __mmask8 k, __m128i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_shuffle_epi32

__m128i _mm_maskz_shuffle_epi32(__mmask8 k, __m128i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_mask_shuffle_epi32

__m256i _mm256_mask_shuffle_epi32(__m256i src, __mmask8 k, __m256i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_shuffle_epi32

__m256i _mm256_maskz_shuffle_epi32(__mmask8 k, __m256i a, _MM_PERM_ENUM imm)

CPUID Flags: AVX512F, AVX512VL

Instruction(s): vpshufd

Shuffle 32-bit integers in a within 128-bit lanes using the control in imm, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
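
To illustrate how the 8-bit immediate drives the 32-bit shuffle, here is a scalar model of the 256-bit writemask form. The function names are hypothetical. The sketch assumes that result element j of each 128-bit lane is selected by bits 2j+1:2j of imm, and that the same control is applied to both lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 256-bit masked 32-bit shuffle (8 elements, 2 lanes).
   Pass k = 0xFF to model an unmasked shuffle. Illustration only. */
static void model_mask_shuffle_epi32(uint32_t dst[8], const uint32_t src[8],
                                     uint8_t k, const uint32_t a[8],
                                     unsigned imm)
{
    uint32_t out[8];
    for (int i = 0; i < 8; i++) {
        int lane_base = i & ~3;                      /* first element of the lane */
        unsigned sel = (imm >> (2 * (i & 3))) & 0x3u; /* 2-bit selector for slot */
        uint32_t picked = a[lane_base + (int)sel];
        out[i] = ((k >> i) & 1) ? picked : src[i];
    }
    for (int i = 0; i < 8; i++)
        dst[i] = out[i];
}

static void demo_mask_shuffle_epi32(void)
{
    const uint32_t a[8]   = {0, 1, 2, 3, 4, 5, 6, 7};
    const uint32_t src[8] = {99, 99, 99, 99, 99, 99, 99, 99};
    uint32_t r[8];

    /* imm = 0x1B (binary 00 01 10 11) reverses the elements of each lane. */
    model_mask_shuffle_epi32(r, src, 0xFF, a, 0x1B);
    assert(r[0] == 3 && r[1] == 2 && r[2] == 1 && r[3] == 0);
    assert(r[4] == 7 && r[5] == 6 && r[6] == 5 && r[7] == 4);

    /* With k = 0xF0 the low lane is copied from src instead. */
    model_mask_shuffle_epi32(r, src, 0xF0, a, 0x1B);
    assert(r[0] == 99 && r[3] == 99 && r[4] == 7 && r[7] == 4);
}
```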



_mm_mask_shufflehi_epi16

__m128i _mm_mask_shufflehi_epi16(__m128i src, __mmask8 k, __m128i a, int imm)

CPUID Flags: AVX512BW, AVX512VL

Instruction(s): vpshufhw

Shuffle 16-bit integers in the high 64 bits of a using the control in imm. Store the results in the high 64 bits of the return value, with the low 64 bits being copied from a to the return value, using write