Developer Guide and Reference

Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) BF16 Instructions

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) BF16 instruction intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
variable    definition
a           a source vector element
b           a second source vector element
k           mask used as a selector; depending on the intrinsic, it may be a writemask or a zeromask
_mm_cvtne2ps_pbh
__m128bh _mm_cvtne2ps_pbh (__m128 a, __m128 b)
Instructions: vcvtne2ps2bf16 xmm, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst.
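The conversion can be illustrated with a portable scalar model. The following is a minimal sketch, not the hardware implementation: the helper names fp32_to_bf16_rne and model_cvtne2ps_pbh are hypothetical, and plain uint16_t arrays stand in for the __m128bh type. It assumes round-to-nearest-even, that zero and denormal inputs produce a signed zero, that NaN inputs are quieted, and that the low four elements of dst come from b and the high four from a. These assumptions follow our reading of the instruction's published pseudocode; verify them against the instruction reference before relying on them.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one FP32 -> BF16 conversion, round-to-nearest-even.
   Assumption: zero/denormal inputs become signed zero, NaNs are quieted. */
static uint16_t fp32_to_bf16_rne(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the float's bit pattern */

    uint32_t exp  = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent */
    uint32_t frac = bits & 0x7FFFFFu;      /* 23-bit fraction */

    if (exp == 0)                          /* zero or denormal: keep only the sign */
        return (uint16_t)((bits >> 16) & 0x8000u);

    if (exp == 0xFFu) {                    /* infinity or NaN: truncate */
        uint16_t hi = (uint16_t)(bits >> 16);
        return frac ? (uint16_t)(hi | 0x0040u) : hi;  /* set quiet bit on NaN */
    }

    /* Normal number: add 0x7FFF plus the lowest kept bit so ties go to even,
       then keep the upper 16 bits. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}

/* Hypothetical model of _mm_cvtne2ps_pbh:
   dst[0..3] are converted from b, dst[4..7] from a (assumed ordering). */
static void model_cvtne2ps_pbh(uint16_t dst[8], const float a[4], const float b[4])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = fp32_to_bf16_rne(j < 4 ? b[j] : a[j - 4]);
}
```

A model like this is mainly useful for checking results on a machine without AVX512_BF16 support. On supporting hardware, call the intrinsic itself.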
_mm_mask_cvtne2ps_pbh
__m128bh _mm_mask_cvtne2ps_pbh (__m128bh src, __mmask8 k, __m128 a, __m128 b)
Instructions: vcvtne2ps2bf16 xmm {k}, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
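The writemask behavior can be sketched in portable scalar code. This is an illustrative model only: model_mask_cvtne2ps_pbh and bf16_rne_normal are hypothetical names, uint16_t arrays stand in for __m128bh, and uint8_t stands in for __mmask8. The conversion helper is deliberately simplified and only handles zero and normal finite inputs. The sketch also assumes that the low four elements of dst are taken from b and the high four from a, and that bit j of k controls element j.

```c
#include <stdint.h>
#include <string.h>

/* Simplified FP32 -> BF16 round-to-nearest-even.
   Only meant for zero and normal finite inputs; denormal, infinity
   and NaN handling is omitted in this sketch. */
static uint16_t bf16_rne_normal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);  /* rounding bias, ties to even */
    return (uint16_t)(bits >> 16);
}

/* Hypothetical model of _mm_mask_cvtne2ps_pbh:
   where bit j of k is set, dst[j] gets the converted value;
   where it is clear, dst[j] is copied from src[j]. */
static void model_mask_cvtne2ps_pbh(uint16_t dst[8], const uint16_t src[8],
                                    uint8_t k, const float a[4], const float b[4])
{
    for (int j = 0; j < 8; ++j) {
        if ((k >> j) & 1u)
            dst[j] = bf16_rne_normal(j < 4 ? b[j] : a[j - 4]);
        else
            dst[j] = src[j];                /* writemask: keep the src element */
    }
}
```

With k set to 0x81, for example, only elements 0 and 7 of dst receive converted values, and the remaining six are passed through from src.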