cblas_gemm_*_pack
Pack the matrix into the buffer allocated previously.
Syntax
void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
Include Files
- mkl.h
Description
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by cblas_gemm_*_pack_get_size. The cblas_gemm_*_pack routine packs the identified matrix into the buffer allocated previously.
The cblas_gemm_*_pack routine performs this operation:
dest := op(src)
as part of the computation
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset
for the integer types, or
C := alpha*op(A)*op(B) + beta*C
for the bfloat16 type,
where:
- op(X) is one of the operations op(X) = X or op(X) = X^T,
- alpha and beta are scalars,
- src is a matrix,
- A, A_offset, B, B_offset, C, and C_offset are matrices,
- op(src) is an m-by-k matrix if identifier = CblasAMatrix,
- op(src) is a k-by-n matrix if identifier = CblasBMatrix,
- dest is the buffer previously allocated to store the matrix packed into an internal format,
- A_offset is an m-by-k matrix,
- B_offset is a k-by-n matrix,
- C_offset is an m-by-n matrix.
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls. For best performance, use the same number of threads for packing and for computing. If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
- Layout
- CBLAS_LAYOUT. Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- identifier
- CBLAS_IDENTIFIER. Specifies which matrix is to be packed: if identifier = CblasAMatrix, the A matrix is packed; if identifier = CblasBMatrix, the B matrix is packed.
- trans
- CBLAS_TRANSPOSE. Specifies the form of op(src) used in the packing: if trans = CblasNoTrans, op(src) = src; if trans = CblasTrans, op(src) = src^T.
- m
- MKL_INT. Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
- MKL_INT. Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero.
- k
- MKL_INT. Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.
- src
- MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, and MKL_INT16* for cblas_gemm_s16s16s32_pack. Array holding the matrix to be packed. The required size of the array and the part that must contain the matrix before entry depend on Layout, identifier, and trans. The last column applies to cblas_gemm_s8u8s32_pack only.

| Layout | identifier | trans | Size of src | Leading part of src that must contain the matrix | Element type for cblas_gemm_s8u8s32_pack |
|---|---|---|---|---|---|
| CblasColMajor | CblasAMatrix | CblasNoTrans | ld*k | m-by-k part, matrix A | 8-bit signed integer |
| CblasColMajor | CblasAMatrix | CblasTrans | ld*m | k-by-m part, matrix A | 8-bit signed integer |
| CblasColMajor | CblasBMatrix | CblasNoTrans | ld*n | k-by-n part, matrix B | 8-bit unsigned integer |
| CblasColMajor | CblasBMatrix | CblasTrans | ld*k | n-by-k part, matrix B | 8-bit unsigned integer |
| CblasRowMajor | CblasAMatrix | CblasNoTrans | ld*m | k-by-m part, matrix A | 8-bit unsigned integer |
| CblasRowMajor | CblasAMatrix | CblasTrans | ld*k | m-by-k part, matrix A | 8-bit unsigned integer |
| CblasRowMajor | CblasBMatrix | CblasNoTrans | ld*k | n-by-k part, matrix B | 8-bit signed integer |
| CblasRowMajor | CblasBMatrix | CblasTrans | ld*n | k-by-n part, matrix B | 8-bit signed integer |
- ld
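- MKL_INT. Specifies the leading dimension of src as declared in the calling (sub)program. The minimum value depends on Layout, identifier, and trans.

| Layout | identifier | trans | ld must be at least |
|---|---|---|---|
| CblasColMajor | CblasAMatrix | CblasNoTrans | max(1, m) |
| CblasColMajor | CblasAMatrix | CblasTrans | max(1, k) |
| CblasColMajor | CblasBMatrix | CblasNoTrans | max(1, k) |
| CblasColMajor | CblasBMatrix | CblasTrans | max(1, n) |
| CblasRowMajor | CblasAMatrix | CblasNoTrans | max(1, k) |
| CblasRowMajor | CblasAMatrix | CblasTrans | max(1, m) |
| CblasRowMajor | CblasBMatrix | CblasNoTrans | max(1, n) |
| CblasRowMajor | CblasBMatrix | CblasTrans | max(1, k) |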
- dest
- MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack. Buffer for the packed matrix.
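The size and leading-dimension rules for src follow one pattern: op(src) is m-by-k for A and k-by-n for B, and the stored dimensions swap once for a transpose and once more for row-major layout. The sketch below encodes that pattern in plain C. It is an illustration only, not part of the library: the sk_* enums, struct, and functions are hypothetical stand-ins so that the block compiles without mkl.h, and real code would pass the CBLAS_LAYOUT, CBLAS_IDENTIFIER, and CBLAS_TRANSPOSE values instead.

```c
/* Hypothetical local stand-ins for the CBLAS enums (not the mkl.h types). */
typedef enum { SK_COL_MAJOR, SK_ROW_MAJOR } sk_layout;
typedef enum { SK_A_MATRIX, SK_B_MATRIX } sk_identifier;
typedef enum { SK_NO_TRANS, SK_TRANS } sk_trans;

/* Result of the lookup: the array src needs ld * other_dim elements,
   with ld >= min_ld. */
typedef struct {
    long min_ld;    /* smallest valid leading dimension, max(1, dim) */
    long other_dim; /* the factor that multiplies ld in the size of src */
} sk_src_shape;

static long sk_max1(long x) { return x > 1 ? x : 1; }

/* Hypothetical helper reproducing the src size and ld requirements
   listed for the src and ld parameters. */
static sk_src_shape sk_src_shape_for(sk_layout layout, sk_identifier id,
                                     sk_trans trans, long m, long n, long k)
{
    /* op(src) is rows-by-cols: m-by-k for A, k-by-n for B. */
    long rows = (id == SK_A_MATRIX) ? m : k;
    long cols = (id == SK_A_MATRIX) ? k : n;

    /* A transpose swaps the stored dimensions; row-major layout swaps
       which of them is the leading one. Two swaps cancel out. */
    int swapped = (trans == SK_TRANS) != (layout == SK_ROW_MAJOR);

    sk_src_shape s;
    s.min_ld = sk_max1(swapped ? cols : rows);
    s.other_dim = swapped ? rows : cols;
    return s;
}
```

For example, with Layout = CblasColMajor, identifier = CblasAMatrix, and trans = CblasNoTrans, the helper reports min_ld = max(1, m) and other_dim = k, matching the ld*k size listed for that case.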
Output Parameters
- dest
- MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack. Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.
Example
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
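The swap rule can be written as a small check that a caller might run before choosing the element type of src. This is a minimal sketch in plain C under the same caveat as any illustration here: the sk_* enums and the function name are hypothetical and stand in for the CBLAS types so that the block compiles without mkl.h.

```c
#include <stdbool.h>

/* Hypothetical local stand-ins for the CBLAS enums (not the mkl.h types). */
typedef enum { SK_COL_MAJOR, SK_ROW_MAJOR } sk_layout;
typedef enum { SK_A_MATRIX, SK_B_MATRIX } sk_identifier;

/* Returns true when src for cblas_gemm_s8u8s32_pack must hold 8-bit signed
   integers, and false when it must hold 8-bit unsigned integers.
   Column-major: A is signed, B is unsigned.
   Row-major:    the types are swapped, so A is unsigned and B is signed. */
static bool sk_s8u8s32_src_is_signed(sk_layout layout, sk_identifier id)
{
    bool is_a = (id == SK_A_MATRIX);
    return (layout == SK_COL_MAJOR) ? is_a : !is_a;
}
```

With row-major layout, a caller would therefore declare the A array as uint8_t and the B array as int8_t before passing them as src.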