Developer Reference

  • 2021.1
  • 12/04/2020
  • Public Content

cblas_gemm_*_pack

Pack the matrix into the buffer allocated previously.

Syntax

void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
Include Files
  • mkl.h
Description
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by cblas_gemm_*_pack_get_size. The cblas_gemm_*_pack routine packs the identified matrix into the buffer allocated previously.
The cblas_gemm_*_pack routine performs this operation:

    dest := op(src)

as part of the computation

    C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset

for integer types, or

    C := alpha*op(A)*op(B) + beta*C

for the bfloat16 type,
where:

    op(X) is one of the operations op(X) = X or op(X) = X^T,

    alpha and beta are scalars,

    src is a matrix,

    A, A_offset, B, B_offset, C, and C_offset are matrices,

    op(src) is an m-by-k matrix if identifier = CblasAMatrix,

    op(src) is a k-by-n matrix if identifier = CblasBMatrix,

    dest is the buffer previously allocated to store the matrix packed into an internal format,

    A_offset is an m-by-k matrix,

    B_offset is a k-by-n matrix,

    C_offset is an m-by-n matrix.
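
To make the integer formula concrete, the following is a naive reference sketch, not the library's implementation. It assumes op(A) = A and op(B) = B, column-major storage with tightly packed arrays, 32-bit integer offset matrices, and single-precision alpha and beta. The function name ref_gemm_s8u8s32 is hypothetical, and the final conversion to a 32-bit integer by a plain C cast is an assumption of the sketch, not documented library behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Naive reference for
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * with op(A) = A and op(B) = B, column-major storage, and leading
 * dimensions m (for A and C) and k (for B). Hypothetical helper for
 * checking small cases; it does not use the packed format. */
static void ref_gemm_s8u8s32(int m, int n, int k,
                             float alpha,
                             const int8_t *a, const int32_t *a_offset,
                             const uint8_t *b, const int32_t *b_offset,
                             float beta,
                             int32_t *c, const int32_t *c_offset)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                /* Element (i, p) of A + A_offset, element (p, j) of B + B_offset. */
                int64_t av = (int64_t)a[i + p * m] + a_offset[i + p * m];
                int64_t bv = (int64_t)b[p + j * k] + b_offset[p + j * k];
                acc += av * bv;
            }
            double r = (double)alpha * (double)acc
                     + (double)beta * (double)c[i + j * m]
                     + (double)c_offset[i + j * m];
            c[i + j * m] = (int32_t)r; /* plain cast: an assumption of this sketch */
        }
    }
}
```

A reference like this is useful only for validating small problems against the packed compute path; it says nothing about the internal packed layout.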

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
    CBLAS_LAYOUT
    Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
identifier
    CBLAS_IDENTIFIER
    Specifies which matrix is to be packed:
    If identifier = CblasAMatrix, the A matrix is packed.
    If identifier = CblasBMatrix, the B matrix is packed.
trans
    CBLAS_TRANSPOSE
    Specifies the form of op(src) used in the packing:
    If trans = CblasNoTrans, op(src) = src.
    If trans = CblasTrans, op(src) = src^T.
m
    MKL_INT
    Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero.
n
    MKL_INT
    Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero.
k
    MKL_INT
    Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.
src
    MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, and MKL_INT16* for cblas_gemm_s16s16s32_pack
    The size of the array and its required contents depend on Layout, identifier, and trans. Before entry, the leading part of the array src listed below must contain the matrix. The last column gives the element type required in src for cblas_gemm_s8u8s32_pack.

    Layout         identifier    trans         Size   Leading part of src        Element type (cblas_gemm_s8u8s32_pack)
    CblasColMajor  CblasAMatrix  CblasNoTrans  ld*k   m-by-k part contains A     8-bit signed integer
    CblasColMajor  CblasAMatrix  CblasTrans    ld*m   k-by-m part contains A     8-bit signed integer
    CblasColMajor  CblasBMatrix  CblasNoTrans  ld*n   k-by-n part contains B     8-bit unsigned integer
    CblasColMajor  CblasBMatrix  CblasTrans    ld*k   n-by-k part contains B     8-bit unsigned integer
    CblasRowMajor  CblasAMatrix  CblasNoTrans  ld*m   k-by-m part contains A     8-bit unsigned integer
    CblasRowMajor  CblasAMatrix  CblasTrans    ld*k   m-by-k part contains A     8-bit unsigned integer
    CblasRowMajor  CblasBMatrix  CblasNoTrans  ld*k   n-by-k part contains B     8-bit signed integer
    CblasRowMajor  CblasBMatrix  CblasTrans    ld*n   k-by-n part contains B     8-bit signed integer
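
For cblas_gemm_bf16bf16f32_pack, src holds bfloat16 values, so single-precision data usually has to be converted first. The following is a minimal sketch of one way to do that, assuming bfloat16 is stored as the upper 16 bits of an IEEE 754 binary32 value and that MKL_BF16 is a 16-bit unsigned storage type. The type bf16_t and both function names are hypothetical stand-ins, and the conversion truncates rather than rounds.

```c
#include <stdint.h>
#include <string.h>

/* 16-bit storage for one bfloat16 value; a stand-in for MKL_BF16
 * (assumed here to be a 16-bit unsigned integer type). */
typedef uint16_t bf16_t;

/* Keep the upper 16 bits of the binary32 bit pattern (truncation, no rounding). */
static bf16_t float_to_bf16_trunc(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Widen a bfloat16 bit pattern back to binary32 by zero-filling the low bits. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Truncation is the simplest choice; a round-to-nearest conversion loses less precision and may be preferable for real data.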
ld
    MKL_INT
    Specifies the leading dimension of src as declared in the calling (sub)program.

    Layout         identifier    trans         Constraint
    CblasColMajor  CblasAMatrix  CblasNoTrans  ld must be at least max(1, m).
    CblasColMajor  CblasAMatrix  CblasTrans    ld must be at least max(1, k).
    CblasColMajor  CblasBMatrix  CblasNoTrans  ld must be at least max(1, k).
    CblasColMajor  CblasBMatrix  CblasTrans    ld must be at least max(1, n).
    CblasRowMajor  CblasAMatrix  CblasNoTrans  ld must be at least max(1, k).
    CblasRowMajor  CblasAMatrix  CblasTrans    ld must be at least max(1, m).
    CblasRowMajor  CblasBMatrix  CblasNoTrans  ld must be at least max(1, n).
    CblasRowMajor  CblasBMatrix  CblasTrans    ld must be at least max(1, k).
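
The Size entries for src and the lower bounds for ld follow a single rule, which the sketch below encodes so that arguments can be checked before calling the pack routine. The enumerations and function names are hypothetical local stand-ins (so the code compiles without mkl.h), not library definitions. The sketch reproduces only the Size and ld entries listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the CBLAS enumerations; hypothetical names. */
typedef enum { COL_MAJOR, ROW_MAJOR } layout_t;
typedef enum { A_MATRIX, B_MATRIX } ident_t;
typedef enum { NO_TRANS, TRANS } trans_t;

/* Rows and columns of the matrix held in src, before op() is applied:
 * op(src) is m-by-k for A and k-by-n for B, so a transposed source has
 * the two dimensions exchanged. */
static void src_shape(ident_t id, trans_t tr, long m, long n, long k,
                      long *rows, long *cols)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    long r = (id == A_MATRIX) ? m : k;
    long c = (id == A_MATRIX) ? k : n;
    if (tr == TRANS) { long t = r; r = c; c = t; }
    *rows = r;
    *cols = c;
}

/* Smallest ld allowed: max(1, rows) for column-major,
 * max(1, cols) for row-major. */
static long min_ld(layout_t lay, ident_t id, trans_t tr, long m, long n, long k)
{
    long rows, cols;
    src_shape(id, tr, m, n, k, &rows, &cols);
    long lead = (lay == COL_MAJOR) ? rows : cols;
    return lead > 1 ? lead : 1;
}

/* Number of elements src must hold: ld times the other dimension. */
static size_t src_elems(layout_t lay, ident_t id, trans_t tr,
                        long m, long n, long k, long ld)
{
    long rows, cols;
    src_shape(id, tr, m, n, k, &rows, &cols);
    assert(ld >= min_ld(lay, id, tr, m, n, k));
    return (size_t)ld * (size_t)((lay == COL_MAJOR) ? cols : rows);
}
```

For example, with m = 2, n = 3, k = 4, packing A without transposition in column-major layout needs ld of at least 2 and ld*4 elements in src.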
dest
    MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack
    Buffer for the packed matrix.
Output Parameters
dest
    MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack
    Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.

Example

See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
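
The swap described in this note can be written as a one-line predicate. This is a minimal sketch; the function name and its boolean parameters are hypothetical and not part of the library.

```c
#include <stdbool.h>

/* True when cblas_gemm_s8u8s32_pack expects signed 8-bit elements in src.
 * Column-major: A is signed and B is unsigned. Row-major: the two are swapped. */
static bool s8u8s32_src_is_signed(bool row_major, bool is_a_matrix)
{
    return is_a_matrix != row_major;
}
```

A caller could use such a check to decide whether to fill the void* source buffer with int8_t or uint8_t data.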
