Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

cblas_gemm_*_pack

Pack the matrix into the buffer allocated previously.

Syntax

void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
        const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k,
        const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
        const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k,
        const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
        const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k,
        const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
Include Files
  • mkl.h
Description
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by cblas_gemm_*_pack_get_size. The cblas_gemm_*_pack routine packs the identified matrix into the buffer allocated previously.
The cblas_gemm_*_pack routine performs the operation

$$\mathrm{dest} := \mathrm{op}(\mathrm{src})$$

as part of the computation

$$C := \alpha\,\bigl(\mathrm{op}(A) + A_{\mathrm{offset}}\bigr)\bigl(\mathrm{op}(B) + B_{\mathrm{offset}}\bigr) + \beta\,C + C_{\mathrm{offset}}$$

for integer types, and

$$C := \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta\,C$$

for the bfloat16 type, where:

  • op(X) is one of the operations op(X) = X or op(X) = X^T,
  • alpha and beta are scalars,
  • src is a matrix,
  • A, A_offset, B, B_offset, C, and C_offset are matrices,
  • op(src) is an m-by-k matrix if identifier = CblasAMatrix,
  • op(src) is a k-by-n matrix if identifier = CblasBMatrix,
  • dest is the previously allocated buffer that stores the matrix packed into an internal format,
  • A_offset is an m-by-k matrix,
  • B_offset is a k-by-n matrix,
  • C_offset is an m-by-n matrix.
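To make the integer formula concrete, the naive C loop below evaluates it directly. It is a minimal reference sketch, not how the library computes the result, and it rests on assumptions that go beyond this page: column-major storage with op(A) = A and op(B) = B, offset matrices whose entries all equal one scalar each (the hypothetical parameters ao, bo, and co), and rounding of the floating-point result to the nearest integer before it is stored in the 32-bit C. The function name ref_gemm_s8u8s32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Naive reference for
 *   C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset
 * ASSUMPTIONS (not taken from the library):
 *   - column-major storage, op(A) = A and op(B) = B;
 *   - every entry of A_offset, B_offset, C_offset equals ao, bo, co;
 *   - the result is rounded half away from zero, with no saturation.
 * A is m-by-k (signed 8-bit), B is k-by-n (unsigned 8-bit), C is m-by-n.
 */
void ref_gemm_s8u8s32(int m, int n, int k, float alpha,
                      const int8_t *a, int lda, int32_t ao,
                      const uint8_t *b, int ldb, int32_t bo,
                      float beta, int32_t *c, int ldc, int32_t co)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (k > 1 ? k : 1) && ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* exact 64-bit sum of (A(i,p) + ao) * (B(p,j) + bo) over p */
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                int64_t aval = (int64_t)a[i + p * lda] + ao;
                int64_t bval = (int64_t)b[p + j * ldb] + bo;
                acc += aval * bval;
            }
            double v = (double)alpha * (double)acc
                     + (double)beta * (double)c[i + j * ldc] + (double)co;
            c[i + j * ldc] = (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
        }
    }
}
```

A loop like this can serve as a cross-check on small inputs when validating results obtained through the packed routines.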

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing. If you pack both the A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
identifier
CBLAS_IDENTIFIER
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the A matrix is packed.
If identifier = CblasBMatrix, the B matrix is packed.
trans
CBLAS_TRANSPOSE
Specifies the form of op(src) used in the packing:
If trans = CblasNoTrans, op(src) = src.
If trans = CblasTrans, op(src) = src^T.
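How Layout, ld, and trans together select element (i, j) of op(src) can be sketched in a few lines of C. The enum names below are hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE so that the sketch builds without mkl.h, and plain int elements replace the 8-bit, 16-bit, and bfloat16 element types of the real routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_layout;
typedef enum { SKETCH_NO_TRANS, SKETCH_TRANS } sketch_trans;

/* Element (i, j) of the matrix as it is stored in src. */
static int stored_elem(const int *src, size_t ld, sketch_layout layout,
                       size_t i, size_t j)
{
    return layout == SKETCH_COL_MAJOR ? src[i + j * ld]   /* columns are ld apart */
                                      : src[i * ld + j];  /* rows are ld apart */
}

/* Element (i, j) of op(src): op(src) = src, or src^T when trans is SKETCH_TRANS. */
int op_elem(const int *src, size_t ld, sketch_layout layout,
            sketch_trans trans, size_t i, size_t j)
{
    assert(src != NULL);
    return trans == SKETCH_NO_TRANS ? stored_elem(src, ld, layout, i, j)
                                    : stored_elem(src, ld, layout, j, i);
}
```

For example, the 2-by-3 matrix with rows (1, 2, 3) and (4, 5, 6), stored column-major with ld = 2, is the array {1, 4, 2, 5, 3, 6}; with trans set to transpose, element (2, 0) of op(src) is element (0, 2) of the stored matrix, namely 3.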
m
MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
MKL_INT
Specifies the number of columns of the matrix op(B) and of the matrix C. The value of n must be at least zero.
k
MKL_INT
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
src
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, and MKL_INT16* for cblas_gemm_s16s16s32_pack.
The required size and contents of src depend on Layout, identifier, and trans:
Layout = CblasColMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit signed integers.
  • identifier = CblasAMatrix, trans = CblasTrans: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit signed integers.
  • identifier = CblasBMatrix, trans = CblasNoTrans: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit unsigned integers.
  • identifier = CblasBMatrix, trans = CblasTrans: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit unsigned integers.
Layout = CblasRowMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit unsigned integers.
  • identifier = CblasAMatrix, trans = CblasTrans: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit unsigned integers.
  • identifier = CblasBMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit signed integers.
  • identifier = CblasBMatrix, trans = CblasTrans: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, the elements of src must be 8-bit signed integers.
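For cblas_gemm_bf16bf16f32_pack, src holds bfloat16 values, which keep the upper 16 bits of an IEEE 754 single-precision number. The helpers below sketch conversion to and from that bit pattern, using round-to-nearest-even when narrowing. They assume that MKL_BF16 stores the raw 16-bit pattern, modeled here by uint16_t; the function names are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Widen: a bfloat16 pattern is the upper half of a binary32 pattern. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow with round-to-nearest-even; NaNs are kept quiet, sign preserved. */
uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return (uint16_t)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;   /* lowest kept bit decides ties */
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}
```

Values exactly halfway between two bfloat16 numbers round to the one with an even last bit, so 1.00390625f narrows to 1.0 (0x3F80).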
ld
MKL_INT
Specifies the leading dimension of src as declared in the calling (sub)program. The minimum value of ld depends on Layout (rows), identifier, and trans (columns):

                   CblasAMatrix               CblasBMatrix
                   CblasNoTrans  CblasTrans   CblasNoTrans  CblasTrans
  CblasColMajor    max(1, m)     max(1, k)    max(1, k)     max(1, n)
  CblasRowMajor    max(1, k)     max(1, m)    max(1, n)     max(1, k)
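The ld requirements above and the array sizes listed under src follow one pattern: take the stored shape of src (the shape of op(src) with any transpose undone); in column-major layout, ld must cover the stored rows and the array spans ld times the stored columns, while in row-major layout the roles of rows and columns swap. A small C sketch of that pattern follows; the enum and function names are hypothetical stand-ins, not CBLAS types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums, so no mkl.h is needed. */
typedef enum { SK_ROW_MAJOR, SK_COL_MAJOR } sk_layout;
typedef enum { SK_A_MATRIX, SK_B_MATRIX } sk_identifier;
typedef enum { SK_NO_TRANS, SK_TRANS } sk_trans;

static size_t max1(size_t x) { return x > 1 ? x : 1; }

/* Stored shape of src: op(src) is m-by-k for A and k-by-n for B,
   and a transpose swaps rows and columns. */
static void stored_shape(sk_identifier id, sk_trans trans,
                         size_t m, size_t n, size_t k,
                         size_t *rows, size_t *cols)
{
    size_t r = (id == SK_A_MATRIX) ? m : k;
    size_t c = (id == SK_A_MATRIX) ? k : n;
    if (trans == SK_TRANS) { size_t t = r; r = c; c = t; }
    *rows = r;
    *cols = c;
}

/* Smallest legal ld for the given combination. */
size_t min_ld(sk_layout layout, sk_identifier id, sk_trans trans,
              size_t m, size_t n, size_t k)
{
    size_t rows, cols;
    stored_shape(id, trans, m, n, k, &rows, &cols);
    return max1(layout == SK_COL_MAJOR ? rows : cols);
}

/* Number of elements the src array must provide for a given ld. */
size_t src_elems(sk_layout layout, sk_identifier id, sk_trans trans,
                 size_t m, size_t n, size_t k, size_t ld)
{
    size_t rows, cols;
    stored_shape(id, trans, m, n, k, &rows, &cols);
    assert(ld >= min_ld(layout, id, trans, m, n, k));
    return ld * (layout == SK_COL_MAJOR ? cols : rows);
}
```

For example, with m = 2, n = 3, k = 4, a column-major A without transposition needs ld of at least 2 and, with ld = 5, an array of 5*4 = 20 elements.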
dest
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack
Buffer for the packed matrix.
Output Parameters
dest
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack
Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.

Example

For examples of how to use these routines, see the following files in the MKL installation directory:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
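One way to see why the signed and unsigned roles swap in row-major layout is the identity (AB)^T = B^T A^T. A row-major array, read as column-major, is the transpose of the matrix it holds, so a row-major product C = AB occupies the same memory as the column-major product C^T = B^T A^T, in which the row-major B array is the first operand and the row-major A array is the second. Since in column-major layout cblas_gemm_s8u8s32_pack expects the first operand (A) to be signed and the second (B) unsigned, the row-major B array would then be the signed one. The sketch below only demonstrates the identity with a plain column-major integer kernel; that the library handles row-major input exactly this way is an assumption, not something this page states, and the function names are hypothetical.

```c
#include <assert.h>

/* Naive column-major product: C (m-by-n) = A (m-by-k) * B (k-by-n). */
void gemm_colmajor(int m, int n, int k,
                   const int *a, int lda, const int *b, int ldb,
                   int *c, int ldc)
{
    assert(lda >= (m > 1 ? m : 1) && ldb >= (k > 1 ? k : 1) && ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int acc = 0;
            for (int p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = acc;
        }
}

/* Row-major C = A*B via the column-major kernel: the row-major arrays are
   read as their transposes, so the kernel computes C^T = B^T * A^T with the
   row-major B array as its FIRST operand and the row-major A array as its SECOND. */
void gemm_rowmajor_via_colmajor(int m, int n, int k,
                                const int *a, int lda, const int *b, int ldb,
                                int *c, int ldc)
{
    gemm_colmajor(n, m, k, b, ldb, a, lda, c, ldc);
}
```

For the row-major 2-by-3 A = {1, 2, 3, 4, 5, 6} and 3-by-2 B = {7, 8, 9, 10, 11, 12}, the call fills the row-major C with {58, 64, 139, 154}, which is exactly AB.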

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804