Developer Reference


cblas_gemm_*

Computes a matrix-matrix product with general integer matrices.

Syntax

void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
    const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc,
    const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha,
    const void *a, const MKL_INT lda, const MKL_INT8 oa,
    const void *b, const MKL_INT ldb, const MKL_INT8 ob,
    const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
    const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc,
    const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha,
    const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa,
    const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob,
    const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
Include Files
  • mkl.h
Description
The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, a vector is added to each row or column of the output matrix (or a single value is added to every element). The operation is defined as:
$$C := \alpha\,(\mathrm{op}(A) + A_{\mathrm{offset}})\,(\mathrm{op}(B) + B_{\mathrm{offset}}) + \beta\,C + C_{\mathrm{offset}}$$
where:
  • op(X) is either op(X) = X or op(X) = X^T,
  • A_offset is an m-by-k matrix with every element equal to the value oa,
  • B_offset is a k-by-n matrix with every element equal to the value ob,
  • C_offset is an m-by-n matrix defined by the oc array, as described under the offsetc parameter,
  • alpha and beta are scalars,
  • A is a matrix such that op(A) is m-by-k,
  • B is a matrix such that op(B) is k-by-n,
  • and C is an m-by-n matrix.

Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa
Specifies the form of op(A) used in the matrix multiplication:
  • if transa = CblasNoTrans, then op(A) = A;
  • if transa = CblasTrans, then op(A) = A^T.
transb
Specifies the form of op(B) used in the matrix multiplication:
  • if transb = CblasNoTrans, then op(B) = B;
  • if transb = CblasTrans, then op(B) = B^T.
offsetc
Specifies the form of C_offset used in the matrix multiplication.
  • offsetc = CblasFixOffset: oc has a single element, and every element of C_offset is equal to this element.
  • offsetc = CblasColOffset: oc has a size of m, and every column of C_offset is equal to oc.
  • offsetc = CblasRowOffset: oc has a size of n, and every row of C_offset is equal to oc.

m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha
Specifies the scalar alpha.
a
Array. Its size and contents depend on Layout and transa:
  • Layout = CblasColMajor, transa = CblasNoTrans: array of size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasColMajor, transa = CblasTrans: array of size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasRowMajor, transa = CblasNoTrans: array of size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasRowMajor, transa = CblasTrans: array of size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
lda
Specifies the leading dimension of a as declared in the calling (sub)program.
  • Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
  • Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
  • Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
  • Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).
oa
Specifies the scalar offset value for matrix A.
b
Array. Its size and contents depend on Layout and transb:
  • Layout = CblasColMajor, transb = CblasNoTrans: array of size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasColMajor, transb = CblasTrans: array of size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasRowMajor, transb = CblasNoTrans: array of size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  • Layout = CblasRowMajor, transb = CblasTrans: array of size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
ldb
Specifies the leading dimension of b as declared in the calling (sub)program.
  • Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
  • Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
  • Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
  • Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).
ob
Specifies the scalar offset value for matrix B.
beta
Specifies the scalar beta. When beta is equal to zero, c need not be set on input.
c
Array. Its size and contents depend on Layout:
  • Layout = CblasColMajor: array of size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
  • Layout = CblasRowMajor: array of size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
ldc
Specifies the leading dimension of c as declared in the calling (sub)program.
  • Layout = CblasColMajor: ldc must be at least max(1, m).
  • Layout = CblasRowMajor: ldc must be at least max(1, n).
oc
Array of size len. Specifies the offset values for matrix C.
  • If offsetc = CblasFixOffset: len must be at least 1.
  • If offsetc = CblasColOffset: len must be at least max(1, m).
  • If offsetc = CblasRowOffset: len must be at least max(1, n).

Output Parameters
c
Overwritten by
$$\alpha\,(\mathrm{op}(A) + A_{\mathrm{offset}})\,(\mathrm{op}(B) + B_{\mathrm{offset}}) + \beta\,C + C_{\mathrm{offset}}.$$

Example

For examples of routine usage, see the following files in the Intel® MKL installation directory:
  • cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c
  • cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c
Application Notes
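The matrix-matrix product can be expanded:
$$(\mathrm{op}(A) + A_{\mathrm{offset}})\,(\mathrm{op}(B) + B_{\mathrm{offset}}) = \mathrm{op}(A)\,\mathrm{op}(B) + \mathrm{op}(A)\,B_{\mathrm{offset}} + A_{\mathrm{offset}}\,\mathrm{op}(B) + A_{\mathrm{offset}}\,B_{\mathrm{offset}}$$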
These four products are computed separately and then summed from left to right. The resulting matrix-matrix product and the matrix C are scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored in the output array c, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer value of the output matrix's data type.
When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
Intermediate integer computations in cblas_gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate, because only 16 bits are available for accumulating intermediate results. You can avoid integer saturation by keeping all elements of A, or all elements of B, representable in fewer than 8 bits.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804