cblas_gemm_*
Computes a matrix-matrix product with general integer matrices.
Syntax
void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
    const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc,
    const MKL_INT m, const MKL_INT n, const MKL_INT k,
    const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa,
    const void *b, const MKL_INT ldb, const MKL_INT8 ob,
    const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa,
    const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc,
    const MKL_INT m, const MKL_INT n, const MKL_INT k,
    const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa,
    const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob,
    const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
Include Files
- mkl.h
Description
The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, an offset (a single value, or a vector added to each row or column) is added to the output matrix. The operation is defined as:
$$C := \alpha\,(\mathrm{op}(A) + A_{\text{offset}})\,(\mathrm{op}(B) + B_{\text{offset}}) + \beta\,C + C_{\text{offset}}$$
where:
- op(X) is either $\mathrm{op}(X) = X$ or $\mathrm{op}(X) = X^T$,
- A_offset is an m-by-k matrix with every element equal to the value oa,
- B_offset is a k-by-n matrix with every element equal to the value ob,
- C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter,
- alpha and beta are scalars,
- A is a matrix such that op(A) is m-by-k,
- B is a matrix such that op(B) is k-by-n, and
- C is an m-by-n matrix.
Input Parameters
- Layout
- Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- transa
- Specifies the form of op(A) used in the matrix multiplication: if transa = CblasNoTrans, then op(A) = A; if transa = CblasTrans, then op(A) = A^T.
- transb
- Specifies the form of op(B) used in the matrix multiplication: if transb = CblasNoTrans, then op(B) = B; if transb = CblasTrans, then op(B) = B^T.
- offsetc
- Specifies the form of C_offset used in the matrix multiplication:
  - offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to this element.
  - offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
  - offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
- m
- Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
- Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
- k
- Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
- alpha
- Specifies the scalar alpha.
- a
- Depends on Layout and transa:
  - Layout = CblasColMajor, transa = CblasNoTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasColMajor, transa = CblasTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasRowMajor, transa = CblasNoTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasRowMajor, transa = CblasTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
- lda
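- Specifies the leading dimension of a as declared in the calling (sub)program:
  - Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
  - Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
  - Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
  - Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).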
- oa
- Specifies the scalar offset value for matrix A.
- b
- Depends on Layout and transb:
  - Layout = CblasColMajor, transb = CblasNoTrans: Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasColMajor, transb = CblasTrans: Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasRowMajor, transb = CblasNoTrans: Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
  - Layout = CblasRowMajor, transb = CblasTrans: Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
- ldb
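- Specifies the leading dimension of b as declared in the calling (sub)program:
  - Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
  - Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
  - Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
  - Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).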
- ob
- Specifies the scalar offset value for matrix B.
- beta
- Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.
- c
- Depends on Layout:
  - Layout = CblasColMajor: Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
  - Layout = CblasRowMajor: Array, size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
- ldc
- Specifies the leading dimension of c as declared in the calling (sub)program:
  - Layout = CblasColMajor: ldc must be at least max(1, m).
  - Layout = CblasRowMajor: ldc must be at least max(1, n).
- oc
- Array, size len. Specifies the offset values for matrix C.
  - If offsetc = CblasFixOffset: len must be at least 1.
  - If offsetc = CblasColOffset: len must be at least max(1, m).
  - If offsetc = CblasRowOffset: len must be at least max(1, n).
Output Parameters
- c
- Overwritten by $\alpha\,(\mathrm{op}(A) + A_{\text{offset}})\,(\mathrm{op}(B) + B_{\text{offset}}) + \beta\,C + C_{\text{offset}}$.
Example
For examples of routine usage, see the code in the Intel® oneAPI Math Kernel Library installation directory:
- cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c
- cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c
Application Notes
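The matrix-matrix product can be expanded:
$$(\mathrm{op}(A) + A_{\text{offset}})\,(\mathrm{op}(B) + B_{\text{offset}}) = \mathrm{op}(A)\,\mathrm{op}(B) + \mathrm{op}(A)\,B_{\text{offset}} + A_{\text{offset}}\,\mathrm{op}(B) + A_{\text{offset}}\,B_{\text{offset}}$$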
The four multiplication terms of this expansion are computed separately and then summed from left to right. The result of the matrix-matrix product and the C matrix are scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored to the output c array, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer values for the data type of the output matrix.
When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
Intermediate integer computations on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate, because only 16 bits are available for the accumulation of intermediate results. You can avoid integer saturation by keeping all integer elements of the A or B matrix under 8 bits.