cblas_gemm

Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684

Date 11/07/2023

cblas_gemm_*

Computes a matrix-matrix product with general integer matrices.

Syntax

void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

Include Files

mkl.h

Description

The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, a vector is added to each row or column of the output matrix. The operation is defined as:

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset

where:

op(X) is either op(X) = X or op(X) = X^T,
A_offset is an m-by-k matrix with every element equal to the value oa,
B_offset is a k-by-n matrix with every element equal to the value ob,
C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter,
alpha and beta are scalars,
A is a matrix such that op(A) is m-by-k,
B is a matrix such that op(B) is k-by-n,
and C is an m-by-n matrix.
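
The definition above can be written out as a plain C loop nest. The sketch below is only a reading aid, not the oneMKL implementation: the function name ref_gemm_s8u8s32 and the helper round_nearest are hypothetical, it covers only column-major storage without transposition and a single fixed C offset, and its tie-breaking rule for rounding (half away from zero) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round to nearest, ties away from zero (assumed rule).
 * No overflow handling; the caller keeps results inside the int32_t range. */
static int32_t round_nearest(double r)
{
    return (int32_t)(r < 0.0 ? r - 0.5 : r + 0.5);
}

/* Hypothetical reference for
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * with column-major storage, op(A) = A, op(B) = B, and one fixed offset oc. */
static void ref_gemm_s8u8s32(int m, int n, int k, float alpha,
                             const int8_t *a, int lda, int8_t oa,
                             const uint8_t *b, int ldb, int8_t ob,
                             float beta, int32_t *c, int ldc, int32_t oc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                int32_t av = (int32_t)a[i + p * lda] + oa; /* A(i,p) + oa */
                int32_t bv = (int32_t)b[p + j * ldb] + ob; /* B(p,j) + ob */
                acc += (int64_t)av * bv;
            }
            double r = (double)alpha * (double)acc + (double)oc;
            if (beta != 0.0f)               /* c is not read when beta is zero */
                r += (double)beta * (double)c[i + j * ldc];
            c[i + j * ldc] = round_nearest(r);
        }
    }
}
```

For example, with m = n = k = 2, a = {1, 2, 3, 4}, b = {5, 6, 7, 8}, zero offsets, alpha = 1 and beta = 0, the sketch stores {23, 34, 31, 46} in c.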

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A^T.

transb

Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B^T.

offsetc

Specifies the form of C_offset used in the matrix multiplication.

offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to this element.
offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
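
The three offsetc modes differ only in which element of oc contributes to a given entry of C_offset. The sketch below illustrates that mapping; the enum offset_mode and both function names are hypothetical stand-ins, not oneMKL identifiers, and the minimum lengths follow the sizes stated for each mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three offsetc values described above. */
typedef enum { FIX_OFFSET, COL_OFFSET, ROW_OFFSET } offset_mode;

/* Value of C_offset(i, j), where i is the row and j the column index. */
static int32_t c_offset_at(offset_mode mode, const int32_t *oc, int i, int j)
{
    assert(oc != 0 && i >= 0 && j >= 0);
    switch (mode) {
    case FIX_OFFSET: return oc[0]; /* one value for the whole matrix */
    case COL_OFFSET: return oc[i]; /* every column equals oc, so row i uses oc[i] */
    case ROW_OFFSET: return oc[j]; /* every row equals oc, so column j uses oc[j] */
    }
    return 0;
}

/* Minimum number of elements the oc array must hold. */
static int oc_min_len(offset_mode mode, int m, int n)
{
    int len = (mode == COL_OFFSET) ? m : (mode == ROW_OFFSET) ? n : 1;
    return len > 1 ? len : 1;
}
```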

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

Specifies the scalar alpha.

a

	transa=CblasNoTrans	transa=CblasTrans
Layout = CblasColMajor	Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.	Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
Layout = CblasRowMajor	Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.	Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.

lda

Specifies the leading dimension of a as declared in the calling (sub)program.

	transa=CblasNoTrans	transa=CblasTrans
Layout = CblasColMajor	lda must be at least `max(1, m)`.	lda must be at least `max(1, k)`.
Layout = CblasRowMajor	lda must be at least `max(1, k)`.	lda must be at least `max(1, m)`.
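
The table above can be condensed into one rule: the leading dimension must cover the extent of the stored matrix along its contiguous direction. The sketch below encodes that rule together with the usual CBLAS flat-index convention. The names min_lda and op_a_index are hypothetical, and the 0/1 flags row_major and trans stand in for Layout and transa.

```c
#include <assert.h>

/* Smallest valid lda for the given layout and transposition flags. */
static int min_lda(int row_major, int trans, int m, int k)
{
    assert(m >= 0 && k >= 0);
    /* Column-major untransposed and row-major transposed need m; the
     * other two combinations need k. */
    int extent = ((row_major != 0) != (trans != 0)) ? k : m;
    return extent > 1 ? extent : 1;
}

/* Flat index into a of element (i, p) of op(A), with 0 <= i < m, 0 <= p < k. */
static int op_a_index(int row_major, int trans, int i, int p, int lda)
{
    assert(i >= 0 && p >= 0 && lda >= 1);
    return ((row_major != 0) != (trans != 0)) ? p + i * lda : i + p * lda;
}
```

The same rule applies to ldb with (k, n) in place of (m, k).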

oa

Specifies the scalar offset value for matrix A.

b

	transb=CblasNoTrans	transb=CblasTrans
Layout = CblasColMajor	Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.	Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit unsigned integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.
Layout = CblasRowMajor	Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.	Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B of 8-bit signed integers for cblas_gemm_s8u8s32 or 16-bit signed integers for cblas_gemm_s16s16s32.

ldb

Specifies the leading dimension of b as declared in the calling (sub)program.

	transb=CblasNoTrans	transb=CblasTrans
Layout = CblasColMajor	ldb must be at least `max(1, k)`.	ldb must be at least `max(1, n)`.
Layout = CblasRowMajor	ldb must be at least `max(1, n)`.	ldb must be at least `max(1, k)`.

ob

Specifies the scalar offset value for matrix B.

beta

Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.

c

Layout = CblasColMajor	Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix `C`, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor	Array, size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix `C`, except when beta is equal to zero, in which case c need not be set on entry.

ldc

Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor	ldc must be at least `max(1, m)`.
Layout = CblasRowMajor	ldc must be at least `max(1, n)`.

oc

Array, size len. Specifies the offset values for matrix C.

If offsetc = CblasFixOffset: len must be at least 1.
If offsetc = CblasColOffset: len must be at least max(1, m).
If offsetc = CblasRowOffset: len must be at least max(1, n).

Output Parameters

c	Overwritten by `alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset`.

Example

For examples of routine usage, see the following code examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory:

cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c
cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c

Application Notes

The matrix-matrix product can be expanded:

(op(A) + A_offset)*(op(B) + B_offset)

= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset

After computing these four multiplication terms separately, they are summed from left to right. The results from the matrix-matrix product and the C matrix are scaled with the alpha and beta floating-point values, respectively, using double-precision arithmetic. Before storing the results to the output c array, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer values for the data type of the output matrix.
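
For a single output element, the expansion and the scaling step can be sketched in plain C as follows. This is an illustration only, not the oneMKL code path: the function names are hypothetical, the rounding rule for ties (half away from zero) is an assumption, and the sketch shows only the saturating behavior, whereas the library may wrap instead depending on the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* One element of (op(A) + A_offset)*(op(B) + B_offset), computed directly
 * from one row of op(A) and one column of op(B), each of length k. */
static int64_t dot_direct(const int8_t *a_row, const uint8_t *b_col, int k,
                          int8_t oa, int8_t ob)
{
    int64_t acc = 0;
    for (int p = 0; p < k; ++p)
        acc += (int64_t)(a_row[p] + oa) * (b_col[p] + ob);
    return acc;
}

/* The same element from the four expanded terms, summed left to right. */
static int64_t dot_expanded(const int8_t *a_row, const uint8_t *b_col, int k,
                            int8_t oa, int8_t ob)
{
    assert(k >= 0);
    int64_t ab = 0, sum_a = 0, sum_b = 0;
    for (int p = 0; p < k; ++p) {
        ab += (int64_t)a_row[p] * b_col[p];
        sum_a += a_row[p];
        sum_b += b_col[p];
    }
    /* op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset */
    return ab + ob * sum_a + oa * sum_b + (int64_t)k * oa * ob;
}

/* Scale in double precision, round to nearest, saturate to int32_t. */
static int32_t scale_round_saturate(int64_t prod, float alpha,
                                    int32_t c_old, float beta, int32_t c_off)
{
    double r = (double)alpha * (double)prod + (double)c_off;
    if (beta != 0.0f)
        r += (double)beta * (double)c_old;
    if (r >= 2147483647.0) return INT32_MAX;
    if (r <= -2147483648.0) return INT32_MIN;
    return (int32_t)(r < 0.0 ? r - 0.5 : r + 0.5);
}
```

With a_row = {1, -2, 3}, b_col = {4, 5, 6}, oa = 2 and ob = -3, both dot_direct and dot_expanded return 18.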

When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.

Intermediate integer computations in cblas_gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate. This is because only 16 bits are available for the accumulation of intermediate results. You can avoid integer saturation by maintaining all integer elements of A or B matrices under 8 bits.
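
The effect of a 16-bit intermediate can be reproduced with a small model. The sketch below assumes that two adjacent signed-by-unsigned 8-bit products are summed into one saturating signed 16-bit value; that pairing is an assumption made for illustration, since the text above states only that 16 bits are available. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Assumed model: two s8*u8 products summed in a saturating 16-bit value. */
static int16_t pair_sum_s16(int8_t a0, uint8_t b0, int8_t a1, uint8_t b1)
{
    int32_t exact = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    return saturate_s16(exact);
}

/* Full-range inputs saturate; unsigned inputs limited to 7 bits do not. */
static void check_pair_saturation(void)
{
    assert(pair_sum_s16(127, 255, 127, 255) == INT16_MAX);   /* exact: 64770 */
    assert(pair_sum_s16(-128, 255, -128, 255) == INT16_MIN); /* exact: -65280 */
    assert(pair_sum_s16(127, 127, 127, 127) == 32258);       /* fits */
    assert(pair_sum_s16(-128, 127, -128, 127) == -32512);    /* fits */
}
```

Under this model, keeping the unsigned operand at or below 127 is enough for every pair sum to fit in 16 bits.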
