cblas_gemm_bf16bf16f32
Computes a matrix-matrix product with general bfloat16 matrices.
Syntax
void cblas_gemm_bf16bf16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);
Include Files
- mkl.h
Description
The cblas_gemm_bf16bf16f32 routine computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product. The operation is defined as:
C := alpha*op(A)*op(B) + beta*C
where:
op(X) is one of op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices,
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
Input Parameters
- Layout
- Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- transa
- Specifies the form of op(A) used in the matrix multiplication: if transa = CblasNoTrans, then op(A) = A; if transa = CblasTrans, then op(A) = A^T.
- transb
- Specifies the form of op(B) used in the matrix multiplication: if transb = CblasNoTrans, then op(B) = B; if transb = CblasTrans, then op(B) = B^T.
- m
- Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
- Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
- k
- Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
- alpha
- Specifies the scalar alpha.
- a
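- The array a. Its required size and contents depend on Layout and transa:
  Layout = CblasColMajor, transa = CblasNoTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  Layout = CblasColMajor, transa = CblasTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  Layout = CblasRowMajor, transa = CblasNoTrans: array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  Layout = CblasRowMajor, transa = CblasTrans: array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.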
- lda
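- Specifies the leading dimension of a as declared in the calling (sub)program. The minimum value depends on Layout and transa:
  Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
  Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
  Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
  Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).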
- b
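- The array b. Its required size and contents depend on Layout and transb:
  Layout = CblasColMajor, transb = CblasNoTrans: array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  Layout = CblasColMajor, transb = CblasTrans: array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  Layout = CblasRowMajor, transb = CblasNoTrans: array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  Layout = CblasRowMajor, transb = CblasTrans: array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.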
- ldb
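- Specifies the leading dimension of b as declared in the calling (sub)program. The minimum value depends on Layout and transb:
  Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
  Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
  Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
  Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).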
- beta
- Specifies the scalar beta. When beta is equal to zero, c need not be set on input.
- c
- The array c. Its required size and contents depend on Layout:
  Layout = CblasColMajor: array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
  Layout = CblasRowMajor: array, size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
- ldc
- Specifies the leading dimension of c as declared in the calling (sub)program. The minimum value depends on Layout:
  Layout = CblasColMajor: ldc must be at least max(1, m).
  Layout = CblasRowMajor: ldc must be at least max(1, n).
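The leading-dimension rules for lda, ldb, and ldc all follow one pattern: in column-major storage the leading dimension must cover the rows of the stored matrix, and in row-major storage it must cover the columns. The following is a minimal sketch of that rule, not part of oneMKL. The enums layout_t and trans_t and the helper min_ld are hypothetical stand-ins (for CBLAS_LAYOUT and CBLAS_TRANSPOSE) so that the fragment compiles without mkl.h.

```c
#include <assert.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { ROW_MAJOR, COL_MAJOR } layout_t;
typedef enum { NO_TRANS, TRANS } trans_t;

/*
 * Smallest valid leading dimension for an array holding X,
 * where op(X) is rows-by-cols.
 *   a: min_ld(layout, transa, m, k)
 *   b: min_ld(layout, transb, k, n)
 *   c: min_ld(layout, NO_TRANS, m, n)
 */
long min_ld(layout_t layout, trans_t trans, long rows, long cols)
{
    assert(rows >= 0 && cols >= 0);

    /* The stored matrix X is the transpose of op(X) when trans == TRANS. */
    long stored_rows = (trans == NO_TRANS) ? rows : cols;
    long stored_cols = (trans == NO_TRANS) ? cols : rows;

    /* Column-major strides over rows, row-major strides over columns. */
    long needed = (layout == COL_MAJOR) ? stored_rows : stored_cols;
    return needed > 1 ? needed : 1;
}
```

For example, with Layout = CblasRowMajor and transa = CblasTrans, min_ld(ROW_MAJOR, TRANS, m, k) yields max(1, m), which matches the lda requirement listed above.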
Output Parameters
- c
- Overwritten by alpha*op(A)*op(B) + beta*C.
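To make the result stored in c concrete, here is an unoptimized reference sketch of the operation in plain C. It is not the oneMKL implementation and does not call the library. It assumes that a bfloat16 value occupies the upper 16 bits of an IEEE 754 single-precision value. The names bf16_t (a stand-in for MKL_BF16), idx, and ref_gemm_bf16 are hypothetical, and the layout and transposition flags are plain integers rather than the CBLAS enums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* hypothetical stand-in for MKL_BF16 */

/* Assumption: bfloat16 is the high half of a binary32 value. */
float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Offset of element (r, c) in an array with leading dimension ld. */
size_t idx(int row_major, int r, int c, int ld)
{
    return row_major ? (size_t)r * ld + c : (size_t)c * ld + r;
}

/* Reference semantics: C := alpha*op(A)*op(B) + beta*C, accumulated in float. */
void ref_gemm_bf16(int row_major, int transa, int transb,
                   int m, int n, int k, float alpha,
                   const bf16_t *a, int lda, const bf16_t *b, int ldb,
                   float beta, float *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* op(A)(i,p) is A(i,p), or A(p,i) when transposed. */
                size_t ia = transa ? idx(row_major, p, i, lda)
                                   : idx(row_major, i, p, lda);
                /* op(B)(p,j) is B(p,j), or B(j,p) when transposed. */
                size_t ib = transb ? idx(row_major, j, p, ldb)
                                   : idx(row_major, p, j, ldb);
                acc += bf16_to_float(a[ia]) * bf16_to_float(b[ib]);
            }
            float *cij = &c[idx(row_major, i, j, ldc)];
            /* When beta is zero, c need not be set on entry, so it is not read. */
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

The sketch accumulates in single precision, which matches the f32 output type in the routine name. The order of accumulation inside the library is not specified here, so results from the real routine may differ in the last bits.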
Example
For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library installation directory:
- cblas_gemm_bf16bf16f32: examples\cblas\source\cblas_gemm_bf16bf16f32x.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single precision and SGEMM is called to compute the matrix multiplication.
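The upconversion step can be illustrated with a short sketch. This is an assumption about how such a widening may be done, not the library's actual code path. It relies on bfloat16 being the upper 16 bits of an IEEE 754 single-precision value, and bf16_t and bf16_upconvert are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* hypothetical stand-in for MKL_BF16 */

/*
 * Widen count bfloat16 values to float. The 16 stored bits become the
 * high half of the binary32 pattern; the low 16 mantissa bits are zero.
 */
void bf16_upconvert(const bf16_t *src, float *dst, size_t count)
{
    assert(sizeof(float) == sizeof(uint32_t));
    assert(count == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < count; ++i) {
        uint32_t bits = (uint32_t)src[i] << 16;
        memcpy(&dst[i], &bits, sizeof bits);
    }
}
```

Every bfloat16 value is exactly representable in single precision, so this widening loses no information. The widened copies of a and b can keep the same layout, transposition flags, dimensions, and leading dimensions when passed to a single-precision GEMM.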