Developer Reference

  • 2021.1
  • 12/04/2020
  • Public Content

cblas_gemm_bf16bf16f32_compute

Computes a matrix-matrix product with general bfloat16 matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.

Syntax

C:
void cblas_gemm_bf16bf16f32_compute (const CBLAS_LAYOUT Layout,
                                     const MKL_INT transa,
                                     const MKL_INT transb,
                                     const MKL_INT m,
                                     const MKL_INT n,
                                     const MKL_INT k,
                                     const float alpha,
                                     const MKL_BF16 *a,
                                     const MKL_INT lda,
                                     const MKL_BF16 *b,
                                     const MKL_INT ldb,
                                     const float beta,
                                     float *c,
                                     const MKL_INT ldc);
Include Files
  • mkl.h
Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable use of an internal packed storage. After calling cblas_gemm_bf16bf16f32_pack, call cblas_gemm_bf16bf16f32_compute to compute

C := alpha*op(A)*op(B) + beta*C,

where:

  • op(X) is either op(X) = X or op(X) = X^T,
  • alpha and beta are scalars,
  • A, B, and C are matrices:
  • op(A) is an m-by-k matrix,
  • op(B) is a k-by-n matrix,
  • C is an m-by-n matrix.

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.

For best performance, use the same number of threads for packing and for computing. If you pack both the A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
CBLAS_LAYOUT
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa
MKL_INT
Specifies the form of op(A) used in the packing:
If transa = CblasNoTrans, then op(A) = A.
If transa = CblasTrans, then op(A) = A^T.
If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library and lda is ignored.
transb
MKL_INT
Specifies the form of op(B) used in the packing:
If transb = CblasNoTrans, then op(B) = B.
If transb = CblasTrans, then op(B) = B^T.
If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library and ldb is ignored.
m
MKL_INT
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
MKL_INT
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
MKL_INT
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha
float
Specifies the scalar alpha.
a
MKL_BF16*
If Layout = CblasColMajor:
    If transa = CblasNoTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
    If transa = CblasTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
    If transa = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
If Layout = CblasRowMajor:
    If transa = CblasNoTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
    If transa = CblasTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
    If transa = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
lda
MKL_INT
Specifies the leading dimension of a as declared in the calling (sub)program.
If Layout = CblasColMajor: lda must be at least max(1, m) when transa = CblasNoTrans, or at least max(1, k) when transa = CblasTrans.
If Layout = CblasRowMajor: lda must be at least max(1, k) when transa = CblasNoTrans, or at least max(1, m) when transa = CblasTrans.
b
MKL_BF16*
If Layout = CblasColMajor:
    If transb = CblasNoTrans: Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
    If transb = CblasTrans: Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
    If transb = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
If Layout = CblasRowMajor:
    If transb = CblasNoTrans: Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
    If transb = CblasTrans: Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
    If transb = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
ldb
MKL_INT
Specifies the leading dimension of b as declared in the calling (sub)program.
If Layout = CblasColMajor: ldb must be at least max(1, k) when transb = CblasNoTrans, or at least max(1, n) when transb = CblasTrans.
If Layout = CblasRowMajor: ldb must be at least max(1, n) when transb = CblasNoTrans, or at least max(1, k) when transb = CblasTrans.
beta
float
Specifies the scalar beta.
c
float*
If Layout = CblasColMajor: Array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
If Layout = CblasRowMajor: Array, size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
ldc
MKL_INT
Specifies the leading dimension of c as declared in the calling (sub)program.
If Layout = CblasColMajor: ldc must be at least max(1, m).
If Layout = CblasRowMajor: ldc must be at least max(1, n).
Output Parameters
c
float*
Overwritten by the matrix alpha*op(A)*op(B) + beta*C.

Example

See the following example in the Intel® oneAPI Math Kernel Library installation directory to understand the use of this routine:

cblas_gemm_bf16bf16f32_compute: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single precision and SGEMM is called to compute the matrix multiplication operation.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.