Developer Reference

  • 0.10
  • 10/21/2020
  • Public Content
cblas_gemm_bf16bf16f32_compute

Computes a matrix-matrix product with general bfloat16 matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.

Syntax

C:

void cblas_gemm_bf16bf16f32_compute (
    const CBLAS_LAYOUT Layout,
    const MKL_INT transa,
    const MKL_INT transb,
    const MKL_INT m,
    const MKL_INT n,
    const MKL_INT k,
    const float alpha,
    const MKL_BF16 *a,
    const MKL_INT lda,
    const MKL_BF16 *b,
    const MKL_INT ldb,
    const float beta,
    float *c,
    const MKL_INT ldc);
Include Files
  • mkl.h
Description
The cblas_gemm_bf16bf16f32_compute routine is one of a set of related routines that enable use of an internal packed storage. After calling cblas_gemm_bf16bf16f32_pack, call cblas_gemm_bf16bf16f32_compute to compute

C := alpha*op(A)*op(B) + beta*C,

where:

  • op(X) is either op(X) = X or op(X) = X^T,
  • alpha and beta are scalars,
  • A, B, and C are matrices:
      op(A) is an m-by-k matrix,
      op(B) is a k-by-n matrix,
      C is an m-by-n matrix.

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_bf16bf16f32_pack and cblas_gemm_bf16bf16f32_compute calls.

For best performance, use the same number of threads for packing and for computing. If packing both the A and B matrices, you must use the same number of threads for packing A as for packing B.
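The pack-then-compute sequence described above can be sketched as follows. This is a minimal column-major sketch, not a definitive implementation: it assumes the companion routines cblas_gemm_bf16bf16f32_pack_get_size and cblas_gemm_bf16bf16f32_pack and the CblasAMatrix identifier as documented on their own pages, and it requires linking against Intel® oneAPI Math Kernel Library.

```c
#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    const MKL_INT m = 4, n = 3, k = 2;

    MKL_BF16 a[4 * 2];           /* m-by-k source matrix A, column-major */
    MKL_BF16 b[2 * 3];           /* k-by-n source matrix B, column-major */
    float    c[4 * 3];           /* m-by-n result matrix C               */
    memset(a, 0, sizeof a);      /* placeholder data; fill with real     */
    memset(b, 0, sizeof b);      /* bfloat16 values in practice          */

    /* Query the internal packed size for A, allocate, and pack once;
       the packed A can then be reused across many compute calls. */
    size_t bytes = cblas_gemm_bf16bf16f32_pack_get_size(CblasAMatrix, m, n, k);
    MKL_BF16 *ap = (MKL_BF16 *)mkl_malloc(bytes, 64);

    cblas_gemm_bf16bf16f32_pack(CblasColMajor, CblasAMatrix, CblasNoTrans,
                                m, n, k, a, m, ap);

    /* transa = CblasPacked: a is in the internal packed format, lda ignored. */
    cblas_gemm_bf16bf16f32_compute(CblasColMajor, CblasPacked, CblasNoTrans,
                                   m, n, k, 1.0f, ap, m, b, k, 0.0f, c, m);

    printf("c[0] = %g\n", c[0]);
    mkl_free(ap);
    return 0;
}
```

Packing pays off when the same A (or B) participates in many products: the conversion to the internal layout is done once rather than on every gemm call.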
Input Parameters
Layout
    CBLAS_LAYOUT. Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa
    MKL_INT. Specifies the form of op(A) used in the packing:
    If transa = CblasNoTrans, then op(A) = A.
    If transa = CblasTrans, then op(A) = A^T.
    If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library and lda is ignored.

transb
    MKL_INT. Specifies the form of op(B) used in the packing:
    If transb = CblasNoTrans, then op(B) = B.
    If transb = CblasTrans, then op(B) = B^T.
    If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library and ldb is ignored.
m
    MKL_INT. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n
    MKL_INT. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k
    MKL_INT. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha
    float. Specifies the scalar alpha.
a
    MKL_BF16*.

    Layout = CblasColMajor:
        transa = CblasNoTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
        transa = CblasTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
        transa = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
    Layout = CblasRowMajor:
        transa = CblasNoTrans: Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
        transa = CblasTrans: Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
        transa = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
lda
    MKL_INT. Specifies the leading dimension of a as declared in the calling (sub)program.

    Layout = CblasColMajor:
        transa = CblasNoTrans: lda must be at least max(1, m).
        transa = CblasTrans: lda must be at least max(1, k).
    Layout = CblasRowMajor:
        transa = CblasNoTrans: lda must be at least max(1, k).
        transa = CblasTrans: lda must be at least max(1, m).
b
    MKL_BF16*.

    Layout = CblasColMajor:
        transb = CblasNoTrans: Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
        transb = CblasTrans: Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
        transb = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
    Layout = CblasRowMajor:
        transb = CblasNoTrans: Array, size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
        transb = CblasTrans: Array, size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
        transb = CblasPacked: Array of size returned by cblas_gemm_bf16bf16f32_pack_get_size and initialized using cblas_gemm_bf16bf16f32_pack.
ldb
    MKL_INT. Specifies the leading dimension of b as declared in the calling (sub)program.

    Layout = CblasColMajor:
        transb = CblasNoTrans: ldb must be at least max(1, k).
        transb = CblasTrans: ldb must be at least max(1, n).
    Layout = CblasRowMajor:
        transb = CblasNoTrans: ldb must be at least max(1, n).
        transb = CblasTrans: ldb must be at least max(1, k).
beta
    float. Specifies the scalar beta.

c
    float*.

    Layout = CblasColMajor: Array, size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
    Layout = CblasRowMajor: Array, size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc
    MKL_INT. Specifies the leading dimension of c as declared in the calling (sub)program.

    Layout = CblasColMajor: ldc must be at least max(1, m).
    Layout = CblasRowMajor: ldc must be at least max(1, n).
Output Parameters
c
    float*. Overwritten by the matrix alpha*op(A)*op(B) + beta*C.

Example

See the following example in the Intel® oneAPI Math Kernel Library installation directory to understand the use of this routine:

cblas_gemm_bf16bf16f32_compute: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
Application Notes
On architectures without native bfloat16 hardware instructions, the matrices A and B are upconverted to single precision and SGEMM is called to compute the matrix multiplication operation.

Product and Performance Information

1 Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804