Developer Reference

  • 2021.1
  • 12/04/2020
  • Public Content

cblas_?gemm_compute

Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.

Syntax

void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);

void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const MKL_INT ldc);

Include Files
  • mkl.h
Description
The cblas_?gemm_compute routine is one of a set of related routines that enable the use of an internal packed storage format. After calling cblas_?gemm_pack, call cblas_?gemm_compute to compute
C := op(A)*op(B) + beta*C,
where:

    op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,

    beta is a scalar,

    A, B, and C are matrices:

    op(A) is an m-by-k matrix,

    op(B) is a k-by-n matrix,

    C is an m-by-n matrix.
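
The operation defined above can be written out as plain loops. The following is a minimal sketch of the semantics only, assuming column-major storage, unpacked inputs, and single-precision real data (so X^H coincides with X^T). The names op_elem and ref_gemm_colmajor are hypothetical helpers and not part of the library; the optimized routine will not compute the product this way.

```c
#include <stddef.h>

/* Hypothetical helper, not a library routine: element (i, j) of op(X) for a
   column-major array x with leading dimension ldx. For real data the
   conjugate transpose equals the transpose, so one flag covers both. */
static float op_elem(const float *x, int ldx, int transposed, int i, int j)
{
    return transposed ? x[(size_t)j + (size_t)i * ldx]
                      : x[(size_t)i + (size_t)j * ldx];
}

/* Reference semantics only: C := op(A)*op(B) + beta*C, column-major.
   ta and tb are nonzero when the corresponding matrix is transposed. */
void ref_gemm_colmajor(int ta, int tb, int m, int n, int k,
                       const float *a, int lda, const float *b, int ldb,
                       float beta, float *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += op_elem(a, lda, ta, i, p) * op_elem(b, ldb, tb, p, j);
            float *cij = &c[(size_t)i + (size_t)j * ldc];
            /* When beta is zero the old value of C is never read. */
            *cij = (beta == 0.0f) ? acc : acc + beta * *cij;
        }
    }
}
```

Note that op(A) contributes row i and op(B) contributes column j to every element of C, which is why the inner dimension k must match between the two operands.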

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
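
A possible pack-then-compute sequence is sketched below for a packed A and an unpacked B. Only cblas_sgemm_compute is documented on this page; the calls to cblas_sgemm_pack_get_size, cblas_sgemm_pack, mkl_malloc, and mkl_free, as well as the CblasAMatrix identifier, are written from the related reference pages as understood here and should be checked against mkl.h before use. The USE_MKL macro and the function name packed_sgemm_sketch are hypothetical; without the macro, a plain-C stand-in that produces the same result is compiled instead.

```c
#include <stddef.h>

#ifdef USE_MKL   /* hypothetical build flag: define it when building with the library */
#include <mkl.h>
#endif

/* Hypothetical wrapper: C := A*B + beta*C for column-major, untransposed
   m-by-k A, k-by-n B and m-by-n C with m, n, k >= 1 and no padding.
   Returns 0 on success, -1 if the packed buffer cannot be allocated. */
int packed_sgemm_sketch(int m, int n, int k, const float *a,
                        const float *b, float beta, float *c)
{
#ifdef USE_MKL
    /* Ask how many bytes the packed copy of A needs, then allocate it. */
    size_t bytes = cblas_sgemm_pack_get_size(CblasAMatrix, m, n, k);
    float *a_packed = (float *)mkl_malloc(bytes, 64);
    if (a_packed == NULL)
        return -1;

    /* Pack A once; the scaling factor (1.0f here) is applied while packing. */
    cblas_sgemm_pack(CblasColMajor, CblasAMatrix, CblasNoTrans,
                     m, n, k, 1.0f, a, m, a_packed);

    /* Same Layout as the pack call. transa = CblasPacked, so lda is ignored. */
    cblas_sgemm_compute(CblasColMajor, (MKL_INT)CblasPacked,
                        (MKL_INT)CblasNoTrans, m, n, k,
                        a_packed, m, b, k, beta, c, m);
    mkl_free(a_packed);
#else
    /* Plain-C stand-in with the same result, used when the library is absent. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i + (size_t)p * m] * b[p + (size_t)j * k];
            float *cij = &c[i + (size_t)j * m];
            *cij = (beta == 0.0f) ? acc : acc + beta * *cij;
        }
#endif
    return 0;
}
```

In practice the packed buffer would be kept and reused across many compute calls with different B matrices, since amortizing the packing cost is the point of the packed interface.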
Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa
Specifies the form of op(A) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
  • If transa = CblasNoTrans, op(A) = A.
  • If transa = CblasTrans, op(A) = A^T.
  • If transa = CblasConjTrans, op(A) = A^H.
  • If transa = CblasPacked, the matrix in array a is packed and lda is ignored.
transb
Specifies the form of op(B) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
  • If transb = CblasNoTrans, op(B) = B.
  • If transb = CblasTrans, op(B) = B^T.
  • If transb = CblasConjTrans, op(B) = B^H.
  • If transb = CblasPacked, the matrix in array b is packed and ldb is ignored.
m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
a
Array:
  • Layout = CblasColMajor:
      • transa = CblasNoTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
      • transa = CblasTrans or transa = CblasConjTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
      • transa = CblasPacked: Stored in internal packed format.
  • Layout = CblasRowMajor:
      • transa = CblasNoTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
      • transa = CblasTrans or transa = CblasConjTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
      • transa = CblasPacked: Stored in internal packed format.
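
The four unpacked cases above reduce to two memory patterns, which the following sketch makes explicit by computing where element (i, p) of op(A) lives in the array a. The enumerations and the function op_a_offset are hypothetical stand-ins introduced for illustration; they are not library types, and the packed format is deliberately excluded because its internal layout is not specified here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enumerators; values are arbitrary. */
enum demo_layout { DEMO_COL_MAJOR, DEMO_ROW_MAJOR };
enum demo_trans  { DEMO_NO_TRANS, DEMO_TRANS };

/* Offset into array a of element (i, p) of op(A),
   with 0 <= i < m and 0 <= p < k. */
size_t op_a_offset(enum demo_layout layout, enum demo_trans trans,
                   size_t lda, size_t i, size_t p)
{
    /* Column-major NoTrans and row-major Trans share one memory pattern:
       index i is contiguous. The other two cases make p contiguous. */
    int i_contiguous = (layout == DEMO_COL_MAJOR) == (trans == DEMO_NO_TRANS);
    return i_contiguous ? i + p * lda : p + i * lda;
}
```

This is also why the row-major entries read like the column-major entries with the roles of m and k exchanged: a row-major m-by-k matrix occupies memory exactly like a column-major k-by-m matrix.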
lda
Specifies the leading dimension of a as declared in the calling (sub)program.
  • Layout = CblasColMajor:
      • transa = CblasNoTrans: lda must be at least max(1, m).
      • transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, k).
      • transa = CblasPacked: lda is ignored.
  • Layout = CblasRowMajor:
      • transa = CblasNoTrans: lda must be at least max(1, k).
      • transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, m).
      • transa = CblasPacked: lda is ignored.
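
The lda requirements above can be collapsed into a single rule, sketched here as a small helper. The enumerations and the function min_lda are hypothetical and only mirror the cases listed; returning 0 for the packed case is a convention of this sketch to signal that lda is ignored.

```c
/* Hypothetical stand-ins for the CBLAS enumerators; values are arbitrary. */
enum demo_layout { DEMO_COL_MAJOR, DEMO_ROW_MAJOR };
enum demo_trans  { DEMO_NO_TRANS, DEMO_TRANS, DEMO_CONJ_TRANS, DEMO_PACKED };

/* Returns the smallest valid lda, or 0 when lda is ignored (packed input). */
long min_lda(enum demo_layout layout, enum demo_trans transa, long m, long k)
{
    if (transa == DEMO_PACKED)
        return 0;
    int col_major = (layout == DEMO_COL_MAJOR);
    int no_trans  = (transa == DEMO_NO_TRANS);
    /* m bounds lda for column-major NoTrans and row-major (Conj)Trans;
       k bounds it in the remaining two cases. */
    long dim = (col_major == no_trans) ? m : k;
    return dim > 1 ? dim : 1;
}
```

A caller could compare its own lda against this value before invoking the routine, for example in a debug build.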
b
Array:
  • Layout = CblasColMajor:
      • transb = CblasNoTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
      • transb = CblasTrans or transb = CblasConjTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
      • transb = CblasPacked: Stored in internal packed format.
  • Layout = CblasRowMajor:
      • transb = CblasNoTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
      • transb = CblasTrans or transb = CblasConjTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
      • transb = CblasPacked: Stored in internal packed format.
ldb
Specifies the leading dimension of b as declared in the calling (sub)program.
  • Layout = CblasColMajor:
      • transb = CblasNoTrans: ldb must be at least max(1, k).
      • transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, n).
      • transb = CblasPacked: ldb is ignored.
  • Layout = CblasRowMajor:
      • transb = CblasNoTrans: ldb must be at least max(1, n).
      • transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, k).
      • transb = CblasPacked: ldb is ignored.
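
Taken together, the size and ldb requirements for b lend themselves to a pre-call check. The sketch below is one way to write it, assuming the caller knows how many elements its buffer holds; the enumerations, the function b_args_ok, and the b_len parameter are hypothetical and exist only for this illustration. A packed b is accepted unconditionally because its size is not governed by ldb.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enumerators; values are arbitrary. */
enum demo_layout { DEMO_COL_MAJOR, DEMO_ROW_MAJOR };
enum demo_trans  { DEMO_NO_TRANS, DEMO_TRANS, DEMO_CONJ_TRANS, DEMO_PACKED };

/* Returns 1 if n, k, ldb, and a buffer of b_len elements satisfy the
   documented requirements for b, and 0 otherwise. */
int b_args_ok(enum demo_layout layout, enum demo_trans transb,
              long n, long k, long ldb, size_t b_len)
{
    if (n < 0 || k < 0)
        return 0;
    if (transb == DEMO_PACKED)
        return 1;                       /* ldb is ignored for packed input */
    int same = (layout == DEMO_COL_MAJOR) == (transb == DEMO_NO_TRANS);
    long contiguous = same ? k : n;     /* dimension bounded by ldb */
    long slices     = same ? n : k;     /* number of ldb-strided slices */
    long min_ldb    = contiguous > 1 ? contiguous : 1;
    return ldb >= min_ldb && b_len >= (size_t)ldb * (size_t)slices;
}
```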
beta
Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.
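
The beta = 0 rule matters because a literal evaluation of beta*C would propagate NaN or Inf values from an uninitialized c into the result. The sketch below shows how a reference implementation could honor the rule by testing beta before touching c; the function apply_beta is hypothetical, and how the library implements this internally is not stated here.

```c
#include <math.h>
#include <stddef.h>

/* Combines a precomputed product term with the old contents of c.
   Testing beta first means c is never read when beta is zero, so
   uninitialized or NaN input cannot leak into the result. */
void apply_beta(float beta, const float *prod, float *c, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        c[i] = (beta == 0.0f) ? prod[i] : prod[i] + beta * c[i];
}
```

For comparison, the expression 0.0f * NAN evaluates to NaN under IEEE 754 arithmetic, so skipping the read is a semantic requirement and not just an optimization.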
c
Array:
  • Layout = CblasColMajor: Size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
  • Layout = CblasRowMajor: Size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
ldc
Specifies the leading dimension of c as declared in the calling (sub)program.
  • Layout = CblasColMajor: ldc must be at least max(1, m).
  • Layout = CblasRowMajor: ldc must be at least max(1, n).
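
A leading dimension larger than the minimum lets C be a block inside a bigger buffer. The following sketch shows the minimum ldc for each layout and the resulting position of C(i, j) in the array c. The enumeration and the functions min_ldc and c_offset are hypothetical helpers for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_LAYOUT; values are arbitrary. */
enum demo_layout { DEMO_COL_MAJOR, DEMO_ROW_MAJOR };

/* Smallest valid ldc: max(1, m) for column-major, max(1, n) for row-major. */
long min_ldc(enum demo_layout layout, long m, long n)
{
    long dim = (layout == DEMO_COL_MAJOR) ? m : n;
    return dim > 1 ? dim : 1;
}

/* Offset of element C(i, j) inside array c,
   with 0 <= i < m and 0 <= j < n. */
size_t c_offset(enum demo_layout layout, size_t ldc, size_t i, size_t j)
{
    return (layout == DEMO_COL_MAJOR) ? i + j * ldc : i * ldc + j;
}
```

For example, a 3-by-4 result written into a buffer with ldc = 8 leaves the remaining entries of each column (column-major) or row (row-major) untouched.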
Output Parameters
c
Overwritten by the m-by-n matrix op(A)*op(B) + beta*C.
