Developer Reference

cblas_?gemm_compute

Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.

Syntax

void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);
void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const MKL_INT ldc);
Include Files
  • mkl.h
Description
The cblas_?gemm_compute routine is one of a set of related routines that enable the use of an internal packed storage format. After calling cblas_?gemm_pack, call cblas_?gemm_compute to compute
C := op(A)*op(B) + beta*C,
where:

    op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,

    beta is a scalar,

    A, B, and C are matrices:

    op(A) is an m-by-k matrix,

    op(B) is a k-by-n matrix,

    C is an m-by-n matrix.
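
As a rough illustration of the operation defined above, the following plain C sketch computes C := op(A)*op(B) + beta*C on ordinary, unpacked arrays. It is not how Intel MKL implements the routine and it does not use the internal packed format. The names ref_layout, ref_trans, and ref_gemm are hypothetical and exist only for this example, and the element addressing assumes the usual CBLAS convention for each layout. Because the ?gemm variants documented here are real-valued, the sketch offers a single transposed option that stands in for both X^T and X^H.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the MKL types. */
typedef enum { REF_COL_MAJOR, REF_ROW_MAJOR } ref_layout;
typedef enum { REF_NO_TRANS, REF_TRANS } ref_trans;

/* Element (i, j) of a stored matrix x with leading dimension ld. */
static double ref_at(ref_layout layout, const double *x, size_t ld,
                     size_t i, size_t j)
{
    return layout == REF_COL_MAJOR ? x[i + j * ld] : x[i * ld + j];
}

/* Element (i, j) of op(X): swap the indices when X is transposed. */
static double ref_op(ref_layout layout, ref_trans trans, const double *x,
                     size_t ld, size_t i, size_t j)
{
    return trans == REF_NO_TRANS ? ref_at(layout, x, ld, i, j)
                                 : ref_at(layout, x, ld, j, i);
}

/* Naive reference for C := op(A)*op(B) + beta*C on unpacked arrays. */
void ref_gemm(ref_layout layout, ref_trans transa, ref_trans transb,
              size_t m, size_t n, size_t k,
              const double *a, size_t lda,
              const double *b, size_t ldb,
              double beta, double *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += ref_op(layout, transa, a, lda, i, p)
                     * ref_op(layout, transb, b, ldb, p, j);
            }
            double *cij = layout == REF_COL_MAJOR ? &c[i + j * ldc]
                                                  : &c[i * ldc + j];
            /* When beta is zero, C is not read, so it need not be set. */
            *cij = (beta == 0.0) ? acc : acc + beta * (*cij);
        }
    }
}
```

A reference like this can be handy for checking the results of a pack-and-compute sequence on small inputs, since both should agree up to floating-point rounding.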

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa
Specifies the form of op(A) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
  • If transa = CblasNoTrans, op(A) = A.
  • If transa = CblasTrans, op(A) = A^T.
  • If transa = CblasConjTrans, op(A) = A^H.
  • If transa = CblasPacked, the matrix in array a is packed and lda is ignored.
transb
Specifies the form of op(B) used in the matrix multiplication, one of the CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
  • If transb = CblasNoTrans, op(B) = B.
  • If transb = CblasTrans, op(B) = B^T.
  • If transb = CblasConjTrans, op(B) = B^H.
  • If transb = CblasPacked, the matrix in array b is packed and ldb is ignored.
m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
a
Array. Its size and contents depend on Layout and transa:
  • Layout = CblasColMajor, transa = CblasNoTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  • Layout = CblasColMajor, transa = CblasTrans or transa = CblasConjTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • Layout = CblasColMajor, transa = CblasPacked: Stored in internal packed format.
  • Layout = CblasRowMajor, transa = CblasNoTrans: Size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • Layout = CblasRowMajor, transa = CblasTrans or transa = CblasConjTrans: Size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  • Layout = CblasRowMajor, transa = CblasPacked: Stored in internal packed format.
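lda
Specifies the leading dimension of a as declared in the calling (sub)program.
  • Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
  • Layout = CblasColMajor, transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, k).
  • Layout = CblasColMajor, transa = CblasPacked: lda is ignored.
  • Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
  • Layout = CblasRowMajor, transa = CblasTrans or transa = CblasConjTrans: lda must be at least max(1, m).
  • Layout = CblasRowMajor, transa = CblasPacked: lda is ignored.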
b
Array. Its size and contents depend on Layout and transb:
  • Layout = CblasColMajor, transb = CblasNoTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  • Layout = CblasColMajor, transb = CblasTrans or transb = CblasConjTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • Layout = CblasColMajor, transb = CblasPacked: Stored in internal packed format.
  • Layout = CblasRowMajor, transb = CblasNoTrans: Size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • Layout = CblasRowMajor, transb = CblasTrans or transb = CblasConjTrans: Size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  • Layout = CblasRowMajor, transb = CblasPacked: Stored in internal packed format.
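
To make the size and "leading part" wording above more concrete, here is a small sketch of how an unpacked, untransposed matrix could be placed into an array with a given leading dimension. The helper names lay_order, lay_array_size, and lay_store are hypothetical and not part of Intel MKL. The addressing is an assumption based on the usual CBLAS convention: element (i, j) sits at offset i + j*ld in column-major storage and at offset i*ld + j in row-major storage. For B with transb = CblasNoTrans, rows corresponds to k and cols to n.

```c
#include <stddef.h>

/* Hypothetical helper types for illustration; not part of MKL. */
typedef enum { LAY_COL_MAJOR, LAY_ROW_MAJOR } lay_order;

/* Number of elements the array must hold for a rows-by-cols matrix
   stored untransposed with leading dimension ld. */
size_t lay_array_size(lay_order order, size_t rows, size_t cols, size_t ld)
{
    return order == LAY_COL_MAJOR ? ld * cols : ld * rows;
}

/* Copy a dense rows-by-cols matrix (listed row by row, no padding)
   into dst using the requested storage order and leading dimension.
   Elements of dst that fall in the padding are left untouched. */
void lay_store(lay_order order, size_t rows, size_t cols,
               const float *dense, float *dst, size_t ld)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            size_t off = order == LAY_COL_MAJOR ? i + j * ld : i * ld + j;
            dst[off] = dense[i * cols + j];
        }
    }
}
```

For a 2-by-3 matrix, a column-major array with ld = 4 needs 4*3 = 12 elements, while a row-major array with ld = 5 needs 5*2 = 10 elements, which matches the ldb*n and ldb*k sizes listed for transb = CblasNoTrans.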
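ldb
Specifies the leading dimension of b as declared in the calling (sub)program.
  • Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
  • Layout = CblasColMajor, transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, n).
  • Layout = CblasColMajor, transb = CblasPacked: ldb is ignored.
  • Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
  • Layout = CblasRowMajor, transb = CblasTrans or transb = CblasConjTrans: ldb must be at least max(1, k).
  • Layout = CblasRowMajor, transb = CblasPacked: ldb is ignored.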
beta
Specifies the scalar beta. When beta is equal to zero, c need not be set on input.
c
Array. Its size and contents depend on Layout:
  • Layout = CblasColMajor: Size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
  • Layout = CblasRowMajor: Size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
ldc
Specifies the leading dimension of c as declared in the calling (sub)program.
  • Layout = CblasColMajor: ldc must be at least max(1, m).
  • Layout = CblasRowMajor: ldc must be at least max(1, n).
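
The constraints on m, n, k, lda, ldb, and ldc listed above can be gathered into a single pre-call check. The sketch below is one possible way to do that; chk_layout, chk_form, chk_min_ld, and chk_gemm_compute_args are hypothetical names, the enums are stand-ins rather than the MKL enumerated types, and long is used in place of MKL_INT so that the example compiles without mkl.h.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the MKL enumerated types. */
typedef enum { CHK_COL_MAJOR, CHK_ROW_MAJOR } chk_layout;
typedef enum { CHK_NO_TRANS, CHK_TRANS, CHK_CONJ_TRANS, CHK_PACKED } chk_form;

/* Minimum leading dimension of an unpacked array whose op() is
   rows-by-cols. Must not be called with CHK_PACKED. */
long chk_min_ld(chk_layout layout, chk_form form, long rows, long cols)
{
    bool transposed = (form != CHK_NO_TRANS);
    /* Column-major untransposed storage needs ld >= rows. Switching to
       row-major or transposing flips this to cols; doing both flips back. */
    bool use_rows = (layout == CHK_COL_MAJOR) != transposed;
    long dim = use_rows ? rows : cols;
    return dim > 1 ? dim : 1; /* max(1, dim) */
}

/* Returns true when the dimensions and leading dimensions satisfy the
   documented constraints. Leading dimensions of packed inputs are ignored. */
bool chk_gemm_compute_args(chk_layout layout, chk_form transa, chk_form transb,
                           long m, long n, long k,
                           long lda, long ldb, long ldc)
{
    if (m < 0 || n < 0 || k < 0)
        return false;
    if (transa != CHK_PACKED && lda < chk_min_ld(layout, transa, m, k))
        return false;
    if (transb != CHK_PACKED && ldb < chk_min_ld(layout, transb, k, n))
        return false;
    /* C is never transposed or packed. */
    return ldc >= chk_min_ld(layout, CHK_NO_TRANS, m, n);
}
```

For example, with Layout = CblasRowMajor, a packed A, and an unpacked, untransposed B, only ldb >= max(1, n) and ldc >= max(1, n) are checked, and the value passed for lda does not matter.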
Output Parameters
c
Overwritten by the m-by-n matrix op(A)*op(B) + beta*C.
