
# cblas_?gemm3m

Computes a scalar-matrix-matrix product using matrix multiplications and adds the result to a scalar-matrix product.

## Syntax

## Include Files

- `mkl.h`
## Description

The `?gemm3m` routines perform a matrix-matrix operation with general complex matrices. These routines are similar to the `?gemm` routines, but they use fewer matrix multiplication operations (see Application Notes below).
The operation is defined as
`C := alpha*op(A)*op(B) + beta*C,`
where:

- `op(x)` is one of `op(x) = x`, `op(x) = x'`, or `op(x) = conjg(x')`, with `x'` denoting the transpose of `x`;
- `alpha` and `beta` are scalars;
- `A`, `B` and `C` are matrices: `op(A)` is an m-by-k matrix, `op(B)` is a k-by-n matrix, and `C` is an m-by-n matrix.
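As an illustration of the three forms of `op(x)`, the following C99 sketch returns element (i, j) of `op(A)` when `A` is stored column-major with leading dimension `lda`. The enum `trans_t` and the function `op_elem` are hypothetical names used only for this example; they stand in for the `CblasNoTrans`, `CblasTrans` and `CblasConjTrans` values and are not part of CBLAS or Intel MKL. Indices are zero-based.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-ins for CblasNoTrans, CblasTrans and CblasConjTrans. */
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

/* Element (i, j), zero-based, of op(A) for a column-major A whose
   element A(r, c) is stored at a[r + c*lda]. */
static double complex op_elem(trans_t t, const double complex *a,
                              size_t lda, size_t i, size_t j)
{
    switch (t) {
    case NO_TRANS:
        return a[i + j * lda];          /* op(A) = A         */
    case TRANS:
        return a[j + i * lda];          /* op(A) = A'        */
    case CONJ_TRANS:
        return conj(a[j + i * lda]);    /* op(A) = conjg(A') */
    }
    assert(!"invalid trans value");
    return 0;
}
```

For example, with the 2-by-2 matrix stored column-major as `{1+2i, 3, 4, 5-i}`, `op_elem(CONJ_TRANS, a, 2, 1, 1)` yields `5+i`.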
## Input Parameters

`Layout`: Specifies whether two-dimensional array storage is row-major (`CblasRowMajor`) or column-major (`CblasColMajor`).
`transa`: Specifies the form of `op(A)` used in the matrix multiplication:

- if `transa = CblasNoTrans`, then `op(A) = A`;
- if `transa = CblasTrans`, then `op(A) = A'`;
- if `transa = CblasConjTrans`, then `op(A) = conjg(A')`.
`transb`: Specifies the form of `op(B)` used in the matrix multiplication:

- if `transb = CblasNoTrans`, then `op(B) = B`;
- if `transb = CblasTrans`, then `op(B) = B'`;
- if `transb = CblasConjTrans`, then `op(B) = conjg(B')`.
`m`: Specifies the number of rows of the matrix `op(A)` and of the matrix `C`. The value of `m` must be at least zero.

`n`: Specifies the number of columns of the matrix `op(B)` and of the matrix `C`. The value of `n` must be at least zero.

`k`: Specifies the number of columns of the matrix `op(A)` and the number of rows of the matrix `op(B)`. The value of `k` must be at least zero.

`alpha`: Specifies the scalar `alpha`.
`a`: Array whose size and required contents depend on `Layout` and `transa`:

| | `transa = CblasNoTrans` | `transa = CblasTrans` or `transa = CblasConjTrans` |
|---|---|---|
| `Layout = CblasColMajor` | Array, size `lda*k`. Before entry, the leading m-by-k part of the array `a` must contain the matrix `A`. | Array, size `lda*m`. Before entry, the leading k-by-m part of the array `a` must contain the matrix `A`. |
| `Layout = CblasRowMajor` | Array, size `lda*m`. Before entry, the leading k-by-m part of the array `a` must contain the matrix `A`. | Array, size `lda*k`. Before entry, the leading m-by-k part of the array `a` must contain the matrix `A`. |
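The row-major entries in this table describe `A` as a column-major block with its dimensions swapped. One way to see why: a row-major m-by-k matrix with leading dimension `lda` occupies exactly the same memory as the column-major k-by-m matrix (its transpose) with the same `lda`. The hypothetical helpers below (zero-based, not MKL functions) make that offset identity explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major array with leading dimension ld;
   the row index must be smaller than ld. */
static size_t col_major_offset(size_t i, size_t j, size_t ld)
{
    assert(i < ld);
    return i + j * ld;
}

/* Offset of element (i, j) in a row-major array with leading dimension ld;
   the column index must be smaller than ld. */
static size_t row_major_offset(size_t i, size_t j, size_t ld)
{
    assert(j < ld);
    return i * ld + j;
}
```

For every 0 ≤ i < m and 0 ≤ j < k, `row_major_offset(i, j, lda)` equals `col_major_offset(j, i, lda)`, and the largest offset, `(m-1)*lda + k-1`, stays below `lda*m`, which matches the array size given for the row-major, non-transposed case.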
`lda`: Specifies the leading dimension of `a` as declared in the calling (sub)program.

| | `transa = CblasNoTrans` | `transa = CblasTrans` or `transa = CblasConjTrans` |
|---|---|---|
| `Layout = CblasColMajor` | `lda` must be at least max(1, `m`). | `lda` must be at least max(1, `k`). |
| `Layout = CblasRowMajor` | `lda` must be at least max(1, `k`). | `lda` must be at least max(1, `m`). |
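All four entries of the `lda` table follow one rule: in column-major storage the leading dimension must cover the number of rows of the matrix as it is stored in `a`, in row-major storage the number of columns, and it must always be at least 1. The hypothetical helper `min_ld` below (not an MKL function) encodes that rule for an operand whose `op()` is rows-by-cols.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum legal leading dimension for an operand whose op() is rows-by-cols.
   row_major:  true for CblasRowMajor, false for CblasColMajor.
   transposed: true for CblasTrans or CblasConjTrans, false for CblasNoTrans. */
static long min_ld(bool row_major, bool transposed, long rows, long cols)
{
    assert(rows >= 0 && cols >= 0);
    /* The array holds op(X) itself when not transposed, otherwise its transpose. */
    long stored_rows = transposed ? cols : rows;
    long stored_cols = transposed ? rows : cols;
    /* Column-major: ld spans a stored column; row-major: ld spans a stored row. */
    long need = row_major ? stored_cols : stored_rows;
    return need > 1 ? need : 1;
}
```

Called with rows = `m` and cols = `k`, it reproduces the `lda` table; called with rows = `k` and cols = `n`, it gives the `ldb` bounds listed for `b`.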
`b`: Array whose size and required contents depend on `Layout` and `transb`:

| | `transb = CblasNoTrans` | `transb = CblasTrans` or `transb = CblasConjTrans` |
|---|---|---|
| `Layout = CblasColMajor` | Array, size `ldb*n`. Before entry, the leading k-by-n part of the array `b` must contain the matrix `B`. | Array, size `ldb*k`. Before entry, the leading n-by-k part of the array `b` must contain the matrix `B`. |
| `Layout = CblasRowMajor` | Array, size `ldb*k`. Before entry, the leading n-by-k part of the array `b` must contain the matrix `B`. | Array, size `ldb*n`. Before entry, the leading k-by-n part of the array `b` must contain the matrix `B`. |
`ldb`: Specifies the leading dimension of `b` as declared in the calling (sub)program.

| | `transb = CblasNoTrans` | `transb = CblasTrans` or `transb = CblasConjTrans` |
|---|---|---|
| `Layout = CblasColMajor` | `ldb` must be at least max(1, `k`). | `ldb` must be at least max(1, `n`). |
| `Layout = CblasRowMajor` | `ldb` must be at least max(1, `n`). | `ldb` must be at least max(1, `k`). |
`beta`: Specifies the scalar `beta`. When `beta` is equal to zero, `c` need not be set on input.
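In a simple reference loop, honoring this rule means assigning to `C` instead of scaling it when `beta` is zero: computing `beta*C` would propagate any NaN or infinity left in uninitialized memory, because 0 times NaN is NaN. The sketch below covers only `Layout = CblasColMajor` with `transa = transb = CblasNoTrans`; `ref_gemm_nn` is a hypothetical name, and this is not how Intel MKL implements the routine.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reference C := alpha*A*B + beta*C, column-major, no transposition.
   A is m-by-k (lda >= m), B is k-by-n (ldb >= k), C is m-by-n (ldc >= m). */
static void ref_gemm_nn(size_t m, size_t n, size_t k, double complex alpha,
                        const double complex *a, size_t lda,
                        const double complex *b, size_t ldb,
                        double complex beta, double complex *c, size_t ldc)
{
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double complex sum = 0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            /* When beta == 0, C is write-only: its old contents are never read. */
            c[i + j * ldc] = (beta == 0) ? alpha * sum
                                         : alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```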
`c`: Array whose size and required contents depend on `Layout`:

| `Layout` | `c` |
|---|---|
| `CblasColMajor` | Array, size `ldc*n`. Before entry, the leading m-by-n part of the array `c` must contain the matrix `C`, except when `beta` is equal to zero, in which case `c` need not be set on entry. |
| `CblasRowMajor` | Array, size `ldc*m`. Before entry, the leading n-by-m part of the array `c` must contain the matrix `C`, except when `beta` is equal to zero, in which case `c` need not be set on entry. |
`ldc`: Specifies the leading dimension of `c` as declared in the calling (sub)program.

| `Layout` | Requirement |
|---|---|
| `CblasColMajor` | `ldc` must be at least max(1, `m`). |
| `CblasRowMajor` | `ldc` must be at least max(1, `n`). |
## Output Parameters

`c`: Overwritten by the m-by-n matrix `(alpha*op(A)*op(B) + beta*C)`.
## Application Notes

These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three cuts the time spent in matrix multiplication by about 25%. For large matrices the multiplications, whose cost grows cubically with the matrix dimension, dominate the additions, whose cost grows quadratically, so this yields significant savings in total compute time.
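The identity behind the three-multiplication scheme can be checked on small real matrices. Writing A = A1 + iA2 and B = B1 + iB2, the conventional method forms C1 = A1B1 - A2B2 and C2 = A1B2 + A2B1. One standard 3M variant consistent with the counts above forms T1 = A1B1, T2 = A2B2 and T3 = (A1 + A2)(B1 + B2), then C1 = T1 - T2 and C2 = T3 - T1 - T2. The sketch below implements both with naive fixed-size square matrices; the size `N` and all function names are assumptions for illustration, and the exact formulation used inside Intel MKL is not documented here.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: naive N-by-N real matrices, not how Intel MKL implements ?gemm3m. */
#define N 3

/* out = x * y (naive real matrix multiplication). */
static void matmul(double x[N][N], double y[N][N], double out[N][N])
{
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j) {
            double s = 0.0;
            for (size_t p = 0; p < N; ++p)
                s += x[i][p] * y[p][j];
            out[i][j] = s;
        }
}

/* out = x + sign*y, where sign is +1 (addition) or -1 (subtraction). */
static void matadd(double x[N][N], double y[N][N], double sign, double out[N][N])
{
    assert(sign == 1.0 || sign == -1.0);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            out[i][j] = x[i][j] + sign * y[i][j];
}

/* Conventional method: 4 real multiplications, 2 real additions. */
static void complex_mul_4m(double a1[N][N], double a2[N][N],
                           double b1[N][N], double b2[N][N],
                           double c1[N][N], double c2[N][N])
{
    double p11[N][N], p22[N][N], p12[N][N], p21[N][N];
    matmul(a1, b1, p11);
    matmul(a2, b2, p22);
    matmul(a1, b2, p12);
    matmul(a2, b1, p21);
    matadd(p11, p22, -1.0, c1);   /* C1 = A1*B1 - A2*B2 */
    matadd(p12, p21, +1.0, c2);   /* C2 = A1*B2 + A2*B1 */
}

/* 3M method: 3 real multiplications, 5 real additions. */
static void complex_mul_3m(double a1[N][N], double a2[N][N],
                           double b1[N][N], double b2[N][N],
                           double c1[N][N], double c2[N][N])
{
    double sa[N][N], sb[N][N], t1[N][N], t2[N][N], t3[N][N], d[N][N];
    matadd(a1, a2, +1.0, sa);     /* addition 1: A1 + A2 */
    matadd(b1, b2, +1.0, sb);     /* addition 2: B1 + B2 */
    matmul(a1, b1, t1);           /* T1 = A1*B1 */
    matmul(a2, b2, t2);           /* T2 = A2*B2 */
    matmul(sa, sb, t3);           /* T3 = (A1 + A2)*(B1 + B2) */
    matadd(t1, t2, -1.0, c1);     /* addition 3: C1 = T1 - T2 */
    matadd(t3, t1, -1.0, d);      /* addition 4: T3 - T1 */
    matadd(d, t2, -1.0, c2);      /* addition 5: C2 = T3 - T1 - T2 */
}
```

With integer-valued inputs every intermediate is exact, so the two routines return identical C1 and C2; with general floating-point data they differ by rounding, which the bounds below quantify.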
If the errors in the floating-point calculations satisfy the following conditions:

$$
\mathrm{fl}(x \,\mathrm{op}\, y) = (x \,\mathrm{op}\, y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} = \times, /,
$$

$$
\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,
$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}\big((A_1 + iA_2)(B_1 + iB_2)\big) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$
\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|\,\|B\| + O(u^2),
$$

$$
\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|\,\|B\| + O(u^2),
$$

where $\|A\| = \max(\|A_1\|, \|A_2\|)$ and $\|B\| = \max(\|B_1\|, \|B_2\|)$.
Thus the corresponding matrix multiplications are stable.

#### Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804