Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

cblas_?gemm3m_batch

Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.

Syntax

void cblas_cgemm3m_batch (
    const CBLAS_LAYOUT Layout,
    const CBLAS_TRANSPOSE* transa_array,
    const CBLAS_TRANSPOSE* transb_array,
    const MKL_INT* m_array,
    const MKL_INT* n_array,
    const MKL_INT* k_array,
    const void *alpha_array,
    const void **a_array,
    const MKL_INT* lda_array,
    const void **b_array,
    const MKL_INT* ldb_array,
    const void *beta_array,
    void **c_array,
    const MKL_INT* ldc_array,
    const MKL_INT group_count,
    const MKL_INT* group_size);
void cblas_zgemm3m_batch (
    const CBLAS_LAYOUT Layout,
    const CBLAS_TRANSPOSE* transa_array,
    const CBLAS_TRANSPOSE* transb_array,
    const MKL_INT* m_array,
    const MKL_INT* n_array,
    const MKL_INT* k_array,
    const void *alpha_array,
    const void **a_array,
    const MKL_INT* lda_array,
    const void **b_array,
    const MKL_INT* ldb_array,
    const void *beta_array,
    void **c_array,
    const MKL_INT* ldc_array,
    const MKL_INT group_count,
    const MKL_INT* group_size);
Include Files
  • mkl.h
Description
The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix operations with groups of matrices, processing a number of groups at once. The groups contain matrices with the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch routines, as described in the Application Notes.
The operation is defined as
idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,
A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:
  • op(A) is an m-by-k matrix,
  • op(B) is a k-by-n matrix,
  • C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all the group_size entries.
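The loop structure above can be sketched as a naive reference in plain C. This is only an illustration of the documented semantics, not the library's implementation: it uses an ordinary triple loop rather than the 3M method, and the names cplx, ref_layout, ref_trans, and ref_gemm_batch are hypothetical stand-ins introduced so that the sketch builds without mkl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (not MKL types) so the sketch builds without mkl.h. */
typedef struct { double re, im; } cplx;
typedef enum { REF_ROW_MAJOR, REF_COL_MAJOR } ref_layout;
typedef enum { REF_NO_TRANS, REF_TRANS, REF_CONJ_TRANS } ref_trans;

static cplx cmul(cplx x, cplx y)
{
    cplx z = { x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re };
    return z;
}

/* Offset of element (r, c) in a matrix stored with leading dimension ld. */
static size_t elem_off(ref_layout layout, int ld, int r, int c)
{
    return (layout == REF_COL_MAJOR) ? (size_t)r + (size_t)c * (size_t)ld
                                     : (size_t)r * (size_t)ld + (size_t)c;
}

/* Element (r, c) of op(X): X, X^T, or X^H depending on t. */
static cplx op_elem(ref_layout layout, ref_trans t, const cplx *x, int ld, int r, int c)
{
    cplx v = (t == REF_NO_TRANS) ? x[elem_off(layout, ld, r, c)]
                                 : x[elem_off(layout, ld, c, r)];
    if (t == REF_CONJ_TRANS)
        v.im = -v.im;
    return v;
}

/* Naive reference: C := alpha*op(A)*op(B) + beta*C for every matrix in every group. */
void ref_gemm_batch(ref_layout layout,
                    const ref_trans *transa, const ref_trans *transb,
                    const int *m, const int *n, const int *k,
                    const cplx *alpha, const cplx *const *a, const int *lda,
                    const cplx *const *b, const int *ldb,
                    const cplx *beta, cplx *const *c, const int *ldc,
                    int group_count, const int *group_size)
{
    assert(group_count >= 0);
    int idx = 0;                               /* runs over all total_batch_count matrices */
    for (int i = 0; i < group_count; ++i) {
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            for (int r = 0; r < m[i]; ++r) {
                for (int col = 0; col < n[i]; ++col) {
                    cplx acc = { 0.0, 0.0 };
                    for (int p = 0; p < k[i]; ++p) {
                        cplx t = cmul(op_elem(layout, transa[i], a[idx], lda[i], r, p),
                                      op_elem(layout, transb[i], b[idx], ldb[i], p, col));
                        acc.re += t.re;
                        acc.im += t.im;
                    }
                    cplx *out = &c[idx][elem_off(layout, ldc[i], r, col)];
                    cplx res = cmul(alpha[i], acc);
                    /* When beta is zero, C is not read, so it need not be set on input. */
                    if (beta[i].re != 0.0 || beta[i].im != 0.0) {
                        cplx bc = cmul(beta[i], *out);
                        res.re += bc.re;
                        res.im += bc.im;
                    }
                    *out = res;
                }
            }
        }
    }
}
```

Note that the per-group parameters (transa, m, lda, alpha, and so on) are indexed by the group number i, while the pointer arrays a, b, and c are indexed by the running counter idx.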
See also gemm for a detailed description of multiplication for general matrices and gemm_batch, BLAS-like extension routines for similar matrix-matrix operations.
Error checking is not performed for Intel® oneAPI Math Kernel Library Windows* single dynamic libraries for the ?gemm3m_batch routines.
Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
transa_array
Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:
  • if transa_i = CblasNoTrans, then op(A) = A;
  • if transa_i = CblasTrans, then op(A) = A^T;
  • if transa_i = CblasConjTrans, then op(A) = A^H.
transb_array
Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:
  • if transb_i = CblasNoTrans, then op(B) = B;
  • if transb_i = CblasTrans, then op(B) = B^T;
  • if transb_i = CblasConjTrans, then op(B) = B^H.
m_array
Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.
The value of each element of m_array must be at least zero.
n_array
Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.
The value of each element of n_array must be at least zero.
k_array
Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).
The value of each element of k_array must be at least zero.
alpha_array
Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.
a_array
Array, size total_batch_count, of pointers to arrays used to store A matrices.
lda_array
Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.
  • Layout = CblasColMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, m_i).
  • Layout = CblasColMajor, transa_i = CblasTrans or transa_i = CblasConjTrans: lda_i must be at least max(1, k_i).
  • Layout = CblasRowMajor, transa_i = CblasNoTrans: lda_i must be at least max(1, k_i).
  • Layout = CblasRowMajor, transa_i = CblasTrans or transa_i = CblasConjTrans: lda_i must be at least max(1, m_i).
b_array
Array, size total_batch_count, of pointers to arrays used to store B matrices.
ldb_array
Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.
  • Layout = CblasColMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, k_i).
  • Layout = CblasColMajor, transb_i = CblasTrans or transb_i = CblasConjTrans: ldb_i must be at least max(1, n_i).
  • Layout = CblasRowMajor, transb_i = CblasNoTrans: ldb_i must be at least max(1, n_i).
  • Layout = CblasRowMajor, transb_i = CblasTrans or transb_i = CblasConjTrans: ldb_i must be at least max(1, k_i).
beta_array
Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.
When beta_i is equal to zero, the C matrices in group i need not be set on input.
c_array
Array, size total_batch_count, of pointers to arrays used to store C matrices.
ldc_array
Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.
  • When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).
  • When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).
group_count
Specifies the number of groups. Must be at least 0.
group_size
Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.
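Because error checking is not always performed by the library, a caller may want to validate the size and leading-dimension constraints listed above before making the call. The following is a minimal sketch under stated assumptions: the enums chk_layout and chk_trans and the functions min_ld and check_batch_args are hypothetical helpers written for this illustration and are not part of the library.

```c
#include <assert.h>

/* Hypothetical stand-ins (not MKL types) so the sketch builds without mkl.h. */
typedef enum { CHK_ROW_MAJOR, CHK_COL_MAJOR } chk_layout;
typedef enum { CHK_NO_TRANS, CHK_TRANS, CHK_CONJ_TRANS } chk_trans;

/* Smallest valid leading dimension for an operand X where op(X) is rows-by-cols. */
int min_ld(chk_layout layout, chk_trans t, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    /* X is stored as rows-by-cols when not transposed, cols-by-rows otherwise. */
    int stored_rows = (t == CHK_NO_TRANS) ? rows : cols;
    int stored_cols = (t == CHK_NO_TRANS) ? cols : rows;
    /* Column-major storage strides over rows, row-major storage over columns. */
    int extent = (layout == CHK_COL_MAJOR) ? stored_rows : stored_cols;
    return extent > 1 ? extent : 1;
}

/* Returns total_batch_count, or -1 if any documented constraint is violated. */
long check_batch_args(chk_layout layout,
                      const chk_trans *transa, const chk_trans *transb,
                      const int *m, const int *n, const int *k,
                      const int *lda, const int *ldb, const int *ldc,
                      int group_count, const int *group_size)
{
    if (group_count < 0)
        return -1;
    long total = 0;
    for (int i = 0; i < group_count; ++i) {
        if (m[i] < 0 || n[i] < 0 || k[i] < 0 || group_size[i] < 0)
            return -1;
        if (lda[i] < min_ld(layout, transa[i], m[i], k[i]))   /* op(A) is m-by-k */
            return -1;
        if (ldb[i] < min_ld(layout, transb[i], k[i], n[i]))   /* op(B) is k-by-n */
            return -1;
        if (ldc[i] < min_ld(layout, CHK_NO_TRANS, m[i], n[i])) /* C is m-by-n */
            return -1;
        total += group_size[i];
    }
    return total;
}
```

The returned total is the number of pointers that a_array, b_array, and c_array must each hold.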
Output Parameters
c_array
Overwritten by the m_i-by-n_i matrix (alpha_i*op(A)*op(B) + beta_i*C) for group i.
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three reduces the multiplication work by 25%, resulting in significant savings in compute time for large matrices, where the multiplications dominate the cost of the extra additions.
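One well-known way to reach that operation count is to compute T1 = A1*B1, T2 = A2*B2, and T3 = (A1 + A2)*(B1 + B2), then form C1 = T1 - T2 and C2 = T3 - T1 - T2. The sketch below illustrates this formulation for square matrices in row-major order. It is an assumption that this matches the arrangement used inside the library, which the text above does not specify, and the function names real_matmul and complex_matmul_3m are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* C := A*B for n-by-n real matrices stored in row-major order. */
static void real_matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < n; ++p)
                s += a[i * n + p] * b[p * n + j];
            c[i * n + j] = s;
        }
    }
}

/* (C1 + i*C2) = (A1 + i*A2)(B1 + i*B2) with three real matrix multiplications
   and five real matrix additions. Returns 0 on success, -1 if the scratch
   buffer cannot be allocated. */
int complex_matmul_3m(int n, const double *a1, const double *a2,
                      const double *b1, const double *b2,
                      double *c1, double *c2)
{
    assert(n >= 0);
    size_t len = (size_t)n * (size_t)n;
    double *scratch = malloc(5 * (len ? len : 1) * sizeof *scratch);
    if (!scratch)
        return -1;
    double *t1 = scratch, *t2 = scratch + len, *t3 = scratch + 2 * len;
    double *sa = scratch + 3 * len, *sb = scratch + 4 * len;

    for (size_t e = 0; e < len; ++e) {
        sa[e] = a1[e] + a2[e];               /* addition 1 */
        sb[e] = b1[e] + b2[e];               /* addition 2 */
    }
    real_matmul(n, a1, b1, t1);              /* multiplication 1 */
    real_matmul(n, a2, b2, t2);              /* multiplication 2 */
    real_matmul(n, sa, sb, t3);              /* multiplication 3 */
    for (size_t e = 0; e < len; ++e) {
        c1[e] = t1[e] - t2[e];               /* addition 3: real part */
        c2[e] = (t3[e] - t1[e]) - t2[e];     /* additions 4 and 5: imaginary part */
    }
    free(scratch);
    return 0;
}
```

The conventional method would instead compute A1*B1, A2*B2, A1*B2, and A2*B1 separately, which is four real multiplications and two additions.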
If the errors in the floating point calculations satisfy the following conditions:
\[ fl(x \;\mathrm{op}\; y) = (x \;\mathrm{op}\; y)(1 + \delta), \quad |\delta| \le u, \quad \mathrm{op} = \times, /, \]
\[ fl(x \pm y) = x(1 + \alpha) \pm y(1 + \beta), \quad |\alpha|, |\beta| \le u, \]
then for an n-by-n matrix
\[ \hat{C} = fl(C_1 + iC_2) = fl((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2, \]
the following bounds are satisfied:
\[ \|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|\,\|B\| + O(u^2), \]
\[ \|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|\,\|B\| + O(u^2), \]
where
\[ \|A\| = \max(\|A_1\|, \|A_2\|) \quad \text{and} \quad \|B\| = \max(\|B_1\|, \|B_2\|). \]
Thus the corresponding matrix multiplications are stable.