cblas_?gemm3m_batch
Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.
Syntax
void cblas_cgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
void cblas_zgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
Include Files
- mkl.h
Description
The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix operations with groups of matrices, processing a number of groups at once. The groups contain matrices with the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch routines, as described in the Application Notes.
The operation is defined as
idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,
A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all the group_size entries.
See also gemm for a detailed description of multiplication for general matrices and gemm_batch, BLAS-like extension routines for similar matrix-matrix operations.
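The group and index bookkeeping in the operation definition can be easier to follow as code. The following is a minimal reference sketch, not the library's implementation: it uses conventional complex multiplication rather than the 3M scheme, handles column-major storage only, and replaces the MKL types with hypothetical stand-ins (ref_complex8, ref_transpose, ref_cgemm_batch) so that it compiles without mkl.h.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MKL types, so the sketch compiles without mkl.h. */
typedef struct { float real, imag; } ref_complex8;
typedef enum { REF_NO_TRANS, REF_TRANS, REF_CONJ_TRANS } ref_transpose;

static ref_complex8 ref_mul(ref_complex8 x, ref_complex8 y)
{
    ref_complex8 r;
    r.real = x.real * y.real - x.imag * y.imag;
    r.imag = x.real * y.imag + x.imag * y.real;
    return r;
}

/* Element (r, c) of op(X) for column-major X with leading dimension ld. */
static ref_complex8 ref_op_elem(const ref_complex8 *x, int ld,
                                ref_transpose t, int r, int c)
{
    ref_complex8 v;
    if (t == REF_NO_TRANS) return x[r + (size_t)c * ld];
    v = x[c + (size_t)r * ld];                 /* transposed access */
    if (t == REF_CONJ_TRANS) v.imag = -v.imag; /* conjugate for X^H */
    return v;
}

/* Column-major reference: C := alpha*op(A)*op(B) + beta*C for every matrix
   of every group. Per-group parameters are indexed by i, matrices by idx. */
void ref_cgemm_batch(const ref_transpose *transa_array, const ref_transpose *transb_array,
                     const int *m_array, const int *n_array, const int *k_array,
                     const ref_complex8 *alpha_array,
                     const ref_complex8 **a_array, const int *lda_array,
                     const ref_complex8 **b_array, const int *ldb_array,
                     const ref_complex8 *beta_array,
                     ref_complex8 **c_array, const int *ldc_array,
                     int group_count, const int *group_size)
{
    int idx = 0;
    for (int i = 0; i < group_count; ++i) {
        const ref_complex8 alpha = alpha_array[i], beta = beta_array[i];
        const int beta_is_zero = (beta.real == 0.0f && beta.imag == 0.0f);
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const ref_complex8 *a = a_array[idx], *b = b_array[idx];
            ref_complex8 *c = c_array[idx];
            for (int col = 0; col < n_array[i]; ++col) {
                for (int row = 0; row < m_array[i]; ++row) {
                    ref_complex8 sum = {0.0f, 0.0f};
                    for (int p = 0; p < k_array[i]; ++p) {
                        ref_complex8 t = ref_mul(
                            ref_op_elem(a, lda_array[i], transa_array[i], row, p),
                            ref_op_elem(b, ldb_array[i], transb_array[i], p, col));
                        sum.real += t.real;
                        sum.imag += t.imag;
                    }
                    ref_complex8 *cij = &c[row + (size_t)col * ldc_array[i]];
                    ref_complex8 out = ref_mul(alpha, sum);
                    if (!beta_is_zero) { /* C is read only when beta is nonzero */
                        ref_complex8 bc = ref_mul(beta, *cij);
                        out.real += bc.real;
                        out.imag += bc.imag;
                    }
                    *cij = out;
                }
            }
        }
    }
}
```

Note that the pointer arrays hold total_batch_count entries (one per matrix), while every other array holds group_count entries (one per group); the running index idx is what ties the two together.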
Error checking is not performed for Windows* single dynamic libraries for the Intel® oneAPI Math Kernel Library ?gemm3m_batch routines.
Input Parameters
- Layout
- Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- transa_array
- Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication: if transa_i = CblasNoTrans, then op(A) = A; if transa_i = CblasTrans, then op(A) = A^T; if transa_i = CblasConjTrans, then op(A) = A^H.
- transb_array
- Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication: if transb_i = CblasNoTrans, then op(B) = B; if transb_i = CblasTrans, then op(B) = B^T; if transb_i = CblasConjTrans, then op(B) = B^H.
- m_array
- Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C. The value of each element of m_array must be at least zero.
- n_array
- Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of each element of n_array must be at least zero.
- k_array
- Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of each element of k_array must be at least zero.
- alpha_array
- Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.
- a_array
- Array, size total_batch_count, of pointers to arrays used to store A matrices.
- lda_array
- Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.
  Layout = CblasColMajor: lda_i must be at least max(1, m_i) if transa_i = CblasNoTrans, or at least max(1, k_i) if transa_i = CblasTrans or transa_i = CblasConjTrans.
  Layout = CblasRowMajor: lda_i must be at least max(1, k_i) if transa_i = CblasNoTrans, or at least max(1, m_i) if transa_i = CblasTrans or transa_i = CblasConjTrans.
- b_array
- Array, size total_batch_count, of pointers to arrays used to store B matrices.
- ldb_array
- Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.
  Layout = CblasColMajor: ldb_i must be at least max(1, k_i) if transb_i = CblasNoTrans, or at least max(1, n_i) if transb_i = CblasTrans or transb_i = CblasConjTrans.
  Layout = CblasRowMajor: ldb_i must be at least max(1, n_i) if transb_i = CblasNoTrans, or at least max(1, k_i) if transb_i = CblasTrans or transb_i = CblasConjTrans.
- beta_array
- Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i. When beta_i is equal to zero, the C matrices in group i need not be set on input.
- c_array
- Array, size total_batch_count, of pointers to arrays used to store C matrices.
- ldc_array
- Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.
  When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).
  When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).
- group_count
- Specifies the number of groups. Must be at least 0.
- group_size
- Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.
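The leading-dimension requirements for lda_array, ldb_array, and ldc_array all follow one rule: the leading dimension must cover the stored extent that is contiguous in memory for the chosen layout. The sketch below captures that rule in a single helper. The names ld_layout, ld_transpose, and min_leading_dim are hypothetical and are not part of the library; the enums only stand in for CBLAS_LAYOUT and CBLAS_TRANSPOSE so that the example compiles without mkl.h.

```c
/* Hypothetical stand-ins for the CBLAS enums, so the sketch compiles without mkl.h. */
typedef enum { LD_ROW_MAJOR, LD_COL_MAJOR } ld_layout;
typedef enum { LD_NO_TRANS, LD_TRANS, LD_CONJ_TRANS } ld_transpose;

/* Smallest valid leading dimension for a stored matrix X, given that
   op(X) is rows-by-cols. For C, pass LD_NO_TRANS with rows = m, cols = n. */
int min_leading_dim(ld_layout layout, ld_transpose trans, int rows, int cols)
{
    /* Dimensions of X as stored, before op() is applied. */
    int stored_rows = (trans == LD_NO_TRANS) ? rows : cols;
    int stored_cols = (trans == LD_NO_TRANS) ? cols : rows;
    /* Column-major strides over rows; row-major strides over columns. */
    int extent = (layout == LD_COL_MAJOR) ? stored_rows : stored_cols;
    return extent > 1 ? extent : 1;
}
```

For group i, a caller could compare lda_array[i] against min_leading_dim(layout, transa_i, m_i, k_i), ldb_array[i] against min_leading_dim(layout, transb_i, k_i, n_i), and ldc_array[i] against min_leading_dim(layout, LD_NO_TRANS, m_i, n_i) before invoking the batch routine.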
Output Parameters
- c_array
- Overwritten by the m_i-by-n_i matrix (alpha_i*op(A)*op(B) + beta_i*C) for group i.
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for large matrices.
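The operation count above matches the classical 3M scheme, which can be sketched as follows for square real matrices in row-major order. This is an illustration only: the function names and the workspace layout are hypothetical, and whether the library arranges the computation exactly this way internally is an assumption.

```c
#include <stddef.h>

/* out := x * y for real n-by-n row-major matrices (one real matrix multiplication). */
static void real_matmul(size_t n, const double *x, const double *y, double *out)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t p = 0; p < n; ++p) s += x[i * n + p] * y[p * n + j];
            out[i * n + j] = s;
        }
    }
}

/* Computes C1 + i*C2 = (A1 + i*A2)(B1 + i*B2) with three real matrix
   multiplications and five real matrix additions.
   work must hold at least 5*n*n doubles. */
void complex_matmul_3m(size_t n,
                       const double *a1, const double *a2,
                       const double *b1, const double *b2,
                       double *c1, double *c2, double *work)
{
    const size_t nn = n * n;
    double *t1 = work;          /* A1*B1 */
    double *t2 = work + nn;     /* A2*B2 */
    double *t3 = work + 2 * nn; /* (A1 + A2)*(B1 + B2) */
    double *sa = work + 3 * nn; /* A1 + A2 */
    double *sb = work + 4 * nn; /* B1 + B2 */

    for (size_t e = 0; e < nn; ++e) {
        sa[e] = a1[e] + a2[e];  /* addition 1 */
        sb[e] = b1[e] + b2[e];  /* addition 2 */
    }
    real_matmul(n, a1, b1, t1); /* multiplication 1 */
    real_matmul(n, a2, b2, t2); /* multiplication 2 */
    real_matmul(n, sa, sb, t3); /* multiplication 3 */
    for (size_t e = 0; e < nn; ++e) {
        c1[e] = t1[e] - t2[e];           /* addition 3: real part */
        c2[e] = (t3[e] - t1[e]) - t2[e]; /* additions 4 and 5: imaginary part */
    }
}
```

The imaginary part is recovered by subtraction, A1*B2 + A2*B1 = (A1 + A2)(B1 + B2) - A1*B1 - A2*B2, which is why its error bound is looser than that of the real part.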
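If the errors in the floating point calculations satisfy the following conditions:

$$\operatorname{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} = \times, /,$$
$$\operatorname{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix

$$\hat{C} = \operatorname{fl}(C_1 + iC_2) = \operatorname{fl}((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2,$$

the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where

$$\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty) \quad \text{and} \quad \|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty).$$

Thus the corresponding matrix multiplications are stable.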
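To get a feel for the size of these bounds on concrete data, the first-order terms can be evaluated directly. The sketch below drops the O(u^2) terms and treats u as a caller-supplied rounding-error bound (for example, the unit roundoff of the working precision, which is an assumption since the text does not fix its value). The names matrix_inf_norm, gemm3m_bounds, and gemm3m_first_order_bounds are hypothetical helpers, not library routines.

```c
#include <stddef.h>

/* Infinity norm (largest absolute row sum) of a real n-by-n row-major matrix. */
double matrix_inf_norm(size_t n, const double *x)
{
    double best = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double row_sum = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double v = x[i * n + j];
            row_sum += (v < 0.0) ? -v : v;
        }
        if (row_sum > best) best = row_sum;
    }
    return best;
}

typedef struct {
    double real_part; /* first-order bound on the error in C1 */
    double imag_part; /* first-order bound on the error in C2 */
} gemm3m_bounds;

/* Evaluates 2(n+1)*u*||A||*||B|| and 4(n+4)*u*||A||*||B||, where
   ||A|| = max(||A1||, ||A2||) and ||B|| = max(||B1||, ||B2||). */
gemm3m_bounds gemm3m_first_order_bounds(size_t n, double u,
                                        const double *a1, const double *a2,
                                        const double *b1, const double *b2)
{
    double na1 = matrix_inf_norm(n, a1), na2 = matrix_inf_norm(n, a2);
    double nb1 = matrix_inf_norm(n, b1), nb2 = matrix_inf_norm(n, b2);
    double norm_a = (na1 > na2) ? na1 : na2;
    double norm_b = (nb1 > nb2) ? nb1 : nb2;
    gemm3m_bounds r;
    r.real_part = 2.0 * (double)(n + 1) * u * norm_a * norm_b;
    r.imag_part = 4.0 * (double)(n + 4) * u * norm_a * norm_b;
    return r;
}
```

For the same inputs, the bound on the imaginary part is larger than the bound on the real part by a factor of 2(n+4)/(n+1).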