p?trord
Reorders the Schur factorization of a general matrix.
Syntax
void pstrord (char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq, MKL_INT* descq, float* wr, float* wi, MKL_INT* m, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdtrord (char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq, MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
Include Files
- mkl_scalapack.h
Description
p?trord reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2 diagonal blocks.
This function uses a delay and accumulate procedure for performing the off-diagonal updates.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Input Parameters
- compq
- (global) = 'V': update the matrix q of Schur vectors; = 'N': do not update q.
- select
- (global) array of size n. select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select[j-1] must be set to 1. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select[j-1] or select[j] or both must be set to 1; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.
- para
- (global) Block parameters:
- para[0]
- maximum number of concurrent computational windows allowed in the algorithm; 0 < para[0] ≤ min(nprow, npcol) must hold;
- para[1]
- number of eigenvalues in each window; 0 < para[1] < para[2] must hold;
- para[2]
- window size; para[1] < para[2] < mb_t must hold;
- para[3]
- minimal percentage of FLOPS required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para[3] ≤ 100 must hold;
- para[4]
- width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t must hold;
- para[5]
- the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross border window size; 0 < para[5] ≤ para[1] must hold.
- n
- (global) The order of the globally distributed matrix t. n ≥ 0.
- t
- (local) array of size lld_t*LOCc(n). The local pieces of the global distributed upper quasi-triangular matrix T, in Schur form.
- it,jt
- (global) The row and column index in the global matrix T indicating the first column of T. it = jt = 1 must hold (see Application Notes).
- desct
- (global and local) array of size dlen_. The array descriptor for the global distributed matrix T.
- q
- (local) array of size lld_q*LOCc(n). On entry, if compq = 'V', the local pieces of the global distributed matrix Q of Schur vectors. If compq = 'N', q is not referenced.
- iq,jq
- (global) The row and column index in the global matrix Q indicating the first column of Q. iq = jq = 1 must hold (see Application Notes).
- descq
- (global and local) array of size dlen_. The array descriptor for the global distributed matrix Q.
- work
- (local workspace) array of size lwork.
- lwork
- (local) The size of the array work. If lwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued by pxerbla.
- iwork
- (local workspace) array of size liwork.
- liwork
- (local) The size of the array iwork. If liwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the iwork array, returns this value as the first entry of the iwork array, and no error message related to liwork is issued by pxerbla.
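The range conditions on para and the pairing rule for select can be checked on the caller's side before the routine is invoked. The following is a minimal sketch, not part of the library: check_para and close_pairs are hypothetical helper names, plain int stands in for MKL_INT (which may be a 64-bit type in some builds), and close_pairs assumes that a wi array from an earlier Schur computation is available and follows the sign convention described for wi (positive entry first, then its negative).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 if para[0..5] satisfies the documented
   ranges, otherwise the 1-based position of the first violated entry.
   nprow, npcol (process grid shape) and mb_t (row block size of T) are
   supplied by the caller. */
static int check_para(const int *para, int nprow, int npcol, int mb_t)
{
    assert(para != NULL);
    int min_grid = nprow < npcol ? nprow : npcol;
    if (!(0 < para[0] && para[0] <= min_grid))  return 1;
    if (!(0 < para[1] && para[1] < para[2]))    return 2;
    if (!(para[1] < para[2] && para[2] < mb_t)) return 3;
    if (!(0 <= para[3] && para[3] <= 100))      return 4;
    if (!(0 < para[4] && para[4] <= mb_t))      return 5;
    if (!(0 < para[5] && para[5] <= para[1]))   return 6;
    return 0;
}

/* Hypothetical helper: makes select explicit for complex conjugate pairs.
   A pair is assumed to occupy positions j, j+1 with wi[j] > 0 and
   wi[j+1] < 0. If either entry of a pair is selected, both are set to 1.
   Returns the number of selected eigenvalues. */
static int close_pairs(int *select, const double *wi, int n)
{
    assert(select != NULL && wi != NULL);
    int count = 0;
    for (int j = 0; j < n; ++j) {
        if (j + 1 < n && wi[j] > 0.0 && wi[j + 1] < 0.0) {
            int both = (select[j] || select[j + 1]) ? 1 : 0;
            select[j] = both;
            select[j + 1] = both;
            count += 2 * both;
            ++j; /* skip the second half of the pair */
        } else if (select[j]) {
            ++count;
        }
    }
    return count;
}
```

The value returned by close_pairs is the subspace size one would expect in m if the reordering succeeds; it is only a caller-side estimate.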
Output Parameters
- select
- (global) array of size n. The (partial) reordering is displayed.
- t
- On exit, t is overwritten by the local pieces of the reordered matrix T, again in Schur form, with the selected eigenvalues in the globally leading diagonal blocks.
- q
- On exit, if compq = 'V', q has been postmultiplied by the global orthogonal transformation matrix which reorders t; the leading m columns of q form an orthonormal basis for the specified invariant subspace. If compq = 'N', q is not referenced.
- wr,wi
- (global) array of size n. The real and imaginary parts, respectively, of the reordered eigenvalues of the matrix T. The eigenvalues are in principle stored in the same order as on the diagonal of T, with wr[i] = T(i+1,i+1) and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1]. Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its value may differ significantly from its value before reordering.
- m
- (global) The size of the specified invariant subspace. 0 ≤ m ≤ n.
- work[0]
- On exit, if info = 0, work[0] returns the optimal lwork.
- iwork[0]
- On exit, if info = 0, iwork[0] returns the optimal liwork.
- info
- (global) = 0: successful exit. < 0: if info = -i, the i-th argument had an illegal value. If the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*1000+j); if the i-th argument is a scalar and had an illegal value, then info = -i. > 0: here we have several possibilities:
- Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t. On exit, info = {the index of t where the swap failed (indexing starts at 1)}.
- A 2-by-2 block to be reordered split into two 1-by-1 blocks and the second block failed to swap with an adjacent block. On exit, info = {the index of t where the swap failed}.
- If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).
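The info encoding described above can be unpacked mechanically. The sketch below is an illustration only: the enum, its value names, and classify_info are hypothetical and not part of the library, and plain int stands in for MKL_INT. It assumes that any negative value of magnitude 1000 or more encodes an array argument, which is how the -(i*1000+j) rule reads; the two "swap failed" cases cannot be told apart from info alone, so they share one status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the documented info values. */
typedef enum {
    TRORD_OK,               /* info == 0 */
    TRORD_BAD_SCALAR_ARG,   /* info == -i */
    TRORD_BAD_ARRAY_ENTRY,  /* info == -(i*1000 + j) */
    TRORD_SWAP_FAILED,      /* 0 < info <= n: index in t where a swap failed */
    TRORD_NO_BLACS_CONTEXT  /* info == n + 1 */
} trord_status;

/* Hypothetical helper: classifies info for a matrix of order n.
   *arg receives the 1-based argument position i (0 if not applicable).
   *entry receives the 1-based entry j of an array argument, or the
   1-based index in t where a swap failed (0 if not applicable). */
static trord_status classify_info(int info, int n, int *arg, int *entry)
{
    assert(arg != NULL && entry != NULL);
    *arg = 0;
    *entry = 0;
    if (info == 0)
        return TRORD_OK;
    if (info < 0) {
        int v = -info;
        if (v >= 1000) {            /* array argument: -(i*1000 + j) */
            *arg = v / 1000;
            *entry = v % 1000;
            return TRORD_BAD_ARRAY_ENTRY;
        }
        *arg = v;                   /* scalar argument: -i */
        return TRORD_BAD_SCALAR_ARG;
    }
    if (info == n + 1)
        return TRORD_NO_BLACS_CONTEXT;
    *entry = info;                  /* swap failed at this index of t */
    return TRORD_SWAP_FAILED;
}
```

For example, info = -3002 would be reported as an illegal value in entry 2 (C index 1) of the third argument, para.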
Application Notes
The following alignment requirements must hold:
- mb_t=nb_t=mb_q=nb_q
- rsrc_t=rsrc_q
- csrc_t=csrc_q
All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering across processor borders in the presence of 2-by-2 blocks.
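The three alignment conditions can be verified directly from the two array descriptors before the call. This is a minimal sketch under the standard ScaLAPACK descriptor layout, in which the 0-based entries 4, 5, 6, and 7 hold the row block size, column block size, first process row, and first process column. The function name descriptors_aligned is hypothetical, plain int stands in for MKL_INT, and the minimum block factor is deliberately left unchecked.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based positions inside a ScaLAPACK array descriptor. */
enum { MB_ = 4, NB_ = 5, RSRC_ = 6, CSRC_ = 7 };

/* Hypothetical helper: returns 1 if desct and descq satisfy
   mb_t = nb_t = mb_q = nb_q, rsrc_t = rsrc_q and csrc_t = csrc_q,
   and 0 otherwise. */
static int descriptors_aligned(const int *desct, const int *descq)
{
    assert(desct != NULL && descq != NULL);
    int mb = desct[MB_];
    return desct[NB_] == mb
        && descq[MB_] == mb
        && descq[NB_] == mb
        && desct[RSRC_] == descq[RSRC_]
        && desct[CSRC_] == descq[CSRC_];
}
```

Running such a check on every process before calling the routine gives an early, readable failure instead of a negative info value.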
This algorithm cannot work on submatrices of t and q, i.e., it = jt = iq = jq = 1 must hold. This is however no limitation since p?lahqr does not compute Schur forms of submatrices anyway.
Parallel execution recommendations:
- Use a square grid, if possible, for maximum performance. The block parameters in para should be kept well below the data distribution block size.
- In general, the parallel algorithm strives to perform as much work as possible without crossing the block borders on the main block diagonal.