Developer Reference

  • 0.10
  • 10/21/2020
  • Public Content

p?trsen

Reorders the Schur factorization of a matrix and (optionally) computes the reciprocal condition numbers and invariant subspace for the selected cluster of eigenvalues.

Syntax

void pstrsen (char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq, MKL_INT* descq, float* wr, float* wi, MKL_INT* m, float* s, float* sep, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdtrsen (char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq, MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* s, double* sep, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
Include Files
  • mkl_scalapack.h
Description
p?trsen reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace. The reordering is performed by p?trord.
Optionally, the function computes the reciprocal condition numbers of the cluster of eigenvalues and/or the invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2 diagonal blocks.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE3, and SSSE3.
Input Parameters
job
(global) Specifies whether condition numbers are required for the cluster of eigenvalues (s) or the invariant subspace (sep):
= 'N': no condition numbers are required;
= 'E': only the condition number for the cluster of eigenvalues is computed (s);
= 'V': only the condition number for the invariant subspace is computed (sep);
= 'B': condition numbers for both the cluster and the invariant subspace are computed (s and sep).
compq
(global)
= 'V': update the matrix q of Schur vectors;
= 'N': do not update q.
select
(global) array of size n.
select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select[j-1] must be set to a non-zero number. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select[j-1] or select[j] or both must be set to a non-zero number; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.
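As an illustration of the pairing rule, the following sketch fills select with every eigenvalue whose real part is negative. It assumes that wr and wi already hold the eigenvalues of T in diagonal order (for example, from the preceding Schur computation), with each complex conjugate pair stored as two consecutive entries, positive imaginary part first. The helper name select_stable is hypothetical, and MKL_INT is typedef'd to int only so that the fragment compiles on its own; real code would take the type from the MKL headers.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

/* Hypothetical helper: mark every eigenvalue with negative real part.
   A complex conjugate pair occupies entries j and j+1 with wi[j] > 0 and
   wi[j+1] < 0; both entries are set together so a pair is never split.
   Returns the number of selected eigenvalues. */
static MKL_INT select_stable(MKL_INT n, const double *wr, const double *wi,
                             MKL_INT *select)
{
    MKL_INT count = 0;
    MKL_INT j = 0;
    while (j < n) {
        int is_pair = (j + 1 < n) && (wi[j] > 0.0) && (wi[j + 1] < 0.0);
        int chosen = wr[j] < 0.0;
        select[j] = chosen;
        if (is_pair) {
            select[j + 1] = chosen;   /* keep the conjugate pair together */
            count += chosen ? 2 : 0;
            j += 2;
        } else {
            count += chosen ? 1 : 0;
            j += 1;
        }
    }
    return count;
}
```

The returned count is the value one would expect to see in m after a successful call, since a selected pair contributes two eigenvalues to the cluster.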
para
(global) Block parameters:
para[0]: maximum number of concurrent computational windows allowed in the algorithm; 0 < para[0] ≤ min(NPROW,NPCOL) must hold;
para[1]: number of eigenvalues in each window; 0 < para[1] < para[2] must hold;
para[2]: window size; para[1] < para[2] < mb_t must hold;
para[3]: minimal percentage of flops required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para[3] ≤ 100 must hold;
para[4]: width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t must hold;
para[5]: the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross-border window size; 0 < para[5] ≤ para[1] must hold.
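The constraints on para can be checked before the call. The sketch below is one way to do so; it assumes that the upper bounds on para[0], para[3], para[4], and para[5] are inclusive. The function name check_para is hypothetical, and nprow, npcol, and mb_t stand for the process grid dimensions and the row block size of T.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

/* Hypothetical helper: returns 0 if all six entries satisfy the documented
   ranges, otherwise the 1-based position of the first offending entry. */
static int check_para(const MKL_INT para[6], MKL_INT nprow, MKL_INT npcol,
                      MKL_INT mb_t)
{
    MKL_INT pmin = nprow < npcol ? nprow : npcol;

    if (!(0 < para[0] && para[0] <= pmin))      return 1; /* windows      */
    if (!(0 < para[1] && para[1] < para[2]))    return 2; /* eigenvalues  */
    if (!(para[1] < para[2] && para[2] < mb_t)) return 3; /* window size  */
    if (!(0 <= para[3] && para[3] <= 100))      return 4; /* percentage   */
    if (!(0 < para[4] && para[4] <= mb_t))      return 5; /* slab width   */
    if (!(0 < para[5] && para[5] <= para[1]))   return 6; /* border moves */
    return 0;
}
```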
n
(global) The order of the globally distributed matrix t. n ≥ 0.
t
(local) array of size lld_t * LOCc(n).
The local pieces of the global distributed upper quasi-triangular matrix T, in Schur form.
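LOCc(n) is the number of columns of the global matrix that are stored on the calling process. ScaLAPACK provides the NUMROC tool routine for this count; the sketch below reproduces the same block-cyclic arithmetic in plain C so that the size expression can be explored without a library. The function name local_count is hypothetical.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

/* Number of rows or columns of an n-long dimension, distributed
   block-cyclically in blocks of nb over nprocs processes starting at
   process isrc, that are stored on process iproc. */
static MKL_INT local_count(MKL_INT n, MKL_INT nb, MKL_INT iproc,
                           MKL_INT isrc, MKL_INT nprocs)
{
    MKL_INT mydist  = (nprocs + iproc - isrc) % nprocs; /* distance from source */
    MKL_INT nblocks = n / nb;                           /* full blocks overall  */
    MKL_INT count   = (nblocks / nprocs) * nb;          /* share of every rank  */
    MKL_INT extra   = nblocks % nprocs;                 /* leftover full blocks */

    if (mydist < extra)
        count += nb;        /* this process holds one more full block */
    else if (mydist == extra)
        count += n % nb;    /* this process holds the trailing partial block */
    return count;
}
```

With this helper, the documented size of t would be lld_t * local_count(n, nb_t, mycol, csrc_t, npcol), where mycol and npcol (names assumed here) are the column coordinate of the process and the number of process columns in the grid.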
it, jt
(global) The row and column indices in the global matrix T indicating the first row and first column of T. it = jt = 1 must hold (see Application Notes).
desct
(global and local) array of size dlen_.
The array descriptor for the global distributed matrix T.
q
(local) array of size lld_q * LOCc(n).
On entry, if compq = 'V', the local pieces of the global distributed matrix Q of Schur vectors.
If compq = 'N', q is not referenced.
iq, jq
(global) The row and column indices in the global matrix Q indicating the first row and first column of Q. iq = jq = 1 must hold (see Application Notes).
descq
(global and local) array of size dlen_.
The array descriptor for the global distributed matrix Q.
work
(local workspace) array of size lwork.
lwork
(local) The size of the array work.
If lwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued by pxerbla.
iwork
(local workspace) array of size liwork.
liwork
(local) The size of the array iwork.
If liwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the iwork array, returns this value as the first entry of the iwork array, and no error message related to liwork is issued by pxerbla.
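The workspace query is typically used as a two-call pattern: query, allocate, then compute. The sketch below shows that pattern through a function pointer so it can be compiled without MKL or MPI. All names in it (trsen_fn, trsen_args, call_trsen, trsen_with_query, fake_trsen) are hypothetical, the pointer type assumes the double-precision prototype exactly as shown in the Syntax section (a header that adds const qualifiers would need a matching type), and fake_trsen is only a stand-in that returns made-up sizes.

```c
#include <stdlib.h>

typedef int MKL_INT;  /* assumption: LP64 interface */

/* Same parameter list as pdtrsen in the Syntax section. */
typedef void (*trsen_fn)(char *job, char *compq, MKL_INT *select, MKL_INT *para,
                         MKL_INT *n, double *t, MKL_INT *it, MKL_INT *jt,
                         MKL_INT *desct, double *q, MKL_INT *iq, MKL_INT *jq,
                         MKL_INT *descq, double *wr, double *wi, MKL_INT *m,
                         double *s, double *sep, double *work, MKL_INT *lwork,
                         MKL_INT *iwork, MKL_INT *liwork, MKL_INT *info);

/* Hypothetical bundle of every argument except the workspaces and info. */
typedef struct {
    char *job, *compq;
    MKL_INT *select, *para, *n;
    double *t;
    MKL_INT *it, *jt, *desct;
    double *q;
    MKL_INT *iq, *jq, *descq;
    double *wr, *wi;
    MKL_INT *m;
    double *s, *sep;
} trsen_args;

static void call_trsen(trsen_fn fn, trsen_args *a, double *work, MKL_INT *lwork,
                       MKL_INT *iwork, MKL_INT *liwork, MKL_INT *info)
{
    fn(a->job, a->compq, a->select, a->para, a->n, a->t, a->it, a->jt, a->desct,
       a->q, a->iq, a->jq, a->descq, a->wr, a->wi, a->m, a->s, a->sep,
       work, lwork, iwork, liwork, info);
}

/* Query, allocate, compute. Returns 0 if the routine was called as intended,
   -1 if allocation failed; *info receives the routine's own status. */
static int trsen_with_query(trsen_fn fn, trsen_args *a, MKL_INT *info)
{
    double work_query = 0.0;
    MKL_INT iwork_query = 0;
    MKL_INT lwork = -1, liwork = -1;

    call_trsen(fn, a, &work_query, &lwork, &iwork_query, &liwork, info);
    if (*info != 0)
        return 0;                    /* the query itself reported a problem */

    lwork = (MKL_INT)work_query;     /* optimal sizes are returned in entry 0 */
    liwork = iwork_query;
    if (lwork < 1) lwork = 1;
    if (liwork < 1) liwork = 1;

    double *work = malloc((size_t)lwork * sizeof *work);
    MKL_INT *iwork = malloc((size_t)liwork * sizeof *iwork);
    int status = -1;
    if (work && iwork) {
        call_trsen(fn, a, work, &lwork, iwork, &liwork, info);
        status = 0;
    }
    free(work);
    free(iwork);
    return status;
}

/* Stand-in for pdtrsen: answers a query with made-up sizes, otherwise
   records the workspace sizes it was given. Not the real routine. */
static MKL_INT fake_seen_lwork, fake_seen_liwork;

static void fake_trsen(char *job, char *compq, MKL_INT *select, MKL_INT *para,
                       MKL_INT *n, double *t, MKL_INT *it, MKL_INT *jt,
                       MKL_INT *desct, double *q, MKL_INT *iq, MKL_INT *jq,
                       MKL_INT *descq, double *wr, double *wi, MKL_INT *m,
                       double *s, double *sep, double *work, MKL_INT *lwork,
                       MKL_INT *iwork, MKL_INT *liwork, MKL_INT *info)
{
    (void)job; (void)compq; (void)select; (void)para; (void)n; (void)t;
    (void)it; (void)jt; (void)desct; (void)q; (void)iq; (void)jq;
    (void)descq; (void)wr; (void)wi; (void)m; (void)s; (void)sep;
    if (*lwork == -1 || *liwork == -1) {
        work[0] = 40.0;
        iwork[0] = 7;
    } else {
        fake_seen_lwork = *lwork;
        fake_seen_liwork = *liwork;
    }
    *info = 0;
}
```

In a real program the first argument of trsen_with_query would be pdtrsen itself, and the struct fields would point at the distributed data; the stand-in exists only so the pattern can be exercised in isolation.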
Output Parameters
t
t is overwritten by the local pieces of the reordered matrix T, again in Schur form, with the selected eigenvalues in the globally leading diagonal blocks.
q
On exit, if compq = 'V', q has been postmultiplied by the global orthogonal transformation matrix which reorders t; the leading m columns of q form an orthonormal basis for the specified invariant subspace.
If compq = 'N', q is not referenced.
wr, wi
(global) arrays of size n.
The real and imaginary parts, respectively, of the reordered eigenvalues of the matrix T. The eigenvalues are in principle stored in the same order as on the diagonal of T, with wr[i-1] = T(i,i) and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1].
Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its value may differ significantly from its value before reordering.
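The storage convention for complex conjugate pairs (positive imaginary part first, immediately followed by its negation) can be verified when post-processing the output. The following sketch walks wi and counts the pairs; the function name count_complex_pairs is hypothetical, and the exact floating-point comparison assumes the two imaginary parts are stored as exact negatives of each other, as the description states.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

/* Hypothetical helper: returns the number of complex conjugate pairs in
   wi[0..n-1], or -1 if an entry violates the documented convention
   (a positive imaginary part followed directly by its negation). */
static int count_complex_pairs(MKL_INT n, const double *wi)
{
    int pairs = 0;
    MKL_INT i = 0;
    while (i < n) {
        if (wi[i] == 0.0) {          /* real eigenvalue, 1-by-1 block */
            i += 1;
            continue;
        }
        if (wi[i] < 0.0 || i + 1 >= n || wi[i + 1] != -wi[i])
            return -1;               /* not a well-formed pair */
        pairs += 1;                  /* 2-by-2 block: entries i and i+1 */
        i += 2;
    }
    return pairs;
}
```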
m
(global) The size of the specified invariant subspace. 0 ≤ m ≤ n.
s
(global) If job = 'E' or 'B', s is a lower bound on the reciprocal condition number for the selected cluster of eigenvalues. s cannot underestimate the true reciprocal condition number by more than a factor of sqrt(n). If m = 0 or n, s = 1.
If job = 'N' or 'V', s is not referenced.
sep
(global) If job = 'V' or 'B', sep is the estimated reciprocal condition number of the specified invariant subspace. If m = 0 or n, sep = norm(t).
If job = 'N' or 'E', sep is not referenced.
work[0]
On exit, if info = 0, work[0] returns the optimal lwork.
iwork[0]
On exit, if info = 0, iwork[0] returns the optimal liwork.
info
(global)
= 0: successful exit.
< 0: an argument had an illegal value. If the i-th argument is a scalar and had an illegal value, then info = -i. If the i-th argument is an array and its j-th entry, indexed j-1, had an illegal value, then info = -(i*1000+j).
> 0: one of the following occurred:
  • Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t. On exit, info = {the index of t where the swap failed (indexing starts at 1)}.
  • A 2-by-2 block to be reordered split into two 1-by-1 blocks and the second block failed to swap with an adjacent block. On exit, info = {the index of t where the swap failed}.
  • If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).
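The encoding of info can be unpacked mechanically. The sketch below is one possible decoder; the type and function names (trsen_status, trsen_info, decode_info) are hypothetical, and it assumes that any negative value whose magnitude is at least 1000 denotes an array entry, which is safe as long as the routine has fewer than 1000 arguments.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

typedef enum {
    TRSEN_OK,               /* info = 0                          */
    TRSEN_BAD_SCALAR,       /* info = -i                         */
    TRSEN_BAD_ARRAY_ENTRY,  /* info = -(i*1000+j)                */
    TRSEN_NO_CONTEXT,       /* info = n+1                        */
    TRSEN_SWAP_FAILED       /* any other positive value          */
} trsen_status;

typedef struct {
    trsen_status status;
    MKL_INT arg;    /* 1-based argument position, if negative info  */
    MKL_INT entry;  /* 1-based entry j (C index j-1), if array      */
    MKL_INT index;  /* 1-based index in t where the swap failed     */
} trsen_info;

/* Hypothetical decoder for the documented info values. */
static trsen_info decode_info(MKL_INT info, MKL_INT n)
{
    trsen_info r = { TRSEN_OK, 0, 0, 0 };
    if (info < 0) {
        MKL_INT code = -info;
        if (code >= 1000) {
            r.status = TRSEN_BAD_ARRAY_ENTRY;
            r.arg = code / 1000;
            r.entry = code % 1000;
        } else {
            r.status = TRSEN_BAD_SCALAR;
            r.arg = code;
        }
    } else if (info == n + 1) {
        r.status = TRSEN_NO_CONTEXT;
    } else if (info > 0) {
        r.status = TRSEN_SWAP_FAILED;
        r.index = info;
    }
    return r;
}
```

For example, info = -4002 decodes to argument 4 (para), entry 2, that is, para[1].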
Application Notes
The following alignment requirements must hold:
  • mb_t = nb_t = mb_q = nb_q
  • rsrc_t = rsrc_q
  • csrc_t = csrc_q
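These three conditions can be tested directly on the descriptors before the call. The sketch below assumes the standard ScaLAPACK dense-matrix descriptor layout, in which the 0-based C positions of MB_, NB_, RSRC_, and CSRC_ are 4, 5, 6, and 7; the function name descriptors_aligned is hypothetical.

```c
/* Assumption for this sketch: LP64 interface, where MKL_INT is int. */
typedef int MKL_INT;

/* 0-based positions in a standard dense-matrix descriptor (assumed layout:
   DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_). */
enum { MB_ = 4, NB_ = 5, RSRC_ = 6, CSRC_ = 7 };

/* Hypothetical helper: returns 1 if desct and descq satisfy
   mb_t = nb_t = mb_q = nb_q, rsrc_t = rsrc_q, csrc_t = csrc_q; else 0. */
static int descriptors_aligned(const MKL_INT *desct, const MKL_INT *descq)
{
    MKL_INT mb = desct[MB_];
    return desct[NB_] == mb
        && descq[MB_] == mb
        && descq[NB_] == mb
        && desct[RSRC_] == descq[RSRC_]
        && desct[CSRC_] == descq[CSRC_];
}
```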
All matrices must be blocked by a block factor larger than or equal to two (3). This simplifies reordering across processor borders in the presence of 2-by-2 blocks.
This algorithm cannot work on submatrices of t and q, i.e., it = jt = iq = jq = 1 must hold. This is, however, no limitation, since p?lahqr does not compute Schur forms of submatrices anyway.
For parallel execution, use a square grid, if possible, for maximum performance. The block parameters in para should be kept well below the data distribution block size.
In general, the parallel algorithm strives to perform as much work as possible without crossing the block borders on the main block diagonal.
