Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content
Contents

p?laqr1

Sets a scalar multiple of the first column of the product of a 2-by-2 or 3-by-3 matrix and specified shifts.

Syntax

void pslaqr1
(
MKL_INT*
wantt
,
MKL_INT*
wantz
,
MKL_INT*
n
,
MKL_INT*
ilo
,
MKL_INT*
ihi
,
float*
a
,
MKL_INT*
desca
,
float*
wr
,
float*
wi
,
MKL_INT*
iloz
,
MKL_INT*
ihiz
,
float*
z
,
MKL_INT*
descz
,
float*
work
,
MKL_INT*
lwork
,
MKL_INT*
iwork
,
MKL_INT*
ilwork
,
MKL_INT*
info
);
void pdlaqr1
(
MKL_INT*
wantt
,
MKL_INT*
wantz
,
MKL_INT*
n
,
MKL_INT*
ilo
,
MKL_INT*
ihi
,
double*
a
,
MKL_INT*
desca
,
double*
wr
,
double*
wi
,
MKL_INT*
iloz
,
MKL_INT*
ihiz
,
double*
z
,
MKL_INT*
descz
,
double*
work
,
MKL_INT*
lwork
,
MKL_INT*
iwork
,
MKL_INT*
ilwork
,
MKL_INT*
info
);
Include Files
  • mkl_scalapack.h
Description
p?laqr1
is an auxiliary
function
used to find the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form from columns
ilo
to
ihi
.
This is a modified version of
p?lahqr
from ScaLAPACK version 1.7.3. The following modifications were made:
  • Workspace query functionality was added.
  • Aggressive early deflation is implemented.
  • Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is abandoned.
  • The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to complex conjugate pairs of eigenvalues.
  • For some reason, the original version of
    p?lahqr
    sometimes did not read out the converged eigenvalues correctly. This is now fixed.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX-512.
Input Parameters
wantt
(global )
Non-zero
: the full Schur form
T
is required;
Zero
: only eigenvalues are required.
wantz
Non-zero
: the matrix of Schur vectors
Z
is required;
Zero
: Schur vectors are not required.
n
(global )
The order of the Hessenberg matrix
A
(and
Z
if
wantz
is non-zero
).
n
0.
ilo
,
ihi
(global )
It is assumed that the matrix
A
is already upper quasi-triangular in rows and columns
ihi
+1:
n
, and that A(
ilo
,
ilo
-1) = 0 (unless
ilo
= 1).
p?laqr1
works primarily with the Hessenberg submatrix in rows and columns
ilo
to
ihi
, but applies transformations to all of
H
if
wantt
is
non-zero
.
1
ilo
max(1,
ihi
);
ihi
n
.
a
(global ) array of size
lld_a
*
LOC
c
(
n
)
On entry, the upper Hessenberg matrix
A
.
desca
(global and local ) array of size
dlen_
.
The array descriptor for the distributed matrix
A
.
iloz
,
ihiz
(global )
Specify the rows of the matrix
Z
to which transformations must be applied if
wantz
is
non-zero
.
1
iloz
ilo
;
ihi
ihiz
n
.
z
(global ) array of size
lld_z
*
LOC
c
(
n
)
.
If
wantz
is
non-zero
, on entry
z
must contain the current matrix
Z
of transformations accumulated by
p?hseqr
If
wantz
is
zero
,
z
is not referenced.
descz
(global and local ) array of size
dlen_
.
The array descriptor for the distributed matrix
Z
.
work
(local output) array of size
lwork
lwork
(local )
The size of the work array (
lwork
>=1).
If
lwork
=-1, then a workspace query is assumed.
iwork
(global and local ) array of size
ilwork
This holds the some of the IBLK integer arrays.
ilwork
(local )
The size of the
iwork
array (
ilwork
3 ).
OUTPUT Parameters
a
If
wantt
is
non-zero
,
the matrix
A
is upper quasi-triangular in rows and columns
ilo
:
ihi
, with any 2-by-2 or larger diagonal blocks not yet in standard form. If
wantt
equals zero
, the contents of
a
are unspecified on exit.
wr
,
wi
(global replicated ) array of size
n
The real and imaginary parts, respectively, of the computed eigenvalues
ilo
to
ihi
are stored in the corresponding elements of
wr
and
wi
. If two eigenvalues are computed as a complex conjugate pair, they are stored in consecutive elements of
wr
and
wi
, say the
i
-th and (
i
+1)th, with
wi
[
i
-1] > 0 and
wi
[
i
] < 0
. If
wantt
is
non-zero
, the eigenvalues are stored in the same order as on the diagonal of the Schur form returned in
a
.
a
may be returned with larger diagonal blocks until the next release.
z
On exit
z
is updated; transformations are applied only to the submatrix
Z
(
iloz
:
ihiz
,
ilo
:
ihi
).
If
wantz
equals zero
,
z
is not referenced.
work
[0]
On exit, if
info
= 0,
work
[0]
returns the optimal
lwork
.
info
(global )
< 0: parameter number -
info
incorrect or inconsistent
= 0: successful exit
> 0:
p?laqr1
failed to compute all the eigenvalues
ilo
to
ihi
in a total of 30*(
ihi
-
ilo
+1) iterations; if
info
=
i
, elements
i
:
ihi
-1
of
wr
and
wi
contain those eigenvalues which have been successfully computed.
Application Notes
This algorithm is very similar to
p?ahqr
. Unlike
p?lahqr
, instead of sending one double shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there can be parallelism across several processor row/columns. Another critical difference is that this algorithm aggregrates multiple transforms together in order to apply them in a block fashion.
Current Notes and/or Restrictions:
  • This code requires the distributed block size to be square and at least six (6); unlike simpler codes like LU, this algorithm is extremely sensitive to block size. Unwise choices of too small a block size can lead to bad performance.
  • This code requires
    a
    and
    z
    to be distributed identically and have identical contxts.
  • This release currently does not have a
    function
    for resolving the Schur blocks into regular 2x2 form after this code is completed. Because of this, a significant performance impact is required while the deflation is done by sometimes a single column of processors.
  • This code does not currently block the initial transforms so that none of the rows or columns for any bulge are completed until all are started. To offset pipeline start-up it is recommended that at least 2*LCM(NPROW,NPCOL) bulges are used (if possible)
  • The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited only by the incoming
    work
    array.
  • The matrix
    A
    must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the resulting transforms may be nonsimilar. This is also true with the LAPACK
    function
    .
  • For this release, it is assumed
    rsrc_
    =
    csrc_
    =0
  • Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the eigenvalues by the column partitioning.
  • The internals of this
    function
    are subject to change.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804