Finds the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form (auxiliary routine).


call pslaqr1(wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z, descz, work, lwork, iwork, ilwork, info)

call pdlaqr1(wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z, descz, work, lwork, iwork, ilwork, info)


p?laqr1 is an auxiliary routine used to find the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form from columns ilo to ihi.

This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were made:

  • Workspace query functionality was added.

  • Aggressive early deflation is implemented.

  • Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is abandoned.

  • The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to complex conjugate pairs of eigenvalues.

  • For some reason, the original version of p?lahqr sometimes did not read out the converged eigenvalues correctly. This is now fixed.


Input Parameters


wantt

(global) LOGICAL

= .TRUE. : the full Schur form T is required;

= .FALSE.: only eigenvalues are required.


wantz

(global) LOGICAL

= .TRUE. : the matrix of Schur vectors Z is required;

= .FALSE.: Schur vectors are not required.


n

(global) INTEGER

The order of the Hessenberg matrix A (and Z if wantz). n ≥ 0.

ilo, ihi

(global) INTEGER

It is assumed that the matrix A is already upper quasi-triangular in rows and columns ihi+1:n, and that A(ilo,ilo-1) = 0 (unless ilo = 1). p?laqr1 works primarily with the Hessenberg submatrix in rows and columns ilo to ihi, but applies transformations to all of A if wantt is .TRUE..

1 ≤ ilo ≤ max(1,ihi); ihi ≤ n.


a

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global) array of size (lld_a,LOCc(n)).

On entry, the upper Hessenberg matrix A.


desca

(global and local) INTEGER array of size dlen_.

The array descriptor for the distributed matrix A.

iloz, ihiz

(global) INTEGER

Specify the rows of the matrix Z to which transformations must be applied if wantz is .TRUE..

1 ≤ iloz ≤ ilo; ihi ≤ ihiz ≤ n.
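The index constraints stated for n, ilo, ihi, iloz and ihiz (n ≥ 0; 1 ≤ ilo ≤ max(1,ihi); ihi ≤ n; 1 ≤ iloz ≤ ilo; ihi ≤ ihiz ≤ n) can be checked on the caller's side before the routine is invoked. The following is a minimal Python sketch of such a check. The function name check_index_ranges is hypothetical and not part of the library, and the routine's own argument checking may differ in detail.

```python
def check_index_ranges(n, ilo, ihi, iloz, ihiz):
    """Return the documented constraints that are violated (empty list if none).

    Hypothetical caller-side helper; all indices are 1-based, as in the
    Fortran interface.
    """
    problems = []
    if n < 0:
        problems.append("n >= 0")
    if not (1 <= ilo <= max(1, ihi)):
        problems.append("1 <= ilo <= max(1, ihi)")
    if ihi > n:
        problems.append("ihi <= n")
    # iloz and ihiz only matter when wantz is .TRUE., but checking them
    # unconditionally is harmless for this sketch.
    if not (1 <= iloz <= ilo):
        problems.append("1 <= iloz <= ilo")
    if not (ihi <= ihiz <= n):
        problems.append("ihi <= ihiz <= n")
    return problems
```

An empty result means every listed range holds; otherwise each returned string names one violated constraint.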


z

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global) array of size (lld_z,LOCc(n)).

If wantz is .TRUE., on entry z must contain the current matrix Z of transformations accumulated by p?hseqr.

If wantz is .FALSE., z is not referenced.


descz

(global and local) INTEGER array of size dlen_.

The array descriptor for the distributed matrix Z.


work

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(local output) array of size lwork.


lwork

(local) INTEGER

The size of the work array (lwork>=1).

If lwork=-1, then a workspace query is assumed.


iwork

(global and local) INTEGER array of size ilwork.

This holds some of the IBLK integer arrays.


ilwork

(local) INTEGER

The size of the iwork array (ilwork ≥ 3).

Output Parameters


a

If wantt is .TRUE., the matrix A is upper quasi-triangular in rows and columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in standard form. If wantt is .FALSE., the contents of a are unspecified on exit.

wr, wi

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global replicated) array of size n.

The real and imaginary parts, respectively, of the computed eigenvalues ilo to ihi are stored in the corresponding elements of wr and wi. If two eigenvalues are computed as a complex conjugate pair, they are stored in consecutive elements of wr and wi, say the i-th and (i+1)th, with wi(i) > 0 and wi(i+1) < 0. If wantt is .TRUE., the eigenvalues are stored in the same order as on the diagonal of the Schur form returned in a. a may be returned with larger diagonal blocks until the next release.
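As an illustration of the storage convention described above, the following Python sketch combines wr and wi into complex eigenvalues and locates the complex conjugate pairs, using the rule that a pair occupies consecutive positions i and i+1 with wi(i) > 0 and wi(i+1) < 0. Both function names are hypothetical, and the arrays are assumed to have already been copied out of the Fortran routine into ordinary Python lists.

```python
def eigenvalues_from_wr_wi(wr, wi, ilo, ihi):
    """Combine entries ilo..ihi (1-based, inclusive) of wr and wi into complex numbers."""
    return [complex(wr[k], wi[k]) for k in range(ilo - 1, ihi)]


def conjugate_pairs(wr, wi, ilo, ihi):
    """Return the 1-based indices i at which entries i and i+1 form a conjugate pair.

    A pair is recognized by wi(i) > 0 followed by wi(i+1) < 0.
    """
    pairs = []
    i = ilo
    while i < ihi:
        if wi[i - 1] > 0 and wi[i] < 0:
            pairs.append(i)
            i += 2  # skip the second member of the pair
        else:
            i += 1
    return pairs
```

For example, with wr = [1.0, 2.0, 2.0, 5.0] and wi = [0.0, 3.0, -3.0, 0.0], the second and third entries form one conjugate pair and the remaining eigenvalues are real.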


z

On exit, z is updated; transformations are applied only to the submatrix Z(iloz:ihiz,ilo:ihi).

If wantz is .FALSE., z is not referenced.


work

On exit, if info = 0, work(1) returns the optimal lwork.


info

(global) INTEGER

< 0: parameter number -info incorrect or inconsistent

= 0: successful exit

> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of 30*(ihi-ilo+1) iterations; if info = i, elements i+1:ihi of wr and wi contain those eigenvalues which have been successfully computed.
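The three cases for info can be decoded as in the following Python sketch. The function interpret_info and the dictionary layout are hypothetical conveniences, not part of the library. The sketch only restates the rules above: a negative value identifies the offending argument, zero means success, and a positive value i means that only elements i+1:ihi of wr and wi hold converged eigenvalues.

```python
def interpret_info(info, ilo, ihi):
    """Translate an info value into a small status dictionary.

    Index ranges are 1-based and inclusive, as in the Fortran interface;
    a range whose first index exceeds its last index is empty.
    """
    if info < 0:
        # Argument number -info was incorrect or inconsistent.
        return {"status": "bad_argument", "argument": -info}
    if info == 0:
        return {"status": "ok", "converged": (ilo, ihi)}
    # info > 0: the iteration limit was reached before full convergence.
    return {
        "status": "not_converged",
        "iteration_limit": 30 * (ihi - ilo + 1),
        "converged": (info + 1, ihi),
    }
```

With ilo = 1 and ihi = 10, a return value of info = 4 would indicate that the limit of 300 iterations was reached and that only elements 5 to 10 of wr and wi are valid.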

Application Notes

This algorithm is very similar to p?lahqr. Unlike p?lahqr, instead of sending one double shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there can be parallelism across several processor rows or columns. Another critical difference is that this algorithm aggregates multiple transforms together in order to apply them in a block fashion.

Current Notes and/or Restrictions:

  • This code requires the distributed block size to be square and at least six (6); unlike simpler codes like LU, this algorithm is extremely sensitive to block size. Unwise choices of too small a block size can lead to bad performance.

  • This code requires a and z to be distributed identically and have identical contexts.

  • This release currently does not have a routine for resolving the Schur blocks into regular 2x2 form after this code is completed. Because of this, a significant performance penalty is incurred while deflation is performed, sometimes by only a single column of processors.

  • This code does not currently block the initial transforms so that none of the rows or columns for any bulge are completed until all are started. To offset pipeline start-up, it is recommended that at least 2*LCM(NPROW,NPCOL) bulges are used (if possible).

  • The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited only by the incoming work array.

  • The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the resulting transforms may be nonsimilar. This is also true with the LAPACK routine.

  • For this release, it is assumed that rsrc_=csrc_=0.

  • Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the eigenvalues by the column partitioning.

  • The internals of this routine are subject to change.
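Several of the restrictions above are simple arithmetic or structural conditions and can be verified before the call. The following Python sketch covers three of them: the square block size of at least six, the recommended bulge count of 2*LCM(NPROW,NPCOL) capped at the fixed maximum of 32, and the upper Hessenberg requirement. All names are hypothetical, the Hessenberg test operates on a small non-distributed matrix held as a list of rows, and how the number of bulges is actually selected inside the routine is not specified here.

```python
from math import gcd

MAX_BULGES = 32     # fixed maximum stated in the notes
MIN_BLOCK_SIZE = 6  # minimum distributed block size stated in the notes


def block_size_ok(mb, nb):
    """True if the distributed block is square and at least 6-by-6."""
    return mb == nb and mb >= MIN_BLOCK_SIZE


def suggested_min_bulges(nprow, npcol):
    """Recommended 2*LCM(NPROW,NPCOL) bulges, limited to the supported maximum."""
    lcm = nprow * npcol // gcd(nprow, npcol)
    return min(2 * lcm, MAX_BULGES)


def is_upper_hessenberg(a):
    """True if every entry below the first subdiagonal of the square matrix a is zero."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i > j + 1)
```

For a 2-by-3 process grid, the LCM is 6, so the sketch suggests 12 bulges; for a 5-by-7 grid the recommendation of 70 exceeds the supported maximum and is capped at 32.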
