p?laqr1

Developer Reference for Intel® oneAPI Math Kernel Library for C

Download PDF

ID 766684

Date 11/07/2023

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-92F38ECF-79CA-4D58-8331-E1C38B7775C6

View Details

p?laqr1

Sets a scalar multiple of the first column of the product of a 2-by-2 or 3-by-3 matrix and specified shifts.

Syntax

void pslaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi, float* a, MKL_INT* desca, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork, MKL_INT* info);

void pdlaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi, double* a, MKL_INT* desca, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork, MKL_INT* info);

Include Files

mkl_scalapack.h

Description

p?laqr1 is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form from columns ilo to ihi.

This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were made:

Workspace query functionality was added.
Aggressive early deflation is implemented.
Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is abandoned.
The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to complex conjugate pairs of eigenvalues.
For some reason, the original version of p?lahqr sometimes did not read out the converged eigenvalues correctly. This is now fixed.

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201

Input Parameters

wantt

(global )

Non-zero : the full Schur form T is required;

Zero: only eigenvalues are required.

wantz

Non-zero : the matrix of Schur vectors Z is required;

Zero: Schur vectors are not required.

n

(global )

The order of the Hessenberg matrix A (and Z if wantzis non-zero). n≥ 0.

ilo, ihi

(global )

It is assumed that the matrix A is already upper quasi-triangular in rows and columns ihi+1:n, and that A(ilo,ilo-1) = 0 (unless ilo = 1). p?laqr1 works primarily with the Hessenberg submatrix in rows and columns ilo to ihi, but applies transformations to all of H if wantt is non-zero.

1 ≤ilo≤ max(1,ihi); ihi≤n.

a

(global ) array of size lld_a * LOC_c(n)

On entry, the upper Hessenberg matrix A.

desca

(global and local ) array of size dlen_.

The array descriptor for the distributed matrix A.

iloz, ihiz

(global )

Specify the rows of the matrix Z to which transformations must be applied if wantz is non-zero.

1 ≤iloz≤ilo; ihi≤ihiz≤n.

z

(global ) array of size lld_z * LOC_c(n).

If wantz is non-zero, on entry z must contain the current matrix Z of transformations accumulated by p?hseqr

If wantz is zero, z is not referenced.

descz

(global and local ) array of size dlen_.

The array descriptor for the distributed matrix Z.

work

(local output) array of size lwork

lwork

(local )

The size of the work array (lwork>=1).

If lwork=-1, then a workspace query is assumed.

iwork

(global and local ) array of size ilwork

This holds the some of the IBLK integer arrays.

ilwork

(local )

The size of the iwork array (ilwork≥ 3 ).

OUTPUT Parameters

a

If wantt is non-zero, the matrix A is upper quasi-triangular in rows and columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in standard form. If wanttequals zero, the contents of a are unspecified on exit.

wr, wi

(global replicated ) array of size n

The real and imaginary parts, respectively, of the computed eigenvalues ilo to ihi are stored in the corresponding elements of wr and wi. If two eigenvalues are computed as a complex conjugate pair, they are stored in consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] > 0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the same order as on the diagonal of the Schur form returned in a. a may be returned with larger diagonal blocks until the next release.

z

On exit z is updated; transformations are applied only to the submatrix Z(iloz:ihiz,ilo:ihi).

If wantzequals zero, z is not referenced.

work[0]

On exit, if info = 0, work[0] returns the optimal lwork.

info

(global )

< 0: parameter number -info incorrect or inconsistent

= 0: successful exit

> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of 30*(ihi-ilo+1) iterations; if info = i, elements i:ihi-1 of wr and wi contain those eigenvalues which have been successfully computed.

Application Notes

This algorithm is very similar to p?ahqr. Unlike p?lahqr, instead of sending one double shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there can be parallelism across several processor row/columns. Another critical difference is that this algorithm aggregrates multiple transforms together in order to apply them in a block fashion.

Current Notes and/or Restrictions:

This code requires the distributed block size to be square and at least six (6); unlike simpler codes like LU, this algorithm is extremely sensitive to block size. Unwise choices of too small a block size can lead to bad performance.
This code requires a and z to be distributed identically and have identical contxts.
This release currently does not have a function for resolving the Schur blocks into regular 2x2 form after this code is completed. Because of this, a significant performance impact is required while the deflation is done by sometimes a single column of processors.
This code does not currently block the initial transforms so that none of the rows or columns for any bulge are completed until all are started. To offset pipeline start-up it is recommended that at least 2*LCM(NPROW,NPCOL) bulges are used (if possible)
The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited only by the incoming work array.
The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the resulting transforms may be nonsimilar. This is also true with the LAPACK function.
For this release, it is assumed rsrc_=csrc_=0
Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the eigenvalues by the column partitioning.
The internals of this function are subject to change.

Parent topic: ScaLAPACK Auxiliary Routines

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Developer Reference for Intel® oneAPI Math Kernel Library for C

p?laqr1

Syntax

Include Files

Description

Input Parameters

OUTPUT Parameters

Application Notes

See Also