Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

p?heevx

Computes selected eigenvalues and, optionally, eigenvectors of a Hermitian matrix.

Syntax

void pcheevx (char *jobz, char *range, char *uplo, MKL_INT *n,
    MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca,
    float *vl, float *vu, MKL_INT *il, MKL_INT *iu, float *abstol,
    MKL_INT *m, MKL_INT *nz, float *w, float *orfac,
    MKL_Complex8 *z, MKL_INT *iz, MKL_INT *jz, MKL_INT *descz,
    MKL_Complex8 *work, MKL_INT *lwork, float *rwork, MKL_INT *lrwork,
    MKL_INT *iwork, MKL_INT *liwork, MKL_INT *ifail, MKL_INT *iclustr,
    float *gap, MKL_INT *info);

void pzheevx (char *jobz, char *range, char *uplo, MKL_INT *n,
    MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca,
    double *vl, double *vu, MKL_INT *il, MKL_INT *iu, double *abstol,
    MKL_INT *m, MKL_INT *nz, double *w, double *orfac,
    MKL_Complex16 *z, MKL_INT *iz, MKL_INT *jz, MKL_INT *descz,
    MKL_Complex16 *work, MKL_INT *lwork, double *rwork, MKL_INT *lrwork,
    MKL_INT *iwork, MKL_INT *liwork, MKL_INT *ifail, MKL_INT *iclustr,
    double *gap, MKL_INT *info);

Include Files
  • mkl_scalapack.h
Description
The p?heevx function computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for the desired eigenvalues.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX-512.
Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.
jobz
(global) Must be 'N' or 'V'.
Specifies if it is necessary to compute the eigenvectors:
If jobz = 'N', then only eigenvalues are computed.
If jobz = 'V', then eigenvalues and eigenvectors are computed.
range
(global) Must be 'A', 'V', or 'I'.
If range = 'A', all eigenvalues will be found.
If range = 'V', all eigenvalues in the half-open interval [vl, vu] will be found.
If range = 'I', the eigenvalues with indices il through iu will be found.
uplo
(global) Must be 'U' or 'L'.
Specifies whether the upper or lower triangular part of the Hermitian matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.
If uplo = 'L', a stores the lower triangular part of A.
n
(global) The number of rows and columns of the matrix A (n ≥ 0).
a
(local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1). On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the elements of the Hermitian matrix.
ia, ja
(global) The row and column indices in the global matrix A indicating the first row and the first column of the submatrix A, respectively.
desca
(global and local) array of size dlen_. The array descriptor for the distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevx cannot guarantee correct error reporting.
vl, vu
(global)
If range = 'V', the lower and upper bounds of the interval to be searched for eigenvalues; not referenced if range = 'A' or 'I'.
il, iu
(global)
If range = 'I', the indices of the smallest and largest eigenvalues to be returned.
Constraints: il ≥ 1; min(il, n) ≤ iu ≤ n.
Not referenced if range = 'A' or 'V'.
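The index constraints above can be checked before the call. The following is a minimal sketch only; index_range_is_valid is a hypothetical helper, not an MKL function, and long long stands in for MKL_INT.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of MKL): returns true when il and iu
   satisfy the documented constraints for range = 'I':
   il >= 1 and min(il, n) <= iu <= n. */
bool index_range_is_valid(long long n, long long il, long long iu)
{
    long long lower = (il < n) ? il : n;   /* min(il, n) */
    return il >= 1 && lower <= iu && iu <= n;
}
```

For example, with n = 10 the pair il = 1, iu = 10 passes, while il = 3, iu = 2 does not.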
abstol
(global).
The absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a, b] of width less than or equal to abstol + eps*max(|a|,|b|), where eps is the machine precision. If abstol is less than or equal to zero, then eps*norm(T) will be used in its place, where norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
If jobz = 'V', setting abstol to p?lamch(context, 'U') yields the most orthogonal eigenvectors.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold 2*p?lamch('S'), not zero. If this function returns with ((mod(info,2) ≠ 0) or (mod(info/8,2) ≠ 0)), indicating that some eigenvalues or eigenvectors did not converge, try setting abstol to 2*p?lamch('S').
mod(x, y) is the integer remainder of x/y.
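The convergence test on info described above can be written as a small predicate. This is a sketch under assumptions: convergence_failed is a hypothetical name, and treating info ≤ 0 as "no convergence failure" follows the usual LAPACK/ScaLAPACK convention (zero for success, negative for an illegal argument), which this section does not itself state.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of MKL). Returns true when
   mod(info, 2) != 0 or mod(info/8, 2) != 0, the condition under which
   the text suggests retrying with abstol = 2*p?lamch('S'). */
bool convergence_failed(long long info)
{
    if (info <= 0) {
        return false;   /* assumed: 0 = success, negative = argument error */
    }
    return (info % 2 != 0) || ((info / 8) % 2 != 0);
}
```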
orfac
(global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that correspond to eigenvalues which are within tol = orfac*norm(A) of each other are to be reorthogonalized. However, if the workspace is insufficient (see lwork), tol may be decreased until all eigenvectors to be reorthogonalized can be stored in one process. No reorthogonalization will be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is negative.
orfac should be identical on all processes.
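The three cases for orfac (positive, zero, negative) can be summarized in a few lines. The sketch below is illustrative only: reorth_tolerance is a hypothetical helper, norm_a is assumed to be supplied by the caller, and the further reduction of tol when workspace is insufficient is not modeled.

```c
/* Hypothetical helper (not part of MKL). Returns the initial
   reorthogonalization threshold tol = orfac * norm(A), substituting the
   documented default of 1.0e-3 when orfac is negative. A result of 0.0
   corresponds to orfac == 0, i.e. no reorthogonalization. */
double reorth_tolerance(double orfac, double norm_a)
{
    double factor = (orfac < 0.0) ? 1.0e-3 : orfac;
    return factor * norm_a;
}
```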
iz, jz
(global) The row and column indices in the global matrix Z indicating the first row and the first column of the submatrix Z, respectively.
descz
(global and local) array of size dlen_. The array descriptor for the distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].
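The context requirement on the two descriptors can be verified before the call. The sketch below assumes the standard ScaLAPACK dense-matrix descriptor layout with 1-based field positions (dtype_ = 1, ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6, rsrc_ = 7, csrc_ = 8, lld_ = 9, dlen_ = 9); the enum names and the helper descriptors_share_context are hypothetical, and int stands in for MKL_INT.

```c
/* 1-based descriptor field positions, assumed to follow the standard
   ScaLAPACK layout; subtract 1 for C array indexing. */
enum {
    DTYPE_ = 1, CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5,
    NB_ = 6, RSRC_ = 7, CSRC_ = 8, LLD_ = 9, DLEN_ = 9
};

/* Hypothetical helper (not part of MKL): nonzero when both descriptors
   refer to the same BLACS context, as required for desca and descz. */
int descriptors_share_context(const int *desca, const int *descz)
{
    return desca[CTXT_ - 1] == descz[CTXT_ - 1];
}
```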
work
(local).
Array of size lwork.
lwork
(local) The size of the array work.
If only eigenvalues are requested:
lwork ≥ n + max(nb*(np0 + 1), 3)
If eigenvectors are requested:
lwork ≥ n + (np0 + mq0 + nb)*nb
with nq0 = numroc(nn, nb, 0, 0, NPCOL).
lwork ≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig, NPROW*NPCOL)*nn
For optimal performance, greater workspace is needed, that is
lwork ≥ max(lwork, nhetrd_lwork)
where lwork is as defined above, and
nhetrd_lwork = n + 2*(anb+1)*(4*nps+2) + (nps+1)*nps
ictxt = desca[ctxt_ - 1]
anb = pjlaenv(ictxt, 3, 'pchettrd', 'L', 0, 0, 0, 0)
sqnpc = sqrt(dble(NPROW*NPCOL))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates the size required for optimal performance for all work arrays. Each of these values is returned in the first entry of the corresponding work arrays, and no error message is issued by pxerbla.
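The two minimal lwork bounds (eigenvalues only, and eigenvalues with eigenvectors) can be evaluated as in the sketch below. It is an illustration under assumptions, not MKL code: numroc_sketch re-implements the usual ScaLAPACK NUMROC block-cyclic count in plain C, min_lwork and max_int are hypothetical helpers, int stands in for MKL_INT, and nn, np0, and mq0 follow the variable definitions given for lrwork. In practice a workspace query with lwork = -1 is the more reliable way to size the arrays.

```c
int max_int(int a, int b) { return a > b ? a : b; }

/* Plain-C sketch of NUMROC: number of rows or columns of a dimension of
   size n, distributed in blocks of nb over nprocs processes, that are
   owned by process iproc (isrcproc owns the first block). */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist    = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks   = n / nb;                  /* whole blocks in total */
    int count     = (nblocks / nprocs) * nb; /* blocks every process gets */
    int extrablks = nblocks % nprocs;        /* leftover whole blocks */

    if (mydist < extrablks) {
        count += nb;          /* this process gets one extra whole block */
    } else if (mydist == extrablks) {
        count += n % nb;      /* this process gets the trailing partial block */
    }
    return count;
}

/* Hypothetical helper: minimal lwork from the first two bounds above.
   neig is ignored when want_vectors is 0. */
int min_lwork(int n, int nb, int neig, int nprow, int npcol, int want_vectors)
{
    int nn  = max_int(n, max_int(nb, 2));
    int np0 = numroc_sketch(nn, nb, 0, 0, nprow);

    if (!want_vectors) {
        return n + max_int(nb * (np0 + 1), 3);
    }
    int mq0 = numroc_sketch(max_int(neig, max_int(nb, 2)), nb, 0, 0, npcol);
    return n + (np0 + mq0 + nb) * nb;
}
```

For n = 100, nb = 16, neig = 100 on a 2-by-2 process grid, np0 = mq0 = 52, so the bounds evaluate to 948 (eigenvalues only) and 2020 (with eigenvectors).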
rwork
(local)
Workspace array of size lrwork.
lrwork
(local) The size of the array rwork.
See below for definitions of variables used to define lrwork.
If no eigenvectors are requested (jobz = 'N'), then lrwork ≥ 5*nn + 4*n.
If eigenvectors are requested (jobz = 'V'), then the amount of workspace required to guarantee that all eigenvectors are computed is:
lrwork ≥ 4*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig, NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and orfac is too small. To guarantee orthogonality (at the cost of potentially poor performance), add the following value to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2] | w[j] ≤ w[j-1] + orfac*2*norm(A)}.
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y).
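With the variable definitions above, the minimal lrwork for both jobz cases can be evaluated as follows. This is a sketch under assumptions: numroc_sketch and iceil_sketch are plain-C stand-ins for the ScaLAPACK numroc and iceil tools, min_lrwork and max_long are hypothetical helpers, and long stands in for MKL_INT. The extra (clustersize-1)*n needed to guarantee orthogonality is not included.

```c
long max_long(long a, long b) { return a > b ? a : b; }

/* ceiling(x / y) for positive integers, mirroring iceil. */
long iceil_sketch(long x, long y) { return (x + y - 1) / y; }

/* Plain-C sketch of NUMROC (block-cyclic local count). */
long numroc_sketch(long n, long nb, long iproc, long isrcproc, long nprocs)
{
    long mydist    = (nprocs + iproc - isrcproc) % nprocs;
    long nblocks   = n / nb;
    long count     = (nblocks / nprocs) * nb;
    long extrablks = nblocks % nprocs;

    if (mydist < extrablks) {
        count += nb;
    } else if (mydist == extrablks) {
        count += n % nb;
    }
    return count;
}

/* Hypothetical helper: minimal lrwork for jobz = 'N' (want_vectors == 0)
   or jobz = 'V' (want_vectors != 0). */
long min_lrwork(long n, long nb, long neig, long nprow, long npcol,
                int want_vectors)
{
    long nn = max_long(n, max_long(nb, 2));

    if (!want_vectors) {
        return 5 * nn + 4 * n;
    }
    long np0 = numroc_sketch(nn, nb, 0, 0, nprow);
    long mq0 = numroc_sketch(max_long(neig, max_long(nb, 2)), nb, 0, 0, npcol);
    return 4 * n
         + max_long(5 * nn, np0 * mq0 + 2 * nb * nb)
         + iceil_sketch(neig, nprow * npcol) * nn;
}
```

For n = 100, nb = 16, neig = 100 on a 2-by-2 grid, this gives 900 for jobz = 'N' and 6116 for jobz = 'V'.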
When lrwork is too small:
If lwork is too small to guarantee orthogonality, p?heevx attempts to maintain orthogonality in the clusters with the smallest spacing between the eigenvalues. If lwork is too small to compute all the eigenvectors requested, no computation is performed and info = -23 is returned. Note that when range = 'V', p?heevx does not know how many eigenvectors are requested until the eigenvalues are computed. Therefore, when range = 'V' and as long as lwork is large enough to allow p?heevx to compute the eigenvalues, p?heevx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality and performance:
If clustersize ≥ n/sqrt(NPROW*NPCOL), then providing enough space to compute all the eigenvectors orthogonally will cause serious degradation in performance. In the limit (that is, clustersize = n-1) p?stein will perform no better than ?stein on 1 processor.
For clustersize = n/sqrt(NPROW*NPCOL), reorthogonalizing all eigenvectors will increase the total execution time by a factor of 2 or more.
For clustersize > n/sqrt(NPROW*NPCOL), execution time will grow as the square of the cluster size, all other factors remaining equal and assuming enough workspace. Less workspace means less reorthogonalization but faster execution.
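To make the cluster-size trade-off concrete, the sketch below estimates clustersize from a sorted list of eigenvalues using the cluster definition w[j] ≤ w[j-1] + orfac*2*norm(A), computes the extra (clustersize-1)*n of real workspace that guarantees orthogonality, and tests the threshold clustersize ≥ n/sqrt(NPROW*NPCOL). All three functions are hypothetical helpers, not MKL routines; norm_a is assumed to be known to the caller, and the threshold is compared in squared form to avoid a floating-point square root.

```c
/* Size of the largest run of consecutive eigenvalues in which each one is
   within orfac*2*norm_a of its predecessor. w must be sorted ascending. */
int largest_cluster(const double *w, int count, double orfac, double norm_a)
{
    if (count <= 0) {
        return 0;
    }
    double gap_limit = orfac * 2.0 * norm_a;
    int best = 1;
    int run  = 1;
    for (int j = 1; j < count; ++j) {
        run = (w[j] <= w[j - 1] + gap_limit) ? run + 1 : 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}

/* Extra lrwork needed to guarantee orthogonality: (clustersize-1)*n. */
long extra_lrwork_for_orthogonality(long clustersize, long n)
{
    return (clustersize < 1) ? 0 : (clustersize - 1) * n;
}

/* Nonzero when clustersize >= n / sqrt(nprow*npcol), the regime in which
   full reorthogonalization is described as seriously degrading performance. */
int cluster_is_costly(long clustersize, long n, long nprow, long npcol)
{
    return clustersize * clustersize * nprow * npcol >= n * n;
}
```

A caller could use cluster_is_costly to decide whether to supply the extra workspace or to accept reduced orthogonality in exchange for faster execution.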
If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates th