Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

p?gesvd

Computes the singular value decomposition of a general matrix, optionally computing the left and/or right singular vectors.

Syntax

void psgesvd (char *jobu, char *jobvt, MKL_INT *m, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, float *s, float *u, MKL_INT *iu, MKL_INT *ju, MKL_INT *descu, float *vt, MKL_INT *ivt, MKL_INT *jvt, MKL_INT *descvt, float *work, MKL_INT *lwork, float *rwork, MKL_INT *info);

void pdgesvd (char *jobu, char *jobvt, MKL_INT *m, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, double *s, double *u, MKL_INT *iu, MKL_INT *ju, MKL_INT *descu, double *vt, MKL_INT *ivt, MKL_INT *jvt, MKL_INT *descvt, double *work, MKL_INT *lwork, double *rwork, MKL_INT *info);

void pcgesvd (char *jobu, char *jobvt, MKL_INT *m, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, float *s, MKL_Complex8 *u, MKL_INT *iu, MKL_INT *ju, MKL_INT *descu, MKL_Complex8 *vt, MKL_INT *ivt, MKL_INT *jvt, MKL_INT *descvt, MKL_Complex8 *work, MKL_INT *lwork, float *rwork, MKL_INT *info);

void pzgesvd (char *jobu, char *jobvt, MKL_INT *m, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, double *s, MKL_Complex16 *u, MKL_INT *iu, MKL_INT *ju, MKL_INT *descu, MKL_Complex16 *vt, MKL_INT *ivt, MKL_INT *jvt, MKL_INT *descvt, MKL_Complex16 *work, MKL_INT *lwork, double *rwork, MKL_INT *info);

Include Files
  • mkl_scalapack.h
Description
The p?gesvd function computes the singular value decomposition (SVD) of an m-by-n matrix A, optionally computing the left and/or right singular vectors. The SVD is written
A = U*Σ*V^T,
where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σ are the singular values of A, and the columns of U and V are the corresponding left and right singular vectors, respectively. The singular values are returned in array s in decreasing order, and only the first min(m, n) columns of U and rows of vt = V^T are computed.
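As a small illustration of the factorization A = U*Σ*V^T described above, the following sketch checks a hand-computed SVD of a 2-by-2 matrix. It does not call p?gesvd; the helper names matmul2 and svd_residual_2x2 are hypothetical and exist only for this example, and matrices are stored row-major for readability.

```c
#include <assert.h>

/* Multiply two 2x2 row-major matrices: c = a * b. */
static void matmul2(const double a[4], const double b[4], double c[4]) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            c[2 * i + j] = a[2 * i] * b[j] + a[2 * i + 1] * b[2 + j];
}

/* Hypothetical helper: largest absolute entry of U*diag(s)*VT - A. */
double svd_residual_2x2(const double a[4], const double u[4],
                        const double s[2], const double vt[4]) {
    assert(a && u && s && vt);
    double sigma[4] = { s[0], 0.0, 0.0, s[1] };  /* Sigma = diag(s) */
    double us[4], usvt[4], err = 0.0;
    matmul2(u, sigma, us);     /* U * Sigma        */
    matmul2(us, vt, usvt);     /* (U * Sigma) * VT */
    for (int k = 0; k < 4; ++k) {
        double d = usvt[k] - a[k];
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

For A = [0 2; 3 0], one valid decomposition is U = [0 1; 1 0], s = (3, 2), and V^T equal to the identity; the residual for these inputs is zero, and the singular values appear in decreasing order as the description states.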
The distributed submatrix sub(A) must satisfy certain alignment properties. These expressions must be true:
  • mb_a = nb_a = nb
  • iroffa = icoffa
where:
  • iroffa = mod(ia-1, nb)
  • icoffa = mod(ja-1, nb)
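The alignment conditions above can be checked before the call. The sketch below is one way to do that; gesvd_alignment_ok is a hypothetical helper, not an MKL function, and it uses plain int instead of MKL_INT so that it compiles without the MKL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when sub(A) meets the alignment
 * conditions mb_a = nb_a = nb and iroffa = icoffa.
 * ia and ja are 1-based global indices, as in the p?gesvd argument list. */
bool gesvd_alignment_ok(int ia, int ja, int mb_a, int nb_a) {
    assert(ia >= 1 && ja >= 1 && mb_a >= 1 && nb_a >= 1);
    if (mb_a != nb_a)
        return false;                 /* row and column block sizes differ */
    int nb = nb_a;
    int iroffa = (ia - 1) % nb;       /* row offset inside the first block */
    int icoffa = (ja - 1) % nb;       /* column offset inside the first block */
    return iroffa == icoffa;
}
```

With 64-by-64 blocks, for example, a submatrix starting at (ia, ja) = (5, 69) passes the check because both offsets equal 4, while one starting at (2, 1) does not.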
Input Parameters
mp = number of local rows in A and U
nq = number of local columns in A and VT
size = min(m, n)
sizeq = number of local columns in U
sizep = number of local rows in VT
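The local sizes defined above follow from the block-cyclic distribution. The sketch below re-implements the counting logic of the ScaLAPACK tool function numroc in plain C (under the hypothetical name numroc_sketch) and uses it to derive mp, nq, sizeq, and sizep. It assumes that the first block lives on process row and column 0, that ia = ja = 1, and that U and VT use the same block sizes and process grid as A; the struct and function names are illustrative only.

```c
#include <assert.h>

/* Number of rows or columns of a block-cyclically distributed dimension
 * owned by process iproc (same counting logic as ScaLAPACK's numroc). */
int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(n >= 0 && nb >= 1 && nprocs >= 1);
    assert(iproc >= 0 && iproc < nprocs);
    assert(isrcproc >= 0 && isrcproc < nprocs);
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                              /* full blocks overall  */
    int count = (nblocks / nprocs) * nb;               /* evenly shared blocks */
    int extrablks = nblocks % nprocs;                  /* leftover full blocks */
    if (mydist < extrablks)
        count += nb;                                   /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;                               /* the partial block    */
    return count;
}

typedef struct { int mp, nq, size, sizeq, sizep; } gesvd_local_sizes;

/* Hypothetical helper: local sizes on process (myrow, mycol),
 * assuming source process coordinates (0, 0). */
gesvd_local_sizes gesvd_sizes(int m, int n, int mb, int nb,
                              int myrow, int mycol, int nprow, int npcol) {
    gesvd_local_sizes r;
    r.size  = m < n ? m : n;
    r.mp    = numroc_sketch(m, mb, myrow, 0, nprow);       /* rows of A, U  */
    r.nq    = numroc_sketch(n, nb, mycol, 0, npcol);       /* cols of A, VT */
    r.sizeq = numroc_sketch(r.size, nb, mycol, 0, npcol);  /* cols of U     */
    r.sizep = numroc_sketch(r.size, mb, myrow, 0, nprow);  /* rows of VT    */
    return r;
}
```

For a 10-by-7 matrix with 2-by-2 blocks on a 2-by-2 grid, process (0,0) holds mp = 6 rows and nq = 4 columns, while process (1,1) holds 4 rows and 3 columns; the local row counts sum to m and the local column counts sum to n.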
jobu
(global) Specifies options for computing all or part of the matrix U.
If jobu = 'V', the first size columns of U (the left singular vectors) are returned in the array u;
If jobu = 'N', no columns of U (no left singular vectors) are computed.
jobvt
(global) Specifies options for computing all or part of the matrix V^T.
If jobvt = 'V', the first size rows of V^T (the right singular vectors) are returned in the array vt;
If jobvt = 'N', no rows of V^T (no right singular vectors) are computed.
m
(global) The number of rows of the matrix A (m ≥ 0).
n
(global) The number of columns in A (n ≥ 0).
a
(local). Block cyclic array, global size (m, n), local size (mp, nq).
ia, ja
(global) The row and column indices in the global matrix A indicating the first row and the first column of the submatrix A, respectively.
desca
(global and local) array of size dlen_. The array descriptor for the distributed matrix A.
iu, ju
(global) The row and column indices in the global matrix U indicating the first row and the first column of the submatrix U, respectively.
descu
(global and local) array of size dlen_. The array descriptor for the distributed matrix U.
ivt, jvt
(global) The row and column indices in the global matrix VT indicating the first row and the first column of the submatrix VT, respectively.
descvt
(global and local) array of size dlen_. The array descriptor for the distributed matrix VT.
work
(local). Workspace array of size lwork.
lwork
(local) The size of the array work;
lwork > 2 + 6*sizeb + max(watobd, wbdtosvd),
where sizeb = max(m, n), and watobd and wbdtosvd refer, respectively, to the workspace required to bidiagonalize the matrix A and to go from the bidiagonal matrix to the singular value decomposition U S VT.
For watobd, the following holds:
watobd = max(max(wp?lange, wp?gebrd), max(wp?lared2d, wp?lared1d)),
where wp?lange, wp?lared1d, wp?lared2d, and wp?gebrd are the workspaces required respectively for the subprograms p?lange, p?lared1d, p?lared2d, and p?gebrd. Using the standard notation
mp = numroc(m, mb, MYROW, desca[ctxt_ - 1], NPROW),
nq = numroc(n, nb, MYCOL, desca[lld_ - 1], NPCOL),
the workspaces required for the above subprograms are
wp?lange = mp,
wp?lared1d = nq0,
wp?lared2d = mp0,
wp?gebrd = nb*(mp + nq + 1) + nq,
where nq0 and mp0 refer, respectively, to the values obtained at MYCOL = 0 and MYROW = 0. In general, the upper limit for the workspace is given by the workspace required on processor (0,0):
watobd ≤ nb*(mp0 + nq0 + 1) + nq0.
In case of a homogeneous process grid this upper limit can be used as an estimate of the minimum workspace for every processor.
For wbdtosvd, the following holds:
wbdtosvd = size*(wantu*nru + wantvt*ncvt) + max(w?bdsqr, max(wantu*wp?ormbrqln, wantvt*wp?ormbrprt)),
where wantu (wantvt) = 1 if left (right) singular vectors are wanted, and wantu (wantvt) = 0 otherwise. w?bdsqr, wp?ormbrqln, and wp?ormbrprt refer respectively to the workspace required for the subprograms ?bdsqr, p?ormbr(qln), and p?ormbr(prt), where qln and prt are the values of the arguments vect, side, and trans in the call to p?ormbr. nru is equal to the local number of rows of the matrix U when distributed across a 1-dimensional "column" of processes. Analogously, ncvt is equal to the local number of columns of the matrix VT when distributed across a 1-dimensional "row" of processes. Calling the LAPACK procedure ?bdsqr requires
w?bdsqr = max(1, 2*size + (2*size - 4)*max(wantu, wantvt))
on every processor. Finally,
wp?ormbrqln = max((nb*(nb-1))/2, (sizeq + mp)*nb) + nb*nb,
wp?ormbrprt = max((mb*(mb-1))/2, (sizep + nq)*mb) + mb*mb.
If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates the minimum size for the work array. The required workspace is returned as the first element of work and no error message is issued by pxerbla.
rwork
Workspace array of size 1 + 4*sizeb. Not used for psgesvd and pdgesvd.
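The lwork and rwork formulas above can be evaluated directly when an estimate is needed before any call is made. The following is a minimal sketch of that arithmetic, not a replacement for the workspace query with lwork = -1, which remains the reliable way to obtain the size. All function and struct names are hypothetical, long is used in place of MKL_INT, and nru and ncvt must be supplied by the caller.

```c
#include <assert.h>

static long max_l(long a, long b) { return a > b ? a : b; }

/* Upper bound on watobd taken on processor (0,0):
 * nb*(mp0 + nq0 + 1) + nq0. */
long watobd_upper(long nb, long mp0, long nq0) {
    return nb * (mp0 + nq0 + 1) + nq0;
}

/* w?bdsqr = max(1, 2*size + (2*size - 4)*max(wantu, wantvt)). */
long wbdsqr(long size, int wantu, int wantvt) {
    long want = (wantu || wantvt) ? 1 : 0;
    return max_l(1, 2 * size + (2 * size - 4) * want);
}

/* Shared shape of wp?ormbrqln and wp?ormbrprt:
 * max((b*(b-1))/2, (loc1 + loc2)*b) + b*b. */
long wormbr(long b, long loc1, long loc2) {
    return max_l((b * (b - 1)) / 2, (loc1 + loc2) * b) + b * b;
}

typedef struct {
    long m, n;          /* global matrix dimensions                      */
    long mb, nb;        /* row and column block sizes                    */
    long mp, nq;        /* local rows and columns of A                   */
    long sizeq, sizep;  /* local columns of U, local rows of VT          */
    long nru, ncvt;     /* local rows of U / columns of VT in 1-D layout */
    int wantu, wantvt;  /* 1 if jobu / jobvt is 'V', otherwise 0         */
} gesvd_ws_in;

/* wbdtosvd as given in the lwork description. */
long wbdtosvd(const gesvd_ws_in *p) {
    assert(p != 0);
    long size = p->m < p->n ? p->m : p->n;
    long wu = p->wantu ? 1 : 0, wv = p->wantvt ? 1 : 0;
    long qln = wormbr(p->nb, p->sizeq, p->mp);
    long prt = wormbr(p->mb, p->sizep, p->nq);
    return size * (wu * p->nru + wv * p->ncvt)
         + max_l(wbdsqr(size, wu, wv), max_l(wu * qln, wv * prt));
}

/* Right-hand side of lwork > 2 + 6*sizeb + max(watobd, wbdtosvd);
 * the caller must pick lwork strictly greater than this value. */
long lwork_rhs(long m, long n, long watobd, long wbdtosvd_val) {
    return 2 + 6 * max_l(m, n) + max_l(watobd, wbdtosvd_val);
}

/* rwork size: 1 + 4*sizeb, with sizeb = max(m, n). */
long rwork_size(long m, long n) {
    return 1 + 4 * max_l(m, n);
}
```

As an illustration with made-up local sizes (m = 10, n = 7, mb = nb = 2, mp = 6, nq = 4, sizeq = sizep = 4, nru = 3, ncvt = 2, both vector sets wanted), the watobd bound is 26, wbdtosvd is 59, and lwork must exceed 121.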
Output Parameters
a
On exit, the contents of a are destroyed.
s
(global). Array of size size. Contains the singular values of A sorted so that s(i) ≥ s(i+1).
u
(local). Local size mp*sizeq, global size m*size.
If jobu = 'V', u contains the first min(m, n) columns of U.
If jobu = 'N' or 'O', u is not referenced.
vt
(local). Local size (sizep, nq), global size (size, n).
If jobvt = 'V', vt contains the first size rows of V^T; if jobvt = 'N', vt is not referenced.
work
On exit, if info = 0, then work[0] returns the required minimal size of lwork.
rwork
On exit, if info = 0, then rwork