Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

p?gbtrf

Computes the LU factorization of a general n-by-n banded distributed matrix.

Syntax

void psgbtrf (MKL_INT *n, MKL_INT *bwl, MKL_INT *bwu, float *a, MKL_INT *ja, MKL_INT *desca, MKL_INT *ipiv, float *af, MKL_INT *laf, float *work, MKL_INT *lwork, MKL_INT *info);
void pdgbtrf (MKL_INT *n, MKL_INT *bwl, MKL_INT *bwu, double *a, MKL_INT *ja, MKL_INT *desca, MKL_INT *ipiv, double *af, MKL_INT *laf, double *work, MKL_INT *lwork, MKL_INT *info);
void pcgbtrf (MKL_INT *n, MKL_INT *bwl, MKL_INT *bwu, MKL_Complex8 *a, MKL_INT *ja, MKL_INT *desca, MKL_INT *ipiv, MKL_Complex8 *af, MKL_INT *laf, MKL_Complex8 *work, MKL_INT *lwork, MKL_INT *info);
void pzgbtrf (MKL_INT *n, MKL_INT *bwl, MKL_INT *bwu, MKL_Complex16 *a, MKL_INT *ja, MKL_INT *desca, MKL_INT *ipiv, MKL_Complex16 *af, MKL_INT *laf, MKL_Complex16 *work, MKL_INT *lwork, MKL_INT *info);
Include Files
  • mkl_scalapack.h
Description
The p?gbtrf function computes the LU factorization of a general n-by-n real/complex banded distributed matrix A(1:n, ja:ja+n-1) using partial pivoting with row interchanges.
The resulting factorization is not the same factorization as returned from the LAPACK function ?gbtrf. Additional permutations are performed on the matrix for the sake of parallelism.
The factorization has the form
A(1:n, ja:ja+n-1) = P*L*U*Q
where P and Q are permutation matrices, and L and U are banded lower and upper triangular matrices, respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P represents reordering of rows for numerical stability using classic partial pivoting.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Input Parameters
n
(global) The number of rows and columns in the distributed submatrix
A
(1:
n
,
ja
:
ja
+
n
-1);
n
0
.
bwl
(global) The number of sub-diagonals within the band of A (0 ≤ bwl ≤ n-1).
bwu
(global) The number of super-diagonals within the band of A (0 ≤ bwu ≤ n-1).
a
(local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1), where lld_a ≥ 2*bwl + 2*bwu + 1.
Contains the local pieces of the n-by-n distributed banded matrix A(1:n, ja:ja+n-1) to be factored.
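The size constraint on a can be checked before allocation. The following is a minimal sketch, not part of the library: the helper names gbtrf_min_lld and gbtrf_local_elems are hypothetical, plain long is used in place of MKL_INT, and the caller is assumed to supply the local column count LOCc(ja+n-1) from its own distribution bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest leading dimension permitted for the
   local array a, taken from the bound lld_a >= 2*bwl + 2*bwu + 1. */
static long gbtrf_min_lld(long bwl, long bwu)
{
    assert(bwl >= 0 && bwu >= 0);
    return 2 * bwl + 2 * bwu + 1;
}

/* Hypothetical helper: number of elements in the local array a,
   lld_a * LOCc(ja+n-1). loc_cols stands in for LOCc(ja+n-1) and is
   assumed to be computed elsewhere by the caller. */
static size_t gbtrf_local_elems(long lld_a, long loc_cols)
{
    assert(lld_a >= 1 && loc_cols >= 0);
    return (size_t)lld_a * (size_t)loc_cols;
}
```

For example, with bwl = 2 and bwu = 3 the minimum lld_a is 11, so a process holding 4 local columns needs at least 44 elements of the matrix element type.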
ja
(global) The index in the global matrix A indicating the start of the matrix to be operated on (which may be either all of A or a submatrix of A).
desca
(global and local) array of size dlen_. The array descriptor for the distributed matrix A.
If dtype_a = 501, then dlen_ ≥ 7;
else if dtype_a = 1, then dlen_ ≥ 9.
laf
(local) The size of the array af.
Must be laf ≥ (nb_a+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu).
If laf is not large enough, an error code is returned and the minimum acceptable size is returned in af[0].
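The lower bound on laf is easy to mistype, so it may help to compute it in one place. The sketch below only evaluates the formula stated above; the name gbtrf_min_laf is hypothetical, and long is used in place of MKL_INT so that the snippet compiles without the MKL headers.

```c
#include <assert.h>

/* Hypothetical helper: minimum size of the auxiliary fill-in array af,
   laf >= (nb_a + bwu)*(bwl + bwu) + 6*(bwl + bwu)*(bwl + 2*bwu).
   nb_a is the block size taken from the descriptor of A. */
static long gbtrf_min_laf(long nb_a, long bwl, long bwu)
{
    assert(nb_a >= 1 && bwl >= 0 && bwu >= 0);
    long band = bwl + bwu;
    return (nb_a + bwu) * band + 6 * band * (bwl + 2 * bwu);
}
```

With nb_a = 8, bwl = 2, and bwu = 3, this gives 11*5 + 6*5*8 = 295 elements.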
work
(local) Same type as a. Workspace array of size lwork.
lwork
(local or global) The size of the work array (lwork ≥ 1). If lwork is too small, an error code is returned and the minimum acceptable size is returned in work[0].
Output Parameters
a
On exit, this array contains details of the factorization. Note that additional permutations are performed on the matrix, so that the factors returned are different from those returned by LAPACK.
ipiv
(local) array.
The size of ipiv must be ≥ nb_a.
Contains pivot indices for local factorizations. Note that you should not alter the contents of this array between factorization and solve.
af
(local)
Array of size laf.
Auxiliary fill-in space. The fill-in space is created in a call to the factorization function p?gbtrf and is stored in af.
Note that if a linear system is to be solved using p?gbtrs after the factorization function, af must not be altered after the factorization.
work[0]
On exit, work[0] contains the minimum value of lwork required.
info
(global)
If info = 0, the execution is successful.
info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.
info > 0:
If info = k ≤ NPROCS, the submatrix stored on processor info and factored locally was singular, and the factorization was not completed.
If info = k > NPROCS, the submatrix stored on processor info - NPROCS representing interactions with other processors was singular, and the factorization was not completed.
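The info encoding above can be unpacked mechanically. The following sketch is illustrative only: the type and function names are hypothetical, long replaces MKL_INT, and it assumes that negative codes with magnitude below 100 denote scalar arguments, which holds here because the function takes 12 arguments.

```c
#include <assert.h>

/* Hypothetical classification of the info value returned by p?gbtrf. */
typedef enum {
    GBTRF_OK,                   /* info = 0 */
    GBTRF_BAD_SCALAR_ARG,       /* info = -i */
    GBTRF_BAD_ARRAY_ENTRY,      /* info = -(i*100 + j) */
    GBTRF_SINGULAR_LOCAL,       /* 0 < info <= NPROCS */
    GBTRF_SINGULAR_INTERACTION  /* info > NPROCS */
} gbtrf_status;

typedef struct {
    gbtrf_status status;
    long arg;    /* i: position of the offending argument, else 0 */
    long entry;  /* j: offending array entry (stored at index j-1), else 0 */
    long proc;   /* processor named by a positive info, else 0 */
} gbtrf_info;

/* Decode info given the number of processes nprocs (NPROCS in the text). */
static gbtrf_info gbtrf_decode_info(long info, long nprocs)
{
    gbtrf_info r = { GBTRF_OK, 0, 0, 0 };
    assert(nprocs >= 1);
    if (info < 0) {
        long code = -info;
        if (code >= 100) {          /* array argument: -(i*100 + j) */
            r.status = GBTRF_BAD_ARRAY_ENTRY;
            r.arg = code / 100;
            r.entry = code % 100;
        } else {                    /* scalar argument: -i */
            r.status = GBTRF_BAD_SCALAR_ARG;
            r.arg = code;
        }
    } else if (info > 0) {
        if (info <= nprocs) {       /* locally factored submatrix */
            r.status = GBTRF_SINGULAR_LOCAL;
            r.proc = info;
        } else {                    /* interaction submatrix */
            r.status = GBTRF_SINGULAR_INTERACTION;
            r.proc = info - nprocs;
        }
    }
    return r;
}
```

For instance, info = -602 decodes to entry 2 (index 1) of the sixth argument, desca, and info = 6 on 4 processes points at the interaction submatrix on processor 2.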