p?pbtrsv
Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a banded matrix computed by p?pbtrf.
Syntax
void pspbtrsv (char *uplo, char *trans, MKL_INT *n, MKL_INT *bw, MKL_INT *nrhs, float *a, MKL_INT *ja, MKL_INT *desca, float *b, MKL_INT *ib, MKL_INT *descb, float *af, MKL_INT *laf, float *work, MKL_INT *lwork, MKL_INT *info);
void pdpbtrsv (char *uplo, char *trans, MKL_INT *n, MKL_INT *bw, MKL_INT *nrhs, double *a, MKL_INT *ja, MKL_INT *desca, double *b, MKL_INT *ib, MKL_INT *descb, double *af, MKL_INT *laf, double *work, MKL_INT *lwork, MKL_INT *info);
void pcpbtrsv (char *uplo, char *trans, MKL_INT *n, MKL_INT *bw, MKL_INT *nrhs, MKL_Complex8 *a, MKL_INT *ja, MKL_INT *desca, MKL_Complex8 *b, MKL_INT *ib, MKL_INT *descb, MKL_Complex8 *af, MKL_INT *laf, MKL_Complex8 *work, MKL_INT *lwork, MKL_INT *info);
void pzpbtrsv (char *uplo, char *trans, MKL_INT *n, MKL_INT *bw, MKL_INT *nrhs, MKL_Complex16 *a, MKL_INT *ja, MKL_INT *desca, MKL_Complex16 *b, MKL_INT *ib, MKL_INT *descb, MKL_Complex16 *af, MKL_INT *laf, MKL_Complex16 *work, MKL_INT *lwork, MKL_INT *info);
Include Files
- mkl_scalapack.h
Description
The p?pbtrsv function solves a banded triangular system of linear equations
    A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)
or
    A(1:n, ja:ja+n-1)^T*X = B(jb:jb+n-1, 1:nrhs) for real flavors,
    A(1:n, ja:ja+n-1)^H*X = B(jb:jb+n-1, 1:nrhs) for complex flavors,
where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Cholesky factorization code p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo. The function p?pbtrf must be called first.
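As a serial illustration of what a banded triangular solve computes, the sketch below performs a forward substitution for L*x = b and a back substitution for L^T*x = b with a lower-triangular band factor. It is not the distributed algorithm that p?pbtrsv uses. The function names band_lower_solve and band_lower_solve_t are hypothetical, and the storage layout is the serial LAPACK-style lower band format, assumed here only for illustration.

```c
#include <assert.h>

/* Hypothetical serial helpers, not part of MKL.
   ab holds a lower-triangular band matrix L with bw subdiagonals:
   L(i,j) is at ab[(i - j) + j*ldab] for j <= i <= min(n-1, j+bw), 0-based. */

/* Forward substitution: x holds b on entry and the solution of L*x = b on exit. */
void band_lower_solve(int n, int bw, const double *ab, int ldab, double *x)
{
    assert(n >= 0 && bw >= 0 && ldab >= bw + 1);
    for (int j = 0; j < n; ++j) {
        int last = (j + bw < n - 1) ? j + bw : n - 1;
        x[j] /= ab[j * ldab];                       /* divide by L(j,j) */
        for (int i = j + 1; i <= last; ++i)
            x[i] -= ab[(i - j) + j * ldab] * x[j];  /* eliminate column j */
    }
}

/* Back substitution: x holds b on entry and the solution of L^T*x = b on exit. */
void band_lower_solve_t(int n, int bw, const double *ab, int ldab, double *x)
{
    assert(n >= 0 && bw >= 0 && ldab >= bw + 1);
    for (int j = n - 1; j >= 0; --j) {
        int last = (j + bw < n - 1) ? j + bw : n - 1;
        for (int i = j + 1; i <= last; ++i)
            x[j] -= ab[(i - j) + j * ldab] * x[i];  /* row j of L^T is column j of L */
        x[j] /= ab[j * ldab];
    }
}
```

In this sketch the two routines play the roles of trans = 'N' and trans = 'T' for a factor stored with uplo = 'L'; an upper-triangular factor would swap the traversal directions.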
Input Parameters
- uplo
- (global) Must be 'U' or 'L'. If uplo = 'U', the upper triangle of A(1:n, ja:ja+n-1) is stored; if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.
- trans
- (global) Must be 'N' or 'T' or 'C'. If trans = 'N', solve with A(1:n, ja:ja+n-1). If trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)^T. If trans = 'C' for complex flavors, solve with the conjugate transpose A(1:n, ja:ja+n-1)^H.
- n
- (global) The number of rows and columns to be operated on, that is, the order of the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.
- bw
- (global) The number of subdiagonals in L or U; 0 ≤ bw ≤ n-1.
- nrhs
- (global) The number of right-hand sides, that is, the number of columns of the distributed submatrix B(jb:jb+n-1, 1:nrhs). nrhs ≥ 0.
- a
- (local) Pointer into the local memory to an array with leading dimension lld_a ≥ (bw+1), stored in desca. On entry, this array contains the local pieces of the n-by-n symmetric banded distributed Cholesky factor L or L^T of A(1:n, ja:ja+n-1). This local portion is stored in the packed banded format used in LAPACK. See the Application Notes below and the ScaLAPACK manual for more detail on the format of distributed matrices.
- ja
- (global) The index in the global matrix A that points to the start of the matrix to be operated on (which may be either all of A or a submatrix of A).
- desca
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix A. If 1D type (dtype_a = 501), then dlen ≥ 7; if 2D type (dtype_a = 1), then dlen ≥ 9. Contains information on the mapping of A to memory. See the ScaLAPACK manual for a full description and options.
- b
- (local) Pointer into the local memory to an array of local leading dimension lld_b ≥ nb. On entry, this array contains the local pieces of the right-hand sides B(jb:jb+n-1, 1:nrhs).
- ib
- (global) The row index in the global matrix B that points to the first row of the matrix to be operated on (which may be either all of B or a submatrix of B).
- descb
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix B. If 1D type (dtype_b = 502), then dlen ≥ 7; if 2D type (dtype_b = 1), then dlen ≥ 9. Contains information on the mapping of B to memory. See the ScaLAPACK manual for a full description and options.
- laf
- (local) The size of the user-input auxiliary fill-in space af. Must be laf ≥ (nb+2*bw)*bw. If laf is not large enough, an error code is returned and the minimum acceptable size is returned in af[0].
- work
- (local) The array work is a temporary workspace array of size lwork. This space may be overwritten in between function calls.
- lwork
- (local or global) The size of the user-input workspace work; must be at least lwork ≥ bw*nrhs. If lwork is too small, the minimal acceptable size is returned in work[0] and an error code is returned.
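The two size bounds quoted above can be computed before allocating af and work. The following is a minimal sketch; min_laf and min_lwork are hypothetical helper names, not MKL functions, and nb is assumed here to be the blocking factor recorded in the array descriptor.

```c
#include <assert.h>

/* Hypothetical helpers, not part of MKL. They only evaluate the lower
   bounds stated in the laf and lwork parameter descriptions. */

/* Minimum laf: (nb + 2*bw) * bw. nb is assumed to be the descriptor block size. */
long min_laf(long nb, long bw)
{
    assert(nb >= 0 && bw >= 0);
    return (nb + 2 * bw) * bw;
}

/* Minimum lwork: bw * nrhs. */
long min_lwork(long bw, long nrhs)
{
    assert(bw >= 0 && nrhs >= 0);
    return bw * nrhs;
}
```

For example, with nb = 100, bw = 2, and nrhs = 3, these bounds evaluate to laf ≥ 208 and lwork ≥ 6.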
Output Parameters
- af
- (local)
- b
- On exit, this array contains the local piece of the solutions distributed matrix X.
- work[0]
- On exit, work[0] contains the minimum value of lwork.
- info
- (local) = 0: successful exit. < 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*100 + j); if the i-th argument is a scalar and had an illegal value, then info = -i.
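The info encoding described above can be unpacked with integer division. The sketch below is illustrative only: decode_info and info_location are hypothetical names, and the code assumes that scalar codes stay below 100, which holds for this 16-argument interface.

```c
#include <assert.h>

/* Hypothetical decoder, not part of MKL. */
typedef struct {
    int arg;    /* 1-based position i of the offending argument, 0 if none */
    int entry;  /* 1-based entry j for array arguments, 0 for scalars */
} info_location;

info_location decode_info(long info)
{
    info_location loc = {0, 0};
    assert(info <= 0);            /* only the codes listed above are handled */
    if (info == 0)
        return loc;               /* successful exit */
    long code = -info;
    if (code >= 100) {            /* array argument: info = -(i*100 + j) */
        loc.arg = (int)(code / 100);
        loc.entry = (int)(code % 100);
    } else {                      /* scalar argument: info = -i */
        loc.arg = (int)code;
    }
    return loc;
}
```

With this decoding, info = -803 points at the eighth argument (desca) and its third entry, which is desca[2] in C indexing, while info = -5 points at the scalar fifth argument (nrhs).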
Application Notes
If the factorization function and the solve function are to be called separately to solve various sets of right-hand sides using the same coefficient matrix, the auxiliary space af must not be altered between calls to the factorization function and the solve function.
The best algorithm for solving banded and tridiagonal linear systems depends on a variety of parameters, especially the bandwidth. Currently, only algorithms designed for the case N/P >> bw are implemented. These algorithms go by many names, including Divide and Conquer, Partitioning, domain decomposition-type, etc.
Algorithm description: Divide and Conquer.
The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with the number of equations. In this situation, it is best to distribute the input matrix A one-dimensionally, with columns atomic and rows divided amongst the processes. The basic algorithm divides the banded matrix up into P pieces with one stored on each processor, and then proceeds in 2 phases for the factorization or 3 for the solution of a linear system.
- Local Phase: The individual pieces are factored independently and in parallel. These factors are applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliary space af. Mathematically, this is equivalent to reordering the matrix A as P*A*P^T and then factoring the principal leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The factors of these submatrices overwrite the corresponding parts of A in memory.
- Reduced System Phase: A small (bw*(P-1)) system is formed representing interaction of the larger blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the structure of the factored matrix, are performed.
- Back Substitution Phase: For a linear system, a local backsubstitution is performed on each processor in parallel.
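The sizes involved in the partitioning can be estimated with simple arithmetic. The sketch below assumes an even split of the n columns over P processes (one piece per process) and takes the reduced system order bw*(P-1) from the Reduced System Phase description. The names dc_partition, dc_partition_summary, and is_narrow_band are hypothetical, and the threshold passed to is_narrow_band is the caller's choice, since the notes only state N/P >> bw without a number.

```c
#include <assert.h>

/* Hypothetical helpers, not part of MKL. */
typedef struct {
    long cols_per_proc;  /* ceil(n / p): columns in each piece, assuming an even split */
    long reduced_order;  /* bw * (p - 1): order of the reduced system */
} dc_partition;

dc_partition dc_partition_summary(long n, long p, long bw)
{
    dc_partition s;
    assert(n >= 0 && p >= 1 && bw >= 0);
    s.cols_per_proc = (n + p - 1) / p;
    s.reduced_order = bw * (p - 1);
    return s;
}

/* Returns 1 when n/p is at least `factor` times bw. The factor is an
   assumption supplied by the caller, not a documented threshold. */
int is_narrow_band(long n, long p, long bw, long factor)
{
    assert(n >= 0 && p >= 1 && bw >= 0 && factor >= 1);
    return n / p >= factor * bw;
}
```

For n = 1000, P = 4, and bw = 3, each piece holds 250 columns and the reduced system has order 9, so the local work dominates, which is the regime the Divide and Conquer algorithm targets.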