- (global) Must be 'U' or 'L'. If uplo = 'U', the upper triangle of A(1:n, ja:ja+n-1) is stored; if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.
- (global) Must be 'N' or 'T' or 'C'. If trans = 'N', solve with A(1:n, ja:ja+n-1); if trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)^T; if trans = 'C' for complex flavors, solve with the conjugate transpose A(1:n, ja:ja+n-1)^H.
- (global) The number of rows and columns to be operated on, that is, the order of the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.
- (global) The number of subdiagonals in 'L' or 'U', 0 ≤ bw ≤ n-1.
- (global) The number of right hand sides; the number of columns of the distributed submatrix B(jb:jb+n-1, 1:nrhs). nrhs ≥ 0.
- (local) Pointer into the local memory to an array with leading (first) dimension lld_a ≥ (bw+1), stored in desca. On entry, this array contains the local pieces of the n-by-n symmetric banded distributed Cholesky factor L or L^T*A(1:n, ja:ja+n-1). This local portion is stored in the packed banded format used in LAPACK. See the Application Notes below and the ScaLAPACK manual for more detail on the format of distributed matrices.
- (global) The index in the global matrix A that points to the start of the matrix to be operated on (which may be either all of A or a submatrix of A).
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix A. If 1D type (dtype_a = 501), then dlen ≥ 7; if 2D type (dtype_a = 1), then dlen ≥ 9. Contains information on mapping of A to memory. (See ScaLAPACK manual for full description and options.)
- (local) Pointer into the local memory to an array of local lead size lld_b ≥ nb. On entry, this array contains the local pieces of the right hand sides B(jb:jb+n-1, 1:nrhs).
- (global) The row index in the global matrix B that points to the first row of the matrix to be operated on (which may be either all of B or a submatrix of B).
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix B. If 1D type (dtype_b = 502), then dlen ≥ 7; if 2D type (dtype_b = 1), then dlen ≥ 9. Contains information on mapping of B to memory. (See ScaLAPACK manual for full description and options.)
- (local) The size of the user-input auxiliary fill-in space af. Must be laf ≥ (nb+2*bw)*bw. If laf is not large enough, an error code will be returned and the minimum acceptable size will be returned in af.
- (local) The array work is a temporary workspace array of size lwork. This space may be overwritten in between function calls.
- (local or global) The size of the user-input workspace work; must be at least lwork ≥ bw*nrhs. If lwork is too small, the minimal acceptable size will be returned in work and an error code is returned.
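
The size bounds stated above for laf and lwork, together with the range constraints on n, bw, and nrhs, can be checked before the call. The following is a minimal sketch in Python, not part of the library: the helper names `min_laf`, `min_lwork`, and `check_sizes` are hypothetical, and `nb` is taken exactly as it appears in the bound above (this section does not define it further).

```python
def min_laf(nb, bw):
    """Smallest laf allowed by the documented bound laf >= (nb + 2*bw) * bw."""
    return (nb + 2 * bw) * bw


def min_lwork(bw, nrhs):
    """Smallest lwork allowed by the documented bound lwork >= bw * nrhs."""
    return bw * nrhs


def check_sizes(n, bw, nrhs, nb, laf, lwork):
    """Return a list of human-readable problems; an empty list means all bounds hold.

    Hypothetical helper: it only mirrors the inequalities listed in the
    parameter descriptions and does not call any library routine.
    """
    problems = []
    if n < 0:
        problems.append("n must satisfy n >= 0")
    if nrhs < 0:
        problems.append("nrhs must satisfy nrhs >= 0")
    if not (0 <= bw <= n - 1):
        problems.append("bw must satisfy 0 <= bw <= n-1")
    if laf < min_laf(nb, bw):
        problems.append("laf too small, need at least %d" % min_laf(nb, bw))
    if lwork < min_lwork(bw, nrhs):
        problems.append("lwork too small, need at least %d" % min_lwork(bw, nrhs))
    return problems
```

Calling `check_sizes` with the values intended for the routine gives an early, local diagnosis; the routine itself remains the authority on the minimum sizes it reports.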
- On exit, this array contains the local piece of the solutions distributed matrixX.
- On exit, work contains the minimum value of lwork.
- (local) = 0: successful exit. < 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*100 + j); if the i-th argument is a scalar and had an illegal value, then info = -i.
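
The info encoding described above can be unpacked mechanically. The sketch below is a hypothetical helper (`decode_info` is not a library function) and assumes that argument positions and entry numbers are both below 100, so that a magnitude of 100 or more always denotes the array form -(i*100 + j). Positive values are not described in this section and are reported as such.

```python
def decode_info(info):
    """Decode an info value into (kind, i, j).

    kind is one of:
      "ok"          info == 0, successful exit
      "scalar"      info == -i, the i-th argument (a scalar) was illegal
      "array"       info == -(i*100 + j), entry j of the i-th argument was illegal
      "unspecified" info > 0, not covered by this section
    Assumption: i < 100 and 1 <= j < 100, so the two negative forms do not collide.
    """
    if info == 0:
        return ("ok", None, None)
    if info > 0:
        return ("unspecified", None, None)
    magnitude = -info
    if magnitude >= 100:
        i, j = divmod(magnitude, 100)
        return ("array", i, j)
    return ("scalar", magnitude, None)
```

For example, an info value of -3 is reported as an illegal scalar in argument position 3, while -1004 is reported as entry 4 of the array in argument position 10.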
- Local Phase: The individual pieces are factored independently and in parallel. These factors are applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliary space af. Mathematically, this is equivalent to reordering the matrix A as P A P^T and then factoring the principal leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The factors of these submatrices overwrite the corresponding parts of A in memory.
- Reduced System Phase: A small (bw*(P-1)) system is formed representing interaction of the larger blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the structure of the factored matrix, is performed.
- Back Substitution Phase: For a linear system, a local back substitution is performed on each processor in parallel.
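
To make the packed banded storage and the front solve / backsolve pair concrete, here is a serial, single-process sketch in pure Python. It is not the parallel algorithm described above and does not model the distribution, the fill-in space af, or the reduced system. It assumes the LAPACK lower-band layout, in which entry A(i,j) with j ≤ i ≤ j+bw is stored at row i-j of column j (0-based), giving bw+1 stored rows. All function names are hypothetical.

```python
import math


def pack_lower_band(A, bw):
    """Pack the lower triangle of a symmetric banded matrix into (bw+1) x n storage.

    ab[i - j][j] holds A[i][j] for j <= i <= min(n-1, j+bw) (0-based indices).
    """
    n = len(A)
    ab = [[0.0] * n for _ in range(bw + 1)]
    for j in range(n):
        for i in range(j, min(n, j + bw + 1)):
            ab[i - j][j] = float(A[i][j])
    return ab


def band_cholesky_lower(ab):
    """Serial band Cholesky A = L * L^T; L is returned in the same band layout."""
    bw, n = len(ab) - 1, len(ab[0])
    L = [row[:] for row in ab]
    for j in range(n):
        s = L[0][j]
        for k in range(max(0, j - bw), j):
            s -= L[j - k][k] ** 2
        L[0][j] = math.sqrt(s)
        for i in range(j + 1, min(n, j + bw + 1)):
            s = L[i - j][j]
            for k in range(max(0, i - bw), j):
                s -= L[i - k][k] * L[j - k][k]
            L[i - j][j] = s / L[0][j]
    return L


def front_solve(L, b):
    """Solve L * y = b (the no-transpose case for a lower factor)."""
    bw, n = len(L) - 1, len(L[0])
    y = [float(v) for v in b]
    for i in range(n):
        for k in range(max(0, i - bw), i):
            y[i] -= L[i - k][k] * y[k]
        y[i] /= L[0][i]
    return y


def back_solve(L, y):
    """Solve L^T * x = y (the transpose case for a lower factor)."""
    bw, n = len(L) - 1, len(L[0])
    x = [float(v) for v in y]
    for i in range(n - 1, -1, -1):
        for k in range(i + 1, min(n, i + bw + 1)):
            x[i] -= L[k - i][i] * x[k]
        x[i] /= L[0][i]
    return x


# Small tridiagonal example (bw = 1): factor once, then front solve and backsolve.
A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]]
L = band_cholesky_lower(pack_lower_band(A, bw=1))
x = back_solve(L, front_solve(L, [6, 12, 14]))
```

Each of `front_solve` and `back_solve` is a single triangular solve with a band factor, which corresponds to choosing the no-transpose or transpose operation; chaining the two solves the full symmetric system.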