Goal

Perform LU factorization of a general block tridiagonal matrix.

Solution

Intel MKL LAPACK provides a wide range of subroutines for LU factorization of general matrices, including dense matrices, band matrices, and tridiagonal matrices. This recipe extends the range of functionality to general block tridiagonal matrices subject to condition all the blocks are square and have the same order.

To perform LU factorization of a block tridiagonal matrix with square blocks of size NB by NB:

1. Sequentially apply partial LU factorization to rectangular blocks of size M by N formed by the first two block rows and first three block columns of the matrix (where M = 2NB, N = 3NB, and K = NB), and moving down along the diagonal until the last but one block row is processed.

Partial LU factorization: for LU factorization of a general block tridiagonal matrix it is useful to have separate functionality for partial LU factorization of a rectangular M-by-N matrix. The partial LU factorization algorithm with parameter K, where K min(M, N), consists of

1. Perform LU factorization of the M-by-K submatrix.

2. Solve the system with triangular coefficient matrix.

3. Update the lower right (M - K)-by-(N - K) block.

The resulting matrix is A = P(L U + A1) where L is a lower trapezoidal M-by-K matrix, U is an upper trapezoidal matrix, P is permutation (pivoting) matrix, and A1 is a matrix with nonzero elements only in the intersection of the last M - K rows and N - K columns.

2. Apply general LU factorization to the last (2NB) by (2NB) block.

Source code: see the BlockTDS_GE/source/dgeblttrf.f file in the samples archive available at http://software.intel.com/en-us/mkl_cookbook_samples.

Performing partial LU factorization

SUBROUTINE PTLDGETRF(M, N, K, A, LDA, IPIV, INFO)
…
CALL DGETRF( M, K, A, LDA, IPIV, INFO )
…
DO I=1,K
IF(IPIV(I).NE.I)THEN
CALL DSWAP(N-K, A(I,K+1), LDA, A(IPIV(I), K+1), LDA)
END IF
END DO
CALL DTRSM('L','L','N','U',K,N-K,1D0, A, LDA, A(1,K+1), LDA)
CALL DGEMM('N', 'N', M-K, N-K, K, -1D0, A(K+1,1), LDA,
&            A(1,K+1), LDA, 1D0, A(K+1,K+1), LDA)
…

Factoring a block tridiagonal matrix

…
DO K=1,N-2
C Form a 2*NB by 3*NB submatrix A with block structure
C     (D_K   C_K 0    )
C     (B_K D_K+1 C_K+1)
…
C Partial factorization of the submatrix
CALL PTLDGETRF(2*NB, 3*NB, NB, A, 2*NB, IPIV(1,K), INFO)
C Factorization results to be copied back to arrays storing blocks of the tridiagonal matrix
…
END DO
C Out of loop factorization of the last 2*NB by 2*NB submatrix
CALL DGETRF(2*NB, 2*NB, A, 2*NB, IPIV(1,N-1), INFO)
C Copy the last result back to arrays storing blocks of the tridiagonal matrix
…

Routines Used

Routine

Description

LU factorization of the M-by-K submatrix

DGETRF

Compute the LU factorization of a general m-by-n matrix

Permute rows of the matrix

DSWAP

Swap two vectors

Solving a system with triangular coefficient matrix

DTRSM

Solve a triangular matrix equation

Update lower-right (M - K)-by-(N - K) block

DGEMM

Compute a matrix-matrix product with general matrices.

Discussion

For partial LU factorization, let A be a rectangular m-by-n matrix: Note

For ease of reading, lower-case indexes such as m, n, k, and nb are used in this discussion. These correspond to the upper-case indexes used in the Fortran solution and code samples.

The matrix can be decomposed using LU factorization of the m-by-k submatrix, where 0 < k n. For this application, k < min(m, n), because ?getrf can be used directly to factor the matrix if m k n or n = k m.

A can be represented as a block matrix: where A 11 is a k-by-k submatrix, A 12 is a k-by-(n - k) submatrix, A 21 is an (m - k)-by-k submatrix, and A 22 is an (m - k)-by-(n - k) submatrix.

The m-by-k panel A 1 can be defined as A 1 can be LU factored (using ?getrf) as A 1 = P L U , where P is a permutation (pivoting) matrix, L is lower trapezoidal with unit elements on the diagonal, and U is upper triangular:  Note

Since the diagonal elements of L do not need to be stored, the array used to store A 1 can be used to store the elements of L and U.

Applying P T to the second panel of A gives: This yields the equation: Introducing the term A''12 defined as and substituting it into the equation for P TA yields: Multiplying the previous equation by P gives: This can be considered a partial LU factorization of the initial matrix.

Note

• The product L -111A'12 can be computed by calling ?trsm and can be stored in place of the array used for A 12. The update A'22 - L 21(L -111A'12) can be computed by calling ?gemm and can be stored in place of the array used for A 22.

• If the submatrices do not have full rank, this method cannot be applied because LU factorization would fail.

• Unlike LU factorization of general matrices, for general block tridiagonal matrices the factorization A = L U described below cannot be written in the form A = P L U (where P is a permutation matrix). Because of pivoting, the structure of the left factor, L, includes permutations. Pivoting also complicates the right factor, U, which has three diagonals instead of two.

For LU factorization of a block tridiagonal matrix, let A be a block tridiagonal matrix where all blocks are square and of the same order n b : The matrix is to be factored as A = L U .

First, consider 2-by-3 block submatrix which can be decomposed as This decomposition can be obtained by applying the partial LU factorization described previously. Here P T1 is a product of n b elementary permutations which can be represented as a 2n b -by-2n b matrix: Introducing an N-by-N block matrix where all blocks are size n b -by-n b : This allows the previous decomposition to be rewritten as: Next, factor the 2-by-3 block matrix of the second and third rows of the matrix on the right-hand side of that equation: where P T2 is defined as: The previous decomposition can be continued as: Introducing this notation for the pivoting matrix simplifies the equations:  where P Tj is 2n b by 2n b , and is located at the intersection of the j-th and (j+1)-st rows and columns. This allows the decomposition above to be written more compactly as At step N - 2 the local factorization is: After this step, multiplying by the pivoting matrix: gives: At the last (N - 1)-st step the matrix is square and factorization is complete: The last step differs from previous ones in the structure of the pivoting as well: all previous P Tj for j = 1, 2, ..., N - 2 were products of n b permutations (they depend on n b integer parameters), whereas P TN-1 is applied to a square matrix of order 2n b ( it depends on 2n b parameters). So in order to store all of the pivoting indices an integer array of length (N - 2) n b + 2n b = N n b is necessary.

Multiplying the previous decomposition from the left by Π TN-1 gives the final decomposition Multiplying this decomposition by Π 1Π 2...Π N-1allows it to be written in LU factorization form: While applying this formula it should be taken into account that Π j for j = 1, 2, …, N-2 are products of n b elementary transpositions applied to block rows with indices j and j+1, but Π N-1 is the product of 2n b transpositions applied to the last two block rows N-1 and N.

Для получения подробной информации о возможностях оптимизации компилятора обратитесь к нашему Уведомлению об оптимизации.