Perform LU factorization of a general block tridiagonal matrix.


Intel MKL LAPACK provides a wide range of subroutines for LU factorization of general matrices, including dense matrices, band matrices, and tridiagonal matrices. This recipe extends the range of functionality to general block tridiagonal matrices subject to condition all the blocks are square and have the same order.

To perform LU factorization of a block tridiagonal matrix with square blocks of size NB by NB:

  1. Sequentially apply partial LU factorization to rectangular blocks of size M by N formed by the first two block rows and first three block columns of the matrix (where M = 2NB, N = 3NB, and K = NB), and moving down along the diagonal until the last but one block row is processed.

    Partial LU factorization: for LU factorization of a general block tridiagonal matrix it is useful to have separate functionality for partial LU factorization of a rectangular M-by-N matrix. The partial LU factorization algorithm with parameter K, where K min(M, N), consists of

    1. Perform LU factorization of the M-by-K submatrix.

    2. Solve the system with triangular coefficient matrix.

    3. Update the lower right (M - K)-by-(N - K) block.

    The resulting matrix is A = P(L U + A1) where L is a lower trapezoidal M-by-K matrix, U is an upper trapezoidal matrix, P is permutation (pivoting) matrix, and A1 is a matrix with nonzero elements only in the intersection of the last M - K rows and N - K columns.

  2. Apply general LU factorization to the last (2NB) by (2NB) block.

Source code: see the BlockTDS_GE/source/dgeblttrf.f file in the samples archive available at http://software.intel.com/en-us/mkl_cookbook_samples.

Performing partial LU factorization

          CALL DGETRF( M, K, A, LDA, IPIV, INFO )
          DO I=1,K
                  CALL DSWAP(N-K, A(I,K+1), LDA, A(IPIV(I), K+1), LDA)
              END IF
          END DO
          CALL DTRSM('L','L','N','U',K,N-K,1D0, A, LDA, A(1,K+1), LDA)
          CALL DGEMM('N', 'N', M-K, N-K, K, -1D0, A(K+1,1), LDA, 
     &            A(1,K+1), LDA, 1D0, A(K+1,K+1), LDA)

Factoring a block tridiagonal matrix

      DO K=1,N-2
C Form a 2*NB by 3*NB submatrix A with block structure        
C     (D_K   C_K 0    )
C     (B_K D_K+1 C_K+1)         
C Partial factorization of the submatrix
          CALL PTLDGETRF(2*NB, 3*NB, NB, A, 2*NB, IPIV(1,K), INFO)
C Factorization results to be copied back to arrays storing blocks of the tridiagonal matrix
      END DO
C Out of loop factorization of the last 2*NB by 2*NB submatrix 
      CALL DGETRF(2*NB, 2*NB, A, 2*NB, IPIV(1,N-1), INFO)
C Copy the last result back to arrays storing blocks of the tridiagonal matrix

Routines Used




LU factorization of the M-by-K submatrix


Compute the LU factorization of a general m-by-n matrix

Permute rows of the matrix


Swap two vectors

Solving a system with triangular coefficient matrix


Solve a triangular matrix equation

Update lower-right (M - K)-by-(N - K) block


Compute a matrix-matrix product with general matrices.


For partial LU factorization, let A be a rectangular m-by-n matrix:


For ease of reading, lower-case indexes such as m, n, k, and nb are used in this discussion. These correspond to the upper-case indexes used in the Fortran solution and code samples.

The matrix can be decomposed using LU factorization of the m-by-k submatrix, where 0 < k n. For this application, k < min(m, n), because ?getrf can be used directly to factor the matrix if m k n or n = k m.

A can be represented as a block matrix:

where A 11 is a k-by-k submatrix, A 12 is a k-by-(n - k) submatrix, A 21 is an (m - k)-by-k submatrix, and A 22 is an (m - k)-by-(n - k) submatrix.

The m-by-k panel A 1 can be defined as

A 1 can be LU factored (using ?getrf) as A 1 = P L U , where P is a permutation (pivoting) matrix, L is lower trapezoidal with unit elements on the diagonal, and U is upper triangular:


Since the diagonal elements of L do not need to be stored, the array used to store A 1 can be used to store the elements of L and U.

Applying P T to the second panel of A gives:

This yields the equation:

Introducing the term A''12 defined as

and substituting it into the equation for P TA yields:

Multiplying the previous equation by P gives:

This can be considered a partial LU factorization of the initial matrix.


  • The product L -111A'12 can be computed by calling ?trsm and can be stored in place of the array used for A 12. The update A'22 - L 21(L -111A'12) can be computed by calling ?gemm and can be stored in place of the array used for A 22.

  • If the submatrices do not have full rank, this method cannot be applied because LU factorization would fail.

  • Unlike LU factorization of general matrices, for general block tridiagonal matrices the factorization A = L U described below cannot be written in the form A = P L U (where P is a permutation matrix). Because of pivoting, the structure of the left factor, L, includes permutations. Pivoting also complicates the right factor, U, which has three diagonals instead of two.

For LU factorization of a block tridiagonal matrix, let A be a block tridiagonal matrix where all blocks are square and of the same order n b :

The matrix is to be factored as A = L U .

First, consider 2-by-3 block submatrix

which can be decomposed as

This decomposition can be obtained by applying the partial LU factorization described previously. Here P T1 is a product of n b elementary permutations which can be represented as a 2n b -by-2n b matrix:

Introducing an N-by-N block matrix where all blocks are size n b -by-n b :

This allows the previous decomposition to be rewritten as:

Next, factor the 2-by-3 block matrix of the second and third rows of the matrix on the right-hand side of that equation:

where P T2 is defined as:

The previous decomposition can be continued as:

Introducing this notation for the pivoting matrix simplifies the equations:

where P Tj is 2n b by 2n b , and is located at the intersection of the j-th and (j+1)-st rows and columns. This allows the decomposition above to be written more compactly as

At step N - 2 the local factorization is:

After this step, multiplying by the pivoting matrix:


At the last (N - 1)-st step the matrix is square and factorization is complete:

The last step differs from previous ones in the structure of the pivoting as well: all previous P Tj for j = 1, 2, ..., N - 2 were products of n b permutations (they depend on n b integer parameters), whereas P TN-1 is applied to a square matrix of order 2n b ( it depends on 2n b parameters). So in order to store all of the pivoting indices an integer array of length (N - 2) n b + 2n b = N n b is necessary.

Multiplying the previous decomposition from the left by Π TN-1 gives the final decomposition

Multiplying this decomposition by Π 1Π 2...Π N-1allows it to be written in LU factorization form:

While applying this formula it should be taken into account that Π j for j = 1, 2, …, N-2 are products of n b elementary transpositions applied to block rows with indices j and j+1, but Π N-1 is the product of 2n b transpositions applied to the last two block rows N-1 and N.

Для получения подробной информации о возможностях оптимизации компилятора обратитесь к нашему Уведомлению об оптимизации.