Intel MKL PARDISO - Parallel Direct Sparse Solver Interface

This section describes the interface to the shared-memory multiprocessing parallel direct sparse solver known as the Intel MKL PARDISO solver. The interface is Fortran, but it can be called from C or C++ programs by observing Fortran parameter passing and naming conventions used by the supported compilers and operating systems.

The Intel MKL PARDISO package is a high-performance, robust, memory efficient, and easy to use software package for solving large sparse linear systems of equations on shared memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques [Schenk00-2]. To improve sequential and parallel sparse numerical factorization performance, the algorithms are based on a Level-3 BLAS update and pipelining parallelism is used with a combination of left- and right-looking supernode techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting methods allow complete supernode pivoting to compromise numerical stability and scalability during the factorization process. For sufficiently large problem sizes, numerical experiments demonstrate that the scalability of the parallel algorithm is nearly independent of the shared-memory multiprocessing architecture.

The following table lists the names of the Intel MKL PARDISO routines and describes their general use.

Intel MKL PARDISO Routines
Routine Description
pardisoinit

Initializes Intel MKL PARDISO with default parameters depending on the matrix type.

pardiso

Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides.

pardiso_64

Calculates the solution of a set of sparse linear equations with single or multiple right-hand sides, 64-bit integer version.

pardiso_getenv

Retrieves additional values from the Intel MKL PARDISO handle.

pardiso_setenv

Sets additional values in the Intel MKL PARDISO handle.

mkl_pardiso_pivot

Replaces routine which handles Intel MKL PARDISO pivots with user-defined routine.

pardiso_getdiag

Returns diagonal elements of initial and factorized matrix.

pardiso_handle_store

Store internal structures from pardiso to a file.

pardiso_handle_restore

Restore pardiso internal structures from a file.

pardiso_handle_delete

Delete files with pardiso internal structure data.

pardiso_handle_store_64

Store internal structures from pardiso_64 to a file.

pardiso_handle_restore_64

Restore pardiso_64 internal structures from a file.

pardiso_handle_delete_64

Delete files with pardiso_64 internal structure data.

The Intel MKL PARDISO solver supports a wide range of real and complex sparse matrix types (see the figure below).

Sparse Matrices That Can Be Solved with the Intel MKL PARDISO Solver


Sparse Matrices That Can be Solved With PARDISO

The Intel MKL PARDISO solver performs four tasks:

  • analysis and symbolic factorization
  • numerical factorization
  • forward and backward substitution including iterative refinement
  • termination to release all internal solver memory.

You can find code examples that use Intel MKL PARDISO routines to solve systems of linear equations in the examples folder of the Intel MKL installation directory:

  • Fortran examples: examples/solverf/source

  • C examples: examples/solverc/source

Supported Matrix Types

The analysis steps performed by Intel MKL PARDISO depend on the structure of the input matrix A.

Symmetric Matrices

The solver first computes a symmetric fill-in reducing permutation P based on either the minimum degree algorithm [Liu85] or the nested dissection algorithm from the METIS package [Karypis98] (both included with Intel MKL), followed by the parallel left-right looking numerical Cholesky factorization [Schenk00-2] of PAPT = LLT for symmetric positive-definite matrices, or PAPT = LDLT for symmetric indefinite matrices. The solver uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric indefinite matrices. An approximation of X is found by forward and backward substitution and optional iterative refinement.

Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the diagonal supernode block, the coefficient matrix is perturbed. One or two passes of iterative refinement may be required to correct the effect of the perturbations. This restricting notion of pivoting with iterative refinement is effective for highly indefinite symmetric systems. Furthermore, for a large set of matrices from different applications areas, this method is as accurate as a direct factorization method that uses complete sparse pivoting techniques [Schenk04].

Another method of improving the pivoting accuracy is to use symmetric weighted matching algorithms. These algorithms identify large entries in the coefficient matrix A that, if permuted close to the diagonal, permit the factorization process to identify more acceptable pivots and proceed with fewer pivot perturbations. These algorithms are based on maximum weighted matchings and improve the quality of the factor in a complementary way to the alternative of using more complete pivoting techniques.

The inertia is also computed for real symmetric indefinite matrices.

Structurally Symmetric Matrices

The solver first computes a symmetric fill-in reducing permutation P followed by the parallel numerical factorization of PAPT = QLUT. The solver uses partial pivoting in the supernodes and an approximation of X is found by forward and backward substitution and optional iterative refinement.

Nonsymmetric Matrices

The solver first computes a nonsymmetric permutation PMPS and scaling matrices Dr and Dc with the aim of placing large entries on the diagonal to enhance reliability of the numerical factorization process [Duff99]. In the next step the solver computes a fill-in reducing permutation P based on the matrix PMPSA + (PMPSA)T followed by the parallel numerical factorization

QLUR = PPMPSDrADcP

with supernode pivoting matrices Q and R. When the factorization algorithm reaches a point where it cannot factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99]. The magnitude of the potential pivot is tested against a constant threshold of

alpha = eps*||A2||inf,

where eps is the machine precision, A2 = P*PMPS*Dr*A*Dc*P, and ||A2||inf is the infinity norm of A. Any tiny pivots encountered during elimination are set to the sign (lII)*eps*||A2||inf, which trades off some numerical stability for the ability to keep pivots from getting too small. Although many failures could render the factorization well-defined but essentially useless, in practice the diagonal elements are rarely modified for a large class of matrices. The result of this pivoting approach is that the factorization is, in general, not exact and iterative refinement may be needed.

Sparse Data Storage

Sparse data storage in Intel MKL PARDISO follows the 3-array variation of the compressed sparse row (CSR3) format described in Sparse Matrix Storage Format with the Intel MKL PARDISO parameters ja standing for columns, ia for rowIndex, and a for values. The algorithms in Intel MKL PARDISO require column indices ja to be in increasing order per row and that the diagonal element in each row be present for any structurally symmetric matrix. For symmetric or nonsymmetric matrices the diagonal elements which are equal to zero are not necessary.

Caution

Intel MKL PARDISO column indices ja must be in increasing order per row. You can validate the sparse matrix structure with the matrix checker (iparm(27))

Note

While the presence of zero diagonal elements for symmetric matrices is not required, you should explicitly set zero diagonal elements for symmetric matrices. Otherwise, Intel MKL PARDISO creates internal copies of arrays ia, ja, and a full of diagonal elements, which require additional memory and computational time. However, the memory and time required the diagonal elements in internal arrays is usually not significant compared to the memory and the time required to factor and solve the matrix.

Storage of Matrices

By default, Intel MKL PARDISO stores matrices in RAM. However, you can specify that Intel MKL PARDISO store matrices on disk by setting iparm(60). This is referred to as in-core (IC) and out-of-core (OOC), respectively.

OOC parameters can be set in a configuration file. You can set the path to this file and its name using environmental variables MKL_PARDISO_OOC_CFG_PATH and MKL_PARDISO_OOC_CFG_FILE_NAME.

These variables specify the path and filename as follows:

<MKL_PARDISO_OOC_CFG_PATH>/<MKL_PARDISO_OOC_CFG_FILE_NAME> for Linux* OS and OS X*, and

<MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME> for Windows* OS.

By default, the name of the file is pardiso_ooc.cfg and it is placed in the current directory.

All temporary data files can be deleted or stored when the calculations are completed in accordance with the value of the environmental variable MKL_PARDISO_OOC_KEEP_FILE. If it is not set or if it is set to 1, then all files are deleted. If it is set to 0, then all files are stored.

By default, the OOC version of Intel MKL PARDISO uses the current directory for storing data, and all work arrays associated with the matrix factors are stored in files named ooc_temp with different extensions. These default values can be changed by using the environmental variable MKL_PARDISO_OOC_PATH.

To set the environmental variables MKL_PARDISO_OOC_MAX_CORE_SIZE, MKL_PARDISO_OOC_MAX_SWAP_SIZE, MKL_PARDISO_OOC_KEEP_FILE, and MKL_PARDISO_OOC_PATH, create the configuration file with the following lines:

MKL_PARDISO_OOC_PATH = <path>\ooc_file
MKL_PARDISO_OOC_MAX_CORE_SIZE = N
MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

where <path> is the directory for storing data, ooc_file is the file name without any extension, N is the maximum size of RAM in megabytes available for Intel MKL PARDISO (default value is 2000 MB), K is the maximum swap size in megabytes available for Intel MKL PARDISO (default value is 0 MB). Do not set N+K greater than the size of the RAM plus the size of the swap. Be sure to allow enough free memory for the operating system and any other processes which are necessary.

Caution

The maximum length of the path lines in the configuration files is 1000 characters.

Alternatively the environment variables can be set via command line.

For Linux* OS and OS X*:

export MKL_PARDISO_OOC_PATH = <path>/ooc_file
export MKL_PARDISO_OOC_MAX_CORE_SIZE = N
export MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
export MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

For Windows* OS:

set MKL_PARDISO_OOC_PATH = <path>\ooc_file
set MKL_PARDISO_OOC_MAX_CORE_SIZE = N
set MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
set MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

Note

The values specified in a command line have higher priorities: if a variable is changed in the configuration file and in the command line, OOC version of Intel MKL PARDISO uses only the value defined in the command line.

Direct-Iterative Preconditioning for Nonsymmetric Linear Systems

The solver uses a combination of direct and iterative methods [Sonn89] to accelerate the linear solution process for transient simulation. Most applications of sparse solvers require solutions of systems with gradually changing values of the nonzero coefficient matrix, but with an identical sparsity pattern. In these applications, the analysis phase of the solvers has to be performed only once and the numerical factorizations are the important time-consuming steps during the simulation. Intel MKL PARDISO uses a numerical factorization and applies the factors in a preconditioned Krylow-Subspace iteration. If the iteration does not converge, the solver automatically switches back to the numerical factorization. This method can be applied to nonsymmetric matrices in Intel MKL PARDISO. You can select the method using the iparm(4) input parameter. The iparm(20) parameter returns the error status after running Intel MKL PARDISO.

Single and Double Precision Computations

Intel MKL PARDISO solves tasks using single or double precision. Each precision has its benefits and drawbacks. Double precision variables have more digits to store value, so the solver uses more memory for keeping data. But this mode solves matrices with better accuracy, which is especially important for input matrices with large condition numbers.

Single precision variables have fewer digits to store values, so the solver uses less memory than in the double precision mode. Additionally this mode usually takes less time. But as computations are made less precisely, only some systems of equations can be solved accurately enough using single precision.

Separate Forward and Backward Substitution

The solver execution step (see parameter phase = 33 below) can be divided into two or three separate substitutions: forward, backward, and possible diagonal. This separation can be explained by the examples of solving systems with different matrix types.

A real symmetric positive definite matrix A (mtype = 2) is factored by Intel MKL PARDISO as A = L*LT . In this case the solution of the system A*x=b can be found as sequence of substitutions: L*y=b (forward substitution, phase =331) and LT*x=y (backward substitution, phase =333).

A real nonsymmetric matrix A (mtype = 11) is factored by Intel MKL PARDISO as A = L*U . In this case the solution of the system A*x=b can be found by the following sequence: L*y=b (forward substitution, phase =331) and U*x=y (backward substitution, phase =333).

Solving a system with a real symmetric indefinite matrix A (mtype = -2) is slightly different from the cases above. Intel MKL PARDISO factors this matrix as A=LDLT, and the solution of the system A*x=b can be calculated as the following sequence of substitutions: L*y=b (forward substitution, phase =331), D*v=y (diagonal substitution, phase =332), and finally LT*x=v (backward substitution, phase =333). Diagonal substitution makes sense only for symmetric indefinite matrices (mtype = -2, -4, 6). For matrices of other types a solution can be found as described in the first two examples.

Caution

The number of refinement steps (iparm(8)) must be set to zero if a solution is calculated with separate substitutions (phase = 331, 332, 333), otherwise Intel MKL PARDISO produces the wrong result.

Note

Different pivoting (iparm(21)) produces different LDLT factorization. Therefore results of forward, diagonal and backward substitutions with diagonal pivoting can differ from results of the same steps with Bunch-Kaufman pivoting. Of course, the final results of sequential execution of forward, diagonal and backward substitution are equal to the results of the full solving step (phase=33) regardless of the pivoting used.

Callback Function for Pivoting Control

In-core Intel MKL PARDISO allows you to control pivoting with a callback routine, mkl_pardiso_pivot. You can then use the pardiso_getdiag routine to access the diagonal elements. Set iparm(56) to 1 in order to use the callback functionality.

For more complete information about compiler optimizations, see our Optimization Notice.