# Parallel Direct Sparse Solver for Clusters

By Alexander Kalinkin (Intel), published on March 22, 2014

**Product Overview**

Parallel Direct Sparse Solver for Clusters is a powerful tool set for solving systems of linear equations with sparse matrices that have millions of rows and columns. Direct Sparse Solvers for Clusters provides an advanced implementation of modern algorithms and can be regarded as an extension of Intel MKL PARDISO to cluster computations. For more experienced users, Direct Sparse Solvers for Clusters offers enough insight into the solvers to fine-tune them for better performance. Direct Sparse Solvers for Clusters is available starting with Intel MKL 11.2.

The main features of Direct Sparse Solvers for Clusters:

- Distributed CSR format, with support for a distributed matrix, a distributed right-hand side, and/or a distributed solution
- Solution of systems with multiple right-hand sides
- Cluster support for the factorization and solve steps
- C and Fortran examples

A hybrid implementation combines Message Passing Interface (MPI) technology for data exchange between parallel tasks (processes) running on different nodes, and OpenMP* technology for parallelism inside each node of the cluster. This approach effectively uses modern hardware resources such as clusters consisting of nodes with multi-core processors. The solver code is optimized for the latest Intel processors and also performs well on clusters consisting of non-Intel processors.

Direct Sparse Solvers for Clusters provides a Fortran interface, but can be called from C programs by observing Fortran parameter passing and naming conventions used by the supported compilers and operating systems. Code examples are available in the Intel MKL installation examples directory.

Please see the Release Notes for more details on technical requirements, including the list of supported processors and operating systems.

**Q - How do I get started using Parallel Direct Sparse Solvers for Clusters?**

A – Before you start using this tool, we recommend that you review the Reference Manual for Intel® Math Kernel Library and the Startup Guide. These guides explain in detail how to get started with Cluster Sparse Solvers. Pay attention to the Cluster Sparse Solvers requirements, including the availability of the Intel® C++ and Fortran Compilers and of the Intel® Math Kernel Library (Intel® MKL).

**Q - Where can I get support for the use of this utility?**

A - We encourage you to visit our support forum.

**Q – How can I switch from Intel MKL PARDISO to the cluster version?**

A – The main change is one additional argument, the MPI communicator **comm**, as the following calls show:

*Fortran interface:*

PARDISO:

Call PARDISO(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, error);

cluster_sparse_solver :

Call cluster_sparse_solver(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, **comm**, error);

*C interface:*

PARDISO:

PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &error);

cluster_sparse_solver :

**comm = MPI_Comm_c2f(MPI_COMM_WORLD);**

cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &**comm**, &error);

**Q – How can I pass my matrix to the Cluster Sparse Solvers interface?**

A – There are two input formats: Compressed Sparse Row (CSR), which PARDISO also supports, and Distributed CSR (DCSR), which handles a matrix distributed across processes. The DCSR format is useful for algorithms based on the Finite Element Method (FEM).
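As an illustration of the three-array CSR layout, here is a minimal C sketch that converts a small dense matrix into `values`, `columns` and `rowIndex` arrays with one-based indices. The helper name `dense_to_csr` is hypothetical and not part of Intel MKL. It stores every nonzero of a general matrix and does not apply any solver-specific rule, such as keeping only one triangle of a symmetric matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel MKL function).
 * Converts a dense row-major n x n matrix into three-array CSR with
 * one-based indices. The caller provides `values` and `columns` with room
 * for every nonzero and `rowIndex` with n + 1 entries.
 * Returns the number of stored nonzeros. */
int dense_to_csr(int n, const double *dense,
                 double *values, int *columns, int *rowIndex)
{
    assert(n > 0 && dense && values && columns && rowIndex);
    int nnz = 0;
    for (int i = 0; i < n; ++i) {
        rowIndex[i] = nnz + 1;            /* one-based start of row i */
        for (int j = 0; j < n; ++j) {
            double v = dense[(size_t)i * (size_t)n + (size_t)j];
            if (v != 0.0) {
                values[nnz] = v;          /* nonzero value */
                columns[nnz] = j + 1;     /* one-based column index */
                ++nnz;
            }
        }
    }
    rowIndex[n] = nnz + 1;                /* one past the last stored entry */
    return nnz;
}
```

For a 3x3 tridiagonal matrix with 4 on the diagonal and 1 on the off-diagonals, this yields 7 stored values and `rowIndex = {1, 3, 6, 8}`.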

In a more general FEA (Finite Element Analysis) approach, let’s consider a 2D basic mesh topology, formed of 4 elements sharing a central node.

Considering only one degree of freedom per node, the connectivity of such a system could be expressed by the following 9x9 symmetric matrix:

In the distributed assembled matrix input format, the input matrix A above is divided into sequential row subsets, or domains. Each domain belongs to its own MPI process, and neighboring domains may intersect. For such intersections, the element values of the resulting matrix are calculated as the sum of the respective elements of the two domains.
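The summation rule for intersecting domains can be sketched in plain C, without MPI. The `Domain` struct and the `accumulate_domain` helper below are hypothetical and only model the layout. They assume that each domain stores a contiguous block of global rows, that `rowIndex` is local to the domain and one-based, and that `columns` holds one-based global column indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one domain (not an Intel MKL type). */
typedef struct {
    int first_row;         /* first global row of the domain, one-based */
    int last_row;          /* last global row of the domain, one-based */
    const double *values;  /* nonzero values of the local rows */
    const int *columns;    /* one-based global column indices */
    const int *rowIndex;   /* one-based local row starts, nrows + 1 entries */
} Domain;

/* Adds the entries of one domain into a dense row-major n x n matrix.
 * Rows shared by two domains receive the sum of both contributions. */
void accumulate_domain(const Domain *d, int n, double *dense)
{
    assert(d && dense);
    assert(d->first_row >= 1 && d->last_row >= d->first_row && d->last_row <= n);
    int nrows = d->last_row - d->first_row + 1;
    for (int r = 0; r < nrows; ++r) {
        int gi = d->first_row - 1 + r;              /* zero-based global row */
        for (int k = d->rowIndex[r] - 1; k < d->rowIndex[r + 1] - 1; ++k) {
            int gj = d->columns[k] - 1;             /* zero-based global column */
            dense[(size_t)gi * (size_t)n + (size_t)gj] += d->values[k];
        }
    }
}
```

For example, if one domain covers rows 1 to 2, another covers rows 2 to 3, and each contributes 1.0 to entry (2,2), the assembled entry (2,2) is 2.0.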

Let’s consider each element as a specific domain:

The resulting values, columns and rowIndex arrays on each MPI process, with their respective distributed connectivity matrices, would be:

Each domain is a sequential row subset, delimited by the parameters iparm(41) and iparm(42). In the above example, the local connectivity matrix for Domain 1 had to include the missing degrees of freedom (rows 3 to 7). This is easily handled, because the compressed row-by-row storage restriction was relaxed for Parallel Direct Sparse Solver for Clusters so that zero diagonal elements may be omitted.
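When calling from C, the one-based Fortran entries iparm(41) and iparm(42) correspond to the zero-based array elements `iparm[40]` and `iparm[41]`. The following sketch records the domain bounds for one process. The helper `set_domain_bounds` is hypothetical, plain `int` stands in for the library's integer type, and selecting the distributed input format itself is not shown.

```c
#include <assert.h>

enum { IPARM_SIZE = 64 };  /* the iparm array has 64 entries */

/* Hypothetical helper (not an Intel MKL function).
 * Stores the one-based first and last global rows of the local domain
 * and returns the number of local rows. */
int set_domain_bounds(int iparm[IPARM_SIZE], int first_row, int last_row)
{
    assert(iparm != 0);
    assert(first_row >= 1 && last_row >= first_row);
    iparm[40] = first_row;  /* Fortran iparm(41): first row of the domain */
    iparm[41] = last_row;   /* Fortran iparm(42): last row of the domain */
    return last_row - first_row + 1;
}
```

The returned row count is also the number of local rows the domain's `rowIndex` array describes, so that array needs one more entry than the returned value.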

**Q – Where can I find any papers/performance results of Cluster Sparse Solvers functionality?**

A –

- A. Kalinkin and K. Arturov, “*Asynchronous approach to memory management in sparse multifrontal methods on multiprocessors*”, Applied Mathematics, Vol. 4 No. 12A, 2013, pp. 33-39. doi: 10.4236/am.2013.412A004.
- Kalinkin, Anders, Anders, “*Intel direct sparse solver for clusters, a research project for solving large sparse systems of linear algebraic equations on clusters*”, SparseDays-13, Toulouse, France
- Kalinkin, Anders, Anders, Kuznetsov, Shustrov, Pudov, “*Sparse Linear Algebra support in Intel® Math Kernel Library*”, Sparse Linear Algebra Solvers for High Performance Computing Workshop, Coventry, England
- Kalinkin, Anders, Anders, “*Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equations on clusters*”, Differential Equations. Function Spaces. Approximation Theory conference, August, Novosibirsk, Russia
