Parallel Direct Sparse Solver for Clusters is a set of routines for solving systems of linear equations with sparse matrices that have millions of rows and columns. It provides an advanced implementation of modern algorithms and extends Intel® Math Kernel Library (Intel® MKL) PARDISO to cluster computations. For more experienced users, it offers enough insight into the solver to fine-tune it for better performance. Parallel Direct Sparse Solver for Clusters is available starting with Intel MKL 11.2.
The main features of Parallel Direct Sparse Solver for Clusters are:
A hybrid implementation combines Message Passing Interface (MPI) technology for data exchange between parallel tasks (processes) running on different nodes, and OpenMP* technology for parallelism inside each node of the cluster. This approach effectively uses modern hardware resources such as clusters consisting of nodes with multi-core processors. The solver code is optimized for the latest Intel processors and also performs well on clusters consisting of non-Intel processors.
Direct Sparse Solvers for Clusters provides a Fortran interface, but can be called from C programs by observing Fortran parameter passing and naming conventions used by the supported compilers and operating systems. Code examples are available in the Intel MKL installation examples directory.
Please see the Release Notes for more details on technical requirements, including the list of supported processors and operating systems.
Q - How do I get started using Parallel Direct Sparse Solvers for Clusters?
A – Before you start using this tool, we recommend that you review the Reference Manual for Intel® Math Kernel Library and the Startup Guide. These guides explain in detail how to get started with Parallel Direct Sparse Solver for Clusters. Pay attention to the solver requirements, including the availability of the Intel® C++ and Fortran Compilers and of the Intel® Math Kernel Library (Intel® MKL).
Q - Where can I get support for the use of this utility?
A - We encourage you to visit our support forum.
Q – How can I switch from Intel MKL PARDISO to the cluster version?
A – The main change is to replace the PARDISO call with a call to cluster_sparse_solver, which takes the MPI communicator comm as an additional argument placed before error.
Fortran (comm is the MPI communicator, for example MPI_COMM_WORLD):
call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, error)
becomes
call cluster_sparse_solver(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, iparm, msglvl, b, x, comm, error)
C (the communicator is passed as a Fortran handle):
PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &error);
becomes
comm = MPI_Comm_c2f(MPI_COMM_WORLD);
cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);
Q – How can I pass my matrix to the Cluster Sparse Solvers interface?
A – There are two input formats: Compressed Sparse Row (CSR), which PARDISO also supports, and Distributed CSR (DCSR), in which the matrix is distributed between processes. The DCSR format is useful for algorithms based on the Finite Element Method (FEM).
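To make the CSR layout concrete, here is a minimal sketch that converts a small dense matrix into the three CSR arrays, using the names values, columns and rowIndex. The helper dense_to_csr is hypothetical and is not part of Intel MKL. The sketch assumes one-based indices and plain int index arrays, and when upper_only is set it stores only the upper triangle, which is the usual layout for symmetric matrices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel MKL routine): convert a dense n x n
   row-major matrix into 3-array CSR with one-based indices.
   If upper_only is nonzero, only entries with column >= row are stored.
   The diagonal entry is always stored, even when it is zero.
   The output arrays must be large enough: values and columns need room for
   every stored entry, rowIndex needs n + 1 entries.
   Returns the number of stored entries. */
int dense_to_csr(int n, const double *dense, int upper_only,
                 double *values, int *columns, int *rowIndex)
{
    assert(n > 0 && dense && values && columns && rowIndex);
    int nnz = 0;
    for (int i = 0; i < n; ++i) {
        rowIndex[i] = nnz + 1;              /* one-based start of row i */
        for (int j = upper_only ? i : 0; j < n; ++j) {
            double v = dense[(size_t)i * n + j];
            if (v != 0.0 || j == i) {
                values[nnz] = v;
                columns[nnz] = j + 1;       /* one-based column index */
                ++nnz;
            }
        }
    }
    rowIndex[n] = nnz + 1;                  /* one past the last entry */
    return nnz;
}
```

For the symmetric 3x3 matrix with rows (4, -1, 0), (-1, 4, -1), (0, -1, 4) and upper_only set, this yields values = {4, -1, 4, -1, 4}, columns = {1, 2, 2, 3, 3} and rowIndex = {1, 3, 5, 6}.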
In a more general FEA (Finite Element Analysis) approach, let’s consider a 2D basic mesh topology, formed of 4 elements sharing a central node.
Considering only one degree of freedom per node, the connectivity of such a system could be expressed by the following 9x9 symmetric matrix:
As mentioned above, in a distributed assembled matrix input format, the above input matrix A should be divided into sequential row subsets, or domains. Each domain belongs to its own MPI process, and neighboring domains may intersect. For such intersections, the element values of the resulting matrix are calculated as the sum of the respective elements of the two domains.
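The summation rule for intersecting domains can be illustrated with a small sketch that adds each domain's rows into a dense copy of the global matrix. This only models what the rule means; it is not how the solver assembles the matrix internally. The helper accumulate_domain is hypothetical, and the sketch assumes that each domain holds a local one-based rowIndex array together with global one-based column indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel MKL routine): add one domain's CSR rows
   into a dense n x n row-major matrix `global`.
   The domain covers global rows first_row..last_row (one-based, inclusive).
   rowIndex is local to the domain and one-based; columns holds global
   one-based column indices. Entries that several domains contribute to the
   same position are summed. */
void accumulate_domain(int n, double *global,
                       int first_row, int last_row,
                       const int *rowIndex, const int *columns,
                       const double *values)
{
    assert(global && rowIndex && columns && values);
    assert(1 <= first_row && first_row <= last_row && last_row <= n);
    for (int r = first_row; r <= last_row; ++r) {
        int local = r - first_row;          /* zero-based local row */
        for (int k = rowIndex[local] - 1; k < rowIndex[local + 1] - 1; ++k) {
            global[(size_t)(r - 1) * n + (columns[k] - 1)] += values[k];
        }
    }
}
```

For example, if one domain covers rows 1 to 2 and another covers rows 2 to 3 of a 3x3 matrix, and both contribute 1.0 to the diagonal entry of row 2, the assembled entry at (2, 2) is 2.0.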
Let’s consider each element as a specific domain:
The resulting values, columns and rowIndex arrays on each MPI process with their respective distributed connectivity matrix would be:
As mentioned above, each domain is a sequential row subset, delimited by the parameters iparm(41) and iparm(42). In the above example, the local connectivity matrix for Domain 1 had to include the missing degrees of freedom (rows 3 to 7). This is easy to handle, because the compressed row-by-row storage restriction was relaxed for Parallel Direct Sparse Solver for Clusters so that zero diagonal elements can be omitted.
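As a rough sketch of how the domain bounds might be filled in from C, the hypothetical helper below splits n rows into contiguous, non-overlapping blocks, one per MPI rank, and writes the first and last row of the block into iparm. It assumes one-based row numbers and an int-typed iparm array. Because C arrays are zero-based, iparm(41) and iparm(42) correspond to iparm[40] and iparm[41]. Overlapping domains, as in the example above, would need bounds chosen by the application instead.

```c
#include <assert.h>

/* Hypothetical helper (not an Intel MKL routine): assign a contiguous block
   of rows to `rank` out of `nprocs` processes and store its bounds in iparm.
   Rows are numbered from 1. When n is not divisible by nprocs, the first
   (n % nprocs) ranks receive one extra row. */
void set_domain_rows(int n, int nprocs, int rank, int *iparm)
{
    assert(iparm && n > 0 && nprocs > 0 && nprocs <= n);
    assert(rank >= 0 && rank < nprocs);
    int base = n / nprocs;
    int extra = n % nprocs;
    int first = rank * base + (rank < extra ? rank : extra) + 1;
    int count = base + (rank < extra ? 1 : 0);
    iparm[40] = first;              /* iparm(41): first row of the domain */
    iparm[41] = first + count - 1;  /* iparm(42): last row of the domain  */
}
```

With n = 9 and 4 processes, this assigns rows 1 to 3, 4 to 5, 6 to 7 and 8 to 9 to ranks 0 through 3. In a real program the rank and process count would come from MPI_Comm_rank and MPI_Comm_size.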
Q – Where can I find papers or performance results for Cluster Sparse Solvers functionality?
A – Please see the document attached to this article.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804