Hi All,

I have implemented Pardiso to solve my problem, and it works fine. However, its memory cost and efficiency are not outstanding. We have another sparse solver (**Solver A, not parallelized**) that uses a level-based preconditioner with CGS acceleration. Pardiso needs about three to four times as much memory as Solver A.

For most of my matrices, Solver A is much faster than Pardiso running on one processor; with six processors, the two run at almost the same speed. For some large matrices, Solver A is faster than Pardiso even when Pardiso uses 6 processors. I compared the runtimes of the numerical factorization and the substitution, and found that Solver A is much more efficient. We do not expect Pardiso to beat Solver A in serial mode, but we do hope it will beat Solver A when running in parallel on 6 processors.

The speedup of Pardiso itself looks good: I get a speedup of 4.5 with 6 processors.
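For reference, the speedup figure can be turned into a parallel efficiency $E$ and, assuming Amdahl's law is a reasonable model here (an idealization that ignores memory-bandwidth limits), an estimate of the serial fraction $f$. $S_p$ denotes the speedup on $p$ processors:

$$
E = \frac{S_6}{6} = \frac{4.5}{6} = 0.75, \qquad
S_p = \frac{1}{f + (1-f)/p} \;\Rightarrow\; f = \frac{p/S_p - 1}{p - 1} = \frac{6/4.5 - 1}{5} = \frac{1}{15} \approx 0.067
$$

Under that model, adding threads could give at most $1/f = 15\times$. Since 6-thread Pardiso runs at about the same speed as Solver A, serial Pardiso is roughly $4.5\times$ slower than Solver A on those matrices, which suggests the gap comes from the single-thread cost of the factorization and solve rather than from poor scaling.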

Here are the parameters I use:

```fortran
maxfct = 1
mnum = 1
nrhs = 1
error = 0       ! initialize error flag
msglvl = 0      ! 0 = no statistical output (1 prints statistical information)
mtype = 11      ! real unsymmetric
iparm = 0
iparm(1) = 1    ! do not use solver defaults; use the values set below
iparm(2) = 3    ! fill-in reducing ordering: 0 = min. degree, 2 = METIS nested dissection, 3 = OpenMP version
iparm(3) = 0    ! no. of processors; set threads via mkl_set_dynamic(0) and mkl_set_num_threads(n) instead
iparm(4) = 61   ! iterative-direct: 10*L+K, K=1 CGS, K=2 CG (sym. pos. def.), stop at 1.0E-L; 0 = off
iparm(5) = 0    ! no user fill-in reducing permutation
iparm(6) = 0    ! 0 = solution written to x, b unchanged (1 = b overwritten by the solution)
iparm(7) = 0    ! output: number of iterative refinement steps performed
iparm(8) = 9    ! max. iterative refinement steps; must be 0 if solving with separate substitutions
iparm(9) = 0    ! not in use
iparm(10) = 13  ! default 13: perturb small pivot elements with 1E-13
iparm(11) = 1   ! use nonsymmetric permutation and scaling (MPS)
iparm(12) = 0   ! not in use
iparm(13) = 1   ! maximum weighted matching switched on (default for nonsymmetric matrices)
iparm(14) = 0   ! output: number of perturbed pivots
iparm(15) = 0   ! output: peak memory on symbolic factorization
iparm(16) = 0   ! output: permanent memory on symbolic factorization (computed only in phase 1)
iparm(17) = 0   ! output: size of factors / peak memory on numerical factorization and solution
iparm(18) = 0   ! input/output: report nonzeros in the factors; >= 0 disables reporting
iparm(19) = 0   ! input/output: report flops to factor A; >= 0 disables reporting
iparm(20) = 0   ! output: number of CG/CGS iterations; > 0 means CGS succeeded
iparm(24) = 1   ! parallel factorization: 0 = classic, 1 = two-level (scales better on many threads)
iparm(25) = 0   ! parallel solve control: 0 = parallel forward/backward solve, 1 = sequential
```
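To reproduce the per-phase runtime comparison and the memory figures mentioned above, a minimal sketch is to time the analysis (phase 11), numerical factorization (phase 22) and solve (phase 33) separately and print the memory outputs iparm(15) to iparm(17). It assumes Intel MKL's Fortran interface module `mkl_pardiso` has been compiled and is on the module path, and that the program is linked against MKL. The 4x4 matrix and the helper name `run_phase` are hypothetical placeholders for the real system. Memory is reported in kilobytes, and the total peak estimate `max(iparm(15), iparm(16) + iparm(17))` follows the description in the MKL reference.

```fortran
program pardiso_phase_report
  ! Sketch: time PARDISO phases 11/22/33 separately and report memory use.
  ! Assumes the mkl_pardiso module from Intel MKL is available.
  use mkl_pardiso
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, nnz = 10
  type(MKL_PARDISO_HANDLE) :: pt(64)
  integer :: iparm(64), ia(n + 1), ja(nnz), idum(1)
  integer :: maxfct, mnum, mtype, nrhs, msglvl, error, i, k
  real(dp) :: a(nnz), b(n), x(n), r(n)

  ! Hypothetical nonsymmetric, diagonally dominant test matrix (1-based CSR)
  ia = [1, 3, 6, 9, 11]
  ja = [1, 2, 1, 2, 3, 2, 3, 4, 3, 4]
  a  = [4.0_dp, -1.0_dp, -2.0_dp, 4.0_dp, -1.0_dp, &
        -2.0_dp, 4.0_dp, -1.0_dp, -2.0_dp, 4.0_dp]
  b  = 1.0_dp

  maxfct = 1; mnum = 1; nrhs = 1; msglvl = 0; error = 0
  mtype = 11                    ! real unsymmetric
  do i = 1, 64
     pt(i)%DUMMY = 0            ! internal handle must be zeroed before the first call
  end do
  iparm = 0
  iparm(1) = 1; iparm(2) = 3; iparm(8) = 2
  iparm(10) = 13; iparm(11) = 1; iparm(13) = 1
  iparm(18) = -1                ! < 0: report the number of nonzeros in the factors

  call run_phase(11, 'analysis (11)')
  call run_phase(22, 'factorization (22)')
  call run_phase(33, 'solve (33)')

  print '(a, i0)', 'nnz in factors:        ', iparm(18)
  print '(a, i0)', 'peak symbolic [KB]:    ', iparm(15)
  print '(a, i0)', 'permanent [KB]:        ', iparm(16)
  print '(a, i0)', 'numeric + solve [KB]:  ', iparm(17)
  print '(a, i0)', 'total peak [KB]:       ', max(iparm(15), iparm(16) + iparm(17))

  ! Self-check: residual r = b - A*x computed from the CSR arrays
  do i = 1, n
     r(i) = b(i)
     do k = ia(i), ia(i + 1) - 1
        r(i) = r(i) - a(k) * x(ja(k))
     end do
  end do
  if (maxval(abs(r)) > 1.0e-10_dp) error stop 'residual check failed'

  call run_phase(-1, 'release (-1)')   ! free PARDISO's internal memory

contains

  ! Hypothetical helper: run one PARDISO phase and print its wall-clock time
  subroutine run_phase(phase, label)
    integer, intent(in) :: phase
    character(*), intent(in) :: label
    integer(8) :: t0, t1, rate
    call system_clock(t0, rate)
    call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, nrhs, &
                 iparm, msglvl, b, x, error)
    call system_clock(t1)
    if (error /= 0) then
       print '(a, a, i0)', label, ' failed, error = ', error
       error stop
    end if
    print '(a, a, f10.6, a)', label, ': ', real(t1 - t0, dp) / real(rate, dp), ' s'
  end subroutine run_phase

end program pardiso_phase_report
```

Swapping in the real matrix and right-hand side (and calling `mkl_set_num_threads` with 1 and with 6) shows whether the gap to Solver A sits mainly in phase 22 or phase 33, and how much of the memory difference comes from the factors themselves.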

I tried changing iparm(4) and iparm(10), but it did not make much difference. How can I improve the efficiency of Pardiso?

Thanks and best regards,

Daniel