I have implemented Pardiso to solve my problem and it works fine. But the memory cost and efficiency of Pardiso is not outstanding. We have another sparse solver (Solver A, not parallelized) that use level-based preconditioner and CGS-based acceleration.The memory cost for Pardiso is about three to four times of Solver A. For most of my matrix, if I use only one processor for Pardiso, Solver A is much faster, if I use six processors, they are almost the same speed. For some large matrix, Solver A is even faster than Pardiso even if Pardiso uses 6 processors. I compared the runtime of numerical factorization and substitution, and found that solver A is much more efficient. We do not expect Pardiso to beat Solver A if Pardiso runs in serial mode. But we do hope Pardiso to beat Solver A when it runs in parallel mode using 6 processors.
The speedup for Pardiso seems good, I can get a speedup of 4.5 if I use 6 processors.
Here are the parameter I use.
maxfct = 1
mnum = 1
nrhs = 1
error = 0 ! initialize error flag
msglvl = 0 ! print statistical information
mtype = 11 ! real unsymmetric
iparm(1) = 1 ! no solver default
iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION
iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);
iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.
iparm(7) = 0 ! Output, Number of iterative refinement steps performed
iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions
iparm(9) = 0 ! not in use
iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13
iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
iparm(12) = 0 ! not in use
iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(15) = 0 ! Output, Peak memory on symbolic factorization.
iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.
iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.
iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.
iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.
iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.
iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.
iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.
I tried to change iparm(4) and iparm(10), that does not make much difference. How can I improve the efficiency of Pardiso?
Thanks and best regards,