Dear All,

I have a reactive transport model that utilizing pardiso as the solver. The model is discretized by control volume method and solved by newton iteration method. I do sybmolic factorization at first, then do numerical factorization and substitution in every time step. The model can work but I do have a question on the speedup problem.

The parameter for pardiso are as follows:

iparm = 0

iparm(1) = 1 ! no solver default

iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION

iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);

iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion

iparm(5) = 0 ! no user fill-in reducing permutation

iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.

iparm(7) = 0 ! Output, Number of iterative refinement steps performed

iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions (phase = 331, 332, 333)

iparm(9) = 0 ! not in use

iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13

iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS

iparm(12) = 0 ! not in use

iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)

iparm(14) = 0 ! Output: number of perturbed pivots

iparm(15) = 0 ! Output, Peak memory on symbolic factorization.

iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.

iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.

iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.

iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.

iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.

iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.

iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.

iparm(27) = 0 !check matrix error, 0-without check, 1-check

maxfct = 1

mnum = 1

nrhs = 1

error = 0 ! initialize error flag

msglvl = 0 ! print statistical information

mtype = 11 ! real unsymmetric

First, I do symbolic factorization with phase = 11 (One time)

phase = 11 ! only reordering and symbolic factorization

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, ddum, ddum, error)

Then, in every time step, I do newton iteration. For every newton iteration, I do numerical factorization and substitution:

!Newton iteration

phase = 22 ! only factorization

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm_in, msglvl, ddum, ddum, error)

phase = 33 ! only substitution

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, b_in, x_out, error)

!Newton iteration

The matrix is 2573775*2573775 with nozero entry of 12810555.

When I use 4 processors, the time consuming is as follows:

TimeStep NewtonIteration NumericalFactorization Substitution

1 1 2.673447 0.253894

1 2 0.000024 0.614713

1 3 0.000023 0.595046

2 1 0.000023 0.619408

2 2 0.000023 0.613553

...

My question is: Why does substitution cost so much time after the first step? In common sense, time consuming of substitution is much less than numerical factorization.

System is WIN7 X64

Compiler: Intel Parallel Studio XE 2013

Thanks.