# Problem with MKL PARDISO performance

Hello,

I'm using PARDISO to solve a real unsymmetric problem with a very large matrix
(n = 40 000), 3% non-zeros (45 986 096), and 1 right-hand side.

I'm quite disappointed with the computation time: on my computer (8 CPUs, 16 432 032 KB total memory, 1595 MHz) it takes 35 min. Does that seem normal to you?
I haven't found any comparison of performance as a function of matrix size...

Maybe my iparm settings are not optimized for my problem:
iparm[0] = 1; /* No solver default */
iparm[1] = 2; /* Fill-in reordering from METIS */
iparm[2] = 8;
iparm[3] = 31; /* CGS */
iparm[4] = 0; /* No user fill-in reducing permutation */
iparm[5] = 0; /* Write solution into x */
iparm[6] = 0; /* Not in use */
iparm[7] = 0; /* Max numbers of iterative refinement steps */
iparm[8] = 0; /* Not in use */
iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0; /* Not in use */
iparm[12] = 0; /* Not in use */
iparm[13] = 0; /* Output: Number of perturbed pivots */
iparm[14] = 0; /* Not in use */
iparm[15] = 0; /* Not in use */
iparm[16] = 0; /* Not in use */
iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1; /* Output: Mflops for LU factorization */
iparm[19] = 0; /* Output: Numbers of CG Iterations */
iparm[31] = 1; /* iterative solver*/
iparm[60] = 2; /* Out-of-Core resolution */

Yannick

P.S.: I don't know where to find the PARDISO version (it is not in mkl_pardiso.h).

Yannick,
your case is pretty big and you are working in Out-Of-Core mode, which is non-threaded, so
this time actually looks reasonable for that case.
What is the size of the RAM on your system?
And what CPU type do you use?
Regarding the PARDISO version: in the latest version of MKL, PARDISO prints its version when msglvl == 1, or you can find the MKL version in
..\Documentation\mklsupport.txt

The CPUs are: dual-processor, quad-core Intel Xeon E5335, 2x4MB cache, 2.0GHz, 1333MHz,
the size of my RAM is 16 432 032 KB,
and the Package ID of MKL is: l_mkl_p_10.0.011

I turned off Out-Of-Core mode, but the performance is the same.

OK, thanks. I see you are working on a modest CPU, but the MKL version is pretty old. We have made many improvements since 10.0, especially in OOC mode. Can you evaluate the latest version, 11.0? It's free for 30 days.

Hi Antonie,
Could you provide the PARDISO output with msglvl=1? It could help us understand the reason for the bad performance of your test case. In your iparm I see only one strange point: what is the reason for setting iparm[3]=31?
With best regards,

iparm[31] is just an experiment.
I tried iparm[3]=31 and iparm[3]=0; it doesn't affect performance much.

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======

Time reorder: 14.776046 s
Time symbfct: 5.390644 s
Time parlist: 0.193419 s
Time malloc : -0.334824 s
Time total : 37.119933 s total - sum: 14.447996 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550

Reordering completed ...
Number of nonzeros in factors = 922614550
Number of factorization MFLOPS = 13695630
================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( factorize to factorize )
================

Times:
======

Time A to LU: 0.000000 s
Time numfct : 2071.496263 s
Time malloc : -0.000183 s
Time total : 2071.496571 s total - sum: 0.000491 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550
gflop for the numerical factorization: 13695.630000
gflop/s for the numerical factorization: 6.611467

Factorization completed ...
================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( solve to solve )
================

Times:
======

Time cgs : 6.895049 s cgx iterations 1
Time malloc : -0.000012 s
Time total : 6.895151 s total - sum: 0.000113 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550
gflop for the numerical factorization: 13695.630000
gflop/s for the numerical factorization: 6.611467

Solve completed ...

===========

Hi Antonie,
Your output confused me a bit: the factorized matrix is almost dense! Could you send this matrix to me (for example in a private thread) so that we can understand how this situation arose?
With best regards,