Problem with MKL PARDISO performance

Antoine A.

Hello,

I'm using PARDISO to solve a real nonsymmetric problem with a very large matrix
(n = 40 000), about 3% non-zeros (45 986 096), and 1 right-hand side.

I'm quite disappointed with the computation time: on my computer (8 CPUs, 16432032 KB total memory, 1595 MHz), it takes 35 minutes. Does that seem normal to you?
I haven't found any performance comparison as a function of matrix size...

Maybe my iparm settings are not optimized for my problem:
        iparm[0] = 1; /* Do not use solver defaults */
        iparm[1] = 2; /* Fill-in reducing ordering from METIS */
        iparm[2] = 8; /* Number of processors */
        iparm[3] = 31; /* Preconditioned CGS, stopping tolerance 1E-3 */
        iparm[4] = 0; /* No user fill-in reducing permutation */
        iparm[5] = 0; /* Write solution into x */
        iparm[6] = 0; /* Not in use */
        iparm[7] = 0; /* Max number of iterative refinement steps */
        iparm[8] = 0; /* Not in use */
        iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
        iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
        iparm[11] = 0; /* Not in use */
        iparm[12] = 0; /* Not in use */
        iparm[13] = 0; /* Output: Number of perturbed pivots */
        iparm[14] = 0; /* Not in use */
        iparm[15] = 0; /* Not in use */
        iparm[16] = 0; /* Not in use */
        iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
        iparm[18] = -1; /* Output: Mflops for LU factorization */
        iparm[19] = 0; /* Output: Number of CG iterations */
        iparm[31] = 1; /* Iterative solver */
        iparm[60] = 2; /* Out-of-core solve */

Thanks for any help you can give me.
Yannick

P.S. I don't know where to find the PARDISO version (it is not in mkl_pardiso.h).

Gennady Fedorov (Intel)

Yannick,
your case is pretty large, and you are working in out-of-core (OOC) mode, which is not threaded, so
this time actually looks reasonable for that case.
How much RAM does your system have?
And what CPU type do you use?
Regarding the PARDISO version: in the latest version of MKL, PARDISO prints its version when msglvl == 1, or you can find the MKL version in
..\Documentation\mklsupport.txt
--Gennady

Antoine A.

Thanks for your reply.

The CPUs are: dual-processor, quad-core Intel Xeon E5335, 2x4MB cache, 2.0GHz, 1333MHz,
the size of my RAM is 16 432 032 KB,
and the package ID of MKL is: l_mkl_p_10.0.011

I turned off out-of-core mode, but the performance is the same.
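
One thing worth checking here, offered as an assumption rather than a diagnosis: the MKL manual numbers the parameters Fortran-style, iparm(1) to iparm(64), and documents the in-core/out-of-core switch as iparm(60). With 0-based C arrays that entry is iparm[59], so the iparm[60] = 2 line in the first post may never have enabled OOC mode, which would be consistent with seeing no difference. A minimal sketch of an index helper follows; the macro, enum, and function names are hypothetical and not part of MKL, and plain int stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical helper, not part of MKL: converts the 1-based parameter
   number used in the manual, iparm(k), to the 0-based C index iparm[k-1]. */
#define IPARM_IDX(k) ((k) - 1)

/* 1-based number of the in-core/out-of-core switch, as documented for MKL
   PARDISO. Check the manual shipped with your MKL version. */
enum { IPARM_OOC_MODE = 60 };

/* Sets the OOC switch in a 64-entry iparm array.
   Documented values: 0 = in-core, 1 = in-core if the factors fit in
   memory and OOC otherwise, 2 = always OOC. */
static void set_ooc_mode(int iparm[64], int mode)
{
    assert(mode >= 0 && mode <= 2);
    iparm[IPARM_IDX(IPARM_OOC_MODE)] = mode;
}
```

Calling set_ooc_mode(iparm, 0) writes to iparm[59] and leaves iparm[60] untouched.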

Gennady Fedorov (Intel)

OK, thanks. I see you are working on a modest CPU, but the MKL version is quite old. We have made many improvements since 10.0, especially in OOC mode. Can you evaluate the latest 11.0? It is free for 30 days.

Alexander Kalinkin (Intel)

Hi Antoine,
Could you provide the PARDISO output with msglvl=1? It could help us understand the reason for the poor performance of your test case. In your iparm I see only one setting that looks strange to me: what is the reason for setting iparm[3]=31?
With best regards,
Alexander Kalinkin
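
For context on the question above: the MKL documentation describes this parameter as iparm(4) = 10*L + K, where K selects the Krylov method (1 = CGS, 2 = CG for symmetric positive definite matrices) and L sets the stopping tolerance to 10^-L; a value of 0 requests a plain direct factorization. A small sketch that decodes such a value is shown below. The struct and function names are made up for illustration and are not MKL API.

```c
/* Hypothetical decoder for the value stored in iparm[3] (iparm(4) in the
   1-based numbering of the manual). Not part of MKL. */
typedef struct {
    int method;       /* K: 0 = direct factorization, 1 = CGS, 2 = CG */
    int tol_exponent; /* L: the stopping tolerance is 10^-L */
} krylov_setting;

static krylov_setting decode_iparm4(int value)
{
    krylov_setting s;
    s.method = value % 10;       /* last decimal digit selects the method */
    s.tol_exponent = value / 10; /* remaining digits give the exponent L */
    return s;
}
```

For the setting in this thread, decode_iparm4(31) yields method 1 (CGS) with exponent 3, that is, a tolerance of 1E-3. As I understand the documentation, the iteration uses an existing LU factorization as its preconditioner, so a first solve still pays for the full factorization.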

Antoine A.

iparm[31] was just an experiment.
I tried iparm[3]=31 and iparm[3]=0; it does not make much difference to performance.

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======

Time fulladj: 2.646652 s
Time reorder: 14.776046 s
Time symbfct: 5.390644 s
Time parlist: 0.193419 s
Time malloc : -0.334824 s
Time total : 37.119933 s total - sum: 14.447996 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550

Reordering completed ...
Number of nonzeros in factors = 922614550
Number of factorization MFLOPS = 13695630
================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( factorize to factorize )
================

Times:
======

Time A to LU: 0.000000 s
Time numfct : 2071.496263 s
Time malloc : -0.000183 s
Time total : 2071.496571 s total - sum: 0.000491 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550
gflop for the numerical factorization: 13695.630000
gflop/s for the numerical factorization: 6.611467

Factorization completed ...
================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( solve to solve )
================

Times:
======

Time cgs : 6.895049 s cgx iterations 1
Time malloc : -0.000012 s
Time total : 6.895151 s total - sum: 0.000113 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 40000
#non-zeros in A: 45986096
non-zeros in A (%): 2.874131
#right-hand sides: 1

< Factors L and U >
#columns for each panel: 72
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 3069
size of largest supernode: 19757
number of nonzeros in L 462436285
number of nonzeros in U 460178265
number of nonzeros in L+U 922614550
gflop for the numerical factorization: 13695.630000
gflop/s for the numerical factorization: 6.611467

Solve completed ...

===========
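
A quick consistency check on the figures in this log: 13695.63 gflop at 6.611467 gflop/s is about 2071.5 s, which matches "Time numfct", so nearly all of the 35 minutes is spent in numerical factorization. The sketch below reproduces that arithmetic and also compares the achieved rate with a theoretical peak. The figure of 4 double-precision flops per cycle per core is my assumption for this CPU generation and is not stated anywhere in the thread; the function names are illustrative only.

```c
/* Time implied by the operation count and the reported rate, in seconds. */
static double factor_seconds(double gflop, double gflop_per_s)
{
    return gflop / gflop_per_s;
}

/* Theoretical peak in gflop/s. flops_per_cycle is an ASSUMPTION about the
   hardware (4.0 is used in the note below), not a value from the log. */
static double peak_gflops(int cores, double ghz, double flops_per_cycle)
{
    return cores * ghz * flops_per_cycle;
}

/* Fraction of the peak that the factorization actually reached. */
static double fraction_of_peak(double achieved_gflops, double peak)
{
    return achieved_gflops / peak;
}
```

With 8 cores at 2.0 GHz, peak_gflops(8, 2.0, 4.0) gives 64 gflop/s under that assumption, so the reported 6.61 gflop/s is roughly 10% of peak.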

Alexander Kalinkin (Intel)

Hi Antoine,
Your output confused me a bit: the factorized matrix is almost dense! Could you send me this matrix (for example, in a private thread) so that we can understand how this situation arose?
With best regards,
Alexander Kalinkin
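
The "almost dense" remark can be quantified from the log values alone. The sketch below computes the density of L+U relative to a full n x n matrix, the fill-in relative to A, and the storage for the factor values at 8 bytes per double. It ignores index arrays and workspace, so the memory figure is a lower bound; the function names are illustrative.

```c
#include <stdint.h>

/* Share of the n*n positions that hold a nonzero of L+U. */
static double factor_density(int64_t nnz_factors, int64_t n)
{
    return (double)nnz_factors / ((double)n * (double)n);
}

/* How many times more nonzeros the factors hold than the original matrix. */
static double fill_ratio(int64_t nnz_factors, int64_t nnz_a)
{
    return (double)nnz_factors / (double)nnz_a;
}

/* Bytes needed for the factor values only (double precision). */
static double factor_value_bytes(int64_t nnz_factors)
{
    return (double)nnz_factors * (double)sizeof(double);
}
```

With n = 40000, 45986096 nonzeros in A, and 922614550 nonzeros in L+U, this gives a density of about 58%, a fill-in factor of about 20, and roughly 7.4 GB for the factor values, against about 16 GB of RAM.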

Antoine A.

The file is 790 MB...
How can I get it to you?