PARDISO's slow reordering step

PARDISO's slow reordering step

Hi,

PARDISO's reordering step looks very slow. Here's some statistics produced by PARDISO itself:

================  PARDISO: solving a real nonsymmetric system  ================


Summary PARDISO: ( reorder to factorize )
================

Times:
======
      Time fulladj: 0.011505 s
      Time reorder: 47.586437 s
      Time symbfct: 0.035378 s
      Time A to LU: 0.000000 s
      Time numfct : 0.033240 s
      Time malloc : 0.032921 s
      Time total  : 47.733567 s total - sum: 0.034086 s

Statistics:
===========
 < Parallel Direct Factorization with #processors: >         1
 < Numerical Factorization with Level-3 BLAS performance >

 < Linear system Ax = b>  
             #equations:                                     188224
             #non-zeros in A:                                215330
             non-zeros in A (%):                            0.000608

             #right-hand sides:                              0

 < Factors L and U > 
             #columns for each panel:                        128
             #independent subgraphs:                         0
 < Preprocessing with state of the art partitioning metis>
             #supernodes:                                    178361
             size of largest supernode:                      5
             number of nonzeros in L                       212862
             number of nonzeros in U                       4320
             number of nonzeros in L+U                     217182
             gflop   for the numerical factorization:        0.000066

             gflop/s for the numerical factorization:        0.002000

I run eight instances of PARDISO simultaneously on a dual-socket Xeon E5530 machine. Each of them works on a sparse banded matrix (bandwidth=61) similar to what produced the above. Is it normal that METIS takes nearly 50 secs to finish?

8 post / 0 nuovi
Ultimo contenuto
Per informazioni complete sulle ottimizzazioni del compilatore, consultare l'Avviso sull'ottimizzazione

Quoting - styc
Hi,

PARDISO's reordering step looks very slow. Here's some statistics produced by PARDISO itself:

================  PARDISO: solving a real nonsymmetric system  ================


Summary PARDISO: ( reorder to factorize )
================

Times:
======
Time fulladj: 0.011505 s
Time reorder: 47.586437 s
Time symbfct: 0.035378 s
Time A to LU: 0.000000 s
Time numfct : 0.033240 s
Time malloc : 0.032921 s
Time total : 47.733567 s total - sum: 0.034086 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 188224
#non-zeros in A: 215330
non-zeros in A (%): 0.000608

#right-hand sides: 0

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 178361
size of largest supernode: 5
number of nonzeros in L 212862
number of nonzeros in U 4320
number of nonzeros in L+U 217182
gflop for the numerical factorization: 0.000066

gflop/s for the numerical factorization: 0.002000

I run eight instances of PARDISO simultaneously on a dual-socket Xeon E5530 machine. Each of them works on a sparse banded matrix (bandwidth=61) similar to what produced the above. Is it normal that METIS takes nearly 50 secs to finish?

Hello

There are several ways to reduce the reordering time. First of all, you can try to set iparm(2)=0 (the minimum degree algorithm is used instead of METIS) or iparm(2)=3 (the OpenMP version of the nested dissection from the METIS package is used). It should be noted that iparm(2)=3 is only available in the latestMKL 10.2 Gold.

The another way is to set iparm(5)=1 anddefine your own permutation vector perm like perm(i)=i, i=1,2,.. , n (e.g. no row and column permutations). In this case, PARDISO will not perform reordering and only sympolic factorization will be done. However it might lead to increasing of time in factorization and solver steps.

All the best
Sergey

Hi

I also have the same problem. The PARDISO solver run very slow on the reordering phase. I observed that only 1 core running during reordering even when iparm[1] = 3 (the index is 1 since I called from C). I'm using Intel MKL bundled with Intel Composer 2013. Is it the MKL gold version?

BR

Bui

 

Hi Bui,

yes, Composer 2013 contains MKL version 11.1. This is Gold version.

very slow -- can you give more details?  Did you compare it with another solver? What is the problem size? is that win or lin? 32 or 64 bit.

--Gennady

Yes, the problem size is 287342. Reordering time takes ~56s, factorization is ~20s and back substitution is ~2s. I set iparm[1] = 3. Live monitoring shows just 1 core running during reordering. The system is Debian 64 bit, number of threads=32.

BR

Bui

P/S: I did not compare with other solver in this example, but benchmark with other problems shows that PARDISO gives significant advantage for smaller problem size.

This is the setting for reordering step

        iparm[0] = 1; /* No solver default */
        iparm[1] = 3; /* Fill-in reordering from METIS */
        iparm[2] = omp_get_max_threads();
        iparm[3] = 0; /* no iterative-direct algorithm */
        iparm[4] = 0; /* No user fill-in reducing permutation */
        iparm[5] = 0; /* Write solution into x */
        iparm[6] = 0; /* Not in use */
        iparm[7] = 0; /* Max numbers of iterative refinement steps */
        iparm[8] = 0; /* Not in use */
        iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
        iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
        iparm[11] = 0; /* Not in use */
        iparm[12] = 1; /* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */
        iparm[13] = 0; /* Output: Number of perturbed pivots */
        iparm[14] = 0; /* Not in use */
        iparm[15] = 0; /* Not in use */
        iparm[16] = 0; /* Not in use */
        iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
        iparm[18] = -1; /* Output: Mflops for LU factorization */
        iparm[19] = 0; /* Output: Numbers of CG Iterations */

 

one of the potential problem may be the free memory size available while running the application. 

I checked real and spd type on my side wiht the N ~ 400000 and NNZ ~ 1,5*10^6, double precision.  the RAM ~32 Gb, Linux 46 bit,

here is the snipshots from the log I see on my side:

=== PARDISO: solving a symmetric positive definite system ===
The local (internal) PARDISO version is                          : 103911000

..................

Summary: ( reordering phase )    ...............      Total time spent                                                 : 3.267394 s

 

< Linear system Ax = b >
             number of equations:           400000
             number of non-zeros in A:      1582000             

             number of non-zeros in L+U:              188513125  --- this is may be the reason of swapping .... 

Summary: ( factorization phase )  ...... Total time spent                                                 : 5.000359 s

Summary: ( solution phase )  .......... Total time spent                                                 : 0.150976 s

Phase 33  PASSED 

and finally -- here is the total results.
------------------------------------------------------------------
iparm[14] = 388818
iparm[15] = 450473
iparm[16] = 1478810
iparm[17] = 188513125
iparm[60] = 388818
iparm[61] = 450473
iparm[62] = 8968
PARDISO time = 8.423172

 

 

 

 

 

Hi

I ran the problem on the machine of 256GB RAM. Therefore memory should not be the issue. Nevertheless, I benchmarked the reordering phase with various number of threads and obtained this scalability results:

n=4: speedup = 1.13x

n=8: speedup = 1.07x

n=16: speedup = 1.18x

We could say that PARDISO reordering does not scale well for my case. Does it scale for your example? Can we use a pre-computed permutation vector and still doing the parallel factorization later on?

BR

Bui

 

Lascia un commento

Eseguire l'accesso per aggiungere un commento. Non siete membri? Iscriviti oggi