METIS fails with non-diagonal Identity matrix

The numerical factorization stage seems to break in multi-processor runs on sparse non-diagonal identity matrices when METIS or parallel METIS reordering is used.

The details are given here

https://software.intel.com/en-us/node/742812

Regards

Dinesh

 

Have you checked the problem with MKL 2017 u4 or 2018?

What are the differences between MKL 2017 u4 and 2018 as far as PARDISO is concerned?

It fails with 2017 u4,

and it fails with 2018 too.

Quote:

Gennady F. (Intel) wrote:

Have you checked the problem with MKL 2017 u4 or 2018?

Hi, it fails under both updates. Any insight would be appreciated.

OK, thanks, we will check this case. Have you checked whether this case works with the minimum degree algorithm?

Yes, as stated in the link, the error appears only for METIS in multi-processor runs. The minimum degree algorithm is significantly slower than METIS.

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get a -1 error, which means the input matrix is inconsistent. Could you give us the reproducer you use?
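For readers following the COO-to-CSR step mentioned here, a minimal C++ sketch of such a conversion is shown below. It is not the converter used by anyone in this thread; `CsrMatrix` and `coo_to_csr` are hypothetical names. It assumes zero-based triplets (if the triplets were written for MATLAB they may be one-based, in which case 1 would have to be subtracted from every index first) and it keeps the input order within each row, so columns are only sorted within a row if the input already was.

```cpp
#include <cstddef>
#include <vector>

// Zero-based CSR storage: row_ptr has n + 1 entries, col_idx/values have nnz entries.
struct CsrMatrix {
    int n = 0;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> values;
};

// Convert zero-based COO triplets to CSR with a counting sort on the row index.
// Entries keep their input order within each row; duplicates are not merged.
CsrMatrix coo_to_csr(int n,
                     const std::vector<int>& rows,
                     const std::vector<int>& cols,
                     const std::vector<double>& vals) {
    CsrMatrix a;
    a.n = n;
    a.row_ptr.assign(n + 1, 0);
    a.col_idx.resize(vals.size());
    a.values.resize(vals.size());

    // Count entries per row, shifted by one so the prefix sum yields row starts.
    for (int r : rows) ++a.row_ptr[r + 1];
    for (int i = 0; i < n; ++i) a.row_ptr[i + 1] += a.row_ptr[i];

    // Scatter each triplet into the next free slot of its row.
    std::vector<int> next(a.row_ptr.begin(), a.row_ptr.end() - 1);
    for (std::size_t k = 0; k < vals.size(); ++k) {
        const int dst = next[rows[k]]++;
        a.col_idx[dst] = cols[k];
        a.values[dst] = vals[k];
    }
    return a;
}
```

For a permuted identity there is exactly one entry per row, so the ordering within a row is not a concern in that case.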

Quote:

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get a -1 error, which means the input matrix is inconsistent. Could you give us the reproducer you use?

Hi

I do not have a converter. I actually work in CSR format, and wrote this matrix out in COO to check for mistakes using MATLAB. I am attaching another case where I have the matrix in CSR format. Hopefully that helps.

The first file has the column indices and the values, and the other file has the row offsets. (These files are quite simple, since the matrix is essentially a diagonal identity matrix under some permutation.)
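Before handing arrays like these to a solver, a quick structural check can rule out simple input problems. The sketch below is only an approximation of the kind of consistency conditions a CSR checker looks for; it is not MKL's matrix checker, and `check_csr` is a hypothetical helper. It assumes zero-based indexing and a square n-by-n matrix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string if the zero-based CSR arrays are structurally
// consistent, otherwise a short description of the first problem found.
std::string check_csr(int n,
                      const std::vector<int>& row_ptr,
                      const std::vector<int>& col_idx) {
    if (n < 0 || row_ptr.size() != static_cast<std::size_t>(n) + 1)
        return "row_ptr must have n + 1 entries";
    if (row_ptr.front() != 0)
        return "row_ptr[0] must be 0 for zero-based indexing";
    if (static_cast<std::size_t>(row_ptr.back()) != col_idx.size())
        return "row_ptr[n] must equal the number of non-zeros";
    for (int i = 0; i < n; ++i) {
        // Offsets must not decrease and must stay within the non-zero count.
        if (row_ptr[i] > row_ptr[i + 1] || row_ptr[i + 1] > row_ptr[n])
            return "row_ptr must be non-decreasing";
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (col_idx[k] < 0 || col_idx[k] >= n)
                return "column index out of range";
            if (k > row_ptr[i] && col_idx[k - 1] >= col_idx[k])
                return "column indices must be strictly increasing within a row";
        }
    }
    return "";
}
```

An empty return value only says the index arrays are well formed; it says nothing about whether the values are the ones intended.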

Dinesh

Attachments:

Attachment | Size
CSRColVal.txt (text/plain) | 27.22 KB
CSROffset.txt (text/plain) | 4.99 KB

It may not depend on the matrix specific to my problem. If you create any non-diagonal identity matrix and run it with METIS, it might fail.
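A "non-diagonal identity" in this sense is a permutation matrix: one entry equal to 1 in each row and each column. A minimal sketch for generating one in zero-based CSR form follows; `PermutationCsr` and `make_permuted_identity` are hypothetical names, and the random shuffle is just one assumed way to pick the permutation (it is not how the attached files were produced).

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Zero-based CSR arrays of an n-by-n permutation matrix.
struct PermutationCsr {
    int n = 0;
    std::vector<int> row_ptr;     // n + 1 offsets: 0, 1, ..., n
    std::vector<int> col_idx;     // column of the single 1 in each row
    std::vector<double> values;   // all ones
};

// Build a randomly permuted identity. For very small n the shuffle can
// return the plain diagonal identity by chance.
PermutationCsr make_permuted_identity(int n, unsigned seed) {
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(perm.begin(), perm.end(), gen);

    PermutationCsr a;
    a.n = n;
    a.row_ptr.resize(n + 1);
    std::iota(a.row_ptr.begin(), a.row_ptr.end(), 0);  // exactly one entry per row
    a.col_idx = perm;
    a.values.assign(n, 1.0);
    return a;
}
```

The three arrays can then be passed to the solver call of your choice, for example with n = 1035 to match the size reported in this thread.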

Quote:

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get a -1 error, which means the input matrix is inconsistent. Could you give us the reproducer you use?

Do you need anything else beyond what has been provided?

Thanks

Dinesh

We quickly checked your matrix on a Linux machine and I don't see any issues there. Could you provide the iparm settings that you use for this test? And of course we will run this matrix on Windows.

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000067 s
Time spent in reordering of the initial matrix (reorder)         : 0.000003 s
Time spent in symbolic factorization (symbfct)                   : 0.013133 s
Time spent in data preparations for factorization (parlist)      : 0.000007 s
Time spent in allocation of internal data structures (malloc)    : 0.011820 s
Time spent in additional calculations                            : 0.005717 s
Total time spent                                                 : 0.030747 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
time_reorder 0.0550621
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  4 %  5 %  7 %  8 %  9 %  10 %  11 %  13 %  14 %  15 %  16 %  17 %  19 %  20 %  21 %  22 %  23 %  25 %  26 %  27 %  28 %  29 %  30 %  32 %  33 %  34 %  35 %  36 %  38 %  39 %  40 %  41 %  42 %  43 %  59 %  60 %  62 %  63 %  64 %  65 %  67 %  68 %  69 %  71 %  72 %  73 %  74 %  75 %  77 %  78 %  79 %  80 %  82 %  83 %  84 %  85 %  86 %  88 %  89 %  90 %  91 %  93 %  94 %  95 %  96 %  98 %  100 %

=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.028661 s
Time spent in allocation of internal data structures (malloc)    : 0.000029 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 0.028692 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.005805 s
Time spent in additional calculations                            : 0.000017 s
Total time spent                                                 : 0.005822 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000

0: 1013 10.00 1.00 1.00 1.00
Residual 0.000e+00
 

StartThread_ = mkl_get_max_threads();
mkl_set_dynamic(0);
mkl_set_num_threads(SparseConfig_.OpenMpThreads());

maxfct = 1;     /* Maximum number of numerical factorizations */
mnum = 1;       /* Which factorization to use */
msglvl = 0;     /* 0: do not print statistical information */
error = 0;

for (auto i = 0; i < 64; i++) iparm[i] = 0;

iparm[0] = 1;                               /* No solver default */
iparm[1] = SparseConfig_.PardisoRO();       /* Reordering: 0 = minimum degree, 2 = nested dissection from METIS */
iparm[2] = 0;                               /* Reserved in Intel MKL PARDISO; set to zero */
iparm[3] = 0;                               /* No iterative-direct algorithm */
iparm[4] = 0;                               /* No user fill-in reducing permutation */
iparm[5] = 0;                               /* Write solution into x */
iparm[6] = 0;                               /* Not in use */
iparm[7] = SparseConfig_.NumberOfIterativeRefinements();  /* Max number of iterative refinement steps */
iparm[8] = 0;                               /* Not in use */
iparm[9] = SparseConfig_.PivotShift();      /* Pivot perturbation exponent (13 means 1E-13) */
iparm[10] = 1;                              /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0;                              /* Conjugate transposed/transpose solve */
iparm[12] = 1;                              /* Maximum weighted matching algorithm is switched on (default for non-symmetric) */
iparm[13] = 0;                              /* Output: Number of perturbed pivots */
iparm[14] = 0;                              /* Not in use */
iparm[15] = 0;                              /* Not in use */
iparm[16] = 0;                              /* Not in use */
iparm[17] = -1;                             /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1;                             /* Output: Mflops for LU factorization */
iparm[19] = 0;                              /* Output: Number of CG iterations */

iparm[26] = 1;                              /* Matrix checker on */

iparm[34] = 1;                              /* Zero-based indexing */

for (auto i = 0; i < 64; i++) pt[i] = 0;

phase = 11;
mtype = 11;
nrhs = NRhs;

 

Note: I get the crash consistently in debug-mode runs on MS-VS2012.
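When two people compare runs on different machines, it can help to dump only the non-zero iparm entries so the effective settings are visible at a glance. The sketch below is one hypothetical way to do that; `format_nonzero_iparm` is not an MKL function, and it assumes iparm is declared as a plain `int[64]` (with an ILP64 build the element type would be a 64-bit integer instead).

```cpp
#include <sstream>
#include <string>

// Format every non-zero entry of a 64-element iparm array, one per line.
std::string format_nonzero_iparm(const int (&iparm)[64]) {
    std::ostringstream out;
    out << "non-zero iparm values:\n";
    for (int i = 0; i < 64; ++i) {
        if (iparm[i] != 0) out << "iparm[" << i << "] = " << iparm[i] << "\n";
    }
    return out.str();
}
```

Printing the returned string right before the first solver call records the values that were actually resolved from the configuration object.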

Same result on Windows. Can you check that you set iparm[34] to 1 (zero-based CSR matrix)?

Thanks,

Alex

Yes, it is:

iparm[34] = 1; /* zero based index */

Are you running it in debug mode, VS2012 x64? (Release mode does not always catch this bug.)

If your setup is lightweight, you can share it with me to test.

Regards

Dinesh

Hi,

Is there any resolution on this issue?

Regards

Dinesh

Developers

Any updates on this issue?

Regards

Dinesh

Hi Dinesh,

Could you please try MKL 2018 Update 1? I built a small test case based on the sparse matrix you attached in https://software.intel.com/en-us/node/742812. It runs OK in MSVS 2017 with multiple threads.

I'm linking against the libraries below:

Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_lp64.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_thread.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_core.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\compiler\lib\intel64_win\libiomp5md.lib:

MKL 2018, minor 0, update 1, version 20180001, build date 20171007

Best Regards,

Ying
non-zero iparm values:
iparm[0] = 1
iparm[1] = 2
iparm[7] = 2
iparm[9] = 13
iparm[10] = 1
iparm[12] = 1
iparm[17] = -1
iparm[18] = -1
iparm[26] = 1
iparm[34] = 1

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000018 s
Time spent in reordering of the initial matrix (reorder)         : 0.000007 s
Time spent in symbolic factorization (symbfct)                   : 0.000588 s
Time spent in data preparations for factorization (parlist)      : 0.000002 s
Time spent in allocation of internal data structures (malloc)    : 0.002990 s
Time spent in additional calculations                            : 0.000125 s
Total time spent                                                 : 0.003729 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  82 %  83 %  84 %  85 %  86 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %

=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.034930 s
Time spent in allocation of internal data structures (malloc)    : 0.001086 s
Time spent in additional calculations                            : 0.000005 s
Total time spent                                                 : 0.036021 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.000082 s
Time spent in additional calculations                            : 0.000833 s
Total time spent                                                 : 0.000915 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000

Input and solution norms:
||A|| = 24.2281
||b|| = 24.2281
||x|| = 24.2281
||Ax-b|| = 0
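The reported value 24.2281 matches the square root of 587, which is what one would get for the Frobenius norm of a 587-by-587 permutation matrix and for the 2-norm of an all-ones vector of that length. That reading is an assumption; the test program itself was not posted. A minimal sketch of how such norms and the residual can be computed for a zero-based CSR matrix follows (`norm2`, `csr_matvec` and `residual_norm` are hypothetical helpers, and row_ptr is assumed to be non-empty).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of a vector. Applied to the CSR values array this gives
// the Frobenius norm of the matrix.
double norm2(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

// y = A * x for a zero-based CSR matrix (row_ptr must have at least one entry).
std::vector<double> csr_matvec(const std::vector<int>& row_ptr,
                               const std::vector<int>& col_idx,
                               const std::vector<double>& values,
                               const std::vector<double>& x) {
    std::vector<double> y(row_ptr.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += values[k] * x[col_idx[k]];
    return y;
}

// ||A x - b||_2
double residual_norm(const std::vector<int>& row_ptr,
                     const std::vector<int>& col_idx,
                     const std::vector<double>& values,
                     const std::vector<double>& x,
                     const std::vector<double>& b) {
    std::vector<double> r = csr_matvec(row_ptr, col_idx, values, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= b[i];
    return norm2(r);
}
```

For a permutation matrix the exact solution is just a reordering of b, so a residual of exactly zero is the expected outcome when the factorization succeeds.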

Press any key to continue . . .

Hello Ying H,

I have already tried 2018 and 2017 Update 4, as stated earlier in this thread. The case always fails when tested in debug mode (in release mode the failure becomes a bit rarer).

Regards

Dinesh

Hi Dinesh,

I mean the latest version, MKL 2018 Update 1 (not 2018 or 2017 Update 4). I do seem to be able to see the crash with an earlier version.

Best Regards,

Ying

Hi Dinesh,

Can you put these lines in your code and rerun the example with the latest MKL that you have?

/* Requires #include <stdio.h> and #include "mkl.h" */
const int buf_len = 198;   /* Size of the version string buffer (assumed value) */
char buf[buf_len];
MKLVersion ver;

printf("\nIntel(R) MKL release version:\n");
MKL_Get_Version_String(buf, buf_len);
printf("%s\n", buf);

MKL_Get_Version(&ver);
printf("    Major version:          %d\n", ver.MajorVersion);
printf("    Minor version:          %d\n", ver.MinorVersion);
printf("    Update version:         %d\n", ver.UpdateVersion);
printf("    Product status:         %s\n", ver.ProductStatus);
printf("    Build:                  %s\n", ver.Build);
printf("    Platform:               %s\n", ver.Platform);
printf("    Processor optimization: %s\n", ver.Processor);

 

I see the issue on Windows with MKL 2017.2 and MKL 2017.3, but it passes correctly with MKL 2017.4.

OK, I will do that. I recall that I tried 2018 Update 1 but faced the same issue.

Hi Alex

The snapshot can be found at

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...

(I do not know how to attach an image to a reply.)

It fails with both versions.

Regards

Dinesh

Could you please submit a ticket on our support site: https://www.intel.com/supporttickets

Here are some detailed steps:

https://software.intel.com/sites/default/files/managed/97/ce/SubmittingSupportIssue.pdf

Our support team will work with you.

I tried, but the website does not allow me to enter my details.

Dinesh
