Error in PARDISO ( numerical_factorization) error_num= -987

Hello,

I am trying to solve a sparse system with PARDISO, using the evaluation version of the MKL beta
on 64-bit Windows 7.

Because I need out-of-core mode to be used when necessary, I initialize the parameters as follows:

m_piparm[0] = 1; // No solver default
m_piparm[1] = 2;
m_piparm[9] = 0;
m_piparm[17] = -1;
m_piparm[20] = 1;
m_piparm[26] = 1;
m_piparm[59] = 1; // out of core if necessary
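
A note on indexing: PARDISO's documentation and log messages number the parameters Fortran-style, starting at 1, so iparm(60) in a log line is index 59 in the C array above. The following is a minimal sketch of a helper that makes this mapping explicit. It assumes iparm is a plain int array of 64 entries (as with the 32-bit integer interface); set_iparm and init_iparm_ooc are hypothetical names, not MKL functions.

```c
#include <assert.h>

#define IPARM_SIZE 64  /* PARDISO's iparm array has 64 entries */

/* Hypothetical helper: set iparm(k) using the 1-based index that the
   documentation and the log messages use. */
void set_iparm(int *iparm, int fortran_index, int value)
{
    assert(fortran_index >= 1 && fortran_index <= IPARM_SIZE);
    iparm[fortran_index - 1] = value;
}

/* Fill iparm with the values listed in this post; every other entry is
   zeroed explicitly, because iparm(1) = 1 turns the solver defaults off. */
void init_iparm_ooc(int *iparm)
{
    for (int i = 0; i < IPARM_SIZE; ++i)
        iparm[i] = 0;

    set_iparm(iparm, 1, 1);    /* iparm(1):  do not use solver defaults */
    set_iparm(iparm, 2, 2);    /* iparm(2):  METIS nested dissection    */
    set_iparm(iparm, 10, 0);   /* iparm(10): pivoting perturbation      */
    set_iparm(iparm, 18, -1);  /* iparm(18): report nonzeros in factors */
    set_iparm(iparm, 21, 1);   /* iparm(21): pivoting option            */
    set_iparm(iparm, 27, 1);   /* iparm(27): matrix checker on          */
    set_iparm(iparm, 60, 1);   /* iparm(60): out of core if necessary   */
}
```

Calling init_iparm_ooc(m_piparm) would then reproduce the settings above, with the 1-based numbers matching what the solver prints.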

Here is the trace of the pardiso run. Any help is appreciated, and if necessary
I could dump the sparse symmetric matrix in a file and make it available.

Best regards,

Andreas Fabri

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 1.618750 s
Time reorder: 48.901887 s
Time symbfct: 6.202610 s
Time malloc : 1.084790 s
Time total : 85.589953 s total - sum: 27.781916 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2797565
#non-zeros in A: 23286826
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300322
size of largest supernode: 3421
number of nonzeros in L 604905508
number of nonzeros in U 1
number of nonzeros in L+U 604905509
Percentage of computed non-zeros for LL^T factorization
0 %
1 %
.
.
44 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( factorize to factorize )
================

Times:
======
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 0.000000 s
Time malloc : 0.053992 s
Time total : 105.836084 s total - sum: 105.782091 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2797565
#non-zeros in A: 23286826
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300322
size of largest supernode: 3421
number of nonzeros in L 604905508
number of nonzeros in U 1
number of nonzeros in L+U 604905509
gflop for the numerical factorization: 886.436031

The error code is : -4
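
For reference, the value -4 returned here can be decoded with a small lookup. The sketch below lists the return codes as they appear in the MKL PARDISO documentation, to the best of my knowledge; the exact wording and the set of codes may differ between MKL versions. pardiso_error_text is a hypothetical helper, not an MKL function. The internal error_num -987 printed in the trace is not one of these documented codes.

```c
#include <stddef.h>

/* Hypothetical helper: map the 'error' output argument of pardiso()
   to a short description. Unlisted values fall through to "unknown". */
const char *pardiso_error_text(int error)
{
    switch (error) {
    case 0:   return "no error";
    case -1:  return "input inconsistent";
    case -2:  return "not enough memory";
    case -3:  return "reordering problem";
    case -4:  return "zero pivot, numerical factorization or iterative refinement problem";
    case -5:  return "unclassified (internal) error";
    case -6:  return "reordering failed (matrix types 11 and 13 only)";
    case -7:  return "diagonal matrix is singular";
    case -8:  return "32-bit integer overflow problem";
    case -9:  return "not enough memory for OOC";
    case -10: return "error opening OOC files";
    case -11: return "read/write error with OOC files";
    default:  return "unknown error code";
    }
}
```

With this, pardiso_error_text(-4) yields the zero pivot message, while pardiso_error_text(-987) yields "unknown error code".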

This is just a guess -- I have no experience with huge matrices--:

An out-of-core solver needs to write and read large temporary files, so the 'fseek error' suggests that you look at the possibility that the program ran out of disk space while processing the temporary files.

Hi,
This problem can occur when a zero or negative diagonal element appears during the LL^T decomposition. Try changing mtype = 2 to mtype = -2; that may resolve the problem.

With best regards,
Alexander Kalinkin

Switching to mtype=-2 did not help. Here is the output.
As you are from Intel, the error_num -987 should help
you to help me, shouldn't it?

best regards,

andreas

The file .\pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===

================ PARDISO: solving a symmetric indef. system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 1.662469 s
Time reorder: 49.211687 s
Time symbfct: 6.262312 s
Time malloc : 1.055497 s
Time total : 86.830331 s total - sum: 28.638366 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2796570
#non-zeros in A: 23279108
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300079
size of largest supernode: 3576
number of nonzeros in L 588215272
number of nonzeros in U 1
number of nonzeros in L+U 588215273
Percentage of computed non-zeros for LL^T factorization

Hi Andreas
The error -987 is an internal error that should not appear in normal situations. Could you check your matrix by setting iparm(27) = 1 in Fortran (iparm[26] in C), and check the amount of free space on the hard disk (you need around 8 GB free)? If everything is correct, could you send a test case (an example with the matrix that crashed) so that we can investigate the problem?

With best regards,
Alexander Kalinkin

Andreas, how much free space is available on your system? With nnz ~ 588215272, about 5 GB must be available.

--Gennady
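
The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes one 8-byte double per nonzero of the factors and ignores index arrays and any solver-internal overhead, so the real requirement will be somewhat higher. factor_bytes and bytes_to_gib are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on the bytes needed to store the factors:
   nnz values of elem_size bytes each. */
uint64_t factor_bytes(uint64_t nnz, uint64_t elem_size)
{
    /* guard against 64-bit overflow in the product */
    assert(elem_size == 0 || nnz <= UINT64_MAX / elem_size);
    return nnz * elem_size;
}

/* Convert a byte count to GiB (2^30 bytes). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For the 588215273 nonzeros in L+U reported in the log, factor_bytes(588215273, 8) is 4705722184 bytes, roughly 4.4 GiB, which agrees with the figure of about 5 GB.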

Hello,

I have 83 GB available, so disk space should not be the problem.

I had also already set iparm[26]. For completeness, here are the other parameters I've set.
Could you verify that they are correct? I find it rather error-prone that when I only want
to change one parameter (such as out-of-core), I must figure out the defaults for all the others.

m_piparm[0] = 1; // No solver default
m_piparm[1] = 2;
m_piparm[9] = 8; // iparm(10)- pivoting perturbation.
m_piparm[17] = -1;
m_piparm[20] = 1;
m_piparm[26] = 1;
m_piparm[59] = 1; // out of core if necessary

Do you have any standard file format that I should use for storing the system?

Best regards,

andreas

Andreas,
Which MKL beta version are you evaluating? Could you check how it works in pure OOC mode (iparm[59] == 2) instead of the hybrid mode you are using?

--Gennady

I downloaded w_mkl_10.3.0.055.exe

Concerning the temporary file, in which directory does it go?
I ask because I am wondering what happens when the virus scanner
(Norton) tries to check it.

andreas

By default, OOC PARDISO uses the current directory for storing data.
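
The location can be changed through the pardiso_ooc.cfg file that the log mentions, or through environment variables of the same names. The sketch below writes such a file from C. The variable names MKL_PARDISO_OOC_PATH, MKL_PARDISO_OOC_MAX_CORE_SIZE (in megabytes) and MKL_PARDISO_OOC_KEEP_FILE are the ones I know from the MKL documentation; the "NAME = value" layout is my understanding of the file format and should be checked against the documentation of your MKL version. write_ooc_cfg is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write an OOC configuration file.
   Returns 0 on success, -1 if the file could not be written. */
int write_ooc_cfg(const char *cfg_name, const char *ooc_path,
                  int max_core_mb, int keep_file)
{
    assert(cfg_name != NULL && ooc_path != NULL);

    FILE *f = fopen(cfg_name, "w");
    if (f == NULL)
        return -1;

    /* one "NAME = value" setting per line */
    int ok = fprintf(f, "MKL_PARDISO_OOC_PATH = %s\n", ooc_path) > 0
          && fprintf(f, "MKL_PARDISO_OOC_MAX_CORE_SIZE = %d\n", max_core_mb) > 0
          && fprintf(f, "MKL_PARDISO_OOC_KEEP_FILE = %d\n", keep_file) > 0;

    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

The file would have to be named pardiso_ooc.cfg and be placed in the current directory before the solver is called, so that the temporary files go to a directory of your choice, for example one excluded from virus scanning.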

Thierry, do you see the same problem with both OOC and hybrid mode?

--Gennady

I tested the two modes (iparam[59]=1 and iparam[59]=2) without success.

I am using MKL 10.2.5.035 with Visual Studio 2008 on Windows 7 x64

Thierry

Well, and did you get the same error, -987?

Here is the log :

ooc_path got by Env = C:\Dev\OptimTopo\Code\ooc_file
ooc_max_core_size got by Env = 3000
ooc_keep_file got by Env = 1

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
40 %
41 %
42 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
*** Error in PARDISO: zero pivot

================ PARDISO: solving a real struct. sym. system ================

Summary PARDISO: ( reorder to factorize )
================

Times:
======
Time fulladj: 0.134167 s
Time reorder: 4.507111 s
Time symbfct: 2.230421 s
Time parlist: 2.000479 s
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 0.000000 s
Time malloc : 10.436600 s
Time total : 294.919680 s total - sum: 275.610902 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 4
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 408483
#non-zeros in A: 31756329
non-zeros in A (%): 0.019032

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 39349
size of largest supernode: 9840
number of nonzeros in L 626636223
number of nonzeros in U 605550762
number of nonzeros in L+U 1232186985
gflop for the numerical factorization: 5644.826505

ERROR during symbolic and numerical factorization: -4
*** Error in PARDISO (read/write OOC data file) error_num= 0

I tried MKL 10.3 Beta and I had the same error.

With iparam[27]=0, I got :

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
40 %
41 %
42 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown

With iparam[27]=1, I got :

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
83 %
84 %
85 %
Fseek failed
Fseek failed
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown

This can perhaps help you...

Hello guys,
Because this problem is completely unknown to us and our internal tests do not reproduce it,
I can only ask you to send us this information.
At the least, it will significantly speed up our investigation of this error.

--Gennady

You can download my matrix (ia, ja and a arrays) here : http://lesommer.free.fr/matrix_ed_lesommer.zip
I know that my matrix has zero elements.

Thanks, we will check and let you know as soon as there is an update.

Hello,

We downloaded the matrix and successfully factorized it with MKL 10.2.5 (see log below).

Maybe the problem is the free space on the hard disk. The number of nonzeros in the LU factors is 1,232,186,985. To store them on the hard disk, MKL OOC PARDISO requires about 12 GB of free space (1,232,186,985 * 8 bytes).

How much free space is there on the hard disk? Also, please print out iparam[63]. It is an internal parameter that can help us identify the version of MKL PARDISO.

************************************
ooc_max_core_size got by Env = 3000

The file .\pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
3 %
...
98 %
99 %
100 %

================ PARDISO: solving a real struct. sym. system ================

Summary PARDISO: ( reorder to factorize )
================

Times:
======
Time fulladj: 0.115263 s
Time reorder: 3.636191 s
Time symbfct: 3.471022 s
Time parlist: 0.321256 s
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 428.636476 s
Time malloc : 0.586887 s
Time total : 440.670663 s total - sum: 3.903568 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 4
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 408483
#non-zeros in A: 31756329
non-zeros in A (%): 0.019032

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 39349
size of largest supernode: 9840
number of nonzeros in L 626636223
number of nonzeros in U 605550762
number of nonzeros in L+U 1232186985
gflop for the numerical factorization: 5644.826505
gflop/s for the numerical factorization: 13.169263

Hello,

The free space on the hard disk is not the problem. I have 100 GB free.

I think I found the problem. It depends on which threading library is linked:
With mkl_intel_thread.lib => OK
With mkl_intel_thread_dll.lib => Error -987

Now it works for me with versions 10.2.5, 10.2.6 and 10.3.0 beta.

Thierry

Thierry,
Could you please clarify how you linked the application when error_num = -987 was encountered? Then we will try to reproduce the problem on our side as well.

--Gennady

Thierry,
We have checked and reproduced the issue on our side. The problem affects the dynamically linked libraries. The cause has been found and will be fixed soon. I will let you know when the fix is available. Thanks again for the report.

--Gennady

Sorry for not replying earlier to your previous message.
I'm glad to hear that you managed to reproduce this problem.

Thierry

We have already fixed that issue, but I do not know when the fix will be available to external customers. In any case, many thanks for the good test case :).

--Gennady

Hi Thierry,
The problem has been fixed in the latest version, MKL 10.2 Update 7. Could you please check whether the problem is still there and let us know?

--Gennady
