Is PARDISO an out-of-core solver? If so, can it access a matrix that is stored out of core?

Where can I find information on how the matrix has to be stored for an out-of-core solution?

Is there another function in MKL that can solve a symmetric linear system out of core?

Thanks

# Pardiso Out of Core

Hi!

PARDISO has an Out-Of-Core (OOC) mode, but it assumes that the input matrix fits in RAM. In this mode PARDISO stores only the LU factors and some working arrays on disk.

BTW, what kind of matrix are you solving, dense or sparse? Please tell us the parameters of the problem.

Best regards,

Sergey

How can Intel claim PARDISO is an out-of-core solver when the matrix must be in RAM? I have a skyline matrix stored column-wise, and I can store it on disk in any format supported by PARDISO. Does MKL have a true out-of-core solver? Thank you very much for your time.

Hi,

let me try to clarify this. The regular (in-core) version of PARDISO uses RAM to solve the system of linear algebraic equations (SLAE) and does not use the hard disk. Very often the input matrix is very sparse, but the LU factors are much less sparse. As a result, the factors may not fit in RAM, and such a problem cannot be solved by the regular version of PARDISO. To handle such systems, we developed a version of PARDISO that stores the LU factors on the hard disk; this version is called Out-Of-Core PARDISO. If your matrix is dense, please use LAPACK routines. If you have a sparse matrix that does not fit in RAM, you can submit a feature request against OOC PARDISO.

Quoting - atpq2680

*How can Intel claim PARDISO is an out-of-core solver when the matrix must be in RAM? I have a skyline matrix stored column-wise, and I can store it on disk in any format supported by PARDISO. Does MKL have a true out-of-core solver? Thank you very much for your time.*

I wanted to do that too once. But I found that converting from skyline to the MKL sparse format made the matrix so small that it fit in RAM. Skyline storage is pretty huge for the sparse matrices in my application (finite element analysis). PARDISO OOC then solved the RAM problem. But yes, for genuinely denser or bigger matrices MKL can't help.
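Why skyline can be so much larger than CSR: a column-wise skyline (profile) layout stores every entry from the topmost nonzero of each column down to the diagonal, zeros included, while CSR stores only the nonzeros. The sketch below is a rough size comparison, assuming 8-byte values, 4-byte indices and only the upper triangle of a symmetric matrix; the function names are hypothetical and not part of MKL. For an 8 x 8 matrix with a full diagonal plus a single entry in the corner (row 0, column 7), skyline stores 15 values while CSR stores 9 nonzeros.

```c
#include <assert.h>

/* Values stored by a column-wise skyline (profile) layout of the upper
   triangle of a symmetric n x n matrix. first_row[j] is the 0-based row of
   the topmost nonzero in column j; every entry from there down to the
   diagonal is stored, zeros included. */
static long long skyline_entries(const int *first_row, int n)
{
    long long total = 0;
    for (int j = 0; j < n; ++j) {
        assert(first_row[j] >= 0 && first_row[j] <= j);
        total += j - first_row[j] + 1;
    }
    return total;
}

/* Skyline bytes: 8-byte values plus n + 1 four-byte column pointers. */
static long long skyline_bytes(long long entries, int n)
{
    return entries * 8 + (long long)(n + 1) * 4;
}

/* CSR bytes for the same upper triangle: an 8-byte value and a 4-byte
   column index per nonzero, plus n + 1 four-byte row pointers. */
static long long csr_upper_bytes(long long nnz_upper, int n)
{
    return nnz_upper * (8 + 4) + (long long)(n + 1) * 4;
}
```

The gap grows with the distance between each column's topmost nonzero and its diagonal, so a few far-off-diagonal couplings in a finite element mesh can inflate the skyline well beyond the CSR size.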

kallog,

if you are really interested in solving very big matrices, as you wrote ("*for genuinely denser or bigger matrices MKL can't help*"), then please submit a feature request at <https://premier.intel.com/>.

If you do not have an account for this channel, please complete your account registration at https://registrationcenter.intel.com/

--Gennady

I found that MKL_PARDISO_OOC_MAX_CORE_SIZE must exceed the value reported by iparm(15) after phase 11 (only reordering and symbolic factorisation). iparm(15) reports "peak memory symbolic factorization". In my case, the matrix in csr format requires 232 MB, whereas iparm(15) reports 878 MB.
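The size of the CSR arrays themselves is easy to compute: nnz values, nnz column indices and n + 1 row pointers. The helper below is a minimal sketch (hypothetical name, not an MKL function), assuming 8-byte double values and either 4-byte (LP64) or 8-byte (ILP64) indices. With the statistics printed later in this post (737658 equations, 20402451 nonzeros) and 4-byte indices it gives 247780048 bytes, in line with the 248 MB quoted for the larger matrix.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate in-memory size of a CSR matrix in bytes: nnz values of
   val_bytes each, plus nnz column indices and n + 1 row pointers of
   idx_bytes each (4 for 32-bit/LP64 indices, 8 for 64-bit/ILP64). */
static uint64_t csr_bytes(uint64_t n, uint64_t nnz,
                          unsigned val_bytes, unsigned idx_bytes)
{
    assert(val_bytes > 0 && idx_bytes > 0);
    return nnz * val_bytes + nnz * idx_bytes + (n + 1) * idx_bytes;
}
```

This only covers the arrays handed to PARDISO; as iparm(15) shows, the solver's own peak during phase 11 can be several times larger (878 MB versus 232 MB here).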

As explained, the LU factors appear to be stored on disk: the size of the ooc_temp files matches 8 bytes per entry times the number of nonzeros reported in the statistics obtained with msglvl=1.

For a larger matrix, 248 MB in size, the program crashes at phase 11 with error -2. iparm(15) reports 939 MB:

Peak memory symbolic factorization (MB) = 939

Permanent memory symbolic factorization (MB) = 0

Memory numerical factorization and solution (MB) = 1668

total peak memory solver consumption (MB) = 1668

The program closes with:

ooc_max_core_size got by Env = 2000

The file .\pardiso_ooc.cfg was not opened

*** error PARDISO ( insufficient_memory) error_num= -800

*** error pardiso (memory allocation) STRUC_FI, size to allocate: 362146752 bytes

total memory wanted here: 962126 kbyte

symbolic (max): 962126 symbolic (permanent): 2 real(incl. 1 factor):

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( reorder to reorder )

================

Times:

======

Time fulladj: 0.309081 s

Time reorder: 2.700492 s

Time symbfct: 2.213089 s

Time malloc : 0.484451 s

Time total : 5.707339 s total - sum: 0.000227 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 1

< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>

#equations: 737658

#non-zeros in A: 20402451

non-zeros in A (%): 0.003749

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 10

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 101573

size of largest supernode: 2514

number of nonzeros in L 217919528

number of nonzeros in U 1

number of nonzeros in L+U 217919529

I am not sure why the error happens; I use 32-bit Windows Vista. Sysinternals Process Explorer shows that during assembly the virtual memory size is 2142 MB, with a similar working set. After assembly and deallocation of that memory, just before phase 11 starts, the virtual memory size is still(?) 2139 MB, but the working set is 869 MB. After that, PARDISO is invoked and stops with the above error.

Do you have any clues for me on how to proceed?

Hi!

What version of MKL do you use? Could you print iparm(64) after step 11 and send us the result?

The version I use is 10.0.1.015.

iparm(64) returns 0.

To avoid constructing the matrix multiple times, the program writes the CSR matrix to a file. When the program is started by reading the matrix from that file, Process Explorer reports a virtual size of 938 MB and a working set of 827 MB. The program then completes the calculation, using a .lnz file of 1.743E+9 bytes and an .idx file of 98E+6 bytes. The .lnz size neatly matches 8 * 217.9E6 bytes for the nonzeros.
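The factor-file observation can be turned into a quick estimate before running phase 22: multiply the "number of nonzeros in L+U" from the msglvl=1 statistics by 8 bytes. This is a minimal sketch that assumes one 8-byte double per factor entry, as observed in this thread; it is not a documented MKL formula, and the helper names are hypothetical. For 217919529 nonzeros it gives 1743356232 bytes, matching the 1.743E+9-byte .lnz file.

```c
#include <assert.h>
#include <stdint.h>

/* Expected out-of-core factor file size, assuming each factor entry is
   written as one 8-byte double (the relation observed in this thread). */
static uint64_t ooc_factor_bytes(uint64_t nnz_lu)
{
    return nnz_lu * 8u;
}

/* Bytes to decimal megabytes (10^6 bytes), rounded down. */
static uint64_t to_mb(uint64_t bytes)
{
    return bytes / 1000000u;
}
```

If the same rule holds for larger problems, the 7077888-equation matrix discussed later in the thread (9698925096 nonzeros in L+U) would need about 77.6 GB of free disk space for its factors.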

Re-running the program from scratch reproduces the error, so it seems that the problem size is on the edge of feasibility.

I am thinking of strategies to solve large problems:

1. In-program, as is done now. This means that part of the memory is occupied by the problem data and cannot be used by PARDISO.

2. Solving Ku=f by writing K and f to disk and invoking a stand-alone solver; this maximizes the memory available to PARDISO.

3. Doing more out of core. This will require more programming effort. Genny Fedorov suggests submitting a feature request (#5).
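Strategy 2 needs the matrix on disk in a form the stand-alone solver can read back quickly. A minimal sketch, assuming a hypothetical raw binary layout of our own (not an MKL or PARDISO file format): two 64-bit header fields n and nnz, then ia[n+1] and ja[nnz] as 32-bit integers and a[nnz] as doubles, in native byte order. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* CSR matrix: ia has n + 1 row pointers, ja and a have nnz entries. */
typedef struct {
    int64_t n, nnz;
    int32_t *ia, *ja;
    double *a;
} csr_t;

/* Write header, then ia, ja, a. Returns 0 on success, -1 on error. */
static int csr_write(FILE *f, const csr_t *m)
{
    assert(f != NULL && m != NULL);
    size_t np1 = (size_t)(m->n + 1), nz = (size_t)m->nnz;
    if (fwrite(&m->n, sizeof m->n, 1, f) != 1) return -1;
    if (fwrite(&m->nnz, sizeof m->nnz, 1, f) != 1) return -1;
    if (fwrite(m->ia, sizeof *m->ia, np1, f) != np1) return -1;
    if (fwrite(m->ja, sizeof *m->ja, nz, f) != nz) return -1;
    if (fwrite(m->a, sizeof *m->a, nz, f) != nz) return -1;
    return 0;
}

static void csr_free(csr_t *m)
{
    free(m->ia);
    free(m->ja);
    free(m->a);
    m->ia = m->ja = NULL;
    m->a = NULL;
}

/* Read a matrix written by csr_write; allocates the arrays. */
static int csr_read(FILE *f, csr_t *m)
{
    assert(f != NULL && m != NULL);
    m->ia = m->ja = NULL;
    m->a = NULL;
    if (fread(&m->n, sizeof m->n, 1, f) != 1) return -1;
    if (fread(&m->nnz, sizeof m->nnz, 1, f) != 1) return -1;
    if (m->n < 0 || m->nnz < 0) return -1;
    size_t np1 = (size_t)(m->n + 1), nz = (size_t)m->nnz;
    m->ia = malloc(np1 * sizeof *m->ia);
    m->ja = malloc(nz * sizeof *m->ja);
    m->a = malloc(nz * sizeof *m->a);
    if (!m->ia || (nz && (!m->ja || !m->a))) { csr_free(m); return -1; }
    if (fread(m->ia, sizeof *m->ia, np1, f) != np1 ||
        fread(m->ja, sizeof *m->ja, nz, f) != nz ||
        fread(m->a, sizeof *m->a, nz, f) != nz) {
        csr_free(m);
        return -1;
    }
    return 0;
}
```

A raw native-order dump is the simplest choice when the writer and the solver run on the same machine; a portable format would need explicit byte order and a version field.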

Switching to a more recent MKL version will give more room: http://software.intel.com/en-us/articles/pardiso-use-half-the-memory-now/

Currently, I only employ strategy 1. Is it possible to predict how much memory is required?

**Is it possible to predict how much memory is required?**

It is impossible to predict exactly how much memory the calculation requires, because it depends on the combination of the sparsity pattern and the type of the input matrix.

--Gennady

For example - with 10.2 u4, it will report: iparm[63] = 102000110

I expected that that would be the case, but was not sure. Thanks for the confirmation.

In a previous post (#6), the program finished with an error:

*** error pardiso (memory allocation) STRUC_FI, size to allocate: 362146752 bytes

total memory wanted here: 962126 kbyte

symbolic (max): 962126 symbolic (permanent): 2 real(incl. 1 factor)

Do I understand correctly that the program wants 962 MB and only 362 MB is available, in other words, that the program would work if the missing 600 MB of storage were available? Or is memory allocated multiple times during the solution, so that the program would succeed in this allocation only to fail at the next one?

In the first case, I could test the difference between desired and available memory, check whether it fits the stand-alone solver strategy, and tell the user to invoke that. If not, I can tell the user that his model is too large and its size must be reduced.

Is the regular PARDISO, which you can download from the PARDISO project homepage, also an out-of-core solver? Or is it in-core, with only Intel providing a specialized version that works out of core?

thanks

Gennady,

I read your response with great interest. You wrote: "if you are really interested in solving very big matrices like you wrote '*for genuinely denser or bigger matrices MKL can't help*', then could you please submit the Feature Request at <https://premier.intel.com/>".


Does this mean that you do have a separate out-of-core solver that can be accessed (with extra funding, I guess)? Is it on Intel's product list, or does Intel want to do it on a project-specific basis?

I have some very big matrix problems to solve: currently a 30,000 x 30,000 dense matrix to invert, and a sparse system of order 450,000 to solve. If submitting the request at premier.intel.com can help me solve the problem, I will certainly do it (I do have a Premier account). Please let me know.

Best regards,

Nan Deng

Hello,

what might this error be? Is it really to do with memory? I am able to run the same matrix (and even larger matrices with more than 1.4 x 10^9 elements) with the memory we have (64 GB) in OOC mode, and this particular matrix even in in-core mode. The difference here is that my inputs (matrix values, ia, ja, rhs values, etc.) are memory-mapped. I can run smaller matrices with the inputs memory-mapped in the same way (i.e. with the same executable), but at this size I get the memory error below. It can hardly be about memory, since there should now be even more memory available because the inputs are memory-mapped, and without memory mapping I can run this matrix and the larger ones mentioned above. The negative numbers in the message from PARDISO seem to indicate some sort of overflow, though I haven't reached the limits of an int yet (next on my agenda is to move to pardiso_64 and work with a matrix that has more than 2 x 10^9 non-zeros).

Close to 62 GB of memory is free on the machine when this happens, so this is not a real memory issue. Is the error message incorrect?

Unsuccessful run, message:

gcc -o pardiso pardiso_sym_c.c -I/home/sudha.rangan.ctr/intel-beta/mkl/include -L/home/sudha.rangan.ctr/intel-beta/mkl/lib/include -L/home/sudha.rangan.ctr/intel-beta/mkl/lib/intel64 -L/home/sudha.rangan.ctr/intel-beta/lib/intel64 -liomp5 -lmkl_solver_lp64 -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -lpthread -lm

pardiso_sym_c.c: In function :

pardiso_sym_c.c:91: warning: incompatible implicit declaration of built-in function

pardiso_sym_c.c:92: warning: incompatible implicit declaration of built-in function

[sudha.rangan.ctr@slot04 pardiso]$ !./pardiso

./pardiso matrix512b

You entered matrix512b

Nonzero elements: 361304064 Size (number of equations): 884736

first value = -4422846.000000

first ia index = 0

first ja index = 0

first rhs = -0.121058

a0: -4.422846e+06 a_end: -9.250865e+03

ia 0: 0 ai end: 361303848

ja 0: 0 ja end: 884735

b 0: -0.121058 b end: -0.004208

ooc_max_core_size got by Env = 54000

The file ./pardiso_ooc.cfg was not opened

*** Error in PARDISO ( insufficient_memory) error_num= -206

*** Error in PARDISO memory allocation: FACT_ADR, size to allocate: -1404534784 bytes

total memory wanted here: -1364687 kbyte

symbolic (max): -1364687 symbolic (permanent): 0

real(including 1 factor): 0

Peak Mem needed... 0

ERROR during symbolic factorization: -2
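The negative allocation size in this log is consistent with 32-bit wraparound: 8 bytes x 361304064 nonzeros = 2890432512 bytes, which exceeds 2^31 - 1, and reinterpreted as a signed 32-bit integer it becomes exactly -1404534784, the FACT_ADR size printed above. The helper below (a hypothetical name) reproduces only that arithmetic, without relying on undefined signed overflow; it says nothing about PARDISO's internals, and the poster later traced the failure to C-style indexing.

```c
#include <assert.h>
#include <stdint.h>

/* Value a byte count would show if stored in a signed 32-bit integer
   and wrapped (two's complement), computed with well-defined 64-bit math. */
static int64_t as_wrapped_int32(uint64_t bytes)
{
    uint64_t low = bytes & 0xFFFFFFFFu;            /* keep the low 32 bits */
    return low >= 0x80000000u ? (int64_t)low - 0x100000000LL : (int64_t)low;
}
```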

I can send or post more information (output of successful runs, code, etc.). I am using 10.3.0 beta.

Thanks, and any help is appreciated. We do have an older purchased version (2.x) of MKL, but I am currently using an evaluation copy of 10.3.0 beta.

Sudha Rangan

Hello - I think I have found the answer to my question. It has something to do with C-style indexing. Since I was using memory-mapped files, I decided to use C-style indexing and leave my input as I had generated it (instead of converting it to Fortran style). For some reason that causes the problem.
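PARDISO's default input convention is 1-based (Fortran-style) CSR. Current MKL documentation describes an iparm switch for zero-based input, but whether the 10.3 beta honoured it is not established by this thread. Converting the arrays to 1-based indexing sidesteps the question. A minimal in-place sketch, assuming 64-bit index arrays whose first row pointer is 0 (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 0-based CSR matrix to 1-based (Fortran-style) indexing in place:
   every row pointer and every column index grows by one. */
static void csr_to_one_based(int64_t n, int64_t *ia, int64_t *ja)
{
    assert(n >= 0 && ia[0] == 0);      /* must currently be 0-based */
    int64_t nnz = ia[n];               /* with 0-based pointers, ia[n] == nnz */
    for (int64_t i = 0; i <= n; ++i) ia[i] += 1;
    for (int64_t k = 0; k < nnz; ++k) ja[k] += 1;
}
```

For memory-mapped inputs this touches every page once; mapping the file with write access (or converting once when the file is generated) avoids keeping a second copy.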

Thanks,

Sudha

Hello, I'm using 10.3.0.050 (beta). I got past the last problem as described above. I then used the 64-bit interface (compiled with the MKL_ILP64 flag and passed MKL_INT and MKL_INT* to the pardiso call) for an even larger matrix, and it died when iparm[1] (C-style indexing; the fill-in reducing ordering for the input matrix) was 2 (METIS). I changed it to 0 (minimum degree) and it ran, producing seemingly correct results. PARDISO looks very promising, and we intend to use it for problems of much, much greater size (10^10 unknowns). However, PARDISO holds its own copy of the matrix in memory (couldn't the input matrix we supply be reused? The input matrix I supply is memory-mapped), so I need to figure out how to become an Intel Premier customer and request that the matrix, and not only the factors, be handled out of core too. We may still be able to use PARDISO as is for our problem scope on supercomputers, but it would be nice to use it on machines with 64-256 GB of memory for our problem size.

I also tried running an even larger problem, and I keep getting the -180 error during reordering. Other threads suggest this may be caused by linking the wrong libraries, but I have tried every possible combination and still get this error. I also tried both 10.2.6.038 (with which I get the -800 insufficient-memory error) and 10.3.0 beta. Does anyone have an idea what this might be? Could it be related to the large sizes? Can PARDISO really not handle these sizes, when it did handle half of them (in number of rows and NNZ)? Again, my input matrix is memory-mapped, I am using OOC (iparm[59], C-style, set to 2), and I have 63+ GB of memory available. My matrix occupies only about 46 GB, and since it is memory-mapped, I would expect PARDISO to have enough memory to hold its copy of the matrix, given that the factors are OOC.
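Whether the 64-bit (ILP64) interface is required can be checked up front: with 32-bit MKL_INT, n and every row pointer must fit in a signed 32-bit integer, and with 1-based CSR the last row pointer equals nnz + 1. A minimal sketch (hypothetical helper, assuming 1-based CSR): the largest matrix solved in this thread (3538944 rows, 1445216255 nonzeros) still fits, while the 2890432512-nonzero matrix does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 1-based CSR matrix with n rows and nnz nonzeros can be passed
   through 32-bit (LP64) integer arrays: n and the last row pointer,
   nnz + 1, must both be at most INT32_MAX (2147483647). */
static bool fits_lp64(int64_t n, int64_t nnz)
{
    assert(n >= 0 && nnz >= 0);
    return n <= INT32_MAX && nnz + 1 <= INT32_MAX;
}
```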

You entered matrix4096bf

Nonzero elements: 2890432512 Size (number of equations): 7077888

first value = -4422846.000000

first ia index = 1

2nd ia index = 217

first ja index = 1

first rhs = -0.121058

a0: -4.422846e+06 a_end: -9.250865e+03

ia 0: 1 ai end: 2890432297

ja 0: 1 ja end: 7077888

b 0: -0.121058 b end: -0.004208

ooc_max_core_size got by Env = 256000

The file ./pardiso_ooc.cfg was not opened

*** Error in PARDISO ( reordering_phase) error_num= -180

*** error PARDISO: reordering, symb. factorization

================ PARDISO: solving a real struct. sym. system ================

Summary PARDISO: ( reorder to reorder )

================

Times:

======

Time fulladj: 283.975595 s

Time reorder: 5.503694 s

Time symbfct: 23.955411 s

Time malloc : 269.677817 s

Time total : 602.334729 s total - sum: 19.222212 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 7077888

#non-zeros in A: 2890432511

non-zeros in A (%): 0.005770

#right-hand sides: 1

< Factors L and U >

< Preprocessing with multiple minimum degree, tree height >

< Reduction for efficient parallel factorization >

#columns for each panel: 72

#independent subgraphs: 0

#supernodes: 125067

size of largest supernode: 810

number of nonzeros in L 5062328928

number of nonzeros in U 4636596168

number of nonzeros in L+U 9698925096

ERROR during symbolic factorization: -3

Could this really be related to the large number of non-zeros, even though I'm using the 64-bit interface?

If an Intel engineer is reading this, I would love to get a response.

Thanks,

Sudha

Hello!

This failure is expected in 10.3.0 beta, because the ILP64 version of METIS is only available starting with 10.3.0 Gold and 10.2.6. Could you provide us with the failure log from MKL 10.2.6? How much swap space does your system have? METIS uses additional memory to reorder the input matrix, so there is probably not enough memory for it.

Sergey, thank you for your response. We actually don't have any swap space set up on the machine, we have 64GB of RAM and that's it - I can make sure we set up enough swap space on the machine (maybe 128GB or more). But using the minimum degree algorithm instead of METIS also did not work, should that be the case (probably is if that also needs a significant amount of memory)?

So if we purchase the latest version of MKL, will it be 10.3.0 Gold?

With METIS, I get a segmentation fault (immediately after the message about not opening the file ./pardiso_ooc.cfg).

With minimum degree, the error message is the same as the one posted above. I will try again as soon as we have swap space set up.

Thanks,

Sudha

Sudha, in MKL 10.3.0 beta minimum degree could not work either.

As far as I know, the latest available versions are MKL 10.2.6 and 10.3 beta (http://software.intel.com/en-us/forums/intel-math-kernel-library/ ).

My recommendations are:

1) Use MKL10.2.6

2) Set 128G swap

3) Set MKL_PARDISO_OOC_MAX_CORE_SIZE to no more than the size of free RAM. (As I see it, the input matrix is about 48 GB, so free RAM is only about 12000 MB.)

4) Print iparm(57) and iparm(64) after the reordering step and provide us with the log.
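Recommendation 3 can be computed in the program rather than by hand. A minimal sketch, assuming sizes in decimal MB (the unit the log shows, e.g. "ooc_max_core_size got by Env = 54000") and a hypothetical safety reserve; the helper names are illustrative, and the assumption that PARDISO reads the variable when it starts means it should be set before the first PARDISO call (or in the shell). For reference, the CSR arrays of the 7077888-equation matrix with 8-byte values and 8-byte indices come to 46303543304 bytes, about 46304 MB, consistent with the 46 GB mentioned earlier.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule of thumb: RAM left after the input matrix and a
   safety reserve, all in MB; 0 means the input alone does not fit. */
static int64_t ooc_core_size_mb(int64_t ram_mb, int64_t matrix_mb,
                                int64_t reserve_mb)
{
    assert(ram_mb >= 0 && matrix_mb >= 0 && reserve_mb >= 0);
    int64_t left = ram_mb - matrix_mb - reserve_mb;
    return left > 0 ? left : 0;
}

/* Export the value via POSIX setenv (overwriting any existing value).
   Returns 0 on success. */
static int set_ooc_core_size(int64_t mb)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%lld", (long long)mb);
    return setenv("MKL_PARDISO_OOC_MAX_CORE_SIZE", buf, 1);
}
```

A result of 0 signals the situation described at the start of the thread: OOC mode keeps only the factors on disk, so it cannot help when the input matrix itself does not fit in RAM.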

An additional question: could you vary the problem size? What is the largest problem you can solve with PARDISO ILP64?

Sergey - again thanks for your response. I will be sure to use 10.2.6, follow your suggestion on setting the MKL_PARDISO_OOC_MAX_CORE_SIZE and see what happens (as soon as we have our swap space set up).

The largest size of problem I have been able to solve thus far on this hardware:

Number of rows: 3538944

Number of non-zeros: 1445216255

As you can see, I am simply doubling this matrix to get the one that fails. These are real, structurally symmetric matrices.

Since I hadn't reached the ~2^31 limit on NNZ yet, I was hoping that this run was not uncovering any additional int (vs. int64) problem; hopefully it isn't, and I will find out as soon as we can do the larger run. It is harder for me to produce a matrix closer to 2^31 non-zeros than the larger one with > 2.8 x 10^9 non-zeros; otherwise I would try a matrix larger than the one that ran successfully but smaller than the one I'm trying to run. (Our ultimate aim is to solve a matrix with closer to 500 x 10^9 non-zeros.)

Thanks,

Sudha

Sergey - hello. It took a while to follow up, but we recently obtained the resources to re-run this test. On a machine with 500 GB of RAM (no swap space, but a huge amount of RAM), using iparm(2) = 2 (the nested dissection algorithm from METIS; iparm[1] with C-style indexing), the matrix with > 2.8 x 10^9 non-zeros and > 7 x 10^6 rows ran successfully and very quickly (under half an hour, I think), producing seemingly correct results. However, the same matrix still cannot be run in out-of-core mode, regardless of the reordering algorithm (METIS, minimum degree, etc.). I get either the same error as reported before or a segmentation fault, as before. (Smaller matrices are fine.) It seems that in OOC mode there is some problem when we cross the 32-bit threshold of 2^31 (about 2.15 x 10^9) non-zeros, even with MKL_PARDISO_OOC_MAX_CORE_SIZE set to 256000 and so on; we now have enough memory to set it that high.

Thanks,Sudha

Do you get the same error if MKL_PARDISO_OOC_MAX_CORE_SIZE = 20000, 40000 or 80000?

Best regards,

Sergey Solovev