Pardiso hangs or crashes intermittently

The sparse PARDISO solver doesn't seem to be fully functional yet. It sometimes hangs, as shown in the attached screenshot, or crashes with other input matrices. My machine is a 2.4 GHz dual core running 64-bit Ubuntu, and I'm using Intel MKL 10.1.

I have attached a makefile and C files for Linux compilation (pardiso.tar.gz). To reproduce the problem, run the executable with the Matrix Market (mtx) file at the following URL.

http://www.cise.ufl.edu/research/sparse/matrices/HB/orsirr_2.html

$ tar -xvzf pardiso.tar.gz
$ cd pardiso
$ make
$ ./pardiso ~/Desktop/orsirr_2.mtx

I also found that with the following matrix market file Pardiso crashes on 64-bit Windows.
http://www.cise.ufl.edu/research/sparse/matrices/Sandia/fpga_dcop_04.html

Is this a known problem? Is there any workaround? Thank you for your time.

Jaewon

Attachments:
orsirr_2.jpg (99.38 KB)
pardiso.tar.gz (6.65 KB)

Please let me know the package ID of your version.
Look at the /doc/mklsupport.txt file; you will find something like Package ID: l_mkl_p_10.1.0.015
--gennady

Dear Jaewon,

It seems to me this is a known problem. MKL PARDISO depends on GNU libc, and the problem appears on small general matrices (matrix type 11) in parallel mode (MKL_NUM_THREADS > 1), and only when the libc version is 2.5 or newer.

Could you please check your libc version by typing
> /lib64/libc.so.6
GNU C Library stable release version 2.9 (20081117), by Roland McGrath et al
...
and inform me of the version of libc installed on your Ubuntu?

There are a couple of simple workarounds for this particular problem. The first one is to set MKL_NUM_THREADS to one. Since the problem typically appears on very small matrices, parallelism doesn't give a significant performance advantage there. I'd like to note that in MKL 10.1 iparm[2] is ignored, and the parallel execution of MKL PARDISO is controlled by explicitly setting the MKL_NUM_THREADS environment variable or by using the mkl_set_num_threads routine. So the expression iparm[2] = mkl_get_max_threads() in your code is redundant for MKL 10.1.

The other workaround is to change PARDISO reordering scheme by setting

iparm[1] = 1;

With this setting, MKL PARDISO uses the multiple minimum degree (MMD) scheme instead of METIS, and the implementation of the MMD scheme is less dependent on libc functions.

Let me know if these workarounds don't help you.

We would be much obliged if you could give more details about the 64-bit Windows machine where you observed the other failure: CPU type, version of Visual Studio installed, compiler and linker used, as well as the full link line.

Thanks in advance
All the best
Sergey

Hi Gennady,

I'm not sure this is what you want. Let me know if it's not. Thanks.

Jaewon

MKLGetVersion(&ver);
printf(" MKL information :\n");
printf(" Version : %d.%d.%d\n", ver.MajorVersion,
ver.MinorVersion, ver.BuildNumber);
printf(" Product status : %s\n", ver.ProductStatus);
printf(" Build : %s\n", ver.Build);
printf(" Processor optimization : %s\n", ver.Processor);
printf(" CPU frequency = %.2f GHz\n", getcpufrequency());
printf(" Number of threads = %d\n", mkl_get_max_threads());

MKL information :
Version : 10.1.1
Product status : Product
Build : 082212.12
Processor optimization : Intel Core 2 Duo Processor
CPU frequency = 2.39 GHz
Number of threads = 2

Hi Sergey,

Here is the output.

jaewonj@jaewonjlx:~/code/c/pardiso$ /lib64/libc.so.6
GNU C Library stable release version 2.7, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.2.4 (Ubuntu 4.2.4-1ubuntu1).
Compiled on a Linux >>2.6.24-16-server<< system on 2008-09-12.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
For bug reporting instructions, please see:
.

1) Unless we really have to, we probably don't want to control the number of worker threads by using OMP_NUM_THREADS or MKL_NUM_THREADS. Also, I believe the very same problem occurs with larger sparse matrices.

2) MMD (iparm[1] = 0) does resolve the deadlock (?) problem, at least for this input matrix. I'm not sure, but I think most of the example C files from the MKL documentation and the author's website use the nested dissection ordering, since it's better when the application is multi-threaded, though minimum degree typically introduces a small number of fill-ins for most sparsity patterns. However, I will definitely fix the value of iparm[1] to 0 if we can guarantee there will be no hang problem with minimum degree ordering.

3) My Windows machine:

CPU : Intel Xeon E5410 @ 2.33GHz 8-cores
OS : 64-bit Windows Vista Business w/ service pack 1
Visual Studio 2008 professional w/ service pack 1

VC linker -> Input -> Additional Dependencies :
mkl_solver_lp64.lib
mkl_core_dll.lib
mkl_cdft_core_dll.lib
mkl_intel_lp64_dll.lib
mkl_intel_thread_dll.lib
libiomp5md.lib

/O2 /Oi /GL /I ".\src" /I "..\..\Components\MKL\10.1\Windows-x86-64\vendor" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Gy /Fo"x64\Release\\" /Fd"x64\Release\vc90.pdb" /W3 /nologo /c /Zi /TP /errorReport:prompt

I also tried /MT.

/OUT:"C:\Users\jaewonj\code\c\pardiso\x64\Release\pardiso.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\pardiso.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"c:\Users\jaewonj\code\c\pardiso\x64\Release\pardiso.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

Again, thank you so much for your time.

Jaewon

Hi Sergey,

http://www.cise.ufl.edu/research/sparse/matrices/vanHeukelum/cage8.html

The above matrix causes pardiso to hang on 64-bit Linux while it causes pardiso to crash on 64-bit Windows with iparm[1]=2. So I guess this issue is not just due to gnu libc.

Jaewon

Dear Jaewon,

Thank you very much for providing more details about Windows.

Under Linux, GNU libc is used as well. Let us look at the output of this command under 64-bit Linux:

$ ldd pardiso
libmkl_intel_lp64.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_intel_lp64.so (0x0000002a95557000)
libmkl_intel_thread.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_intel_thread.so (0x0000002a958b3000)
libmkl_core.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_core.so (0x0000002a965dc000)
libiomp5.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libiomp5.so (0x0000002a967cf000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003394d00000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003394700000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003394400000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003394900000)
/lib64/ld-linux-x86-64.so.2 (0x0000003394000000)

Sometimes the failures are sporadic; for example, they might disappear after rebooting. So the root cause of both failures is the same.

We have intensively tested iparm[1]=1 (MMD reordering) under Linux with different libc versions and we have not found any failures.

All the best
Sergey

Hi Sergey,

Thanks for the update. I will run some sanity tests with iparm[1]=1 and get back to you if I find any problems.

Two questions.

1) Have you also run the Pardiso sanity test using iparm[1]=1 on Windows and Mac?

2) The MKL reference manual (page 2353) says iparm[1] = 0 is for minimum degree while iparm[1] = 2 is for nested dissection. I cannot find any explanation of setting iparm[1] equal to 1. However, given your recommendation to use iparm[1]=1 instead of iparm[1]=0, I take it iparm[1]=1 is better than iparm[1]=0. Am I right?

Jaewon

Hi Sergey,

Unfortunately, setting iparm[1] equal to 1 introduced another problem under 64-bit Linux. PARDISO crashes with the following mtx file. You will see that this is not a sporadic crash.

http://www.cise.ufl.edu/research/sparse/matrices/MathWorks/Pd.html

iparm[1] = 0 : OK
iparm[1] = 1 : Segmentation fault
iparm[1] = 2 : OK

Now I'm really confused. I guess there is no single pattern for the crash and hang problems. Maybe the issue should be forwarded to the author for a thorough investigation.

Jaewon

Dear Jaewon,

Sorry, I made a typo. iparm[1]=1 selects the constrained minimum degree (CMD) ordering, which is different from MMD. So the MKL reference manual is correct.

Thank you very much for finding the issue with Pd.mtx. By the way, PARDISO passed with MKL_NUM_THREADS=1. So it looks like MKL_NUM_THREADS=1 is the only workable workaround for small matrices.

As for iparm[1]=0 and iparm[1]=2, Gennady Fedorov will inform you about the availability of binaries with the fix for glibc.

Thank you very much again
All the best
Sergey

Quoting - Gennady Fedorov (Intel)

Hi Jaewon,
We have uploaded to the Intel FTP server a package that contains the fixes for the problems you reported.
Please try to get it from our FTP server: ftp.intel.com:/pub/outgoing/mklnightly_1022_20090609_win.tgz

P.S.
md5sum:
9253f6a67d5414ab2f31f8046204adec *./mklnightly_1022_20090609_win.tgz

P.P.S.
Jaewon, one comment: there is one restriction regarding our FTP server.
Downloads are available for 24 hours only.

Please let me know if there is any problem.

--Gennady

Hi Gennady,

I got the files. Thanks. I will get back to you if I encounter any problems.

Jaewon

Quoting - Gennady Fedorov (Intel)

Hi Jaewon,
please give me one business day; I will check the problem on my side. One more question: I should use the zip you sent in post #25, is that right?
--Gennady

Hi Gennady,

If you mean the "pardiso.tar.gz" I sent before, as you know, the archive contains a simple driver that reads any mtx file as it is. However, the two screenshots I sent yesterday were generated by a different program.

I'm sure you also have code on your side that reads any mtx file and converts it into an upper or lower triangular matrix.

Jaewon

For those who are interested, we should mention (because this is an MKL hot-topic issue) that the problem discussed in this thread has already been fixed, and the fix is available in the latest MKL version, 10.2 Update 3.

--Gennady
