jaewonj
June 9, 2009 8:53 AM PDT
Pardiso hangs or crashes intermittently

The sparse PARDISO solver doesn't seem to be fully functional yet. It sometimes hangs as shown in the attached screen shot or crashes with other input matrices. My machine is 2.4GHz dual core running 64-bit Ubuntu and I'm using Intel MKL 10.1.

I have attached makefile and c files for Linux compilation (pardiso.tar.gz). To reproduce the problem, run the executable with the matrix market (mtx) file at the following URL.

http://www.cise.ufl.edu/research/sparse/matrices/HB/orsirr_2.html

$ tar -xvzf pardiso.tar.gz
$ cd pardiso
$ make
$ ./pardiso ~/Desktop/orsirr_2.mtx

I also found that with the following matrix market file Pardiso crashes on 64-bit Windows.
http://www.cise.ufl.edu/research/sparse/matrices/Sandia/fpga_dcop_04.html

Is this a known problem? Is there any workaround? Thank you for your time.

Jaewon

Gennady Fedorov (Intel)
June 9, 2009 11:06 PM PDT
#1
Please let me know the package ID of your version.
Look at the <mklroot>/doc/mklsupport.txt file; you will find something like "Package ID: l_mkl_p_10.1.0.015" there.
--gennady



Sergey Kuznetsov (Intel)
June 10, 2009 5:56 AM PDT
#2
Dear Jaewon,

This looks like a known problem. MKL PARDISO depends on GNU libc, and the problem appears on small general matrices (matrix type 11) in parallel mode (MKL_NUM_THREADS > 1), and only when the libc version is 2.5 or newer.

Could you please check your libc version by running

$ /lib64/libc.so.6
GNU C Library stable release version 2.9 (20081117), by Roland McGrath et al
...

and tell me which libc version is installed on your Ubuntu machine?
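
The same check can also be done from inside a program. Below is a minimal sketch, assuming a glibc-based Linux system; `version_at_least` and `running_glibc_version` are hypothetical helper names, and the 2.5 threshold is simply the one quoted in this post. `gnu_get_libc_version()` is the glibc call that reports the version of the library the process is actually running against.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* gnu_get_libc_version(), glibc only */
#endif

/* Hypothetical helper: returns 1 if a version string such as "2.7" or
 * "2.3.4" is at least major.minor, 0 if it is older, -1 if unparsable. */
int version_at_least(const char *version, int major, int minor)
{
    int maj = 0, min = 0;

    assert(version != NULL);
    if (sscanf(version, "%d.%d", &maj, &min) < 1)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}

/* Hypothetical helper: version string of the running glibc,
 * or NULL when the program was not built against glibc. */
const char *running_glibc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return NULL;
#endif
}
```

A caller could, for example, log a warning when `version_at_least(running_glibc_version(), 2, 5)` returns 1 and more than one thread is requested.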

There are a couple of simple workarounds for this particular problem. The first is to set MKL_NUM_THREADS to one. Since the problem typically appears on very small matrices, parallelism doesn't give a significant performance advantage there. Note that in MKL 10.1 iparm[2] is ignored, and parallel execution of MKL PARDISO is controlled by explicitly setting the MKL_NUM_THREADS environment variable or by calling the mkl_set_num_threads routine. So the expression iparm[2] = mkl_get_max_threads() in your code is redundant for MKL 10.1.
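
As a rough illustration of the first workaround, the sketch below sets the environment variable from inside the program instead of from the shell. `request_single_threaded_mkl` is a hypothetical helper name, `setenv` is POSIX (so this does not apply to Windows as written), and the sketch assumes it runs before the first MKL call; whether MKL picks up a value set this late is not confirmed in this thread. Calling the `mkl_set_num_threads` routine mentioned above with an argument of 1 is the other route.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ask for sequential MKL execution through the
 * environment. Assumption: called before any MKL routine is used.
 * Returns 0 on success, -1 if the variable could not be set. */
int request_single_threaded_mkl(void)
{
    /* Third argument 1 = overwrite any value already in the environment. */
    if (setenv("MKL_NUM_THREADS", "1", 1) != 0)
        return -1;
    return 0;
}
```

From the shell, the equivalent is to prefix the command, as in `MKL_NUM_THREADS=1 ./pardiso ~/Desktop/orsirr_2.mtx`.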

The other workaround is to change PARDISO reordering scheme by setting 
 
iparm[1] = 1;

With iparm[1] = 1, MKL PARDISO uses the multiple minimum degree (MMD) scheme instead of METIS, and the implementation of the MMD scheme is less dependent on libc functions.

Let me know if these workarounds don't  help you.  

We would be much obliged if you could give more details about the 64-bit Windows system where you observed the other failure: CPU type, version of Visual Studio installed, compiler and linker used, as well as the full link line.

Thanks in advance
All the best
Sergey  

jaewonj
June 10, 2009 11:57 AM PDT
#3 Reply to #1
Hi Gennady,

I'm not sure this is what you want. Let me know if it's not. Thanks.

Jaewon


/* Assumes #include <stdio.h> and #include "mkl.h" */
MKLVersion ver;

/* Query the MKL build information and print it */
MKLGetVersion(&ver);
printf(" MKL information :\n");
printf(" Version : %d.%d.%d\n", ver.MajorVersion,
       ver.MinorVersion, ver.BuildNumber);
printf(" Product status : %s\n", ver.ProductStatus);
printf(" Build : %s\n", ver.Build);
printf(" Processor optimization : %s\n", ver.Processor);
printf(" CPU frequency = %.2f GHz\n", getcpufrequency());
printf(" Number of threads = %d\n", mkl_get_max_threads());


MKL information :
Version : 10.1.1
Product status : Product
Build : 082212.12
Processor optimization : Intel(R) Core(TM) 2 Duo Processor
CPU frequency = 2.39 GHz
Number of threads = 2




jaewonj
June 10, 2009 12:55 PM PDT
#4 Reply to #2

Hi Sergey,

Here is the output.

jaewonj@jaewonjlx:~/code/c/pardiso$ /lib64/libc.so.6
GNU C Library stable release version 2.7, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.2.4 (Ubuntu 4.2.4-1ubuntu1).
Compiled on a Linux >>2.6.24-16-server<< system on 2008-09-12.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.


1) Unless we really have to, we probably don't want to control the number of worker threads by using OMP_NUM_THREADS or MKL_NUM_THREADS. Also, I believe the very same problem occurs with larger sparse matrices.

2) MMD (iparm[1] = 0) does resolve the deadlock (?) problem, at least for this input matrix. I'm not sure, but I think most of the example C files from the MKL documentation and the author's website use the nested dissection ordering, since it's better when the application is multi-threaded, though minimum degree typically introduces a small number of fill-ins for most sparsity patterns. However, I will definitely fix the value of iparm[1] to 0 if we can guarantee there will be no hang problem with minimum degree ordering.

3) My Windows machine:

CPU : Intel Xeon E5410 @ 2.33GHz 8-cores
OS : 64-bit Windows Vista Business w/ service pack 1
Visual Studio 2008 professional w/ service pack 1

VC linker -> Input -> Additional Dependencies :
mkl_solver_lp64.lib
mkl_core_dll.lib
mkl_cdft_core_dll.lib
mkl_intel_lp64_dll.lib
mkl_intel_thread_dll.lib
libiomp5md.lib

/O2 /Oi /GL /I ".\src" /I "..\..\Components\MKL\10.1\Windows-x86-64\vendor" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Gy /Fo"x64\Release\\" /Fd"x64\Release\vc90.pdb" /W3 /nologo /c /Zi /TP /errorReport:prompt

I also tried /MT.

/OUT:"C:\Users\jaewonj\code\c\pardiso\x64\Release\pardiso.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\pardiso.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"c:\Users\jaewonj\code\c\pardiso\x64\Release\pardiso.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib


Again, thank you so much for your time.

Jaewon

 

 

 




jaewonj
June 10, 2009 3:59 PM PDT
#5 Reply to #2

Hi Sergey,

http://www.cise.ufl.edu/research/sparse/matrices/vanHeukelum/cage8.html

The above matrix causes PARDISO to hang on 64-bit Linux, while it causes PARDISO to crash on 64-bit Windows with iparm[1]=2. So I guess this issue is not just due to GNU libc.

Jaewon



Sergey Kuznetsov (Intel)
June 11, 2009 2:41 AM PDT
#6 Reply to #5
Dear Jaewon,

Thank you very much for providing more details about Windows.  

Under Linux, GNU libc is used as well. Let us look at the output of this command under 64-bit Linux:

$ ldd pardiso
        libmkl_intel_lp64.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_intel_lp64.so (0x0000002a95557000)
        libmkl_intel_thread.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_intel_thread.so (0x0000002a958b3000)
        libmkl_core.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libmkl_core.so (0x0000002a965dc000)
        libiomp5.so => /mkl/mkl_release/mkl1010_15/__release_lnx/lib/em64t/libiomp5.so (0x0000002a967cf000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003394d00000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003394700000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003394400000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003394900000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003394000000)

Sometimes the failures are sporadic; for example, they might disappear after rebooting. So the root cause of both failures is the same.

We have tested iparm[1]=1 (MMD reordering) intensively under Linux with different libc versions, and we have not found any failures.

All the best
Sergey

  

jaewonj
June 11, 2009 9:05 AM PDT
#7 Reply to #6

Hi Sergey,

Thanks for the update. I will run some sanity tests with iparm[1]=1 and get back to you if I find any problems.

Two questions.  

1) Have you also run the PARDISO sanity tests using iparm[1]=1 on Windows and Mac?

2) The MKL reference manual (page 2353) says iparm[1]=0 is for minimum degree while iparm[1]=2 is for nested dissection. I cannot find any explanation of setting iparm[1] equal to 1. However, given your recommendation to use iparm[1]=1 rather than iparm[1]=0, I assume iparm[1]=1 is the better choice. Am I right?
  
Jaewon


jaewonj
June 11, 2009 10:28 AM PDT
#8 Reply to #6

Hi Sergey,

Unfortunately, setting iparm[1] equal to 1 introduced another problem under 64-bit Linux. PARDISO crashes with the following mtx file. You will see that this is not a sporadic crash.

http://www.cise.ufl.edu/research/sparse/matrices/MathWorks/Pd.html

iparm[1] = 0 : OK
iparm[1] = 1 : Segmentation fault 
iparm[1] = 2 : OK

Now I'm really confused. I guess there is no single pattern for the crash and hang problems. Maybe the issue should be forwarded to the author for a thorough investigation.
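
For building a table like the one above across many matrices, it can help to run each solver attempt in a child process so that one segmentation fault or hang does not take down the whole test run. The following is a minimal POSIX sketch of that idea, not the driver from pardiso.tar.gz: `run_isolated`, `solve_fn` and the `demo_*` callbacks are hypothetical names, and a real callback would load the matrix and call PARDISO with the given iparm[1] value.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum run_result { RUN_OK, RUN_FAILED, RUN_CRASHED, RUN_TIMED_OUT, RUN_ERROR };

/* Callback type: returns 0 on success; the argument is the ordering
 * (the iparm[1] value) to try. */
typedef int (*solve_fn)(int ordering);

/* Run fn(ordering) in a forked child and classify how it ended. */
enum run_result run_isolated(solve_fn fn, int ordering, unsigned timeout_sec)
{
    int status;
    pid_t pid;

    assert(fn != NULL && timeout_sec > 0);
    pid = fork();
    if (pid < 0)
        return RUN_ERROR;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);   /* make sure SIGALRM terminates us */
        alarm(timeout_sec);         /* a hung child is killed by SIGALRM */
        _exit(fn(ordering) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return RUN_ERROR;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGALRM ? RUN_TIMED_OUT : RUN_CRASHED;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? RUN_OK
                                                           : RUN_FAILED;
}

/* Stand-ins for a real solver call, used only to exercise the harness. */
int demo_ok(int ordering)    { (void)ordering; return 0; }
int demo_crash(int ordering) { (void)ordering; raise(SIGSEGV); return 0; }
int demo_hang(int ordering)  { (void)ordering; for (;;) pause(); }
```

Looping `run_isolated` over orderings 0, 1 and 2 for each matrix then gives one OK / crashed / timed-out entry per combination, and repeating the loop helps separate sporadic failures from deterministic ones.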

Jaewon
    



Sergey Kuznetsov (Intel)
June 14, 2009 11:50 PM PDT
#9 Reply to #8
Dear Jaewon,

Sorry, I made a typo. iparm[1]=1 selects the constrained minimum degree (CMD) ordering, which is different from MMD. So the MKL reference manual is correct.
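
To keep the three values discussed in this thread apart in code, they can be given names. This is only a sketch: the enum labels and `set_reordering` are hypothetical, the meanings are the ones stated in the posts (0 and 2 as quoted from the reference manual, 1 as described just above), and plain `int` is assumed for the iparm array, as in an LP64 build. Setting iparm[0] to a nonzero value tells PARDISO not to fall back to its defaults, so a real program would have to fill in the remaining entries as well.

```c
#include <assert.h>
#include <stddef.h>

#define PARDISO_IPARM_LEN 64   /* PARDISO's iparm array has 64 entries */

/* Hypothetical labels; the numeric values are the ones named in the thread. */
enum reorder_scheme {
    REORDER_MIN_DEGREE        = 0,  /* minimum degree, per the manual     */
    REORDER_CONSTRAINED_MD    = 1,  /* CMD, per the post above            */
    REORDER_NESTED_DISSECTION = 2   /* nested dissection (METIS)          */
};

/* Hypothetical helper: store the chosen ordering in iparm[1].
 * Returns 0 on success, -1 (iparm untouched) for an unknown scheme. */
int set_reordering(int *iparm, int scheme)
{
    assert(iparm != NULL);
    if (scheme < REORDER_MIN_DEGREE || scheme > REORDER_NESTED_DISSECTION)
        return -1;
    iparm[0] = 1;        /* nonzero: use the supplied iparm values */
    iparm[1] = scheme;   /* fill-in reducing ordering */
    return 0;
}
```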

Thank you very much for finding the issue with Pd.mtx. By the way, PARDISO passed with MKL_NUM_THREADS=1. So it looks like MKL_NUM_THREADS=1 is the only workable workaround for small matrices.

As for iparm[1]=0 and iparm[1]=2, Gennady Fedorov will let you know when binaries with the fix for glibc are available.

Thank you very much again
All the best
Sergey 

jaewonj
June 19, 2009 7:28 AM PDT
#17 Reply to #16

Hi Jaewon,
We have uploaded a package containing fixes for the problems you reported to the Intel ftp server.
Please try to get it from our ftp server: ftp.intel.com:/pub/outgoing/mklnightly_1022_20090609_win.tgz

P.S.
md5sum:
9253f6a67d5414ab2f31f8046204adec *./mklnightly_1022_20090609_win.tgz

P.P.S.
Jaewon, one comment: there is one restriction on our ftp server.
Downloads are available for 24 hours only.

Please let me know if there are any problems.

--Gennady


Hi Gennady,

I got the files. Thanks. I will get back to you if I encounter any problems.

Jaewon



jaewonj
July 14, 2009 10:18 AM PDT
#30 Reply to #29
Quoting - jaewonj
Hi Jaewon,
Please give me one business day; I will check the problem on my side. One more question: should I use the zip you attached to post #25?
--Gennady


Hi Gennady,

If you mean the "pardiso.tar.gz" I sent before: as you know, that archive contains a simple driver that reads any mtx file as it is. However, the two screen shots I sent yesterday were generated by a different program.

I'm sure you also have code on your side that reads any mtx file and converts it into an upper or lower triangular matrix.
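
For readers without such code at hand, here is a minimal sketch of the conversion step only, not of either program mentioned above. It assumes the mtx entries have already been read into 1-based coordinate triplets (Matrix Market indices are 1-based), keeps the upper triangle, and writes 1-based CSR arrays with ascending column indices in each row, which is, as far as I know, the layout PARDISO expects for symmetric matrix types. `coo_to_upper_csr` is a hypothetical name; duplicate entries are not merged and missing diagonal entries are not added.

```c
#include <assert.h>
#include <stdlib.h>

/* Keep entries with row <= col from n-by-n coordinate data and store them
 * as 1-based CSR: ia has n + 1 entries, ja and a need room for nnz entries.
 * Returns the number of entries kept, or -1 if memory allocation fails. */
int coo_to_upper_csr(int n, int nnz,
                     const int *row, const int *col, const double *val,
                     int *ia, int *ja, double *a)
{
    int i, k, kept = 0;
    int *fill;

    assert(n > 0 && nnz >= 0);

    /* Pass 1: count upper-triangular entries of row r in ia[r]. */
    for (i = 0; i <= n; ++i)
        ia[i] = 0;
    for (k = 0; k < nnz; ++k) {
        assert(row[k] >= 1 && row[k] <= n && col[k] >= 1 && col[k] <= n);
        if (row[k] <= col[k]) {
            ia[row[k]]++;
            kept++;
        }
    }
    /* Prefix sum: ia[i] becomes the 0-based start offset of row i + 1. */
    for (i = 0; i < n; ++i)
        ia[i + 1] += ia[i];

    /* Pass 2: insert each entry into its row, keeping columns ascending. */
    fill = calloc((size_t)n, sizeof *fill);
    if (fill == NULL)
        return -1;
    for (k = 0; k < nnz; ++k) {
        int r, p;
        if (row[k] > col[k])
            continue;                      /* strictly lower part: skip */
        r = row[k] - 1;
        p = ia[r] + fill[r]++;             /* next free slot in this row */
        while (p > ia[r] && ja[p - 1] > col[k]) {
            ja[p] = ja[p - 1];             /* shift larger columns right */
            a[p] = a[p - 1];
            --p;
        }
        ja[p] = col[k];
        a[p] = val[k];
    }
    free(fill);

    /* Convert row pointers from 0-based offsets to 1-based indices. */
    for (i = 0; i <= n; ++i)
        ia[i] += 1;
    return kept;
}
```

Swapping the test `row[k] <= col[k]` for `row[k] >= col[k]` (and the matching skip condition) would keep the lower triangle instead.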

Jaewon


