I am beta testing Pardiso for Mac OS X. I am doing this on a Mac Pro that has dual Intel Xeon processors with 4 cores each (8 cores total) and 4 GB of RAM. The OS is Mac OS X Leopard 10.5.3.
My matrices are about 50,000 x 50,000 and quite sparse (about 3 million non-zero elements). They arise from nonlinear elasticity problems (using the Finite Element Method), and are in my opinion a very common kind of matrix to use with Pardiso.
*** Issue #1: small parallel speedup
I am able to make the direct solver work. However, the parallel speedup is small. I get about a 20% speedup when switching from 1 core to 2 cores. Adding more cores gives no further gain, and with 6 or 8 cores the solve is actually slower than on a single core. These timings include all of phases 1, 2, and 3.
num cores | solve time (seconds)
1 | 6.93
2 | 5.60
3 | 5.64
4 | 5.74
6 | 7.14
8 | 7.57
I always set the environment variable MKL_NUM_THREADS to the desired number of threads, and I am passing the same value for iparm(3). My computer is lightly loaded - I am not running anything else other than Pardiso.
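To make the numbers in the table easier to compare, here is a minimal sketch that turns them into speedup, parallel efficiency, and a Karp-Flatt estimate of the serial fraction. The timings are copied from the table; the function names (`speedup`, `efficiency`, `serial_fraction`) are illustrative only and are not part of MKL or Pardiso.

```python
# Timings copied from the table above: cores -> wall time in seconds (phases 1, 2, 3).
timings = {1: 6.93, 2: 5.60, 3: 5.64, 4: 5.74, 6: 7.14, 8: 7.57}


def speedup(timings, cores):
    """Speedup relative to the 1-core run: T(1) / T(cores)."""
    return timings[1] / timings[cores]


def efficiency(timings, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(timings, cores) / cores


def serial_fraction(timings, cores):
    """Karp-Flatt estimate of the serial fraction; only defined for cores > 1."""
    if cores < 2:
        raise ValueError("serial fraction needs at least 2 cores")
    inv_s = 1.0 / speedup(timings, cores)
    inv_p = 1.0 / cores
    return (inv_s - inv_p) / (1.0 - inv_p)


if __name__ == "__main__":
    print("cores  speedup  efficiency  serial fraction")
    for p in sorted(timings):
        frac = f"{serial_fraction(timings, p):.2f}" if p > 1 else "-"
        print(f"{p:5d}  {speedup(timings, p):7.2f}  {efficiency(timings, p):10.2f}  {frac:>15}")
```

With these timings the 2-core run has a speedup of about 1.24 and an estimated serial fraction of about 0.62. For 6 and 8 cores the estimate exceeds 1, which is not a real serial fraction; it only signals that overhead grows with the thread count.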
It would be very helpful if Intel provided sparse test matrices of non-trivial size (e.g., 50,000 x 50,000), together with performance numbers that Intel obtained with those matrices (most notably, the parallel speedup as a function of the number of cores), so that one can roughly know the expected speedup before committing to coding the interface to the solver.
I got excited about Intel's MKL Pardiso because of all the speedup claims, but saw little speedup in my case. It would be good to hear what speedups other people are getting, or whether there are any tricks to make it faster.
*** Issue #2: direct-iterative solver crashes
I am unable to make the direct-iterative solver work. It crashes inside the solver routine:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xa0b5ed80
0x904c8150 in strlen ()
#0 0x904c8150 in strlen ()
#1 0x000f1034 in pardiso_open_ooc_file_ ()
Cannot access memory at address 0xfdf8b55c
The out-of-core parameter iparm(60) is set to zero (I have no intention of using OOC), so I do not see why any OOC routine should be called. I just spent 5 hours trying many different iparm settings; I always get the crash. I am compiling with the GNU C/C++ compiler, in 32-bit mode, with LP64.
What I do is first call Pardiso with phase=11, then attempt to call it repeatedly with phase=23 (loading a different matrix each time, with the same sparsity pattern). The crash happens the first time I call it with phase=23. Judging from the elapsed time before the crash, it appears to occur after Pardiso has internally computed the first factorization.
I am using iparm(4)=62. My matrices are symmetric, and I am passing only the upper triangle, including the diagonal (all diagonal elements are set).
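For anyone trying to reproduce this setup, here is a small sketch of two sanity checks on the inputs described above. It assumes the encoding iparm(4) = 10*L + K as I understand it from the MKL Pardiso documentation (K selects the CGS variant, with K=2 intended for symmetric positive definite matrices, and the stopping tolerance is 10^-L), and it assumes 1-based CSR index arrays `ia` and `ja`. The helpers `decode_iparm4` and `check_upper_csr` are hypothetical names, not MKL functions, and the checks do not call Pardiso or explain the crash.

```python
def decode_iparm4(value):
    """Split iparm(4) = 10*L + K into (L, K, tolerance), assuming tolerance = 10**-L."""
    if value < 0:
        raise ValueError("iparm(4) must be non-negative")
    L, K = divmod(value, 10)
    if K not in (0, 1, 2):
        raise ValueError(f"unexpected K={K}; expected 0, 1, or 2")
    return L, K, 10.0 ** (-L)


def check_upper_csr(n, ia, ja):
    """Return a list of problems found in a 1-based upper-triangular CSR pattern."""
    if len(ia) != n + 1:
        return [f"ia has {len(ia)} entries, expected {n + 1}"]
    if ia[0] != 1 or ia[n] != len(ja) + 1:
        return ["ia must start at 1 and end at nnz + 1"]
    problems = []
    for row in range(1, n + 1):
        # Column indices stored for this row (convert 1-based pointers to slices).
        cols = ja[ia[row - 1] - 1 : ia[row] - 1]
        if not cols or cols[0] != row:
            problems.append(f"row {row}: diagonal entry missing or not first")
        if any(c < row or c > n for c in cols):
            problems.append(f"row {row}: column outside the upper triangle")
        if any(a >= b for a, b in zip(cols, cols[1:])):
            problems.append(f"row {row}: columns not strictly increasing")
    return problems


if __name__ == "__main__":
    # iparm(4)=62 from the post: L=6, K=2 under the assumed encoding.
    print(decode_iparm4(62))
    # Hypothetical 3 x 3 pattern: row 1 -> cols 1, 3; row 2 -> col 2; row 3 -> col 3.
    print(check_upper_csr(3, [1, 3, 4, 5], [1, 3, 2, 3]))
```

Under that assumed encoding, iparm(4)=62 decodes to L=6 and K=2, i.e. a CGS tolerance of 1e-6. An empty list from `check_upper_csr` only means the sparsity pattern matches the upper-triangle-with-diagonal convention; it says nothing about the numerical values.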
It would be very helpful if a code example were provided illustrating the direct-iterative solver.
Also, the Pardiso section of the manual is currently not always easy to read. More space between the iparm() paragraphs would help.
Pardiso issues (small parallel speedup, direct-iterative solver crashes)