I'm trying the PARDISO solver in MKL v9.0 on a
shared-memory SGI Altix machine with Itanium 2 CPUs. My matrix is
symmetric positive definite, with sizes up to 350000 x 350000, and I
have tried OMP_NUM_THREADS values up to 12. Jobs run through a batch system,
so there is definitely no one else using the CPUs I'm running on.
However, as I increase the number of threads, the performance
degrades, i.e. the fastest execution is with one CPU. Other parts of
the code scale well with OpenMP; only the PARDISO calls (all phases) do not.
I use -O3 -openmp -mtune=itanium2 as compiler flags and link the
following libraries: -lstdc++ -lmkl_solver -lmkl_lapack -lmkl_ipf
-lmkl_lapack64 -lmkl -lvml -lguide -lpthread.
I have in mkl.cfg:
MKL_SERIAL = OMP
MKL_INPUT_CHECK = OFF
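For anyone who wants to reproduce the comparison, here is a minimal sketch of how the wall time per thread count could be collected from outside the program. It is only an illustration, not the harness I actually used: the function names time_with_threads and speedups are made up for this example, and it assumes the program under test reads OMP_NUM_THREADS from its environment.

```python
import os
import subprocess
import time


def time_with_threads(cmd, thread_counts):
    """Run cmd once per thread count; return {threads: wall-clock seconds}."""
    timings = {}
    for n in thread_counts:
        # Copy the current environment and override only OMP_NUM_THREADS.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        timings[n] = time.perf_counter() - start
    return timings


def speedups(timings):
    """Speedup relative to the 1-thread run; values below 1 mean a slowdown."""
    base = timings[1]
    return {n: base / t for n, t in timings.items()}
```

A call such as time_with_threads(["./solver_test"], [1, 2, 4, 8, 12]), where ./solver_test is a hypothetical executable name, would give one timing per thread count. This measures the whole run, so timing the individual PARDISO phases still has to be done inside the program.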
Is there anything fundamental I'm missing or not doing right? Could anyone point me in the right direction?
Thank you in advance,