I maintain the Elk code (elk.sourceforge.net) and I need some help with parallelism.
The Elk code contains nested MPI and OpenMP regions (up to four levels deep in places), and within these are calls to LAPACK.
The code runs fine on our new Intel X5650 cluster (I've tested it on up to 240 cores, running across 20 nodes with 12 cores each). The problem is that using threaded MKL together with OpenMP spawns many more threads than there are cores ('top -H' reports some of them running at only 5%), which in some cases makes the code run more slowly than with non-threaded MKL. I've tried many combinations of the MKL and OpenMP environment variables, but none of them seems to work properly.
Here is the most successful combination of variables:
export OMP_NUM_THREADS=12
export OMP_NESTED=true
export OMP_MAX_ACTIVE_LEVELS=4
export OMP_DYNAMIC=true
export MKL_NUM_THREADS=12
export MKL_DYNAMIC=false
...and here are the Fortran linker command line options:
-L/cluster/intel/mkl/lib/intel64/ /cluster/intel/mkl/lib/intel64/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread
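For a sense of scale, here is a rough sketch of the worst-case number of threads the settings above could request per MPI process. It assumes every active OpenMP level gets its full OMP_NUM_THREADS team and every MKL call its full MKL_NUM_THREADS, which OMP_DYNAMIC=true and the runtime's own limits may well reduce in practice. The function worst_case_threads is a hypothetical helper written for this post, not part of Elk, OpenMP or MKL.

```shell
#!/bin/sh
# Hypothetical helper: theoretical upper bound on concurrently requested threads.
# Usage: worst_case_threads OMP_THREADS ACTIVE_LEVELS MKL_THREADS
# Assumption: each nested OpenMP level spawns a full team, and each innermost
# thread then calls threaded MKL with its full thread count.
worst_case_threads() {
    omp=$1
    levels=$2
    mkl=$3
    total=1
    i=0
    # Multiply by the team size once per active nesting level.
    while [ "$i" -lt "$levels" ]; do
        total=$((total * omp))
        i=$((i + 1))
    done
    # Each innermost OpenMP thread may request a full MKL team.
    echo $((total * mkl))
}

worst_case_threads 12 1 12   # one active OpenMP level plus MKL: prints 144
worst_case_threads 12 4 12   # four active levels plus MKL: prints 248832
```

Even with a single active OpenMP level, that is 144 requested threads on a 12-core node, which is consistent with the many low-usage threads I see in 'top -H'.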
Ideally, MKL would create new threads only when there are idle cores.
Is there some way of doing this?
(Max Planck Institute, Halle)