I am confused about how MKL executes in parallel. After I made some changes to a program, calls to DGBTRS, DGETRS, DGETRF and DGBTRF are no longer executed in parallel by MKL, even though I am using the compiler option /Qmkl:parallel.
Let me explain a little more. I have the following code structure for solving a medium-sized set of ODEs (~5000 differential equations).
PROGRAM
   - allocates space, etc.
   CALL ODE_SOLVER
   - save, cleanup, etc.
END PROGRAM

SUBROUTINE ODE_SOLVER
   CALL USER_ODES
   ...
   CALL DGBTRF
   CALL DGBTRS
   CALL DGETRF
   CALL DGETRS
   ...
   RETURN
END SUBROUTINE ODE_SOLVER

SUBROUTINE USER_ODES
   - this is where I made some changes, in particular larger vector/matrix multiplications
   - note, however, that the ODE system size has not changed, so ODE_SOLVER sees exactly the same problem as before
END SUBROUTINE USER_ODES
In the initial version of the program, the above LAPACK calls made within ODE_SOLVER were executed in parallel by MKL, and I got a very good speedup in execution time (across 8 cores). I then made some changes to USER_ODES, but I did not change the size of the ODE system, so ODE_SOLVER was effectively solving the same problem. However, USER_ODES now allocates larger matrices to compute the ODEs.
The problem is that, after the changes to USER_ODES, the calls to the LAPACK routines stopped executing in parallel (I only get serial execution). If I use the Intel Fortran compiler option /Qparallel, all cores become busy, but performance is terrible.
Sorry this is not much to go on. My guess is that USER_ODES is now generating multiple threads, and this prevents MKL from producing parallel threads for the LAPACK calls. Any suggestions?
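To help narrow this down, here is a minimal sketch of a diagnostic that could be called in ODE_SOLVER just before the LAPACK calls. The subroutine name report_mkl_threading and its label argument are hypothetical and not part of my program. The sketch assumes the default LP64 MKL interface and that the OpenMP module omp_lib is available (for example by also compiling with /Qopenmp). It only prints the threading state; it does not change anything.

```fortran
! Hypothetical diagnostic helper (not part of the original program).
! Intended to be called in ODE_SOLVER right before DGBTRF / DGETRF.
subroutine report_mkl_threading(label)
  use omp_lib, only: omp_in_parallel, omp_get_max_threads
  implicit none
  character(len=*), intent(in) :: label
  ! MKL service functions, declared explicitly so that no MKL module
  ! or include file is needed (assumes the default LP64 interface).
  integer, external :: mkl_get_max_threads, mkl_get_dynamic

  write(*,'(a)')    'Threading state at: ' // label
  ! .true. here means the caller is already inside an active parallel region
  write(*,'(a,l2)') '  inside OpenMP parallel region : ', omp_in_parallel()
  write(*,'(a,i0)') '  omp_get_max_threads()         : ', omp_get_max_threads()
  ! Upper bound on the number of threads MKL may use for the next call
  write(*,'(a,i0)') '  mkl_get_max_threads()         : ', mkl_get_max_threads()
  ! 1 means MKL is allowed to reduce its thread count on its own
  write(*,'(a,i0)') '  mkl_get_dynamic() (1 = on)    : ', mkl_get_dynamic()
end subroutine report_mkl_threading
```

Usage would be something like CALL report_mkl_threading('before DGBTRF'). If the first line prints T, the LAPACK calls are being made from inside a parallel region, and as far as I understand MKL then runs them on a single thread by default. If mkl_get_max_threads() prints 1, something (an environment variable such as MKL_NUM_THREADS or OMP_NUM_THREADS, or a call that sets the thread count) has limited MKL before the solver is reached.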