MKL parallel execution confusion

I am confused about how MKL executes in parallel.  The problem is that, after making some changes to a program, calls to DGBTRS, DGETRS, DGETRF and DGBTRF are no longer executed in parallel by MKL, even though I am using the compiler option /Qmkl:parallel.

Let me explain a little more.  I have the following code structure for solving a medium-sized set of ODEs (~5000 differential equations).

PROGRAM

    -Allocates space, etc.

    CALL ODE_SOLVER

    -Save, clean up, etc.

END PROGRAM

SUBROUTINE ODE_SOLVER

    CALL USER_ODES

    ...

    CALL DGBTRF

    CALL DGBTRS

    CALL DGETRF

    CALL DGETRS

    ...

    RETURN

END SUBROUTINE ODE_SOLVER

SUBROUTINE USER_ODES

    -This is where I made some changes, in particular larger vector/matrix multiplications

    -Note, however, that the ODE system size has not changed, so ODE_SOLVER sees the same problem as before

END SUBROUTINE USER_ODES

For the initial version of the program, the above LAPACK calls made within ODE_SOLVER were executed in parallel by MKL, and I got a very good speedup in execution time (across 8 cores).  I made some changes to USER_ODES, but I did not change the size of the ODE system, so ODE_SOLVER was effectively solving the same problem.  However, USER_ODES did allocate larger matrices to compute the ODEs.

The problem is that, after I made the changes to USER_ODES, calls to the LAPACK routines stopped executing in parallel (I only get serial execution).  If I use the Intel Fortran compiler option /Qparallel, all cores become busy, but performance is terrible.

Sorry this is not much to go on.  My guess is that USER_ODES is now generating multiple threads, and this prevents MKL from producing parallel threads for the LAPACK calls.  Any suggestions?

Thanks

-joe

Sorry, disregard the above post (I don't see a way of removing it).  It turns out my changes to USER_ODES were more computationally expensive than I had thought.  MKL is running in parallel; it just doesn't account for much of the total run time.  I need to optimize/parallelize my own code.

Hi Joe,

How are you checking whether the MKL functions are spawning threads? In some cases, Intel MKL functions may not create additional threads. For example, if the high-level code is threaded with Intel OpenMP and an MKL function finds that it is being called inside an OpenMP parallel region, MKL may not create threads of its own (to avoid over-threading).
In your case, it looks like the high-level code is not threaded. Is that true?  Also, /Qparallel and /Qmkl:parallel are totally different. With /Qparallel, the Intel compiler may thread some of your own source code with OpenMP, whereas /Qmkl:parallel enables MKL's internal threading.
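A quick way to check the threading state from inside the program, rather than by watching core activity, is to query MKL and OpenMP directly. The following is only a minimal sketch: it assumes Intel Fortran with MKL's mkl_service module available and OpenMP enabled (for example, building with /Qmkl:parallel /Qopenmp). The program name check_mkl_threads is hypothetical; in a real code the same queries would go just before the LAPACK calls in ODE_SOLVER.

```fortran
program check_mkl_threads
   use omp_lib        ! omp_in_parallel, omp_get_max_threads
   use mkl_service    ! mkl_get_max_threads, mkl_get_dynamic
   implicit none

   ! Upper bound on the number of threads MKL may use for its next call
   print *, 'MKL max threads        : ', mkl_get_max_threads()

   ! 1 means MKL is allowed to use fewer threads than the maximum
   print *, 'MKL dynamic adjustment : ', mkl_get_dynamic()

   ! Threads available to the program's own OpenMP regions
   print *, 'OpenMP max threads     : ', omp_get_max_threads()

   ! .true. here would mean the caller is already inside a parallel region
   print *, 'Inside parallel region : ', omp_in_parallel()
end program check_mkl_threads
```

If omp_in_parallel() reports true at the point where a LAPACK routine is called, MKL will typically run that call on a single thread, which is the over-threading guard described above.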

Thanks,
Chao

Hi Chao,

Thanks for your comment.  I was crudely checking thread creation by just following core activity.  As I noted above, MKL was generating threads as expected; it's just that my modified code, which has a lot of serial execution, was taking much longer than I had anticipated.  At first glance, I thought MKL was also executing in serial, but that was incorrect.  MKL did execute code in parallel; it just did so quickly enough that I missed it at first.
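A less crude check than watching core activity is to time the two phases separately and see which one dominates the wall-clock time. The sketch below uses only the standard SYSTEM_CLOCK intrinsic. The routines user_odes_stub and lapack_stub are hypothetical stand-ins (simple loops), not the real USER_ODES or LAPACK calls, and the step count of 100 is arbitrary.

```fortran
program time_phases
   use iso_fortran_env, only: int64
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer(int64) :: t0, t1, rate
   real(dp) :: t_user, t_lapack, total
   real(dp) :: sink      ! accumulates results so the stub loops are not optimized away
   integer :: step

   t_user = 0.0_dp
   t_lapack = 0.0_dp
   sink = 0.0_dp
   call system_clock(count_rate=rate)

   do step = 1, 100
      ! Phase 1: the user's (serial) right-hand-side evaluation
      call system_clock(count=t0)
      call user_odes_stub()
      call system_clock(count=t1)
      t_user = t_user + real(t1 - t0, dp) / real(rate, dp)

      ! Phase 2: the factorization/solve that MKL would thread
      call system_clock(count=t0)
      call lapack_stub()
      call system_clock(count=t1)
      t_lapack = t_lapack + real(t1 - t0, dp) / real(rate, dp)
   end do

   ! Guard against a zero total on very coarse clocks
   total = max(t_user + t_lapack, tiny(1.0_dp))
   print *, 'Seconds in user code  : ', t_user
   print *, 'Seconds in solver code: ', t_lapack
   print *, 'Fraction in user code : ', t_user / total
   print *, 'Checksum              : ', sink

contains

   ! Hypothetical stand-in for USER_ODES (deliberately the heavier phase)
   subroutine user_odes_stub()
      integer :: i
      do i = 1, 2000000
         sink = sink + sin(real(i, dp))
      end do
   end subroutine user_odes_stub

   ! Hypothetical stand-in for the DGBTRF/DGBTRS/DGETRF/DGETRS calls
   subroutine lapack_stub()
      integer :: i
      do i = 1, 200000
         sink = sink + cos(real(i, dp))
      end do
   end subroutine lapack_stub

end program time_phases
```

If the user-code fraction is close to 1, the cores will look mostly idle even when every LAPACK call is fully threaded, which matches what I saw.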

cheers,

-joe
