Cannot run DSYEVR in parallel

The last serial part of my application is a call to DSYEVR. My attempts to parallelize it resulted in very strange behavior; I hope someone can help me understand it.

Depending on the data, I run DSYEVR alone or two or three of them in an OMP PARALLEL SECTIONS construct. My application is compiled with icc on a Cray with MKL 10.3 update 3 (the parallel version). The matrices are small, 61x61.

As suggested elsewhere, I call omp_set_nested(1), mkl_set_dynamic(0) and mkl_set_num_threads(n) (n: 1-8) at the beginning of the code. I then run my application on a varying number of threads (1-16).

With the above setup, performance drops dramatically above 2 threads, whatever number of threads I reserve for MKL.

To check my code I linked with --mkl=sequential and the scaling is what I expected. So I presume the culprit is MKL and its interactions with omp_set_nested.

I also implemented the "fake nesting" suggested in this forum (I cannot find the reference anymore, but it was about starting more threads than requested by OMP_NUM_THREADS). There is a small speed advantage when running on 4 nodes, but overall the scaling does not change. I interpret this as the DSYEVR calls not being parallelized.

Any ideas? This call is clearly limiting the scalability of my code, as also seen with profilers such as Vampir.
Thanks!
mario

mario, your interpretation is correct - ?syevr routines are not threaded. 

Intel MKL is clever: it knows that small matrices are not worth threading. Take a large matrix instead: DSYEVR uses dsytrd, which is partially parallelized, and dlarfb, which parallelizes well. You need to organize your program code differently. A refined version of dlarfb is included in recent versions of Intel MKL: http://redfort-software.intel.com/en-us/forums/showthread.php?t=77331

OK, understood. I'm rethinking my code.
Just one question: how small is small? That is, above which size threshold does dsyevr start parallelizing?
Also, I cannot access the last reference, http://redfort-software.intel.com/en-us/forums/showthread.php?t=77331. Is there an alternative location?
Thanks!
mario

There is no single answer to that question because it depends on many factors, but starting from sizes of 128x128 threading is applied in that code.
--Gennady

That last quoted URL is still blocked for non-Intel accounts.

The redfort-software URL looks the same as this one: http://software.intel.com/en-us/forums/topic/287728
