OpenMP threads gone dead


I am running Fortran XE 2013 in VS2010.

I have an OpenMP single block that runs successfully many times, but then I suddenly get a deadlock at the next statement after the block. I am running 6 threads. Looking at the Threads window in VS2010, I see a main thread and 6 worker threads (one too many). Four worker threads plus the main thread are waiting at the next statement after the block. Two of the worker threads don't show a call stack from my code:

ntdll.dll!00000000774c135a()

[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]

KernelBase.dll!000007fefdbf10dc()

libiomp5md.dll!0000000180091a11()

libiomp5md.dll!00000001800672bc()

libiomp5md.dll!000000018006f920()

libiomp5md.dll!000000018006e2eb()

libiomp5md.dll!000000018009250e()

kernel32.dll!000000007705652d()

ntdll.dll!000000007749c521()  


More information:

I am calling OMP_Set_Dynamic(.true.) and MKL_SET_DYNAMIC(.true.) before I start the parallel region. If I comment out these calls, the deadlock is fixed.

Please ignore my last post; I am still getting the deadlock with a fixed number of threads.

Have you tried the thread checker feature of Intel Inspector XE? It may help you locate the problem. You can download a 30-day free trial.

Steve - Intel Developer Support

I see you are using OpenMP and MKL. MKL can be configured (selected) to be single-threaded or multi-threaded (internally OpenMP).

For a _single-threaded_ application you would typically select the OpenMP variant of MKL.
For a _multi-threaded_ application you would typically select the single-threaded variant of MKL.

Depending on the version of MKL, combining an OpenMP application with OpenMP MKL could potentially yield n pools of n threads.
Newer versions of MKL may attempt to reduce the impact.

In the limited circumstance where MKL calls are _only_ made from the single-threaded portion of a multi-threaded application, combining the parallel MKL with the parallel application may make sense, provided you also set KMP_BLOCKTIME to 0.
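The two configurations above can also be selected at run time. The following is a minimal sketch, assuming an Intel Fortran build linked against the threaded MKL libraries: `mkl_set_num_threads` is MKL's service routine and `kmp_set_blocktime` is the Intel extension declared in `omp_lib`; the flag `mkl_inside_parallel_work` is a hypothetical name used only for illustration.

```fortran
program mkl_threading_sketch
  use omp_lib          ! Intel's omp_lib also declares the kmp_* extensions
  implicit none
  external :: mkl_set_num_threads   ! MKL service routine, resolved at link time

  ! Hypothetical switch (not from the original post): .true. if MKL is
  ! called from code that is already running in parallel.
  logical, parameter :: mkl_inside_parallel_work = .true.

  if (mkl_inside_parallel_work) then
     ! Multi-threaded application: keep MKL on one thread so that it
     ! does not create thread teams of its own.
     call mkl_set_num_threads(1)
  else
     ! MKL called only from the serial portion: let MKL run in parallel,
     ! but stop idle application threads from spinning between regions.
     ! Same effect as setting the environment variable KMP_BLOCKTIME=0.
     call kmp_set_blocktime(0)
  end if

  !$omp parallel
  !$omp master
  print *, 'application threads:', omp_get_num_threads()
  !$omp end master
  !$omp end parallel
end program mkl_threading_sketch
```

Setting the environment variables `MKL_NUM_THREADS` and `KMP_BLOCKTIME` before launching the program should have the same effect without a rebuild.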

Jim Dempsey

www.quickthreadprogramming.com

As Jim pointed out, it's frequently effective to limit MKL to a single thread when the threading can be performed by OpenMP at a higher level in the call stack.
I suspect the combination of omp_set_dynamic and mkl_set_dynamic isn't well tested. You'd be more likely to find an expert on that question in the MKL forum.
Jim alludes to the possible advantage of shortening KMP_BLOCKTIME to speed up the transition between parallel regions that use different threading models. As Fortran and MKL share the OpenMP library, this condition ought not to be as likely to show up as when one of the parallel regions uses Cilk+ or TBB threading.

My code is one parallel block doing 30% of the work itself and 70% in calls to MKL. My code has many omp do loops and a few work sections. The MKL pardiso call is done in an omp single block, and MKL is in parallel mode. I did it like this to reduce the overhead of starting and stopping the parallel region on each side of the MKL call. The MKL call is made a few hundred times during a typical run of only about 1 or 2 minutes of work. Do you think MKL may be disrupting the operation of the threads in my code? The error happens near the end of the run, when MKL has been called many times. Would setting KMP_BLOCKTIME to zero help in this situation? I am linking against mkl_blas95_lp64.lib mkl_lapack95_lp64.lib mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib.

Steve, I am not able to quickly employ Intel Inspector XE, as my application is like a service application and must be attached to for debugging. I need to modify it to allow it to be started from Visual Studio.
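For readers following along, here is a minimal sketch of the structure described in this post: one long-lived parallel region, work-shared loops, and a solver call inside an omp single block. It is an illustration only, not the poster's code; `solve_stub` is a hypothetical stand-in for the MKL pardiso call, and the array size and step count are arbitrary.

```fortran
program single_block_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 1000, nsteps = 5   ! arbitrary illustration sizes
  real(8) :: x(n)
  integer :: i, step

  x = 1.0d0

  ! One parallel region for the whole run, to avoid repeatedly starting
  ! and stopping the thread team around each solver call.
  !$omp parallel default(shared) private(i, step)
  do step = 1, nsteps
     ! The application's own work, shared among the team.
     !$omp do
     do i = 1, n
        x(i) = x(i) + 1.0d0
     end do
     !$omp end do
     ! (implicit barrier: all of x is updated before the solver runs)

     ! Exactly one thread, whichever arrives first, calls the solver.
     !$omp single
     call solve_stub(x, n)
     !$omp end single
     ! (implicit barrier: the other threads wait here for the result)
  end do
  !$omp end parallel

  print *, 'x(1) =', x(1)

contains

  ! Hypothetical stand-in for the pardiso call.
  subroutine solve_stub(v, m)
    integer, intent(in) :: m
    real(8), intent(inout) :: v(m)
    v = 0.5d0 * v
  end subroutine solve_stub

end program single_block_sketch
```

The barrier at `end single` corresponds to "the next statement after the block" where the waiting threads were observed in the debugger.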

Change your omp single block to omp master. The reason:
The first time the single block is entered, MKL will instantiate an OpenMP thread team within the context of that thread (similar to OpenMP nested parallelism, as if the code in the single block had started a new nested parallel region).
On subsequent entries to the single block, if the same thread becomes the owner of the single region, the same MKL thread team is used (efficient).
If a different team member acquires ownership of the single block, a new MKL thread team will be constructed (inefficient).

There may be (likely are) other issues relating to why the deadlock (blocking) occurs, possibly relating to thread-local storage.

This should be an easy test to try out (changing single to master).
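One point to watch when making the change: unlike single, the master construct has no implied barrier at its end, so an explicit barrier is needed if the other threads must wait for the solver result. A minimal sketch of the change, where `solve_stub` is a hypothetical stand-in for the pardiso call and the sizes are arbitrary:

```fortran
program master_block_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 1000, nsteps = 5   ! arbitrary illustration sizes
  real(8) :: x(n)
  integer :: i, step

  x = 1.0d0

  !$omp parallel default(shared) private(i, step)
  do step = 1, nsteps
     !$omp do
     do i = 1, n
        x(i) = x(i) + 1.0d0
     end do
     !$omp end do
     ! (implicit barrier: all of x is updated before the solver runs)

     ! Always the same thread (thread 0) calls the solver, so any thread
     ! team created inside the call can be reused on every step.
     !$omp master
     call solve_stub(x, n)
     !$omp end master

     ! master has no implied barrier; without this one the other
     ! threads would start the next step before the solver finishes.
     !$omp barrier
  end do
  !$omp end parallel

  print *, 'x(1) =', x(1)

contains

  ! Hypothetical stand-in for the pardiso call.
  subroutine solve_stub(v, m)
    integer, intent(in) :: m
    real(8), intent(inout) :: v(m)
    v = 0.5d0 * v
  end subroutine solve_stub

end program master_block_sketch
```

If the code before the master block ends in a `nowait` loop or other unsynchronized work, a barrier would also be needed ahead of the block.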

Jim Dempsey

www.quickthreadprogramming.com
