OpenMP pthread_cond_wait deadlock in v18u1

I wish I had a small bit of code to demonstrate this, but unfortunately I don't. Wondering if anyone else has run into something similar, though.

I followed the instructions for linking against mkl_rt at https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compi.... I'm building and running on a Skylake machine with AVX-512, on a fresh install of RHEL/CentOS 7.4 with glibc-2.17-196.el7_4.2 and Intel Compiler 18 update 1. The build finishes without errors, and R mostly works, but it deadlocks under certain workloads, typically after a lot of computation has been done and the process then forks. A guaranteed way to trigger it is running "make check-all" after compilation. With MKL_THREADING_LAYER=intel, even some very basic tests deadlock (during forks for system calls); =tbb gets around it most, but not all, of the time (tests involving the R parallel library fail); and =sequential (or setting OMP_NUM_THREADS=1 in the other two modes) passes all tests.

Here are the relevant stacks from the deadlocked process:

#0  0x00007fae3d963945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fae3de53b77 in ___kmp_suspend_template_aux (th_gtid=<optimized out>, th=<optimized out>, flag=<optimized out>) at ../../src/z_Linux_util.cpp:1781
#2  __kmp_suspend_template (th_gtid=<optimized out>, flag=<optimized out>) at ../../src/z_Linux_util.cpp:1910
#3  __kmp_suspend_64 (th_gtid=635251012, flag=0x80) at ../../src/z_Linux_util.cpp:2019
#4  0x00007fae3dde2f13 in suspend (this=<optimized out>, th_gtid=<optimized out>) at ../../src/kmp_wait_release.h:731
#5  __kmp_wait_template (this_thr=<optimized out>, flag=<optimized out>, final_spin=<optimized out>, itt_sync_obj=<optimized out>) at ../../src/kmp_wait_release.h:343
#6  wait (this=<optimized out>, this_thr=<optimized out>, final_spin=<optimized out>, itt_sync_obj=<optimized out>) at ../../src/kmp_wait_release.h:742
#7  _INTERNAL_25_______src_kmp_barrier_cpp_ce635104::__kmp_hyper_barrier_release (bt=635251012, this_thr=0x80, gtid=1, tid=-1, propagate_icvs=635250944, itt_sync_obj=0x0) at ../../src/kmp_barrier.cpp:865
#8  0x00007fae3dde4556 in __kmp_fork_barrier (gtid=635251012, tid=128) at ../../src/kmp_barrier.cpp:2177
#9  0x00007fae3de1cc1f in __kmp_launch_thread (this_thr=0x7fae25dd2944) at ../../src/kmp_runtime.cpp:5768
#10 0x00007fae3de4fc00 in _INTERNAL_26_______src_z_Linux_util_cpp_c3d2e46c::__kmp_launch_worker (thr=0x7fae25dd2944) at ../../src/z_Linux_util.cpp:585
#11 0x00007fae3d95fe25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fae3d68d34d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fae3eef3780 (LWP 27709)):
#0  0x00007fae3d671e47 in sched_yield () from /lib64/libc.so.6
#1  0x00007fae3de530a4 in _INTERNAL_26_______src_z_Linux_util_cpp_c3d2e46c::__kmp_atfork_prepare () at ../../src/z_Linux_util.cpp:1531
#2  0x00007fae3d654232 in fork () from /lib64/libc.so.6
#3  0x00007fae3d601bbc in _IO_proc_open@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4  0x00007fae3d601e4c in popen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#5  0x00007fae3e756617 in do_system () from /tmp/rbuild/lib/libR.so

If you strace the parent, you see sched_yield() being called in an infinite loop, and the CPU is pinned at 100% (I've seen a similar issue reported here, but that one was related to v15 and was fixed in v16). I've recompiled probably close to 100 times with different compiler options, but the result is always the same. I've also tried the entire process with the kernel from 7.3, and that fails the same way, so I'm more inclined to think it's something in the interaction between Intel's OpenMP runtime and glibc. Builds made with clang against the system-provided OpenMP run fine with threading. I haven't had a chance to test on other distros or a different processor.

Any thoughts?


There is a known issue with fork() usage in OpenMP applications.

It will be fixed in the next compiler update.

 

Olga,

Does the deadlock occur using v17?

Under what programming circumstances will the deadlock occur?

Jim Dempsey

Jim,

It looks like the affected compiler versions are 17.0.5 and 18.0.1.

The hang may occur when fork() is used between two parallel regions that use the same team.

I have a similar problem when using MKL as a drop in for BLAS.  Here's a simple reproducer using numpy.  I've noted this is a problem in 2018 update 1 as well.

env LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so python -c "import subprocess; import numpy as np; A = np.random.normal(size=(1000, 1000)); A.dot(A); subprocess.Popen(['/bin/true']); print 'OK'"

...HANGS...

When stracing, you notice the processes are spinning in sched_yield(), and nothing has even exec'd. However, if I use MKL_THREADING_LAYER=sequential or MKL_THREADING_LAYER=tbb, it works as expected and doesn't hang in sched_yield(). I've verified that if I go back to an older version of MKL that I have from 2016, this problem doesn't occur.

env MKL_THREADING_LAYER=sequential LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so python -c "import subprocess; import numpy as np; A = np.random.normal(size=(1000, 1000)); A.dot(A); subprocess.Popen(['/bin/true']); print 'OK'"
OK

env MKL_THREADING_LAYER=tbb LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so python -c "import subprocess; import numpy as np; A = np.random.normal(size=(1000, 1000)); A.dot(A); subprocess.Popen(['/bin/true']); print 'OK'"
OK

You mentioned this is a known issue with fork(), and that versions 17.0.5 and 18.0.1 are affected.  What is the ETA for the next compiler update?  What version do you recommend using until that is released?  Also, is there somewhere I can go to track the status of this known issue?

Olga,

Consider the following scenario (18.0.1)

start parallel region
   outerLoop
       task firstprivate(i)
            outerTask(i)
       end task
   end outerLoop
end parallel region
...
outerTask(i):
    innerLoop
       task firstprivate(j)
          innerTask(j)
       end task
    end innerLoop
end outerTask

IOW, there is one parallel region; however, the tasks are nested (in the above case, to two levels).

Jim Dempsey

What version do you recommend using until that is released?  Also, is there somewhere I can go to track the status of this known issue?

17.0 updates prior to 17.0.5, as well as 18.0.0, wouldn't have this hang issue.

The issue is being tracked internally. I'm not aware if it was submitted to support externally.

Consider the following scenario (18.0.1)

...

IOW there is one parallel region, however tasks are nested (in above case to two levels).

Jim, do you mean you see a hang in this scenario?

However, if I use MKL_THREADING_LAYER=sequential or MKL_THREADING_LAYER=tbb it works as expected and doesn't hang in sched_yield().  I've verified that if I go back to an older version of MKL that I have from 2016 this problem doesn't occur.

MKL uses OpenMP for parallelization by default. That's why you see the hang with the default layer and don't see it with the sequential or tbb layers.

Olga,

No, I have not experienced a hang so far. Technically there is one parallel region; however, each task at the outer level is enqueuing a (nested) task level using available threads from the outer task level.

Jim Dempsey

Putting this here in case anyone else needs it.

Setting KMP_INIT_AT_FORK=FALSE is a workaround for the OpenMP bug.  At least it is for my problem.

Thank you so much for the information, Olga and Michael. I can confirm that KMP_INIT_AT_FORK=FALSE fixes the issue in the code I'm running. I'm afraid to admit, however, that I have no idea what that flag actually does in terms of behavior or performance, and a cursory Google search isn't very illuminating.

I'll also give the 18.0 initial release a try (using the -fPIC workaround for the glibc issue).
