Separate monitor thread



First off, I think I mistakenly posted this under "Open source OpenMP" earlier.

I am using the Intel Composer Fortran Compiler 14.0.0.

What is the purpose of the separate monitor thread OpenMP creates?


In my Fortran application, the additional thread is always spawned, even after calling OMP_SET_NUM_THREADS(1).

Granted, it doesn't look like it does much, per the Linux "ps -L" command, but I haven't seen any easily accessible information describing the purpose of the additional thread at a high level.

Thanks in advance.

Andrey Churbanov (Intel)


Could you please share a test case (small, if possible) that demonstrates your problem? As I already replied to you in the previous forum, this is unexpected behavior that might be caused by a bug in the OpenMP runtime. Or you may be observing some other thread, not the monitor launched by the OpenMP runtime. It is hard to say without a test case.




Okay, I guess the following code might work

      program parallel

!$    use omp_lib

      implicit none

      integer(4)            :: i, j, k
      integer(4), parameter :: nmax=500
!$    integer(4)            :: nthreads
      real(8)               :: a(2,nmax,nmax,nmax)

      ! Initialize

!$    nthreads=omp_get_max_threads()

!$    call omp_set_num_threads(nthreads)

!$    write(*,*) 'NTHREADS= ', nthreads

!$omp parallel do private(i,j,k) reduction(+:a)
      do i=1, nmax
        do j=1, nmax
          do k=1, nmax
          end do
        end do
      end do
!$omp end parallel do

      end program parallel

I compiled as follows

$ make
ifort -O -openmp -openmp-link static -o test.exe main.f90

and ran the code

$ ./test.exe &

NTHREADS=            4

$ ps -L
  PID   LWP TTY          TIME CMD
 9517  9517 pts/8    00:00:00 csh
 9783  9783 pts/8    00:00:01 test.exe
 9783  9784 pts/8    00:00:00 test.exe
 9783  9785 pts/8    00:00:01 test.exe
 9783  9786 pts/8    00:00:01 test.exe
 9783  9787 pts/8    00:00:01 test.exe
 9788  9788 pts/8    00:00:00 ps

There is a single process ID (PID) and 4 threads were requested, but the ps threads option (-L) shows 5 LWPs for test.exe.
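To get the LWP count for one process without reading the ps listing by eye, something like the following could be used. It is a minimal sketch, assuming Linux, where each thread of a process has an entry under /proc/PID/task; the helper name count_threads is hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# count_threads PID: print the number of threads (LWPs) of a process.
# Hypothetical helper; assumes a Linux /proc filesystem.
count_threads() {
    ls "/proc/$1/task" | wc -l | tr -d ' '
}

# Example: count the threads of the current shell.
count_threads "$$"
```

For the run above, `count_threads 9783` would be the equivalent of counting the test.exe rows in the ps -L output.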


Andrey Churbanov (Intel)


I tried your example with the following result:

$ OMP_NUM_THREADS=4 ./a.out &
 NTHREADS=            4

$ ps -L
   PID    LWP TTY          TIME CMD
 63182  63182 pts/2    00:00:01 a.out
 63182  63183 pts/2    00:00:00 a.out
 63182  63184 pts/2    00:00:00 a.out
 63182  63185 pts/2    00:00:00 a.out
 63182  63186 pts/2    00:00:00 a.out
 63187  63187 pts/2    00:00:00 ps

$ OMP_NUM_THREADS=1 ./a.out &
 NTHREADS=            1

$ ps -L
   PID    LWP TTY          TIME CMD
 63190  63190 pts/2    00:00:01 a.out
 63191  63191 pts/2    00:00:00 ps

So I see the expected behavior of the OpenMP runtime: it creates 4 working threads plus a monitor thread for parallel execution, and no additional threads are created for serial execution.
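The comparison above can be scripted so that both settings are checked the same way. This is only a sketch under stated assumptions: Linux with /proc, a program that keeps running for more than about a second, and a hypothetical helper name threads_for. The one-second delay is an arbitrary choice to let the runtime start its threads.

```shell
#!/bin/sh
# threads_for N CMD...: run CMD in the background with OMP_NUM_THREADS=N,
# wait briefly, print its thread count taken from /proc, then stop it.
# Hypothetical helper; assumes Linux and a program that outlives the delay.
threads_for() {
    n=$1; shift
    env OMP_NUM_THREADS="$n" "$@" >/dev/null &
    pid=$!
    sleep 1                          # give the runtime time to start its threads
    count=$(ls "/proc/$pid/task" 2>/dev/null | wc -l | tr -d ' ')
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true  # reap the process; ignore its exit status
    echo "OMP_NUM_THREADS=$n -> $count threads"
}

# Usage with the test program from this thread:
#   threads_for 4 ./a.out
#   threads_for 1 ./a.out
```

The count includes every thread of the process, so for the runs shown above it would correspond to the number of a.out rows in each ps -L listing.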

The purpose of the monitor thread is time bookkeeping, which the working threads use at barriers.




Why can't the first thread to the barrier perform any desired bookkeeping?
(this would save a context switch)

Jim Dempsey
Andrey Churbanov (Intel)


The problem is that when OMP tasking is involved, all working threads at a barrier execute tasks. It is probably possible to combine task execution with time bookkeeping, but it does not look like an easy project. If we dedicated one of the working threads exclusively to time bookkeeping, this would have a significant performance impact.



Presumably, when OMP is tasking, you do not preempt a task, so barrier bookkeeping can be done by any thread before or after each task steal. The problem (resulting from tasking) then becomes that you cannot get all the threads entering the barrier to resume at approximately the same time if any of them are off performing a task. For algorithms requiring synchronicity, task stealing is bad news. Meaning, if you are using the OMP task model, you probably should NOT use barriers. Or, if you require barriers, consider the implications of adding tasking.

Jim Dempsey