OpenMP Threadpooling Implementation for Nested Parallelism

Hi, 

The GNU implementation seems to use pooled threads only for non-nested parallel regions. Does anyone know whether the Intel OpenMP implementation uses thread pooling when a nested parallel construct is encountered?

Thanks 

Hi,

I am not sure what you mean by thread pooling. The current Intel implementation does re-use threads in subsequent parallel regions, but for nested parallel regions it is not guaranteed that a particular master thread will get the same worker threads in its nested team. So, if the number of threads does not change, the complete pool of threads stays the same, but in nested regions worker threads may migrate from one team to another.

Example:

#pragma omp parallel
{
    #pragma omp parallel
    {
        // synchronize here to be sure all nested parallel regions have started;
        //   otherwise one region may re-use threads of another that has already completed.
    }
    #pragma omp parallel
    {
        // the same threads are re-used here, but a particular master may have different workers in its team
    }
}
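To see this on a given runtime, something like the following rough sketch may help. It labels each OS thread on first use through a thread-local variable, runs two successive nested regions under one outer region, and counts how many (master, slot) positions were served by the same OS thread both times. The names `thread_label`, `count_stable_slots`, `MAX_OUTER` and `MAX_INNER` are made up for illustration, and the serial fallbacks are only there so it builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static void omp_set_nested(int enabled) { (void)enabled; }
#endif

#define MAX_OUTER 8   /* hypothetical limits for the bookkeeping arrays */
#define MAX_INNER 8

static int next_id = 0;               /* shared counter used to hand out labels */
static _Thread_local int my_id = -1;  /* per-OS-thread label, -1 = not yet assigned */

/* Returns a label that stays the same for as long as the OS thread lives. */
static int thread_label(void)
{
    if (my_id < 0) {
        #pragma omp atomic capture
        my_id = next_id++;
    }
    return my_id;
}

/* Runs two successive nested regions inside one outer region and returns how
   many (master, slot) positions were served by the same OS thread both times.
   Expects outer >= 1 and inner >= 1. */
static int count_stable_slots(int outer, int inner)
{
    int first[MAX_OUTER][MAX_INNER];
    int second[MAX_OUTER][MAX_INNER];
    int stable = 0;

    if (outer > MAX_OUTER) outer = MAX_OUTER;
    if (inner > MAX_INNER) inner = MAX_INNER;
    for (int o = 0; o < outer; ++o)
        for (int i = 0; i < inner; ++i)
            first[o][i] = second[o][i] = -1;

    omp_set_nested(1);  /* as used in this thread; deprecated since OpenMP 5.0 */

    #pragma omp parallel num_threads(outer)
    {
        int o = omp_get_thread_num();      /* index of this master in the outer team */
        #pragma omp parallel num_threads(inner)
        {
            first[o][omp_get_thread_num()] = thread_label();
        }
        #pragma omp parallel num_threads(inner)
        {
            second[o][omp_get_thread_num()] = thread_label();
        }
    }

    for (int o = 0; o < outer; ++o)
        for (int i = 0; i < inner; ++i)
            if (first[o][i] >= 0 && first[o][i] == second[o][i])
                ++stable;
    return stable;
}
```

Calling `count_stable_slots(4, 4)` from `main` and comparing the result with 16 gives a hint of how often workers stayed with their master. Slot 0 of every team is the master itself, so it always matches; this sketch does not add the cross-team synchronization mentioned in the comments above, so a single run is only an observation, not a guarantee.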

Regards, Andrey

>>...I am not sure what do you mean by thread pooling...

A very short explanation is as follows: an application or library creates some number of threads at startup and then reuses those threads to do its processing. This reduces the overhead of creating threads.

int main(...)
{
  // one thread in app
  #pragma omp parallel
  {
    // first time entering parallel region creates thread pool
    // all threads of thread pool running here
  } // end of parallel region
  // main thread running here
  // thread pool NOT disbanded, additional threads in spinwait for KMP_BLOCKTIME
  #pragma omp parallel
  {
    // all threads running here again; this time the pool is not created (the former pool is re-used)
    // no overhead to create the pool; there may be overhead to restart suspended threads if the spin-wait timed out
  }
} // end main
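One way to check the re-use described above is a thread-local flag, which starts at zero in every newly created OS thread. The sketch below is an assumption-laden illustration, not a description of any runtime's internals: `seen_before` and `count_reused_executions` are hypothetical names, and the function simply counts how many thread executions in later regions landed on a thread that had already run an earlier region.

```c
/* Zero in every newly created OS thread; set once the thread has run a region.
   The flag lives as long as the thread does, so call the function only once
   per process if you want a clean count. */
static _Thread_local int seen_before = 0;

/* Runs `regions` successive top-level parallel regions and returns how many
   thread executions landed on an OS thread that had already run an earlier
   region. Without OpenMP support the pragma is ignored and only the initial
   thread is counted. */
static int count_reused_executions(int regions, int nthreads)
{
    int reused = 0;
    for (int r = 0; r < regions; ++r) {
        #pragma omp parallel num_threads(nthreads) reduction(+:reused)
        {
            if (seen_before)
                reused += 1;   /* this OS thread already took part in a region */
            seen_before = 1;
        }
    }
    return reused;
}
```

With a pooled runtime and 4 threads, 3 regions would be expected to report up to 8 reused executions (4 in each of the second and third regions). A runtime that created fresh workers every time would report only the initial thread, i.e. 2.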

The original question has some subtleties:

int main(...)
{
  for(int i=0; i != N; ++i)
  {
    omp_set_nested(1);
    omp_set_num_threads(4);
    #pragma omp parallel
    {
      // 4 threads running here
      omp_set_num_threads(4);
      #pragma omp parallel
      {
        // each thread (of 4) of enclosing parallel region now has a team of 4 threads
        // 16 threads running here assuming nested permitted and not exceeding max threads.
        // *** the question then becomes: are the same threads used on each iteration of the for(i) loop?
      } // end inner parallel region
    } // end outer parallel region
  } // end for
} // end main

Years ago, when I asked this question (IFV), the answer was that the same threads get reused/assigned.
Andrey's remark indicates there is uncertainty (no requirement) that the same threads get reused/assigned.
Looking at OpenMP 3.1, I cannot find whether reuse is defined or implementation dependent.

It would be desirable on a NUMA system that the same threads get reused. For this reason I ask Andrey to verify his comments.
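Until that is confirmed, the loop above can be instrumented to at least count how many distinct OS threads a runtime uses over all iterations. This is only a sketch under my own assumptions: `counted`, `distinct_threads` and `count_distinct_threads` are hypothetical names, and it says nothing about which team a thread ends up in, only whether new threads keep appearing.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static void omp_set_nested(int enabled) { (void)enabled; }
#endif

static int distinct_threads = 0;        /* OS threads observed so far */
static _Thread_local int counted = 0;   /* zero in every newly created OS thread */

/* Repeats an outer x inner nested region `iterations` times and returns the
   number of distinct OS threads that executed the inner region at least once. */
static int count_distinct_threads(int iterations, int outer, int inner)
{
    omp_set_nested(1);  /* as in the loop above; deprecated since OpenMP 5.0 */
    for (int i = 0; i != iterations; ++i) {
        #pragma omp parallel num_threads(outer)
        {
            #pragma omp parallel num_threads(inner)
            {
                if (!counted) {          /* first visit by this OS thread */
                    counted = 1;
                    #pragma omp atomic
                    distinct_threads++;
                }
            }
        }
    }
    return distinct_threads;
}
```

For the 4 x 4 case, a result that stays near 16 regardless of the iteration count would suggest the nested teams are drawn from a pool; a result that grows with the iteration count would suggest threads are being created anew. Neither outcome shows that a thread returns to the same team, which is what matters for NUMA placement.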

Jim Dempsey

www.quickthreadprogramming.com

In my reading of Andrey's reply, he says the threads are reused, but in nested parallelism they may not belong to the same team.

I find it difficult to compare the details of Intel libiomp5 behavior against libgomp, even on Linux. With libgomp and OMP_WAIT_POLICY=passive, the environment variable GOMP_SPINCOUNT becomes active, with a default value of 300000. I can't establish behavior similar to libiomp5's KMP_BLOCKTIME. Setting GOMP_AFFINITY is important on a dual-CPU platform, as are KMP_AFFINITY et al. for libiomp5, but I don't see useful settings for OMP_NESTED.
