OpenMP Intel Implementation for Determining Number of Threads in a Parallel Region

Hi, 

The OMP standard gives an algorithm for determining the number of threads in a parallel region (page 36, algorithm 2.1 here: http://www.openmp.org/mp-documents/OpenMP3.1.pdf). The last two lines of the algorithm are reproduced here:

else if (dyn-var = false) and (ThreadsRequested > ThreadsAvailable)
   then behavior is implementation defined;

Do we know what the Intel implementation is for the last line ? 
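
For reference, the full decision chain that ends in that last line can be sketched as a small C function. This is only a sketch based on my reading of Algorithm 2.1 in the OpenMP 3.1 specification, not Intel's code. The names team_inputs, team_decision and decide_team are hypothetical and do not come from any OpenMP header. The final return is the case being asked about.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not part of any OpenMP header. */
enum team_kind {
    TEAM_EXACT,        /* exactly `count` threads */
    TEAM_UP_TO,        /* any value in [1 : count], chosen by the runtime */
    TEAM_IMPL_DEFINED  /* behavior is implementation defined */
};

struct team_decision {
    enum team_kind kind;
    int count;         /* 0 (unused) for TEAM_IMPL_DEFINED */
};

struct team_inputs {
    bool if_clause_value;       /* true when there is no if clause */
    int  threads_requested;     /* num_threads clause, else nthreads-var */
    int  threads_busy;          /* OpenMP threads currently executing */
    int  active_par_regions;    /* enclosing active parallel regions */
    bool dyn_var;
    bool nest_var;
    int  thread_limit_var;
    int  max_active_levels_var;
};

static struct team_decision make_decision(enum team_kind kind, int count)
{
    struct team_decision d = { kind, count };
    return d;
}

/* Walks the branches of Algorithm 2.1 in order (as I read the 3.1 spec). */
struct team_decision decide_team(const struct team_inputs *in)
{
    int available = in->thread_limit_var - in->threads_busy + 1;

    if (!in->if_clause_value)
        return make_decision(TEAM_EXACT, 1);
    if (in->active_par_regions >= 1 && !in->nest_var)
        return make_decision(TEAM_EXACT, 1);
    if (in->active_par_regions == in->max_active_levels_var)
        return make_decision(TEAM_EXACT, 1);
    if (in->dyn_var && in->threads_requested <= available)
        return make_decision(TEAM_UP_TO, in->threads_requested);
    if (in->dyn_var)
        return make_decision(TEAM_UP_TO, available);
    if (in->threads_requested <= available)
        return make_decision(TEAM_EXACT, in->threads_requested);

    /* dyn-var = false and ThreadsRequested > ThreadsAvailable */
    return make_decision(TEAM_IMPL_DEFINED, 0);
}
```

With thread_limit_var = 4, threads_busy = 1 and dyn_var = false, a request for 8 threads falls through to the last return, while a request for 4 gives exactly 4.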

Thanks 

 


The only thing guaranteed about the number of threads is that it is uncertain. What you observe today may be different tomorrow, and what you see on one entry to a parallel region may differ on another entry. Therefore, if the affected section of your code is written to assume (require) that the number of threads provided equals the number of threads requested, then you have a coding error, or at least unreasonable expectations. Note that the affected code would also not run when compiled without -openmp, or when built with the OpenMP stubs.

In this circumstance, granting the number of threads requested would require the runtime to either: a) increase thread-limit-var, or b) have the reduced team perform the missing team members' work by proxy (using the other team members' thread numbers to do the work). Note that b) would muck up thread-local storage, while a) could cause oversubscription and, in a highly recursive situation, could consume all resources.
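
One way to avoid depending on the team size is to ask the runtime, inside the region, how many threads it actually provided and to partition the work by that number. The following is a minimal sketch, not a definitive implementation. The function sum_with_team and its parameters are hypothetical. The _OPENMP guards are there so that the same source also builds and runs serially when compiled without -openmp.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums data[0..n-1] with whatever team the runtime provides.
 * `requested` must be positive and is only a request; the team size
 * actually used is stored in *granted. Hypothetical helper. */
long sum_with_team(const int *data, size_t n, int requested, int *granted)
{
    long total = 0;
    int team = 1;

    (void)requested;  /* unused when built without OpenMP */

#ifdef _OPENMP
    #pragma omp parallel num_threads(requested) reduction(+:total)
#endif
    {
        int nthreads = 1;
        int tid = 0;
#ifdef _OPENMP
        nthreads = omp_get_num_threads();  /* actual team size */
        tid = omp_get_thread_num();
#endif
        /* Partition by the actual team size, never by `requested`.
         * (n * tid is assumed not to overflow size_t in this sketch.) */
        size_t begin = n * (size_t)tid / (size_t)nthreads;
        size_t end   = n * (size_t)(tid + 1) / (size_t)nthreads;

        for (size_t i = begin; i < end; i++)
            total += data[i];

        if (tid == 0)
            team = nthreads;  /* only one thread writes the shared variable */
    }

    *granted = team;
    return total;
}
```

Calling sum_with_team(data, n, 4, &granted) returns the same sum whether the runtime provides four threads, fewer, or just one, and granted reports what was actually provided.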

Jim Dempsey

www.quickthreadprogramming.com

>>...Do we know what the Intel implementation is for the last line?..

No, we don't know, because Intel will never release the C/C++ source code for its OpenMP library. Regarding Jim's comment:

>>...The number of threads is only guaranteed to be uncertain...

Try the following code and you will see that omp_set_num_threads and the OMP_STACKSIZE environment variable define everything. However, significant oversubscription will occur if the number of OpenMP threads created is greater than the number of hardware threads supported by the CPU.
...
RTint iRetCode = CrtSetEnv( RTU("OMP_STACKSIZE=32K") );
if( iRetCode == 0 )
    CrtPrintf( RTU("OMP_STACKSIZE=%s\n"), CrtGetEnv( RTU("OMP_STACKSIZE") ) );
else
    CrtPrintf( RTU("Error: Failed to Set Environment Variable OMP_STACKSIZE\n") );

RTuint uiNumThreads = 0;

// uiNumThreads = 64;
// uiNumThreads = 512;
// uiNumThreads = 1024;
// uiNumThreads = 2048;
// uiNumThreads = 4096;
// uiNumThreads = 8192;
uiNumThreads = 16384;
// uiNumThreads = 32768;

omp_set_num_threads( uiNumThreads );

#pragma omp parallel for
for( int i = 0; i < 32768; i++ )
{
    volatile int iValue = 2;
    CrtPrintf( RTU("Iteration: %5ld - Thread %5ld out of %5ld\n"),
               ( RTint )i, ( RTint )omp_get_thread_num() + 1, ( RTint )uiNumThreads );
}
...

Also, the following numbers are based on my tests of the #pragma omp parallel directive (completed in 2012; search for a related thread on the forum for more details):
...
// #define _DEFAULT_NUMBER_OF_THREADS 64 // Maximum number of OpenMP threads for Microsoft C++ compiler
...
// #define _DEFAULT_NUMBER_OF_THREADS 16384 // Maximum number of OpenMP threads for Intel C++ compiler ( XE v12.1.3 )
...

Note: Please take into account that OMP_STACKSIZE has to be set to 32KB (!).

>>...search for a related thread on the forum for more details...

Take a look at:

Forum Topic: Stress testing of Intel OpenMP library - More than 18,600 OpenMP threads created in a parallel region
Web-link: software.intel.com/en-us/forums/topic/278302

Hi,

The Intel implementation will abort with a diagnostic saying that not enough system resources are available.

Regards, Andrey
