OpenMP Loop Collapse Directive

Compiler Methodology for Intel® MIC Architecture


Use the OpenMP collapse clause to increase the total number of iterations that are partitioned across the available OpenMP threads, which makes the unit of work assigned to each thread finer-grained. This helps, for example, when the outer loop alone has fewer iterations than there are threads. If the amount of work done by each thread is still non-trivial after collapsing, this may improve the parallel scalability of the OpenMP application.
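To illustrate this, here is a minimal sketch that compares how many threads actually receive work with and without collapse(2). The sizes (IMAX = 4 outer iterations, JMAX = 256 inner iterations) and all helper names (thread_id, available_threads, threads_used_outer_only, threads_used_collapsed) are hypothetical and chosen only for illustration. Without collapse, only the IMAX outer iterations are partitioned, so at most IMAX threads can get work. With collapse(2), all IMAX*JMAX iterations are partitioned. The code also compiles without OpenMP, in which case everything runs on a single thread.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical sizes: the outer trip count is deliberately small. */
enum { IMAX = 4, JMAX = 256 };

/* Number of the calling thread, or 0 when built without OpenMP. */
static int thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Upper bound on the team size of the next parallel region. */
int available_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Counts how many per-thread flags are set. */
static int count_flags(const int *used, int nthreads) {
    int count = 0;
    for (int t = 0; t < nthreads; t++) count += used[t];
    return count;
}

/* Only the outer loop is partitioned: at most IMAX threads get work. */
int threads_used_outer_only(double *a, int nthreads) {
    int *used = calloc((size_t)nthreads, sizeof *used);
    assert(used != NULL);
    #pragma omp parallel for
    for (int i = 0; i < IMAX; i++) {
        used[thread_id()] = 1;   /* each thread writes only its own slot */
        for (int j = 0; j < JMAX; j++) a[j + JMAX * i] = 1.;
    }
    int count = count_flags(used, nthreads);
    free(used);
    return count;
}

/* collapse(2) partitions all IMAX*JMAX iterations across the team. */
int threads_used_collapsed(double *a, int nthreads) {
    int *used = calloc((size_t)nthreads, sizeof *used);
    assert(used != NULL);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < IMAX; i++)
        for (int j = 0; j < JMAX; j++) {
            used[thread_id()] = 1;   /* each thread writes only its own slot */
            a[j + JMAX * i] = 2.;
        }
    int count = count_flags(used, nthreads);
    free(used);
    return count;
}
```

Compile with your compiler's OpenMP option (for example, -fopenmp for GCC) and call both functions on the same array. The exact number of threads that receive work under collapse(2) depends on the schedule and the runtime, but the outer-only version can never use more than IMAX threads.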

You can improve performance by avoiding use of the original loop indices inside the collapsed loop nest, if possible. The compiler has to recreate them from the single collapsed loop index using divide and modulo operations, and when their uses are complicated enough, those operations are not removed by dead-code elimination during optimization:

 
#pragma omp parallel for collapse(2) 
  for (i = 0; i < imax; i++) { 
    for (j = 0; j < jmax; j++) a[ j + jmax*i] = 1.; 
  } 

Modified example for better performance:

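  k = 0;  /* running index into a */
  /* k must be linear so that every iteration sees its sequential value of k;
     a plain shared k++ would be a data race across threads */
#pragma omp parallel for collapse(2) linear(k:1)
  for (i = 0; i < imax; i++) { 
     for (j = 0; j < jmax; j++) a[ k++] = 1.; 
  } 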
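The difference between the two examples can be sketched by writing out by hand roughly what collapse(2) amounts to: one loop over a single logical index iv in [0, imax*jmax). This is a conceptual sketch, not actual compiler output, and the function names (split_index, fill_using_indices, fill_using_running_index) are hypothetical. When the body uses i and j, they must be rebuilt from iv with a divide and a modulo. When the body only needs a running index, that index is simply iv.

```c
#include <assert.h>

/* Recovers the original indices (i, j) from the collapsed index iv,
   assuming i is the outer loop and j the inner loop of length jmax. */
void split_index(long iv, int jmax, int *i, int *j) {
    assert(jmax > 0);
    *i = (int)(iv / jmax);   /* divide */
    *j = (int)(iv % jmax);   /* modulo */
}

/* Conceptual equivalent of the first example: the body uses i and j,
   so both are rebuilt from iv on every iteration. */
void fill_using_indices(double *a, int imax, int jmax) {
    long n = (long)imax * jmax;
    #pragma omp parallel for
    for (long iv = 0; iv < n; iv++) {
        int i, j;
        split_index(iv, jmax, &i, &j);
        a[j + (long)jmax * i] = 1.;
    }
}

/* Conceptual equivalent of the modified example: the running index k
   equals iv, so no divide or modulo is needed. */
void fill_using_running_index(double *a, int imax, int jmax) {
    long n = (long)imax * jmax;
    #pragma omp parallel for
    for (long iv = 0; iv < n; iv++)
        a[iv] = 1.;
}
```

Both functions set every element of a row-major imax x jmax array. A real compiler may simplify an expression as simple as j + jmax*i back to iv, so the benefit shows up mainly when the uses of i and j are more involved.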


NEXT STEPS

It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.

Back to the chapter Efficient Parallelization

For complete information about compiler optimizations, see the Optimization Notice.