User Guide

  • 2021.2
  • 05/21/2021
  • Public Content

Enable Task Chunking

Chunking means that the parallel framework will merge several tasks into a single task, with little or no overhead between them. For instance, if tasks are loop iterations, chunking would mean that several iterations are executed together (as a chunk) before heavyweight task control is performed.
Chunking is typically implemented when you convert to a parallel framework:
  • With Intel® oneAPI Threading Building Blocks, by using a parallel_for() instance.
  • With OpenMP*, by using the C/C++ pragma #pragma omp parallel for or the Fortran directive !$omp parallel do.
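As a minimal sketch of the OpenMP* case, the loop below uses the pragma named above and adds a schedule clause to make the chunk size explicit. The function name scale_in_chunks, its parameters, and the choice of a static schedule are assumptions made for illustration; they do not come from this guide. If OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Illustrative helper (hypothetical name): multiplies every element of
// `data` by `factor`. With OpenMP enabled, schedule(static, chunk) assigns
// iterations to threads in blocks of `chunk` consecutive iterations, so
// scheduling work is done per block rather than per iteration.
void scale_in_chunks(std::vector<double>& data, double factor, int chunk) {
    assert(chunk > 0);  // OpenMP requires a positive chunk size
    const long n = static_cast<long>(data.size());

    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; ++i) {
        data[i] *= factor;  // iterations are independent of each other
    }
}
```

A suitable chunk size depends on how much work each iteration does, so treat any particular value as a starting point to measure rather than a recommendation.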
You can also restructure your code to enable chunking. To do this, split a single loop into a new outer loop and an inner loop that together cover the same iteration space. A related technique called strip-mining uses the same structure so that the inner loop can apply vector operations to small chunks. Loop vectorization lets the hardware process data, such as the elements of data arrays, in small units (usually 64 bytes).
Once these two loops exist, move the inner loop inside the task annotations so the task begin and end annotations encapsulate the inner loop. The outer loop strides by some chunk size, and the inner loop iterates sequentially through each chunk.
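The restructuring described above can be sketched as follows. The two TASK_*_PLACEHOLDER macros are hypothetical no-op stand-ins that only mark where the task begin and end annotations would go; they are not real annotation names, and the function increment_chunked and its parameters are likewise assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical no-op stand-ins marking where the task begin and end
// annotations would be placed. Replace them with your tool's annotations.
#define TASK_BEGIN_PLACEHOLDER() ((void)0)
#define TASK_END_PLACEHOLDER()   ((void)0)

// Adds 1.0 to every element of `a`, treating each chunk as one task.
// Returns the number of tasks (chunks) that were executed.
std::size_t increment_chunked(std::vector<double>& a, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t n = a.size();
    std::size_t tasks = 0;

    // Outer loop: strides through the iteration space by the chunk size.
    for (std::size_t start = 0; start < n; start += chunk) {
        // The last chunk may be shorter when n is not a multiple of chunk.
        const std::size_t end = std::min(start + chunk, n);

        TASK_BEGIN_PLACEHOLDER();
        // Inner loop: iterates sequentially through one chunk.
        for (std::size_t i = start; i < end; ++i) {
            a[i] += 1.0;
        }
        TASK_END_PLACEHOLDER();

        ++tasks;
    }
    return tasks;
}
```

For example, with 10 elements and a chunk size of 4, the outer loop runs three times and the tasks cover 4, 4, and 2 iterations, so task control happens 3 times instead of 10.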
In cases where the CPU time and the elapsed time are about the same, the Suitability Report window under Runtime impact for this site may recommend that you enable task chunking.
If you check an item to the right of the Scalability of Maximum Site Gain graph (such as Enable Task Chunking), its value will be added to the Site Gain and possibly the Maximum Site Gain for All Sites values.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.