Chunking means that the parallel framework will merge several tasks into a single task, with little or no overhead between them. For instance, if tasks are loop iterations, chunking would mean that several iterations are executed together (as a chunk) before heavyweight task control is performed.
Chunking is typically implemented when you convert to a parallel framework:
With Intel® TBB, by using a parallel_for() instance.
With OpenMP*, by using the C/C++ pragma #pragma omp parallel for or the Fortran directive !$omp parallel do.
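As a rough illustration of the OpenMP case, the sketch below applies #pragma omp parallel for to a simple array loop. The function name scale_parallel, the chunk parameter, and the explicit schedule(static, chunk) clause are assumptions added for illustration. Without the clause, the OpenMP runtime picks its own division of iterations among threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: multiply every element by `factor`.
// The schedule clause asks the OpenMP runtime to hand out iterations in
// blocks of `chunk`, so per-task overhead is paid once per block rather
// than once per iteration. If the compiler is not run with OpenMP
// enabled, the pragma is ignored and the loop runs serially.
void scale_parallel(std::vector<double>& data, double factor, int chunk) {
    assert(chunk > 0);  // a chunk size must be positive
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; ++i) {
        data[static_cast<std::size_t>(i)] *= factor;
    }
}
```

A call such as scale_parallel(values, 3.0, 64) would process the array in blocks of 64 iterations. The value 64 is arbitrary here, and a suitable chunk size depends on how much work each iteration does.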
You can also restructure your code to enable chunking. To do so, split a single loop into a new outer loop and an inner loop that together cover the same iteration space. This technique, called strip-mining, also allows the inner loop to use vector operations on small chunks. Loop vectorization allows the hardware to process data, such as the elements of data arrays, independently in smaller units (usually 64-byte).
Once these two loops exist, place the task begin and end annotations around the inner loop so that each chunk becomes a single task. The outer loop strides by some chunk size, and the inner loop iterates sequentially through each chunk.
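The restructuring described above can be sketched as follows. This is a minimal illustration, not a definitive implementation. The names scale_chunked, scale_site, and scale_chunk and the chunk parameter are hypothetical. The no-op macro definitions are stand-ins so that the sketch compiles without the annotation header. In a real build you would include the advisor-annotate.h header shipped with your Advisor version and confirm the macro forms against it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in no-op macros (assumption: replace with advisor-annotate.h
// in a real build; these definitions exist only to keep the sketch
// self-contained).
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END()
#endif

// Hypothetical example: multiply every element by `factor`, with the
// original single loop strip-mined into an outer and an inner loop.
void scale_chunked(std::vector<double>& data, double factor, std::size_t chunk) {
    assert(chunk > 0);  // a zero stride would never advance the outer loop
    const std::size_t n = data.size();

    ANNOTATE_SITE_BEGIN(scale_site);
    // Outer loop: strides through the iteration space one chunk at a time.
    for (std::size_t start = 0; start < n; start += chunk) {
        // The last chunk may be shorter than `chunk`.
        const std::size_t end = std::min(start + chunk, n);

        ANNOTATE_TASK_BEGIN(scale_chunk);
        // Inner loop: iterates sequentially through one chunk, so the
        // whole chunk is modeled as a single task.
        for (std::size_t i = start; i < end; ++i) {
            data[i] *= factor;
        }
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

With 10 elements and a chunk size of 4, the outer loop runs three times and the inner loop covers 4, 4, and 2 elements, so the two loops together visit the same 10 iterations as the original loop.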
In cases where the CPU time and the elapsed time are about the same, the Suitability Report window under Runtime impact for this site may recommend that you enable task chunking.
If you check an item to the right of the Scalability of Maximum Site Gain graph (such as Enable Task Chunking), its value is added to the Site Gain value and possibly to the Maximum Site Gain for All Sites value.