Developer Guide and Reference

par-schedule, Qpar-schedule

Lets you specify a scheduling algorithm for loop iterations.

Syntax

Linux and macOS:
-par-schedule-keyword[=n]
Windows:
/Qpar-schedule-keyword[[:]n]
Arguments
keyword
Specifies the scheduling algorithm or tuning method. Possible values are:
auto
Lets the compiler or run-time system determine the scheduling algorithm.
static
Divides iterations into contiguous pieces.
static-balanced
Divides iterations into even-sized chunks.
static-steal
Divides iterations into even-sized chunks, but allows threads to steal parts of chunks from neighboring threads.
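dynamic
Assigns iterations to threads in chunks as the threads request them.
guided
Assigns iterations to threads in chunks of decreasing size, with an optional minimum number of iterations per chunk.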
guided-analytical
Divides iterations by using exponential distribution or dynamic distribution.
runtime
Defers the scheduling decision until run time.
n
Is the chunk size, that is, the number of iterations in each chunk. It can be specified only for static, dynamic, and guided. For more information, see the descriptions of each keyword below.
Default
static-balanced
Iterations are divided into even-sized chunks and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.
Description
This option lets you specify a scheduling algorithm for loop iterations. It specifies how iterations are to be divided among the threads of the team.
This option is only useful when specified with option [Q]parallel.
This option affects performance tuning and can provide better performance during auto-parallelization.
It does nothing if it is used with option [q or Q]openmp.
Option
Description
[Q]par-schedule-auto
Lets the compiler or run-time system determine the scheduling algorithm. Any possible mapping may occur for iterations to threads in the team.
[Q]par-schedule-static
Divides iterations into contiguous pieces (chunks) of size n. The chunks are assigned to threads in the team in a round-robin fashion in the order of the thread number. Note that the last chunk to be assigned may have a smaller number of iterations.
If no n is specified, the iteration space is divided into chunks that are approximately equal in size, and each thread is assigned at most one chunk.
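The round-robin assignment described above can be pictured with a small simulation. This is only an illustrative sketch, not the compiler's run-time code: the function name static_schedule and its parameters are hypothetical, and the way leftover iterations are spread across threads when no n is given is an assumption.

```python
def static_schedule(num_iters, num_threads, chunk=None):
    """Return, for each thread number, the list of iteration ranges it runs."""
    assignment = [[] for _ in range(num_threads)]
    if chunk is None:
        # No n given: approximately equal chunks, at most one per thread.
        # Assumption: leftover iterations go to the lowest-numbered threads.
        base, extra = divmod(num_iters, num_threads)
        start = 0
        for t in range(num_threads):
            size = base + (1 if t < extra else 0)
            if size:
                assignment[t].append(range(start, start + size))
            start += size
        return assignment
    # n given: chunks of size n are dealt out round-robin by thread number.
    for index, start in enumerate(range(0, num_iters, chunk)):
        stop = min(start + chunk, num_iters)  # the last chunk may be smaller
        assignment[index % num_threads].append(range(start, stop))
    return assignment
```

For example, 10 iterations on 3 threads with n set to 2 give thread 0 the iterations 0-1 and 6-7, thread 1 the iterations 2-3 and 8-9, and thread 2 the iterations 4-5.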
[Q]par-schedule-static-balanced
Divides iterations into even-sized chunks. The chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.
[Q]par-schedule-static-steal
Divides iterations into even-sized chunks, but when a thread completes its chunk, it can steal parts of chunks assigned to neighboring threads.
Each thread keeps track of L and U, which represent the lower and upper bounds of its chunks respectively. Iterations are executed starting from the lower bound, and simultaneously, L is updated to represent the new lower bound.
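A toy model may help picture the L and U bookkeeping. The sketch below is an assumption-heavy illustration, not the actual run-time algorithm: the function static_steal, the per-iteration costs list, and in particular the policy of stealing the upper half of a neighbor's remaining range are all hypothetical.

```python
def static_steal(costs, num_threads):
    """Simulate static-steal; costs[i] is a made-up run time for iteration i.

    Returns, for each thread, the iterations it executed, in order.
    """
    n = len(costs)
    # Even-sized initial chunks: thread t owns iterations [L[t], U[t]).
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    L, U = bounds[:-1], bounds[1:]
    busy = [0] * num_threads  # time units left on each thread's current iteration
    executed = [[] for _ in range(num_threads)]
    while any(busy) or any(lo < hi for lo, hi in zip(L, U)):
        for t in range(num_threads):
            if busy[t]:
                continue
            if L[t] == U[t]:
                # Own chunk exhausted: try to steal the upper half of a
                # neighbor's remaining range (hypothetical policy).
                for v in ((t - 1) % num_threads, (t + 1) % num_threads):
                    remaining = U[v] - L[v]
                    if remaining >= 2:
                        L[t], U[t] = U[v] - remaining // 2, U[v]
                        U[v] = L[t]
                        break
            if L[t] < U[t]:
                executed[t].append(L[t])
                busy[t] = costs[L[t]]
                L[t] += 1  # L advances as each iteration starts
        busy = [max(b - 1, 0) for b in busy]
    return executed
```

In this model, a thread whose iterations are cheap finishes early and takes over part of a slower neighbor's range, which is the load-balancing effect the option is meant to provide.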
[Q]par-schedule-dynamic
Assigns iterations to threads in chunks as the threads request them. Each thread executes its chunk of iterations, then requests another chunk, until no chunks remain to be assigned.
Each chunk contains n iterations, except for the last chunk to be assigned, which may have fewer iterations. If no n is specified, the default is 1.
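The request-driven behavior can be sketched as a simulation in which whichever thread becomes free first takes the next chunk. This is an illustration under stated assumptions, not the run-time library's implementation: dynamic_schedule is a hypothetical name, and the costs list stands in for how long each iteration happens to take.

```python
import heapq


def dynamic_schedule(costs, num_threads, chunk=1):
    """Simulate dynamic scheduling; costs[i] is a made-up run time for iteration i."""
    # Heap of (time the thread becomes free, thread number); all start idle.
    free_at = [(0, t) for t in range(num_threads)]
    heapq.heapify(free_at)
    assignment = [[] for _ in range(num_threads)]
    for start in range(0, len(costs), chunk):
        stop = min(start + chunk, len(costs))  # the last chunk may be smaller
        now, t = heapq.heappop(free_at)  # the next thread to request work
        assignment[t].append(range(start, stop))
        heapq.heappush(free_at, (now + sum(costs[start:stop]), t))
    return assignment
```

With one expensive iteration followed by several cheap ones, the thread that draws the expensive iteration ends up with a single chunk while the other threads absorb the rest. A larger n reduces the number of requests at the cost of coarser balancing.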
[Q]par-schedule-guided
Assigns iterations to threads in chunks as the threads request them. Each thread executes its chunk of iterations, then requests another chunk, until no chunks remain to be assigned. The value of n specifies the minimum number of iterations in a chunk.
For a chunk size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1.
For an n with value k (greater than 1), the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except for the last chunk to be assigned, which may have fewer than k iterations). If no n is specified, the default is 1.
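The shrinking chunk sizes can be sketched numerically. The description above only says the size is proportional to the unassigned iterations divided by the number of threads; the sketch below assumes a proportionality factor of 1 with rounding up, which is an illustrative choice rather than the documented formula. The function name guided_chunk_sizes is hypothetical.

```python
import math


def guided_chunk_sizes(num_iters, num_threads, min_chunk=1):
    """Return the sequence of chunk sizes handed out under a guided schedule."""
    sizes = []
    remaining = num_iters
    while remaining > 0:
        # Assumed rule: ceil(unassigned / threads), but never below min_chunk.
        size = max(math.ceil(remaining / num_threads), min_chunk)
        size = min(size, remaining)  # the last chunk may be smaller than min_chunk
        sizes.append(size)
        remaining -= size
    return sizes
```

Under this assumption, 20 iterations on 4 threads produce chunks of 5, 4, 3, 2, 2, 1, 1, 1, 1 iterations with the default n of 1, and chunks of 5, 4, 3, 3, 3, 2 iterations with n set to 3.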
[Q]par-schedule-guided-analytical
Divides iterations by using exponential distribution or dynamic distribution. The method depends on run-time implementation. Loop bounds are calculated with faster synchronization and chunks are dynamically dispatched at run time by threads in the team.
[Q]par-schedule-runtime
Defers the scheduling decision until run time.
The scheduling algorithm and chunk size are then taken from the setting of environment variable OMP_SCHEDULE.
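OMP_SCHEDULE values commonly take the form kind[,chunk], for example guided,4. The following sketch shows how such a value splits into an algorithm name and a chunk size. It is not the run-time library's parser: read_omp_schedule is a hypothetical helper, and it handles only the basic kind[,chunk] form.

```python
import os


def read_omp_schedule(environ=os.environ):
    """Split OMP_SCHEDULE into (kind, chunk); chunk is None when omitted."""
    value = environ.get("OMP_SCHEDULE", "").strip().lower()
    if not value:
        return None  # variable unset or empty
    kind, _, chunk = value.partition(",")
    chunk = chunk.strip()
    return kind.strip(), (int(chunk) if chunk else None)
```

For example, setting OMP_SCHEDULE to guided,4 before running the program would select the guided algorithm with a minimum chunk size of 4.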
This option may behave differently on Intel® microprocessors than on non-Intel microprocessors.
Alternate Options
None