Worksharing Using OpenMP*
- The loop variable must be of type signed or unsigned integer, random access iterator, or pointer.
- The comparison operation must be in the form loop_variable <, <=, >, or >= loop_invariant_expression of a compatible type.
- The third expression, or increment portion, of the for loop must be either addition or subtraction by a loop-invariant value.
- If the comparison operation is < or <=, the loop variable must increment on every iteration; conversely, if the comparison operation is > or >=, the loop variable must decrement on every iteration.
- The loop body must be single-entry-single-exit, meaning no jumps are permitted from inside to outside the loop, with the exception of the exit statement that terminates the whole application. If the statements goto or break are used, the statements must jump within the loop, not outside it. Similarly, for exception handling, exceptions must be caught within the loop.
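As a minimal sketch of loops that satisfy these restrictions, the two C functions below use an integer loop variable, a relational comparison against a loop-invariant bound, and a constant-step increment or decrement. The function names `scale` and `copy_strided_down` are hypothetical and chosen only for illustration.

```c
/* Hypothetical helper: multiplies each element of a[] by factor.
 * Canonical form: integer loop variable, `<` comparison against the
 * loop-invariant n, and an increment by the constant 1. */
void scale(double *a, int n, double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] *= factor;
    }
}

/* Hypothetical helper: copies every stride-th element, walking downward
 * from the last index. Because the comparison is `>=`, the loop variable
 * must decrease on every iteration; here it decreases by the
 * loop-invariant value stride (assumed to be greater than zero). */
void copy_strided_down(const double *src, double *dst, int n, int stride)
{
    #pragma omp parallel for
    for (int i = n - 1; i >= 0; i -= stride) {
        dst[i] = src[i];
    }
}
```

Both functions give the same result whether or not OpenMP is enabled at compile time (for example with `-fopenmp` for GCC or `-qopenmp` for the Intel compiler), since the iterations are independent of each other.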
Basics of Compilation
A Few Simple Examples
Avoiding Data Dependencies and Race Conditions
Managing Shared and Private Data
- Declare the variable inside the loop (really, inside the parallel OpenMP pragma) without the static keyword.
- Specify the private clause on an OpenMP pragma.
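The two approaches can be sketched side by side. In this illustrative example (the function names and the temporary `tmp` are hypothetical), the first function declares the temporary inside the loop body, and the second declares it outside and lists it in a private clause.

```c
/* Hypothetical example: out[i] = (in[i] + 1)^2.
 * tmp is declared inside the loop body without static, so every
 * iteration, and therefore every thread, has its own copy. */
void square_plus_one_local(const int *in, int *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int tmp = in[i] + 1;   /* private by construction */
        out[i] = tmp * tmp;
    }
}

/* Same computation with tmp declared outside the loop. The private
 * clause gives each thread its own uninitialized copy; without the
 * clause, tmp would be shared and the threads would race on it. */
void square_plus_one_clause(const int *in, int *out, int n)
{
    int tmp;
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = in[i] + 1;       /* assigned before every use */
        out[i] = tmp * tmp;
    }
}
```

Note that a private copy starts uninitialized, so the sketch assigns `tmp` before reading it in every iteration.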
private Variable Initialization Value
& (bitwise and)
| (bitwise or)
^ (bitwise exclusive or)
&& (conditional and)
|| (conditional or)
- A reduction variable can be listed in just one reduction.
- A reduction variable cannot be declared constant.
- A reduction variable cannot be declared private in the parallel construct.
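The sketch below illustrates reductions with the bitwise and logical operators listed above. Per the OpenMP specification, each thread's private copy starts at the operator's identity value: ~0 for &, 0 for | and ^, 1 for &&, and 0 for ||. The function names `combine_flags` and `all_positive` are hypothetical, and each reduction variable appears in exactly one reduction clause.

```c
/* Hypothetical helper: folds flags[0..n-1] with three bitwise reductions.
 * The shared variables are set to the identity values as well, because
 * their original values take part in the final combination. */
void combine_flags(const unsigned *flags, int n,
                   unsigned *all_and, unsigned *any_or, unsigned *parity_xor)
{
    unsigned a = ~0u;  /* identity for & */
    unsigned o = 0u;   /* identity for | */
    unsigned x = 0u;   /* identity for ^ */

    #pragma omp parallel for reduction(&:a) reduction(|:o) reduction(^:x)
    for (int i = 0; i < n; i++) {
        a &= flags[i];
        o |= flags[i];
        x ^= flags[i];
    }

    *all_and = a;
    *any_or = o;
    *parity_xor = x;
}

/* Hypothetical helper: returns 1 if every element of v is positive,
 * using a && reduction (identity value 1). */
int all_positive(const int *v, int n)
{
    int ok = 1;
    #pragma omp parallel for reduction(&&:ok)
    for (int i = 0; i < n; i++) {
        ok = ok && (v[i] > 0);
    }
    return ok;
}
```

For example, folding the flags 0x7, 0x5, and 0xD yields 0x5 for the AND, 0xF for the OR, and 0xF for the XOR.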
Load Balancing and Loop Scheduling
static: Divide the loop into equal-sized chunks, or as equal as possible when the number of loop iterations is not evenly divisible by the number of threads multiplied by the chunk size. By default, chunk size is
Set chunk to 1 to interleave the iterations.
dynamic: Use the internal work queue to give a chunk-sized block of loop iterations to each thread. When a thread is finished, it retrieves the next block of loop iterations from the top of the work queue.
By default, the chunk size is 1. Be careful when using this scheduling type because of the extra overhead involved.
guided: Similar to dynamic scheduling, but the chunk size starts off large and decreases to better handle load imbalance between iterations. The optional chunk parameter specifies the minimum chunk size to use.
By default the chunk size is approximately
auto: When schedule(auto) is specified, the decision regarding scheduling is delegated to the compiler. The programmer gives the compiler the freedom to choose any possible mapping of iterations to threads in the team.
runtime: Use the OMP_SCHEDULE environment variable to specify which one of the three loop-scheduling types should be used. OMP_SCHEDULE is a string formatted exactly the same as would appear on the parallel construct.
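As an illustration of how these scheduling kinds appear in code, the sketch below applies the `schedule` clause in four ways to the same deliberately unbalanced loop. The `work` function and the `total_*` names are hypothetical; the chunk size of 4 is an arbitrary choice, not a recommendation.

```c
/* Hypothetical workload: the cost of iteration i grows with i, so the
 * iterations are unbalanced. Returns 0 + 1 + ... + i. */
static long work(int i)
{
    long s = 0;
    for (int k = 0; k <= i; k++) {
        s += k;
    }
    return s;
}

/* static with chunk 1: iterations are interleaved across the threads. */
long total_static_interleaved(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(static, 1) reduction(+:total)
    for (int i = 0; i < n; i++) total += work(i);
    return total;
}

/* dynamic: each thread takes the next block of 4 iterations when idle. */
long total_dynamic(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) total += work(i);
    return total;
}

/* guided: chunk sizes start large and shrink as the loop progresses. */
long total_guided(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(guided) reduction(+:total)
    for (int i = 0; i < n; i++) total += work(i);
    return total;
}

/* runtime: the kind is read from OMP_SCHEDULE when the program runs. */
long total_runtime(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++) total += work(i);
    return total;
}
```

With the runtime variant, the schedule could be selected without recompiling, for instance by setting `OMP_SCHEDULE="dynamic,4"` in the environment before launching the program. All four functions compute the same total; only the mapping of iterations to threads differs.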
OpenMP Tasking Model
- final (scalar expression)
- default(shared | none)
- in_reduction(reduction-identifier : list)
- depend(dependence-type : list)
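A minimal sketch of two of these clauses, `default(shared)` and `depend`, is shown below. The function name `produce_then_consume` and the variables `x` and `y` are hypothetical; the `final` and `in_reduction` clauses are not demonstrated here.

```c
/* Hypothetical two-stage pipeline: a producer task writes x and a
 * consumer task reads it. The depend clauses order the two sibling tasks. */
int produce_then_consume(void)
{
    int x = 0;
    int y = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: declares that it writes x. */
        #pragma omp task default(shared) depend(out: x)
        x = 21;

        /* Consumer: depend(in: x) makes it wait for the producer. */
        #pragma omp task default(shared) depend(in: x)
        y = 2 * x;
    }   /* implicit barrier of the single construct: both tasks are done */

    return y;
}
```

One thread generates both tasks inside the single construct, and any thread in the team may execute them. Because the consumer depends on the producer's output, the function returns 42 regardless of which threads run the tasks.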
- the point immediately following the generation of an explicit task.
- after the last instruction of a task region.
- in a taskwait region.
- in implicit and explicit barrier regions.
- begin execution of a tied task bound to the current team.
- resume any suspended task region, bound to the current team, to which it is tied.
- begin execution of an untied task bound to the current team.
- resume any suspended untied task region bound to the current team.
- An explicit task whose construct contains an if clause whose expression evaluates to false is executed immediately after the task is generated.
- Other scheduling of new tied tasks is constrained by the set of task regions that are currently tied to the thread, and that are not suspended in a barrier region. If this set is empty, any new tied task may be scheduled. Otherwise, a new tied task may be scheduled only if it is a descendant of every task in the set. A program relying on any other assumption about task scheduling is non-conforming.
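The sketch below ties two of the points above together: a `taskwait` region as a task scheduling point, and an `if` clause that evaluates to false so the task is executed immediately after it is generated. The function names `range_sum` and `parallel_sum` and the cutoff of 1024 elements are assumptions made for illustration only.

```c
/* Hypothetical recursive sum of v[lo..hi) using explicit tasks. */
static long range_sum(const int *v, int lo, int hi)
{
    if (hi - lo <= 1) {
        return (hi > lo) ? v[lo] : 0;
    }

    int mid = lo + (hi - lo) / 2;
    long left = 0;
    long right = 0;

    /* For ranges of at most 1024 elements the if expression is false,
     * so the encountering thread executes the task immediately instead
     * of deferring it. default(shared) lets the task write to left. */
    #pragma omp task default(shared) if(hi - lo > 1024)
    left = range_sum(v, lo, mid);

    #pragma omp task default(shared) if(hi - lo > 1024)
    right = range_sum(v, mid, hi);

    /* Task scheduling point: wait here until both child tasks finish.
     * While waiting, the thread may execute other tasks. */
    #pragma omp taskwait

    return left + right;
}

/* Hypothetical entry point: one thread starts the recursion and the
 * rest of the team picks up the generated tasks. */
long parallel_sum(const int *v, int n)
{
    long total = 0;

    #pragma omp parallel
    #pragma omp single
    total = range_sum(v, 0, n);

    return total;
}
```

The `taskwait` is required for correctness here: `left` and `right` live on the parent's stack, so the parent must not return before its child tasks have written to them.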