Developer Guide and Reference

  • 2021.2
  • 04/07/2021
  • Public Content

OpenMP* Pragmas Summary

This is a summary of the OpenMP* pragmas supported in the Intel® oneAPI DPC++/C++ Compiler. For detailed information about the OpenMP API, see the OpenMP Application Program Interface Version TR4: Version 5.0 specification, which is available from the OpenMP web site.

PARALLEL Pragma

Use this pragma to form a team of threads and execute those threads in parallel.
Pragma
Description
omp parallel
Specifies that a structured block should be run in parallel by a team of threads.
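As an illustration of the team concept, the following minimal sketch counts how many threads execute a parallel region. The function name count_team_members is hypothetical, and the reduction clause is used here only to combine the per-thread counts safely. If the code is compiled without OpenMP support, the pragma is ignored and the count is 1.

```cpp
#include <cassert>

// Hypothetical helper: counts how many threads execute the parallel region.
int count_team_members() {
    int members = 0;
    // Each thread gets a private copy of `members`; the copies are summed
    // into the original variable at the end of the region.
    #pragma omp parallel reduction(+ : members)
    {
        members += 1;  // executed once by every thread in the team
    }
    // The encountering thread always takes part, so the count is at least 1.
    assert(members >= 1);
    return members;
}
```

The returned value depends on the runtime configuration (for example, the number of available cores), so callers should not assume a particular team size.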

TASKING Pragmas

Use these pragmas to defer execution.
Pragma
Description
omp task
Specifies the beginning of a code block whose execution may be deferred.
omp taskloop
Specifies that the iterations of one or more associated for loops should be executed in parallel using OpenMP tasks. The iterations are distributed across tasks that are created by the construct and scheduled to be executed.
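The two tasking pragmas can be sketched as follows. This is an illustrative example, not taken from the compiler documentation: the function names, the split parameter, and the grain parameter are assumptions. In both functions a single thread creates the tasks and the rest of the team helps execute them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums v[0, split) and v[split, size) in two deferred tasks.
long sum_in_two_tasks(const std::vector<int>& v, std::size_t split) {
    assert(split <= v.size());
    long left = 0;
    long right = 0;
    #pragma omp parallel
    #pragma omp single
    {
        // Each task may run immediately or be deferred to another thread.
        #pragma omp task shared(left)
        for (std::size_t i = 0; i < split; ++i) left += v[i];

        #pragma omp task shared(right)
        for (std::size_t i = split; i < v.size(); ++i) right += v[i];

        // Wait for both child tasks before the results are read.
        #pragma omp taskwait
    }
    return left + right;
}

// Hypothetical helper: multiplies every element by `factor` using taskloop.
// `grain` is the approximate number of iterations per generated task.
void scale_with_taskloop(std::vector<double>& v, double factor, int grain) {
    assert(grain > 0);
    const long n = static_cast<long>(v.size());
    #pragma omp parallel
    #pragma omp single
    {
        // The loop iterations are split into tasks; the construct waits
        // for all of them to finish before the single region ends.
        #pragma omp taskloop grainsize(grain)
        for (long i = 0; i < n; ++i) v[i] *= factor;
    }
}
```

For example, sum_in_two_tasks applied to {1, 2, 3, 4, 5} with split 2 adds 1 + 2 in one task and 3 + 4 + 5 in the other.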

WORKSHARING Pragmas

Use these pragmas to share work among a team of threads.
Pragma
Description
omp for
Specifies a parallel loop. Each iteration of the loop is executed by one of the threads in the team.
omp single
Specifies that a block of code is to be executed by only one thread in the team.
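A minimal sketch of both worksharing pragmas inside one parallel region follows. The function name fill_squares and the counter single_runs are hypothetical and exist only to make the behavior observable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: stores i*i in out[i] and reports how many times the
// single block ran.
int fill_squares(std::vector<int>& out) {
    const long n = static_cast<long>(out.size());
    int single_runs = 0;
    #pragma omp parallel
    {
        // Exactly one thread of the team executes this block; the others
        // wait at the implicit barrier that ends the construct.
        #pragma omp single
        {
            single_runs += 1;
        }

        // The iterations are divided among the threads of the team.
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            out[i] = static_cast<int>(i * i);
        }
    }
    assert(single_runs == 1);
    return single_runs;
}
```

Because each iteration writes a distinct element, the loop needs no additional synchronization.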

SYNCHRONIZATION Pragmas

Use these pragmas to synchronize between threads.
Pragma
Description
omp atomic
Specifies a computation that must be executed atomically.
omp barrier
Specifies a point in the code where each thread must wait until all threads in the team arrive.
omp critical
Specifies a code block that is restricted to access by only one thread at a time.
omp flush
Identifies a point at which a thread's view of memory becomes consistent with memory.
omp master
Specifies the beginning of a code block that must be executed only once by the master thread of the team.
omp ordered
Specifies a block of code that the threads in a team must execute in the natural order of the loop iterations.
omp taskgroup
Causes the program to wait until the completion of all enclosed and descendant tasks.
omp taskwait
Specifies a wait on the completion of child tasks generated since the beginning of the current task.
omp taskyield
Specifies that the current task can be suspended at this point in favor of execution of a different task.
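The difference between omp atomic and omp critical can be sketched as follows. The struct Stats and the function even_count_and_max are hypothetical names chosen for this example. A single scalar update fits omp atomic, while a compare-then-store sequence needs the mutual exclusion of omp critical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type for this sketch.
struct Stats {
    long even_count;
    int max_value;
};

// Counts the even elements and finds the largest element of a non-empty vector.
Stats even_count_and_max(const std::vector<int>& v) {
    assert(!v.empty());
    const long n = static_cast<long>(v.size());
    long even_count = 0;
    int max_value = v[0];
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (v[i] % 2 == 0) {
            // One read-modify-write of a scalar: atomic is sufficient.
            #pragma omp atomic
            even_count += 1;
        }
        // The comparison and the store must not be interleaved with
        // another thread's update, so only one thread may be in here.
        #pragma omp critical
        {
            if (v[i] > max_value) max_value = v[i];
        }
    }
    return Stats{even_count, max_value};
}
```

In production code a reduction clause would usually be cheaper for both results; the explicit synchronization is kept here only to show the two pragmas.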

Data Environment Pragma

Use this pragma to give threads global private data.
Pragma
Description
omp threadprivate
Specifies a list of globally-visible variables that will be allocated private to each thread.
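The following sketch illustrates per-thread copies of a file-scope variable. The names call_count, bump, and bumps_seen_by_initial_thread are hypothetical. Each thread increments only its own copy, so the value read by the initial thread after the region reflects its own calls and not those of the other threads.

```cpp
#include <cassert>

// File-scope counter; every thread gets its own copy.
static int call_count = 0;
#pragma omp threadprivate(call_count)

// Increments the calling thread's copy of the counter.
void bump() { ++call_count; }

// Hypothetical helper: every thread calls bump() `times` times, then the
// initial thread reports the value of its own copy.
int bumps_seen_by_initial_thread(int times) {
    assert(times >= 0);
    call_count = 0;  // sequential part: resets the initial thread's copy
    #pragma omp parallel
    {
        for (int k = 0; k < times; ++k) bump();
    }
    // Only the initial thread's increments are visible here.
    return call_count;
}
```

With an ordinary shared global, the same code would race and could report up to times multiplied by the team size.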

Offload Target Control Pragmas

Use these pragmas to control execution on one or more offload targets. Offload is not supported on Windows* systems.
Pragma
Description
omp distribute
Specifies that the iterations of one or more loops should be distributed among the master threads of all thread teams in a league.
omp target enter data
Specifies that variables are mapped to a device data environment.
omp target exit data
Specifies that variables are unmapped from a device data environment.
omp teams
Creates a league of thread teams inside a target region to execute a structured block in the master thread of each team.
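A possible way to combine these pragmas is sketched below. The function name saxpy_offload is hypothetical, and the example also uses the omp target pragma, which is part of the OpenMP specification but is not listed in the table above. When no offload device is available, OpenMP implementations typically run the target region on the host, and without OpenMP support the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: computes y = a * x + y on an offload target.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();
    float* yp = y.data();

    // Map both arrays to the device data environment.
    #pragma omp target enter data map(to : xp[0:n], yp[0:n])

    // Create a league of teams on the device and distribute the
    // iterations among the master threads of those teams.
    #pragma omp target
    #pragma omp teams
    #pragma omp distribute
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }

    // Copy the result back and unmap both arrays.
    #pragma omp target exit data map(from : yp[0:n]) map(release : xp[0:n])
}
```

Keeping the enter data and exit data pragmas separate from the target region makes it possible to run several target regions on the same mapped arrays without repeated transfers.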

Vectorization Pragmas

Use these pragmas to control execution on vector hardware.
Pragma
Description
omp simd
Transforms the loop into a loop that will be executed concurrently using SIMD instructions.
The early_exit clause is an Intel-specific extension of the OpenMP* specification.
early_exit
Allows vectorization of multiple-exit loops. When this clause is specified:
  • Each operation before the last lexical early exit of the loop may be executed as if the early exit were not triggered within the SIMD chunk.
  • After the last lexical early exit of the loop, all operations are executed as if the last iteration of the loop was found.
  • Each list item specified in the linear clause is computed based on the last iteration number upon exiting the loop.
  • The last values for linear clauses and conditional lastprivate clauses are preserved with respect to scalar execution.
  • The last values for reduction clauses are computed as if the last iteration in the last SIMD chunk was executed upon exiting the loop.
  • The shared memory state may not be preserved with regard to scalar execution.
  • Exceptions are not allowed.
omp declare simd
Creates a version of a function that can process multiple arguments using Single Instruction Multiple Data (SIMD) instructions from a single invocation from a SIMD loop.
omp inclusive_scan
Specifies a boundary between definitions and uses. This pragma should be used with the scan clause and must not be used in nested loops.
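The omp simd and omp declare simd pragmas can be sketched as follows. The function names are hypothetical, and the example deliberately avoids the Intel-specific early_exit clause so that it stays portable across OpenMP compilers.

```cpp
#include <cassert>
#include <vector>

// Ask the compiler to also generate a SIMD version of this function so it
// can be called from vectorized loops.
#pragma omp declare simd
float multiply_add(float a, float x, float y) { return a * x + y; }

// Hypothetical helper: y = a * x + y, vectorized with omp simd.
void saxpy_simd(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        y[i] = multiply_add(a, x[i], y[i]);
    }
}

// Hypothetical helper: dot product with a SIMD reduction.
float dot_simd(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    float sum = 0.0f;
    // Each SIMD lane accumulates a partial sum that is combined at the end.
    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because a SIMD reduction may change the order of floating-point additions, results for general inputs can differ slightly from the scalar loop.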

Cancellation Constructs

Pragma
Description
omp cancel
Requests cancellation of the innermost enclosing region of the type specified, and causes the encountering task to proceed to the end of the cancelled construct.
omp cancellation point
Defines a point at which implicit or explicit tasks check to see if cancellation has been requested for the innermost enclosing region of the type specified.
This construct does not implement a synchronization between threads or tasks.
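A parallel search is a common use of cancellation and can be sketched as shown below. The function name find_any_index is hypothetical. The sketch assumes that cancellation may or may not be enabled at run time (in OpenMP it is controlled by the OMP_CANCELLATION environment variable); when it is disabled, the cancel pragma has no effect and the loop simply runs to completion, so the result is correct either way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the index of some element equal to `target`,
// or -1 if there is none. With several matches, any of them may be returned.
long find_any_index(const std::vector<int>& v, int target) {
    const long n = static_cast<long>(v.size());
    long found = -1;
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            if (v[i] == target) {
                #pragma omp critical
                {
                    found = i;
                }
                // Ask the other threads to stop working on this loop.
                #pragma omp cancel for
            }
            // Other threads notice the request here and skip the rest
            // of their iterations.
            #pragma omp cancellation point for
        }
    }
    assert(found < n);
    return found;
}
```

The critical section is still required because, as noted above, cancellation does not synchronize threads.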

User-Defined Reduction Pragma

Use this pragma to define reduction identifiers that can be used as reduction operators in a reduction clause.
Pragma
Description
omp declare reduction
Declares User-Defined Reduction (UDR) functions (reduction identifiers) that can be used as reduction operators in a reduction clause.
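As an illustration, the sketch below declares a reduction identifier that merges value ranges. The type Range, the helpers empty_range and merge_ranges, and the identifier range_merge are all hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical type: the smallest and largest value seen so far.
struct Range {
    int lo;
    int hi;
};

// Identity element: merging it with any range leaves that range unchanged.
Range empty_range() { return Range{INT_MAX, INT_MIN}; }

// Combines two partial results.
Range merge_ranges(Range a, Range b) {
    return Range{a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

// omp_in and omp_out are the two partial results being combined;
// omp_priv is each thread's private copy before the loop starts.
#pragma omp declare reduction(range_merge : Range : \
        omp_out = merge_ranges(omp_out, omp_in)) \
    initializer(omp_priv = empty_range())

// Hypothetical helper: finds the value range of a non-empty vector.
Range value_range(const std::vector<int>& v) {
    assert(!v.empty());
    const long n = static_cast<long>(v.size());
    Range r = empty_range();
    #pragma omp parallel for reduction(range_merge : r)
    for (long i = 0; i < n; ++i) {
        r = merge_ranges(r, Range{v[i], v[i]});
    }
    return r;
}
```

The combiner should be associative and the initializer should be its identity, otherwise the result can depend on how iterations are divided among threads.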

Combined Pragmas

Use these pragmas as shortcuts for multiple pragmas in sequence. A combined construct is a shortcut for specifying one construct immediately nested inside another construct. Its semantics are identical to explicitly specifying the first construct containing one instance of the second construct and no other statements.
A composite construct is composed of two constructs but does not have identical semantics to specifying one of the constructs immediately nested inside the other. A composite construct either adds semantics not included in the constructs from which it is composed, or the nesting of one construct inside the other is not conforming.
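The equivalence between a combined construct and its nested form can be sketched with omp parallel for. The two function names below are hypothetical; both functions are intended to behave identically.

```cpp
#include <cassert>
#include <vector>

// Nested form: a parallel construct that contains one for construct and
// no other statements.
void double_nested(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; ++i) out[i] = 2 * in[i];
    }
}

// Combined form: the same semantics written as a single pragma.
void double_combined(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) out[i] = 2 * in[i];
}
```

A composite construct such as omp for simd has no such nested spelling with the same meaning, which is why those entries carry a footnote.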
Pragma
Description
omp distribute parallel for [1]
Specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams.
omp distribute parallel for simd [1]
Specifies a loop that will be executed in parallel by multiple threads that are members of multiple teams. It will be executed concurrently using SIMD instructions.
omp distribute simd [1]
Specifies a loop that will be distributed across the master threads of the teams region. It will be executed concurrently using SIMD instructions.
omp for simd [1]
Specifies that the iterations of the loop will be distributed across threads in the team. Iterations executed by each thread can also be executed concurrently using SIMD instructions.
omp parallel for
Provides an abbreviated way to specify a parallel region containing a single for construct.
omp parallel for simd
Specifies a parallel construct that contains one for simd construct and no other statement.
omp parallel sections
Specifies a parallel construct that contains a single sections construct.
omp target parallel
Creates a device data environment and executes the parallel region on that device.
omp target parallel for
Provides an abbreviated way to specify a target construct that contains an omp parallel for construct and no other statement between them.
omp target parallel for simd
Specifies a target construct that contains an omp parallel for simd construct and no other statement between them.
omp target simd
Specifies a target construct that contains an omp simd construct and no other statement between them.
omp target teams
Creates a device data environment and executes the construct on the same device. It also creates a league of thread teams with the master thread in each team executing the structured block.
omp target teams distribute
Creates a device data environment and then executes the construct on that device. It also specifies that loop iterations will be distributed among the master threads of all thread teams in a league created by a teams construct.
omp target teams distribute parallel for
Creates a device data environment and then executes the construct on that device. It also specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams created by a teams construct.
omp target teams distribute parallel for simd
Creates a device data environment and then executes the construct on that device. It also specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams created by a teams construct. The loop iterations will be distributed across the teams and executed concurrently using SIMD instructions.
omp target teams distribute simd
Creates a device data environment and then executes the construct on that device. It also specifies that loop iterations will be distributed among the master threads of all thread teams in a league created by a teams construct. It will be executed concurrently using SIMD instructions.
omp taskloop simd [1]
Specifies a loop that can be executed concurrently using SIMD instructions and whose iterations will also be executed in parallel using OpenMP tasks.
omp teams distribute
Creates a league of thread teams to execute the structured block in the master thread of each team. It also specifies that loop iterations will be distributed among the master threads of all thread teams in a league created by a teams construct.
omp teams distribute parallel for
Creates a league of thread teams to execute a structured block in the master thread of each team. It also specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams.
omp teams distribute parallel for simd
Creates a league of thread teams to execute a structured block in the master thread of each team. It also specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams. The loop iterations will be distributed across the teams and executed concurrently using SIMD instructions.
omp teams distribute simd
Creates a league of thread teams to execute the structured block in the master thread of each team. It also specifies a loop that will be distributed across the master threads of the teams and executed concurrently using SIMD instructions.
Footnotes:
[1] This directive specifies a composite construct.
