OpenMP* 5.0 support in Intel® Compiler 18.0

By Igor Vorobtsov, Published: 07/24/2017, Last Updated: 07/24/2017

OpenMP 5.0 is the next version of the OpenMP specification and is expected to be officially released in 2018. Technical Report 4, a preview of the future OpenMP 5.0 API, is already available and describes the new language features. Intel compilers version 18.0 support reductions among tasks, which is one of the major features added to OpenMP.

A reduction occurs when a variable is updated on each loop iteration in the following way:

variable = operator(variable, expression)

Note that operator is any commutative and associative operator that updates variable, and that variable does not occur in expression. Parallelizing this kind of loop may introduce data races, since variable is shared and modified in every iteration. Mutually exclusive access is required to avoid them.

There are three common types of reductions: for-loop, while-loop and recursive. In a for-loop reduction, the reduction is enclosed in a for-loop body and the iteration space is known. It is used in scientific applications to update large arrays of data in each step, or in numerical solvers. This reduction type is already supported in the existing OpenMP specification by the reduction clause. In other algorithms the iteration space is unknown and while-loop reductions appear, e.g. in graph search algorithms. Recursive reductions are used in backtracking algorithms that appear in combinatorial optimization, and they usually allow compact and more "elegant" formulations. Explicit tasking was introduced in OpenMP 3.0 to allow while-loops and recursion to be expressed concurrently; however, reductions were not covered. A new feature of OpenMP 5.0 provides a way to perform parallel reductions among tasks.

New clauses are added to the specification to support this. The reduction scoping clause task_reduction defines the region in which a reduction is computed by tasks:

C++: #pragma omp taskgroup task_reduction ( operator : list )
Fortran: !$OMP TASKGROUP TASK_REDUCTION ( operator | intrinsic : list )

For each list item x, the taskgroup keeps a copy of x for each task that has an in_reduction clause naming x. At the beginning of the taskgroup, all copies of x are initialized according to the operator (for example, 0 for +). At the end of the taskgroup, all copies of x are reduced into the original x. The new reduction participating clause, in_reduction, identifies each task that participates in a reduction:

C++: #pragma omp task in_reduction ( operator : list )
Fortran: !$OMP TASK IN_REDUCTION ( operator | intrinsic : list )

For each list item x, the task requests a pointer p to its copy of x from an enclosing taskgroup. If taskgroups are nested, the innermost taskgroup with a matching task_reduction clause fulfills the request. All accesses to x in the task's body are replaced by *p. Here is a simple example:

// init_sum() and init_A() stand for user-supplied initialization
int sum = init_sum(), A[100];
init_A(A);
#pragma omp parallel
#pragma omp single
  #pragma omp taskgroup task_reduction(+:sum)
  {
    // loop counters are declared inside each task, so they are not shared
    #pragma omp task in_reduction(+:sum)
    for (int i=0; i<50; i++) sum += A[i];
    #pragma omp task in_reduction(+:sum)
    for (int i=50; i<100; i++) sum += A[i];
  } // wait here until all tasks in taskgroup are completed

We have two tasks that can be marked as participating in a reduction; however, we also need to specify that these are all the tasks that update sum. To do so, we enclose every task that participates in the reduction in a taskgroup with a task_reduction clause. The clause states that tasks in this taskgroup may perform a reduction on sum. Any task that wants to participate in the reduction must be marked with an in_reduction clause. Note that the taskgroup may also contain tasks that do not participate in the reduction. Both tasks in this example may run concurrently. The following diagram shows what is happening:

[Diagram: image not available]

Similar clauses are available for the taskloop directive introduced in the OpenMP 4.5 specification.
The taskloop construct creates tasks to share the work of the loop, and it creates an implicit taskgroup to enclose them. The taskloop clause that both scopes the reduction and marks participation is:

C++: #pragma omp taskloop reduction ( operator : list )
Fortran: !$OMP TASKLOOP REDUCTION ( operator | intrinsic : list )

All the taskloop’s tasks get the matching in_reduction clause and the implicit taskgroup gets the matching task_reduction clause.
It is also possible to specify an in_reduction clause on the taskloop. In this case the implicit taskgroup does not have a matching task_reduction clause, so an explicit taskgroup enclosing the taskloop must have one:

C++: #pragma omp taskloop in_reduction ( operator : list )
Fortran: !$OMP TASKLOOP IN_REDUCTION ( operator | intrinsic : list )

Note that all the taskloop’s tasks get the in_reduction clause.
