User Guide

OpenMP* Constructs in the Graph Canvas

To map OpenMP* parallel regions and task constructs to a graph, run the fgt2xml converter with the --omp_experimental flag. In such a graph, nodes represent parallel regions and tasks, and edges represent task dependencies.

Parallel Regions

All OpenMP-related parallelism is contained within OpenMP parallel regions. In the Flow Graph Analyzer, a parallel region is mapped to a subgraph node in the graph canvas. Inside the subgraph node are at least two nodes:
  • A node that represents the start of the parallel region.
  • A node that represents the implicit barrier at the end of the region.
For example, for an empty parallel region like the following, the Flow Graph Analyzer creates a subgraph node, such as omp0::n0, in the graph canvas.
    #pragma omp parallel
    { }
[Figure: subgraph node shown in the graph canvas]
When you double-click the subgraph node, you see the following, where omp0::n0::n1 is the start of the parallel region and omp0::n0::n2 is the implicit barrier at the end of the region.
[Figure: result of double-clicking the subgraph node]
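The two nodes inside the subgraph correspond to observable program behavior. The following is a minimal sketch, not taken from the original guide, in which the hypothetical function count_region_entries counts how many threads enter a region. Because of the implicit barrier, every increment is complete by the time the region ends.

```cpp
// Returns the number of threads that executed the parallel region.
// If the compiler does not have OpenMP enabled, the pragmas are ignored
// and the block runs exactly once on the calling thread.
int count_region_entries() {
    int entries = 0;  // declared outside the region, so shared by default

    #pragma omp parallel
    {
        // Start of the parallel region: every thread in the team runs this block.
        #pragma omp atomic
        entries++;
    }   // Implicit barrier: no thread continues past here until all have arrived.

    // All increments are visible here because the region has ended.
    return entries;
}
```

The return value equals the team size chosen by the runtime, so it is at least 1 and otherwise depends on the machine and the OpenMP settings.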

OpenMP* Tasks

An OpenMP* task is a block of code contained in a parallel region that can be executed simultaneously with other tasks in the same region. In the Flow Graph Analyzer, an OpenMP task is mapped to a generic node. For example, in the code below, there are two tasks: one prints hello and the other prints world. The order in which these tasks execute is not specified, so they can execute in any order. However, the two tasks always start after the enclosing parallel region begins, and they complete before the enclosing parallel region ends.
    #pragma omp parallel
    {
        #pragma omp task
        { printf("hello "); }
        #pragma omp task
        { printf("world "); }
    }
When you visualize this program in the Flow Graph Analyzer, it looks like this:
[Figure: the example program visualized in the Flow Graph Analyzer]
When you double-click this subgraph, you see the following, where omp0::n0::n1 is the start of the parallel region, omp0::n0::n4 is the implicit barrier at the end of the region, omp0::n0::n2 is the "hello" task, and omp0::n0::n3 is the "world" task.
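The guarantee that both tasks finish before the region ends can be checked in code. The sketch below is an illustration rather than part of the original guide: the function run_two_tasks and its flags are hypothetical names, and a single construct is added so that the two tasks are created once instead of once per thread.

```cpp
#include <utility>

// Creates two independent tasks and reports whether each one ran.
// Assumption: the `single` construct is an addition for this sketch; in the
// snippet above, every thread in the team creates its own pair of tasks.
std::pair<bool, bool> run_two_tasks() {
    bool hello_done = false;
    bool world_done = false;

    #pragma omp parallel
    {
        #pragma omp single
        {
            // No depend clauses: the two tasks may run in either order.
            #pragma omp task shared(hello_done)
            { hello_done = true; }

            #pragma omp task shared(world_done)
            { world_done = true; }
        }
    }   // Both tasks are guaranteed to be complete once the region has ended.

    return std::make_pair(hello_done, world_done);
}
```

Both flags are true after the call, regardless of which task the runtime schedules first.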

OpenMP* Task Dependencies

In the OpenMP* specification, a partial ordering of tasks can be expressed with depend clauses. A task dependence is fulfilled when the predecessor task completes. This section covers three dependence types supported by the OpenMP API: in, out, and inout.
  • in dependency type: The generated task is a dependent task of all previously generated sibling tasks that reference at least one of the list items in an out or inout clause.
  • out and inout dependency types: The generated task is a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout clause.
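The two rules above can be sketched as a small function that derives every dependence edge from a list of sibling tasks in generation order. This is an illustrative model of the rules as worded here, not the OpenMP runtime's implementation or the Flow Graph Analyzer's algorithm. The names DepType, Dep, Task, and all_dependence_edges are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class DepType { In, Out, InOut };

struct Dep {
    DepType type;
    std::string item;  // the list item named in the depend clause
};

struct Task {
    std::vector<Dep> deps;
};

// Returns edges (predecessor, successor) as indices into `tasks`,
// where `tasks` lists sibling tasks in the order they were generated.
std::set<std::pair<std::size_t, std::size_t>>
all_dependence_edges(const std::vector<Task>& tasks) {
    std::set<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t succ = 0; succ < tasks.size(); ++succ) {
        for (const Dep& d : tasks[succ].deps) {
            for (std::size_t pred = 0; pred < succ; ++pred) {
                for (const Dep& p : tasks[pred].deps) {
                    if (p.item != d.item) continue;
                    // `in` waits only for earlier out/inout tasks;
                    // out/inout wait for earlier in, out, and inout tasks.
                    const bool pred_writes = p.type != DepType::In;
                    const bool succ_writes = d.type != DepType::In;
                    if (pred_writes || succ_writes) {
                        edges.insert(std::make_pair(pred, succ));
                    }
                }
            }
        }
    }
    return edges;
}
```

For four tasks on one item x with the types out, in, in, inout, this yields the edges 0→1, 0→2, 0→3, 1→3, and 2→3. The two in tasks are not ordered with respect to each other, and the edge 0→3 is implied by the others.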
In the Flow Graph Analyzer, task dependencies are represented by edges between the nodes that represent OpenMP tasks.
It is important to understand which dependencies the Flow Graph Analyzer visualizes.
  • The task dependency graph represents the partial order set by the depend clauses for the OpenMP tasks executed by the application. The nodes in the graph are OpenMP tasks and the edges represent the partial order.
  • To reduce the complexity of the graph, the Flow Graph Analyzer omits some transitive dependencies. A dependency is transitive when it is implied by two others: if it holds between the first and second tasks and between the second and third tasks, it must also hold between the first and third tasks. In the figure below, the notation a <_x b (the subscript x names the location) means that node a must execute before node b in the partial order because of a dependency on the location x.
    [Figure: the Flow Graph Analyzer omits some transitive dependencies to reduce the complexity of the graph]
    • Part (a) of the figure shows an example that only includes dependencies due to a single location x. Because a <_x b and b <_x d, the Flow Graph Analyzer does not show the transitive edge a <_x d.
    • Part (b) of the figure shows two locations, x and y, that determine the partial order. There are two potential dependency edges from a to d: a <_x d and a <_y d. The Flow Graph Analyzer includes an edge from a to d because a is the direct source of y for d, but it excludes a <_x d.
If there are parallel edges between two nodes and at least one of them can be omitted due to transitivity, all of them can be omitted without changing the partial order. The Flow Graph Analyzer nevertheless includes edges like a <_y d in the graph topology, because including the edges that satisfy all required data dependencies is the most natural representation.
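One way to picture this edge selection is to track, for each location, the most recent writer and the readers since that write, and to keep only edges from those direct sources. The sketch below is an assumption about how such a reduction could work, not the Flow Graph Analyzer's documented algorithm. The names Clause, LabeledEdge, and direct_edges are hypothetical, as is the task set used in the note that follows.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct Clause {
    bool writes;           // true for out/inout, false for in
    std::string location;  // the list item named in the depend clause
};

// (predecessor, successor, location that causes the dependency)
using LabeledEdge = std::tuple<std::size_t, std::size_t, std::string>;

// `tasks[i]` holds the depend clauses of the i-th generated sibling task.
std::set<LabeledEdge>
direct_edges(const std::vector<std::vector<Clause>>& tasks) {
    struct State {
        bool has_writer = false;
        std::size_t writer = 0;            // most recent writer of the location
        std::vector<std::size_t> readers;  // readers since that write
    };
    std::map<std::string, State> state;
    std::set<LabeledEdge> edges;

    for (std::size_t t = 0; t < tasks.size(); ++t) {
        for (const Clause& c : tasks[t]) {
            State& s = state[c.location];
            if (!c.writes) {
                // A reader depends directly on the most recent writer only.
                if (s.has_writer && s.writer != t) {
                    edges.emplace(s.writer, t, c.location);
                }
                s.readers.push_back(t);
            } else {
                // A writer depends directly on the readers since the last
                // write, or on the last writer if nobody has read since.
                if (!s.readers.empty()) {
                    for (std::size_t r : s.readers) {
                        if (r != t) edges.emplace(r, t, c.location);
                    }
                } else if (s.has_writer && s.writer != t) {
                    edges.emplace(s.writer, t, c.location);
                }
                s.has_writer = true;
                s.writer = t;
                s.readers.clear();
            }
        }
    }
    return edges;
}
```

Consider a hypothetical task set in the spirit of part (b): task a writes x and y, task b updates x, and task d reads x and y. The sketch keeps a→b and b→d for x and a→d for y. It produces no a→d edge for x, because b is the direct source of x for d.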
For example:
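    #pragma omp parallel
    {
        std::string s = "";
        #pragma omp single
        {
            // Both tasks name s in an out dependence, so the second waits for the first.
            #pragma omp task depend(out: s)
            {
                s = "hello";
                printf("%s", s.c_str());  // printf needs a C string, not a std::string
            }
            #pragma omp task depend(out: s)
            {
                s = "world";
                printf("%s", s.c_str());
            }
        }
    }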
This application, when visualized with the Flow Graph Analyzer, has a single top-level subgraph node representing the OpenMP parallel region.
[Figure: single top-level subgraph node]
When you double-click this subgraph, you see the following:
The edge between omp0::n0::n2 and omp0::n0::n3 represents the task dependency due to the variable s.
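The same mechanism extends to in and inout dependences. The following sketch is not from the original guide, and the function run_diamond and its variables are hypothetical. It builds a diamond on one shared variable: one writer, two readers that may run concurrently, and a final update that waits for both readers.

```cpp
#include <array>

// Returns {value seen by reader 1, value seen by reader 2, final value of x}.
// If OpenMP is not enabled, the pragmas are ignored and the blocks run in
// program order, which satisfies the same dependences.
std::array<int, 3> run_diamond() {
    int x = 0;
    int r1 = -1;
    int r2 = -1;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x) shared(x)
        { x = 1; }                      // writer

        #pragma omp task depend(in: x) shared(x, r1)
        { r1 = x; }                     // reader 1: waits for the writer

        #pragma omp task depend(in: x) shared(x, r2)
        { r2 = x; }                     // reader 2: not ordered against reader 1

        #pragma omp task depend(inout: x) shared(x)
        { x = x + 1; }                  // waits for both readers
    }

    std::array<int, 3> result = {{r1, r2, x}};
    return result;
}
```

Under the rules in this section, the expected partial order has an edge from the writer to each reader and an edge from each reader to the final task. The direct edge from the writer to the final task is transitive.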
The main components of the Flow Graph Analyzer include the treemap view, the graph-topology canvas, the timeline and concurrency histogram view, and the critical-path report. OpenMP task traces map naturally to these views:
  • The treemap view shows the time spent in each OpenMP parallel region, colored according to the average application concurrency during the time it was executing.
  • The graph topology canvas shows the partial ordering of the tasks.
  • The timeline and concurrency histogram view shows the execution of each task on the OpenMP runtime threads and the application concurrency over time.
  • The critical-path report shows the most time-consuming path from each source to each sink in the graph, sorted with the longest critical path at the top.
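To make the notion of a critical path concrete, the sketch below computes the single most time-consuming path through a task graph from per-task durations. It is a simplified illustration, not the Flow Graph Analyzer's implementation: the function critical_path is a hypothetical name, and it returns only the overall longest path instead of one path per source and sink pair. It assumes tasks are indexed in generation order, so every edge goes from a lower index to a higher one.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;  // (predecessor, successor)

// Returns the task indices on the longest-duration path, in execution order.
// Assumption: for every edge, predecessor index < successor index.
std::vector<std::size_t> critical_path(const std::vector<double>& duration,
                                       const std::vector<Edge>& edges) {
    const std::size_t n = duration.size();
    if (n == 0) return std::vector<std::size_t>();

    const std::size_t none = n;             // sentinel: no predecessor on the path
    std::vector<double> best(duration);     // best[i]: longest path ending at task i
    std::vector<std::size_t> prev(n, none);

    // Process edges by successor index so best[pred] is final before it is used.
    std::vector<Edge> sorted(edges);
    std::sort(sorted.begin(), sorted.end(),
              [](const Edge& a, const Edge& b) { return a.second < b.second; });

    for (const Edge& e : sorted) {
        const double candidate = best[e.first] + duration[e.second];
        if (candidate > best[e.second]) {
            best[e.second] = candidate;
            prev[e.second] = e.first;
        }
    }

    // Walk back from the task where the longest path ends.
    const std::size_t end = static_cast<std::size_t>(
        std::max_element(best.begin(), best.end()) - best.begin());
    std::vector<std::size_t> path;
    for (std::size_t v = end; v != none; v = prev[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

For example, with durations 1, 5, 2, 1 and edges 0→1, 0→2, 1→3, 2→3, the path through task 1 costs 7 and the path through task 2 costs 4, so the result is 0, 1, 3.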
For more examples, see https://link.springer.com/chapter/10.1007/978-3-319-98521-3_12.

OpenMP* Nodes to Source Code Mapping

In addition to the graphical view of OpenMP* task dependency graphs, the Flow Graph Analyzer also maps nodes to the corresponding source code. To get this information, you must build the OpenMP application with the -g flag.
For example, source code mapping with subgraph nodes in a parallel region looks as follows:
[Figure: source code mapping with subgraph nodes]
