Developer Guide

Graph Viewer

The <project_dir>/reports/report.html file provides a stall point graph that includes load and store information between kernels and different memories, pipes connected between kernels, and loops.
The Graph Viewer (see Figure 1) shows an abstracted netlist of your DPC++ system in a hierarchical graphical report consisting of system, block, and cluster views. It allows you to review information such as sizes, types, dependences, and schedules of instructions, properties of interfaces such as stream and memory interface, and view variables that have loop-carried dependencies.
Access the Graph Viewer by selecting System Viewers > Graph Viewer in the report.html file.
You can interact with the Graph Viewer in the following ways:
  • Use the mouse wheel to zoom in and out within the Graph Viewer.
  • Navigate through the following hierarchical views in the Graph List:
    • System
    • Block
    • Cluster
  • Click a node to display its location in the source code in the Code pane and node details in the Details pane.

System View

Use the system view of the Graph Viewer report to view various kernels in your DPC++ system. The system view illustrates connections between your kernels and connections from kernels to memories. In addition, the system view shows the connection of blocks within a kernel and highlights blocks with a high initiation interval (II).
In system view of the Graph Viewer, you can review portions of your design that are associated with red logic blocks. For example, a logic block that has a pipelined loop with a high initiation interval (II) value might be highlighted in red because the high II value might affect design throughput.
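To see why a high II matters for throughput, a rough cycle estimate can help. The sketch below is only a simplified model, not something the compiler or the report provides: it assumes a pipelined loop with a fixed latency per iteration and no stalls, so a new iteration starts every II cycles. The function name estimated_loop_cycles and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not compiler output): the first iteration
// completes after `latency` cycles, and each later iteration starts `ii`
// cycles after the previous one. Stalls are ignored.
std::uint64_t estimated_loop_cycles(std::uint64_t trip_count,
                                    std::uint64_t ii,
                                    std::uint64_t latency) {
  assert(ii >= 1);  // an initiation interval below 1 is not meaningful here
  if (trip_count == 0) {
    return 0;
  }
  return latency + ii * (trip_count - 1);
}
```

Under this model, a 1000-iteration loop with a latency of 10 cycles takes about 1009 cycles at II = 1 and about 4006 cycles at II = 4, which is why blocks with a high II are worth reviewing first.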
You can hide certain types of connections in the system view of the Graph Viewer by unchecking that type of connection. By default, both Control and Memory are checked in the Graph Viewer. Control refers to connections between blocks and loops. Memory refers to connections to and from global or local memories. If your design includes connections to and from read or write pipes, you also have a Pipes option in the Graph Viewer.
System View of the Graph Viewer Report
media/image26.png
The system view of the Graph Viewer shows the following types of connections:
  • Control
  • Memory, if your design has global or local memory
  • Pipe, if your design uses pipes
You can choose to hide a type of connection by unchecking the corresponding checkbox.

Block View

The block view of the Graph Viewer provides a more granular graph view of the kernel. This view shows the following:
  • Fine-grained details within kernels (including instructions, dependencies, and schedule of the instructions) of the generated datapath of computations. The Intel® oneAPI DPC++/C++ Compiler encapsulates as many instructions as possible in clusters for better quality of results (QoR). The Graph Viewer shows clusters, instructions outside clusters, and their connections.
  • Linking from the instruction back to source line by clicking the instruction node.
  • Various information about the instructions, such as data width, schedule information and more, if applicable.
The schedule information is relative to the start of each block. Because the Intel® oneAPI DPC++/C++ Compiler cannot statically infer the trip counts of blocks, it cannot compute absolute schedule information when your design consists of multiple blocks. Moreover, the schedule information provided for stallable instructions (such as pipe RD/WR or memory LD/ST) is an estimate based on empirical measurements. The real schedule is likely to be different, and you must verify it with a hardware or simulation run.
If your design has loops, the Intel® oneAPI DPC++/C++ Compiler encapsulates the loop control logic into loop orchestration nodes and the initial condition of the loops into loop input nodes and their connection to the datapath.
Inside a block, there are often pipe RD/WR or memory LD/ST nodes connecting to computation nodes or clusters. You can click the computation nodes and view the Details pane (or hover over the nodes) to see specific instructions and the bit width. You can click the RD/WR or LD/ST nodes to see information such as the width, depth, type, and schedule of a pipe or an LSU in the Details pane.
If your design has clusters, a cluster has a FIFO in its exit node to store any pipelined data in-flight. You can click on the cluster exit node to find the exit FIFO width and depth attribute. The cluster exit FIFO size is also available in the cluster view of the Graph Viewer.
Block View of the Graph Viewer Report
media/image27.png

Cluster View

The cluster view of the Graph Viewer provides more granular graph views of the kernel. It helps in viewing clusters inside a block and it shows all variables inside a cluster that have loop-carried dependency. This view shows the following:
  • Fine grained details within clusters (including instructions and dependencies of the instructions) of the generated datapath of computations.
  • Linking from the instruction back to source line by clicking the instruction node.
  • Various information about the instructions, such as data width and the node's schedule information (start cycle and latency), if applicable.
A cluster starts with an entry node and ends with an exit node. The cluster exit node has a FIFO of depth greater than or equal to the latency of the cluster to store any data in-flight. You can find the size of the cluster exit FIFO by clicking on the exit node. The cluster exit FIFO size information is also available in the block view of the Graph Viewer when you click on the exit node.
Cluster View of the Graph Viewer
media/image28.png
Besides computation nodes, when your design contains loops, you can see loop orchestration nodes and variable nodes along with their Feedback nodes. The compiler generates the loop orchestration logic for loops in your design. This logic is represented by loop orchestration nodes in the cluster view of the Graph Viewer. A variable node corresponds to a variable that has a loop-carried dependency in your design. A variable node goes through various computation logic and finally feeds a Feedback node that connects back to the variable node. This back edge means that the variable is passed to the next iteration after the new value is evaluated.
Scan for loop-carried variables that have a long latency to their Feedback nodes because they can be II bottlenecks. You can cross-check by referring to the Loop Analysis report for more information about the II bottleneck. The Feedback node has a FIFO to store any data in-flight for the loop, and the FIFO is sized to d*II, where d is the dependency distance and II is the initiation interval. You can find the size of the Feedback node FIFO by clicking the Feedback node and looking at the Details pane or the pop-up box.
The dependency distance is the number of iterations between successive load/store operations that depend on each other.
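As an illustration of these terms, the following plain C++ sketch (no FPGA-specific headers) shows the kind of source patterns that produce loop-carried dependencies, along with the d*II sizing rule stated above. All function names are hypothetical and chosen for this example only. How the compiler actually maps such loops to variable and Feedback nodes depends on your design, so treat this as a sketch rather than a description of generated hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop-carried variable with dependency distance 1: the value of `acc`
// produced in iteration i is consumed in iteration i + 1.
float accumulate(const std::vector<float>& x) {
  float acc = 0.0f;
  for (float v : x) {
    acc += v;
  }
  return acc;
}

// Dependency distance d: iteration i loads the element that
// iteration i - d stored.
void recurrence(std::vector<int>& a, std::size_t d) {
  assert(d >= 1);
  for (std::size_t i = d; i < a.size(); ++i) {
    a[i] += a[i - d];
  }
}

// Sizing rule from the text: the Feedback node FIFO holds d * II entries.
std::size_t feedback_fifo_depth(std::size_t d, std::size_t ii) {
  return d * ii;
}
```

For example, under the stated rule a dependency distance of 2 and an II of 3 give a Feedback FIFO depth of 6. You can compare such an estimate with the value shown in the Details pane for the Feedback node.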
