User Guide

Broken Call Tree

Symptoms

In the Call Tree section of the HTML report, you see one of the following:
  • A code region is duplicated.
  • A code region is located at a wrong place.
  • A code region has an incorrect trip count reported in any column of the Trip Counts column group.
  • A code region with your code has a System Module message in the Diagnostics column and a Cannot be modeled: System Module message in the Why Not Offloaded column.
Any of these symptoms means that the Offload Advisor detected the application call tree incorrectly during Survey.

Details

A broken call tree often occurs if you use a programming model such as Data Parallel C++ (DPC++) or Threading Building Blocks (TBB). These programming models run code in many threads using a complicated scheduler, and the Intel® Advisor Beta sometimes cannot correctly detect their call stacks. As a result, some code instances might have no metrics or incorrect metrics in a report, and the call tree is broken.

Cause

This can happen for the following reasons:
  • Call stacks were detected incorrectly.
  • The binary was compiled with heavy optimization.
  • Debug information is missing or has issues.

Possible Solution

This is not an issue if all hotspots and code you are interested in are outside of the broken part of the call tree. You can ignore it in this case.
To fix a broken call tree, do the following:
  • Make sure you compiled the binary with the -g option. You can also recompile it with the -debug inline-debug-info option to get enhanced debug information.
  • Recompile the binary with a lower optimization level: use -O2.
  • If you collect performance metrics with advixe-cl, try the following when running the Survey analysis:
    • Remove the --stackwalk-mode=online option.
    • Add the --no-stack-stitching option.
  • Offload only specific code regions, even if their estimated execution time on a target device is greater than or equal to the original execution time. Rerun the performance modeling with --select-loops to specify the loops of interest and --enforce-offloads to make sure all of them are offloaded. For example:
    advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
    Replace <APM> with $APM on Linux* OS or %APM% on Windows* OS.
  • If you model multithreaded code that runs with a complicated scheduler, you might see a code region with suspiciously low trip counts and multiple instances of the same region or loop in the scheduler. This means that the Offload Advisor could not correctly detect the call stacks. Use the --enable-batching option to artificially increase the trip counts by using the total number of executions instead of the average trip count.
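
The command-line changes listed above can be scripted. The following is a minimal Python sketch, not part of the product, that assembles the Survey and performance modeling command lines as argument lists. The helper names (format_select_loops, build_modeling_command, adjust_survey_args), the sample file names, line numbers, project directory, and application name are all hypothetical. The Survey command line in the demo (--collect=survey, --project-dir) is an assumed typical invocation; substitute the one you actually use. The sketch only prints the commands and does not run them.

```python
import os
import shlex


def format_select_loops(loops):
    """Format (file_name, line_number) pairs as the --select-loops value."""
    return "[" + ",".join(f"{name}:{line}" for name, line in loops) + "]"


def build_modeling_command(apm_dir, project_dir, loops):
    """Assemble the analyze.py rerun with --select-loops and --enforce-offloads."""
    return [
        "advixe-python",
        apm_dir.rstrip("/\\") + "/analyze.py",
        project_dir,
        "--select-loops=" + format_select_loops(loops),
        "--enforce-offloads",
    ]


def adjust_survey_args(args):
    """Drop --stackwalk-mode=online and make sure --no-stack-stitching is set."""
    args = [a for a in args if a != "--stackwalk-mode=online"]
    if "--no-stack-stitching" not in args:
        # Tool options must precede the "--" that separates the application.
        idx = args.index("--") if "--" in args else len(args)
        args.insert(idx, "--no-stack-stitching")
    return args


if __name__ == "__main__":
    # Hypothetical file names, line numbers, and project directory.
    loops = [("foo.cpp", 34), ("foo.cpp", 56), ("bar.cpp", 12)]
    apm = os.environ.get("APM", "<APM>")
    modeling = build_modeling_command(apm, "./advi_results", loops)
    # Assumed Survey command line; adapt it to the one you actually use.
    survey = ["advixe-cl", "--collect=survey", "--stackwalk-mode=online",
              "--project-dir=./advi_results", "--", "./my_app"]
    # Print shell-quoted commands for review instead of running them.
    print(" ".join(shlex.quote(a) for a in adjust_survey_args(survey)))
    print(" ".join(shlex.quote(a) for a in modeling))
```

Building the command as a list has one practical benefit: if you later pass it to subprocess.run, the square brackets in the --select-loops value reach the tool unchanged, whereas some shells may try to interpret an unquoted bracket expression as a file name pattern.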

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804