User Guide

Why Not Offloaded: Not Profitable

Symptoms

A code region of interest has Not profitable, followed by a clarifying message, as the reason why it is not offloaded.

Cause and Solution

You may see Not profitable as the reason for not offloading a code region. This means that the estimated execution time on a target device is greater than or equal to the original execution time on a base platform, so the estimated speedup does not exceed 1. This reason is always followed by a message that clarifies what prevents profitable offloading.
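The profitability rule described here can be written as a small calculation. The following is a minimal sketch, not Offload Advisor code: the function names `estimated_speedup` and `is_profitable` and the sample timings are hypothetical and chosen only for illustration.

```python
def estimated_speedup(baseline_time_s: float, target_time_s: float) -> float:
    """Ratio of the original time on the base platform to the estimated time on the target."""
    if baseline_time_s <= 0 or target_time_s <= 0:
        raise ValueError("times must be positive")
    return baseline_time_s / target_time_s


def is_profitable(baseline_time_s: float, target_time_s: float) -> bool:
    """A region counts as not profitable when the target time is >= the baseline time."""
    return target_time_s < baseline_time_s


# Hypothetical timings in seconds, for illustration only.
print(is_profitable(2.0, 1.0))  # target estimate is faster than the baseline
print(is_profitable(2.0, 2.5))  # target estimate is slower than the baseline
```

With equal times the region is also treated as not profitable, which matches the "greater than or equal to" wording.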
In the commands below, replace <APM> with $APM on Linux* OS or %APM% on Windows* OS.
Each message is listed below with its details and cause, followed by the solution.
Not profitable: Parallel execution efficiency is limited due to Dependencies
Dependencies limit parallel execution and the code region cannot benefit from offloading to a target device. The estimated execution time after acceleration is greater than or equal to the original execution time.
See the Estimated Time on Target Device (+Host) and Baseline Elapsed Time columns of a metrics table.
Solution 1.
  1. Locate your loop of interest in the Call Tree section of the HTML report.
  2. Scroll to the Dependency Type column in the Loop/Function column group and check whether your code region of interest has a real dependency (for example, Dependency: raw or Dependency: waw) or an assumed dependency (Dependency: assumed).
  3. If your code region of interest has a real dependency, rewrite your code to remove the dependency and rerun both the metrics collection and the performance modeling.
    You can refer to the Dependencies Problem and Message Types Reference for details on potential dependency problems.
  4. If your code region of interest has an assumed dependency, rerun the performance modeling with the --assume-parallel or --set-parallel=[<IDs/source-locations>] option to reduce the number of code regions with dependencies.
    For example:
    advixe-python <APM>/analyze.py <project-dir> --set-parallel=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>]
    These options do not resolve dependency issues. They tell Offload Advisor to assume that certain code regions do not have dependencies.
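When many source locations are involved, the bracketed list can be generated instead of typed by hand. The sketch below only formats the option string in the `[<file-name>:<line-number>,...]` form shown in the example command. The helper name `format_locations` and the file names are hypothetical and are not part of Offload Advisor.

```python
from typing import Iterable, Tuple


def format_locations(option: str, locations: Iterable[Tuple[str, int]]) -> str:
    """Render an option in the --name=[<file>:<line>,<file>:<line>] form."""
    joined = ",".join(f"{name}:{line}" for name, line in locations)
    return f"{option}=[{joined}]"


# Hypothetical source locations, for illustration only.
loops = [("foo.cpp", 12), ("foo.cpp", 40), ("bar.cpp", 7)]
print(format_locations("--set-parallel", loops))
# --set-parallel=[foo.cpp:12,foo.cpp:40,bar.cpp:7]
```

The same helper can produce a `--select-loops` value, because that option uses the same list form in the commands on this page.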
Solution 2.
You can tell Offload Advisor to model offloading for only specific code regions even if they are not profitable.
Rerun the performance modeling with --select-loops to specify loops of interest and --enforce-offloads to make sure all of them are offloaded. For example:
advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
Not profitable: The Number of Loop Iterations is not enough to fully utilize Target Device capabilities
The number of loop iterations is not enough to fully utilize Target Device capabilities and benefit from offloading.
For example, if a target device can execute up to 1024 threads in parallel, but a loop has only 100 iterations, offloading of this loop can result in up to 924 threads being inactive. This may significantly reduce the benefit of using the target device.
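The arithmetic in this example can be checked with a short sketch. It assumes, purely for illustration, that each loop iteration maps to exactly one device thread. The function names are hypothetical and do not describe how Offload Advisor models utilization.

```python
def inactive_threads(device_threads: int, loop_iterations: int) -> int:
    """Threads left without work if each iteration occupies one thread (assumption)."""
    return max(device_threads - loop_iterations, 0)


def utilization(device_threads: int, loop_iterations: int) -> float:
    """Fraction of device threads that receive work under the same assumption."""
    return min(loop_iterations, device_threads) / device_threads


print(inactive_threads(1024, 100))  # 924
print(utilization(1024, 100))       # 0.09765625
```

Under this assumption, less than 10% of the device is busy, which is why such loops rarely benefit from offloading.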
See the Estimated Time on Target Device (+Host) and Baseline Elapsed Time columns of a metrics table.
In most cases, such code regions cannot benefit from offloading. If you still want to offload such loops, you can try to use one of the following workarounds:
Solution 1.
If a loop is broken down into several chunks by a compiler or a programming model, use the --enable-batching or --threads=<number-of-threads> option with analyze.py. For the --threads option, specify the number of parallel threads equal to the target device capacity.
Solution 2.
You can tell Offload Advisor to model offloading for only specific code regions even if they are not profitable.
Rerun the performance modeling with --select-loops to specify loops of interest and --enforce-offloads to make sure all of them are offloaded. For example:
advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
Not profitable: Data Transfer Tax is greater than Computation Time and Memory Bandwidth Time
Time spent on transferring data to a target device (Data Transfer Tax) is greater than Computation Time and Memory Bandwidth Time. The resulting Estimated Time on Target Device (+Host) with data transfer tax is greater than or equal to the Baseline Elapsed Time.
Check metrics in the Data Transfer column group. Large values in any of the columns mean that this code region cannot benefit from offloading.
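The condition named in this message compares three time components. A minimal sketch of that comparison follows. The function name `transfer_dominates` and the sample values are hypothetical, and the sketch does not reproduce how Offload Advisor derives these metrics.

```python
def transfer_dominates(data_transfer_tax_s: float,
                       computation_time_s: float,
                       memory_bandwidth_time_s: float) -> bool:
    """True when the data transfer tax exceeds both other time components."""
    return (data_transfer_tax_s > computation_time_s
            and data_transfer_tax_s > memory_bandwidth_time_s)


# Hypothetical values in seconds, for illustration only.
print(transfer_dominates(0.8, 0.3, 0.5))  # transfer tax is the largest component
print(transfer_dominates(0.2, 0.3, 0.5))  # transfer tax is not the largest component
```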
If you still want to offload such loops, you can try to use one of the following workarounds:
  • Rerun the collection with the --no-data-transfer option to disable data transfer analysis and use only the estimated execution time for speedup and profitability calculation.
    For example:
    advixe-python <APM>/collect.py <project-dir> --no-data-transfer -- <target> [target-options]
    This option disables data transfer analysis for all loops, so the performance modeling results may change for all loops.
  • Rerun the performance modeling with --select-loops and --enforce-offloads to offload only specific code regions even if they are not profitable.
    For example:
    advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
Not profitable: Computation Time is high despite the full use of Target Device capabilities
The code region fully uses the Target Device capabilities, but the Computation Time is still high. As a result, the Estimated Time on Target Device (+Host) is greater than or equal to the Baseline Elapsed Time.
Check the value in the Total Time by Compute column in the Offload Information column group.
  • A high value means that this code region cannot benefit from offloading.
  • An unexpectedly high value indicates a problem with the programming model used.
If you still want to offload such loops, you can tell Offload Advisor to model offloading for only specific code regions even if they are not profitable. Rerun the performance modeling with --select-loops to specify loops of interest and --enforce-offloads to make sure all of them are offloaded.
For example:
advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
Not profitable: Cache/Memory Bandwidth Time is greater than other execution time components on Target Device
The Total Execution Time spent in Cache or Memory Bandwidth takes a big part of the Estimated Time on Target Device (+Host). As a result, the estimated time is greater than or equal to the Baseline Elapsed Time.
In the actual report, Cache/Memory is replaced with the specific cache or memory level that prevents offloading, for example, L3 or SLM.
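To illustrate how one level ends up named in the message, the sketch below picks the level with the largest time from a set of per-level values. The function name `dominant_level`, the `DRAM` entry, and all timings are hypothetical. The actual selection logic in Offload Advisor is not described here.

```python
from typing import Dict


def dominant_level(time_by_level_s: Dict[str, float]) -> str:
    """Return the cache or memory level with the largest estimated time."""
    if not time_by_level_s:
        raise ValueError("no levels given")
    return max(time_by_level_s, key=time_by_level_s.get)


# Hypothetical per-level times in seconds, for illustration only.
times = {"L3": 0.5, "SLM": 0.25, "DRAM": 0.125}
print(dominant_level(times))  # L3
```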
Solution 1.
  1. Locate your loop of interest in the Call Tree section of the HTML report.
  2. Scroll to the Offload Information column group and expand it by double-clicking the title.
  3. Examine the values in one of the Total Execution Time by Cache/Memory columns, for the memory level reported in the message, to verify that Offload Advisor evaluated the traffic correctly.
  4. If the traffic is correct, examine the children of the code region to identify which part takes most of the time and prevents offloading.
  5. Optimize the part of your code that takes most of the Total Execution Time and rerun the collection and analysis.
Solution 2.
You can tell Offload Advisor to model offloading for only specific code regions even if they are not profitable.
Rerun the performance modeling with --select-loops to specify loops of interest and --enforce-offloads to make sure all of them are offloaded. For example:
advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
Not profitable because of offload overhead (taxes)
The total time of Offload Taxes, which includes Invocation Tax, Kernel Code Transfer Tax, and Data Transfer Tax, takes a big part of the Estimated Time on Target Device (+Host). As a result, the estimated time is greater than or equal to the Baseline Elapsed Time.
Solution 1.
  1. Locate your loop of interest in the Call Tree section of the HTML report.
  2. Scroll to the Overhead column group and expand it by double-clicking the title.
  3. Examine the values in the columns of the Overhead group. A big value in any of the columns means that this code region cannot benefit from offloading because the cost of offloading is high.
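One way to judge whether a value is "big" is to compare the summed taxes against the estimated time. The following sketch does that under stated assumptions: the function name `offload_tax_share` and the sample values are hypothetical, and the three taxes are simply added together as the message description suggests.

```python
def offload_tax_share(invocation_tax_s: float,
                      kernel_code_transfer_tax_s: float,
                      data_transfer_tax_s: float,
                      estimated_time_s: float) -> float:
    """Fraction of the estimated time on the target that is spent on offload taxes."""
    if estimated_time_s <= 0:
        raise ValueError("estimated time must be positive")
    total_tax_s = invocation_tax_s + kernel_code_transfer_tax_s + data_transfer_tax_s
    return total_tax_s / estimated_time_s


# Hypothetical values in seconds, for illustration only.
print(offload_tax_share(0.25, 0.125, 0.5, 1.0))  # 0.875
```

A share close to 1 means that almost all of the estimated time is offload overhead rather than useful work.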
Solution 2.
You can tell Offload Advisor to model offloading for only specific code regions even if they are not profitable.
Rerun the performance modeling with --select-loops to specify loops of interest and --enforce-offloads to make sure all of them are offloaded. For example:
advixe-python <APM>/analyze.py <project-dir> --select-loops=[<file-name1>:<line-number1>,<file-name1>:<line-number2>,<file-name2>:<line-number3>] --enforce-offloads
