Investigate Not-Offloaded Code Regions

The modeling step analyzes how profitable it is to offload each code region to a target device. Some regions might not be profitable to offload, or they cannot be modeled. To see why a code region of interest is reported as not recommended for offloading, select a loop in the CPU+GPU pane and review the detailed loop information, including the reason the loop is not recommended for offloading, in the Details pane.
To see only the loops not recommended for offloading, open the drop-down filter list and select Non-Offloaded.
For each loop not recommended for offloading, you can force offload modeling. See Enforce Offloading for Specific Loops.

Cannot Be Modeled

Message
Cause and Details
Solution
Cannot be modeled: Outside of Marked Region
Intel® Advisor cannot model performance for the code region because it is not marked up for analysis.
Make sure the code region satisfies all of the following markup rules, or use a different markup strategy:
  • It is not a system module or a system function.
  • It has instruction mixes.
  • It is executed.
  • Its execution time is not less than 0.02 seconds.
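The markup rules above can be read as a checklist. The following Python sketch encodes them as a predicate over a hypothetical `Region` record; the field names and the `markup_violations` helper are illustrative assumptions and are not part of any Intel Advisor API.

```python
from dataclasses import dataclass

MIN_TIME_S = 0.02  # minimum execution time from the markup rules


@dataclass
class Region:
    """Hypothetical record for a profiled code region (illustrative only)."""
    name: str
    is_system: bool            # system module or system function
    has_instruction_mix: bool  # instruction mix data is available
    executed: bool             # region ran for the Survey dataset
    total_time_s: float        # measured execution time, in seconds


def markup_violations(region: Region) -> list[str]:
    """Return the markup rules the region breaks; an empty list means none."""
    problems = []
    if region.is_system:
        problems.append("system module or function")
    if not region.has_instruction_mix:
        problems.append("no instruction mix")
    if not region.executed:
        problems.append("not executed")
    if region.total_time_s < MIN_TIME_S:
        problems.append("execution time below 0.02 s")
    return problems


if __name__ == "__main__":
    short_loop = Region("hypothetical_loop", False, True, True, 0.005)
    print(markup_violations(short_loop))  # ['execution time below 0.02 s']
```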
Cannot be modeled: Not Executed
The code region is in the call tree, but Offload Modeling detected no calls to it for the dataset used during the Survey analysis.
This can happen if the execution time of the loop is very small and close to the Intel Advisor sampling interval. Time measurements for such loops can be significantly inaccurate. By default, the Survey sampling interval is 0.01 seconds (10 ms).
You can try decreasing the Intel Advisor sampling interval:
  1. Go to Project Properties > Survey Hotspots Analysis > Advanced.
  2. Set the Sampling Interval to less than 10 ms.
  3. Re-run Offload Modeling.
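To see why short loops are measured inaccurately, the following rough Python sketch estimates how many samples a loop receives for a given sampling interval. It assumes uniform sampling and an attribution error of about one sample, which is a simplification and not how Intel Advisor actually attributes time; the helper names are hypothetical.

```python
def expected_samples(exec_time_s: float, interval_s: float) -> float:
    """Approximate number of samples a region receives: time / interval."""
    return exec_time_s / interval_s


def relative_error(exec_time_s: float, interval_s: float) -> float:
    """Relative error if attribution is off by about one sample."""
    return interval_s / exec_time_s


if __name__ == "__main__":
    loop_time_s = 0.015  # hypothetical 15 ms loop
    for interval_ms in (10, 1):
        interval_s = interval_ms / 1000
        samples = expected_samples(loop_time_s, interval_s)
        error = relative_error(loop_time_s, interval_s)
        print(f"{interval_ms} ms interval: ~{samples:.1f} samples, ~{error:.0%} error")
```

For a 15 ms loop, this gives roughly 1.5 samples at the default 10 ms interval and roughly 15 samples at 1 ms, which is why lowering the interval can make such loops measurable.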
Cannot be modeled: Internal Error
Internal Error means that data is incorrect or missing because Intel Advisor encountered issues when collecting or processing data.
Try re-running the Offload Modeling perspective to fix the metrics attribution problem. If this does not help, use the Analyzers Community forum for technical support.
Cannot be modeled: System Module
This code region is a system function/loop.
This is not an issue. If this code region is inside an offload region or a runtime call, its execution time is added to the execution time of the offloaded regions.
Cannot be modeled: No Execution Count
Intel Advisor detected no calls to the loop during the Trip Count analysis, so no information about Execution Counts is available for this loop.
Try re-running Offload Modeling to fix the metrics attribution problem.

Less or Equally Profitable Than Children/Parent Offload

This message is not an issue. It means that Intel Advisor has found a more profitable code region to offload. If you still want to see offload estimates for the original code region, use the solutions described in the table below.
Message
Cause and Details
Solution
Less or equally profitable than children offloads
Offloading the child loops/functions of this code region is more profitable than offloading the whole region with all its children. This means that the Estimated Time on a target platform for the region of interest is greater than or equal to the sum of the Estimated Time values on a target platform of its child regions that are profitable to offload.
The following factors might prevent offloading: total execution time, taxes, trip counts, and dependencies.
Disable analyzing child loops of all region heads.
  1. Go to Project Properties > Performance Modeling.
  2. Enter --no-model-children in the Other Parameters field.
  3. Re-run Performance Modeling.
Less or equally profitable than parent offload
Offloading the whole parent code region of the region of interest is more profitable than offloading any of its child regions separately. This means that the Estimated Time on a target platform for the region of interest is greater than or equal to the Estimated Time on a target platform of its parent region.
Offloading a child code region might be limited by high offload taxes.
Model offloading for only specific code regions, even if they are not profitable.
  1. Go to Project Properties > Performance Modeling.
  2. In the Other Parameters field, enter --select=<loops-to-offload> to specify the loops of interest and --enforce-offloads to make sure all of them are offloaded.
  3. Re-run Performance Modeling.
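The two comparisons described in this table can be sketched in Python as follows. The functions and the estimated times are hypothetical; they reproduce only the inequalities stated in the table, while the actual Offload Modeling decision also accounts for taxes, trip counts, and dependencies.

```python
def less_profitable_than_children(region_est_s: float, children_est_s: list[float]) -> bool:
    """Region loses to its profitable children if its estimate is >= their sum."""
    return region_est_s >= sum(children_est_s)


def less_profitable_than_parent(region_est_s: float, parent_est_s: float) -> bool:
    """Region loses to its parent if its estimate is >= the parent's estimate."""
    return region_est_s >= parent_est_s


if __name__ == "__main__":
    # Hypothetical Estimated Time values on the target platform, in seconds.
    print(less_profitable_than_children(0.30, [0.10, 0.12]))  # True
    print(less_profitable_than_parent(0.30, 0.25))            # True
```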

Not Profitable

Message
Cause and Details
Solution
Not profitable: Parallel execution efficiency is limited due to Dependencies
Dependencies limit parallel execution, so the code region cannot benefit from offloading to a target device. The estimated execution time after acceleration is greater than or equal to the original execution time.
Solution 1
If you did not enable the Dependencies analysis when collecting data, run it to get detailed information about real dependencies in your code.
Refer to the Dependencies Problem and Message Types Reference for details on potential dependency problems.
Solution 2
Ignore dependencies (real and assumed) and model offloading for all or selected code regions:
  1. Go to Project Properties > Performance Modeling.
  2. Enter one of the following options in the Other Parameters field:
    • --no-assume-dependencies to ignore dependencies for all code regions
    • --set-parallel=[<loop-IDs/source-locations>] to ignore dependencies for specified code regions
  3. Re-run Performance Modeling.
Not profitable: The Number of Loop Iterations is not enough to fully utilize Target Platform capabilities
The loop cannot benefit from offloading to a target platform as it has a low number of iterations.
For example, if a target device can execute up to 1024 threads in parallel but a loop has only 100 iterations, offloading this loop can leave up to 924 threads idle. This can significantly reduce the benefit of using the target device.
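The arithmetic in this example can be sketched as follows, under the simplifying assumption that each loop iteration maps to exactly one hardware thread and that all iterations run in a single wave. The helper names are illustrative only.

```python
def idle_threads(device_threads: int, iterations: int) -> int:
    """Threads left without work if each iteration maps to one thread."""
    return max(device_threads - iterations, 0)


def utilization(device_threads: int, iterations: int) -> float:
    """Fraction of device threads that receive an iteration."""
    return min(iterations, device_threads) / device_threads


if __name__ == "__main__":
    print(idle_threads(1024, 100))          # 924
    print(f"{utilization(1024, 100):.1%}")  # 9.8%
```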
In most cases, such code regions cannot benefit from offloading. If you still want to offload such loops, you can try the following workaround if a loop is broken down into several chunks by a compiler or a programming model:
  1. Go to Project Properties > Performance Modeling.
  2. Enter --batching or --threads=<target-threads> in the Other Parameters field, where <target-threads> is the number of parallel threads equal to the target device capacity.
  3. Re-run Performance Modeling.
Not profitable: Data Transfer Tax is greater than Computation Time and Memory Bandwidth Time
The time spent transferring data to a target device (Data Transfer Tax) is greater than both the Compute time and the DRAM BW (bandwidth) time. The resulting Estimated Time on a target platform, including the data transfer tax, is greater than or equal to the Measured Time on a host platform.
Check the data in the Data Transfer column. A large value means that this code region cannot benefit from offloading.
If you still want to offload such loops, disable data transfer analysis so that only the estimated execution time is used for speedup and profitability calculation.
This option disables data transfer analysis for all loops, so performance modeling results may change for all loops.
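The following Python sketch illustrates this condition with hypothetical numbers. Combining the components as the larger of Compute and DRAM BW time plus the Data Transfer Tax is an assumption made for illustration only; it is not the exact Intel Advisor performance model.

```python
def transfer_dominates(transfer_tax_s: float, compute_s: float, dram_bw_s: float) -> bool:
    """True if Data Transfer Tax exceeds both Compute and DRAM BW time."""
    return transfer_tax_s > compute_s and transfer_tax_s > dram_bw_s


def is_profitable(estimated_target_s: float, measured_host_s: float) -> bool:
    """Offloading helps only if the target estimate is below the host time."""
    return estimated_target_s < measured_host_s


if __name__ == "__main__":
    compute_s, dram_bw_s, transfer_s = 0.004, 0.006, 0.050  # hypothetical
    # Assumed combination: slower of compute/bandwidth plus the transfer tax.
    estimated_s = max(compute_s, dram_bw_s) + transfer_s
    print(transfer_dominates(transfer_s, compute_s, dram_bw_s))  # True
    print(is_profitable(estimated_s, measured_host_s=0.040))     # False
```

With these numbers, the transfer alone (50 ms) exceeds the 40 ms measured on the host, so no compute speedup on the target can make this offload profitable.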
Not profitable: Computation Time is high despite the full use of Target Platform capabilities
The code region fully uses the Target Platform capabilities, but the Computation Time is still high. As a result, the Estimated Time on a target platform is greater than or equal to the Baseline Elapsed Time.
Check the value in the Compute column of the Estimated Bound-by column group.
  • A high value means that this code region cannot benefit from offloading.
  • An unexpectedly high value indicates a problem with the programming model used.
Not profitable: Cache/Memory Bandwidth Time is greater than other execution time components on Target Device
The time spent in Cache or Memory Bandwidth takes a large part of the Estimated Time on a target platform. As a result, the Estimated Time is greater than or equal to the Measured Time on a host platform.
In the actual report, Cache/Memory is replaced with the specific cache or memory level that prevents offloading, for example, L3 or LLC. See the Throughput column for details about the highest bandwidth time.
  1. Examine the children of the code region to identify which part takes most of the time and prevents offloading.
  2. Optimize the part of your code that takes most of the Measured Time, and then rerun the collection and analysis.
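To illustrate how a single cache or memory level can end up named in this message, the following Python sketch picks the level with the highest bandwidth time and checks whether it exceeds the other time components. The level names, the numbers, and the rule itself are assumptions for illustration, not the Intel Advisor algorithm.

```python
def dominant_level(bw_times_s: dict[str, float]) -> tuple[str, float]:
    """Return the cache/memory level with the highest bandwidth time."""
    level = max(bw_times_s, key=bw_times_s.get)
    return level, bw_times_s[level]


def bandwidth_bound(bw_times_s: dict[str, float], other_times_s: dict[str, float]) -> bool:
    """True if the worst bandwidth time exceeds every other time component."""
    _, worst_s = dominant_level(bw_times_s)
    return all(worst_s > t for t in other_times_s.values())


if __name__ == "__main__":
    # Hypothetical per-level bandwidth times and other components, in seconds.
    bw = {"L3": 0.011, "LLC": 0.009, "DRAM": 0.007}
    others = {"Compute": 0.004, "Offload taxes": 0.003}
    print(dominant_level(bw))           # ('L3', 0.011)
    print(bandwidth_bound(bw, others))  # True
```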
Not profitable because of offload overhead (taxes)
The total time of Offload Taxes, which includes Kernel Launch Tax and Data Transfer Tax, takes a large part of the Estimated Time on a target platform. As a result, the Estimated Time is greater than or equal to the Measured Time on a host platform.
Examine the values in the columns of the Overhead group. A large value in any of these columns means that this code region cannot benefit from offloading because the cost of offloading is high.
If the kernel launch tax is large, enable hiding taxes in the Project Properties or use the --assume-hide-taxes option.
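The following Python sketch shows how quickly offload taxes can dominate a small kernel. The numbers are hypothetical, and treating the taxes as simply added to the kernel time is an assumption made for illustration.

```python
def total_taxes(kernel_launch_tax_s: float, data_transfer_tax_s: float) -> float:
    """Sum of the two offload taxes named in the table."""
    return kernel_launch_tax_s + data_transfer_tax_s


def tax_share(estimated_target_s: float, taxes_s: float) -> float:
    """Fraction of the target-platform estimate spent on offload overhead."""
    return taxes_s / estimated_target_s


if __name__ == "__main__":
    launch_s, transfer_s, kernel_s = 0.020, 0.010, 0.010  # hypothetical
    taxes_s = total_taxes(launch_s, transfer_s)
    estimated_s = kernel_s + taxes_s  # assumption: taxes add to kernel time
    measured_host_s = 0.030
    print(f"taxes: {tax_share(estimated_s, taxes_s):.0%} of the estimate")
    print("profitable" if estimated_s < measured_host_s else "not profitable")
```

With these numbers, the taxes account for three quarters of the 40 ms estimate, which already exceeds the 30 ms measured on the host.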

N/A - Part of Offload

This means that offloading a code region is less profitable than offloading its outer loop.
This is not an issue. The code region of interest is located inside an offloaded loop.

Total Time Is Too Small for Reliable Modeling

This means that the execution time of a code region or a whole loop nest is less than 0.02 seconds. In this case, Intel Advisor cannot reliably estimate the speedup or tell whether the code region is worth offloading, because its execution time is close to the Intel Advisor sampling interval.
Possible Solution
If you want to check the profitability of offloading code regions with total time less than 0.02 seconds:
  1. Go to Project Properties > Performance Modeling.
  2. Enter the --loop-filter-threshold=0 option in the Other Parameters field to model such small offloads.
  3. Re-run Performance Modeling.
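The effect of the threshold can be pictured as a filter over measured region times. In the following Python sketch, `regions_to_model` is a hypothetical helper that keeps regions whose total time is at least the threshold; setting the threshold to 0 keeps every region, which mirrors the intent of --loop-filter-threshold=0. Whether Intel Advisor compares with `>=` or `>` is an assumption here.

```python
def regions_to_model(region_times_s: dict[str, float], threshold_s: float = 0.02) -> list[str]:
    """Keep regions whose total time is at least the threshold (hypothetical filter)."""
    return [name for name, t in region_times_s.items() if t >= threshold_s]


if __name__ == "__main__":
    times = {"loop_a": 0.50, "loop_b": 0.01, "loop_c": 0.02}  # hypothetical
    print(regions_to_model(times))                   # ['loop_a', 'loop_c']
    print(regions_to_model(times, threshold_s=0.0))  # ['loop_a', 'loop_b', 'loop_c']
```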

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.