Investigate Not-Offloaded Code Regions
The modeling step analyzes how profitable it is to offload each code region to a target device. Some regions might not be profitable for offloading or cannot be modeled. To see why a code region of interest is reported as not recommended for offloading, select a loop in the CPU+GPU pane and check the Details pane, which shows detailed loop information, including the reason why the loop is not recommended for offloading.
To filter the view and see only loops not recommended for offloading, open the drop-down list and select Non-Offloaded.
For each loop not recommended for offloading, you can force offload modeling. See Enforce Offloading for Specific Loops.
Cannot Be Modeled
Message | Cause and Details | Solution |
---|---|---|
Cannot be modeled: Outside of Marked Region | Intel® Advisor cannot model performance for the code region because it is not marked up for analysis. | Make sure the code region satisfies all markup rules, or use a different markup strategy. |
Cannot be modeled: Not Executed | The code region is in the call tree, but Offload Modeling detected no calls to it for the dataset used during Survey. | This can happen if the execution time of the loop is very small and close to the sampling interval of Intel Advisor. Such loops can have significant inaccuracies in time measurement. By default, the sampling interval for Survey is 0.01 seconds. Try decreasing the sampling interval of Intel Advisor. |
Cannot be modeled: Internal Error | Internal Error means that data is incorrect or missing because Intel Advisor encountered issues when collecting or processing data. | Try re-running the Offload Modeling perspective to fix the metrics attribution problem. If this does not help, use the Analyzers Community forum for technical support. |
Cannot be modeled: System Module | This code region is a system function or loop. | This is not an issue. If this code region is inside an offload region or a runtime call, its execution time is added to the execution time of the offloaded regions. |
Cannot be modeled: No Execution Count | Intel Advisor detected no calls to the loop during Trip Count analysis, so no information about Execution Counts is available for this loop. | Try re-running the Offload Modeling perspective to fix the metrics attribution problem. |
Less or Equally Profitable Than Children/Parent Offload
Message | Cause and Details | Solution |
---|---|---|
Less or equally profitable than children offloads | Offloading child loops/functions of this code region is more profitable than offloading the whole region with all its children. This means that the Estimated Time on a target platform for the region of interest is greater than or equal to the sum of the Estimated Time on a target platform of its child regions that are profitable for offloading. The following factors might prevent offloading: total execution time, taxes, trip counts, dependencies. | Disable analyzing child loops of all region heads. |
Less or equally profitable than parent offload | Offloading the whole parent code region of the region of interest is more profitable than offloading any of its child regions separately. This means that the Estimated Time on a target platform for the region of interest is greater than or equal to the Estimated Time on a target platform of its parent region. Offloading a child code region might be limited by high offload taxes. | Model offloading for only specific code regions even if they are not profitable. |
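
The two comparisons in the table above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not how Intel Advisor implements them: the `Region` class, its fields, and both function names are hypothetical, and the times are made-up values in seconds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """Hypothetical stand-in for one code region in the report."""
    name: str
    estimated_time: float      # Estimated Time on a target platform, in seconds
    profitable: bool = True    # whether the region alone is profitable to offload
    children: List["Region"] = field(default_factory=list)


def less_profitable_than_children(region: Region) -> bool:
    """Rule from the first row: the region's Estimated Time is greater than
    or equal to the summed Estimated Time of its profitable children."""
    profitable_children = [c for c in region.children if c.profitable]
    if not profitable_children:
        return False  # no profitable children to compare against
    children_time = sum(c.estimated_time for c in profitable_children)
    return region.estimated_time >= children_time


def less_profitable_than_parent(region: Region, parent: Region) -> bool:
    """Rule from the second row: the region's Estimated Time is greater than
    or equal to the Estimated Time of its parent region."""
    return region.estimated_time >= parent.estimated_time


# Made-up example: an outer loop with two profitable inner loops.
inner_a = Region("inner_a", estimated_time=0.20)
inner_b = Region("inner_b", estimated_time=0.25)
outer = Region("outer", estimated_time=0.50, children=[inner_a, inner_b])

print(less_profitable_than_children(outer))
print(less_profitable_than_parent(inner_a, outer))
```

In this example the outer loop takes longer on the target than its two children combined, so the first check reports it as less profitable than its children, while `inner_a` is not reported as less profitable than its parent.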
Not Profitable
Message | Cause and Details | Solution |
---|---|---|
Not profitable: Parallel execution efficiency is limited due to Dependencies | Dependencies limit parallel execution, so the code region cannot benefit from offloading to a target device. The estimated execution time after acceleration is greater than or equal to the original execution time. | Solution 1: If you did not enable Dependencies analysis when collecting data, run the analysis to get detailed information about real dependencies in your code. Refer to the Dependencies Problem and Message Types Reference for details on potential dependency problems. Solution 2: Ignore dependencies (real and assumed) and model offloading for all or selected code regions. |
Not profitable: The Number of Loop Iterations is not enough to fully utilize Target Platform capabilities | The loop cannot benefit from offloading to a target platform because it has a low number of iterations. For example, if a target device can execute up to 1024 threads in parallel, but a loop has only 100 iterations, offloading this loop can leave up to 924 threads inactive. This may significantly reduce the benefit of using the target device. | In most cases, such code regions cannot benefit from offloading. If you still want to offload such loops, a workaround may apply when the loop is broken down into several chunks by a compiler or a programming model. |
Not profitable: Data Transfer Tax is greater than Computation Time and Memory Bandwidth Time | Time spent on transferring data to a target device (Data Transfer Tax) is greater than Compute time and DRAM BW (bandwidth) time. The resulting Estimated Time on a target platform with data transfer tax is greater than or equal to the Measured Time on a host platform. | Check the data in the Data Transfer column. A large value means that this code region cannot benefit from offloading. If you still want to offload such loops, disable data transfer analysis to use only the estimated execution time for speedup and profitability calculation. This option disables data transfer analysis for all loops, so you may get different performance modeling results for all loops. |
Not profitable: Computation Time is high despite the full use of Target Platform capabilities | The code region fully uses the Target Platform capabilities, but the Computation Time is still high. As a result, the Estimated Time on a target platform is greater than or equal to the Baseline Elapsed Time. | Check the value in the Compute column in the Estimated Bound-by column group. |
Not profitable: Cache/Memory Bandwidth Time is greater than other execution time components on Target Device | The time spent in Cache or Memory Bandwidth takes a big part of the Estimated Time on a target platform. As a result, it is greater than or equal to the Estimated Time. In the actual report, Cache/Memory is replaced with the specific cache or memory level that prevents offloading, for example, L3 or LLC. See the Throughput column for details about the highest bandwidth time. | |
Not profitable because of offload overhead (taxes) | The total time of Offload Taxes, which includes Kernel Launch Tax and Data Transfer Tax, takes a big part of the Estimated Time on a target platform. As a result, it is greater than or equal to the Measured Time on a host platform. | Examine the values in the columns of the Overhead group. A big value in any of the columns means that this code region cannot benefit from offloading because the cost of offloading is high. If kernel launch tax is large, enable hiding data transfer taxes from the Project Properties or using the --assume-hide-taxes option. |
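
The iteration-count example in the table above (1024 parallel threads, 100 iterations) can be checked with a short Python sketch. It assumes the simplest possible mapping, one loop iteration per device thread in a single pass; real devices and the Intel Advisor model are more detailed, and both function names are hypothetical.

```python
def idle_threads(max_parallel_threads: int, iterations: int) -> int:
    """Threads left without work, assuming one iteration per thread."""
    return max(max_parallel_threads - iterations, 0)


def utilization(max_parallel_threads: int, iterations: int) -> float:
    """Fraction of the device's parallel threads that receive work."""
    busy = min(iterations, max_parallel_threads)
    return busy / max_parallel_threads


# Values from the example: 1024 threads, 100 loop iterations.
print(idle_threads(1024, 100))            # 924
print(f"{utilization(1024, 100):.1%}")    # 9.8%
```

With under 10% of the threads busy, most of the device sits idle, which is why such loops are usually reported as not profitable.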
N/A - Part of Offload
This means that offloading a code region is less profitable than offloading its outer loop.
This is not an issue. The code region of interest is located inside of an offloaded loop.
Total Time Is Too Small for Reliable Modeling
This means that the execution time of a code region or a whole loop nest is less than 0.02 seconds. In this case, Intel Advisor cannot estimate the speedup correctly or determine whether offloading the code region is worthwhile, because its execution time is close to the sampling interval of Intel Advisor.
Possible Solution
If you want to check the profitability of offloading code regions with total time less than 0.02 seconds:
- Go to.
- Enter the --loop-filter-threshold=0 option in the Other parameters field to model such small offloads.
- Re-run Performance Modeling.
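
To see why times this small are unreliable, compare them with the default Survey sampling interval of 0.01 seconds mentioned earlier. The Python sketch below is a rough illustration only: the function names and the `threshold_s` parameter are hypothetical and merely mirror the idea behind the 0.02 second limit and the `--loop-filter-threshold=0` setting, not the tool's actual logic.

```python
SURVEY_SAMPLING_INTERVAL_S = 0.01  # default Survey sampling interval (from the text)
MIN_RELIABLE_TIME_S = 0.02         # limit described in this section


def expected_samples(region_time_s: float,
                     interval_s: float = SURVEY_SAMPLING_INTERVAL_S) -> float:
    """Rough number of samples a sampling profiler collects in a region."""
    return region_time_s / interval_s


def too_small_for_modeling(total_time_s: float,
                           threshold_s: float = MIN_RELIABLE_TIME_S) -> bool:
    """True if the region's total time is below the filtering threshold."""
    return total_time_s < threshold_s


region_time = 0.015  # made-up total time of a small loop nest, in seconds
print(expected_samples(region_time))                      # between 1 and 2 samples
print(too_small_for_modeling(region_time))                # filtered out by default
print(too_small_for_modeling(region_time, threshold_s=0)) # kept when the threshold is 0
```

A region that yields only one or two samples has a measured time that can be off by a large fraction, so lowering the threshold lets you model such regions but does not make their estimates more accurate.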