Check for Dependency Issues
Accuracy Level
High
Enabled Analyses
Survey with GPU profiling and in-depth static analysis + Trip Counts and FLOP with call stacks and full data transfer simulation for all memory levels (Characterization) + Dependencies with reduction detection + Modeling
Result Interpretation
Without the Dependencies analysis, if a loop is not explicitly marked as parallel with pragmas or if the compiler assumes that dependencies are present, Intel® Advisor treats the loop as having dependencies and does not recommend it for offloading, because such loops have high compute time. In this case, you can see a high percentage of dependency-bound code regions. To get accurate information about dependencies, run the Dependencies analysis.
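To make the distinction concrete, the following minimal sketch shows three loop shapes that a dependency analysis has to tell apart: independent iterations, a reduction, and a genuine loop-carried dependency. The function names are hypothetical and the code is generic C++, not taken from the Intel Advisor documentation; how the tool classifies any particular loop depends on the analysis itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so iterations
// can run in any order or in parallel.
void scale(const std::vector<float>& in, float factor, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
    }
}

// Reduction: every iteration updates the same accumulator. This is a
// dependency on `total`, but one that can be parallelized by combining
// partial sums, which is why reductions are usually detected separately.
float sum(const std::vector<float>& in) {
    float total = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        total += in[i];
    }
    return total;
}

// Loop-carried read-after-write dependency: iteration i reads out[i - 1],
// which iteration i - 1 wrote. The iterations cannot simply run in parallel.
void running_sum(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    if (in.empty()) {
        return;
    }
    out[0] = in[0];
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = out[i - 1] + in[i];
    }
}
```

Only the last loop needs restructuring before its iterations can be distributed across parallel hardware.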
After running the Offload Modeling perspective with High accuracy, you get a complete Offload Modeling report extended with detailed information about which loops have dependencies and which do not, as well as a full data transfer report.
If you already have a report generated with a lower accuracy level, all offload recommendations, metrics, and speedup estimates are updated to be more precise based on the new data.

In the metrics table of the Accelerated Regions tab:
- Expand the Measured column group and see the Dependency Type column. It indicates whether the loop has dependencies and, if so, reports the dependency types. In the Details tab, an icon indicates the loop dependency type:
  - An icon indicating that the code region is parallel.
  - An icon indicating that the code region has dependencies.
- In the Throughput column of the Estimated Bound-by group, review the time spent in dependency-bound parts of your code. If the value is high, fix the dependencies.
- Intel Advisor might detect that some loops do not have dependencies and can be offload candidates, even though they were previously assumed to have dependencies. Review the list of loops/functions considered profitable for offloading for new candidates.
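As an illustration of what fixing a dependency can look like, the sketch below removes a loop-carried index by computing it directly from the loop counter. This is one common generic technique and not a procedure prescribed by Intel Advisor; the function names are hypothetical, and the right fix for a real loop depends on the dependency type reported for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: `pos` is carried from one iteration to the next, so each
// iteration depends on the one before it.
void gather_carried(const std::vector<int>& src, std::size_t stride,
                    std::vector<int>& dst) {
    assert(dst.empty() || (dst.size() - 1) * stride < src.size());
    std::size_t pos = 0;
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = src[pos];
        pos += stride;  // loop-carried update
    }
}

// After: the index is derived from `i` alone, so every iteration is
// independent and the result is unchanged.
void gather_independent(const std::vector<int>& src, std::size_t stride,
                        std::vector<int>& dst) {
    assert(dst.empty() || (dst.size() - 1) * stride < src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] = src[i * stride];
    }
}
```

Both versions read the same elements; only the second expresses that fact in a form where no iteration needs a value produced by an earlier one.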
Review the Data Transfer Estimations pane, which shows detailed information about data transferred between host and device and about memory objects. In addition to the basic data transfer report, it includes:
- Offloaded memory objects with size and transfer direction. By default, the object list is sorted by memory object size.
- A histogram showing the size distribution of the memory objects that the selected region accessed.
Next Steps
- Based on the collected data, rewrite your code to offload to a target platform and measure the performance of GPU kernels with the GPU Roofline Insights perspective.
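A rewrite for offloading might look like the following minimal sketch. It assumes OpenMP target directives as the offload programming model, which this section does not specify, and the function name is hypothetical. The `map` clauses state the transfer direction of each memory object, which is the kind of information the data transfer report helps you decide on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: y = a * x + y, offloaded with OpenMP target directives.
// Built without OpenMP offload support, the pragma is ignored and the loop
// runs on the host with the same result.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // map(to: ...)     copies x from host to device only.
    // map(tofrom: ...) copies y to the device and back to the host.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Each iteration touches only element `i`, so the loop has no loop-carried dependency and is a reasonable shape for an offload candidate.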