User Guide


Examine Data Transfers for Modeled Regions

Accuracy Level


Enabled Analyses

Survey with in-depth static analysis + Trip Counts and FLOP with callstacks, basic data transfer simulation, and GPU memory subsystem simulation (Characterization) + Modeling

Result Interpretation

After running the Offload Modeling perspective with this accuracy level, you get an extended Offload Modeling report, which shows you, in addition to the basic data:
  • More accurate estimations of traffic and time for all cache and memory levels.
  • Measured data transfer and estimated data transfer between host and device memory.
  • Total data for the loop/function from different callees.
The Offload Modeling perspective assumes a loop is parallel if its dependency type is unknown. This means that there is no information about the loop from a compiler or the loop is not explicitly marked as parallel, for example, with a programming model (OpenMP*, Data Parallel C++, Intel® oneAPI Threading Building Blocks).
  • If you already have a report generated for a lower accuracy level, all offload recommendations, metrics, and speed-up estimates are updated to be more precise, taking the new data into account.
  • This accuracy level for the Offload Modeling perspective provides sufficient information about the memory and cache usage and the offload taxes of your application.
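To illustrate the parallel-loop assumption described above, here is a minimal C sketch. It contrasts a loop that is explicitly marked as parallel with OpenMP* and an unmarked loop that carries a real dependency between iterations. The function names `scale` and `prefix_sum` are hypothetical and used only for illustration; how the tool classifies any particular loop in your application is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Explicitly marked as parallel: each iteration reads in[i] and writes
 * only out[i], so iterations do not depend on each other. */
void scale(const float *in, float *out, size_t n, float factor)
{
    assert(in != NULL && out != NULL);
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

/* Not marked, and not safely parallel as written: iteration i reads
 * data[i - 1], which the previous iteration has just updated. */
void prefix_sum(float *data, size_t n)
{
    assert(data != NULL);
    for (size_t i = 1; i < n; ++i) {
        data[i] += data[i - 1];
    }
}
```

If the dependency type of a loop like the second one is reported as unknown, treating it as parallel may make the offload estimate optimistic, so it is worth checking such loops by hand before acting on the recommendation.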
Example of an Accelerated Regions report with data transfer and tax estimations (Offload Modeling perspective)
In the Accelerated Regions tab of the Offload Modeling report, review the metrics about memory usage and data transfers:
  • In the metrics table:
    • In the Estimated Bound-by column group, review the column that gives a full picture of the time taxes paid for offloading to a target platform.
    • In the Estimated Data Transfer column, review the amount of data read by and written to a target platform if the code is offloaded.
    • In the Memory Estimates column group, see how well your application uses the resources of all memory levels. Expand the group to see more detailed and accurate metrics for different memory levels.
  • Select a code region from the table and review the details about the amount of data transferred between host and device memory under Data Transfer Estimations:
    • See the total amount of data transferred in each direction and the corresponding offload taxes.
    • See hints about optimizing data transfers in the selected code region.
For details about metrics reported, see Accelerator Metrics.

Next Steps

  • Based on collected data, rewrite your code to offload to a target platform and measure performance of GPU kernels with GPU Roofline Insights.
  • Consider running the Offload Modeling perspective with a higher level of accuracy to get more precise offload recommendations.
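As a rough illustration of the first step, the following C sketch offloads a simple loop with OpenMP* target directives. It is one possible way to express the offload, not the method the report prescribes, and the function name `vector_add` is hypothetical. The `map` clauses make the host-to-device and device-to-host transfers explicit, which correspond to the data transfer directions discussed in this topic.

```c
#include <assert.h>
#include <stddef.h>

/* Adds two arrays on the target device when offload is available.
 * Without OpenMP offload support the pragma is ignored and the loop
 * runs on the host with the same result. */
void vector_add(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    /* map(to:) copies the inputs from host to device before the loop;
     * map(from:) copies the result from device to host afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

In this sketch the clauses request `2 * n * sizeof(float)` bytes to the device and `n * sizeof(float)` bytes back. The values in an actual report may differ, because they come from the analysis of your application rather than from the source code alone.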
