Offload Modeling

Identify high-impact opportunities to offload to a target device, as well as the code regions that are not profitable to offload, by running the Offload Modeling perspective.
The Offload Modeling perspective:
  • Provides performance speedup estimation on target devices
  • Provides offload overhead estimation
  • Pinpoints performance bottlenecks
  • Takes into account not only compute and memory limitations, but the time required to transfer data and schedule region execution on a target device
Currently, you can model application performance only on Intel® GPUs.
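
To see why transfer and scheduling time matter, consider the following minimal sketch. It is an illustration only, not the model the tool uses: the function name, its parameters, and the input timings are hypothetical, and it assumes that compute and memory work overlap, so the slower of the two bounds execution, while data transfer and scheduling add on top.

```python
def estimate_offload_time(compute_s, memory_s, transfer_s, scheduling_s):
    """Estimate the time of one offloaded region, in seconds.

    Assumption (not taken from the tool): compute and memory work overlap,
    so the slower of the two bounds execution on the target device.
    Data transfer and scheduling are treated as pure overhead.
    """
    execution_s = max(compute_s, memory_s)
    return execution_s + transfer_s + scheduling_s


# Hypothetical timings for one region on the target device.
estimated = estimate_offload_time(
    compute_s=1.0, memory_s=1.5, transfer_s=0.5, scheduling_s=0.25
)
print(estimated)  # 2.25
```

In this made-up case the region is memory bound, and a third of its estimated time is offload overhead, so a region that looks fast on the device can still be unprofitable once the overhead is counted.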

Offload Modeling

Example of a Summary report of the Offload Modeling perspective
After you execute the Offload Modeling perspective, you first see a Summary report that includes the most important information about your application and compares the measured performance of your code with the modeled performance on a target device. The Summary gives you hints about your next steps. Review the following:
  • Main metrics for the modeled performance of your program in the Top Metrics and Program Metrics panes. This information indicates whether you should offload your application to a target device.
  • Specific factors that prevent your code from achieving better performance if executed on a target device in the Offload Bounded by pane.
  • Top five offloaded loops/functions that provide the highest benefit if offloaded, sorted by speedup, in the Top Offloaded pane.
  • Top five non-offloaded loops/functions, their performance metrics, and the reasons why they were not offloaded in the Top Non-Offloaded pane. For details about reasons for not offloading and possible solutions, refer to Investigate Non-Offloaded Code Regions.
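
The ranking behind a "top five offloaded" list can be sketched as follows. This is a simplified illustration with made-up region names and timings; `top_offloaded` is a hypothetical helper, and it assumes that a region counts as offloaded only when its estimated time on the target device is lower than its measured time on the host.

```python
def top_offloaded(regions, limit=5):
    """Return up to `limit` (name, speedup) pairs, highest speedup first.

    `regions` is a list of (name, host_time_s, target_time_s) tuples.
    Regions that are not faster on the target are treated as non-offloaded.
    """
    offloaded = [
        (name, host_s / target_s)
        for name, host_s, target_s in regions
        if target_s < host_s
    ]
    return sorted(offloaded, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical measurements: (name, host time, estimated target time).
regions = [
    ("loop_a", 6.0, 1.5),
    ("loop_b", 2.0, 3.0),  # slower on the target, so it is left out
    ("func_c", 4.0, 2.0),
]
print(top_offloaded(regions))  # [('loop_a', 4.0), ('func_c', 2.0)]
```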

How It Works

The Offload Modeling perspective includes the following steps:
  • Get the baseline performance data for your application by running a
  • Identify the number of times loops are invoked and executed and the number of floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the
  • Identify loop-carried dependencies by running the
  • Select regions of interest that might be offloaded to a target device.
  • Estimate execution time for each selected code region if it is offloaded to a target device by running
    . If execution on the target device takes less time than on the host, the region is profitable for offloading.
  • Compute the total program speedup and other estimated performance metrics according to Amdahl's law, taking into account the speedup from the most profitable regions.
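
The last two steps can be illustrated with a minimal sketch. It is not the tool's actual model: the region names and timings are made-up inputs, and `Region`, `is_profitable`, and `program_speedup` are hypothetical helpers. It assumes that a region is offloaded only when its estimated target time is lower than its measured host time, and that the rest of the program keeps running on the host, which is how Amdahl's law limits the overall gain.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    host_time: float    # measured time on the host, in seconds
    target_time: float  # estimated time on the target, overhead included, in seconds


def is_profitable(region: Region) -> bool:
    """A region is worth offloading only if the target is faster than the host."""
    return region.target_time < region.host_time


def program_speedup(total_host_time: float, regions: list) -> float:
    """Amdahl-style estimate: only profitable regions get faster,
    and everything else keeps its measured host time."""
    accelerated_time = total_host_time
    for region in regions:
        if is_profitable(region):
            accelerated_time -= region.host_time - region.target_time
    return total_host_time / accelerated_time


# Hypothetical program: 10 s in total on the host, two candidate regions.
regions = [
    Region("loop_a", host_time=6.0, target_time=1.5),
    Region("loop_b", host_time=2.0, target_time=3.0),  # slower on the target: stays on the host
]
print(round(program_speedup(10.0, regions), 2))  # 1.82
```

Although loop_a alone runs four times faster on the target in this example, the program as a whole speeds up by less than two times, because the code that stays on the host still takes 4 of the original 10 seconds.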

Perspective Views

  • Analysis Workflow tab - Review the controls available to configure the perspective workflow for your application.
  • Offload Modeling Summary - Review the most important information about your application performance modeled for a target device.
  • Accelerated Regions report - Review the detailed information on all of the offloaded and non-offloaded regions of the code.
  • Logs - Review the log messages reported during the perspective execution.
