Get Started Guide

  • 2021.2
  • 03/26/2021
  • Public Content

Identify High-impact Opportunities to Offload to GPU

The Offload Modeling perspective enables you to identify high-impact opportunities to offload to GPU as well as the areas that are not profitable to offload. It provides performance speed-up projection on accelerators along with offload overhead estimation and pinpoints accelerator performance bottlenecks.
Intel® Advisor offers two ways to run the Offload Modeling perspective: from the Intel® Advisor GUI and from the command line interface (CLI). Intel Advisor enables you to open results collected with either method in the GUI or in your web browser.
Run Offload Modeling Perspective from Intel® Advisor GUI
In the Analysis Workflow pane, use the drop-down menu to select the Offload Modeling perspective and set the data collection accuracy level to
. At this accuracy level, Intel® Advisor:
  • Collects Survey data with basic execution metrics of your application
  • Runs Characterization analysis to get the information about Trip Counts and floating-point operations (FLOP), simulate cache traffic, and estimate time required to transfer data from one device to another
  • Models application performance on a target device assuming that main hotspots can be executed in parallel
Click the button to run the perspective.
For details about data collection accuracy presets, see Intel Advisor User Guide: Offload Modeling Accuracy Presets.
Upon completion, Intel Advisor displays an Offload Modeling Summary that offers:
  • Information on total potential speed-up of your application
  • Top 5 offloaded code regions in your call tree
  • Top 5 regions that are not profitable to offload
  • Number of offloaded functions/loops
  • Fraction of offloaded code relative to total time of original application
Intel Advisor generates an interactive HTML report that is stored in the
. You can open the HTML report in your web browser.
Run Offload Modeling Perspective from Command Line Interface
To collect data and model your application performance on a target GPU using the command line interface, do the following:
  1. Run the Survey analysis and collect performance metrics with online stack walk and static instruction mix on a host device:
    advisor --collect=survey --stackwalk-mode=online --static-instruction-mix --project-dir=./advi -- myApplication
  2. Run the Trip Counts and FLOP analysis, simulate multi-level GPU memory subsystem behavior, and estimate time required for transferring data from host to target:
    advisor --collect=tripcounts --flop --stacks --enable-cache-simulation --data-transfer=light --target-device=gen9_gt2 --project-dir=./advi -- myApplication
    • --flop collects data about floating-point operations, integer operations, memory traffic, and mask utilization metrics.
    • --stacks performs advanced collection of call stack data.
    • --enable-cache-simulation models memory subsystem behavior on a target application.
    • --data-transfer is set to light mode, which models data transfer between host and device memory.
    • --target-device specifies a device configuration to use for simulating cache behavior during Trip Counts collection. The following device configurations are available:
      gen9_gt2 | gen9_gt3 | gen9_gt4 | gen11_icl | gen12_dg1 | gen12_tgl
  3. Model application performance on a target device:
    advisor --collect=projection --no-assume-dependencies --config=gen9_gt2 --project-dir=./advi
    • --no-assume-dependencies assumes that a loop does not have dependencies if the dependency type is unknown. It is the recommended option for the Medium accuracy mode because dependency analysis is not executed.
    • --config sets a device configuration to model your application performance for. By default, this option is set to
      . The following device configurations are available:
      gen9_gt2 | gen9_gt3 | gen9_gt4 | gen11_icl | gen12_dg1 | gen12_tgl
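The three steps above can be chained in a small wrapper script. The following is a minimal sketch, not an official Intel Advisor script: the advisor options are copied from the commands in this guide, while the variable names, the is_known_target and run_step helpers, and the DRY_RUN switch are assumptions introduced for illustration. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: chain the three Offload Modeling CLI steps from this guide.
# Assumptions (not part of Intel Advisor): the variable names, the helper
# functions, and the DRY_RUN switch are illustrative only.

PROJECT_DIR="${PROJECT_DIR:-./advi}"   # project directory used in the guide
TARGET="${TARGET:-gen9_gt2}"           # one of the device configurations listed above
APP="${APP:-myApplication}"            # placeholder application name from the guide
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print commands, 0 = also execute them

# Succeed only for the device configurations listed in this guide.
is_known_target() {
    case "$1" in
        gen9_gt2|gen9_gt3|gen9_gt4|gen11_icl|gen12_dg1|gen12_tgl) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the command line, then execute it unless DRY_RUN is 1.
run_step() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

offload_modeling() {
    is_known_target "$TARGET" || { echo "unknown target: $TARGET" >&2; return 1; }

    # Step 1: Survey with online stack walk and static instruction mix.
    run_step advisor --collect=survey --stackwalk-mode=online --static-instruction-mix \
        --project-dir="$PROJECT_DIR" -- "$APP" || return 1

    # Step 2: Trip Counts and FLOP with cache simulation and light data transfer modeling.
    run_step advisor --collect=tripcounts --flop --stacks --enable-cache-simulation \
        --data-transfer=light --target-device="$TARGET" \
        --project-dir="$PROJECT_DIR" -- "$APP" || return 1

    # Step 3: Performance projection for the same device configuration.
    run_step advisor --collect=projection --no-assume-dependencies --config="$TARGET" \
        --project-dir="$PROJECT_DIR"
}

offload_modeling
```

Setting DRY_RUN=0 executes each command after printing it. Using one TARGET value for both --target-device and --config keeps the cache simulation and the projection on the same device configuration, as in the commands above.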
For details about CLI options, see Intel Advisor User Guide: Command Line Interface.
Upon completion, open the collected results in the Intel Advisor GUI or open the interactive HTML report that is stored in the
using your web browser.
Intel Advisor enables you to create a read-only result snapshot using the following command:
advisor --snapshot --project-dir=./advi --pack --cache-sources --cache-binaries -- /tmp/my_proj_snapshot
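If you take snapshots regularly, a dated snapshot name helps keep them apart. The following is a minimal sketch under stated assumptions: the snapshot_cmd helper and the date suffix are hypothetical and not part of Intel Advisor, and only the advisor options are taken from the command above. The helper prints the command rather than running it.

```shell
# Sketch: print the snapshot command from this guide with a dated snapshot path.
# Assumptions: snapshot_cmd and the date suffix are illustrative, not an Intel Advisor feature.
snapshot_cmd() {
    project_dir="$1"                 # project directory, for example ./advi
    snapshot_base="$2"               # snapshot path without the date suffix
    stamp="${3:-$(date +%Y%m%d)}"    # optional fixed stamp, defaults to today's date
    echo "advisor --snapshot --project-dir=$project_dir --pack --cache-sources --cache-binaries -- ${snapshot_base}_${stamp}"
}

snapshot_cmd ./advi /tmp/my_proj_snapshot
```

Review the printed line and run it in your shell to create the snapshot.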
What's Next
After running the Offload Modeling perspective, you need to identify whether your top hotspots have loop-carried dependencies that might be show-stoppers for offloading. To do that:
  1. Rerun Performance Modeling analysis assuming that your main hotspots with unknown dependency types cannot be executed in parallel:
    • In the Intel Advisor GUI, expand the Performance Modeling analysis, make sure to enable the Assume Dependencies check box, and click the button to run it.
    • In the CLI, use the following command line:
      advisor --collect=projection --assume-dependencies --config=gen9_gt2 --project-dir=./advi
  2. If the difference between the metrics for your hotspots collected with and without the Assume Dependencies option is small (for example, 2x speed-up with Assume Dependencies and 2.2x speed-up without Assume Dependencies), rely on the collected results. If the difference is big (for example, 2x speed-up with Assume Dependencies and 50x speed-up without Assume Dependencies), consider running the Dependencies analysis.
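The comparison in step 2 comes down to a ratio of the two projected speed-ups. The following is a minimal sketch, assuming you read the two speed-up values from the reports yourself: the compare_speedups helper and the 1.5 ratio threshold are hypothetical choices, since this guide only gives 2x versus 2.2x as a small difference and 2x versus 50x as a big one.

```shell
# Sketch: compare projected speed-ups with and without Assume Dependencies.
# Assumptions: the helper name and the default 1.5 threshold are illustrative;
# the guide does not define a numeric cut-off.
compare_speedups() {
    with_deps="$1"          # speed-up projected with --assume-dependencies
    without_deps="$2"       # speed-up projected with --no-assume-dependencies
    threshold="${3:-1.5}"   # ratio above which the difference counts as big
    awk -v w="$with_deps" -v n="$without_deps" -v t="$threshold" 'BEGIN {
        if ((w + 0) <= 0) { print "invalid input"; exit 1 }
        ratio = n / w
        if (ratio <= t)
            printf "ratio %.2f: rely on collected results\n", ratio
        else
            printf "ratio %.2f: consider running Dependencies analysis\n", ratio
    }'
}

compare_speedups 2 2.2   # prints "ratio 1.10: rely on collected results"
compare_speedups 2 50    # prints "ratio 25.00: consider running Dependencies analysis"
```

Pass a third argument to try a different threshold for your own workloads.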
For details about checking for loop-carried dependencies, see the respective section in the Intel Advisor User Guide.
See Also
View useful information about Offload Modeling in the Offload Modeling Resources page.
Explore more ways to run the Offload Modeling perspective from the command line interface in Intel Advisor User Guide: Run Offload Modeling from Command Line.
Explore typical scenarios of optimizing GPU usage described in the Intel Advisor