User Guide


Offload Modeling

Find high-impact opportunities to offload code to a target device and estimate the profitability of porting your code to the target.
The Offload Modeling perspective can help you to:
  • Determine whether you should offload your code to a target device and what the potential speedup is for different GPU models
  • Identify loops that are recommended for offloading
  • Pinpoint potential performance bottlenecks on the target platform to decide on optimization directions before porting the code
  • Check how effectively data can be transferred between host and target devices after you offload your code
You can run the Offload Modeling perspective on C and C++ applications, and on Data Parallel C++ (DPC++), C++/Fortran with OpenMP* pragmas, or OpenCL™ applications that are offloaded to a CPU for analysis.
Currently, you can model application performance only on Intel® GPUs.

How It Works

The Offload Modeling perspective includes the following steps:
  1. Get the baseline performance data for your application by running a
  2. Identify the number of times loops are invoked and executed and the number of floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the
  3. Mark up loops of interest and identify loop-carried dependencies that might block parallel execution by running the
  4. Estimate the total program speedup on a target device and other performance metrics according to Amdahl's law, considering speedup from the most profitable regions, by running Performance Modeling. A region is profitable if its execution time on the target is less than on the host.
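The Amdahl's law estimate in the last step can be illustrated with a minimal sketch. This is not the tool's actual model: the `Region` type, the function names, and all timing values are hypothetical, and the sketch ignores data transfer and other overheads that a real estimate would account for. It only shows how offloading the profitable regions (target time less than host time) changes the total program time.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A loop or function considered for offloading (hypothetical type)."""
    name: str
    host_time: float    # measured time on the host, in seconds
    target_time: float  # modeled time on the target, in seconds


def is_profitable(region: Region) -> bool:
    # A region is profitable if it runs faster on the target than on the host.
    return region.target_time < region.host_time


def estimate_program_speedup(total_host_time: float, regions: list[Region]):
    """Amdahl-style estimate: only profitable regions are offloaded,
    and the rest of the program keeps its host time."""
    offloaded = [r for r in regions if is_profitable(r)]
    time_saved = sum(r.host_time - r.target_time for r in offloaded)
    accelerated_time = total_host_time - time_saved
    return total_host_time / accelerated_time, offloaded


# Illustrative numbers only: a 10-second program with three candidate regions.
regions = [
    Region("loop_a", host_time=4.0, target_time=1.0),
    Region("loop_b", host_time=2.0, target_time=3.0),  # slower on target, stays on host
    Region("loop_c", host_time=2.0, target_time=0.5),
]
speedup, offloaded = estimate_program_speedup(10.0, regions)
print(f"Offloaded: {[r.name for r in offloaded]}, program speedup: {speedup:.2f}x")
```

In this example the offloaded regions are individually 4x faster on the target, yet the whole program speeds up by less than 2x, because the code that stays on the host limits the total gain.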

Offload Modeling

The Offload Modeling perspective measures the performance of your application and compares it with its modeled performance on a selected target GPU, so that you can decide which parts of your application to execute on the GPU and how to optimize them for better performance after offloading.
  • Main metrics for the modeled performance of your program, indicating whether you should offload your application to a target device.
  • Specific factors that prevent your code from achieving better performance when executed on a target device, that is, the factors that bound your code.
  • Top five offloaded loops/functions that give the highest speedup and top five loops that are not offloaded.
Example of a Summary report of the Offload Modeling perspective
