Offload Modeling Perspective
- Determine whether you should offload your code to a target device and what the potential speedup is for different GPU models
- Identify loops that are recommended for offloading
- Pinpoint potential performance bottlenecks on the target platform to decide on optimization directions before porting the code
- Check how effectively data can be transferred between host and target devices after you offload your code
How It Works
- Get the baseline performance data for your application by running a Survey analysis.
- Count how many times loops are invoked and executed, count the floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the Characterization analysis.
- Mark up loops of interest and identify loop-carried dependencies that might block parallel execution by running the Dependencies analysis.
- Estimate the total program speedup on a target device and other performance metrics according to Amdahl's law, considering speedup from the most profitable regions, by running Performance Modeling. A region is profitable if its execution time on the target is less than on the host.
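The speedup estimate in the last step can be sketched as follows. This is a simplified illustration, not the tool's actual model: the function name, inputs, and the assumption that per-region host and target times are already known are all hypothetical. Only regions whose modeled target time beats their host time are treated as offloaded, and the whole-program speedup follows Amdahl's law.

```python
def modeled_speedup(total_host_time, regions):
    """Estimate whole-program speedup per Amdahl's law.

    regions: list of (host_time, target_time) pairs for candidate
    offload regions, in seconds. A region is offloaded only if it is
    profitable, i.e. its modeled target time is less than its host time.
    """
    offloaded_host = 0.0
    offloaded_target = 0.0
    for host_t, target_t in regions:
        if target_t < host_t:  # profitable: model it as offloaded
            offloaded_host += host_t
            offloaded_target += target_t
    # Unoffloaded code keeps its host time; offloaded code runs at target time.
    modeled_total = (total_host_time - offloaded_host) + offloaded_target
    return total_host_time / modeled_total

# Example: a 10 s program. One loop (6 s on host) is modeled at 1 s on
# the target and is offloaded; another (2 s) would slow down to 3 s and
# stays on the host, so it does not affect the estimate.
print(modeled_speedup(10.0, [(6.0, 1.0), (2.0, 3.0)]))  # -> 2.0
```

This also shows why even a large per-loop speedup yields a modest program speedup when the loop is a small fraction of total runtime: the unoffloaded part bounds the result.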
Offload Modeling Summary
- Main metrics for the modeled performance of your program, indicating whether you should offload your application to a target device.
- Specific factors that prevent your code from achieving better performance if executed on a target device, that is, the factors your code is bound by.
- Top five offloaded loops/functions with the highest modeled speedup and top five loops/functions not recommended for offloading.