Examine Regions Recommended for Offloading
- There is no tax for transferring data between baseline and target platforms.
- All data goes to L1/L3 cache level only. L1/L3 cache traffic estimation might be inaccurate.
- A loop is parallel if the loop dependency type is unknown (the Assume Dependencies checkbox is disabled). This happens when there is no information about the loop dependency type from a compiler, or the loop is not explicitly marked as parallel, for example, with a programming model (OpenMP*, Data Parallel C++, Intel® oneAPI Threading Building Blocks (oneTBB)).
- Review the metrics for the whole application in the Summary tab.
- In the Program Metrics panes, check whether your application is profitable to offload to a target device or performs better on a baseline platform.
- See what prevents your code from achieving better performance when executed on a target device in the Offload Bounded by pane. If you enable the Assume Dependencies option for the Performance Modeling analysis, you might see a high percentage of dependency-bound code regions. In this case, run the Dependencies analysis and rerun Performance Modeling to get more accurate results.
- If the estimated speed-up is high enough and other metrics in the Summary pane suggest that your application can benefit from offloading to a selected target platform, you can start offloading your code.
- If you want to investigate the results reported for each region in more detail, go to the Accelerated Regions tab and select a code region:
- Check whether your target code region is recommended for offloading to a selected platform. In the Basic Estimated Metrics column group, review the Offload Summary column. A code region is considered profitable for offloading if its estimated speed-up is more than 1, that is, the estimated execution time on a target device is smaller than on a host platform. If your code region of interest is not recommended for offloading, consider re-running the perspective with a higher accuracy or refer to Investigate Not Offloaded Code Regions for recommendations on how to model offloading for this code region.
- In the Throughput column of the Estimated Bounded-by group, review the time spent in the compute-bound and L3 cache bandwidth-bound parts of your code. If the value is high, consider optimizing compute and/or L3 cache usage in your application.
- Review the metrics in the Compute Estimates column group to see details about the instructions and the number of threads used in each code region.
- View the offload summary and details for the selected code region in the Details pane.
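
The profitability rule described above (a region is worth offloading when its estimated speed-up is greater than 1) can be illustrated with a small sketch. This is not the tool's API or its actual model: the function names, region names, and timings below are hypothetical, and the sketch assumes speed-up is simply the ratio of estimated host time to estimated target time.

```python
# Illustrative sketch only: the names and numbers here are hypothetical and
# are not produced by, or part of, the analysis tool itself.

def estimated_speedup(host_time_s: float, target_time_s: float) -> float:
    """Return the ratio of host execution time to estimated target time."""
    if host_time_s < 0 or target_time_s <= 0:
        raise ValueError("host time must be >= 0 and target time must be > 0")
    return host_time_s / target_time_s


def is_profitable(host_time_s: float, target_time_s: float) -> bool:
    """A region is considered profitable when its speed-up exceeds 1."""
    return estimated_speedup(host_time_s, target_time_s) > 1.0


# Hypothetical regions: (time on host in seconds, estimated time on target).
regions = {
    "loop_a": (4.0, 1.0),   # target is faster than host
    "loop_b": (0.5, 2.0),   # target is slower than host
}

for name, (host, target) in regions.items():
    verdict = "offload" if is_profitable(host, target) else "keep on host"
    print(f"{name}: speed-up {estimated_speedup(host, target):.2f} -> {verdict}")
```

In this made-up data, loop_a would be flagged for offloading and loop_b would stay on the host, because its estimated time on the target exceeds its time on the host.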