GPU Roofline Insights Perspective
The GPU Roofline Insights perspective helps you answer the following questions:
- What is the maximum achievable performance with your current hardware resources?
- Does your application work optimally on current hardware resources?
- If not, what are the best candidates for optimization?
- Is memory bandwidth or compute capacity limiting performance for each optimization candidate?
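The questions above follow from the roofline model, in which the attainable performance of a kernel is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity (operations per byte). The following is a minimal sketch of that calculation, not the tool's own implementation: the `Kernel` class, the function names, and all hardware figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Hypothetical per-kernel measurements (not a tool API)."""
    name: str
    flops: float        # floating-point operations executed
    bytes_moved: float  # bytes transferred to and from memory
    seconds: float      # measured execution time


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline ceiling: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def analyze(kernel, peak_gflops, bandwidth_gbs):
    """Place one kernel on the roofline and report what limits it."""
    intensity = kernel.flops / kernel.bytes_moved      # FLOP per byte
    achieved = kernel.flops / kernel.seconds / 1e9     # GFLOP/s
    roof = attainable_gflops(intensity, peak_gflops, bandwidth_gbs)
    ridge = peak_gflops / bandwidth_gbs                # intensity where the two roofs meet
    return {
        "name": kernel.name,
        "intensity": intensity,
        "achieved_gflops": achieved,
        "roof_gflops": roof,
        "bound": "memory" if intensity < ridge else "compute",
        "efficiency": achieved / roof,                 # fraction of the ceiling reached
    }


# Assumed hardware limits, for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 200.0

if __name__ == "__main__":
    example = Kernel("stencil", flops=2e9, bytes_moved=8e9, seconds=0.125)
    print(analyze(example, PEAK_GFLOPS, BANDWIDTH_GBS))
```

A kernel whose arithmetic intensity lies left of the ridge point is limited by memory bandwidth, and one to the right is limited by compute capacity. Kernels with low efficiency relative to their ceiling are the usual candidates for optimization.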
- Choose a collection accuracy level to select the perspective steps and set the analysis properties. By default, accuracy is set to Low. For more information, see GPU Roofline Accuracy Presets. Depending on the desired results, choose one of the following levels:
  - Low: Model your application performance for a target device and get basic information about potential speed-up and performance.
  - Medium: Model your application performance and data transfers between host and target devices.
  - High: Model your application performance and data transfers, and detect parallel regions to extend the list of offload candidates.
  - Custom: Customize the perspective flow and properties.
- For GPU Roofline, the accuracy level controls the complexity of the CPU Roofline chart generated for loops/functions in your code executed on CPU. If you are interested only in code regions executed on GPU, select Low accuracy.
- The higher the accuracy level you choose, the more runtime overhead is added to your application.
- Run the perspective: click the run button. While the perspective is running, you can do the following in the Analysis Workflow tab:
  - Control the perspective execution:
    - Stop data collection and see the already collected data: click the corresponding button.
    - Pause data collection: click the corresponding button.
    - Cancel data collection and discard the collected data: click the corresponding button.
  - Expand an analysis to control the analysis execution:
    - Pause the analysis and see the already collected data: click the corresponding button.
    - Stop the analysis and start the next selected analysis: click the corresponding button.
    - Interrupt execution of all selected analyses and see the already collected data: click the corresponding button.