Finding Performance Bottlenecks in OpenGL/OpenGL ES* Frames

Pre-requisites:

If you are analyzing an OpenGL* ES application, the target system to test against must be connected to your host system. The connection is required to calculate metrics data. Metrics data is available only for devices based on Intel® Processor Graphics.

Using Graphics Frame Analyzer, you can explore a variety of metrics to identify performance bottlenecks in the frame, and analyze performance dependency on different drivers or states of the hardware.

To identify performance bottlenecks in a frame:

  1. Open your frame capture with Graphics Frame Analyzer.
  2. In the Profiling view, choose the available metrics for the X and/or Y axis to visualize specific aspects of performance in the frame.
  3. Review the bar chart to locate performance issues in the frame. You can analyze individual API calls that correspond to separate bars in the chart (default), or group them by render targets using the Group by Render Targets toggle button. 
    The scrollbar below the chart provides an overview of the entire frame, while the slider reflects the part of the frame currently displayed in the chart. You can stretch/shrink the slider to change the scaling of the bar chart. If the X-axis represents a non-constant metric, you can double-click the slider to toggle between the full frame view and the currently selected part.
  4. Select the bars that contribute the most to the frame time.

    The Metrics pane displays metrics information for the selected API calls or render targets. If multiple metrics are available for your device, Graphics Frame Analyzer groups them by the hardware blocks to which they correspond. The Pipeline blocks include metrics that represent the processing flow at each stage of the graphics rendering pipeline. The Interfaces blocks reflect the state of the external interfaces used by the Pipeline blocks. The width of each block corresponds to 100% of the execution time for the selected region of API calls. The colored markers below the blocks indicate the EU states for each block:

      • Green markers indicate the active execution state
      • Red markers indicate the stalled state
      • Grey markers indicate the idle state

        NOTE

        The Pipeline view is only available when you select Draw or Dispatch calls. Otherwise, you can only see the metrics in the table format.

  5. Analyze metrics values to identify performance opportunities within the selected region of API calls. The basic analysis methodology is as follows:
    1. Check that the selected region is not memory-bound for GTI bandwidth and/or L3. The main indicators of memory bandwidth issues are multiple red markers in other metrics blocks, as most of them depend on GTI/L3 memory interfaces.
    2. Check the Pixel Back-End block for possible issues. The overall frame performance might be limited by the maximum throughput of the pixel back-end, measured in pixels per clock. This is a typical issue on mobile platforms. If this is the case, try reducing the number of pixels required for output or changing graphics state conditions, as some graphics states might reduce the maximum throughput of the pixel back-end.
    3. Evaluate Sampler and EU states.

      TIP

      If you are opening your frame on a different system, its rendering context could differ from the context of the system where the frame was captured. If there is a difference in rendering contexts, it might affect the performance and metrics data. To understand any possible performance impact of the current rendering context, click the button. You can compare the original and current rendering contexts in the pop-up window.

  6. If your target platform supports the GPU Duration, EU Active, and EU Stall GPU metrics, Graphics Frame Analyzer visualizes GPU duration for each program used by the selected API calls. The full circle of the pie chart represents the GPU Duration of all the API calls where the program is used. For multiple API call selections that use more than one program, the size of each pie chart correlates with the GPU duration value displayed on it. Inner sectors of the pie chart represent the GPU time distribution between EU Active (shades of green), EU Stall (shades of purple), and EU Idle (grey).

    Hover over the pie chart to get details on EU Active, EU Stall, and EU Idle timings for each program. Each state receives a highlight outside the pie chart, and you can see the corresponding metric value, in ms.
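
The pie chart breakdown in step 6 can be reproduced by hand from the three metric values. The following is a minimal sketch, not part of Graphics Frame Analyzer: the `ProgramTiming` container and the `eu_state_shares` function are hypothetical names, and the code assumes that EU Active, EU Stall, and EU Idle together account for the whole GPU Duration, so that idle time is the remainder.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProgramTiming:
    """Hypothetical container for values read from the Metrics pane (in ms)."""
    name: str
    gpu_duration_ms: float
    eu_active_ms: float
    eu_stall_ms: float


def eu_state_shares(timing: ProgramTiming) -> Dict[str, float]:
    """Return the fraction of GPU duration spent in each EU state.

    Assumption: active + stall + idle == GPU duration, so idle is
    computed as the remainder and clamped at zero.
    """
    if timing.gpu_duration_ms <= 0:
        raise ValueError("gpu_duration_ms must be positive")
    idle_ms = max(
        timing.gpu_duration_ms - timing.eu_active_ms - timing.eu_stall_ms, 0.0
    )
    return {
        "active": timing.eu_active_ms / timing.gpu_duration_ms,
        "stall": timing.eu_stall_ms / timing.gpu_duration_ms,
        "idle": idle_ms / timing.gpu_duration_ms,
    }
```

A program with a large stall share is a candidate for the memory and sampler checks in step 5, while a large idle share suggests that the EUs are waiting for work from earlier pipeline stages.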

The origin of some bottlenecks can be hard to troubleshoot. For example, a tiny API call might turn out to be a bottleneck because of latency issues. However, in most cases, these steps should be sufficient to identify performance issues in your frame.
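
Selecting the bars that contribute the most to the frame time (step 4) can also be approximated outside the tool from per-call timings. The sketch below is illustrative only: the CSV layout, the `Call` and `GPU Duration` column names, and the `top_contributors` function are assumptions, not the documented export format of Graphics Frame Analyzer.

```python
import csv
import io


def top_contributors(csv_text, metric="GPU Duration", id_column="Call",
                     coverage=0.8):
    """Return (call id, share of total) pairs for the largest calls.

    Calls are taken in descending order of the metric until their
    combined share reaches `coverage` of the frame total.
    The column names are hypothetical placeholders.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    timed = [(row[id_column], float(row[metric])) for row in rows]
    total = sum(value for _, value in timed)
    if total <= 0:
        return []
    timed.sort(key=lambda pair: pair[1], reverse=True)

    picked, running = [], 0.0
    for call_id, value in timed:
        picked.append((call_id, value / total))
        running += value
        if running / total >= coverage:
            break
    return picked
```

Because this ranking looks only at duration, it does not catch the latency-bound case described above, where a small call delays the work around it.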

To analyze your frame's performance dependency on different drivers or states of the hardware:

  1. Open the frame in Graphics Frame Analyzer in the Profiling view.
  2. Click the Export button to save the metrics data in a CSV file.
  3. Modify your device settings and reopen the frame file.

    Graphics Frame Analyzer calculates the new metrics data for your frame.

  4. Import one or more saved CSV files into the bar chart using the Import button, or simply drag and drop them from the file explorer onto the bar chart.

    The bar chart updates to show the imported metric values as thin colored bars, side-by-side with the current data represented by thick bars.

    NOTE

    You can only import metrics data that was exported for the same frame file, with the same set of metrics. You cannot compare metrics data collected on different platforms with different sets of metrics available.

  5. Compare metric values in the chart to understand the performance impact of the hardware settings. If you imported more than one metrics snapshot, hover over the rectangles above the chart to view the *.csv filenames. To remove imported metrics data from the chart, click the corresponding rectangle.
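
The comparison in the steps above can also be scripted against two exported snapshots. This is a minimal sketch under stated assumptions: the CSV layout, the `Call` and `GPU Duration` column names, and the `load_metric` and `compare_snapshots` functions are hypothetical and may not match the files that Graphics Frame Analyzer actually writes. It rejects snapshots that cover different sets of calls, mirroring the restriction that only exports of the same frame are comparable.

```python
import csv
import io


def load_metric(csv_text, metric, id_column="Call"):
    """Map each call id to one metric value (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: float(row[metric]) for row in reader}


def compare_snapshots(baseline_csv, current_csv, metric="GPU Duration",
                      id_column="Call"):
    """Return (call id, current - baseline) pairs, largest change first."""
    baseline = load_metric(baseline_csv, metric, id_column)
    current = load_metric(current_csv, metric, id_column)
    if baseline.keys() != current.keys():
        raise ValueError("snapshots do not describe the same set of calls")
    deltas = {call: current[call] - baseline[call] for call in baseline}
    # Sort by magnitude so that both regressions and improvements surface.
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
```

A positive delta means that the call takes longer under the current settings than in the baseline, and a negative delta means that it became faster.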

See Also

Graphics Frame Analyzer Window: Profiling View