• 01/27/2020

Performance Optimization for Intel® Processor Graphics

Content expert:
Eugene Krasichkov
Performance Optimization for Intel® Processor Graphics is a series of recipes to help you determine and optimize performance bottlenecks in graphics applications.
Methodology
Use the series of recipes to learn how to use the Intel® GPA Graphics Frame Analyzer on Intel® Processor Graphics to profile your code efficiently and to find bottlenecks in the graphics pipeline.
Ingredients
To optimize performance of graphics applications on Intel® Processor Graphics with Intel® GPA, you need the following:
  • Tool:
    Intel® Graphics Performance Analyzers
TIP
To download a free copy of the Intel® Graphics Performance Analyzers toolkit, visit the Intel® GPA product page.
  • Operating System:
    Windows*, macOS*, Ubuntu*
  • GPU:
    Intel® Processor Graphics Gen6 - Gen11
  • API:
    DirectX* 9 - 12, Vulkan*, OpenGL*
How to Start Analysis
To get started with your analysis:
  1. Launch the Intel® GPA Graphics Monitor on your target system.
  2. Capture a sample frame or stream (for Vulkan) from your game with the Intel® GPA Heads-Up Display (HUD).
    NOTE
    Analyze performance with the latest graphics driver and the latest version of Intel® GPA.
  3. Open the captured frame with the Intel® GPA Graphics Frame Analyzer.
    NOTE
    For Vulkan, open the captured stream in the Multiframe View, and then select a frame to open with Intel® GPA Graphics Frame Analyzer.
  4. Click the button to enable the Hotspot mode, and then select any event or group of events for further analysis.
In the normal mode, you can manually select one event or a contiguous range of events. For the hardware metrics to reflect the behavior of the graphics architecture accurately, the selected events should meet the following conditions:
  • Total cycle count of all selected events is ≥ 20,000.
TIP
Check the GPU Core Clocks, cycles metric.
  • There are no state changes between the events, such as shader changes, pipeline state, etc. Texture and constant changes are exempt from this rule, unless the texture is a dynamically-generated surface.
  • Events share the same render, depth, and stencil surface. This is not an explicit check in Intel® GPA.
If you select a set of events that does not meet the above conditions, the events are treated as filtered events, and the analysis is not conducted. When using metrics analysis techniques like this, avoid any state change within the selection. For example, if you measure two draw calls where one has a depth attachment and the other does not, any potential hotspot associated with depth is averaged over both draw calls, which dilutes the result.
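The three selection conditions above can be sketched as a small check. This is only an illustration: the `Event` record, its field names, and `selection_is_analyzable` are hypothetical and are not part of any Intel® GPA API. Only the 20,000-cycle threshold comes from the conditions listed above.

```python
from dataclasses import dataclass

MIN_TOTAL_CYCLES = 20_000  # threshold from the first condition above


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one captured event (not an Intel GPA type)."""
    gpu_core_clocks: int   # value of the "GPU Core Clocks, cycles" metric
    pipeline_state: str    # stand-in for shaders and other pipeline state
    render_targets: tuple  # stand-in for render, depth, and stencil surfaces


def selection_is_analyzable(events):
    """Return True if the events satisfy all three conditions listed above."""
    if not events:
        return False
    total_cycles = sum(e.gpu_core_clocks for e in events)
    same_state = len({e.pipeline_state for e in events}) == 1
    same_targets = len({e.render_targets for e in events}) == 1
    return total_cycles >= MIN_TOTAL_CYCLES and same_state and same_targets


draws = [
    Event(12_000, "pso_a", ("rt0", "depth0")),
    Event(9_000, "pso_a", ("rt0", "depth0")),
]
print(selection_is_analyzable(draws))  # True: 21,000 cycles, one state, one target set
```

In this sketch, a single 12,000-cycle draw fails the check on its own, as does a pair of draws that differ in pipeline state or in their depth attachment.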
How Graphics Frame Analyzer Identifies Bottlenecks Using Hardware Metrics
Once the selection is made, Intel® GPA Graphics Frame Analyzer plays back the frame on your GPU, collects performance data, and highlights the graphics architectural blocks that have bottlenecks.
Green underline means that the bottleneck criteria are not met, and that this part of the pipeline is not the bottleneck. Red means that this part of the GPU pipeline is the primary bottleneck. Yellow means that the node is not a primary bottleneck, but does have performance optimization opportunities.
Figure: 3D Metrics
Figure: Compute Metrics
Each of the metrics blocks in the Intel® GPA Graphics Frame Analyzer Metrics pane is mapped based on the graphics processing unit workflows. Intel® Processor Graphics performs deeply pipelined parallel execution of the front-end work and the back-end work within a single event. The front-end work includes geometry transformation, rasterization, early depth/stencil, etc. The back-end work includes pixel shading, sampling, color write, blend, and late depth/stencil. Due to the deeply pipelined execution, hotspots from downstream architectural blocks bubble up and stall upstream blocks. This can make it difficult to find the actual hotspot.
To find the primary hotspot using the metrics, Intel® GPA walks the pipeline in reverse order. Intel® GPA follows two separate workflows: one for 3D workloads and one for general-purpose computing on graphics processing units (GPGPU).
Figure: Workflow for 3D workloads
Figure: Workflow for compute workloads
Green nodes within the flowcharts represent potential bottlenecks within the GPU. At each node, Intel® GPA asks whether the bottleneck is primary. If yes, the bottleneck for the particular selection is found. If no, Intel® GPA continues to the next node in the flowchart. Blue nodes branch the decision path, and grey nodes represent terminal hotspots.
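The reverse-order walk can be illustrated with a minimal sketch. The stage list follows the front-end and back-end work named earlier in this section, but the per-stage `utilization` values, the `0.8` threshold, and the function name are assumptions made for illustration. The actual nodes and bottleneck criteria in the Intel® GPA flowcharts are not reproduced here.

```python
# Stage names in front-to-back order, taken from the description above.
# The real flowchart nodes and their criteria are defined by Intel GPA.
PIPELINE_FRONT_TO_BACK = [
    "geometry_transformation",
    "rasterization",
    "early_depth_stencil",
    "pixel_shading",
    "sampling",
    "color_write_blend",
    "late_depth_stencil",
]


def find_primary_hotspot(utilization, threshold=0.8):
    """Walk the pipeline back to front and return the first stage whose
    (hypothetical) utilization meets the threshold, or None if none does."""
    for stage in reversed(PIPELINE_FRONT_TO_BACK):
        if utilization.get(stage, 0.0) >= threshold:
            return stage
    return None


# A busy sampler stalls the rasterizer upstream, so both look saturated.
metrics = {"rasterization": 0.85, "sampling": 0.9, "color_write_blend": 0.2}
print(find_primary_hotspot(metrics))  # sampling
```

Walking from the back of the pipeline reports the sampler rather than the rasterizer, because the downstream stage is checked first. This mirrors the reason given above: stalls in downstream blocks bubble up and make upstream blocks look busy.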
For more information about the unknown hotspots, read the following sections.
