GPU Rendering Analysis (Preview)
How It Works
- System-wide profiling on all virtual domains (Dom0, DomUs) running under the Xen* hypervisor to identify domains that take too many resources and introduce a bottleneck for the whole platform.
- Profiling of OpenGL-ES applications running on Linux* systems to detect performance-critical API calls.
- For Xen virtualization platforms:
- For Xen platform-wide analysis on all virtual domains (Dom0, DomUs), select the Profile System target type.
- For a graphical app using the OpenGL-ES API, select the Launch Application or Attach to Process target type.
Configure and Run Analysis
- Click the Configure Analysis button on the Intel® VTune™ Profiler toolbar (in the standalone GUI or the Visual Studio IDE). The Configure Analysis window opens.
- From the HOW pane, click the Browse button and select GPU Rendering.
- Optionally, you may edit the following collection options:
To modify a pre-defined option in the Details section or change the list of collected hardware events, create a new custom analysis type.
- Use the GPU sampling interval, ms field to specify an interval (in milliseconds) between GPU samples for GPU hardware metrics collection. By default, VTune Profiler uses a 1 ms interval for the hardware event-based sampling collection and a 1000 ms interval for the user-mode sampling and tracing collection.
- Use the Analyze Processor Graphics hardware events option to monitor the Render and GPGPU engine usage (Intel Graphics only), identify which parts of the engine are loaded, and correlate GPU and CPU data. This option requires root/administrative privileges. VTune Profiler provides platform-specific presets of the hardware metrics. For this analysis, the Render Basic event group is pre-selected. All presets collect data about execution unit (EU) activity: EU Array Active, EU Array Stalled, EU Array Idle, Computing Threads Started, and Core Frequency.
- Overview event set also includes metrics that track general GPU memory accesses such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.
- Compute Basic (with global/local memory accesses) event group also includes metrics that distinguish accessing different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.
- Compute Extended event group includes metrics targeted only for GPU analysis on the Intel processor code name Broadwell and higher. For other systems, this preset is not available.
- Render Basic (preview) event group includes Pixel Shader, Vertex Shader, and Output Merger metrics.
- Full Compute event group is a combination of the Overview and Compute Basic event sets.
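To make the relationship between the presets and the effect of the sampling interval easier to see, the following is a minimal Python sketch. It is an illustrative model only and does not call any VTune Profiler API. The names `EU_BASE`, `PRESETS`, and `expected_samples` are hypothetical, the metric strings are copied from the lists above, and Compute Extended is left out because its metrics are not listed here. The sample-count estimate assumes one GPU sample per interval, which is a simplification.

```python
# Illustrative model of the event presets described above.
# Hypothetical names; this is not part of any VTune Profiler API.

# Metrics that, according to the text, every preset collects.
EU_BASE = frozenset({
    "EU Array Active",
    "EU Array Stalled",
    "EU Array Idle",
    "Computing Threads Started",
    "Core Frequency",
})

# Additional metrics listed for each preset (strings copied from the text).
OVERVIEW_EXTRA = frozenset({
    "Memory Read/Write Bandwidth",
    "GPU L3 Misses",
    "Sampler Busy",
    "Sampler Is Bottleneck",
    "GPU Memory Texture Read Bandwidth",
})
COMPUTE_BASIC_EXTRA = frozenset({
    "Untyped Memory Read/Write Bandwidth",
    "Typed Memory Read/Write Transactions",
    "SLM Read/Write Bandwidth",
    "Render/GPGPU Command Streamer Loaded",
    "GPU EU Array Usage",
})
RENDER_BASIC_EXTRA = frozenset({
    "Pixel Shader",
    "Vertex Shader",
    "Output Merger",
})

PRESETS = {
    "Overview": EU_BASE | OVERVIEW_EXTRA,
    "Compute Basic": EU_BASE | COMPUTE_BASIC_EXTRA,
    "Render Basic": EU_BASE | RENDER_BASIC_EXTRA,
}
# Full Compute is described as the combination of Overview and Compute Basic.
PRESETS["Full Compute"] = PRESETS["Overview"] | PRESETS["Compute Basic"]


def expected_samples(duration_ms: int, interval_ms: int) -> int:
    """Rough number of GPU samples, assuming one sample per interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return duration_ms // interval_ms


if __name__ == "__main__":
    for name, metrics in PRESETS.items():
        print(f"{name}: {len(metrics)} metrics")
    # Compare the two default intervals over a 10-second run.
    print(expected_samples(10_000, 1), "samples at a 1 ms interval")
    print(expected_samples(10_000, 1000), "samples at a 1000 ms interval")
```

Under this assumption, a shorter GPU sampling interval yields proportionally more samples for the same run, so the interval trades detail against the amount of collected data.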