GPU Rendering Analysis (Preview)

Use the GPU Rendering analysis to estimate your code performance based on the GPU usage per engine and GPU hardware metrics.

Note

This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

How It Works

GPU Rendering analysis focuses on the following usage models:

  • System-wide profiling on all virtual domains (Dom0, DomUs) running under the Xen* hypervisor to identify domains that take too many resources and introduce a bottleneck for the whole platform.

  • Profiling of OpenGL-ES applications running on Linux* systems to detect performance-critical API calls.

Prerequisites

For successful analysis, make sure to configure your system as follows:

Select Target

Your project configuration for the GPU Rendering analysis depends on the analysis target to profile:

  • For Xen platform-wide analysis on all virtual domains (Dom0, DomUs), select the Profile System target type.

  • For a graphical app using OpenGL-ES API, select the Launch Application or Attach to Process target types.

Note

These two usage models are mutually exclusive: the Xen platform-wide analysis does not detect OpenGL-ES API calls, and Xen virtual domain statistics are not available in the Launch Application and Attach to Process modes.

Configure and Run Analysis

To configure and run the GPU Rendering analysis:

  1. Click the Configure Analysis button on the Intel® VTune™ Amplifier toolbar (standalone GUI or Visual Studio IDE).

    The Configure Analysis window opens.

  2. From the HOW pane, click the Browse button and select GPU Rendering.

  3. Optionally, you may edit the following collection options:

    • Use the GPU sampling interval, ms field to specify an interval (in milliseconds) between GPU samples for GPU hardware metrics collection. By default, the VTune Amplifier uses a 1 ms interval for the hardware event-based sampling collection and a 1000 ms interval for the user-mode sampling and tracing collection.

    • Use the Analyze Processor Graphics hardware events option to monitor the Render and GPGPU engine usage (Intel Graphics only), identify which parts of the engine are loaded, and correlate GPU and CPU data. This option requires root/administrative privileges.

      VTune Amplifier provides platform-specific presets of the hardware metrics. For this analysis, the Render Basic event group is pre-selected. All presets collect data about execution unit (EU) activity: EU Array Active, EU Array Stalled, EU Array Idle, Computing Threads Started, and Core Frequency.

      • The Overview event set also includes metrics that track general GPU memory accesses, such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.

      • The Compute Basic (with global/local memory accesses) event group also includes metrics that distinguish between accesses to different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.

      • The Compute Extended event group includes metrics intended only for GPU analysis on the Intel processor code name Broadwell and higher. For other systems, this preset is not available.

      • The Render Basic (preview) event group includes Pixel Shader, Vertex Shader, and Output Merger metrics.

      • The Full Compute event group is a combination of the Overview and Compute Basic event sets.

    To modify a pre-defined option in the Details section or change the list of collected hardware events, create a new custom analysis type.

  4. Click the Start button to run the analysis.

Note

You may generate the command line for this configuration using the Command Line button at the bottom.

VTune Amplifier collects data on the specified target and opens the result in the GPU Rendering viewpoint.
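
To get a feel for how the GPU sampling interval option affects the amount of collected data, the sketch below estimates the number of GPU samples taken over a run. The helper name estimated_gpu_samples and the 10-second run duration are illustrative assumptions and are not part of VTune Amplifier; the 1 ms and 1000 ms values are the default intervals mentioned in step 3. The estimate assumes exactly one sample per interval and ignores collection overhead.

```python
def estimated_gpu_samples(duration_s: float, interval_ms: float) -> int:
    """Approximate number of GPU samples taken during a run.

    Hypothetical helper: assumes one sample per interval and
    ignores collection overhead or dropped samples.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return int(duration_s * 1000 // interval_ms)


# Hypothetical 10-second run with the two default intervals.
print(estimated_gpu_samples(10, 1))     # 1 ms interval
print(estimated_gpu_samples(10, 1000))  # 1000 ms interval
```

A shorter interval gives finer-grained hardware metrics at the cost of proportionally more samples to store and process.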

View Data

Start your analysis with the Summary view to understand whether your workload is CPU or GPU bound. Then, move to the Platform view, which helps you visualize how GPU tasks are scheduled by CPU threads, identify CPU threads or processes utilizing graphics, and understand what the CPU is doing while the GPU is executing.

For Xen platform-wide analysis, use the Platform view to identify virtual domains actively utilizing the GPU on the embedded system. GPU activities are color-coded on the timeline to indicate the virtual domain they belong to.
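
The idea of spotting a domain that takes too many GPU resources can be sketched outside the tool as a simple aggregation. The example below is only an illustration: the record layout (domain name, start, end in milliseconds) and all values are hypothetical and do not represent a VTune Amplifier export format.

```python
from collections import defaultdict

# Hypothetical GPU activity records: (domain name, start ms, end ms).
# The values are made up for illustration; they are not VTune output.
gpu_activity = [
    ("Dom0", 0, 40),
    ("DomU1", 40, 160),
    ("DomU2", 160, 180),
    ("DomU1", 180, 200),
]


def gpu_share_by_domain(records):
    """Return {domain: percent of total GPU busy time}, largest share first."""
    busy = defaultdict(float)
    for domain, start_ms, end_ms in records:
        busy[domain] += end_ms - start_ms
    total = sum(busy.values())
    if total == 0:
        return {}
    ranked = sorted(busy.items(), key=lambda item: item[1], reverse=True)
    return {domain: 100.0 * ms / total for domain, ms in ranked}


for domain, share in gpu_share_by_domain(gpu_activity).items():
    print(f"{domain}: {share:.1f}% of GPU busy time")
```

A domain whose share dominates the list is a natural first candidate to inspect on the Platform view timeline.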

If you analyze an application with OpenGL-ES API calls, the Platform view shows the calls as user tasks, with detailed information available in the tooltip.
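
As a rough illustration of what "performance-critical API calls" means in practice, the sketch below ranks calls by total time spent in them. The function names glDrawElements, glTexImage2D, and eglSwapBuffers are real OpenGL-ES/EGL entry points, but the record layout, the durations, and the helper hottest_calls are hypothetical and are not produced by VTune Amplifier.

```python
from collections import Counter

# Hypothetical task records: (API call name, duration in microseconds).
calls = [
    ("glDrawElements", 120),
    ("eglSwapBuffers", 900),
    ("glTexImage2D", 450),
    ("glDrawElements", 130),
    ("eglSwapBuffers", 950),
]


def hottest_calls(records, top_n=3):
    """Return up to top_n tuples (name, total_us, call_count), costliest first."""
    total_us = Counter()
    count = Counter()
    for name, duration_us in records:
        total_us[name] += duration_us
        count[name] += 1
    return [(name, total, count[name]) for name, total in total_us.most_common(top_n)]


for name, total, n in hottest_calls(calls):
    print(f"{name}: {total} us over {n} call(s)")
```

Ranking by total time rather than by single-call duration highlights calls that are individually cheap but issued very often.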
