Window: Platform View (Windows* OS Workloads)

To access this window:

  1. Launch Platform Analyzer from the Start menu.

    The Platform Analyzer window opens.

  2. From the navigation pane on the left, select the *.gpa_trace entry corresponding to the Platform Analysis result you need.

    The Platform Analyzer window opens the data for the selected result on the right.

    Note

    In the Platform Analyzer window, you can also click the menu button and select the Result... option to navigate to a Platform Analysis trace and open it.

Platform Analyzer graphically represents Platform Analysis data over time:


Project Navigator. Navigate between trace result files captured with the Platform Analyzer. Data for the selected file shows up in the right pane. The Platform Analyzer window displays only the results that have been created on the current analysis system, with the current version of the product. To manage data in the Project Navigator, right-click a trace result entry and select the required context menu command.

Platform Analyzer toolbar. Use the toolbar options:

  • Project Navigator button to open the Project Navigator pane

  • Help button to access the product documentation

Timeline toolbar. Navigate the view by zooming in and out of the data.

Frames. Identify bounds for GPU and CPU frames, where:

  • CPU Frame X is the time range between the moment frame X-1 is queued for presentation and the moment frame X is queued for presentation.
  • GPU Frame X is the time range between the moment frame X-1 is rendered on the screen and the moment frame X is rendered on the screen.
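
The frame bounds defined above can be illustrated with a small calculation. This is a minimal sketch, not part of Platform Analyzer: the function name and the list of presentation timestamps are hypothetical, and the times are assumed to be in milliseconds. The same arithmetic applies to GPU frames if the timestamps mark the moments frames are rendered on the screen.

```python
def cpu_frame_durations(present_times_ms):
    """Return {frame_id: (duration_ms, fps)} for frames 1..N-1.

    present_times_ms[i] is the (hypothetical) moment at which frame i
    was queued for presentation, in milliseconds.
    """
    frames = {}
    for x in range(1, len(present_times_ms)):
        # CPU Frame X spans from the presentation of frame X-1
        # to the presentation of frame X.
        duration = present_times_ms[x] - present_times_ms[x - 1]
        fps = 1000.0 / duration if duration > 0 else float("inf")
        frames[x] = (duration, fps)
    return frames


# Made-up timestamps for four consecutive frames.
present = [0.0, 16.0, 36.0, 76.0]
for frame_id, (duration, fps) in cpu_frame_durations(present).items():
    print(f"CPU Frame {frame_id}: {duration:.1f} ms, {fps:.1f} fps")
```

A frame that takes noticeably longer than its neighbors (the third frame in this made-up data) is a natural candidate for closer inspection on the timeline.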

Hover over a frame object to view a summary including data on frame duration, frame rate, and others:

CPU and GPU frames with the same ID are displayed in the same color.

GPU Engine. Explore overall GPU usage per GPU engine or packet type at each moment in time. By default, the Platform Analyzer window displays GPU Usage and software queues per GPU engine. Hover over an object executed on the GPU (above the dotted line) to view a short summary of GPU usage, where GPU Usage is the time when a GPU engine was executing a workload. Explore the top GPU Usage band in the chart to estimate the percentage of GPU engine utilization (yellow areas vs. white spaces) and to identify opportunities to submit additional work to the hardware.
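
The utilization estimate described above (busy areas vs. idle gaps) amounts to the following calculation. This is an illustrative sketch only: the function and its interval format are assumptions, not a Platform Analyzer API, and the busy intervals stand in for the yellow areas of the GPU Usage band.

```python
def engine_utilization(busy_intervals, window_start, window_end):
    """Return the fraction of the window during which the engine was busy.

    busy_intervals is a list of (start, end) pairs; overlapping
    intervals are merged so that no time is counted twice.
    """
    if window_end <= window_start:
        raise ValueError("window must have positive length")

    # Clip every interval to the window and drop the ones outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in busy_intervals
        if min(e, window_end) > max(s, window_start)
    )

    busy = 0.0
    current_start, current_end = None, None
    for s, e in clipped:
        if current_end is None or s > current_end:
            # A gap before this interval: close the previous merged run.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged run.
            current_end = max(current_end, e)
    if current_end is not None:
        busy += current_end - current_start

    return busy / (window_end - window_start)


# Made-up busy intervals for one engine, in milliseconds.
busy = [(0.0, 4.0), (3.0, 6.0), (8.0, 10.0)]
print(f"{engine_utilization(busy, 0.0, 10.0):.0%}")  # prints 80%
```

The remaining share of the window (20% in this made-up data) corresponds to the white spaces in the band, that is, time when the engine could have accepted more work.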

To view and analyze GPU software queues, select an object (DMA packet) in the queue and the Platform Analyzer highlights the corresponding software queue bounds in red:

A full software queue prevents packet submissions and causes waits on the CPU side in the user-mode driver until there is space in the queue. To check whether such a stall decreases your performance, you may decrease the workload on the hardware and see whether there are fewer waits on the CPU in threads that spawn packets. Another option is to load the queue with additional tasks and see whether the queue length increases.

Each DMA packet in the Platform Analyzer window has its own ID that helps track its life cycle in a software queue. The ID does not correspond to the rendered frames. You may identify where a packet came from by the thread name (corresponding to the name of the module where a thread entry point resides) specified in the tooltip.

The display mode for packet types is explained in the Legend. For example, present packets are displayed in a red hatch.

On systems with Intel Processor Graphics, you may select the Packet Type drop-down menu option in the Legend area to explore GPU usage and software queues per DMA packet domain:


Note

Detailed packet type data is available only if you enable the Collect DMA packet type option in the Profiles window during analysis configuration and reboot the system for this change to take effect. Otherwise, Platform Analyzer displays collected trace data as an Unknown packet type with no DMA packet data specifics.

Computing Queue. Analyze details of OpenCL kernel submission: distinguish the order of submission and execution, identify the time spent in the queue, and zoom in to explore the Computing Queue data. VTune Amplifier displays kernels with the same name and global/local size in the same color. Synchronization tasks are marked with vertical hatching. Data transfers are marked with cross-diagonal hatching.

Click a kernel task to highlight its entire path through the queue up to the execution displayed at the top layer. Hover over an object in the queue to see kernel execution parameters:

Explore how the execution path (marked in blue) of the OpenCL device queue (in red) correlates to the DMA packets software queue (in black). The OpenCL kernel queue passes kernels to the driver, where DMA packets of different types are multiplexed into a single DMA queue. In the example above, the Render and GPGPU queue serves both graphics-originated (GHAL3D) and compute-originated (OpenCL) packets. Note that video transcoding tasks pass through a specific Video Codec queue, which in most cases lets the GPU run Intel MSDK commands in parallel.

Thread Lifetime. Explore CPU utilization by thread. Platform Analyzer provides the thread name as the name of the module where the thread function resides. For example, if you have a myFoo function that belongs to MyMegaFoo.dll, the thread name is displayed as MyMegaFoo.dll. This approach makes it easy to identify the location of the thread code producing the work displayed on the timeline.

Hover over a context switch area to see the details on its duration, reason, and affected CPU. Dark-green context switches show time slices when a thread was busy with a workload while light-green context switch objects show areas where a thread was waiting for a synchronization object. Gray areas show inactivity periods caused by preemption when the operating system task scheduler switched a thread off a processor to run another, higher-priority thread.

Correlate CPU and GPU usage and estimate whether your application is CPU or GPU bound. GPU Engines Usage bars show DMA packets on the CPU threads that originate GPU tasks. The bars are colored according to the type of GPU engine used (yellow bars in the example below correspond to the Render and GPGPU engine). While the GPU Engine area of the Platform View shows aggregated GPU usage for all threads and processes in the system, the GPU Engines Usage bars in the Thread Lifetime area show GPU engine utilization by a particular thread.

You can also zoom in to identify user-defined tasks or DirectX tasks executed at particular time frames and correlate this data with the GPU usage at the same time:

Platform Metrics. Correlate per-thread GPU and CPU activity with your application performance as measured by the GPU metrics selected either in the profile for this analysis or from the System Analyzer window.

Statistics pane. Drag and drop to select a range of interest on the timeline. The Statistics pane is updated to synchronize with your selection. Analyze the statistics per metric to understand whether the workload in the selected time range is GPU-bound and to identify hotspot objects. Depending on the objects included in the selected range, the Statistics pane may display the following data:

  • GPU Usage

    • GPU Time shows an amount of time used by GPU engines (total and per engine).

    • Queue Time shows an amount of time spent in a software queue (total and per engine).

  • OpenCL Kernels

    • Total Time shows the overall time OpenCL™ kernels were running on a GPU within the selected time range (total and per kernel).

    • Computing Queue Time shows an amount of time spent by OpenCL kernels in the queue (total and per kernel).

  • Tasks, where tasks are ITT API calls, DirectX* calls and traced calls of OpenCL runtimes on a CPU

    • Task Time shows an amount of time spent within a task (total and per task).

    • Average Task Time shows an average amount of time spent within a task (total and per task).

    • Task Count shows the number of tasks in the selected time region (total and per task).
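
The relationship between the three task metrics listed above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function, the (name, start, end) tuple format, and the "[total]" key are hypothetical. It also assumes that a task partially covered by the selection contributes only its overlapping portion; how Platform Analyzer treats such tasks is not stated here.

```python
from collections import defaultdict


def task_statistics(tasks, range_start, range_end):
    """Aggregate Task Time, Task Count, and Average Task Time per task name.

    tasks is an iterable of (name, start, end) tuples. Only the part of
    each task inside [range_start, range_end] is counted (an assumption).
    """
    time_by_name = defaultdict(float)
    count_by_name = defaultdict(int)
    for name, start, end in tasks:
        overlap = min(end, range_end) - max(start, range_start)
        if overlap <= 0:
            continue  # The task lies entirely outside the selection.
        time_by_name[name] += overlap
        count_by_name[name] += 1

    stats = {}
    for name, total in time_by_name.items():
        count = count_by_name[name]
        stats[name] = {
            "task_time": total,
            "task_count": count,
            "average_task_time": total / count,
        }

    # Totals across all task names in the selection.
    total_time = sum(time_by_name.values())
    total_count = sum(count_by_name.values())
    stats["[total]"] = {
        "task_time": total_time,
        "task_count": total_count,
        "average_task_time": total_time / total_count if total_count else 0.0,
    }
    return stats


# Made-up tasks; the last one falls outside the selected range 0..10.
tasks = [("Draw", 0.0, 4.0), ("Draw", 6.0, 8.0),
         ("Present", 8.0, 9.0), ("Draw", 20.0, 30.0)]
for name, row in task_statistics(tasks, 0.0, 10.0).items():
    print(name, row)
```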

Legend controls. Filter in/out any type of data presented in the timeline by selecting/deselecting corresponding check boxes and drop-down menu options.

Select the VSync check box to display markers for vertical synchronization. Use this data to identify dependencies between GPU frames and VSync events. Depending on your hardware, this option may not show up in the Platform View for an application that does not use vertical synchronization. If vertical synchronization is not enabled for your application, you can use the Platform View to identify the real frame rate of your code. If your application uses vertical synchronization, you can select the VSync timeline option, estimate the correlation between VSync events and application frames, identify frames missing VSync events, and explore possible reasons.

Filter toolbar. Filter the data displayed in the Timeline pane using the following options:

  • Metric filter. Mouse over the Filter icon to enable the metric drop-down menu and select a filtering metric:

    By default, you see 100% of all metric data collected in the result.

    For example, for the pre-selected Task Time metric you may open the Thread or Process filtering drop-down menu to see the percentage of the Task Time each process/thread contributes to the overall Task Time for the result.

    If you select a program unit in the filtering drop-down menu, the Timeline view is filtered to display data for this particular program unit only. For example, if you select the Thread-347 thread contributing 46.9% of the Task Time, the result data displays statistics for this thread only and the Filter bar provides an indicator that only 46.9% of the Task Time data is currently displayed.

  • Thread filter. Select a thread to filter the collected data by its contribution. All data related to other threads is hidden. By default [Any Thread] is selected, which does not filter any data.

  • Process filter. Select a process to filter the collected data by its contribution. All data related to other processes is hidden. By default [Any Process] is selected, which does not filter any data.

  • Clear Filter icon. Remove all filters and view all the available data.
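
The percentages shown in the filtering drop-down menus and in the Filter bar indicator can be thought of as each program unit's share of the overall metric. The sketch below is an assumption about that arithmetic, not the tool's code; the function name, the Task Time values, and all thread names other than Thread-347 are hypothetical.

```python
def metric_share_by_unit(metric_by_unit):
    """Return each unit's share of the overall metric, in percent.

    metric_by_unit maps a program unit (thread or process name) to the
    metric value it accounts for, for example Task Time in milliseconds.
    """
    total = sum(metric_by_unit.values())
    if total == 0:
        return {unit: 0.0 for unit in metric_by_unit}
    return {unit: 100.0 * value / total
            for unit, value in metric_by_unit.items()}


# Made-up Task Time per thread, in milliseconds.
task_time = {"Thread-347": 469.0, "Thread-12": 331.0, "Thread-8": 200.0}
shares = metric_share_by_unit(task_time)
for thread, share in shares.items():
    print(f"{thread}: {share:.1f}% of Task Time")

# Filtering by a single thread leaves only that thread's share displayed.
print(f"Displayed after filtering by Thread-347: {shares['Thread-347']:.1f}%")
```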

Note

If you apply filters available on the Filter bar to data already filtered with the Filter In/Out by Selection context menu options, all filters are combined and applied simultaneously.

 

Context menus. Right-click and select commands to navigate the data by zooming in/filtering in selected time ranges/objects and adjust the view (band height and timescale) according to your preferences.
