Create Efficient Media & OpenCL™ Applications

Get the data you need to optimize OpenCL™ software and deliver high-performance image and video processing pipelines.

Figure 1: Screenshot of the Intel VTune Amplifier GPU timeline

Analyze GPU & Platform Data

On newer Intel® processors, you can optionally collect GPU and platform data to tune OpenCL™ and media applications, and then view GPU activity correlated with CPU activity.

The timeline provides a detailed view of both CPU and GPU activity (see Fig. 1).

Easier OpenCL™ Application & GPU Profiling

When tuning OpenCL™ applications on newer processors, the Architecture Diagram helps you understand GPU hardware metrics and identify bottlenecks. Select an OpenCL kernel of interest and an execution time frame, and Intel® VTune™ Amplifier updates the diagram with performance data for that selection.

The GPU Architecture Diagram displays key metrics, making it easier to see the performance bottleneck (see Fig. 2).

Figure 2: Intel VTune GPU Architecture Diagram

Figure 3: Kernel profiling interface for a GPU on an OpenCL application

Tune Inefficient Kernel Algorithms

Use GPU In-Kernel Profiling to identify performance issues caused by memory latency or inefficient algorithms. View a profile that shows where the most time is spent, mapped to both the OpenCL source and the compiler-generated assembly. Analyze Direct Memory Access (DMA) packet execution with the Packet Queue Depth and Packet Duration histograms.

Performance data is displayed alongside the OpenCL application source code, so you know exactly where time is being spent (see Fig. 3).
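To make the Packet Queue Depth and Packet Duration histograms mentioned above more concrete, here is a minimal Python sketch of what such histograms summarize. It is an illustration only: the packet records, their `(submit, start, end)` layout, the bin edges, and both function names are hypothetical, and this is not the tool's export format or its exact metric definitions. Queue depth is assumed here to mean the number of packets submitted but not yet finished.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical packet records (not a real export format):
# (submit_time, start_time, end_time), all in microseconds.
packets = [
    (0, 0, 40),
    (10, 40, 55),
    (20, 55, 140),
    (150, 150, 160),
]


def duration_histogram(packets, bin_edges):
    """Count packets by execution time (end - start).

    bin_edges must be sorted; bin i holds durations d with
    bin_edges[i-1] <= d < bin_edges[i], plus open-ended first and last bins.
    """
    counts = [0] * (len(bin_edges) + 1)
    for _, start, end in packets:
        counts[bisect_right(bin_edges, end - start)] += 1
    return counts


def queue_depth_histogram(packets):
    """Return total time spent at each queue depth.

    Depth is assumed to be the number of packets that have been
    submitted but have not yet finished executing.
    """
    events = []
    for submit, _, end in packets:
        events.append((submit, 1))   # packet enters the queue
        events.append((end, -1))     # packet leaves the queue
    events.sort()  # at equal timestamps, completions sort before submissions

    time_at_depth = Counter()
    depth, prev = 0, events[0][0]
    for t, delta in events:
        time_at_depth[depth] += t - prev  # time elapsed at the current depth
        depth += delta
        prev = t
    return dict(time_at_depth)


durations = duration_histogram(packets, bin_edges=[20, 50])
depths = queue_depth_histogram(packets)
```

In this toy data, a long stretch at depth 1 alongside one long-running packet would suggest that execution time of individual packets, rather than queueing, dominates; a real analysis would rely on the histograms the profiler itself reports.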

Additional Capabilities