GTPin
Welcome to GTPin

========================================================================================

Introduction

========================================================================================
Binary instrumentation technology is widely used in the world of CPUs, in both software and hardware development, for dynamic performance analysis, emulation of future instructions, tracing, modeling, and more. The Intel® binary instrumentation technology for x86, named the Pin project, is the underlying technology for many internal and external tools. However, in the world of Graphics Processing Units (GPUs), and specifically in the Intel® GPU (GEN) architecture, profiling and performance analysis capabilities are limited or absent altogether.

The GTPin framework is a unique platform, and the only SW platform available for profiling the GEN Execution Units (EUs). GTPin includes a binary instrumentation engine for Intel GPU EUs, along with an API for developing analysis tools, and many sample tools. GTPin allows you to capture a range of dynamic profiling data at the finest granularity: the individual GPU EU instruction. GTPin supports both compute and graphics workloads. It operates on regular, real-world GPU applications, as well as on precaptured API streams. The technology enables fast and accurate dynamic analysis of the code that is executing on the GPU EUs. GTPin opens up new opportunities to perform dynamic, low-level workload and HW analysis on an Intel GPU, with greater efficiency than other current solutions. Some of the GTPin capabilities are integrated into Intel® VTune™ Amplifier, Intel® Advisor and the Intel® Graphics Performance Analyzer (GPA).

GTPin is available along with a set of analysis tools based on the GTPin framework. It also enables more advanced users to develop their own analysis tools. GTPin can analyze any GPU application, and it collects dynamic profiling data about the code that the application executes on the GPU.

The rest of this guide describes the GTPin capabilities, explains how to develop a profiling tool on top of GTPin, documents its API, and shows several examples of how to use it.

Tutorial sections:

Reference sections:

========================================================================================

High Level Architecture

========================================================================================
The picture below shows a schematic of the software stack for a graphics or GP-GPU application that uses a GPU device, together with the profiling flow. The GTPin framework sits beneath the application. The application is not aware that it is being instrumented; the whole process is completely transparent. GTPin receives the original binary kernel from the driver and instruments it. All the original instructions of the kernel, their order, and their control flow remain intact. In addition, extra instructions are added to the kernel to perform the profiling. The new binary kernel (the instrumented kernel) is sent to the device for execution. During execution, the profiling data is collected in a memory buffer. At the end of the process, GTPin retrieves the gathered profiling data from memory and passes it to the user.
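The flow described above can be illustrated with a toy model. This is only a sketch of the idea, not GTPin code: the `Kernel` class, the `PROBE` marker, and the helper functions are hypothetical names invented for the illustration, and real instrumentation operates on binary GEN kernels rather than on Python lists.

```python
from dataclasses import dataclass

PROBE = "probe"  # hypothetical marker for an added profiling instruction


@dataclass
class Kernel:
    name: str
    instructions: list  # original instruction mnemonics, in program order


def instrument(kernel):
    """Return a new instruction list with one probe before each original instruction."""
    instrumented = []
    for index, instr in enumerate(kernel.instructions):
        instrumented.append((PROBE, index))  # extra instruction: bump counter `index`
        instrumented.append(instr)           # original instruction, unchanged
    return instrumented


def execute(instrumented, buffer):
    """Pretend to run the kernel once; each probe increments its slot in the buffer."""
    for instr in instrumented:
        if isinstance(instr, tuple) and instr[0] == PROBE:
            buffer[instr[1]] = buffer.get(instr[1], 0) + 1


def strip_probes(instrumented):
    """Recover the original instruction stream from the instrumented one."""
    return [i for i in instrumented if not (isinstance(i, tuple) and i[0] == PROBE)]


kernel = Kernel("vector_add", ["send", "add", "send"])
instrumented = instrument(kernel)

buffer = {}  # stands in for the memory buffer that holds profiling data
for _ in range(3):
    execute(instrumented, buffer)
```

Removing the probes yields the original instruction list in its original order, which mirrors the guarantee that the original kernel instructions and their flow remain intact.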

To perform instrumentation, GTPin is directed by a Profiling Tool (or Profiling Application). The tool drives the instrumentation process: it decides what kind of instrumentation to perform, what profiling data to collect, and how to process the gathered data. The Profiling Tool communicates with the GTPin framework through the "Tool API" interface. The rest of this tutorial explains the Tool API and how to create a new Profiling Tool based on existing examples.





========================================================================================

Capabilities

========================================================================================

Supported operating systems

GTPin supports the following operating systems:

  • Microsoft Windows 10*
  • The Linux Foundation Linux* (tested on Canonical Ubuntu 18.04*)

GTPin supports 64-bit and 32-bit applications.

Supported Gfx and GP-GPU APIs

GTPin supports the following graphics and GP-GPU interfaces:

  • Intel® oneAPI Level Zero* (Level Zero*)
  • OpenCL*
  • Microsoft DX11*
  • Microsoft DX12*
  • Khronos Group Vulkan*

Supported GEN hardware

GTPin supports the following HW platforms:

Intel Integrated Graphics:

  • 7th-11th Gen Intel® Core™ processors with Intel® UHD or Intel Iris® Plus graphics

Intel Discrete Graphics:

  • Intel® Iris® Xe MAX graphics

Supported capabilities

What is GTPin good for?

GTPin is a unique framework, and the only framework that allows binary instrumentation of Intel GPU (GEN) kernels for profiling purposes. GTPin is available along with a set of analysis tools based on the GTPin framework. GTPin also enables more advanced users to develop their own analysis tools. GTPin can analyze any GPU application, and it collects dynamic profiling data about the code that the application executes on the GPU. Some of the available GTPin analysis tools include:

  • Funtime and basic block (BBL) Latency sample tools allow you to perform Performance Profiling and Hot-spots Detection. The Funtime sample tool collects and aggregates EU cycles at the function (kernel) granularity. The BBL latency sample tool collects and aggregates EU cycles at the BBL granularity. Using these sample tools, you can easily find the most time-consuming blocks of code within any application.
  • The OpcodeProf sample tool allows you to collect an instruction mix. This analysis tool generates a histogram of the opcodes executed by the program. Examples of questions answered by the opcode profiler include: How many multiply and add instructions have been executed? Is the systolic array a significant part of this workload? And so on.
  • SIMDProf sample tool provides the number of active SIMD vector elements at the execution of each instruction within the kernel. This instrumentation allows accurate calculation of the number of operations executed, while taking SIMD instructions and SIMD masks into consideration. Note that the Intel GPU architecture includes both static and dynamic methods to disable SIMD vector elements.
  • Traces are a powerful tool in the development of next-generation CPUs. They are also critical in GPU development. Traces are usually used to feed performance simulators, in order to check how architectural changes affect the performance of the workload. The GTPin tracing tool generates an instruction trace and a memory address trace, partitioned by HW thread, EU, slice, etc. It is possible to add further information to these traces.
    • Thread Level Occupancy gives you the ability to analyze GPU EU busy and idle times, and see how the application's execution is balanced across the EUs. This tool is based on tracing time-stamp counters at the beginning and end of any SW thread dispatch to any of the HW threads.
    • Memory Contention Analysis. By analyzing the memory address trace, you can build the memory access patterns of the application, check their balance and efficiency, and thus give the developer recommendations on how to improve them. This analysis is currently being prototyped by the VTune pathfinding team to support the next Intel discrete graphics products.
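As a rough illustration of the kind of data the OpcodeProf and SIMDProf tools are described as producing, the sketch below builds an opcode histogram and a lane-weighted operation count. The record layout (an opcode name paired with an execution mask that has one bit per SIMD lane) is an assumption made for this example; it is not the output format of the GTPin sample tools.

```python
from collections import Counter

# Hypothetical records: (opcode, execution mask with one bit per SIMD lane).
records = [
    ("mul", 0b11111111),
    ("add", 0b00001111),
    ("mul", 0b00000011),
    ("send", 0b11111111),
]


def opcode_histogram(records):
    """Count how many times each opcode was executed."""
    return Counter(opcode for opcode, _ in records)


def active_lanes(mask):
    """Number of enabled SIMD lanes, i.e. the number of set bits in the mask."""
    return bin(mask).count("1")


def lane_weighted_ops(records):
    """Count operations per opcode, weighting each instruction by its active lanes."""
    totals = Counter()
    for opcode, mask in records:
        totals[opcode] += active_lanes(mask)
    return totals


histogram = opcode_histogram(records)
weighted = lane_weighted_ops(records)
```

The two results differ because a SIMD instruction with some lanes disabled performs fewer operations than its instruction count alone suggests.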

GTPin can capture any data available at the EU scope while executing a program. It can capture such data at the lowest granularity possible: the single EU assembly instruction. You can create an unlimited variety of analysis tools using the GTPin technology.

More details on the existing tools can be found in Profiling tools examples.

Profiling data granularity

GTPin collects profiling data separately, for:

  • Each kernel/shader
  • Each Enqueue/Draw command
  • Each Execution Unit (EU)
  • Each HW thread

A user can limit GTPin profiling for specific kernels and shaders, and for specific Enqueue/Draw commands.
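One way to picture these granularity levels is as keys under which samples are grouped. The sketch below is a hypothetical post-processing step, not part of GTPin: the sample tuples, field names, and the `aggregate` helper are all invented for the illustration.

```python
from collections import defaultdict

# Hypothetical samples: (kernel, enqueue index, EU id, HW thread id, cycles).
samples = [
    ("blur", 0, 0, 0, 120),
    ("blur", 0, 0, 1, 100),
    ("blur", 1, 1, 0, 90),
    ("sobel", 0, 0, 0, 40),
]

FIELDS = ("kernel", "enqueue", "eu", "thread")


def aggregate(samples, key_fields, kernels=None):
    """Sum cycles grouped by the chosen fields; optionally keep only some kernels."""
    indexes = [FIELDS.index(name) for name in key_fields]
    totals = defaultdict(int)
    for sample in samples:
        if kernels is not None and sample[0] not in kernels:
            continue  # kernel excluded from profiling
        totals[tuple(sample[i] for i in indexes)] += sample[4]
    return dict(totals)


per_kernel = aggregate(samples, ["kernel"])
per_enqueue = aggregate(samples, ["kernel", "enqueue"], kernels={"blur"})
```

Grouping by fewer fields gives a coarser view, and restricting the set of kernels corresponds to limiting profiling to specific kernels.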

Profiling HW scope

A user can limit GTPin profiling to a specific subset of underlying hardware (Execution Units and HW threads).

Profiling Thread Group IDs

A user can limit GTPin profiling to specific Thread Group IDs.

========================================================================================

Installation process

========================================================================================
To install GTPin, unzip the release package, which is provided as a zipped archive.

What's included within the package

When opened, the GTPin package has the following directory structure:

Profilers
|--Bin
|--Docs
|--Examples
|--GTReplay
|--Include
|--Lib
|--Pin
|--Scripts

  • Bin. Contains the main gtpin executable.
  • Docs. Contains the GTPin User Guide.
  • Examples. Contains the GTPin sample tool sources and the corresponding DLLs.
  • GTReplay. Contains the GTReplay binaries, sample tool sources, and the corresponding DLLs.
  • Include. Contains all header files required to build GTPin tools.
  • Lib. Contains GTPin libraries.
  • Pin. Contains Pin executables (an external component).
  • Scripts. Contains Python Software Foundation Python* scripts for uncompressing GTPin traces.

In addition, the package contains the GTPin license and required external licenses.

========================================================================================

How to run GTPin

========================================================================================
To run GTPin, use the following command line:

Profilers\Bin\gtpin.exe -t toolname [gtpin arguments] -- app.exe [application arguments]

The arguments and parameters that can be provided to GTPin are listed in GTPin Parameters.
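When GTPin is launched from a script, the command line above can be assembled programmatically. The following is a minimal sketch: the `build_gtpin_command` and `run` helpers and the `input.dat` application argument are hypothetical, and passing `--dump_isa` as a bare flag is an assumption about its syntax.

```python
import subprocess


def build_gtpin_command(gtpin_exe, tool, app, gtpin_args=(), app_args=()):
    """Assemble: gtpin -t toolname [gtpin arguments] -- app [application arguments]."""
    return [gtpin_exe, "-t", tool, *gtpin_args, "--", app, *app_args]


def run(command):
    """Launch the profiled application; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)


command = build_gtpin_command(
    r"Profilers\Bin\gtpin.exe",
    "toolname",                # placeholder, as in the command line above
    "app.exe",
    gtpin_args=["--dump_isa"],  # assumed to be accepted as a bare flag
    app_args=["input.dat"],     # hypothetical application argument
)
```

Passing the command as a list rather than a single string avoids shell quoting problems, and the `--` separator keeps GTPin arguments apart from the application's own arguments. Call `run(command)` to launch it.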

========================================================================================

GTPin Parameters

========================================================================================
GTPin supports several configuration parameters. The most useful parameters are:

  • -t, --toolname
    Specifies the name of the tool (if it is an existing GTPin sample tool) or the path to the tool's DLL.
  • --dump_isa
    Dumps all ISA files into profile_dir/ISA.
  • --hw_profile_scope
    Sets the HW scope to perform profiling (see details in Defining the HW Scope for Profiling)
  • --thread_group_scope
    Sets the Thread Group IDs to be profiled (see details in Defining the Thread Group IDs for Profiling)
  • --filter
    Specifies the kernels to be instrumented or the kernels not to be instrumented (see details in Kernel Instrumentation Filtering)
  • --allow_sregs
    If set to 1, this parameter enables spill and fill of the GTPin registers in the scratch area. If set to 0, spill and fill is disabled. By default, this parameter is enabled.
  • -h, --help
    Displays usage information and lists all the supported configuration parameters.

To see all the supported parameters, run the following command line:

Profilers\Bin\gtpin.exe --help

========================================================================================

GTPin Sample Tools

========================================================================================
The list of existing sample tools can be found in Existing GTPin Sample Tools.

========================================================================================

GTReplay

========================================================================================
The GTReplay user guide can be found in GTReplay.



Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

Copyright © 2011-2020 Intel Corporation. All rights reserved.

* Other names and brands may be claimed as the property of others.