Run GPU-to-GPU Performance Modeling (Preview)

To model performance of a Data Parallel C++ (DPC++), OpenCL™, or OpenMP* target application on a graphics processing unit (GPU) device, run the GPU-to-GPU modeling workflow of the Offload Modeling perspective. This is a technical preview feature.

Workflow

The GPU-to-GPU performance modeling workflow is similar to the CPU-to-GPU modeling workflow and includes the following steps:
  1. Measure the performance of GPU-enabled kernels running on Intel® Graphics.
  2. Model application performance on a target GPU device and compare the estimated performance metrics to the baseline performance metrics.
Compared to CPU-to-GPU performance modeling, GPU-to-GPU performance modeling is more accurate because it accounts for the similarities in hardware configuration, compiler code-generation principles, and software implementation between the baseline and the modeled code. During the GPU-to-GPU performance modeling, Intel Advisor does the following:
  • Analyzes only GPU compute kernels and ignores application parts executed on a CPU
  • Measures compute kernel characteristics using the GPU profiling capabilities
  • Models performance one to one for all kernels executed on the GPU, considering the tax for transferring data between host and device memory (offload overhead) and the kernel invocation tax
  • Disables high-overhead features of the CPU application analysis, such as call stack handling, cache and data transfer simulation, and dependencies analysis
  • Traces memory objects transferred between host and device memory
    For correct memory object tracing, GPU kernels should run with the oneAPI Level Zero back end.

Prerequisites

  1. Configure your system to analyze GPU kernels.
  2. Set Intel Advisor environment variables with an automated script to enable the Intel Advisor command line interface.
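For example, on Linux you can set up the environment by sourcing the oneAPI setvars.sh script. The installation path below assumes a default oneAPI installation; adjust it to match your system:
  source /opt/intel/oneapi/setvars.sh
After this, the APM environment variable should point to the directory with the Intel Advisor Python scripts, and advisor-python should be available on the PATH.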

Run the GPU-to-GPU Performance Modeling

You can run the GPU-to-GPU performance modeling only from the command line with the Intel Advisor Python* scripts. Use one of the following methods:
  • Collect baseline performance metrics with the collect.py script and model performance with the analyze.py script. This method allows you to configure the collection and modeling steps separately and offers more options for modifying the behavior.
  • Collect performance metrics and model performance with the run_oa.py script. This method is the simplest, but it is less flexible because the script sets more recommended options by default.
Run the collect.py and analyze.py Scripts
In the commands below, replace <APM> with $APM on Linux* OS or with %APM% on Windows* OS.
Run the scripts as follows:
  1. Collect performance metrics with the collect.py script and the --gpu option:
     advisor-python <APM>/collect.py <project-dir> --collect=basic --gpu [<analysis-options>] -- <target-application> [<target-options>]
     where <analysis-options> is one or several options to modify the script behavior. See collect.py Script for a full option list.
     This command runs the Survey, Trip Counts, and FLOP analyses only for the GPU kernels.
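     For example, on Linux, a collection command for a hypothetical application binary ./my_gpu_app and project directory ./advi_results (both names are placeholders) could look like this:
     advisor-python $APM/collect.py ./advi_results --collect=basic --gpu -- ./my_gpu_app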
  2. Model application performance on a target GPU with the analyze.py script and the --gpu option:
     advisor-python <APM>/analyze.py <project-dir> --gpu [--config=<config-file>] [--out-dir <path>] [<analysis-options>]
     where:
     • --config=<config-file> is a target GPU configuration to model performance for. The following device configurations are available: gen11_icl (default), gen12_tgl, gen12_dg1, gen9_gt4, gen9_gt3, gen9_gt2.
     • --out-dir <path> is a directory to save all generated result files to.
     • <analysis-options> is one or several options to modify the script behavior. See analyze.py Script for a full option list.
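     For example, to model performance for the gen12_tgl target configuration and save the results to a separate directory (the directory names below are placeholders):
     advisor-python $APM/analyze.py ./advi_results --gpu --config=gen12_tgl --out-dir ./remodel_tgl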
Run the run_oa.py Script
In the commands below, replace <APM> with $APM on Linux* OS or with %APM% on Windows* OS.
Collect baseline performance metrics for GPU kernels and model their performance on a target GPU:
advisor-python <APM>/run_oa.py <project-dir> --collect=basic --gpu [--config=<config-file>] [--out-dir <path>] [<analysis-options>] -- <target-application> [<target-options>]
where:
  • --config=<config-file> is a target GPU configuration to model performance for. The following device configurations are available: gen11_icl (default), gen12_tgl, gen12_dg1, gen9_gt4, gen9_gt3, gen9_gt2.
  • --out-dir <path> is a directory to save all generated result files to.
  • <analysis-options> is one or several options to modify the script behavior. See run_oa.py Options for a full option list.
This command runs the Survey, Trip Counts, and FLOP analyses only for the GPU kernels and models their performance on the selected target GPU.
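For example, on Linux, using the same placeholder application binary and project directory as above:
advisor-python $APM/run_oa.py ./advi_results --collect=basic --gpu -- ./my_gpu_app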
After running run_oa.py and getting your first performance modeling results, you can run analyze.py as many times as you need to remodel the performance with different software and/or hardware parameters.
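For example, to remodel the collected results for a different target configuration without rerunning the collection (the directory names are placeholders):
advisor-python $APM/analyze.py ./advi_results --gpu --config=gen9_gt4 --out-dir ./remodel_gt4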

View the Results

Once Intel Advisor finishes the analyses, it prints a result summary and a result file location to the command prompt. By default, if you did not use the --out-dir option to change the result location, Intel Advisor generates a set of reports in the <project-dir>/e<NNN>/pp<NNN>/data.0 directory. The directory includes the following files:
  • The main report in HTML format, named report.html
  • A set of CSV files with detailed metric tables
Examine the results with the interactive HTML report. See Explore Performance Gain from GPU-to-GPU Modeling for details.
