Accelerator Metrics

This reference section describes the contents of data columns in reports of the Offload Modeling and GPU Roofline Insights perspectives.

2 FPUs Active

Description:
Average percentage of time when both FPUs are used.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Active (EU Array)

Description:
Percentage of cycles actively executing instructions on all execution units (EUs).
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Average Time

Description:
Average amount of time spent executing one task instance.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

C

Compute

Description:
Estimated execution time assuming an offloaded loop is bound only by compute throughput.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Computing Task

Description:
Name of a computing task.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Computing Task Purpose

Description:
Action that a computing task performs.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Computing Threads Started

Description:
Total number of threads started across all execution units for a computing task.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

D

Data Transfer Tax

Description:
Estimated time cost, in milliseconds, for transferring loop data between host and target platform.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Dependency Type (Measured)

Description:
Dependency type in a loop (if a loop is not parallel): assumed, proven, reduction.
Collected
during the
Survey
and
Dependencies
analyses in the
Offload Modeling
perspective.

Device (Estimated)

Description:
A target platform that application performance is modeled for.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Device (Measured)

Description:
A host platform that the application is executed on.
Collected
during the
Survey
analysis in the
Offload Modeling
perspective.

DRAM BW

Description:
DRAM Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by DRAM memory throughput.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Elapsed Time

Description:
Wall-clock time from beginning to end of computing task execution.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

EU Threading Occupancy

Description:
Percentage of cycles on all execution units (EUs) and thread slots when a slot has a thread scheduled.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

FP AI

Description:
Ratio of FLOP to the number of transferred bytes.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.
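
As an illustration only, the ratio in the FP AI description can be sketched as a single division. The function name and the sample counts below are hypothetical and do not come from the tool.

```python
def arithmetic_intensity(operations: float, bytes_transferred: float) -> float:
    """Return operations per transferred byte (hypothetical helper)."""
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return operations / bytes_transferred


# Invented sample: 2e9 floating-point operations over 8e9 transferred bytes.
fp_ai = arithmetic_intensity(2e9, 8e9)
print(fp_ai)  # 0.25 FLOP per byte
```

A low value like this one suggests that the task moves much more data than it computes on.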

From Target (Estimated Data Transfers)

Description:
Estimated data transferred from a target platform to a shared memory by a loop, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

GFLOP

Description:
Number of giga floating-point operations.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.

GFLOPS

Description:
Number of giga floating-point operations per second.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.
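
The relation between the GFLOP and GFLOPS columns can be sketched as an operation count divided by a time interval. This is a minimal sketch, assuming the interval is the task execution time in seconds; the helper name and the numbers are hypothetical.

```python
def giga_ops_per_second(giga_ops: float, seconds: float) -> float:
    """Return a rate in giga operations per second (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return giga_ops / seconds


# Invented sample: 12 GFLOP executed in 0.5 seconds.
print(giga_ops_per_second(12.0, 0.5))  # 24.0
```

Under the same assumption, passing a giga integer operation count would give the corresponding integer rate.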

GINTOP

Description:
Number of giga integer operations.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.

GINTOPS

Description:
Number of giga integer operations per second.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.

Global Size (Estimated)

Description:
Total estimated number of work items in a loop executed on a target platform after offloading.
Collected
during the
Performance Modeling
analysis with Trip Counts collection enabled in the
Offload Modeling
perspective.

Global Work Size

Description:
Total number of work items in all work groups.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

GPU Shader Atomics

Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

GPU Shader Barriers

Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Idle (EU Array)

Description:
Percentage of cycles on all execution units (EUs) during which no threads are scheduled on an EU.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Instances (Estimated)

Description:
Total estimated number of times a loop is executed on a target platform after offloading.
Collected
during the
Performance Modeling
analysis with Trip Counts collection enabled in the
Offload Modeling
perspective.

Instance Count

Description:
Total number of times a task is executed.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

INT AI

Description:
Ratio of INTOP to the number of transferred bytes.
Collected
during the
Characterization
analysis with FLOP collection enabled in the
GPU Roofline Insights
perspective.

IPC Rate

Description:
Average rate of instructions per cycle (IPC) calculated for two FPU pipelines.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Kernel Launch Tax

Description:
Total estimated time cost for invoking a kernel when offloading a loop to a target platform.
Does not include data transfer costs.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

L3 BW

Description:
L3 Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by L3 cache throughput.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

L3 Shader Bandwidth, GB/sec

Description:
Rate at which data is transferred between execution units and L3 caches, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

LLC BW

Description:
Last-Level Cache Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by last-level cache (LLC) throughput.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Local Size (Estimated)

Description:
Total estimated number of work items in one work group of a loop executed on a target platform after offloading.
Collected
during the
Performance Modeling
analysis with Trip Counts collection enabled in the
Offload Modeling
perspective.

Local Work Size

Description:
Number of work items in one work group.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.
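
Global Work Size and Local Work Size together imply a number of work groups. The sketch below is a minimal illustration, assuming a one-dimensional range whose global size is an exact multiple of the local size; the helper name and the sample sizes are hypothetical.

```python
def work_group_count(global_work_size: int, local_work_size: int) -> int:
    """Return the number of work groups for a 1-D range (hypothetical helper)."""
    if local_work_size <= 0:
        raise ValueError("local_work_size must be positive")
    if global_work_size % local_work_size != 0:
        raise ValueError("global size is assumed to be a multiple of local size")
    return global_work_size // local_work_size


# Invented sample: 1024 work items split into groups of 256.
print(work_group_count(1024, 256))  # 4
```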

Loop/Function

Description:
Name and source location of a loop/function in a region, where region is a sub-tree of loops/functions in a call tree.
Collected
during the
Survey
analysis in the
Offload Modeling
perspective.

Offload Actions

Description:
Recommendation that indicates whether a loop is profitable for offloading to a target platform and, if not, why. For loops that are not profitable, hover over the ? icon to see why the loop is not recommended for offloading or cannot be modeled.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Parallel Threads

Description:
Estimated number of parallel threads in an offloaded loop.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Private (Estimated Data Transfers)

Description:
Total estimated data transferred to a private memory from a target platform by a loop, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

Read Estimated Data Transfers

Description:
Estimated data read from a target platform by an offload region, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

Read GPU Memory Bandwidth, GB/sec

Description:
Rate at which data is read from GPU, chip uncore (LLC), and main memory, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Read Shared Local Memory Bandwidth, GB/sec

Description:
Rate at which data is read from shared local memory, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Read Typed Memory Bandwidth, GB/sec

Description:
Rate at which data is read from typed buffers, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Read Untyped Memory Bandwidth, GB/sec

Description:
Rate at which data is read from untyped buffers, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Send Active

Description:
Percentage of cycles on all execution units when the EU Send pipeline is active.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

SIMD Width

Description:
The number of work items processed by a single GPU thread.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Stalled (EU Array)

Description:
Percentage of cycles on all execution units (EUs) when at least one thread is scheduled, but the EU is stalled.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

SVM Usage Type

Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.

Speed-up

Description:
Estimated speed-up after a loop is offloaded to a target device, in comparison to the original elapsed time.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.
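
The comparison in the Speed-up description can be sketched as a ratio of two times. This is a minimal sketch, assuming speed-up is the measured time on the host divided by the estimated time on the target; the source does not give the exact formula, and the helper name and numbers are hypothetical.

```python
def estimated_speedup(measured_time_s: float, estimated_time_s: float) -> float:
    """Return measured host time over estimated target time (assumed formula)."""
    if estimated_time_s <= 0:
        raise ValueError("estimated_time_s must be positive")
    return measured_time_s / estimated_time_s


# Invented sample: 2.0 s measured on the host, 0.5 s estimated on the target.
print(estimated_speedup(2.0, 0.5))  # 4.0
```

Under this assumption, a value below 1.0 would mean the loop is estimated to run slower after offloading.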

Taxes

Description:
The highest estimated time cost and a sum of all other costs for offloading a loop from host to a target platform.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Throughput

Description:
Top two estimated throughput time metrics that modeled loop performance is bounded by.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.
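
One way to picture the Throughput column is as a selection of the two largest bound-only time estimates. The sketch below assumes that the candidates are the Compute, DRAM BW, L3 BW, and LLC BW estimates from this section and that "top two" means the two largest times; both are assumptions, and the sample values are invented.

```python
from typing import Dict, List, Tuple


def top_two_bounds(bound_times: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the two largest (metric, seconds) pairs, largest first."""
    ranked = sorted(bound_times.items(), key=lambda item: item[1], reverse=True)
    return ranked[:2]


# Invented bound-only estimates, in seconds, for one loop.
bounds = {"Compute": 0.8, "DRAM BW": 1.5, "L3 BW": 0.4, "LLC BW": 0.9}
print(top_two_bounds(bounds))  # [('DRAM BW', 1.5), ('LLC BW', 0.9)]
```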

Time (Estimated)

Description:
Elapsed wall-clock time from beginning to end of loop execution, estimated on a target platform after offloading.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Time (Measured)

Description:
Elapsed wall-clock time from beginning to end of loop execution, measured on a host platform.
Collected
during the
Survey
analysis in the
Offload Modeling
perspective.

To Target (Estimated Data Transfers)

Description:
Estimated data transferred to a target platform from a shared memory by a loop, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

ToFrom Target (Estimated Data Transfers)

Description:
Sum of estimated data transferred in both directions between a shared memory and a target platform by a loop, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

Total Estimated Data Transfers

Description:
Sum of the total estimated traffic incoming to a target platform and the total estimated traffic outgoing from the target platform for an offloaded loop, in megabytes. It is calculated as (MappedTo + MappedFrom + 2*MappedToFrom).
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.
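
The formula in the Total Estimated Data Transfers description can be written out directly. In this sketch, the three arguments are assumed to hold the MappedTo, MappedFrom, and MappedToFrom amounts in megabytes; the function name and the sample values are hypothetical.

```python
def total_estimated_data_transfers(
    mapped_to: float, mapped_from: float, mapped_tofrom: float
) -> float:
    """Apply MappedTo + MappedFrom + 2*MappedToFrom (all in megabytes)."""
    # MappedToFrom counts twice because that data crosses in both directions.
    return mapped_to + mapped_from + 2 * mapped_tofrom


# Invented sample: 10 MB to the target, 4 MB from it, 3 MB in both directions.
print(total_estimated_data_transfers(10.0, 4.0, 3.0))  # 20.0
```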

Total Offload Tax

Description:
Total estimated time cost, in milliseconds, for offloading a loop to a target platform.
Collected
during the
Performance Modeling
analysis in the
Offload Modeling
perspective.

Total Time

Description:
Total amount of time spent executing a task.
Collected
during the
Survey
analysis in the
GPU Roofline Insights
perspective.
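
The Average Time, Total Time, and Instance Count columns appear to be related as a mean. The sketch below assumes that Average Time equals Total Time divided by Instance Count; the source does not state this formula, and the helper name and numbers are hypothetical.

```python
def average_time(total_time_s: float, instance_count: int) -> float:
    """Return time per task instance under the assumed mean relation."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return total_time_s / instance_count


# Invented sample: a task that ran 4 times for 6.0 seconds in total.
print(average_time(6.0, 4))  # 1.5
```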

Write Estimated Data Transfers

Description:
Estimated data written to a target platform by a loop, in megabytes.
Collected
during the
Characterization
analysis with
Data Transfer
enabled in the
Offload Modeling
perspective.

Write GPU Memory Bandwidth, GB/sec

Description:
Rate at which data is written to GPU, chip uncore (LLC), and main memory, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Write Shared Local Memory Bandwidth, GB/sec

Description:
Rate at which data is written to shared local memory, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Write Typed Memory Bandwidth, GB/sec

Description:
Rate at which data is written to typed buffers, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.

Write Untyped Memory Bandwidth, GB/sec

Description:
Rate at which data is written to untyped buffers, in gigabytes per second.
Collected
during the
Characterization
analysis in the
GPU Roofline Insights
perspective.
