Accelerator Metrics
This reference section describes the contents of data columns in reports of the Offload Modeling and GPU Roofline Insights perspectives.
2 FPUs Active
Description:
Average percentage of time when both FPUs are used.
Active (EU Array)
Description:
Percentage of cycles actively executing instructions on all execution units (EUs).
Average Time
Description:
Average amount of time spent executing one task instance.
Compute
Description:
Estimated execution time assuming an offloaded loop is bound only by compute throughput.
Computing Task
Description:
Name of a computing task.
Computing Task Purpose
Description:
Action that a computing task performs.
Computing Threads Started
Description:
Total number of threads started across all execution units for a computing task.
Data Transfer Tax
Description:
Estimated time cost, in milliseconds, for transferring loop data between host and target platform.
Dependency Type (Measured)
Description:
Type of dependency in a loop that is not parallel: assumed, proven, or reduction.
Device (Estimated)
Description:
Target platform that application performance is modeled for.
Device (Measured)
Description:
Host platform that the application is executed on.
DRAM BW
Description:
DRAM Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by DRAM memory throughput.
Elapsed Time
Description:
Wall-clock time from beginning to end of computing task execution.
EU Threading Occupancy
Description:
Percentage of cycles on all execution units (EUs) and thread slots when a slot has a thread scheduled.
FP AI
Description:
Ratio of FLOP to the number of transferred bytes.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
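The ratio described above can be sketched in a few lines. This is only an illustration: the function name, parameter names, and input values are hypothetical and are not part of the tool.

```python
def arithmetic_intensity(flop: float, bytes_transferred: float) -> float:
    """Return operations per transferred byte for one kernel or loop."""
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return flop / bytes_transferred


# Hypothetical kernel: 2e9 floating-point operations over 8e9 bytes of traffic.
print(arithmetic_intensity(2e9, 8e9))  # 0.25
```

The same calculation applies to INT AI if the operation count is INTOP instead of FLOP.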
From Target (Estimated Data Transfers)
Description:
Estimated data transferred from a target platform to a shared memory by a loop, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
GFLOP
Description:
Number of giga floating-point operations.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
GFLOPS
Description:
Number of giga floating-point operations per second.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
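As a rough illustration, a per-second rate such as GFLOPS is an operation count divided by the time over which it was accumulated. The sketch below assumes that time is the elapsed time of the computing task in seconds; the function name and values are hypothetical.

```python
def gflops(gflop: float, elapsed_seconds: float) -> float:
    """Return giga floating-point operations per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return gflop / elapsed_seconds


# Hypothetical task: 12 GFLOP executed in half a second.
print(gflops(12.0, 0.5))  # 24.0
```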
GINTOP
Description:
Number of giga integer operations.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
GINTOPS
Description:
Number of giga integer operations per second.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
Global Size (Estimated)
Description:
Total estimated number of work items in a loop executed after it is offloaded to a target platform.
Collected during the Performance Modeling analysis with Trip Counts collection enabled in the Offload Modeling perspective.
Global Work Size
Description:
Total number of work items in all work groups.
GPU Shader Atomics
GPU Shader Barriers
Idle (EU Array)
Description:
Percentage of cycles on all execution units (EUs) during which no threads are scheduled on an EU.
Instances (Estimated)
Description:
Total estimated number of times a loop is executed after it is offloaded to a target platform.
Collected during the Performance Modeling analysis with Trip Counts collection enabled in the Offload Modeling perspective.
Instance Count
Description:
Total number of times a task is executed.
INT AI
Description:
Ratio of INTOP to the number of transferred bytes.
Collected during the Characterization analysis with FLOP collection enabled in the GPU Roofline Insights perspective.
IPC Rate
Description:
Average rate of instructions per cycle (IPC) calculated for two FPU pipelines.
Kernel Launch Tax
Description:
Total estimated time cost for invoking a kernel when offloading a loop to a target platform.
Does not include data transfer costs.
L3 BW
Description:
L3 Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by L3 cache throughput.
L3 Shader Bandwidth, GB/sec
Description:
Rate at which data is transferred between execution units and L3 caches, in gigabytes per second.
LLC BW
Description:
Last-Level Cache Bandwidth. Estimated execution time, in seconds, assuming an offloaded loop is bound only by last-level cache (LLC) throughput.
Local Size (Estimated)
Description:
Total estimated number of work items in one work group of a loop executed after it is offloaded to a target platform.
Collected during the Performance Modeling analysis with Trip Counts collection enabled in the Offload Modeling perspective.
Local Work Size
Description:
Number of work items in one work group.
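Global Work Size and Local Work Size together determine how many work groups a task launches. The sketch below assumes a one-dimensional range in which the global size is an exact multiple of the local size; the function name and values are hypothetical.

```python
def work_group_count(global_work_size: int, local_work_size: int) -> int:
    """Return the number of work groups for a one-dimensional range."""
    if local_work_size <= 0:
        raise ValueError("local_work_size must be positive")
    if global_work_size % local_work_size != 0:
        raise ValueError("global_work_size must be a multiple of local_work_size")
    return global_work_size // local_work_size


# Hypothetical task: 1024 work items in groups of 64.
print(work_group_count(1024, 64))  # 16
```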
Loop/Function
Description:
Name and source location of a loop/function in a region, where a region is a sub-tree of loops/functions in a call tree.
Offload Actions
Description:
Recommendation that indicates whether a loop is profitable to offload to a target platform and, if not, why. For loops that are not profitable, hover over the ? icon to see why the loop is not recommended for offloading or cannot be modeled.
Parallel Threads
Description:
Estimated number of parallel threads in an offloaded loop.
Private (Estimated Data Transfers)
Description:
Total estimated data transferred to a private memory from a target platform by a loop, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
Read Estimated Data Transfers
Description:
Estimated data read from a target platform by an offload region, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
Read GPU Memory Bandwidth, GB/sec
Description:
Rate at which data is read from GPU, chip uncore (LLC), and main memory, in gigabytes per second.
Read Shared Local Memory Bandwidth, GB/sec
Description:
Rate at which data is read from shared local memory, in gigabytes per second.
Read Typed Memory Bandwidth, GB/sec
Description:
Rate at which data is read from typed buffers, in gigabytes per second.
Read Untyped Memory Bandwidth, GB/sec
Description:
Rate at which data is read from untyped buffers, in gigabytes per second.
Send Active
Description:
Percentage of cycles on all execution units when EU Send pipeline is actively processed.
SIMD Width
Description:
The number of work items processed by a single GPU thread.
Stalled (EU Array)
Description:
Percentage of cycles on all execution units (EUs) when at least one thread is scheduled, but the EU is stalled.
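Read together, the Active, Stalled, and Idle descriptions for the EU Array suggest that the three states partition EU cycles. Under that assumption, which this reference does not state explicitly, any one of the percentages can be checked against the other two. The function name and values below are hypothetical.

```python
def idle_percent(active_percent: float, stalled_percent: float) -> float:
    """Return the idle share, ASSUMING active + stalled + idle == 100."""
    if active_percent < 0 or stalled_percent < 0:
        raise ValueError("percentages must be non-negative")
    if active_percent + stalled_percent > 100:
        raise ValueError("active and stalled together cannot exceed 100")
    return 100.0 - active_percent - stalled_percent


# Hypothetical kernel: EUs active 62.5% and stalled 30% of cycles.
print(idle_percent(62.5, 30.0))  # 7.5
```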
SVM Usage Type
Speed-up
Description:
Estimated speed-up after a loop is offloaded to a target device, in comparison to the original elapsed time.
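A minimal sketch of this comparison, assuming speed-up is the time measured on the host divided by the time estimated on the target. The exact formula used by the tool is not given here, and the names and values are hypothetical.

```python
def speed_up(measured_host_seconds: float, estimated_target_seconds: float) -> float:
    """Return how many times faster the offloaded loop is estimated to run."""
    if estimated_target_seconds <= 0:
        raise ValueError("estimated_target_seconds must be positive")
    return measured_host_seconds / estimated_target_seconds


# Hypothetical loop: 5 s measured on the host, 2 s estimated on the target.
print(speed_up(5.0, 2.0))  # 2.5
```

A value below 1 would indicate that the loop is estimated to run slower after offloading.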
Taxes
Description:
The highest estimated time cost and a sum of all other costs for offloading a loop from host to a target platform.
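The split into "highest cost" and "sum of all other costs" can be illustrated as follows. The dictionary of costs is invented for the example, and the "Other Tax" entry is a hypothetical placeholder, not a metric from this reference.

```python
def summarize_taxes(taxes: dict) -> tuple:
    """Return (name of highest cost, highest cost, sum of the remaining costs)."""
    if not taxes:
        raise ValueError("taxes must not be empty")
    top_name = max(taxes, key=taxes.get)
    other_costs = sum(cost for name, cost in taxes.items() if name != top_name)
    return top_name, taxes[top_name], other_costs


# Hypothetical per-loop costs in milliseconds.
costs_ms = {"Data Transfer Tax": 3.0, "Kernel Launch Tax": 0.5, "Other Tax": 0.25}
print(summarize_taxes(costs_ms))  # ('Data Transfer Tax', 3.0, 0.75)
```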
Throughput
Description:
Top two estimated throughput time metrics that modeled loop performance is bounded by.
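One way to read this column: each throughput metric (Compute, L3 BW, LLC BW, DRAM BW) is an estimated execution time under a single bound, and the largest estimates are the ones that limit the loop. The sketch below assumes that "top two" means the two largest time estimates; the values are hypothetical.

```python
def top_two_bounds(times_seconds: dict) -> list:
    """Return the two largest (metric, time) pairs, largest first."""
    ranked = sorted(times_seconds.items(), key=lambda item: item[1], reverse=True)
    return ranked[:2]


# Hypothetical per-bound time estimates for one offloaded loop, in seconds.
estimates = {"Compute": 0.010, "L3 BW": 0.004, "LLC BW": 0.002, "DRAM BW": 0.015}
print(top_two_bounds(estimates))  # [('DRAM BW', 0.015), ('Compute', 0.01)]
```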
Time (Estimated)
Description:
Estimated elapsed wall-clock time from beginning to end of loop execution on a target platform after offloading.
Time (Measured)
Description:
Elapsed wall-clock time from beginning to end of loop execution measured on a host platform.
To Target (Estimated Data Transfers)
Description:
Estimated data transferred to a target platform from a shared memory by a loop, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
ToFrom Target (Estimated Data Transfers)
Description:
Sum of estimated data transferred by a loop in both directions between a shared memory and a target platform, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
Total Estimated Data Transfers
Description:
Sum of the total estimated traffic incoming to a target platform and the total estimated traffic outgoing from the target platform, for an offload loop, in megabytes. It is calculated as (MappedTo + MappedFrom + 2*MappedToFrom).
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
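The formula above can be written out directly. In this sketch the three inputs are assumed to be in megabytes, and the function and parameter names are hypothetical renderings of MappedTo, MappedFrom, and MappedToFrom.

```python
def total_estimated_data_transfers(
    mapped_to: float, mapped_from: float, mapped_to_from: float
) -> float:
    """Return MappedTo + MappedFrom + 2*MappedToFrom, in the unit of the inputs."""
    # Data mapped in both directions crosses the link twice, hence the factor 2.
    return mapped_to + mapped_from + 2 * mapped_to_from


# Hypothetical loop: 10 MB to the target, 4 MB from it, 3 MB in both directions.
print(total_estimated_data_transfers(10.0, 4.0, 3.0))  # 20.0
```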
Total Offload Tax
Description:
Total estimated time cost, in milliseconds, for offloading a loop to a target platform.
Total Time
Description:
Total amount of time spent executing a task.
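Total Time, Instance Count, and Average Time are related in the usual way: the average is the total divided by the number of instances. A minimal sketch, with hypothetical names and values:

```python
def average_time(total_time_seconds: float, instance_count: int) -> float:
    """Return the average time spent executing one task instance."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return total_time_seconds / instance_count


# Hypothetical task: 1.5 s in total across 6 instances.
print(average_time(1.5, 6))  # 0.25
```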
Write Estimated Data Transfers
Description:
Estimated data written to a target platform by a loop, in megabytes.
Collected during the Characterization analysis with Data Transfer enabled in the Offload Modeling perspective.
Write GPU Memory Bandwidth, GB/sec
Description:
Rate at which data is written to GPU, chip uncore (LLC), and main memory, in gigabytes per second.
Write Shared Local Memory Bandwidth, GB/sec
Description:
Rate at which data is written to shared local memory, in gigabytes per second.
Write Typed Memory Bandwidth, GB/sec
Description:
Rate at which data is written to typed buffers, in gigabytes per second.
Write Untyped Memory Bandwidth, GB/sec
Description:
Rate at which data is written to untyped buffers, in gigabytes per second.