Input and Output Analysis

Use the platform-wide Input and Output analysis to monitor utilization of the disk and network subsystems, the CPU, and processor buses.
The Input and Output analysis type has been deprecated on Windows* OS following the Intel® VTune™ Profiler Update 2 release.
The Input and Output analysis helps identify:
  • Imbalance between I/O and compute operations (HPC applications)
  • Long latency of I/O requests (transactional workloads)
  • Hardware utilization (streaming)
  • Data plane utilization (applications using the DPDK framework). You can analyze how your application utilizes NIC ports, bandwidth, PCIe, and UPI.
  • I/O performance issues that may be caused by ineffective accesses to remote sockets or under-utilized throughput of an SPDK device
Depending on the selected configuration, the Input and Output analysis collects certain IO API metrics and explores your code from different perspectives:

System Disk IO API Metrics

This collection type uses hardware event-based sampling together with system-wide Ftrace* collection (for Linux* and Android* targets) or ETW collection (for Windows* targets) to provide a consistent view of the storage subsystem correlated with hardware events, and an easy-to-use method to match user-level source code with the I/O packets executed by the hardware.
The disk I/O analysis relies on the data produced by the kernel block driver subsystem. If your platform uses a non-standard block driver subsystem (for example, user-space storage drivers), disk I/O metrics are not available in this analysis type.
VTune Profiler uses the following system-wide metrics for the disk I/O analysis:
  • I/O Wait
    system-wide metric (Linux* targets only) shows the time when system cores are idle while there are threads that were switched off the cores waiting for I/O requests to complete.
  • I/O Queue Depth
    metric shows the number of I/O requests submitted to the storage device. Zero requests in the queue means that no requests are scheduled and the disk is not used at all.
  • I/O Data Transfer
    metric shows the number of bytes read from or written to the storage.
  • Page Faults
    metric (Linux targets only) shows the number of page faults that occurred on the system. This metric is useful when you analyze access to memory-mapped files (see the sketch after this list).
  • CPU Activity
    metric defines a portion of time the system spent in the following states:
    • Idle
      state - the CPU core is idle.
    • Active
      state - the CPU core is executing a thread.
    • I/O Wait
      (Linux targets only) - the CPU core is idle, but there is a thread blocked by a disk access that could potentially be executed on this core.
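For illustration, here is a minimal C sketch (not from the VTune documentation; the file path and sizes are hypothetical) of the workloads these metrics describe: blocking read() calls on a cold file contribute to I/O Wait and I/O Queue Depth, while first-touch accesses to a memory-mapped file raise the Page Faults metric.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (1 << 20)   /* read 1 MiB per request (hypothetical size) */

int main(void) {
    /* Hypothetical input file; use any large file not yet in the page cache. */
    int fd = open("/tmp/io_demo.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Synchronous reads: the thread blocks on each request, so idle cores
     * with this thread switched off show up as I/O Wait, and the in-flight
     * requests show up in I/O Queue Depth and I/O Data Transfer. */
    char *buf = malloc(CHUNK);
    if (!buf) { close(fd); return 1; }
    while (read(fd, buf, CHUNK) > 0)
        ;   /* discard the data; the goal is only to generate disk traffic */
    free(buf);

    /* Memory-mapped access: the first touch of each page triggers a page
     * fault that the Page Faults metric counts. */
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            volatile char sink = 0;
            for (off_t i = 0; i < st.st_size; i += 4096)
                sink += map[i];   /* fault in one page at a time */
            munmap(map, st.st_size);
        }
    }
    close(fd);
    return 0;
}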

SPDK IO API Metrics (Linux* Only)

  • SPDK Throughput Utilization
    metric helps identify under-utilization of the SPDK device throughput. Use the Timeline view to correlate areas of low SPDK throughput utilization with SPDK IO API calls and the PCIe traffic breakdown to understand whether I/O communication caused performance changes.
  • SPDK Effective Time
    metric shows the amount of time SPDK effectively interacts with devices (see the sketch after this list).
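As a hedged illustration (not taken from the SPDK or VTune documentation), the submit-and-poll pattern these metrics describe looks roughly like the following C sketch; the namespace, queue pair, and DMA-capable buffer are assumed to come from the usual SPDK probe/attach and buffer-allocation flow:

#include <stdbool.h>
#include <stdio.h>
#include "spdk/nvme.h"

static bool g_done;   /* set by the completion callback */

static void read_complete(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
    (void)cb_arg;
    (void)cpl;
    g_done = true;
}

/* Submit one read and poll its completion queue until the I/O finishes. */
void read_one_block(struct spdk_nvme_ns *ns,
                    struct spdk_nvme_qpair *qpair, void *buf)
{
    g_done = false;
    if (spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* lba */, 1 /* blocks */,
                              read_complete, NULL, 0 /* io_flags */) != 0) {
        fprintf(stderr, "read submission failed\n");
        return;
    }
    /* Polling for completions is where the application interacts with the
     * device; this activity is what the SPDK metrics above characterize. */
    while (!g_done)
        spdk_nvme_qpair_process_completions(qpair, 0 /* no limit */);
}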

DPDK IO API Metrics

  • DPDK Rx Spin Time
    metric shows, on a per-thread basis, the portion of rte_eth_rx_burst(...) calls that return zero packets, which equals the fraction of polling-loop iterations that deliver no packets (see the sketch below).
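For reference, a minimal sketch of the DPDK polling loop that this metric characterizes (the burst size is hypothetical, and EAL and port initialization are assumed to be done elsewhere); every call to rte_eth_rx_burst() that returns zero packets counts toward Rx Spin Time:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32   /* hypothetical burst size */

/* Poll one RX queue forever; assumes rte_eal_init() and
 * rte_eth_dev_configure()/rte_eth_dev_start() were already called. */
void rx_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                                          bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;   /* empty poll: counted as Rx Spin Time */

        /* Process and release the received packets. */
        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}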
To analyze core utilization by DPDK applications, consider extending the analysis to collect the Packet Rate and Packet Loss metrics (for example, with a custom collector).

Platform Metrics

For server platforms based on the Intel microarchitecture code name Sandy Bridge EP and newer, the Input and Output analysis provides an option to collect PCIe Bandwidth metrics that represent the amount of data transferred over the PCIe bus per second.
The Outbound PCIe Bandwidth metric is supported on server systems starting with the Intel microarchitecture code name Broadwell.
Starting with server platforms based on the Intel microarchitecture code name Skylake, PCIe Bandwidth metrics can be collected per device. To get human-readable names of the PCIe devices, start the Input and Output analysis with root permissions.
For systems based on the Intel microarchitectures code named Skylake and Cascade Lake, VTune Profiler enables you to locate functions that potentially perform MMIO reads. Selecting the Analyze PCIe bandwidth checkbox in the Details panel of the analysis configuration window collects all uncacheable reads, which do not necessarily target MMIO space. You can then sort by the MEM_LOAD_MISC_RETIRED.UC_PS event in the Event Count tab to locate the functions that potentially perform MMIO reads.
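As an illustration only (the PCI device path below is hypothetical), a user-space function performs an MMIO read when it dereferences a pointer into a device BAR mapped from sysfs; such loads are the kind of uncacheable reads counted by the MEM_LOAD_MISC_RETIRED.UC_PS event:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical device: map BAR0 of a PCI device exposed by sysfs. */
    int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0",
                  O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* This load targets uncacheable MMIO space, so the function issuing it
     * is a candidate in the MMIO-read search described above. */
    uint32_t status = bar[0];
    printf("register 0 = 0x%x\n", status);

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}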

Configure and Run Analysis

Prerequisites:
  • Create a VTune Profiler project and specify your analysis system and target (application, process, or system). Note that irrespective of the target type you select, the VTune Profiler automatically enables the Analyze system-wide target option to collect system-wide metrics for the Input and Output analysis.
  • For System Disk IO analysis:
    • On Linux*: To collect system Disk I/O metrics, the VTune Profiler enables Ftrace* collection, which requires access to debugfs. On some systems, this requires you to reconfigure permissions by running the script located in the bin directory, or to use root privileges.
    • On Windows*: Administrative privileges are required to collect system Disk I/O API metrics.
  • For SPDK IO analysis: Make sure SPDK is built using the --with-vtune option.
To run the Input and Output analysis:
  1. Click the Configure Analysis button on the VTune Profiler toolbar (standalone GUI or Visual Studio IDE).
    The Configure Analysis window opens.
  2. From the HOW pane, click the Browse button and select Platform Analysis > Input and Output.
    The corresponding analysis configuration opens.
  3. Depending on your target application and analysis purpose, choose any of the following configuration options:
    • Select IO API type to profile
      By default, the VTune Profiler profiles System Disk IO API. For DPDK applications, select DPDK IO API. For SPDK applications, select SPDK IO API. Note that SPDK and System Disk IO analysis cannot be run simultaneously.
    • Analyze PCIe bandwidth check box
      Collect the events required to compute PCIe bandwidth. This option is shown only on server platforms based on the Intel microarchitecture code name Sandy Bridge EP and newer. Where available, the option is disabled by default.
    • Analyze memory bandwidth check box
      Collect the data required to compute memory bandwidth. The option is enabled by default.
    • Evaluate max DRAM bandwidth check box
      Evaluate the maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and to calculate thresholds. The option is enabled by default.
  4. Click Start to run the analysis.
To run the Input and Output analysis from the command line, enter:
$ vtune -collect io [-knob <knob_name>=<value>] -- <target> [target_options]
To see the knobs available for this analysis on your system, run vtune -help collect io.

View Data

VTune Profiler collects the data, generates an rxxxio result, and opens it in the default Input and Output viewpoint that displays statistics according to the selected configuration.

What's Next

For System Disk IO analysis, if you identified an imbalance between I/O and compute operations, consider modifying your code to make I/O operations asynchronous. For I/O requests with long latency, check whether your data can be preloaded or written incrementally, or consider upgrading your storage device (to an SSD, for example).
For SPDK IO analysis:
  • Identify low SPDK throughput utilization
  • Identify I/O misconfiguration issues on multi-socket systems
For DPDK application analysis, explore the following metrics:
  • Analyze the Rx Spin Time and Rx Batch Statistics metrics to characterize core utilization in terms of packet receiving.
  • Analyze the PCIe Bandwidth metrics to estimate the inbound and outbound NIC traffic.
  • Ensure that the DRAM Bandwidth value is low enough, which indicates that Intel® Data Direct I/O Technology (Intel DDIO) and the Last Level Cache work properly.
  • Analyze UPI Bandwidth on multi-socket systems for potential misconfiguration problems.
