Intel® VTune™ Amplifier

Input and Output Analysis

Use the platform-wide Input and Output analysis to monitor utilization of the disk and network subsystems, the CPU, and processor buses.

Note

This is a PREVIEW FEATURE on Windows* OS. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

The Input and Output analysis helps identify:

Depending on the selected configuration, the Input and Output analysis collects certain IO API metrics and explores your code from different perspectives:

System Disk IO API Metrics

This collection type combines hardware event-based sampling with system-wide Ftrace* collection (Linux* and Android* targets) or ETW collection (Windows* targets). It provides a consistent view of the storage subsystem alongside hardware events, and an easy way to match user-level source code with the I/O packets executed by the hardware.

Disk Input and Output Analysis

The analysis relies on the data produced by the kernel block driver subsystem. If your platform uses a non-standard block driver subsystem (for example, user-space storage drivers), I/O metrics are not available for this analysis type.

VTune Amplifier uses the following system-wide metrics for the disk I/O analysis:

SPDK IO API Metrics (Linux* Only)

DPDK IO API Metrics

To analyze core utilization by DPDK apps, consider extending the analysis to collect the Packet Rate and Packet Loss metrics (for example, with the custom collector).

Platform Metrics

For server platforms based on the Intel microarchitecture code name Sandy Bridge EP and newer, the Input and Output analysis provides an option to collect PCIe Bandwidth metrics, which represent the amount of data transferred via the PCIe bus per second.

Note

The Outbound PCIe Bandwidth metric is supported for server systems starting with Intel microarchitecture code name Broadwell.

Starting with server platforms based on the Intel microarchitecture code name Skylake, PCIe Bandwidth metrics can be collected per device. To see human-readable names of the PCIe devices, run the Input and Output analysis with root permissions.
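As a point of reference for reading the PCIe Bandwidth values, a bandwidth figure is a byte count divided by the time over which it was observed. The sketch below only illustrates that arithmetic. It does not describe how VTune Amplifier computes the metric internally, and the function name and the choice of 1 MB = 10^6 bytes are assumptions made for the example.

```python
def bandwidth_mb_per_s(bytes_start, bytes_end, t_start_s, t_end_s):
    """Average bandwidth in MB/s (1 MB = 10**6 bytes) between two samples.

    bytes_start and bytes_end are readings of a cumulative byte counter;
    t_start_s and t_end_s are the matching timestamps in seconds.
    """
    elapsed = t_end_s - t_start_s
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return (bytes_end - bytes_start) / elapsed / 1e6
```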

Configure and Run Analysis

Prerequisites:

To run the Input and Output analysis:

  1. Click the Configure Analysis button on the VTune Amplifier toolbar (in the standalone GUI or the Visual Studio IDE).

    The New Amplifier Result tab opens.

  2. From the HOW pane, click the Browse button and select Platform Analysis > Input and Output.

    The corresponding analysis configuration opens.

  3. Depending on your target app and analysis purpose, choose any of the following configuration options:

    Select IO API type to profile

    By default, VTune Amplifier profiles the System Disk IO API.

    For DPDK applications, select DPDK IO API.

    For SPDK applications, select SPDK IO API.

    Analyze PCIe bandwidth check box

    Collect the events required to compute PCIe bandwidth.

    This option is shown only on server platforms based on Intel microarchitecture code name Sandy Bridge EP and newer.

    When shown, the option is disabled by default.

    Analyze memory bandwidth check box

    Collect the data required to compute memory bandwidth.

    The option is enabled by default.

    Evaluate max DRAM bandwidth check box

    Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.

    The option is enabled by default.

    Note

    SPDK and System Disk IO analysis cannot be run simultaneously.

  4. Click Start to run the analysis.

To run the Input and Output analysis from the command line, enter:

$ amplxe-cl -collect io [-knob <value>] -- <target> [target_options]
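When the collection is launched from a script, the command line above can be assembled programmatically. The following is a minimal Python sketch and not part of the product: `build_io_command` is a hypothetical helper, and `some-knob` in the usage section is a placeholder rather than a documented knob name. Running `amplxe-cl -help collect io` should list the knobs that your installed version accepts.

```python
import shlex


def build_io_command(target, target_options=(), knobs=None, result_dir=None):
    """Return the argv list for an `amplxe-cl -collect io` run.

    knobs maps knob names to values. Names are passed through unchecked,
    so they must match what your VTune Amplifier version accepts.
    """
    argv = ["amplxe-cl", "-collect", "io"]
    for name, value in (knobs or {}).items():
        # Write booleans in lowercase, as in `-knob <name>=true`.
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += ["-knob", f"{name}={value}"]
    if result_dir is not None:
        argv += ["-result-dir", result_dir]
    # Everything after `--` is the target and its own options.
    argv += ["--", target, *target_options]
    return argv


if __name__ == "__main__":
    # "some-knob" is a placeholder, not a real knob name.
    cmd = build_io_command("./my_app", ["--input", "data.bin"],
                           knobs={"some-knob": True})
    print(shlex.join(cmd))
```

Returning a list instead of a single string lets the result be passed directly to `subprocess.run` without shell quoting issues.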

View Data

VTune Amplifier collects the data, generates an rxxxio result, and opens it in the default Input and Output viewpoint that displays statistics according to the selected configuration.
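For scripted workflows it can be handy to locate the most recent result directory. The sketch below assumes that the `xxx` part of `rxxxio` is a decimal counter (for example `r000io`, `r001io`) and that results are created as directories under a common parent. Both points are assumptions about the naming scheme, and `latest_io_result` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming scheme: "r", a decimal counter, then the "io" suffix.
RESULT_NAME = re.compile(r"r(\d+)io")


def latest_io_result(parent="."):
    """Return the Path of the highest-numbered r<N>io directory, or None."""
    best = None
    best_index = -1
    for entry in Path(parent).iterdir():
        match = RESULT_NAME.fullmatch(entry.name)
        # Skip plain files and results of other analysis types.
        if entry.is_dir() and match and int(match.group(1)) > best_index:
            best, best_index = entry, int(match.group(1))
    return best
```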

What's Next

For System Disk IO analysis, if you identified an imbalance between I/O and compute operations, consider modifying your code to make I/O operations asynchronous. For I/O requests with long latency, check whether your data can be pre-loaded or written incrementally, or consider upgrading your storage device (to an SSD, for example).
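One way to make reads asynchronous is to prefetch the next chunk of data while the current chunk is being processed. The following Python sketch illustrates the idea with a single background reader thread. The function name, the chunk size, and the `compute` callback are assumptions made for the example, and the right approach for your application may differ (for example, native asynchronous I/O APIs).

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(path, compute, chunk_size=1 << 20):
    """Apply compute() to each chunk while the next chunk is being read."""
    results = []
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=1) as pool:
        # Start reading the first chunk in the background.
        pending = pool.submit(f.read, chunk_size)
        while True:
            chunk = pending.result()  # wait for the read in flight
            if not chunk:
                break  # an empty read means end of file
            # Kick off the next read before processing the current chunk,
            # so the disk access overlaps with the computation.
            pending = pool.submit(f.read, chunk_size)
            results.append(compute(chunk))
    return results
```

Only the worker thread touches the file object, so at most one read is in flight at a time and no extra locking is needed.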

For SPDK IO analysis:

For DPDK application analysis, explore the following metrics:

See Also