Input and Output Analysis

Use the platform-wide Input and Output analysis to monitor utilization of the disk and network subsystems, the CPU, and processor buses.

Note

This is a PREVIEW FEATURE on Windows* OS. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

The Input and Output analysis helps identify:

  • Imbalance between I/O and compute operations (HPC applications)

  • Long latency of I/O requests (transactional workloads)

  • Hardware utilization (streaming)

  • Data plane utilization (applications using the DPDK framework). You can analyze how your application utilizes NIC ports, bandwidth, PCIe, and UPI.

  • I/O performance issues that may be caused by ineffective accesses to remote sockets or under-utilized throughput of an SPDK device

Depending on the selected configuration, the Input and Output analysis collects certain IO API metrics and explores your code from different perspectives:

System Disk IO API Metrics

This collection type uses hardware event-based sampling and system-wide Ftrace* collection (Linux* and Android* targets) or ETW collection (Windows* targets) to provide a consistent view of the storage sub-system combined with hardware events, and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.

Disk Input and Output Analysis

The analysis relies on data produced by the kernel block driver sub-system. If your platform uses a non-standard block driver sub-system (for example, user-space storage drivers), I/O metrics are not available in this analysis type.

VTune Amplifier uses the following system-wide metrics for the disk I/O analysis:

  • I/O Wait system-wide metric (Linux* targets only) shows the time when system cores are idle but there are threads that were switched off a core due to I/O access.

  • I/O Queue Depth metric shows the number of I/O requests submitted to the storage device. Zero requests in the queue mean that no requests are scheduled and the disk is not used at all.

  • I/O Data Transfer metric shows the number of bytes read from or written to the storage.

  • Page Faults metric (Linux targets only) shows the number of page faults that occurred on the system. This metric is useful when analyzing access to memory-mapped files (see the sketch after this list).

  • CPU Activity metric shows the portion of time the system spent in the following states:

    • Idle state - the CPU core is idle.

    • Active state - the CPU core is executing a thread.

    • I/O Wait (Linux targets only) - the CPU core is idle, but there is a thread blocked by a disk access that could potentially be executed on this core.
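The following minimal sketch illustrates why page faults matter for memory-mapped file analysis; the file name, page size, and access pattern are assumptions made for illustration only. Mapping a file does not read it from disk: each first touch of a page triggers a page fault that the kernel satisfies with a disk read, so file I/O shows up as page faults rather than explicit read() calls.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* "data.bin" is a placeholder input file used for illustration. */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the file; no data is read from the disk yet. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The first touch of each page raises a page fault that the kernel
       resolves with a disk read - this is the activity that the Page
       Faults metric helps correlate with I/O. */
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 4096)
        sum += p[i];

    printf("checksum: %ld\n", sum);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}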

SPDK IO API Metrics (Linux* Only)

  • SPDK Throughput Utilization metric helps identify under-utilization of the SPDK device throughput. You can use the Timeline view to correlate areas of low SPDK throughput utilization with SPDK IO API calls and the PCIe traffic breakdown, and understand whether I/O communications caused performance changes.

  • SPDK Effective Time metric shows the amount of time SPDK effectively interacts with devices.

DPDK IO API Metrics

  • DPDK Rx Spin Time metric shows (on a per-thread basis) the fraction of rte_eth_rx_burst(...) function calls that return zero packets, which is identical to the fraction of polling loop iterations that provide no packets.

To analyze core utilization by DPDK apps, consider extending the analysis to collect the Packet Rate and Packet Loss metrics (for example, with a custom collector).
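To make the metric concrete, here is a minimal sketch of a typical DPDK receive loop; it assumes the standard rte_eth_rx_burst() API, and the port/queue identifiers, burst size of 32, and processing placeholder are illustrative only. Iterations in which the burst returns zero packets are exactly what Rx Spin Time accounts for.

#include <stdint.h>
#include <rte_ethdev.h>   /* rte_eth_rx_burst() */
#include <rte_mbuf.h>     /* rte_pktmbuf_free() */

/* Illustrative poll loop; assumes the usual EAL and port initialization
   has already been performed elsewhere in the application.
   Rx Spin Time corresponds to empty_polls / total_polls. */
static void rx_poll_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *pkts[32];
    uint64_t total_polls = 0, empty_polls = 0;

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
        total_polls++;

        if (nb_rx == 0) {
            /* An "empty" poll: this iteration contributes to Rx Spin Time. */
            empty_polls++;
            continue;
        }

        /* ... process the nb_rx received packets here ... */
        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(pkts[i]);
    }
}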

Platform Metrics

For server platforms based on the Intel microarchitecture code name Sandy Bridge EP and newer, the Input and Output analysis provides an option to collect PCIe Bandwidth metrics that represent the amount of data transferred over the PCIe bus per second.

Note

The Outbound PCIe Bandwidth metric is supported for server systems starting with Intel microarchitecture code name Broadwell.

Starting with server platforms based on the Intel microarchitecture code name Skylake, PCIe Bandwidth metrics can be collected per device. To get human-readable names for the PCIe devices, start the Input and Output analysis with root permissions.

Configure and Run Analysis

Prerequisites:

  • Create a VTune Amplifier project and specify your analysis system and target (application, process, or system). Note that irrespective of the target type you select, the VTune Amplifier automatically enables the Analyze system-wide target option to collect system-wide metrics for the Input and Output analysis.

  • For System Disk IO analysis:

    • On Linux*: To collect system Disk I/O metrics, the VTune Amplifier enables Ftrace* collection, which requires access to debugfs. On some systems, this requires you to reconfigure permissions by running the prepare_debugfs.sh script located in the bin directory, or to use root privileges.

    • On Windows*: Administrative privileges are required to collect system Disk I/O API metrics.

  • For SPDK IO analysis: Make sure SPDK is built using the --with-vtune option.
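For reference, a typical SPDK build with VTune support looks like the following; the checkout location and the VTune Amplifier installation path passed to --with-vtune are assumptions, so adjust them to your system:

$ cd spdk
$ ./configure --with-vtune=/opt/intel/vtune_amplifier
$ make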

To run the Input and Output analysis:

  1. Click the Configure Analysis button on the VTune Amplifier toolbar (standalone GUI or Visual Studio* IDE).

    The New Amplifier Result tab opens.

  2. From the HOW pane, click the Browse button and select Platform Analysis > Input and Output.

    The corresponding analysis configuration opens.

  3. Depending on your target app and analysis purpose, choose any of the following configuration options:

    Select IO API type to profile

    By default, the VTune Amplifier profiles System Disk IO API.

    For DPDK applications, select DPDK IO API.

    For SPDK applications, select SPDK IO API.

    Analyze PCIe bandwidth check box

    Collect the events required to compute PCIe bandwidth.

    This option is shown only on server platforms based on Intel microarchitecture code name Sandy Bridge EP and newer.

    If applicable, the option is disabled by default.

    Analyze memory bandwidth check box

    Collect the data required to compute memory bandwidth.

    The option is enabled by default.

    Evaluate max DRAM bandwidth check box

    Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds.

    The option is enabled by default.

    Note

    SPDK and System Disk IO analysis cannot be run simultaneously.

  4. Click Start to run the analysis.

To run the Input and Output analysis from the command line, enter:

$ amplxe-cl -collect io [-knob <knob-name>=<knob-value>] -- <target> [target_options]
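For example, the following command (the result directory and 30-second duration are placeholders chosen for illustration) collects Input and Output data system-wide and stores the result in a custom directory; the second command lists the knobs available for this analysis in your version of the tool:

$ amplxe-cl -collect io -d 30 -r ./r001io
$ amplxe-cl -help collect io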

View Data

VTune Amplifier collects the data, generates an rxxxio result, and opens it in the default Input and Output viewpoint that displays statistics according to the selected configuration.

What's Next

For System Disk IO analysis, if you identified an imbalance between I/O and compute operations, consider modifying your code to make I/O operations asynchronous. For I/O requests with long latency, check whether your data can be pre-loaded or written incrementally, or consider upgrading your storage device (to an SSD, for example).
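As one possible direction (a generic technique, not a VTune Amplifier feature), a blocking read can be turned into an asynchronous one with POSIX AIO so the CPU keeps computing while the disk request is in flight. The sketch below makes that idea concrete; the file name and buffer size are placeholders.

#include <aio.h>      /* POSIX asynchronous I/O; link with -lrt */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* "data.bin" is a placeholder input file used for illustration. */
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* Queue the read; control returns immediately. */
    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    /* ... overlap useful computation with the outstanding I/O here ... */

    /* Block only when the data is actually needed. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    ssize_t n = aio_return(&cb);
    printf("read %zd bytes asynchronously\n", n);

    close(fd);
    return 0;
}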

For SPDK IO analysis:

  • Identify low SPDK throughput utilization

  • Identify I/O misconfiguration issues on multi-socket systems

For DPDK application analysis, explore the following metrics:

  • Analyze the Rx Spin Time and Rx Batch Statistics to characterize core utilization during packet receiving.

  • Analyze the PCIe Bandwidth to estimate the inbound and outbound NIC traffic.

  • Ensure the DRAM Bandwidth value is low enough, which indicates that Intel® Data Direct I/O Technology (Intel® DDIO) and the Last Level Cache are working properly.

  • Analyze UPI Bandwidth on multi-socket systems for potential misconfiguration problems.
