User Guide

Analyze Linux Kernel I/O

Use the Input and Output analysis of Intel® VTune™ Profiler to match user-level code to I/O operations executed by the hardware.
This collection mode combines hardware event-based sampling with system-wide Ftrace* collection to provide a consistent view of the storage system alongside hardware events.
This analysis relies on the data provided by the kernel block driver sub-system. If your platform uses a non-standard block driver sub-system, for example user-space storage drivers, I/O metrics are not available in this analysis type.
VTune Profiler provides the following system-wide metrics for the kernel I/O analysis:
  • I/O Wait: this system-wide metric represents the amount of time during which the CPU cores were idle because threads were in an I/O wait state.
  • I/O Queue Depth: this metric shows the number of I/O requests submitted to the storage device. A queue depth of zero means that no requests are scheduled and the disk is not utilized at all.
  • I/O Data Transfer: this metric shows the number of bytes read from or written to the storage device(s).
  • Page Faults: this metric shows the number of page faults that have occurred on the system. It is particularly useful when analyzing access to memory-mapped files.
  • CPU Activity: this metric represents the portion of time the system spent in one of the following states:
    • Idle: the CPU core is idle.
    • Active: the CPU core is executing a thread.
    • I/O Wait: the CPU core is idle, but a thread that could potentially execute on this core is blocked by disk access.
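The three CPU Activity states listed above can be sketched as a small classifier. This is only an illustration of the definitions in the list, not how VTune Profiler computes the metric; the names `CoreState`, `classify_core`, and `activity_breakdown` are hypothetical.

```python
from collections import Counter
from enum import Enum


class CoreState(Enum):
    IDLE = "Idle"
    ACTIVE = "Active"
    IO_WAIT = "I/O Wait"


def classify_core(is_executing: bool, threads_blocked_on_io: int) -> CoreState:
    """Classify one core at one sampling instant (hypothetical helper)."""
    if is_executing:
        return CoreState.ACTIVE      # the core is executing a thread
    if threads_blocked_on_io > 0:
        return CoreState.IO_WAIT     # idle, but a runnable-in-principle thread waits on disk
    return CoreState.IDLE            # idle, and nothing is blocked on I/O


def activity_breakdown(samples):
    """Return the portion of samples spent in each state.

    samples: iterable of (is_executing, threads_blocked_on_io) pairs,
    assumed to be taken at a fixed sampling interval.
    """
    counts = Counter(classify_core(run, blocked) for run, blocked in samples)
    total = sum(counts.values())
    return {state: counts[state] / total for state in CoreState} if total else {}
```

For example, four equally spaced samples with two active, one idle, and one I/O-blocked instant give a breakdown of 50% Active, 25% Idle, and 25% I/O Wait.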
All I/O metrics collected by VTune Profiler, such as I/O Wait Time, I/O Waits, and I/O Queue Depth, are collected in a system-wide mode and are not target-specific.
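To get a feel for what a system-wide (rather than per-target) I/O wait figure looks like, the sketch below computes the share of aggregate CPU time the Linux kernel accounted as iowait between two snapshots of the first line of /proc/stat. This uses the kernel's own accounting and is not necessarily the same computation VTune Profiler performs; `iowait_fraction` is a hypothetical helper name.

```python
def iowait_fraction(before: str, after: str) -> float:
    """Fraction of aggregate CPU time spent in iowait between two snapshots.

    Each argument is the aggregate "cpu ..." line from /proc/stat, whose
    leading fields are: user nice system idle iowait irq softirq steal.
    """
    def fields(line: str):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(x) for x in parts[1:]]

    deltas = [a - b for b, a in zip(fields(before), fields(after))]
    # Guest time is already included in user time, so sum only the first 8 fields.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


# Usage on Linux (not executed here): read the first line of /proc/stat,
# wait for an interval, read it again, and pass both lines to iowait_fraction().
```

Because the counters cover every core and every process, the result stays the same no matter which application is being profiled, which is the sense in which such metrics are not target-specific.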

Analyze I/O Wait Time

To analyze I/O Wait Time, start with the Summary window. This window provides a quick overview of the target system performance and introduces the I/O Wait Time metric that helps you identify whether your application is I/O-bound.
The I/O Wait Time metric represents the portion of time during which threads are in an I/O wait state while the system has cores in an idle state. In this case, the number of threads is not greater than the number of idling cores. This aggregated I/O Wait Time metric is the integral of the I/O Wait metric that is available in the Timeline pane of the Bottom-up window.
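The relationship between the timeline I/O Wait metric and the aggregated I/O Wait Time can be sketched as a numerical integral. The sampling scheme, the 0..1 scaling of the samples, and the helper name `io_wait_time` are assumptions for illustration; the exact aggregation VTune Profiler uses is not described here.

```python
def io_wait_time(timestamps, io_wait):
    """Integrate I/O Wait samples over time with the trapezoidal rule.

    timestamps: sample times in seconds, in increasing order.
    io_wait:    I/O Wait value at each timestamp, as a fraction in [0, 1].
    Returns the aggregated I/O wait time in seconds.
    """
    if len(timestamps) != len(io_wait):
        raise ValueError("timestamps and io_wait must have the same length")
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        total += 0.5 * (io_wait[i] + io_wait[i - 1]) * dt
    return total
```

With samples at 0, 1, 2, and 3 seconds and I/O Wait values of 0, 1, 1, and 0, the integral is 2.0 seconds of I/O wait over a 3-second run.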
To estimate how quickly storage requests are served by the kernel sub-system, see the Disk Input and Output Histogram. Use the Operation Type drop-down menu to select the type of I/O operation you are interested in. For example, for I/O writes, the 2-4 storage requests that took 0.06 seconds or more to execute are classified as slow by VTune Profiler.
To explore this type of I/O request in greater detail, switch to the Bottom-up window.
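The histogram's slow classification can be approximated on your own request data with a simple threshold filter. In this minimal sketch, the 0.06-second threshold is taken from the write example above and is assumed, for illustration only, to apply to whichever operation type is selected; `count_slow` and the tuple layout are hypothetical.

```python
SLOW_THRESHOLD_S = 0.06  # from the write example in the text; assumed for other types


def count_slow(requests, operation_type, threshold_s=SLOW_THRESHOLD_S):
    """Count requests of one operation type that took threshold_s or longer.

    requests: iterable of (operation_type, duration_seconds) tuples.
    """
    return sum(
        1
        for op, duration in requests
        if op == operation_type and duration >= threshold_s
    )


# Hypothetical sample data: three writes and one read.
sample_requests = [
    ("write", 0.07),
    ("write", 0.01),
    ("read", 0.20),
    ("write", 0.06),
]
slow_writes = count_slow(sample_requests, "write")
```

Selecting a different `operation_type` plays the role of the Operation Type drop-down menu: the same data yields a different slow count for reads than for writes.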

Analyze Slow I/O Requests

In the Bottom-up window, select an area of interest on the timeline, then use the Zoom In and Filter by Selection context menu option. The Summary histogram is updated to show the data for the selected time range.
For example, in this case, there were 2-4 slow write requests executed during the 6th second of application execution.
By zooming in on an area of interest, you can get a closer look at different metrics and understand the reason behind high I/O wait time.
VTune Profiler collects the I/O Wait type of context switches caused by I/O accesses from the thread, and provides a system-wide I/O Wait metric in the CPU Activity area. Use this data to identify an imbalance between I/O and compute operations.
System-wide I/O Wait shows the time during which the system cores were idle, but there were threads in a context switch due to I/O access. Use this metric to estimate the dependency of performance on the storage medium.
For example, an I/O Wait value of 100% means that all cores of the system are idle, but there are threads blocked by I/O requests. To solve this issue, change the logic of the application to run compute threads in parallel with I/O tasks. Alternatively, consider using faster storage.
An I/O Wait value of 0% could mean one of the following:
  • Regardless of the number of threads blocked on storage access, all CPU cores are actively executing application code.
  • No threads are blocked on storage access.
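The suggestion to run compute work in parallel with I/O tasks can be sketched as a read-ahead loop: while the main thread processes the current chunk, a worker thread is already reading the next one. This is one possible pattern among many, shown under simplifying assumptions; `process_file` and `checksum` are hypothetical names, and the checksum stands in for real compute work.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(chunk: bytes) -> int:
    """Placeholder compute step standing in for real processing."""
    return sum(chunk) % 65521


def process_file(path, chunk_size=1 << 20):
    """Overlap reading and computing with a one-chunk read-ahead."""
    results = []
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(f.read, chunk_size)      # start the first read
        while True:
            chunk = pending.result()                   # wait for the chunk in flight
            if not chunk:
                break                                  # end of file
            pending = pool.submit(f.read, chunk_size)  # prefetch the next chunk
            results.append(checksum(chunk))            # compute while the read runs
    return results
```

Only one read is ever in flight, so chunks stay in order. Whether this reduces I/O Wait in practice depends on how the compute cost compares with the storage latency, which is what the timeline metrics help you judge.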
Explore the I/O Queue Depth area to see the number of storage requests submitted to the storage device. Spikes correspond to the maximum number of requests. Zero-value gaps on the I/O Queue Depth chart correspond to points in the application run when storage was not utilized at all.
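To make the shape of a queue depth chart concrete, the sketch below rebuilds a depth timeline from request submit and completion times. The event format and the helper name `queue_depth_timeline` are assumptions for illustration; VTune Profiler derives its chart from kernel block-layer data, not from this code.

```python
def queue_depth_timeline(requests):
    """Return [(time, depth), ...] after each submit or completion event.

    requests: iterable of (submit_time, complete_time) pairs in seconds.
    """
    events = []
    for submit, complete in requests:
        events.append((submit, +1))    # request enters the queue
        events.append((complete, -1))  # request leaves the queue
    # At equal timestamps, process completions (-1) before submissions (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    depth = 0
    timeline = []
    for time, delta in events:
        depth += delta
        timeline.append((time, depth))
    return timeline
```

For requests spanning 0-2 s, 1-3 s, and 5-6 s, the depth peaks at 2 between 1 s and 2 s and stays at zero between 3 s and 5 s, which corresponds to a spike and a zero-value gap on the chart.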
To identify the exact points in time when slow I/O packets were scheduled for execution, enable the Slow markers for the I/O Queue Depth metric.
To identify points of high bandwidth, analyze the I/O Data Transfer area that shows the number of bytes read from or written to the storage device.
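Bandwidth is the rate of change of the transferred byte count, so points of high bandwidth are intervals where that count grows fastest. A minimal sketch, assuming you have cumulative byte counters sampled over time (the input format and the name `bandwidth_series` are hypothetical):

```python
def bandwidth_series(samples):
    """Convert cumulative byte counts into per-interval bandwidth.

    samples: list of (timestamp_seconds, cumulative_bytes), increasing in time.
    Returns a list of (interval_end_time, bytes_per_second).
    """
    series = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        series.append((t1, (b1 - b0) / (t1 - t0)))
    return series
```

For counters of 0, 1000, and 5000 bytes at 0, 1, and 3 seconds, the second interval carries the higher bandwidth (2000 bytes per second versus 1000).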

Analyze Call Stack for I/O Functions

VTune Profiler instruments all user-space I/O functions. This enables you to correlate slow I/O requests with instrumented user-space activities. You can do that by examining the full call stack that points to the exact API invocation.
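As a conceptual analog of API instrumentation, the sketch below wraps a function so that every call records its duration and the call stack that led to it. VTune Profiler instruments native I/O APIs through its own mechanism; this decorator, the `io_log` list, and the `fake_write` function are purely hypothetical illustrations of the idea.

```python
import functools
import time
import traceback

io_log = []  # one record per instrumented call


def instrument_io(func):
    """Record duration and caller stack for each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Drop the last frame (this wrapper) so the stack ends at the caller.
        stack = traceback.extract_stack()[:-1]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            io_log.append({
                "api": func.__name__,
                "duration_s": time.perf_counter() - start,
                "stack": [frame.name for frame in stack],
            })
    return wrapper


@instrument_io
def fake_write(data: bytes) -> int:
    """Stand-in for a real I/O call."""
    return len(data)


def save_report():
    return fake_write(b"abc")
```

After calling `save_report()`, the last entry in `io_log` names `fake_write` as the API and ends its stack at `save_report`, which is the kind of link between an I/O call and its invoking code that the call stack view provides.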
To view a Task Time call stack for a particular I/O call, select the required I/O API marker on the timeline and explore the stack in the Call Stack pane.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.