User Guide

SPDK IO Data View

Use the Input and Output analysis in Intel VTune Profiler to profile the SPDK IO API, analyze PCIe traffic, and identify IO performance issues that may be caused by ineffective accesses to remote sockets, under-utilized throughput of an SPDK device, and other factors.
Prerequisites: For a successful analysis, make sure SPDK is built with the --with-vtune option.
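The prerequisite can be sketched as two command lines: one that configures the SPDK build and one that starts the collection. The sketch below only assembles the argument lists and runs nothing. It assumes that --with-vtune takes the VTune installation directory (check ./configure --help in your SPDK tree) and that the Input and Output analysis is started with vtune -collect io. The paths, the result directory name, and the target application are hypothetical. The knob that enables SPDK IO API collection is deliberately left out; vtune -help collect io lists the knobs available in your version.

```python
import shlex


def spdk_configure_command(vtune_dir):
    """Return the SPDK configure invocation that enables VTune support.

    Assumption: --with-vtune accepts the VTune installation directory.
    """
    return ["./configure", f"--with-vtune={vtune_dir}"]


def vtune_io_command(result_dir, target):
    """Return a vtune command line that runs Input and Output analysis.

    `target` is the application command line as a list of strings.
    """
    return ["vtune", "-collect", "io", "-result-dir", result_dir, "--", *target]


if __name__ == "__main__":
    # Hypothetical paths and target, for illustration only.
    configure = spdk_configure_command("/opt/intel/oneapi/vtune/latest")
    collect = vtune_io_command("r000io", ["./my_spdk_app", "-c", "app.json"])
    print(shlex.join(configure))
    print(shlex.join(collect))
```

Building the commands as lists keeps quoting explicit and lets you pass them unchanged to subprocess.run when you are ready to execute them.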
VTune Profiler helps you optimize the following SPDK usage models:
  • Application services:
    • SPDK vhost-scsi to provide optimized block storage to VMs
    • SPDK NVMe to optimize access to the locally attached storage
  • Disaggregated storage:
    • NVM Express* over Fabrics
    • iSCSI targets
For SPDK analysis, consider the following workflow:

Identify Low SPDK Throughput Utilization

Start your analysis with the Summary window, which displays overall SPDK performance statistics per executed operation type. Expand an operation block to identify potential IO performance imbalance among SSDs.
Explore the SPDK Throughput histogram to understand how long your workload has been under-utilizing SPDK throughput per device.
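To make the idea behind the histogram concrete, the following sketch computes how long each device spent below a utilization threshold. It is not how VTune Profiler builds the histogram. The sample format, the 40% threshold, the peak throughput value, and the device names are all hypothetical and chosen only for illustration.

```python
def low_utilization_time(samples, max_throughput, threshold=0.4):
    """Return the total time (seconds) spent below threshold * max_throughput.

    samples: iterable of (duration_s, throughput_mb_s) pairs for one device.
    max_throughput: assumed peak throughput of the device, in MB/s.
    threshold: hypothetical fraction of the peak that counts as "low".
    """
    limit = threshold * max_throughput
    return sum(duration for duration, throughput in samples if throughput < limit)


def low_utilization_per_device(per_device_samples, max_throughput, threshold=0.4):
    """Apply low_utilization_time to every device in a dict of samples."""
    return {
        device: low_utilization_time(samples, max_throughput, threshold)
        for device, samples in per_device_samples.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements: (duration in seconds, throughput in MB/s).
    measurements = {
        "nvme0": [(1.0, 2000.0), (2.0, 500.0), (1.0, 900.0)],
        "nvme1": [(4.0, 2300.0)],
    }
    for device, seconds in low_utilization_per_device(measurements, 2500.0).items():
        print(f"{device}: {seconds:.1f} s below 40% of peak throughput")
```

A device that accumulates a long low-utilization time is a candidate for closer inspection on the timeline.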
Then switch to the Bottom-up window and filter the Timeline view by the Low SPDK Throughput Utilization metric to see how throughput under-utilization correlates with SPDK IO API calls and the PCIe traffic breakdown per physical device.
Locate an area of recession (Low SPDK Throughput markers with a long duration) on the timeline and zoom in to see performance changes in IO communications (for example, drops in SPDK operations). Right-click and select the Filter In by Selection menu option.
Once the Bottom-up view is filtered in, apply the Function grouping to the grid to identify the functions executed during the selected time frame. Double-click the function with the highest CPU time value to open the source view and analyze the code.

Identify IO Misconfiguration Issues on Multi-Socket Systems

Use the Platform window to analyze whether your SPDK workload is configured properly for a multi-socket system. To do this, switch to the Package/Physical Core/Logical Core grouping on the legend pane to track IO performance per package.
An IO flow is ineffective when an SPDK device and the core that consumes or produces its data belong to different packages. In that case, you see high UPI Bandwidth values, which signal heavy utilization of the interconnect.
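Outside the profiler, you can cross-check device and core placement on Linux through sysfs. The sketch below is a minimal illustration, not part of VTune Profiler or SPDK. It assumes that NUMA nodes map to packages on your system (this may not hold with sub-NUMA clustering), and the PCI address and core list in the example are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def device_numa_node(bdf, sysfs=Path("/sys")):
    """Read the NUMA node of a PCI device; the kernel reports -1 if unknown."""
    return int((sysfs / "bus/pci/devices" / bdf / "numa_node").read_text())


def node_cpus(node, sysfs=Path("/sys")):
    """Return the set of logical CPUs that belong to a NUMA node."""
    path = sysfs / f"devices/system/node/node{node}/cpulist"
    return parse_cpulist(path.read_text())


def remote_cores(cores, local_cpus):
    """Return the cores that are not local to the device's NUMA node."""
    return sorted(set(cores) - local_cpus)


if __name__ == "__main__":
    bdf = "0000:5e:00.0"      # hypothetical PCI address of an NVMe SSD
    app_cores = [2, 9, 20]    # hypothetical cores used by the SPDK application
    node = device_numa_node(bdf)
    if node < 0:
        print("NUMA node of the device is not reported")
    else:
        print("Cores remote to the device:", remote_cores(app_cores, node_cpus(node)))
```

If the check reports remote cores, consider moving the SPDK application to cores on the same node as the device.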
