Use the Input and Output analysis in Intel VTune Amplifier to profile SPDK IO API calls, analyze PCIe traffic, and identify IO performance issues such as inefficient accesses to remote sockets or under-utilized throughput of an SPDK device.
Prerequisites: For the analysis to succeed, make sure SPDK is built with the --with-vtune option.
VTune Amplifier helps you optimize the following SPDK usage models:
SPDK vhost-scsi to provide optimized block storage to VMs
SPDK NVMe to optimize access to the locally attached storage
NVM Express* over Fabrics
For SPDK analysis, consider the following workflow:
Identify Low SPDK Throughput Utilization
Start your analysis with the Summary window, which displays overall SPDK performance statistics for each executed operation type. Expand an operation block to identify potential IO performance imbalance among SSDs.
Explore the SPDK Throughput histogram to see how long your workload under-utilized SPDK throughput on each device.
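To make the idea behind the histogram concrete, the following minimal Python sketch bins throughput samples by utilization level and reports the time spent in each bin. It is an illustration only and does not reproduce how VTune Amplifier computes the histogram. The function name, the sample values, the peak throughput, the sampling interval, and the number of bins are all assumptions made for this example.

```python
def utilization_histogram(samples_mb_s, peak_mb_s, interval_s=1.0, bins=5):
    """Return the seconds spent in each utilization bin.

    Bin 0 holds the lowest utilization, bin `bins - 1` the highest.
    All parameters are hypothetical inputs, not values exported by VTune.
    """
    if peak_mb_s <= 0:
        raise ValueError("peak_mb_s must be positive")
    seconds = [0.0] * bins
    for value in samples_mb_s:
        # Clamp the ratio to [0, 1] so outliers land in the edge bins.
        ratio = min(max(value / peak_mb_s, 0.0), 1.0)
        index = min(int(ratio * bins), bins - 1)
        seconds[index] += interval_s
    return seconds


# Hypothetical per-second throughput samples (MB/s) for one device.
samples = [2400, 2300, 600, 550, 450, 2500, 1200, 300]
hist = utilization_histogram(samples, peak_mb_s=2500)
# hist == [2.0, 2.0, 1.0, 0.0, 3.0]
```

In this made-up data set, the device spends four of eight seconds in the two lowest bins, which is the kind of pattern the Low utilization area of the histogram highlights.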
Next, switch to the Bottom-up window and filter the Timeline view by the Low SPDK Throughput Utilization metric to see how throughput under-utilization correlates with SPDK IO API calls and the PCIe traffic breakdown per physical device.
Locate an area of recession (Low SPDK Throughput markers with a long duration) on the timeline and zoom in to see performance changes for IO communications (for example, drops for SPDK operations). Right-click and select the Filter In by Selection menu option.
When the Bottom-up view is filtered in, you can apply the Function grouping to the grid and identify the functions executed during the selected time frame. Double-click the function with the highest CPU time value to drill down to the source view and analyze the code.
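The "area of recession" described above is essentially the longest stretch of consecutive low-throughput samples. The sketch below shows one way to locate such a stretch in a series of samples. The function name, the threshold, and the data are hypothetical, and the code is not tied to any VTune export format.

```python
def longest_low_stretch(samples_mb_s, threshold_mb_s):
    """Return (start_index, length) of the longest run below the threshold.

    Returns (-1, 0) when no sample falls below the threshold.
    """
    best_start, best_len = -1, 0
    run_start = None
    for i, value in enumerate(samples_mb_s):
        if value < threshold_mb_s:
            if run_start is None:
                run_start = i  # a new low-throughput run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # the run is interrupted
    return best_start, best_len


# Hypothetical per-second throughput samples (MB/s).
timeline = [2400, 600, 2300, 500, 450, 300, 2500]
start, length = longest_low_stretch(timeline, threshold_mb_s=1000)
# start == 3, length == 3
```

The returned range corresponds to the time frame you would select on the timeline before applying Filter In by Selection.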
Identify IO Misconfiguration Issues on Multi-Socket Systems
Use the Platform window to analyze whether your SPDK workload is configured properly for a multi-socket system. To do this, switch to the Package/Core/H/W Context grouping in the legend pane to track IO performance per package.
The example below illustrates an inefficient IO flow in which an SPDK device and the core that consumes or produces its data belong to different packages. As a result, you see high UPI Bandwidth values, which signal heavy utilization of the interconnect.
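Outside of VTune Amplifier, you can cross-check this kind of misconfiguration on Linux by comparing the NUMA node of the PCIe device with the NUMA node of the core that drives it. The sketch below reads the standard sysfs attributes numa_node (per PCI device) and cpulist (per NUMA node). The helper names and the sysfs_root parameter are assumptions introduced for this example. The sketch also assumes that one NUMA node maps to one package, which does not hold when sub-NUMA clustering is enabled.

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pci_numa_node(bdf, sysfs_root="/sys"):
    """Return the NUMA node of a PCI device given as 'dddd:bb:dd.f'."""
    path = Path(sysfs_root) / "bus/pci/devices" / bdf / "numa_node"
    return int(path.read_text().strip())


def node_cpus(node, sysfs_root="/sys"):
    """Return the set of CPU ids that belong to a NUMA node."""
    path = Path(sysfs_root) / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def is_core_local(bdf, core, sysfs_root="/sys"):
    """Return True if the core is on the device's NUMA node.

    Returns None when the kernel reports no affinity (numa_node is -1).
    """
    node = pci_numa_node(bdf, sysfs_root)
    if node < 0:
        return None
    return core in node_cpus(node, sysfs_root)
```

For example, calling is_core_local("0000:5e:00.0", 2) with a hypothetical device address returns False when core 2 sits on a different node than the device, which is the situation that produces the high UPI Bandwidth values described above.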