Input and Output Analysis
- Imbalance between I/O and compute operations (HPC applications)
- Long latency of I/O requests (transactional workloads)
- Hardware utilization (streaming)
- Data plane utilization (applications supporting DPDK framework). You can analyze how your application utilizes NIC ports, bandwidth, PCIe, and UPI.
- I/O performance issues that may be caused by ineffective accesses to remote sockets or under-utilized throughput of an SPDK device
System Disk IO API Metrics
- I/O Wait system-wide metric (Linux* targets only) shows the time when system cores are idle but there are threads in a context switch caused by I/O access.
- I/O Queue Depth metric shows the number of I/O requests submitted to the storage device. Zero requests in a queue means that there are no requests scheduled and the disk is not used at all.
- I/O Data Transfer metric shows the number of bytes read from or written to the storage.
- Page Faults metric (Linux targets only) shows the number of page faults that occurred on the system. This metric is useful when analyzing access to memory-mapped files.
- CPU Activity metric defines a portion of time the system spent in the following states:
  - Idle state - the CPU core is idle.
  - Active state - the CPU core is executing a thread.
  - I/O Wait (Linux targets only) - the CPU core is idle but there is a thread, blocked by an access to the disk, that could be potentially executed on this core.
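The three-way split behind the CPU Activity metric can be illustrated with a short sketch. This is not how VTune Profiler collects the metric (it relies on FTrace on Linux). As an assumption made only for illustration, the sketch uses two snapshots of the aggregate `cpu` line from Linux `/proc/stat` as a stand-in data source. The function name `cpu_activity` and the sample values are hypothetical.

```python
def cpu_activity(stat_before: str, stat_after: str) -> dict:
    """Split the interval between two aggregate 'cpu' lines of /proc/stat
    into Idle, Active, and I/O Wait fractions (illustrative only)."""

    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(v) for v in parts[1:]]

    before, after = fields(stat_before), fields(stat_after)
    delta = [a - b for a, b in zip(after, before)]

    # /proc/stat field order: user nice system idle iowait irq softirq steal
    idle, iowait = delta[3], delta[4]
    total = sum(delta[:8])  # guest time is already included in user/nice
    if total <= 0:
        raise ValueError("snapshots must span a non-zero interval")

    active = total - idle - iowait
    return {
        "Idle": idle / total,
        "Active": active / total,
        "I/O Wait": iowait / total,
    }


# Hypothetical snapshots taken some time apart.
first = "cpu 100 0 50 800 50 0 0 0 0 0"
second = "cpu 160 0 70 1050 120 0 0 0 0 0"
for state, share in cpu_activity(first, second).items():
    print(f"{state}: {share:.1%}")
```

A high I/O Wait share combined with a low Active share suggests that cores are mostly waiting on disk accesses rather than executing threads.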
SPDK IO API Metrics (Linux* Only)
- SPDK Throughput Utilization metric helps identify under-utilization of the SPDK device throughput. You can use the Timeline view to correlate areas of low SPDK throughput utilization with SPDK IO API calls and the PCIe traffic breakdown, and understand whether IO communications caused performance changes.
- SPDK Effective Time metric shows the amount of time SPDK effectively interacts with devices.
DPDK IO API Metrics
- DPDK Rx Spin Time metric shows (on a per-thread basis) the portion of rte_eth_rx_burst(...) function calls that return zero packets, which is identical to the fraction of polling loop iterations that provide no packets.
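The ratio described above can be sketched in a few lines. This is a minimal illustration, not VTune Profiler's implementation. The function name `rx_spin_fraction` and the input list are hypothetical. The list stands in for the values returned by successive rte_eth_rx_burst(...) calls on one thread.

```python
from typing import Sequence


def rx_spin_fraction(packets_per_call: Sequence[int]) -> float:
    """Return the fraction of polling calls that received zero packets.

    packets_per_call is a hypothetical record of the value returned by
    each rte_eth_rx_burst(...) call on a single thread.
    """
    if not packets_per_call:
        return 0.0  # no polling took place, so no time was spent spinning
    empty_calls = sum(1 for n in packets_per_call if n == 0)
    return empty_calls / len(packets_per_call)


# Hypothetical polling trace: 8 calls, 5 of which returned no packets.
trace = [0, 0, 32, 0, 4, 0, 0, 16]
print(f"Rx Spin Time: {rx_spin_fraction(trace):.1%}")  # prints "Rx Spin Time: 62.5%"
```

A value close to 100% means the polling thread rarely finds packets to process, so the core is mostly spinning.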
Configure and Run Analysis
- Create a VTune Profiler project and specify your analysis system and target (application, process, or system). Note that irrespective of the target type you select, VTune Profiler automatically enables the Analyze system-wide target option to collect system-wide metrics for the Input and Output analysis.
- For System Disk IO analysis:
- On Linux*: To collect system disk I/O metrics, VTune Profiler enables FTrace* collection, which requires access to debugfs. On some systems, this requires you to reconfigure your permissions by running the script located in the bin directory, or to use root privileges.
- For SPDK IO analysis: Make sure SPDK is built using the --with-vtune option.
- Click the Configure Analysis button on the VTune Profiler toolbar (in the standalone GUI or the Visual Studio IDE). The Configure Analysis window opens.
- From the HOW pane, click the Browse button and select Platform Analysis > Input and Output. The corresponding analysis configuration opens.
- Depending on your target app and analysis purpose, choose any of the following configuration options:
  - Select IO API type to profile: By default, VTune Profiler profiles System Disk IO API. For DPDK applications, select DPDK IO API. For SPDK applications, select SPDK IO API.
  - Analyze PCIe bandwidth check box: Collect the events required to compute PCIe bandwidth. This option is shown only on server platforms based on Intel microarchitecture code name Sandy Bridge EP and newer. Where applicable, the option is disabled by default.
  - Analyze memory bandwidth check box: Collect the data required to compute memory bandwidth. The option is enabled by default.
  - Evaluate max DRAM bandwidth check box: Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds. The option is enabled by default.
  - Note: SPDK and System Disk IO analysis cannot be run simultaneously.
- Identify low SPDK throughput utilization
- Identify I/O misconfiguration issues on multi-socket systems
- Analyze the Rx Spin Time and Rx Batch Statistics to characterize core utilization in terms of packet receiving.
- Analyze the PCIe Bandwidth to estimate the inbound and outbound NIC traffic.
- Ensure the DRAM Bandwidth value is low enough, which means that Intel® Data Direct I/O Technology (Intel DDIO) and Last Level Cache work properly.
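To make the packet-receiving characterization more concrete, the sketch below builds a batch-size distribution from a polling trace. It assumes that Rx Batch Statistics describes how often each packet count is returned by a receive call, which this text does not spell out. The function name `rx_batch_histogram` and the sample trace are hypothetical, and the code is an illustration rather than VTune Profiler's own computation.

```python
from collections import Counter
from typing import Dict, Sequence


def rx_batch_histogram(packets_per_call: Sequence[int]) -> Dict[int, float]:
    """Map each batch size to the share of polling calls that returned it.

    packets_per_call is a hypothetical record of packets received per
    polling call on one thread.
    """
    if not packets_per_call:
        return {}
    counts = Counter(packets_per_call)
    total = len(packets_per_call)
    # Sort by batch size so the distribution reads from empty to full bursts.
    return {size: counts[size] / total for size in sorted(counts)}


# Hypothetical trace: mostly empty polls with a few non-empty bursts.
trace = [0, 0, 32, 0, 4, 0, 0, 16]
for size, share in rx_batch_histogram(trace).items():
    print(f"{size:>3} packets: {share:.1%} of calls")
```

A distribution concentrated at zero points to an under-loaded polling core, while one concentrated at the maximum burst size suggests the core may be close to its receive capacity.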