Core Utilization in DPDK Apps
- Application: a DPDK testpmd app running on one core and performing L2 forwarding. The application is compiled against DPDK with VTune Amplifier profiling enabled.
- DPDK with VTune Amplifier profiling support enabled. VTune Amplifier profiling support has been integrated into DPDK since version 18.11. When using earlier versions, apply the attached patches (available for versions 17.11, 18.02, and 18.05). To enable profiling on the DPDK side, allow the VTune Amplifier to attach to the DPDK polling cycle. To do this, reconfigure and recompile DPDK (and the target application) with the CONFIG_RTE_ETHDEV_RXTX_CALLBACKS and CONFIG_RTE_ETHDEV_PROFILE_WITH_VTUNE flags enabled (located in the config/common_base config file).
- Intel® VTune™ Amplifier 2019: Input and Output analysis
- All the Cookbook recipes are scalable and can be applied to Intel VTune Amplifier 2018 and higher. Slight version-specific configuration changes are possible.
- Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
- Operating system: a test system that consists of a traffic generator (GEN in the picture below) providing 64-byte frames and a packet receiver (SUT, the system under test), connected via a 40 GbE link. The SUT performs L2 forwarding of packets.
- CPU: Intel® Xeon® Platinum 8180 (38.5M Cache, 2.5 GHz, 28 cores)
Run Input and Output Analysis
amplxe-cl -collect io -knob kernel-stack=false -knob dpdk=true -knob collect-pcie-bandwidth=true -knob collect-memory-bandwidth=false -knob dram-bandwidth-limits=false --target-process=testpmd
Analyze Core Utilization with the DPDK Rx Spin Time Metric
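As a rough illustration of what a spin-time style metric measures, the sketch below counts how many polls of the receive queue return zero packets. It assumes the metric can be read as the share of polling iterations that deliver nothing; the structure and function names (rx_poll_stats, rx_poll_record, rx_spin_fraction) are hypothetical helpers for this example and are not part of DPDK or VTune Amplifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters kept by a polling loop; not a DPDK structure. */
struct rx_poll_stats {
    uint64_t total_polls;  /* every rte_eth_rx_burst()-style call */
    uint64_t empty_polls;  /* calls that returned zero packets */
};

/* Record the result of one poll; nb_rx is the number of packets returned. */
void rx_poll_record(struct rx_poll_stats *s, uint16_t nb_rx)
{
    assert(s != NULL);
    s->total_polls++;
    if (nb_rx == 0)
        s->empty_polls++;
}

/* Fraction of polls that returned nothing, in the range 0.0 to 1.0. */
double rx_spin_fraction(const struct rx_poll_stats *s)
{
    assert(s != NULL);
    if (s->total_polls == 0)
        return 0.0;  /* no polls yet, so nothing was wasted */
    return (double)s->empty_polls / (double)s->total_polls;
}
```

In a real application, rx_poll_record() would be called with the return value of each rte_eth_rx_burst() call. A fraction close to 1.0 would suggest the core spends most of its polling iterations without receiving any packets.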
Analyze Packets Retrieval with DPDK Rx Batch Statistics Histogram
Understand Rx Operations and Investigate Rx Peaks
- 4 x 32 Byte descriptors or 8 x 16 Byte descriptors are completed.
- A descriptor is invalidated in the internal NIC cache.
- 32 Byte Rx descriptor: Most of the rte_eth_rx_burst() calls receive 4 packets.
- 16 Byte Rx descriptor: Most of the rte_eth_rx_burst() calls receive 8 packets.
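The relationship between descriptor size and the most common batch size can be sketched in a few lines of C. The example below is a minimal model, not DPDK or VTune Amplifier code: it keeps a histogram of packets returned per poll and compares its peak with the group size listed above (4 for 32 Byte descriptors, 8 for 16 Byte descriptors). All names (MAX_BURST, descs_per_writeback, rx_batch_hist, rx_batch_record, rx_batch_mode) are hypothetical, and the burst limit of 32 is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BURST 32  /* assumed upper bound on packets per poll */

/* Descriptors completed per group for the two cases named in the text:
 * 4 x 32 Byte or 8 x 16 Byte. Returns 0 for any other descriptor size. */
unsigned descs_per_writeback(unsigned desc_size_bytes)
{
    switch (desc_size_bytes) {
    case 32: return 4;
    case 16: return 8;
    default: return 0;
    }
}

/* bins[n] counts polls that returned exactly n packets. */
struct rx_batch_hist {
    uint64_t bins[MAX_BURST + 1];
};

/* Record one poll result; nb_rx must not exceed MAX_BURST. */
void rx_batch_record(struct rx_batch_hist *h, uint16_t nb_rx)
{
    assert(h != NULL && nb_rx <= MAX_BURST);
    h->bins[nb_rx]++;
}

/* Most frequent non-zero batch size, or 0 if no packets were seen. */
unsigned rx_batch_mode(const struct rx_batch_hist *h)
{
    unsigned best = 0;
    uint64_t best_count = 0;

    assert(h != NULL);
    for (unsigned n = 1; n <= MAX_BURST; n++) {
        if (h->bins[n] > best_count) {
            best_count = h->bins[n];
            best = n;
        }
    }
    return best;
}
```

Feeding the return value of each rte_eth_rx_burst() call into rx_batch_record() would give a histogram comparable in spirit to the Rx batch statistics. If the peak reported by rx_batch_mode() matches descs_per_writeback() for the configured descriptor size, the batching is likely driven by how the NIC completes descriptors rather than by the application.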