DPDK Application Profiling using Intel® VTune™ Amplifier

Overview

Intel® VTune™ Amplifier is a powerful tool for profiling applications and finding performance overhead. This video gives a step-by-step demonstration of how to use Intel® VTune™ Amplifier to find performance bottlenecks in an application and to locate where in the source code changes are needed to improve performance.

Resources

Download and learn more about Intel® VTune™ Amplifier at https://software.intel.com/en-us/intel-vtune-amplifier-xe

Transcript

Title: Using the VTune tool for application performance analysis

In this demo setup, I've already installed VTune, and I will be using the DPDK sample app testpmd for profiling.

Make sure you build a debug version of your application; this allows VTune to collect more information for you.

You can build a debug version by adding the "-g" flag to your application's makefile.
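As a rough illustration of the step above, the "-g" flag can also be passed from the shell instead of editing the makefile. This sketch assumes the build honours an EXTRA_CFLAGS variable, as DPDK's make-based build does; other build systems may use a different variable name.

```shell
# Assumption: the build reads extra compiler flags from EXTRA_CFLAGS.
# Append -g to whatever is already set, so existing flags are kept.
EXTRA_CFLAGS="${EXTRA_CFLAGS:+$EXTRA_CFLAGS }-g"
export EXTRA_CFLAGS
echo "EXTRA_CFLAGS=${EXTRA_CFLAGS}"

# Then rebuild from inside your build tree, for example:
#   make EXTRA_CFLAGS="${EXTRA_CFLAGS}"
```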

Run the application that you want to profile, which in my case is testpmd.

Now the DPDK sample application testpmd is running.

Run VTune to profile testpmd.

For that, first set up the environment to run VTune.
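The exact command is shown on screen in the video rather than in this transcript. A minimal sketch of this step, assuming a VTune Amplifier XE installation that provides the amplxe-vars.sh environment script; the install path below is an assumption and may differ on your system.

```shell
# Assumption: VTune is installed under this path; override VTUNE_DIR if not.
VTUNE_DIR="${VTUNE_DIR:-/opt/intel/vtune_amplifier_xe}"

if [ -f "${VTUNE_DIR}/amplxe-vars.sh" ]; then
    # Source (not execute) the script so PATH changes apply to this shell.
    . "${VTUNE_DIR}/amplxe-vars.sh"
else
    echo "amplxe-vars.sh not found under ${VTUNE_DIR}" >&2
fi
```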

Next, to profile testpmd, we can use either the VTune GUI or the command line. To use the GUI, we can use the amplxe-gui command. However, I will use the command-line option here, because I want to capture information for only a few particular performance counters.

This is the VTune script I will run from the command line.

It says: collect samples for the target process named "testpmd" for a duration of 40 seconds, for the performance counters mentioned above.
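The script itself appears on screen in the video and is not part of this transcript. The following is only a sketch of what such a command might look like, assuming the amplxe-cl collector with a custom event list. The event names are placeholders, not the counters used in the demo, and must be replaced with events that are valid for your CPU.

```shell
TARGET="testpmd"   # process name, from the transcript
DURATION=40        # seconds, from the transcript
# Placeholder event names (hypothetical); real names depend on the CPU generation.
EVENTS="${EVENTS:-EVENT_A,EVENT_B}"

# Build the command as positional parameters so it can be printed before running.
set -- amplxe-cl -collect-with runsa \
    -knob "event-config=${EVENTS}" \
    -target-process "${TARGET}" \
    -duration "${DURATION}"

VTUNE_CMD="$*"
echo "${VTUNE_CMD}"

# Run only if real events were supplied and amplxe-cl is on PATH.
if [ "${EVENTS}" != "EVENT_A,EVENT_B" ] && command -v amplxe-cl >/dev/null 2>&1; then
    "$@"
fi
```

With the placeholders left in place, the sketch only prints the command, which makes it easy to check the options before attaching to a running process.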

Let's run this script. VTune is now observing testpmd for 40 seconds and will then give us the result.

After getting the results, scroll up a bit. You can see the path to the directory where the result has been stored.

You can also find more details, such as the operating system used, hardware events, etc. Now let's open this result in graphical user interface mode.

Go to "open results" and give the path to where the recently captured result has been stored.
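As an alternative to browsing for the result inside the GUI, the result directory can usually be passed to amplxe-gui directly. A small sketch, where RESULT_DIR is a placeholder for the path printed by the collector:

```shell
# Placeholder: replace with the result directory reported at the end of collection.
RESULT_DIR="${RESULT_DIR:-/path/to/result}"

GUI_CMD="amplxe-gui ${RESULT_DIR}"
echo "${GUI_CMD}"

# Launch the GUI only when the directory exists and amplxe-gui is on PATH.
if [ -d "${RESULT_DIR}" ] && command -v amplxe-gui >/dev/null 2>&1; then
    amplxe-gui "${RESULT_DIR}"
fi
```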

So here are the performance counter results captured using VTune. We can see that L1 and L2 misses occur.

Then you can check the hotspots information. This section gives more information about the most active functions in your application and how your app's CPU usage varied over time.

If you're interested in more specific and detailed information about your app's most active functions, click on Bottom-up.

Because we profiled the debug version of the application, we can click on a source code file and see which region of the code we need to change to fix the performance degradation.

Check out the other tabs; there is a lot of useful information there as well.

This completes this video on "Using VTune for application performance analysis".

In summary, you saw how to run VTune from the command line and capture results for application profiling and analysis.

You also saw how running VTune on the "debug" version of the application provided more detailed information for code optimization.

Product and Performance Information

1

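Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804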