Examples of adding external data to VTune™ Amplifier and using a custom collector

VTune™ Amplifier XE supports two kinds of performance data collection: “algorithm level tuning” and “microarchitecture level tuning”.
1. “Algorithm level tuning” includes Hotspots analysis, Concurrency analysis and Locks and Waits analysis. Hotspots analysis helps the user identify the functions that consume the most CPU time. Concurrency analysis shows the workload on every thread (core) and the transitions between threads. Locks and Waits analysis reports time spent on synchronization objects, e.g. time spent on “locks” that might stall other threads. It also provides wait times and wait counts for each object.
2. “Microarchitecture level tuning” uses hardware PMU event-based sampling, which helps identify problems caused by inefficient use of the microprocessor, such as cache misses and branch mispredictions.

Sometimes it is useful to collect other system performance data externally, such as “memory available” or “page faults per second”. Starting with VTune Amplifier XE 2013 Update 16, this data can be imported and displayed in the VTune Amplifier report.

VTune Amplifier XE supports two methods of incorporating external data into an existing VTune Amplifier result.

1. Add external data to VTune Amplifier (example on Windows*)
The Windows utility typeperf writes performance counter data to the command window or saves it to a log file. This example uses typeperf to generate counter data, converts the data to the format that VTune Amplifier expects, and finally imports it into a VTune Amplifier result.
Steps
a. Use “typeperf -qx” to display all supported counters. This example uses “\Processor(_Total)\% Processor Time”.
b. At a command prompt, run “typeperf -sc 10 "\Processor(_Total)\% Processor Time"” to collect 10 samples.
c. At the same time, use VTune Amplifier’s advanced-hotspots collector to profile an application.
d. typeperf prints output like the following:
"(PDH-CSV 4.0)","\\ZWANG14-MOBL2\Processor(_Total)\% Processor Time"
"04/15/2014 08:03:09.626","47.349660"
"04/15/2014 08:03:10.639","68.430207"
"04/15/2014 08:03:11.658","90.259670"
"04/15/2014 08:03:12.660","100.000000"

"04/15/2014 08:03:13.661","25.583946"
"04/15/2014 08:03:14.664","38.747117"
"04/15/2014 08:03:15.669","36.556477"
"04/15/2014 08:03:16.669","38.379610"
"04/15/2014 08:03:17.671","33.442685"
"04/15/2014 08:03:18.698","80.651625"

e. Then check the profiling period in the Summary report of VTune Amplifier. See Figure-1.

f. Following the CSV format that VTune Amplifier requires, a file named processortime-hostname-zwang14.csv was created with a header line and three data lines (the typeperf samples kept for the profiling period):
tsc.UTC,ProcessorTime.COUNT,pid,tid
2014-04-15 08:03:10.639,68.430207,,
2014-04-15 08:03:11.658,90.259670,,
2014-04-15 08:03:12.660,100.000000,,
g. Now import this CSV file into the existing VTune Amplifier result. Click the “Analysis Type” tab of the report, then click the “Import from CSV” button and select processortime-hostname-zwang14.csv. See Figure-2.

 

h. Figure-3 shows the external data after it was imported and added to the timeline.
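
The manual conversion in steps d through f can also be scripted. The following is a minimal Python sketch, not part of VTune Amplifier: it reads typeperf's PDH-CSV text, keeps only the samples inside a given time window, and writes rows in the four-column layout shown in step f. The function name typeperf_to_vtune_csv and the window boundaries are hypothetical; take the real window from the profiling period in the Summary report.

```python
import csv
import io
from datetime import datetime


def typeperf_to_vtune_csv(pdh_csv_text, counter_name, start, end):
    """Convert typeperf PDH-CSV text to the CSV layout shown in step f.

    Only samples whose timestamp lies in [start, end] are kept.
    """
    out = io.StringIO()
    out.write(f"tsc.UTC,{counter_name}.COUNT,pid,tid\n")
    reader = csv.reader(io.StringIO(pdh_csv_text))
    next(reader, None)  # skip the "(PDH-CSV 4.0)" header row
    for row in reader:
        if len(row) < 2:
            continue  # blank line or a line without a value column
        try:
            stamp = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S.%f")
        except ValueError:
            continue  # not a data row (e.g. a status message)
        if start <= stamp <= end:
            # Reformat as YYYY-MM-DD HH:MM:SS.mmm and leave pid/tid empty.
            ts = stamp.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
            out.write(f"{ts},{row[1]},,\n")
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '"(PDH-CSV 4.0)","\\\\ZWANG14-MOBL2\\Processor(_Total)\\% Processor Time"\n'
        '"04/15/2014 08:03:09.626","47.349660"\n'
        '"04/15/2014 08:03:10.639","68.430207"\n'
    )
    # Assumed window; replace with the period reported by VTune Amplifier.
    window_start = datetime(2014, 4, 15, 8, 3, 10)
    window_end = datetime(2014, 4, 15, 8, 3, 13)
    print(typeperf_to_vtune_csv(sample, "ProcessorTime", window_start, window_end))
```

Save the returned text under a file name that follows the pattern used in step f before importing it.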

2. Using a custom collector (example on Linux)
This example uses a script that is run by Python. By default, the script measures “page faults per second”. Extract the attached vmstat.zip to any folder on your machine.
From the command line, you can run “amplxe-cl -collect advanced-hotspots -custom-collector="python vmstat.py" -- /home/peter/problem_report/primes.icc”.
In the VTune Amplifier GUI, create the project or open its properties and click the “Target” tab. Then scroll down to the bottom and enter “python vmstat.py” in the “Custom collector:” field. See Figure-4.

The advantage of running a custom collector together with VTune Amplifier’s own collector is that the CSV result file is copied to the result directory automatically. When you open the result, the external data is displayed in the timeline pane of the Bottom-up report. See Figure-5.
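
The contents of vmstat.py are not reproduced in this article. As an illustration only, the Python sketch below shows one way a page-fault sampler could work on Linux: it reads the cumulative pgfault counter from /proc/vmstat, turns successive readings into a per-second rate, and writes rows in the same four-column CSV layout used in the typeperf example. The function names, the PageFaults counter name and the output path argument are all hypothetical, and the sketch does not show how a real custom collector learns the result directory or file name from VTune Amplifier.

```python
import time
from datetime import datetime, timezone


def read_pgfault(vmstat_text):
    """Return the cumulative 'pgfault' counter from /proc/vmstat-style text."""
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "pgfault":
            return int(parts[1])
    raise ValueError("pgfault counter not found")


def faults_per_second(prev_count, curr_count, elapsed_seconds):
    """Convert two cumulative readings into a rate."""
    return (curr_count - prev_count) / elapsed_seconds


def format_row(utc_time, value):
    """Format one CSV data row; pid and tid are left empty."""
    stamp = utc_time.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    return f"{stamp},{value:.6f},,\n"


def collect(out_path, interval=1.0, samples=5, source="/proc/vmstat"):
    """Sample the page-fault rate `samples` times and write it as CSV."""
    with open(out_path, "w") as out:
        # Hypothetical counter name, mirroring the header pattern above.
        out.write("tsc.UTC,PageFaults.COUNT,pid,tid\n")
        with open(source) as f:
            prev = read_pgfault(f.read())
        for _ in range(samples):
            time.sleep(interval)
            with open(source) as f:
                curr = read_pgfault(f.read())
            now = datetime.now(timezone.utc)
            out.write(format_row(now, faults_per_second(prev, curr, interval)))
            prev = curr
```

Calling collect("pagefaults.csv") on a Linux machine would write five one-second samples; the file name here is a placeholder, not the name VTune Amplifier expects.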
