Two Tips for Using VTune™ Amplifier XE

1.      Use the "-time-filter" option in VTune reports

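When you use VTune™ Amplifier XE, the tool launches the application and collects performance data. Note that collection starts as soon as the program is loaded, whereas the data is often meaningful only after the monitored program has reached a steady state. You can, of course, manually select the meaningful time range in the Bottom-up report of the GUI, using Zoom In and Filter by Selection. You can also use Start Paused / Resume during collection, resuming either interactively or with an amplxe-cl -command invocation. Both approaches are rather cumbersome.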

We recommend instead collecting data for the whole run and filtering on the time range of interest when the report is generated.

For example:

# amplxe-cl -collect concurrency -r r001cc -- ./primes.icc

# amplxe-cl -report summary -r r001cc

Summary

-------

Average Concurrency:  2.405

Elapsed Time:         0.890

CPU Time:             2.120

Wait Time:            0.889

CPU Usage:            2.388

# amplxe-cl -report hotspots -r r001cc -time-filter 0.3:0.5 -show-as=percent

amplxe: Using result path `/home/peter/problem_report/r001cc'

amplxe: Executing actions 50 % Generating a report                            

Function    Module      CPU Time:Self  CPU Time:Idle:Self  CPU Time:Poor:Self  CPU Time:Ok:Self  CPU Time:Ideal:Self  CPU Time:Over:Self  Wait Time:Self

----------  ----------  -------------  ------------------  ------------------  ----------------  -------------------  ------------------  --------------

findPrimes  primes.icc  100.0%         0%                  100.0%              0%                0%                   0%                  0

main        primes.icc  0              0                   0                   0                 0                    0                   100.0%

This makes it easy to filter out time ranges that are not representative.
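To compare several time ranges of the same result, the report command above can be generated in a loop. The following is a minimal Python sketch, not part of VTune: the helper names build_report_cmd and split_windows are hypothetical, only the flags shown in the example above are used, and reading "0.3:0.5" as start:end in seconds is an assumption inferred from that example. The commands are only printed, not executed.

```python
from typing import List, Tuple


def build_report_cmd(result_dir: str, start: float, end: float,
                     report: str = "hotspots") -> List[str]:
    """Return the argv for one time-filtered report (nothing is executed)."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return [
        "amplxe-cl", "-report", report,
        "-r", result_dir,
        # Assumption: the filter is written as start:end, as in 0.3:0.5 above.
        "-time-filter", f"{start:g}:{end:g}",
        "-show-as=percent",
    ]


def split_windows(elapsed: float, count: int) -> List[Tuple[float, float]]:
    """Split [0, elapsed] into `count` equal, contiguous windows."""
    if count <= 0:
        raise ValueError("count must be positive")
    step = elapsed / count
    return [(round(i * step, 3), round((i + 1) * step, 3))
            for i in range(count)]


if __name__ == "__main__":
    # 0.890 is the Elapsed Time reported in the summary above.
    for start, end in split_windows(0.890, 3):
        print(" ".join(build_report_cmd("r001cc", start, end)))
```

Printing the commands first makes it easy to check the windows before running them by hand or through a shell.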

 

2.      Use frequency and sleep analysis

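When testing the performance of a platform, we often want to know which CPU frequencies were in use during a given period and for how long. VTune™ Amplifier XE 2013 makes this kind of analysis possible, for example: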

# amplxe-cl -collect frequency -- firefox

# amplxe-cl -report frequency-analysis -r r001frequency

--------

 Summary

--------

 

For All CPUs

------------

Frequency  Turbo Frequency     Time

---------  ---------------  -------

1.596GHz   Normal           161.920

3.325GHz   Turbo              4.464

1.862GHz   Normal             0.192

2.793GHz   Normal             0.173

3.059GHz   Normal             0.081

1.995GHz   Normal             0.048

2.527GHz   Normal             0.029

3.192GHz   Turbo              0.026

2.261GHz   Normal             0.020

2.128GHz   Normal             0.017

2.926GHz   Normal             0.017

2.394GHz   Normal             0.010

2.66GHz    Normal             0.005

1.729GHz   Normal             0.003

 

For Per-CPU

-----------

Core       Time  1.596GHz:Time  1.729GHz:Time  1.862GHz:Time  1.995GHz:Time  2.128GHz:Time  2.261GHz:Time  2.394GHz:Time  2.527GHz:Time  2.66GHz:Time  2.793GHz:Time  2.926GHz:Time  3.059GHz:Time  3.192GHz:Time  3.325GHz:Time

-------  ------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------

core_0   27.834         26.189              0          0.032          0.020              0          0.010          0.010              0             0          0.020              0          0.065          0.008          1.479

core_1   27.834         27.834              0              0              0              0              0              0              0             0              0              0              0              0          0.000

core_10  27.834         27.834              0              0              0              0              0              0              0             0              0              0              0              0          0.000

core_2   27.834         27.518              0              0              0              0              0              0              0             0              0              0              0              0          0.316

core_8   27.834         26.913          0.003          0.139          0.027          0.017          0.010              0          0.007         0.005          0.028          0.017          0.016          0.018          0.633

core_9   27.834         25.631              0          0.020              0              0              0              0          0.022             0          0.125              0              0              0          2.036
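The "For All CPUs" table is easier to interpret as shares of total time. Below is a minimal Python sketch that parses rows in the layout shown above and sums the time share spent in Normal and Turbo frequencies. The function names are hypothetical, the sample contains only a few rows copied from the table, and the column layout is assumed to match this particular report.

```python
import re
from typing import Dict, List, Tuple

# A few rows copied from the "For All CPUs" table above.
SAMPLE = """\
Frequency  Turbo Frequency     Time
---------  ---------------  -------
1.596GHz   Normal           161.920
3.325GHz   Turbo              4.464
1.862GHz   Normal             0.192
3.192GHz   Turbo              0.026
"""

# Matches data rows such as "2.66GHz    Normal   0.005".
ROW = re.compile(
    r"^\s*(\d+(?:\.\d+)?)GHz\s+(Normal|Turbo)\s+(\d+(?:\.\d+)?)\s*$"
)


def parse_frequency_rows(text: str) -> List[Tuple[float, str, float]]:
    """Return (frequency in GHz, mode, time) for every data row found."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # headers, separators and blank lines are skipped
            rows.append((float(match.group(1)), match.group(2),
                         float(match.group(3))))
    return rows


def time_share_by_mode(rows: List[Tuple[float, str, float]]) -> Dict[str, float]:
    """Return the fraction of total time per mode (Normal / Turbo)."""
    total = sum(time for _, _, time in rows)
    if total == 0:
        return {}
    shares: Dict[str, float] = {}
    for _, mode, time in rows:
        shares[mode] = shares.get(mode, 0.0) + time / total
    return shares


if __name__ == "__main__":
    for mode, share in time_share_by_mode(parse_frequency_rows(SAMPLE)).items():
        print(f"{mode}: {share:.2%}")
```

Because blank lines and separators are skipped, the double-spaced report text can be fed in unchanged.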

 

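The other is sleep analysis. The tool collects the time (or weight) spent in each C-state (C0-C6) during a given period, the number of wake-ups, and the reason for each wake-up. For example: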

# amplxe-cl -collect sleep -- firefox

# amplxe-cl -report sleep-analysis -r r002sleep

 --------

 Summary

--------

 For All CPUs

------------

Sleep State  C-State Time  Wake-up Count  Wake-up Count per sec

-----------  ------------  -------------  ---------------------

C6                 98.327           5725                 46.740

C3                 16.344          10146                 82.834

C0                  7.096          16184                132.129

C1                  0.720            298                  2.433

 

 For Per-CPU

-----------

Core     C-State Time  C1:C-State Time  C0:C-State Time  C3:C-State Time  C6:C-State Time

-------  ------------  ---------------  ---------------  ---------------  ---------------

core_0         20.414            0.189            2.961            5.495           11.770

core_1         20.414            0.000            0.165            0.022           20.227

core_10        20.414            0.000            0.178            0.012           20.224

core_2         20.414            0.000            0.188            0.121           20.105

core_8         20.414            0.058            0.820            2.759           16.777

core_9         20.414            0.472            2.784            7.934            9.224

 

Core     Wake-up Count  C1:Wake-up Count  C0:Wake-up Count  C3:Wake-up Count  C6:Wake-up Count

-------  -------------  ----------------  ----------------  ----------------  ----------------

core_0            7617                86              3812              2541              1178

core_1             440                 3               220                27               190

core_10            474                 2               237                15               220

core_2             848                 4               424               139               281

core_8            6946                30              3474              1308              2134

core_9           16028               173              8017              6116              1722

 

 --------

 Details

--------

 CPU WakeUps By Reason

---------------------

Wake-up Reason  Wake-up Count

--------------  -------------

Timer                    4594

IRQ                      2647

Unknown                   606

 

CPU WakeUps By IRQ

------------------

Wake-up Object           IRQ  Wake-up Count

-----------------------  ---  -------------

IRQ 33 - eth0            33            2265

IRQ 32 - ahci            32             254

IRQ 7 - SCHED_SOFTIRQ    7              123

IRQ 8 - HRTIMER_SOFTIRQ  8                4

IRQ 19 - uhci_hcd:usb5   19               1

 

CPU WakeUps By Timer
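The sleep summary can be post-processed in the same way. The sketch below, in Python, parses the "For All CPUs" sleep table and computes the share of time spent in each C-state. The function names are hypothetical and the parser assumes the four-column layout shown in the report above.

```python
import re
from typing import Dict

# Rows copied from the "For All CPUs" sleep table above.
SAMPLE = """\
Sleep State  C-State Time  Wake-up Count  Wake-up Count per sec
-----------  ------------  -------------  ---------------------
C6                 98.327           5725                 46.740
C3                 16.344          10146                 82.834
C0                  7.096          16184                132.129
C1                  0.720            298                  2.433
"""

# State name, C-state time, wake-up count, wake-ups per second.
ROW = re.compile(
    r"^\s*(C\d+)\s+(\d+(?:\.\d+)?)\s+(\d+)\s+(\d+(?:\.\d+)?)\s*$"
)


def parse_sleep_rows(text: str) -> Dict[str, Dict[str, float]]:
    """Map each C-state to its time, wake-up count and wake-up rate."""
    states: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # headers, separators and blank lines are skipped
            states[match.group(1)] = {
                "time": float(match.group(2)),
                "wakeups": int(match.group(3)),
                "wakeups_per_sec": float(match.group(4)),
            }
    return states


def residency_share(states: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return the fraction of total C-state time spent in each state."""
    total = sum(entry["time"] for entry in states.values())
    if total == 0:
        return {}
    return {name: entry["time"] / total for name, entry in states.items()}


if __name__ == "__main__":
    shares = residency_share(parse_sleep_rows(SAMPLE))
    for name in sorted(shares, key=shares.get, reverse=True):
        print(f"{name}: {shares[name]:.1%}")
```

Sorting by share puts the dominant C-state first, which is usually the first thing to check in a sleep report.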
