Vtune Amplifier XE 2013: irrelevant values in "Estimated Call Count" column


Boris Sunik

I profiled our application with Lightweight Hotspots and got completely irrelevant values in the "Estimated Call Count" column.

Most routines were estimated as executed between 100,000,000 and 300,000,000 times, even though some of them were actually executed only a few times and others a few thousand times.

In another case, the data shows 3.5 seconds of execution time but zero in the "Estimated Call Count" column.

The processor is an Intel Xeon W3670; the OS is Windows 7.

Peter Wang (Intel)

If the estimated call counts in the top-down tree report are huge for a function that was called only a limited number of times, that can make sense: its subroutines may have been called many times, and their counts are included in the parent function.

If you see this in the bottom-up report, it is probably a bug, because there the estimated call counts are attributed to each hot function itself, not rolled up from its callees into the parents. Please submit a ticket at https://premier.intel.com with your test case.
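A minimal sketch of the inclusive-versus-self distinction described above, assuming (as the reply suggests) that a top-down view totals a function's own value plus the values of everything it calls: a parent called 3 times whose callee accounts for 1,000,000 ends up with a total of 1,000,003. The node struct and inclusive() function are hypothetical illustrations, not VTune's data model.

```c
#include <stddef.h>

/* Hypothetical top-down tree node; not VTune's internal data model. */
struct node {
    const char *name;
    unsigned long self;         /* value attributed to this function alone */
    struct node **children;     /* callees below it in the top-down tree */
    size_t nchildren;
};

/* Inclusive (total) value: the node's own value plus the inclusive
 * values of everything it calls, computed recursively. */
unsigned long inclusive(const struct node *n)
{
    unsigned long total = n->self;
    size_t i;

    for (i = 0; i < n->nchildren; i++)
        total += inclusive(n->children[i]);
    return total;
}
```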

Boris Sunik

I am evaluating Amplifier XE 2013. The application is not in my product list on Premier Support, and there is a 50 MB upload limit, while the project zip is 200 MB (1 GB of data). So I was unable to upload the file.

Dmitry Chichkov

I also gave Amplifier 2013 for Linux x64 a quick evaluation, and it looks like Lightweight Hotspots with stacks and call counts is completely broken. Here is a trivial test case:


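/* fA: 100,000 loop iterations per call; j is initialized and unsigned
   (fB's sum exceeds INT_MAX, which would be signed overflow) */
unsigned fA(void) { unsigned i, j = 0; for (i = 0; i < 100000; i++) j += i % 12345; return j; }

/* fB: 1,000,000 loop iterations per call, 10x the work of fA */
unsigned fB(void) { unsigned i, j = 0; for (i = 0; i < 1000000; i++) j += i % 12345; return j; }

int main(void)
{
    unsigned a, c = 0;

    /* Each function is called exactly 1000 times */
    for (a = 0; a < 1000; a++) { c += fA(); c += fB(); }

    /* Use the result so the loops are not optimized away */
    return (int)(c & 0xff);
}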

Functions appear twice in the call tree. The fA/fB call counts differ from each other and are off by an order of magnitude: 9372 / 3124, where both should be 1000. CPU time is zero. Wait times look weird. And so on. OS: Linux x64/Ubuntu 11.10; CPU: Core 2 Quad. Built with gcc -O0 -g. Profiled with a 1-minute estimated run time. Execution time ~7 seconds. Screenshot attached.
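To have ground truth to compare against the "Estimated Call Count" column, the test case can be instrumented with explicit counters. This is a minimal sketch: calls_fA, calls_fB and run_test_case are hypothetical names added for illustration, the functions use unsigned arithmetic because the running sum in fB exceeds INT_MAX (signed overflow is undefined behavior), and the counter increments add a little work, so the uninstrumented build is the one to profile.

```c
/* Explicit call counters: exact values to compare with the estimates. */
static unsigned long calls_fA = 0, calls_fB = 0;

/* Same loop as fA in the test case: 100,000 iterations per call. */
unsigned fA(void)
{
    unsigned i, j = 0;

    calls_fA++;
    for (i = 0; i < 100000; i++)
        j += i % 12345;
    return j;
}

/* Same loop as fB: 1,000,000 iterations per call (10x the work of fA). */
unsigned fB(void)
{
    unsigned i, j = 0;

    calls_fB++;
    for (i = 0; i < 1000000; i++)
        j += i % 12345;
    return j;
}

/* Same outer loop as main() in the test case, where outer is 1000. */
unsigned run_test_case(unsigned outer)
{
    unsigned a, c = 0;

    for (a = 0; a < outer; a++) {
        c += fA();
        c += fB();
    }
    return c;
}
```

Calling run_test_case(1000) leaves both counters at exactly 1000. Since fB does ten times as many loop iterations per call, a CPU time split of roughly 10:1 between fB and fA is the expected shape.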






Best, Dmitry

Attachment: lwhspt.jpg (134.13 KB)
Peter Wang (Intel)

Thank you for the example code.

I reproduced this on my machine and found two critical issues (I will update this thread if I find any clue or solution):
1. The caller function main() is missing from the list.
2. The call count is zero, which is wrong.

# amplxe-cl -version
Intel(R) VTune(TM) Amplifier XE 2013 (build 243421) Command Line Tool
Copyright (C) 2009-2012 Intel Corporation. All rights reserved.

# gcc -g test_callstack.c -o test_callstack
# amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- ./test_callstack

# amplxe-cl -report callstacks
Using result path `/home/peter/problem_report/r004lh'
Executing actions 50 % Generating a report
Function                            Call Stack Module         CPU Time:Total CPU Time:Self
----------------------------------- ---------- -------------- -------------- -------------
fB                                             test_callstack 90.58%         3.499

fA                                             test_callstack 9.26%          0.358

do_wp_page                                     vmlinux        0.1%           0.004

do_lookup_x                                    ld-2.5.so      0.02%          0.0007519

Dmitry Chichkov

Any updates?

Dmitry Chichkov

It has been 2 weeks... by any chance, are there any updates from the developers?

Peter Wang (Intel)

It looks like there was a VTune driver installation issue on the old Linux kernel, but the tool didn't report any message...

Solution: please use the call stack/call count features on a recent Linux OS.
I tried the same steps on another machine (RHEL 6) and saw the main() function in the report.

# amplxe-cl -report callstacks|more
Using result path `/home/peter/problem_report/r000lh'
Executing actions 50 % Generating a report
Function                   Call Stack               Module         CPU Time:Total CPU Time:Self
-------------------------- ------------------------ -------------- -------------- -------------
__libc_start_main                                   libc-2.12.so   99.59%         0

main                                                test_callstack 99.59%         0
                           __libc_start_main        libc-2.12.so   99.59%         0

fB                                                  test_callstack 90.42%         2.634
                           main                     test_callstack 90.42%         2.634
                           __libc_start_main        libc-2.12.so

Dmitry Chichkov

Interesting. Out of curiosity, what fA/fB call counts and fA/fB CPU times are you getting?

Incidentally, my call counts weren't zero like yours; they were just wrong by an order of magnitude (9400 instead of 1000). The installation was on x64, Linux 3.0.0-16-server, Ubuntu, Xeon E5345 [Clovertown].

Peter Wang (Intel)

Here is a screenshot of my result.

Attachment: callcount.jpg (112.78 KB)
Dmitry Chichkov

It looks like the results are more consistent in your case: you are getting similar count values for fA and fB. But the fA call count is 7,935 instead of the expected 1000. Any ideas on how to rectify that?

Peter Wang (Intel)

Call stacks and call counts are available only on Linux* kernel 2.6.28 or later. Please check your system, and note that the tool does not issue any warning about this.
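Checking the system against that requirement amounts to comparing the running kernel release (what uname -r prints) with 2.6.28. A minimal C sketch using uname(2); the function names are hypothetical, only the leading major.minor.patch numbers are parsed, and passing the check does not by itself show that the collector driver loaded correctly.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a kernel release string such as "2.6.32-279.14.1.el6.x86_64"
 * is 2.6.28 or newer, 0 if it is older, -1 if it cannot be parsed.
 * Only the leading major.minor.patch numbers are compared. */
int release_at_least_2_6_28(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 28;
}

/* Applies the check to the running kernel (the value uname -r prints).
 * Returns -1 if uname(2) fails. */
int running_kernel_supported(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least_2_6_28(u.release);
}
```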

Dave G.

I'm trying to run the analysis and don't get call counts at all. What am I missing?

As background, we just installed XE 2013 Update 2 (build 253325) for Linux on an RHEL 6.3 machine. uname -a gives: Linux a5.colo.ucirrus.com 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux.

I am running as root, because I can't get the driver loaded as myself, despite adding my login ID to the vtune group in /etc/group (which is all that had to be done on RHEL 5).
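One thing worth ruling out in the non-root case: a group added in /etc/group applies only to processes started from a new login session (or via newgrp), so a shell opened before the change still lacks it. A minimal C sketch that checks whether the current process actually holds a given group; the group name "vtune" is an assumption based on the description above, and the function names are hypothetical.

```c
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the calling process has gid as its effective group or as
 * one of its supplementary groups, 0 if not, -1 if getgroups() fails. */
int process_has_gid(gid_t gid)
{
    gid_t groups[256];
    int n, i;

    if (getegid() == gid)
        return 1;
    n = getgroups(256, groups);   /* fails (returns -1) if there are more */
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (groups[i] == gid)
            return 1;
    return 0;
}

/* Looks up a group by name, e.g. "vtune" (assumed name), and checks it.
 * Returns -1 if no such group exists. */
int process_in_group(const char *name)
{
    struct group *g = getgrnam(name);

    if (g == NULL)
        return -1;
    return process_has_gid(g->gr_gid);
}
```

If process_in_group("vtune") returns 0 in the shell used to start the collection, logging out and back in is the usual remedy; whether that is the cause here is not established in this thread.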

For the analysis type (Lightweight Hotspots), I selected Collect stacks, Estimate call counts, and Analyze user tasks.

The hardware is Nehalem/Westmere; /proc/cpuinfo shows the following:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 47
model name : Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
stepping : 2
cpu MHz : 2261.178
cache size : 24576 KB
physical id : 0
siblings : 20
core id : 0
cpu cores : 10
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4522.35
clflush size : 64
cache_alignment : 64
address sizes : 44 bits physical, 48 bits virtual
power management:

The system has 40 logical CPUs (20 physical cores plus Hyper-Threading).

When I launch the collection, I start it paused; when our program reaches the point where performance analysis should begin, I resume it, capture for about 10 seconds, and then stop it (in the VTune GUI).

The command that the VTune GUI says it executes is "amplxe-cl -collect nehalem-general-exploration -knob enable-stack-collection=true -- /home/daveg/SVN/trunk/bin/Release/_pvm". _pvm is built with the Intel C++ compiler.

I'm attaching the top part of the GUI output.

Attachment: capture.png (253.11 KB)
Peter Wang (Intel)

I'm confused: your screenshot shows lightweight-hotspots, but the command you quoted uses nehalem-general-exploration. So please use:
amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/daveg/SVN/trunk/bin/Release/_pvm
Note that you have to add the "-g" option to generate debug info when building _pvm.

Kyung_Seok L.

I have the same problem as Dave G.

I'm running the analysis and don't get "Estimated Call Count" values.

I installed XE 2013 (build 261256) for Linux on an Ubuntu 10.04 machine.

I am running as root.

For the analysis type (Lightweight Hotspots), I selected Collect stacks, Estimate call counts, and Analyze user tasks.

A capture of the GUI output is attached.

The command that the VTune GUI says it executes is "amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/jisung/test/test". test is built with gcc with the debug option.

CPU: 3rd generation Intel(R) Core(TM) processor family

Can anyone help me?

Attachment: 2012-12-28-164213.png (82.35 KB)
Peter Wang (Intel)

Is it possible that the "Estimated Call Count" column is simply off-screen and you need to scroll right?
If you are working on an old OS, estimated call stacks and estimated call counts are not supported there. Please try this feature on a recent OS.
Also run "lsmod | grep vtsspp" to ensure the driver has been installed.
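The "lsmod | grep vtsspp" check reads the same data as /proc/modules, where each line begins with a module name followed by a space. A minimal C sketch of that check that matches the whole name rather than any substring; the function names are hypothetical, and vtsspp is the only driver name taken from this thread.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a module called name appears in a /proc/modules-format
 * stream (one module per line, name first, then a space), else 0. */
int module_listed(FILE *modules, const char *name)
{
    char line[512];
    size_t len = strlen(name);
    int at_line_start = 1;

    while (fgets(line, sizeof line, modules) != NULL) {
        /* Only compare at the start of a real line, not in the tail of an
         * overlong line that fgets() split into several chunks. */
        if (at_line_start && strncmp(line, name, len) == 0 && line[len] == ' ')
            return 1;
        at_line_start = strchr(line, '\n') != NULL;
    }
    return 0;
}

/* Roughly "lsmod | grep vtsspp", but with an exact name match.
 * Returns -1 if /proc/modules cannot be opened (e.g. not Linux). */
int vtsspp_loaded(void)
{
    FILE *f = fopen("/proc/modules", "r");
    int found;

    if (f == NULL)
        return -1;
    found = module_listed(f, "vtsspp");
    fclose(f);
    return found;
}
```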

Kyung_Seok L.

Thank you for your reply, Peter.

My OS is Ubuntu 12.04.1 LTS, which is fairly recent.

I checked the driver with "lsmod | grep vtsspp".

What else should I check to see the "Estimated Call Count"?

Peter Wang (Intel)

There is nothing else to do; just add the options "-knob enable-stack-collection=true -knob enable-call-counts=true" to amplxe-cl, or enable them in the GUI.
If you still can't see call counts, submit your results to https://premier.intel.com for investigation.

Kyung_Seok L.

I recently found out why I didn't get Estimated Call Count information.

The problem was the code.

I used a very short test program, and VTune wasn't able to derive call count information from it.

When I profiled my actual project, there was no problem.

If anyone hits this problem, try a longer-running program.
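A minimal sketch of a test whose run time can be stretched from the command line, so that a sampling collector has enough samples to work with. work, run and parse_outer are hypothetical names; the inner loop mirrors the earlier test case, and a default of 1000 outer iterations (the same total loop count as fB there) is only a guess, since the thread does not say how long is long enough.

```c
#include <stdlib.h>

/* Same inner loop as the earlier test case: n iterations of i % 12345. */
static unsigned long work(unsigned long n)
{
    unsigned long i, acc = 0;

    for (i = 0; i < n; i++)
        acc += i % 12345;
    return acc;
}

/* Calls work() outer times with 1,000,000 iterations each; total run
 * time grows linearly with outer. */
unsigned long run(unsigned long outer)
{
    unsigned long a, c = 0;

    for (a = 0; a < outer; a++)
        c += work(1000000UL);
    return c;
}

/* Reads the outer iteration count from argv[1]; falls back to def when
 * argv[1] is missing, empty, or has trailing non-digit characters. */
unsigned long parse_outer(int argc, char **argv, unsigned long def)
{
    char *end;
    unsigned long v;

    if (argc < 2)
        return def;
    v = strtoul(argv[1], &end, 10);
    return (end != argv[1] && *end == '\0') ? v : def;
}
```

A main() for it could be int main(int argc, char **argv) { return (int)(run(parse_outer(argc, argv, 1000)) & 0xff); }, built with -g and profiled with the same amplxe-cl command, raising the count if the column stays empty.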

By the way, is there any way to collect "Estimated Call Count" with amplxe-cl?

I looked at

http://software.intel.com/sites/products/documentation/hpc/amplifierxe/e...

and cannot find a way to collect call counts.

Peter Wang (Intel)

If no samples are captured, there is no call stack info; this is what is called "statistical call stack" info.

Use the command line, for example:

amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- target-app
