VTune Amplifier XE 2013: irrelevant values in "Estimated Call Count" column

Posted by Boris Sunik

I profiled our application with the Lightweight Hotspots analysis and got completely irrelevant values in the "Estimated Call Count" column.

Most routines were estimated as having executed between 100,000,000 and 300,000,000 times, although some of them were actually executed only a few times and others only a few thousand times.

In one more case the data shows 3.5 seconds of execution time and zero in the "Estimated Call Count" column.

The processor is an Intel Xeon W3670; the operating system is Windows 7.

Posted by Peter Wang (Intel)

If the estimated call counts were huge in the top-down tree report for a function that was called only a limited number of times, that can make sense: its subroutines may have been called many times, and their counts are included in the parent function.

If you saw this in the bottom-up report, it should be a bug: there, estimated call counts are attributed to the hot functions themselves, not to their parents. Please submit a ticket with your test case at https://premier.intel.com.

Posted by Boris Sunik

I am trying out Amplifier XE 2013. The product is not in my product list on Premier Support, and there is a 50 MB size limit on file transfers, while the project zip is 200 MB (1 GB of data). So I failed to upload the file.

Posted by Dmitry Chichkov

I also gave Amplifier 2013/Linux x64 a quick evaluation, and it looks like Lightweight Hotspots + Stack + Counters are completely broken. With a trivial test case:


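/* j starts at 0 and is unsigned, so the sum is well defined (it wraps instead of overflowing). */
unsigned fA(void) { unsigned i, j = 0; for (i = 0; i < 100000; i++) j += i % 12345; return j; }
unsigned fB(void) { unsigned i, j = 0; for (i = 0; i < 1000000; i++) j += i % 12345; return j; }

int main(int argc, char **argv)
{
    unsigned a, c = 0;
    (void)argc; (void)argv;   /* unused */
    /* Each of fA and fB is called exactly 1000 times. */
    for (a = 0; a < 1000; a++) { c += fA(); c += fB(); }
    return (int)c;
}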

The functions appear twice in the call tree. The fA/fB call counts differ from each other and are off by up to an order of magnitude: 9372 / 3124, where 1000 each is expected. CPU time is zero, the wait times look odd, and so on. OS: Linux x64 / Ubuntu 11.10; CPU: Core 2 Quad. Built with gcc -O0 -g. Profiled with an estimated run time of 1 minute; the actual execution time was ~7 seconds. Screenshot attached.






Best, Dmitry

Attachment: lwhspt.jpg (134.13 KB)

Posted by Peter Wang (Intel)

Thank you for the example code.

I verified this on my machine and found two critical issues (I will update this thread if I find any clue or solution):
1. The caller, main(), is missing from the list.
2. The call count is zero, which is wrong.

# amplxe-cl -version
Intel(R) VTune(TM) Amplifier XE 2013 (build 243421) Command Line Tool
Copyright (C) 2009-2012 Intel Corporation. All rights reserved.

# gcc -g test_callstack.c -o test_callstack
# amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- ./test_callstack

# amplxe-cl -report callstacks
Using result path `/home/peter/problem_report/r004lh'
Executing actions 50 % Generating a report
Function     Call Stack  Module          CPU Time:Total  CPU Time:Self
-----------  ----------  --------------  --------------  -------------
fB                       test_callstack  90.58%          3.499
fA                       test_callstack  9.26%           0.358
do_wp_page               vmlinux         0.1%            0.004
do_lookup_x              ld-2.5.so       0.02%           0.0007519

Posted by Dmitry Chichkov

Any updates?

Posted by Dmitry Chichkov

2 weeks... by chance, any updates from devs?

Posted by Peter Wang (Intel)

It looks like there was a VTune driver installation issue on the old Linux kernel, but the tool did not report it...

Solution: use the call stack / call count feature on a recent Linux OS.
I tried the same steps on another box (redhat-el6) and saw the main() function in the report.

# amplxe-cl -report callstacks|more
Using result path `/home/peter/problem_report/r000lh'
Executing actions 50 % Generating a report
Function           Call Stack         Module          CPU Time:Total  CPU Time:Self
-----------------  -----------------  --------------  --------------  -------------
__libc_start_main                     libc-2.12.so    99.59%          0

main                                  test_callstack  99.59%          0
                   __libc_start_main  libc-2.12.so    99.59%          0

fB                                    test_callstack  90.42%          2.634
                   main               test_callstack  90.42%          2.634
                   __libc_start_main  libc-2.12.so

Posted by Dmitry Chichkov

Interesting. I'm curious: what fA/fB call counts and fA/fB CPU times are you getting?

Incidentally, my call counts weren't zero like yours. They were just wrong by an order of magnitude (9400 instead of 1000). The installation was on x64, Linux 3.0.0-16-server, Ubuntu, Xeon E5345 [Clovertown].

Posted by Peter Wang (Intel)

Here is a screenshot of my result.

Attachment: callcount.jpg (112.78 KB)

Posted by Dmitry Chichkov

It looks like the results are more consistent in your case: you are getting similar counter values for fA and fB. But the fA call count is 7,935 instead of the expected 1,000. Any ideas on how to rectify that?

Posted by Peter Wang (Intel)

Call stacks and call counts are available only on Linux* kernel 2.6.28 or later. Please check your system, and note that the tool does not print any warning about this.

Posted by Dave G.

I'm trying to run, and don't even get call counts. What am I missing?

As background, we just installed XE 2013 Update 2 (build 253325) for Linux on a RHEL 6.3 machine. uname -a gives: Linux a5.colo.ucirrus.com 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux.

I am running as root, because I can't get the driver loaded under my own account, despite adding my login ID to the vtune group in /etc/group (which is all that had to be done on RHEL 5).

For analysis type (lightweight hotspots), I selected collect stacks, estimate call counts, and analyze user tasks.

Hardware is a Nehalem / Westmere; from /proc/cpuinfo, I have the following:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 47
model name : Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
stepping : 2
cpu MHz : 2261.178
cache size : 24576 KB
physical id : 0
siblings : 20
core id : 0
cpu cores : 10
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4522.35
clflush size : 64
cache_alignment : 64
address sizes : 44 bits physical, 48 bits virtual
power management:

The system has 40 logical CPUs (20 cores plus Hyper-Threading).

When I launch the capture, I start paused; when our program reaches the right point for performance analysis, I resume collection, capture for about 10 seconds, and then stop it (in the VTune GUI).

The command that the VTune GUI claims to execute is "amplxe-cl -collect nehalem-general-exploration -knob enable-stack-collection=true -- /home/daveg/SVN/trunk/bin/Release/_pvm". _pvm is built with the Intel C++ compiler.

I'm attaching the top part of the GUI output.

Attachment: capture.png (253.11 KB)

Posted by Peter Wang (Intel)

I am confused: your screenshot shows lightweight-hotspots, but you said you used nehalem-general-exploration. So, use
amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/daveg/SVN/trunk/bin/Release/_pvm
Note that you have to add the "-g" option when building "_pvm" to generate debug info.

Posted by Kyung_Seok L.

I have the same problem as Dave G.

I'm trying to run, and don't get "estimated call counts".

I installed XE 2013 (build 261256) for Linux on an Ubuntu 10.04 machine.

I am running as root.

For analysis type (lightweight hotspots), I selected collect stacks, estimate call counts, and analyze user tasks.

A capture of the GUI output is attached.

The command that the VTune GUI claims to execute is "amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/jisung/test/test". test is built with gcc with the debug option.

CPU name: 3rd generation Intel(R) Core(TM) processor family

Can anyone help me?

Attachment: 2012-12-28-164213.png (82.35 KB)

Posted by Peter Wang (Intel)

Is it possible that the "estimated call count" column is off-screen to the right, so that you need to scroll right?
Also, if you are working on an old OS, estimated call stacks and estimated call counts are not supported. Please try this feature on a recent OS.
Also run "lsmod | grep vtsspp" to ensure the driver has been installed.

Posted by Kyung_Seok L.

Thank you for your reply, Peter.

My OS is Ubuntu 12.04.1 LTS, which is quite recent.

I ran "lsmod | grep vtsspp" to check the drivers.

What else should I check to see the "estimated call count"?

Posted by Peter Wang (Intel)

Quote:

Kyung_Seok L. wrote:

Thank you for your reply, Peter.

My OS is Ubuntu 12.04.1 LTS, which is quite recent.

I ran "lsmod | grep vtsspp" to check the drivers.

What else should I check to see the "estimated call count"?


Nothing else is needed; just
add the options "-knob enable-stack-collection=true -knob enable-call-counts=true" to amplxe-cl, or enable them in the GUI.
If you still can't see the call count, submit your results to https://premier.intel.com for investigation.

Posted by Kyung_Seok L.

I recently found out why I didn't get the Estimated Call Count information.

The problem was the code.

The test code I used was very short, and VTune wasn't able to derive call count information from it.

When I profiled my real project, there was no problem.

If anyone runs into this problem, try a longer one.

By the way,

is there any way to collect the "Estimated call count" with amplxe-cl?

I looked at

http://software.intel.com/sites/products/documentation/hpc/amplifierxe/e...

and could not find a way to collect call counts.

Posted by Peter Wang (Intel)

If no samples are captured, there is no call stack information; this is what is called "statistical call stack" information.

From the command line, use for example:

amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- target-app
