VTune Amplifier XE 2013: irrelevant values in "Estimated Call Count" column

Boris Sunik wrote:

I profiled our application with the Lightweight Hotspots analysis and got completely implausible values in the "Estimated Call Count" column.

Most routines were estimated to have executed between 100,000,000 and 300,000,000 times, although some of them were actually executed only a few times and others a few thousand times.

In another case, the data shows 3.5 seconds of execution time and zero in the "Estimated Call Count" column.

The processor is an Intel Xeon W3670; the system is Windows 7.

Peter Wang (Intel) wrote:

If the estimated call counts in the top-down tree report were huge for a function that was called only a limited number of times, that would make sense: its subroutines may have been called many times, and their counts are included in the parent function.

If you saw this in the bottom-up report, it is likely a bug: there, estimated call counts are attributed to each hot function itself, not to its parents. Please submit a ticket at https://premier.intel.com with your test case.

Boris Sunik wrote:

I am trying out Amplifier XE 2013. The application is not in my product list on Premier Support, and there is a 50 MB file size restriction, while the project zip is 200 MB (1 GB of data). So I failed to upload the file.

Dmitry Chichkov wrote:

I also gave Amplifier 2013/Linux x64 a quick evaluation, and it looks like Lightweight Hotspots + Stack + Counters are completely broken. With a trivial test case:


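/* fA: 100,000 loop iterations per call. j is initialized and unsigned so the sum is well defined. */
unsigned int fA(void) { unsigned int i, j = 0; for (i = 0; i < 100000; i++) j += i % 12345; return j; }

/* fB: 1,000,000 loop iterations per call, so roughly 10x the CPU time of fA. */
unsigned int fB(void) { unsigned int i, j = 0; for (i = 0; i < 1000000; i++) j += i % 12345; return j; }

int main(void)
{
    unsigned int c = 0;
    int a;

    /* Call each function exactly 1000 times. */
    for (a = 0; a < 1000; a++) { c += fA(); c += fB(); }

    /* Use the result so the calls are not optimized away. */
    return (int)c;
}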

The functions appear twice in the call tree. The call counts I get for fA and fB differ from each other and are off by an order of magnitude: 9372 and 3124. CPU time is zero, the wait times look odd, and so on. OS: Linux x64 (Ubuntu 11.10); CPU: Core 2 Quad. Built with gcc -O0 -g. Profiled with a 1-minute estimated run time; the actual execution time was about 7 seconds. Screenshot attached.






Best, Dmitry

Attachment: lwhspt.jpg (134.13 KB)

Peter Wang (Intel) wrote:

Thank you for the example code.

I verified this on my machine and found two critical issues (I will update this thread if I find any clue or solution):
1. The function main(), the caller, is missing from the list.
2. The call count is zero, which is wrong.

# amplxe-cl -version
Intel(R) VTune(TM) Amplifier XE 2013 (build 243421) Command Line Tool
Copyright (C) 2009-2012 Intel Corporation. All rights reserved.

# gcc -g test_callstack.c -o test_callstack
# amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- ./test_callstack

# amplxe-cl -report callstacks
Using result path `/home/peter/problem_report/r004lh'
Executing actions 50 % Generating a report
Function     Call Stack  Module          CPU Time:Total  CPU Time:Self
-----------  ----------  --------------  --------------  -------------
fB                       test_callstack  90.58%          3.499
fA                       test_callstack  9.26%           0.358
do_wp_page               vmlinux         0.1%            0.004
do_lookup_x              ld-2.5.so       0.02%           0.0007519

Dmitry Chichkov wrote:

Any updates?

Dmitry Chichkov wrote:

2 weeks... by chance, any updates from devs?

Peter Wang (Intel) wrote:

It looks like there was a VTune driver installation issue on the old Linux kernel, but the tool did not report it.

Solution: use the call stack and call count features on a recent Linux OS.
I tried the same steps on another box (Red Hat EL6) and saw the main() function in the report.

# amplxe-cl -report callstacks|more
Using result path `/home/peter/problem_report/r000lh'
Executing actions 50 % Generating a report
Function           Call Stack         Module          CPU Time:Total  CPU Time:Self
-----------------  -----------------  --------------  --------------  -------------
__libc_start_main                     libc-2.12.so    99.59%          0

main                                  test_callstack  99.59%          0
                   __libc_start_main  libc-2.12.so    99.59%          0

fB                                    test_callstack  90.42%          2.634
                   main               test_callstack  90.42%          2.634
                   __libc_start_main  libc-2.12.so

Dmitry Chichkov wrote:

Interesting. I'm curious: what fA/fB call counts and fA/fB CPU times are you getting?

Incidentally, my call counts weren't zero like yours. They were just wrong by an order of magnitude (9400 instead of 1000). The installation was on x64, Linux 3.0.0-16-server, Ubuntu, Xeon E5345 (Clovertown).

Peter Wang (Intel) wrote:

Here is a screenshot of my result.

Attachment: callcount.jpg (112.78 KB)

Dmitry Chichkov wrote:

It looks like the results are more consistent in your case: you are getting similar call counts for fA and fB. But the fA call count is 7,935 instead of the expected 1000. Any ideas on how to rectify that?

Peter Wang (Intel) wrote:

Call stacks and call counts are available only on Linux* kernel 2.6.28 or later. Please check your system, and note that the tool does not print any warning about this.

Dave G. wrote:

I'm trying to run, and don't even get call counts. What am I missing?

As background, we just installed XE 2013 Update 2 (build 253325) for Linux on a RHEL 6.3 machine. uname -a gives: Linux a5.colo.ucirrus.com 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux.

I am running as root, because I can't get the driver loaded under my own account, despite adding my login ID to the vtune group in /etc/group (which is all that had to be done on RHEL 5).

For the analysis type (Lightweight Hotspots), I selected collect stacks, estimate call counts, and analyze user tasks.

The hardware is Nehalem/Westmere; /proc/cpuinfo shows the following:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 47
model name : Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
stepping : 2
cpu MHz : 2261.178
cache size : 24576 KB
physical id : 0
siblings : 20
core id : 0
cpu cores : 10
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4522.35
clflush size : 64
cache_alignment : 64
address sizes : 44 bits physical, 48 bits virtual
power management:

System has 40 cores (20 + hyperthreading).

When I launch the capture, I start it paused. When our program reaches the point where performance analysis should begin, I resume collection, capture for about 10 seconds, and then stop it (in the VTune GUI).

The command that the VTune GUI claims to execute is "amplxe-cl -collect nehalem-general-exploration -knob enable-stack-collection=true -- /home/daveg/SVN/trunk/bin/Release/_pvm". _pvm is built with the Intel C++ compiler.

I'm attaching the top part of the GUI output.

Attachment: capture.png (253.11 KB)

Peter Wang (Intel) wrote:

I am confused: your screenshot shows lightweight-hotspots, but you said you used nehalem-general-exploration. So, use:
amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/daveg/SVN/trunk/bin/Release/_pvm
Note that you have to add the "-g" option when building "_pvm" to generate debug info.

Kyung_Seok L. wrote:

I have the same problem as Dave G.

I'm trying to run the analysis, and I don't get "Estimated Call Count" values.

I installed XE 2013 (build 261256) for Linux on an Ubuntu 10.04 machine.

I am running as root.

For the analysis type (Lightweight Hotspots), I selected collect stacks, estimate call counts, and analyze user tasks.

A capture of the GUI output is attached.

The command that the VTune GUI claims to execute is "amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- /home/jisung/test/test". test is built with gcc with the debug option.

CPU name: 3rd generation Intel(R) Core(TM) processor family

Can anyone help me?

Attachment: 2012-12-28-164213.png (82.35 KB)

Peter Wang (Intel) wrote:

Is it possible that the "Estimated Call Count" column is off-screen to the right, so that you need to scroll right to see it?
If you are working on an old OS, estimated call stacks and estimated call counts may not be supported there. Please try this feature on a recent OS.
Also run "lsmod | grep vtsspp" to ensure the driver has been installed.

Kyung_Seok L. wrote:

Thank you for your reply, Peter.

My OS is Ubuntu 12.04.1 LTS, which is quite recent.

I ran "lsmod | grep vtsspp" to check the driver.

What else should I check in order to see the "Estimated Call Count"?

Peter Wang (Intel) wrote:

Quote:

Kyung_Seok L. wrote:

Thank you for your reply, Peter.

My OS is Ubuntu 12.04.1 LTS, which is quite recent.

I ran "lsmod | grep vtsspp" to check the driver.

What else should I check in order to see the "Estimated Call Count"?


There is nothing else to do; just add the options "-knob enable-stack-collection=true -knob enable-call-counts=true" to the amplxe-cl command line, or enable them in the GUI.
If you still can't see call counts, submit your results to https://premier.intel.com for investigation.

Kyung_Seok L. wrote:

I recently found out why I didn't get the Estimated Call Count information.

The problem was the code.

I used test code that was very short, and VTune wasn't able to derive call count information from it.

When I profiled my real project, there was no problem.

If anyone runs into this problem, try a longer-running program.

By the way, is there any way to collect "Estimated Call Count" with amplxe-cl?

I looked at

http://software.intel.com/sites/products/documentation/hpc/amplifierxe/e...

and could not find a way to collect call counts.

Peter Wang (Intel) wrote:

If no samples are captured, there is no call stack information; this is what is called "statistical call stack" information.

On the command line, use for example:

amplxe-cl -collect lightweight-hotspots -knob enable-stack-collection=true -knob enable-call-counts=true -- target-app
