How many performance counters does VTune use on Xeon? 2 or 4 or 8 or 18?


My first question: how many performance counters does the Xeon have? Some manuals say 18; others say 4 for non-HT Xeon and 8 for HT Xeon.

My second question: how many performance counters does VTune use on Xeon? It seems to use only 2 of them. Why not use all of them? The more counters, the fewer iterations are needed, right?
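The "more counters, fewer iterations" reasoning can be made concrete with a ceiling division. This is a minimal sketch that assumes each event occupies exactly one counter and that any event can be placed on any counter. Real Xeon counters restrict which events can be collected together, so the result is only a lower bound on the number of sampling runs. The function name `runs_needed` and the 8-event collection are illustrative, not taken from VTune.

```python
def runs_needed(num_events: int, counters_per_run: int) -> int:
    """Lower bound on sampling runs when each run can program at most
    `counters_per_run` events (assumes no event-pairing constraints)."""
    if counters_per_run <= 0:
        raise ValueError("counters_per_run must be positive")
    # Integer ceiling division: ceil(num_events / counters_per_run)
    return (num_events + counters_per_run - 1) // counters_per_run


# Compare the counter counts mentioned above for an illustrative 8-event collection.
for counters in (2, 4, 8, 18):
    print(f"{counters:2d} counters/run -> {runs_needed(8, counters)} run(s)")
```

Under these assumptions, 8 events take 4 runs with 2 counters, 2 runs with 4 counters, and a single run with 8 or more.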

Viewing IA64 VTune results on an IA32 computer

I collected sampling data on an Itanium 2 machine. Because I have little time on the Itanium 2 machine, I want to view the results on an IA32 machine.
When I copy the IA64 result project directory to the IA32 machine and open the .vpj file there, VTune says
'An unhandled exception occurred while processing the command'.
After that, I can open the Tuning Browser and view the results, but VTune freezes when I try to use sampling over time.
So I have 2 questions.
Is it possible to view VTune results on a machine with a different architecture?

vmlinux reported as using 98% of CPU_CYCLES

I am using VTune 3.0 on Red Hat 3, Update 5. The machine has two Itanium 2 processors.

I run:
vtl activity run1 -c sampling -o "-ec en=CPU_CYCLES" -d 20 -app ./a.out,"args" run
vtl view | more

The view reports module vmlinux, process pid0x0, as taking roughly 98% of the CPU cycles, even though my app is CPU-intensive and runs for 20 seconds.

When I took the "Tuning for the Intel Itanium 2 Microarchitecture" course, I didn't have this problem.

I do not have a Windows workstation connected to the Linux box, so I must use the vtl interface.

Any ideas?

VTune itself consumed 21% of clockticks.


We are currently using the latest VTune to profile network stack activity in Linux (kernel version …). I found that VTune itself sometimes consumed about 21% of total clockticks by calling the following two functions:

I'm wondering why VTune's overhead becomes so high. What is the normal overhead when using VTune? Less than 5% of clockticks?
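One way to put a number on the overhead question is to compute the share of clockticks samples that land in the profiler's own modules. This is a minimal sketch, assuming per-module sample counts have already been exported from the sampling view into a dictionary. The function `overhead_share`, the module names, and the counts in the usage lines are hypothetical and are not from the post.

```python
def overhead_share(samples_by_module: dict[str, int], tool_modules: set[str]) -> float:
    """Fraction of all samples attributed to the profiler's own modules."""
    total = sum(samples_by_module.values())
    if total == 0:
        return 0.0  # no samples collected, so no measurable overhead
    tool = sum(n for mod, n in samples_by_module.items() if mod in tool_modules)
    return tool / total


# Hypothetical per-module clockticks sample counts (illustrative names and values).
samples = {"vmlinux": 6000, "myapp": 1900, "vtune_driver": 2100}
share = overhead_share(samples, {"vtune_driver"})
print(f"profiler share: {share:.1%}")
```

The result can then be compared against whatever threshold you consider acceptable, such as the 5% guessed above.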



Module Of Interest Never Appears in VTune Results

I am attempting to tune an application that consists of one small Windows executable and several large DLLs that implement various portions of the application's functionality.

I am particularly interested in three of the DLLs in this application. When the application executes, DLL A is ALWAYS exercised. DLL B is called when particular new functionality is executed. DLL B in turn calls DLL C, which was built by a third party.

data alignment problem?

I have a case where VTune displayed a source line with a very high timer value from time-based sampling. The source line dereferences pointer fields of a struct "node" that is passed as a function parameter:
node->field4 = node->field1->field3 + node->field2->field4;
where field1 is a pointer to another struct and field2 is a pointer to a "node" struct itself. field4 is at a 44-byte offset from the beginning of the struct, and VTune showed the following assembly code as the hotspot:
add ebx, DWORD PTR [ecx+02ch]
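A quick arithmetic check on the alignment hypothesis: 0x2c is 44, a multiple of 4, so a 4-byte field at that offset is dword-aligned whenever the node itself is 4-byte aligned. The sketch below uses Python's `ctypes` to mirror a C struct and check whether a field's runtime address is naturally aligned. The `Node` and `Other` layouts are hypothetical; the post gives only the field names and field4's offset. On a 64-bit host, pointers are 8 bytes, so `Node.field4.offset` will not match the IA-32 value of 44.

```python
import ctypes


class Other(ctypes.Structure):
    # Hypothetical stand-in for the struct that field1 points to.
    _fields_ = [("field3", ctypes.c_int)]


class Node(ctypes.Structure):
    pass


# Hypothetical layout: only the field names come from the post.
# _fields_ is assigned after the class exists so Node can point to itself.
Node._fields_ = [
    ("field1", ctypes.POINTER(Other)),
    ("field2", ctypes.POINTER(Node)),
    ("field4", ctypes.c_int),
]


def is_aligned(address: int, alignment: int) -> bool:
    """True if `address` is a multiple of `alignment`."""
    return address % alignment == 0


def field_is_aligned(obj: ctypes.Structure, name: str) -> bool:
    """Check the runtime address of obj.<name> against its type's natural alignment."""
    field = getattr(type(obj), name)              # ctypes field descriptor with .offset
    field_type = dict(type(obj)._fields_)[name]   # the ctypes type declared for the field
    return is_aligned(ctypes.addressof(obj) + field.offset, ctypes.alignment(field_type))


# The displacement from the VTune listing, [ecx+02ch]:
print(0x2C, is_aligned(0x2C, 4))  # prints: 44 True
node = Node()
print(Node.field4.offset, field_is_aligned(node, "field4"))
```

If the node pointer in ecx is itself 4-byte aligned, this suggests the offset alone does not misalign the load. Whether the high sample count has another cause, such as cache misses from the two pointer dereferences, is only a hypothesis to test with event-based sampling.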

VTune & reports

Hi all.
At the last Gelato meeting (Itanium 2 developers), there was a presentation on VTune's features, particularly on viewing Intel C++ compiler reports inside VTune.

I don't remember it very well. I think it's necessary to compile using -mGLOB.... and to run VTune with the variable VT_ENABLE_.....=T or something like that, but I can't find this information in any documentation.

Any help?

Thanks all.

VTune install failed!

hi all
I tried to install VTune for Linux and downloaded the VTune package from Intel's software FTP site.

After untarring the install files and executing the ./install command, the installation failed with this message:
"Failed - The installation of VTune Performance Analyzer 3.0 for Linux* failed."

