Calculating total FPU operations

I am trying to track the math involved in a process. I am not sure if the VTune performance analyzer has a feature to display the total number of FPU instructions for a particular process:

a) total fadd instructions
b) total fsub instructions
c) total fmul instructions
d) total fsqrt instructions
e) total fdiv instructions

Any insights will help...



Install the Linux remote agent (vtserver) on Ubuntu


I am interested in trying out the Intel Thread Checker. I work on an Ubuntu system and have all my development programs set up there. Unfortunately, VTune is not supported on any kind of Debian-based system (I will throw Ubuntu into that box). I have seen the advice in another thread in this forum to just use the Linux remote agent (vtserver).

Very low IPC with VTune and others...

I've tested VTune with a couple of SPEC2000 benchmarks (with reference input) under Red Hat 7.3 Linux on a Pentium 4 machine and got a very low IPC, which seems unreasonable. This is the command I put into

vtl activity -duration 300 -c sampling -o "-ec en='Instructions Retired' en='Clockticks':sa=200000 " -app mcf_base.x86_linux,"" run
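
For anyone checking a figure like this by hand, here is a minimal sketch of how IPC can be estimated from event-based sampling data. It assumes each event's total is approximately its sample count multiplied by its Sample After Value (SAV), so comparing raw sample counts is only valid when both events use the same SAV. The function name and the numbers in the example are hypothetical and are not taken from the post.

```python
def ipc_from_samples(instr_samples: int, instr_sav: int,
                     clock_samples: int, clock_sav: int) -> float:
    """Estimate instructions per clock from two sampled events.

    Each event total is approximated as samples * Sample After Value.
    """
    if clock_samples <= 0 or clock_sav <= 0:
        raise ValueError("clocktick samples and SAV must be positive")
    instructions = instr_samples * instr_sav
    clockticks = clock_samples * clock_sav
    return instructions / clockticks


# Hypothetical numbers: the two events were sampled with different SAVs,
# so the raw sample ratio (1000 / 5000 = 0.2) is not the IPC.
ipc = ipc_from_samples(instr_samples=1000, instr_sav=100_000,
                       clock_samples=5000, clock_sav=200_000)
print(f"estimated IPC: {ipc:.2f}")
```

If the two events were collected with different SAVs, weighting by SAV as above is what makes the ratio meaningful.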

New release of the Intel(R) VTune(TM) Performance Analyzer is available!!

The new Intel VTune Performance Analyzer 8.0 for Windows* is now available for download on Intel Premier Support.

New for version 8.0:

  • Support for Microsoft Visual Studio* .NET 2005 and optional integration with Microsoft Visual Studio* .NET 2005 or 2003
  • Support for Microsoft Windows Vista* Beta 1 (build 5112) and Windows Longhorn Server* Beta 1 (build 5112)
  • Linux remote agent support for 2.6 kernels

How to identify lock overhead with VTune analyzer

I am using the VTune analyzer to identify performance bottlenecks in a multithreaded application (BIND 9) on my IA-32 Fedora Core 3 system. What type of profiling do I have to do to get correct results? I want to find how long each thread waits to acquire a lock. How is this possible with the VTune analyzer?




I would like an explanation of the term 'samples'. For a particular function, the sample value is 52 and the Sample After Value is 3.2G(320000). What does this mean? If I get a value of 28 after optimization, does this represent the time taken to execute the function?
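
As a rough illustration of how a sample count relates to events and time: in event-based sampling, one sample is recorded each time the event has occurred Sample After Value (SAV) times, so the estimated event total is samples multiplied by SAV. The sketch below assumes the sampled event is Clockticks, takes the SAV as 320000 (the parenthesised figure in the post), and assumes a 3.2 GHz clock. The function names and the clock frequency are assumptions for illustration only.

```python
def estimated_events(samples: int, sample_after_value: int) -> int:
    """Approximate event count attributed to a function."""
    return samples * sample_after_value


def estimated_seconds(samples: int, sample_after_value: int,
                      clock_hz: float) -> float:
    """Approximate time in a function, valid only for a Clockticks event."""
    return estimated_events(samples, sample_after_value) / clock_hz


SAV = 320_000          # assumed from the post
CLOCK_HZ = 3.2e9       # assumed clock frequency, not stated in the post

before = estimated_events(52, SAV)
after = estimated_events(28, SAV)
print("clockticks before:", before)
print("clockticks after: ", after)
print("relative cost after optimization:", after / before)
print("seconds before:", estimated_seconds(52, SAV, CLOCK_HZ))
```

Under these assumptions the sample count is proportional to time spent in the function rather than being a time itself, and with only a few dozen samples the estimate carries noticeable statistical noise.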

Tuning assistant

I am using the VTune PA 7.2 evaluation. I ran my application and went to the sampling results. There I found my hotspot in the source code and set the view to 'mixed by execution'. As shown in the attachment, the hotspot is in memcpy. When I opened the Tuning Assistant, it did not show any tips; only workload insights and system info were displayed. For the training tutorial (VTuneTraininggs_vtuneindex.htm), however, it shows correct tips for the source code.

New VTune 8.0 causes more overhead than ever.


I installed the latest VTune analyzer 8.0 and VTune rdc. After several profiling runs, I found that this new version causes more overhead than the old one.
Here is a summary of the overhead caused by VTune:
Samp_Write_PC_File about 15% of clockticks
Prepare_Wait about 8% of clockticks
Finish_To_Wait about 8% of clockticks

