Profiling With Intel VTune Amplifier

Divino C.

Why is the "CPU Time by Utilization" sometimes annotated on the function name instead of on the function's instructions?
Also, if I sum the CPU Time of all instructions in the function, the total does not match the value attributed to the function name.

What is the effect of "Collect Stacks" in the Lightweight Hotspots analysis? When I collect with this option enabled, the results
tell me that the caller is the hotspot, but when I profile with it disabled, they tell me that the callee is the hotspot. Which one is correct?

When should I use Lightweight Hotspots analysis instead of Hotspots analysis?

I use Ubuntu 12.04 on an Intel i7-3630QM.

PS: Is there any analysis that shows a timeline of an application's execution (separated by thread)? I would like to see how much parallel work is being done.

MrAnderson (Intel)

Hi Divino:

Why is the "CPU Time by Utilization" sometimes annotated on the function name instead of on the function's instructions?
Also, if I sum the CPU Time of all instructions in the function, the total does not match the value attributed to the function name.

VTune Amplifier XE does not record the time spent on every single instruction.  That would be more like instruction simulation, and the overhead would be enormous!  Instead, VTune Amplifier XE does "periodic sampling" to show a statistical representation of what your application code is doing, so per-instruction values are estimates rather than exact measurements.
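To make the idea of periodic sampling concrete, here is a minimal Python sketch. It is not how VTune Amplifier XE is implemented; the trace, the function names, the addresses and the sampling interval are all invented for illustration. It only shows that a sampler sees where the program is at each tick, and that per-instruction times are then estimated as "number of samples × interval".

```python
from collections import Counter

# Hypothetical execution trace: (function, instruction address, cycles spent).
# All names and numbers are made up for this illustration.
TRACE = [
    ("compute", 0x10, 3),
    ("compute", 0x14, 40),
    ("compute", 0x18, 2),
    ("helper", 0x30, 15),
] * 1000


def sample_trace(trace, interval):
    """Record which instruction is executing at every `interval`-th cycle."""
    samples = Counter()
    clock = 0
    next_sample = interval
    for func, addr, cycles in trace:
        clock += cycles
        # Every sampling tick that falls inside this instruction is charged to it.
        while next_sample <= clock:
            samples[(func, addr)] += 1
            next_sample += interval
    return samples


def estimate_cycles(samples, interval):
    """Turn sample counts into estimated cycles (samples * interval)."""
    return {loc: count * interval for loc, count in samples.items()}


samples = sample_trace(TRACE, interval=7)
estimates = estimate_cycles(samples, interval=7)
for (func, addr), cycles in sorted(estimates.items()):
    print(f"{func}+{addr:#x}: ~{cycles} cycles")
```

Short instructions receive few samples, so their estimates are coarse, and the estimated total need not equal the true total exactly.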

What is the effect of "Collect Stacks" in the Lightweight Hotspots analysis? When I collect with this option enabled, the results
tell me that the caller is the hotspot, but when I profile with it disabled, they tell me that the callee is the hotspot. Which one is correct?

That's it!  If you don't "collect stacks", you don't get the calling sequence that leads to the hotspots.  Thus the term "lightweight hotspots".  If you collect only event-based sampling of CPU_CLK_UNHALTED and instructions retired, the overhead is very low, but you only get the location of each sample and not the calling sequence.
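One possible reading of the caller/callee difference, offered as an assumption rather than something confirmed in this thread, is that with stacks a sample can also be charged to every function on the call stack, while without stacks it can only be charged to the function that was executing. The Python sketch below uses invented stacks and function names to show how the two attributions rank functions differently.

```python
from collections import Counter

# Hypothetical samples: each one is a call stack, outermost caller first.
SAMPLES = [
    ("main", "caller", "callee"),
    ("main", "caller", "callee"),
    ("main", "caller", "callee"),
    ("main", "caller"),
    ("main", "other"),
]


def self_counts(samples):
    """Charge each sample only to the executing function (no stack needed)."""
    return Counter(stack[-1] for stack in samples)


def total_counts(samples):
    """Charge each sample to every function on its stack (needs stack data)."""
    totals = Counter()
    for stack in samples:
        for func in set(stack):  # set() avoids double counting recursive frames
            totals[func] += 1
    return totals


print("self: ", self_counts(SAMPLES).most_common())
print("total:", total_counts(SAMPLES).most_common())
```

In this toy data the callee has the most self samples, while the caller has more total samples than the callee because it also accounts for the time spent in the functions it calls. Under that reading both results would be correct; they answer different questions.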

When should I use Lightweight Hotspots analysis instead of Hotspots analysis?

You should use Lightweight Hotspots *without* stacks when you want to minimize the overhead of sampling.  Use Hotspots when you need to know the calling sequences.  Hotspots is more for algorithm tuning, while Lightweight Hotspots can be a starting point for micro-architectural tuning (although General Exploration is better suited to that).

PS: Is there any analysis that shows a timeline of an application's execution (separated by thread)? I would like to see how much parallel work is being done.

All analysis types show a timeline of thread execution.  However, (Basic) Hotspots or Locks and Waits analysis types are better suited to analyzing parallel activity.

Note: starting with Update 9, the Hotspots analysis type is renamed "Basic Hotspots", while Lightweight Hotspots is renamed "Advanced Hotspots."

Regards, MrAnderson
Divino C.

Hi Anderson,

Thank you for taking the time to clarify these points for me.
