Using VTune for absolute performance #'s

I'm trying to compare two implementations of a particular function in terms of CPU time and floating-point instructions retired. I'd prefer not to use any kind of stochastic sampling; I just want to know how many cycles and how many flops elapsed between point A and point B in my code. This fragment will be executed many times in a single program run.
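
To make the goal concrete, here is a rough sketch of the kind of exhaustive (non-sampled) measurement described above: every pass through the region is timed and accumulated, rather than sampled. It is only an illustration under stated assumptions. `RegionCounter`, `work`, and `measure` are hypothetical names, not part of VTune or PAPI. It uses `std::chrono::steady_clock` as a portable stand-in for a cycle counter (on x86 the `__rdtsc` intrinsic could be read instead), and it does not count flops at all, since that needs the hardware performance counters.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not a VTune or PAPI API): accumulates the exact
// elapsed time and pass count for one code region across a whole run.
struct RegionCounter {
    using Clock = std::chrono::steady_clock;

    std::uint64_t calls = 0;            // completed A..B passes
    std::chrono::nanoseconds total{0};  // elapsed time summed over all passes
    Clock::time_point started{};
    bool running = false;

    void start() {                      // point A
        assert(!running);               // regions must not nest
        running = true;
        started = Clock::now();
    }

    void stop() {                       // point B
        const Clock::time_point now = Clock::now();
        assert(running);                // stop() must follow start()
        running = false;
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(now - started);
        ++calls;
    }

    // Average time per pass in nanoseconds (0 if nothing was recorded).
    double mean_ns() const {
        return calls ? static_cast<double>(total.count()) / static_cast<double>(calls)
                     : 0.0;
    }
};

// Stand-in for the function whose implementations are being compared.
double work(int n) {
    double acc = 0.0;
    for (int i = 1; i <= n; ++i) acc += 1.0 / i;
    return acc;
}

// Executes the region `reps` times, recording every pass.
double measure(RegionCounter& counter, int reps, int n) {
    double sink = 0.0;                  // keeps the work from being optimized away
    for (int r = 0; r < reps; ++r) {
        counter.start();
        sink += work(n);
        counter.stop();
    }
    return sink;
}
```

After a run, `counter.calls` and `counter.total` hold exact totals for the region. What is missing is the same bookkeeping for hardware events such as floating-point instructions retired.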

Unless I'm misreading everything, VTune's sampling is stochastic, whether time-based or event-based. Is there a way to make VTune's sampling _exhaustive_, so that I get the total number of instructions/flops in a function?

I am including VTuneApi calls at the beginning and end of the function to resume and pause data collection.

Really I'm looking for something very much like PAPI (http://icl.cs.utk.edu/papi/), which doesn't support Windows/P4 machines. I'm hoping VTune can deliver this functionality.

Thanks...

-Dan

-----------------------------------
Dan Morris
dmorris@cs.stanford.edu
http://cs.stanford.edu/~dmorris
-----------------------------------
