The debug (-g) setting is often satisfactory for that purpose. For full performance you would want an architecture switch such as -xHost, at least -O2, and possibly -unroll4. You should also select -standard-semantics or some of its sub-options.
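As a concrete illustration, the two build configurations might look like this with ifort on Linux (the source file name is a placeholder; check your compiler version's documentation for exact option spellings):

```shell
# Debug build for correctness checking (illustrative file name)
ifort -g -O0 -standard-semantics -o solver_debug solver.f90

# Optimized build for timing runs
ifort -O2 -xHost -unroll4 -standard-semantics -o solver solver.f90
```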
In gathering timing data, you may want to discard the first pass (or record it separately). This pass may incur additional overhead that is either part of what you are measuring or interferes with your obtaining good test data. Consider placing your timed intervals into an array of times; this will give you a better picture of what is going on. If you are venturesome, you might consider using thread-local storage and managing the timing statistics per thread. Sometimes this is an eye-opener too.
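The bookkeeping above can be sketched as follows. This is a minimal sketch in Python (the codes under discussion are Fortran/C, but the idea is language-independent); `run_pass` and `n_passes` are illustrative names, not anything from this thread.

```python
import time
from statistics import median

def run_pass():
    # Stand-in for the algorithm being timed.
    return sum(i * i for i in range(100_000))

n_passes = 8
times = []
for _ in range(n_passes):
    t0 = time.perf_counter()
    run_pass()
    times.append(time.perf_counter() - t0)

# Record the first (warm-up) pass separately rather than mixing it in,
# then summarize the remaining passes.
warmup, steady = times[0], times[1:]
print(f"warm-up: {warmup:.6f} s, median of the rest: {median(steady):.6f} s")
```

Keeping the whole `times` array around, rather than only a running sum, is what lets you spot a slow first pass or a stray outlier at all.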
Jim Dempsey
Dear Jim,
What I am after at the moment is to time some of my algorithms and to see whether I get close values between runs; then I will either average the timing data or take its median.
I am performing everything sequentially, at least at the moment; my linear solver (MUMPS), which dominates the cost, is already compiled for sequential mode.
The most common situation is that you get close values for most of your runs, but the first is much slower because the instruction cache, and possibly the data cache and the BTB (branch target buffer), must be filled. Some other runs can be slower because another process evicted your data from the caches, or the OS migrated the test to another core with cold caches. So you typically want to look out for, and perhaps reject, outliers that arise for these reasons.
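One simple way to reject such outliers is to keep only the samples within some factor of the median; samples inflated by cache eviction or core migration fall outside. A minimal sketch in Python, where the 1.5x threshold and the sample data are assumptions chosen for illustration:

```python
from statistics import median

def reject_outliers(samples, factor=1.5):
    """Keep only samples no larger than `factor` times the median."""
    m = median(samples)
    return [s for s in samples if s <= factor * m]

# Illustrative timings (seconds): one run was disturbed by cache eviction.
timings = [0.101, 0.103, 0.250, 0.102, 0.099, 0.104]
kept = reject_outliers(timings)
print(kept)  # the 0.250 s disturbed run is dropped
```

A median-relative cutoff is robust here precisely because the median itself is barely affected by the outliers being rejected, whereas a mean-relative cutoff would be pulled toward them.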