clock() or gettimeofday() or ippGetCpuClocks()?

Intel® IPP users typically measure the timing of a computation, a function or an application with one of three functions: clock(), gettimeofday() or ippGetCpuClocks(). Each function is described below, along with why you should use ippGetCpuClocks() in your IPP applications instead of clock() or gettimeofday().

clock():  The granularity of clock() depends on the implementation, which varies between compiler vendors. The C standard says nothing about the granularity of clock(); an implementation may check the time once a second and increment the value by CLOCKS_PER_SEC. Depending on the implementation, you may therefore get zero, CLOCKS_PER_SEC, CLOCKS_PER_SEC * 2 and so on, and never any intermediate value. Don't use clock() if you need fine granularity.
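As a rough way to see what granularity a particular implementation offers, the sketch below spins until the value returned by clock() changes and keeps the smallest step it observes. The helper names clock_min_step and clock_ticks_to_seconds are made up for this illustration, and the result is only an observation on one system, not a guarantee.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: watch `samples` changes of clock() and return the
 * smallest step seen, in clock ticks. Returns (clock_t)-1 if the
 * implementation reports that processor time is unavailable. */
clock_t clock_min_step(int samples)
{
    clock_t smallest = 0;

    assert(samples > 0);
    for (int i = 0; i < samples; ++i) {
        clock_t start = clock();
        clock_t next;

        if (start == (clock_t)-1)
            return (clock_t)-1;

        /* Busy-wait: this burns CPU time, so clock() must eventually move. */
        do {
            next = clock();
        } while (next == start);

        if (next == (clock_t)-1)
            return (clock_t)-1;

        if (i == 0 || next - start < smallest)
            smallest = next - start;
    }
    return smallest;
}

/* Convert a tick count from clock() to seconds. */
double clock_ticks_to_seconds(clock_t ticks)
{
    return (double)ticks / (double)CLOCKS_PER_SEC;
}
```

If clock_ticks_to_seconds(clock_min_step(5)) comes back as a full second, the implementation behaves like the worst case described above.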

gettimeofday():  It returns the wall clock time. The result is expressed in seconds and microseconds, but the effective precision depends on the system and can be much coarser. At a precision of one millisecond, for example, a 3 GHz machine resolves only 3 million CPU clocks. If your application only does calculations, clock() and gettimeofday() will be fairly close. Whenever the application waits for something (e.g. disk I/O), clock() will lag behind gettimeofday(). clock() can also run ahead of gettimeofday() if multiple threads are running in the same process.
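The difference between the two clocks can be observed with a small sketch like the one below, which assumes a POSIX system. The helper names wall_seconds, cpu_seconds and idle_wait_ms are hypothetical, and the sleep merely stands in for a wait such as disk I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Wall clock time in seconds, from gettimeofday(). */
double wall_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Processor time used by this process in seconds, from clock(). */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Block without using the CPU, standing in for a wait such as disk I/O. */
void idle_wait_ms(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}
```

Sampling both clocks before and after idle_wait_ms(100) should show roughly 0.1 s of wall time but very little processor time, which is the lag described above.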

ippGetCpuClocks():  The IPP function ippGetCpuClocks() provides a precision of 1 CPU clock. If you want the highest granularity or precision, we highly recommend that you use ippGetCpuClocks(). It can be used even if your program is parallel and runs on multiple cores: all TSC counters are synchronized and show the same clock count, as if there were only one counter in the system.
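The usual pattern is to read the counter before and after the code of interest and subtract. The sketch below shows that pattern with the counter passed in as a function pointer, so that it builds without the IPP headers. The names tick_source_fn, measure_ticks, demo_ticks and demo_work are all hypothetical; in an IPP application the tick source would be a thin wrapper around ippGetCpuClocks(), as the comment suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Any function that returns an increasing tick count. In an IPP
 * application this could be a wrapper around ippGetCpuClocks() from
 * ippcore.h, for example:
 *
 *     unsigned long long ipp_ticks(void) { return ippGetCpuClocks(); }
 */
typedef unsigned long long (*tick_source_fn)(void);

/* Run work() once and return the number of ticks it took. */
unsigned long long measure_ticks(tick_source_fn now, void (*work)(void))
{
    unsigned long long start, stop;

    assert(now != NULL && work != NULL);
    start = now();
    work();
    stop = now();
    return stop - start;
}

/* Stand-ins so the sketch runs without IPP: a fake counter that
 * advances 10 ticks per read, and an empty workload. */
unsigned long long demo_ticks(void)
{
    static unsigned long long ticks = 0;

    ticks += 10;
    return ticks;
}

void demo_work(void)
{
}
```

With the stand-ins, measure_ticks(demo_ticks, demo_work) returns the 10 ticks between the two reads. With a real counter, the result also includes the cost of reading the counter itself.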




I think the answer is in Vipin's post, Matt. You've got the TSC, which ticks at a constant rate, so you can divide the counter by that constant. Please read Vipin's post carefully.
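A minimal sketch of that division, assuming the TSC rate is already known and really is constant on the machine in question. The function name clocks_to_ms and its parameters are made up for illustration.

```c
#include <assert.h>

/* Convert an elapsed TSC tick count to milliseconds.
 * tsc_hz is the constant TSC rate in ticks per second. It is an input
 * here because how to obtain it depends on the processor. */
double clocks_to_ms(unsigned long long clocks, double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return (double)clocks / tsc_hz * 1000.0;
}
```

For example, 3 million clocks on a counter ticking at 3 GHz comes out at about 1 ms.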

Okay... so I have the execution time in clocks. How do I convert that reliably to ms?


For the latest Nehalem processors, the time stamp counter (RDTSC) does not vary with the actual operating frequency of the part. This is referred to as "Invariant TSC" and is described in the Software Developer's Manual (SDM), section 16.11.1.

Unlike prior parts, Nehalem's TSC does not stop across core C-states. Nehalem also implements the RDTSCP instruction, which returns the TSC value together with the contents of a new MSR, placed in ECX. For a full explanation of RDTSCP, please see the SDM.

On Nehalem, the TSC runs at a constant frequency of MSR_PLATFORM_INFO[15:8] * 133.33 MHz. MSR_PLATFORM_INFO[15:8] reports the lower of two values: the ratio at which the part was stamped, or the value of a separate MSR that can lower the ratio to provide TSC consistency across multi-socket systems with parts of different frequencies.
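The arithmetic in that formula can be sketched as follows. Reading the MSR requires privileged access and is not shown, so the raw register value is simply a parameter. The names NEHALEM_BCLK_HZ, platform_info_ratio and nehalem_tsc_hz are hypothetical, and the 133.33 MHz figure is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Base clock that the ratio multiplies on Nehalem, per the text above. */
#define NEHALEM_BCLK_HZ 133.33e6

/* Extract bits 15:8 (the ratio) from a raw MSR_PLATFORM_INFO value. */
unsigned int platform_info_ratio(uint64_t msr_platform_info)
{
    return (unsigned int)((msr_platform_info >> 8) & 0xFFu);
}

/* TSC frequency in Hz for a given raw MSR_PLATFORM_INFO value. */
double nehalem_tsc_hz(uint64_t msr_platform_info)
{
    unsigned int ratio = platform_info_ratio(msr_platform_info);

    assert(ratio > 0);
    return (double)ratio * NEHALEM_BCLK_HZ;
}
```

A raw value of 0x1400 has a ratio field of 20, which gives a TSC rate of about 2.67 GHz. Dividing a clock count by that rate yields seconds.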

Synchronization of the TSC across multiple threads/cores/packages: As long as software does not write the TSC, the Nehalem TSC will remain synchronized across all threads, cores and packages connected to a single PCH.

The time-stamp counter on Nehalem is reset to zero each time the processor package has RESET asserted. From that point onwards the TSC will continue to tick constantly across frequency changes, turbo mode and ACPI C-states. All parts that see RESET synchronously will have their TSCs completely synchronized. This synchronous distribution of RESET is required for all sockets connected to a single PCH. For large, multi-node systems, RESET might not be synchronous.


Other Intel sources indicate that the TSC is shared between cores, but not synchronized among multiple physical CPUs. Otherwise, what is the point of the RDTSCP instruction?
