Introduction
Learn how to accurately measure events of short duration using the Enhanced Timer.
To measure the performance of an application, it is common to time the sections of code where hotspots or bottlenecks occur, both before and after the code has been optimized. There are many ways to measure the time spent in a section of code. The simplest is to use a stopwatch and time the events by hand. The system timer can be used to measure events with an accuracy of 10 ms, which is adequate for events lasting a couple of seconds or more. To measure events shorter than 10 ms, you need a timer with finer granularity.
With today's processors running at frequencies above 1 GHz, you can use the processor clock to time events with much higher accuracy than 10 ms. However, on laptops and other systems that support Intel® SpeedStep Technology or Enhanced Intel® SpeedStep Technology, the processor frequency does not stay constant. It changes in response to CPU utilization in order to reduce power consumption, for example when a laptop is running on battery. To measure events of short duration accurately, you need a timer whose frequency does not change during the measurement period and whose accuracy is better than that of the system timer. This paper discusses such a timer.
Existing Timers
The following sections explore some of the common timers and their limitations.
Using the Stopwatch
The stopwatch is easy and convenient to use. It is a quick way to determine whether modified code runs faster than the original. However, its accuracy depends heavily on the reaction time of the person operating it. Use this method only for events that last more than five seconds and for applications that do not need high accuracy.
Using the C Function time
To eliminate human error, you can use the C function time(). Call time() before and after the section of code and take the difference to get the time the code took to execute. The function returns whole seconds, so the accuracy of this timer is about +/- 1 s. With a signed 32-bit time_t, this timer can time events that last up to about 68 years.
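The before-and-after pattern described above can be sketched as follows. This is a minimal illustration rather than code from the article; the helper name `secondsWithTime` is hypothetical.

```cpp
#include <cassert>
#include <ctime>

// Times a callable with the C runtime function time().
// Returns the elapsed wall-clock time in seconds (whole-second resolution).
// The name secondsWithTime is a hypothetical helper for this sketch.
template <typename Work>
double secondsWithTime(Work work) {
    std::time_t start = std::time(nullptr);       // initial reading
    work();                                       // section of code being timed
    std::time_t stop = std::time(nullptr);        // final reading
    // time() returns (time_t)(-1) when the calendar time is unavailable.
    assert(start != static_cast<std::time_t>(-1) &&
           stop != static_cast<std::time_t>(-1));
    return std::difftime(stop, start);            // stop - start, in seconds
}
```

Because both readings are whole seconds, a section that runs for a few milliseconds reports either 0 or 1 second depending on where the second boundary falls, which is why this timer suits only long-running events.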
Using the Multimedia Function timeGetTime
For events that need higher accuracy, you can use the Windows multimedia timer function timeGetTime. It is used in the same way as the C runtime function: call timeGetTime before and after the section of code and take the difference between the two readings. This timer has an accuracy of +/- 10 ms and, because it returns a 32-bit count of milliseconds, can handle events that last up to 49 days.
Using the Processor Clocks
This timer is very accurate. On a system with a 3 GHz processor, one clock tick lasts about 0.333 nanoseconds, so the timer has a resolution of +/- 0.333 nanoseconds and can resolve events shorter than one nanosecond. However, the timer is not exposed through standard high-level language functions. It is read with the assembly instruction Read Time Stamp Counter (RDTSC), either from inline assembly or through a compiler intrinsic.
How long an event this timer can handle depends on how the time values are stored. At 3 GHz, a 32-bit value wraps after only 1.432 seconds, whereas a 64-bit value can time an event that spans up to 194 years.
The processor clock has a drawback. Laptops with mobile Intel® Pentium® III processors and later have Intel SpeedStep Technology built in. While SpeedStep Technology is good for conserving power when laptops are running on batteries, it changes the processor frequency. If the frequency changes while the targeted code is running, the final reading is meaningless, because the initial and final readings were not taken at the same clock frequency. The number of clock ticks counted during the interval is still accurate, but the elapsed time cannot be derived from it.
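The limits quoted in this and the previous sections follow from simple arithmetic, sketched below. The helper names `secondsUntilWrap` and `naiveSeconds` are hypothetical, and the frequency-change scenario (one second at 3 GHz followed by one second at 1.5 GHz) is an invented illustration, not a measurement.

```cpp
#include <cassert>
#include <cmath>

// Seconds in a year of 365.25 days, used to express long ranges in years.
const double kSecondsPerYear = 365.25 * 86400.0;

// Seconds until an unsigned counter of `counterBits` bits wraps around
// when it advances at `ticksPerSecond`: 2^bits / rate.
double secondsUntilWrap(int counterBits, double ticksPerSecond) {
    assert(counterBits > 0 && counterBits <= 64 && ticksPerSecond > 0.0);
    return std::ldexp(1.0, counterBits) / ticksPerSecond;
}

// Converts a tick count to seconds under the assumption that the clock ran
// at one nominal frequency for the whole interval. This is exactly the
// assumption that breaks when the processor frequency changes.
double naiveSeconds(double ticks, double nominalHz) {
    assert(nominalHz > 0.0);
    return ticks / nominalHz;
}
```

With these helpers, `secondsUntilWrap(32, 3.0e9)` is about 1.432 seconds, `secondsUntilWrap(64, 3.0e9) / kSecondsPerYear` is about 194 years, and `secondsUntilWrap(32, 1000.0) / 86400.0` is about 49.7 days for a 32-bit millisecond counter. For the drawback: a section that really runs for 2 seconds, half at 3 GHz and half at 1.5 GHz, accumulates 4.5e9 ticks, and `naiveSeconds(4.5e9, 3.0e9)` reports 1.5 seconds. The tick count is right, but the derived time is not.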
Enhanced Timer
The Enhanced Timer (Etimer) is based on two Windows* API functions: QueryPerformanceCounter and QueryPerformanceFrequency. There is no way to know in advance which hardware clock Microsoft uses to implement these two functions on a given platform; it may be the chipset timer, the power management timer, or something else. What is certain is that the frequency of that timer never changes during the course of timing.
The Etimer was created to meet two goals. First, it serves as a high-precision timer that is accurate to nanoseconds. Second, it is independent of SpeedStep, Enhanced SpeedStep, and similar technologies. The OS checks whether the system has a built-in high-performance clock. If it does, and the system has no energy-saving mechanism such as SpeedStep Technology, the timer uses that clock, which is most likely the processor clock. Otherwise, the timer uses another constant-frequency clock, such as the chipset, BIOS, or power management timer.
There are things to consider when using this timer in your applications. Because the Etimer calls QueryPerformanceCounter and QueryPerformanceFrequency, it incurs the overhead associated with system calls. It incurs a second overhead from the checking mechanism that ensures all measurements are taken on the same processor. If this overhead is too much for your application, consider reading the processor clock directly with the RDTSC instruction.
RDTSC can be used safely in the following situation. On multi-processor systems, Windows normally synchronizes the time stamp counters (TSCs) on all processors. The TSC values are not exactly the same, but they differ by only a few counts. Windows synchronizes the TSCs when the value returned by QueryPerformanceFrequency is the same as the processor clock frequency. In that case, you can use the TSCs without worrying about which processor they come from, which eliminates the overhead of the mechanism that forces all measurements onto the same processor.
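The two Windows functions named above can be used directly, as in the rough sketch below. It is not the Etimer source. The type `TickReading` and the functions `readConstantRateCounter` and `elapsedSeconds` are hypothetical names, and the non-Windows branch is only a stand-in (based on `std::chrono::steady_clock`) so that the sketch builds on other platforms.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <chrono>
#endif

// One reading of a constant-frequency counter (hypothetical helper type).
struct TickReading {
    long long ticks;      // counter value
    long long frequency;  // ticks per second; 0 signals failure
};

TickReading readConstantRateCounter() {
#ifdef _WIN32
    LARGE_INTEGER freq, now;
    if (!QueryPerformanceFrequency(&freq) || !QueryPerformanceCounter(&now)) {
        return {0, 0};    // no high-resolution counter available
    }
    return {now.QuadPart, freq.QuadPart};
#else
    // Stand-in for non-Windows builds: monotonic clock counted in nanoseconds.
    using namespace std::chrono;
    long long ns = static_cast<long long>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
    return {ns, 1000000000LL};
#endif
}

// Elapsed time between two readings: tick difference divided by frequency.
double elapsedSeconds(const TickReading& start, const TickReading& stop) {
    assert(start.frequency > 0);
    return static_cast<double>(stop.ticks - start.ticks) /
           static_cast<double>(start.frequency);
}
```

Take one reading before the code of interest and one after it, then pass both to `elapsedSeconds`. The frequency only needs to be queried once per run, since it does not change while the system is running.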
Using the Enhanced Timer
The following sections discuss how you can use the Etimer.
General Usage
The Etimer is simple to use. Call the function Etime at the beginning of the code section of interest to take an initial reading, and call it again at the end of the section to take a final reading. The readings are stored in the Start and Stop elements of the data structure Etime_t, from which the elapsed time is calculated. To use the Etimer in your application, follow these steps:
- Create a variable of type Etime_t. This data structure is defined in the file Etimer.h.
- Call the function EtimeInitialize to initialize the variable created in the previous step. EtimeInitialize also calls the function EtimeFrequency to get the frequency of the timer.
- Call the function Etime twice, once at the beginning of the code whose performance you want to measure and once at the end to get the initial and final readings.
- Call either the function EtimeDurationInTicks or EtimeDurationInSeconds to calculate the elapsed time.
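The steps above can be sketched as the calling sequence in `measureSeconds` below. Since the Etimer library itself is not reproduced here, the block starts with a hypothetical stand-in that mimics the declared function names on top of `std::chrono::steady_clock`. The stand-in is not the real implementation, and its return convention (true on success) is an assumption. With the real library you would include Etimer.h and link Etimer.lib instead.

```cpp
#include <cassert>
#include <chrono>

// --- Hypothetical stand-in for Etimer.h / Etimer.lib, NOT the real library ---
struct Etime_t {
    double Start, Stop, Frequency;  // readings in ticks; frequency in ticks/s
    bool AlreadyStart;              // true once Start has been captured
    int ErrorFlag;                  // 0 = no error (assumed)
};

static double nowInTicks() {        // stand-in tick source: nanoseconds
    using namespace std::chrono;
    return static_cast<double>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

bool EtimeInitialize(Etime_t* t) {
    t->Start = t->Stop = 0.0;
    t->Frequency = 1.0e9;           // the stand-in counts nanoseconds
    t->AlreadyStart = false;
    t->ErrorFlag = 0;
    return true;                    // assumed convention: true means success
}

bool Etime(Etime_t* t) {            // first call stores Start, second stores Stop
    if (!t->AlreadyStart) { t->Start = nowInTicks(); t->AlreadyStart = true; }
    else                  { t->Stop = nowInTicks();  t->AlreadyStart = false; }
    return true;
}

double EtimeDurationInSeconds(Etime_t* t) {
    assert(t->Frequency > 0.0);
    return (t->Stop - t->Start) / t->Frequency;
}

// --- The four steps, in order ---
template <typename Work>
double measureSeconds(Work work) {
    Etime_t timer;                                // 1. create the variable
    if (!EtimeInitialize(&timer)) return -1.0;    // 2. initialize, get frequency
    if (!Etime(&timer)) return -1.0;              // 3. initial reading
    work();                                       //    code being measured
    if (!Etime(&timer)) return -1.0;              //    final reading
    return EtimeDurationInSeconds(&timer);        // 4. elapsed time in seconds
}
```

Returning -1.0 on failure is a choice made for this sketch. With the real library, the ErrorFlag member holds the specific error code.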
The next sections will show how to use this timer in multi-threaded applications, and in single and multi-processor systems.
With Multi-threaded Applications in Single-processor Systems
The Etimer can be used in multi-threaded applications. To measure the elapsed time of a section of code that is executed by multiple threads, each thread must create its own variable of type Etime_t (defined in the file Etimer.h) to store its readings. As long as each Etime_t variable is local to the thread that creates it, there should be no problem.
With Multi-threaded Applications in Multi-Processor Systems
The Etimer can also be used on multi-processor systems. Because the time retrieved from the timer differs slightly from processor to processor, the start and stop times must be taken on the same processor to ensure accuracy. The Etimer was designed to handle multiple processors. It first searches for the available processors and uses the first one available to take the initial reading. The ID of that processor is stored in the Etime_t variable so that the final reading is taken on the same processor. If no processor is available, or if the processor on which the start time was taken is not available, the Etime function returns an error code.
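The processor-selection logic described above reduces to a few bit operations if the available processors are represented as a bit mask with one bit per processor, as the ProcessorMask member of Etime_t is. The sketch below shows only that logic. The function names are hypothetical, and the article does not say which Windows API calls the library uses to obtain the mask or to keep the thread on the chosen processor.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the lowest set bit of `available`, i.e. the first available
// processor. Returns 0 when no processor is available.
std::uint32_t firstAvailableProcessor(std::uint32_t available) {
    // x & (-x) isolates the lowest set bit; written with unsigned arithmetic.
    return available & static_cast<std::uint32_t>(~available + 1u);
}

// True when the processor recorded at start time (a mask with exactly one
// bit set) is still among the available processors at stop time.
bool canStopOnStartProcessor(std::uint32_t startMask, std::uint32_t available) {
    assert(startMask != 0 && (startMask & (startMask - 1u)) == 0);
    return (available & startMask) != 0;
}
```

For example, with processors 1 and 2 available (mask 0x6), `firstAvailableProcessor(0x6)` yields 0x2, which selects processor 1. If only processor 2 is available at stop time (mask 0x4), `canStopOnStartProcessor(0x2, 0x4)` is false, which corresponds to the error case described above.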
Notes
When using this timer in the applications, make sure to:
- Use the multi-threaded version of the C/C++ library when writing multi-threaded applications.
- Calculate the maximum value the timer can handle before it gets reset. For example, this enhanced timer uses a 64-bit data type to handle the time. Therefore, with the enhanced timer frequency of 3GHz, this timer can handle events that will last up to 194 years.
- Account for the overhead of the function Etime, which checks for available processors and selects the first available one. This overhead does not affect a comparison between the original code and the optimized code, because it is incurred in both measurements.
The Structure of the Enhanced Timer
The Enhanced Timer consists of two files: Etimer.h and Etimer.lib. The include file Etimer.h contains the error codes, the data structure, and the declarations of the functions in the Etimer library. The following section lists the contents of the Etimer.h file.
Etimer.h
Below is the listing of the Etimer.h file:
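#include <windows.h>

/* Error codes stored in the ErrorFlag member of Etime_t */
#define CANNOT_GET_FREQUENCY 1
#define CANNOT_GET_START_TIME 2
#define CANNOT_GET_STOP_TIME 3
#define NO_PROCESSOR_AVAILABLE_TO_GET_START_TIME 4
#define NO_PROCESSOR_AVAILABLE_TO_GET_STOP_TIME 5
#define STOP_TIME_TAKEN_ON_DIFFERENT_PROCESSOR 6

/* State of one measurement; use a separate variable in each thread */
struct Etime_type
{
    double Start,           /* initial reading, in ticks */
           Stop,            /* final reading, in ticks */
           Frequency;       /* timer frequency, in ticks per second */
    DWORD ProcessorMask;    /* one bit set: processor used for both readings */
    BOOL AlreadyStart;      /* set once Start has been captured */
    int ErrorFlag;          /* one of the error codes above */
};
typedef struct Etime_type Etime_t;

BOOL Etime(Etime_t *);                       /* capture Start, then Stop */
BOOL EtimeFrequency(Etime_t *);              /* get the timer frequency */
BOOL EtimeInitialize(Etime_t *);             /* reset members, get frequency */
ULONGLONG EtimeDurationInTicks(Etime_t *);   /* elapsed time in ticks */
double EtimeDurationInSeconds(Etime_t *);    /* elapsed time in seconds */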
Explanation of Messages
CANNOT_GET_FREQUENCY:
The function EtimeFrequency fails to return the frequency of the enhanced timer.
CANNOT_GET_START_TIME:
The enhanced timer does not exist.
CANNOT_GET_STOP_TIME:
The enhanced timer does not exist.
NO_PROCESSOR_AVAILABLE_TO_GET_START_TIME:
The thread used to capture the time cannot run on any of the available processors.
NO_PROCESSOR_AVAILABLE_TO_GET_STOP_TIME:
The thread used to capture the time cannot run on any of the available processors.
STOP_TIME_TAKEN_ON_DIFFERENT_PROCESSOR:
The thread used to capture the time was switched to another processor after it had taken the start time.
Data Structure of Etime_t
struct Etime_type
{
    double Start,
           Stop,
           Frequency;
    DWORD ProcessorMask;
    BOOL AlreadyStart;
    int ErrorFlag;
};
typedef struct Etime_type Etime_t;
The variables Start and Stop record the readings at the beginning and end of the code section, respectively. The Frequency element contains the frequency of the timer. Because the Start and Stop values are in ticks, they must be divided by the frequency to obtain values in seconds.
The ProcessorMask is a bit vector in which each bit represents a processor. Only the bit corresponding to the processor on which the Start and Stop times are recorded is set to one; all other bits are set to zero. In a multi-processor environment it is extremely important to measure both the start and stop times on the same processor, because the time varies from processor to processor. The function Etime returns an error code if the processor is not the same when the two times are captured.
The Boolean variable AlreadyStart tells the Etime function that the Start time has been captured and that the next reading to be recorded is the Stop time. Finally, the ErrorFlag stores the error code.
Etimer.lib
Currently there are five functions in the library as shown below:
Etime:
Capture the current time using the enhanced timer.
EtimeFrequency:
Get the frequency of the Enhanced timer.
EtimeInitialize:
Reset the members of the Etime_t structure and call the function EtimeFrequency.
EtimeDurationInTicks:
Calculate the elapsed time in ticks.
EtimeDurationInSeconds:
Calculate the elapsed time in seconds.
Conclusion
Different timers are used for different purposes. The Etimer is intended for applications that need high accuracy and for systems that are affected by power conservation mechanisms such as SpeedStep Technology. The Etimer has an overhead from checking for available processors in a multi-processor environment, which ensures that the initial and final times are taken on the same processor. It also incurs an overhead from the system call QueryPerformanceCounter. However, if the purpose of the timer is to compare the performance of the original code with that of the optimized code, the effect of the overhead cancels out, since it occurs in both cases. Also, make sure that you calculate the maximum value that this timer can handle before it is reset.
Additional Resources
Learn more about the Enhanced Intel SpeedStep® Technology [PDF 358KB].
About the Authors
Paul Work is an Engineering Manager in the Client Enabling Technology division in Hillsboro, Oregon. He joined Intel in 1994 as a Parallel Systems Engineer with the Supercomputer Systems Division, supporting application optimization on Intel Paragon and Teraflops systems at Wright Labs in Ohio, Sandia National Labs in NM, and Lawrence Livermore National Labs in California. Before joining Intel, Paul served as a Computer Engineer in the United States Air Force. Paul holds a BS in Computer Science from the University of Missouri-Rolla and an MS in Computer Engineering (with dual emphasis in Computer Architecture and Software Engineering) from the Air Force Institute of Technology. He can be reached at paul.work@intel.com.
Khang Nguyen is a Senior Applications Engineer working with Intel's Software and Solutions Group. He can be reached at khang.t.nguyen@intel.com.
Comments
Can I use this in Linux?
"This is to ensure that the initial and final times are taken on the same processor." - Does it mean that SetThreadAffinityMask is being used to make sure the measurements are taken from the same core, hence incurring the additional performance overhead of this system call?
The frequency given by QueryPerformanceFrequency does not match the observed frequency of the value returned by QueryPerformanceCounter. It is off by an offset and may also show thermal drift. These deviations are quite considerable: a few ppm result in errors of a few µs/s. The true frequency has to be calibrated against another time source, the system time for example. This is needed to "phase lock" the performance counter frequency to the system time.