March 1, 2009 11:00 PM PST
Develop a benchmark to measure the execution time of the piece of code you are optimizing. This is essential to determine whether your code changes are helping or hurting performance. Most tuning experts accomplish this using assembly code to access the processor's internal time-stamp counter, a fairly simple task on the IA-32 platform running the Windows* operating system. It is more complex, however, on Linux* and the Intel® Itanium® processor.
Use the IAPerf.h performance-measurement macros. The code below outlines how the macros can be used. The first macro, PERFINITMHZ, takes the clock speed of the machine in MHz as input. It declares the variables used by the timing code and sets the clockspeed variable so that time can be reported in seconds. (Requiring the clock speed as input is an obvious limitation, since there are ways to obtain it at run time.) The macros PERFSTART and PERFSTOP both record the value of the time-stamp counter and are meant to be used as bookends around the code you want to evaluate. Finally, the PERFREPORT macro uses printf() to display the elapsed time between PERFSTART and PERFSTOP.
#include "IAperf.h"
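The IAPerf.h listing itself is not reproduced here. As a rough illustration of the bookend pattern just described, the following is a minimal sketch of comparable macros, assuming GCC or Clang on an x86 target where the `__rdtsc()` intrinsic from `<x86intrin.h>` is available. The names TS_INIT_MHZ, TS_START, TS_STOP, TS_SECONDS and TS_REPORT, and the helper read_ticks(), are hypothetical stand-ins, not the real IAPerf.h macros. Like PERFINITMHZ, the clock speed is supplied by hand. On non-x86 targets the sketch falls back to a nanosecond monotonic clock, which is an assumption of this example; 1000 is the matching "MHz" value for that fallback.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime in strict C modes */
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the processor time-stamp counter (GCC/Clang intrinsic). */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for non-x86 builds: a monotonic clock in nanoseconds,
   which behaves like a 1000 MHz counter. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-ins for the IAPerf.h macros, not the real header. */

/* Declare the timing variables and record the clock speed in MHz. */
#define TS_INIT_MHZ(mhz)                  \
    uint64_t ts_start_ = 0, ts_stop_ = 0; \
    const double ts_mhz_ = (double)(mhz)

/* Bookends around the code being measured. */
#define TS_START() (ts_start_ = read_ticks())
#define TS_STOP()  (ts_stop_ = read_ticks())

/* Elapsed seconds = ticks / (MHz * 1e6). */
#define TS_SECONDS() ((double)(ts_stop_ - ts_start_) / (ts_mhz_ * 1e6))
#define TS_REPORT()  printf("Elapsed: %.6f s\n", TS_SECONDS())
```

As with PERFINITMHZ, the reported seconds are only as accurate as the MHz value passed in.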
When optimizing a function within a large application, you may wish to extract that function into a small console application to allow faster recompilation and simpler performance timing. This is an ideal setting for IAPerf.h, because the harness can simply measure how long it takes to call the function in a loop for a fixed number of iterations (typically enough iterations to produce a workload that runs for several seconds).
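A micro-benchmark harness of this kind might look like the following minimal sketch. For portability it times the loop with the POSIX clock_gettime() function rather than the time-stamp counter; in practice PERFSTART and PERFSTOP would take the place of the two now_seconds() calls. The names function_under_test, now_seconds and time_iterations are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime in strict C modes */
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the function extracted from the application. */
static double function_under_test(const double *data, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += data[i] * data[i];
    return sum;
}

/* Monotonic clock reading in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Call the function a fixed number of times; return total elapsed seconds. */
static double time_iterations(const double *data, int n, long iterations)
{
    /* Calling through a volatile function pointer and accumulating into a
       volatile sink stops the compiler from hoisting or eliding the calls. */
    double (*volatile fn)(const double *, int) = function_under_test;
    volatile double sink = 0.0;

    double start = now_seconds();
    for (long i = 0; i < iterations; ++i)
        sink += fn(data, n);
    double stop = now_seconds();

    (void)sink;
    return stop - start;
}
```

Raise the iteration count until the total runs for several seconds, then divide the elapsed time by the count to obtain the per-call cost.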
Often, however, you will want to measure some part of an application that cannot easily be separated into a "micro-benchmark." The usual metric in this case is the average time spent in each pass through the region of interest, assuming the code runs repeatedly during the application workload. The IAPerf.h macros will need some modification to gather the average time over many passes through the code, but this should be a simple task for an experienced C programmer. For a GUI application without console output, the reporting code will need to write either to the screen or to a file instead of calling printf().
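One way to make that modification is to accumulate the elapsed time and a pass count on every trip through the region, then report the average once at the end. The following minimal sketch assumes a POSIX monotonic clock in place of the time-stamp counter. The names region_timer, REGION_START, REGION_STOP, region_average_ns and region_report are hypothetical and are not part of IAPerf.h.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime in strict C modes */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds (stand-in for the TSC). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical accumulator for one region of interest. */
typedef struct {
    uint64_t total_ns; /* sum of the time spent in all completed passes */
    uint64_t passes;   /* number of completed passes */
    uint64_t start_ns; /* start time of the current pass */
} region_timer;

/* Bookends placed around the region; every pass adds to the totals. */
#define REGION_START(t) ((t).start_ns = now_ns())
#define REGION_STOP(t)                            \
    do {                                          \
        (t).total_ns += now_ns() - (t).start_ns;  \
        (t).passes++;                             \
    } while (0)

/* Average nanoseconds per pass, or 0 if the region never ran. */
static double region_average_ns(const region_timer *t)
{
    return t->passes ? (double)t->total_ns / (double)t->passes : 0.0;
}

/* Write the result to any stream, e.g. a FILE* from fopen() in a GUI app. */
static void region_report(const region_timer *t, const char *name, FILE *out)
{
    fprintf(out, "%s: %llu passes, average %.1f ns\n", name,
            (unsigned long long)t->passes, region_average_ns(t));
}
```

Declaring the region_timer at file scope (or as a static local) lets it persist across calls, so a GUI application can call region_report() once at shutdown with a file opened by fopen().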
Many more macros could be written to manage the timing data in different ways, but it would be difficult to provide a method that would suit every user. The intent of IAPerf.h is not to provide a black-box solution for all performance-measurement activities, but rather to abstract the task of reading the processor time-stamp counter in different environments.
The full code listing for the IAPerf.h header is included in the article listed below.
Portable Performance Measurement Macros for Intel® Architecture

