Develop an Execution-Time Benchmark on 64-Bit Intel Architecture


March 1, 2009 11:00 PM PST



Challenge

Develop a benchmark that measures the execution time of the code you are optimizing. This is essential for determining whether your code changes help or hurt performance. Most tuning experts do this with assembly code that reads the processor's internal time-stamp counter, a fairly simple task on the IA-32 platform running the Windows* operating system. It is more complex, however, on Linux* and on the Intel® Itanium® processor.


Solution

Use the IAPerf.h performance-measurement macros. The code below outlines how the macros can be used. The first macro, PERFINITMHZ, takes the clock speed of the machine in MHz as input. It declares the variables used by the timing code and sets the clock-speed variable so that time can be reported in seconds. (Supplying the clock speed by hand is an obvious limitation, since it can be obtained at run time.) The macros PERFSTART and PERFSTOP each record the value of the time-stamp counter and are meant to be used as bookends around the code you want to evaluate. Finally, the PERFREPORT macro uses printf() to display the time elapsed between PERFSTART and PERFSTOP.

 

#include "IAPerf.h"
...
Foo(...)
{
    // initialization code, variable declarations, etc.

    PERFINITMHZ(500)  // initialize performance measurement macros
                      // for a 500 MHz machine

    // more setup code - not under test

    PERFSTART         // record time-stamp counter prior to entering
                      // performance-critical code

    // performance-critical code under examination by the user

    PERFSTOP          // record time-stamp counter after leaving
                      // performance-critical code

    PERFREPORT        // report the length of time spent in the
                      // performance-critical code

    // more code not under test
}
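
To make the roles of the four macros more concrete, here is a minimal sketch of how macros of this shape could be written. It is not the IAPerf.h listing: every name prefixed with SKETCH_ or sketch_ is hypothetical, the __rdtsc() intrinsic assumes GCC or Clang on an x86 target, and the clock_gettime() fallback (which ticks at 1000 MHz, one tick per nanosecond) is only there so the sketch compiles elsewhere.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the processor time-stamp counter (assumes GCC or Clang on x86). */
static inline uint64_t sketch_read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for other targets: nanoseconds from a monotonic clock. */
static inline uint64_t sketch_read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Convert a tick count to seconds for a counter running at `mhz` MHz. */
static inline double sketch_ticks_to_seconds(uint64_t ticks, double mhz) {
    return (double)ticks / (mhz * 1.0e6);
}

/* Hypothetical counterparts of PERFINITMHZ, PERFSTART, PERFSTOP, PERFREPORT. */
#define SKETCH_PERFINITMHZ(mhz) \
    double sketch_mhz = (mhz);  \
    uint64_t sketch_start = 0, sketch_stop = 0;
#define SKETCH_PERFSTART  sketch_start = sketch_read_ticks();
#define SKETCH_PERFSTOP   sketch_stop = sketch_read_ticks();
#define SKETCH_PERFREPORT                                              \
    printf("elapsed: %.9f s\n",                                        \
           sketch_ticks_to_seconds(sketch_stop - sketch_start, sketch_mhz));
```

Used inside a function body, the four macros would appear in the same order as in the outline above: SKETCH_PERFINITMHZ(500) after the variable declarations, SKETCH_PERFSTART and SKETCH_PERFSTOP around the code under test, and SKETCH_PERFREPORT afterwards. The reported time is only as accurate as the MHz value supplied, which should be the rate at which the counter actually ticks.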

 

When optimizing a function within a large application, you may wish to break that function out into a small console application to speed up recompilation and simplify performance timing. This is an ideal framework for IAPerf.h, because the macros can simply time a loop that calls the function a fixed number of times (typically, the number of iterations needed to get a workload that runs for several seconds).
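
A micro-benchmark of the kind described above might look roughly like the following sketch. The names sketch_read_ticks, sketch_function_under_test, and sketch_time_loop are hypothetical, the workload is a placeholder for the function extracted from the application, and the tick source carries the same assumptions as any time-stamp counter read (GCC or Clang on x86, with a monotonic-clock fallback elsewhere).

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the processor time-stamp counter (assumes GCC or Clang on x86). */
static inline uint64_t sketch_read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumed fallback for other targets: nanoseconds from a monotonic clock. */
static inline uint64_t sketch_read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Volatile sink so the compiler cannot discard the placeholder workload. */
static volatile unsigned long sketch_sink;

/* Placeholder for the function pulled out of the larger application. */
static void sketch_function_under_test(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000; ++i) {
        sum += i;
    }
    sketch_sink = sum;
}

/* Call fn() `iterations` times and return the elapsed ticks for the loop. */
static inline uint64_t sketch_time_loop(void (*fn)(void),
                                        unsigned long iterations) {
    uint64_t start = sketch_read_ticks();
    for (unsigned long i = 0; i < iterations; ++i) {
        fn();
    }
    uint64_t stop = sketch_read_ticks();
    return stop - start;
}
```

A console driver would call sketch_time_loop(sketch_function_under_test, n) with n chosen so that the loop runs for several seconds, then divide the returned ticks by the counter frequency to report seconds. Timing the whole loop, rather than each call, keeps the overhead of reading the counter out of the measurement.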

Often, however, you will want to measure some part of an application that cannot be easily separated into a "micro-benchmark." The metric of choice for this is usually the average time spent passing through the region of interest, assuming the code runs repeatedly during the application workload. The IAPerf.h macros will require some modification to gather the average time over many passes through the code, but this should be a simple task for an experienced C programmer. For a GUI application without console output, the reporting mechanism will need to write either to the screen or to a file.
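
One possible shape for that modification is sketched below: an accumulator that sums the ticks from each pass and reports the average to any FILE stream, so a GUI application can direct the output to a log file instead of the console. The struct and function names are hypothetical and are not part of IAPerf.h; the start and stop values are assumed to come from whatever reads the time-stamp counter around the region of interest.

```c
#include <stdint.h>
#include <stdio.h>

/* Running totals for one region of interest. */
struct sketch_perf_accum {
    uint64_t total_ticks; /* sum of (stop - start) over all passes */
    uint64_t passes;      /* number of passes recorded so far */
};

/* Record one pass, given the counter values read before and after it. */
static inline void sketch_accum_add(struct sketch_perf_accum *acc,
                                    uint64_t start, uint64_t stop) {
    acc->total_ticks += stop - start;
    acc->passes += 1;
}

/* Average seconds per pass for a counter running at `mhz` MHz. */
static inline double sketch_accum_average_seconds(
        const struct sketch_perf_accum *acc, double mhz) {
    if (acc->passes == 0) {
        return 0.0; /* nothing recorded yet */
    }
    double average_ticks = (double)acc->total_ticks / (double)acc->passes;
    return average_ticks / (mhz * 1.0e6);
}

/* Write the summary to `out`, which may be stdout or an open log file. */
static inline void sketch_accum_report(const struct sketch_perf_accum *acc,
                                       double mhz, FILE *out) {
    fprintf(out, "passes: %llu, average: %.9f s\n",
            (unsigned long long)acc->passes,
            sketch_accum_average_seconds(acc, mhz));
}
```

In use, the accumulator would be declared once with a lifetime that spans the workload (for example as a static variable), sketch_accum_add would be called after each pass through the region, and sketch_accum_report would be called once when the workload ends.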

Many more macros could be written to manage the timing data in different ways, but it would be difficult to provide a method that would suit every user. The intent of IAPerf.h is not to provide a black-box solution for all performance-measurement activities, but rather to abstract the task of reading the processor time-stamp counter in different environments.

The full code listing for the IAPerf.h header is included in the article listed below.


Source

Portable Performance Measurement Macros for Intel® Architecture