Question: Estimating power performance on newer architectures

One of my greatest problems, outside of my tendency toward wordiness, is over-researching a topic. Sometimes this research gets to the point where I not only find the answer myself, but so much time has passed that the topic is no longer relevant.

So in an attempt to break this cycle of obsessive behavior, I’m going to wing it instead. I’m going to pose several questions, and then put forth poorly thought-out and embarrassing propositions. In the meantime, behind the scenes, I’ll continue my obsessive behavior and keep preparing a more didactic and pontifical blog entry (looking at which benchmarks are useful for measuring performance when considering power).


There have been many attempts in the last twenty years to come up with a methodology for estimating the power usage of an application. (Yes indeed, power is a fairly old topic. Lately, it has become increasingly important with the expanding use of mobile and untethered computers, such as notebooks, smart phones and intelligent embedded systems.) Though all of these methodologies claim success, some have been more successful than others.

The usual methodology defines classes of kernels. A kernel is a simple program used to illustrate a certain use or characteristic. In the case of power, a kernel represents a common sequence of one or more machine language operations, such as taking values from two registers, performing an integer add, and then placing the result into a register. These operations are performed in (what is essentially) an infinite loop while the current draw (as in amps) is measured. From this, power and total energy are derived. The infinite loop serves not only to emphasize the kernel being studied, but also to avoid any transients at startup.

These methodologies also define various “sub” kernels to measure power when the processor performs the instruction under different memory scenarios. The usual scenarios are (1) register to register, (2) pure memory to memory, and (3) cached references. The kernel measurements are then used to estimate the energy usage of a program, such as bcopy. Lastly, the resulting estimate is validated by experiment.
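To make the bookkeeping concrete, here is a minimal sketch of how such a kernel-based estimate might be assembled. Everything in it is an assumption for illustration: the kernel names, the supply voltage, the clock frequency, the measured currents, the cycle counts and the bcopy-like profile are all made-up placeholders, not measurements from any real processor or from any particular published methodology. It only uses the basic relations power = voltage × current and energy = power × time.

```python
# Hypothetical sketch: every name and number below is an illustrative
# placeholder, not data from a real processor or a published method.

SUPPLY_VOLTAGE_V = 1.2   # assumed constant core supply voltage
CLOCK_HZ = 100e6         # assumed fixed clock frequency (no power management)

# Assumed steady-state current (amps) measured while each kernel runs in
# its loop, plus an assumed cycle count for one pass through the kernel.
KERNELS = {
    "add_reg_reg": {"current_a": 0.25, "cycles": 1},   # register to register
    "load_cached": {"current_a": 0.30, "cycles": 2},   # cached reference
    "load_memory": {"current_a": 0.40, "cycles": 10},  # pure memory access
}


def kernel_power_w(name):
    """Power drawn while the kernel loops: P = V * I."""
    return SUPPLY_VOLTAGE_V * KERNELS[name]["current_a"]


def kernel_energy_j(name):
    """Energy of one pass through the kernel: E = P * t, with t = cycles / f."""
    seconds = KERNELS[name]["cycles"] / CLOCK_HZ
    return kernel_power_w(name) * seconds


def estimate_program_energy_j(kernel_counts):
    """Sum per-kernel energy weighted by how often the program runs each one."""
    return sum(count * kernel_energy_j(name)
               for name, count in kernel_counts.items())


# Hypothetical profile of a bcopy-like routine: executions per kernel.
bcopy_profile = {"load_memory": 1000, "load_cached": 3000, "add_reg_reg": 4000}
estimate = estimate_program_energy_j(bcopy_profile)
print(f"Estimated energy: {estimate:.3e} J")
```

Note that the sketch quietly assumes a single fixed voltage and frequency, and that each kernel costs the same no matter what runs around it. Those are exactly the assumptions that the rest of this post calls into question.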

These types of methodologies have been applied to traditional embedded, mobile and general purpose computer environments. This has been done for Pentium, ARM and other architectures.

Most of the work that I’ve read concerning these methodologies has been applied to architectures predating the current generation, with all its sophisticated HW and SW power management. Certainly, the kernels and validating applications have almost all been compute intensive and relatively simple.

Here is the basic question that I’m tossing on the table for discussion:

THE BIG QUESTION: How do these methodologies (put references here) apply to modern architectures, with sophisticated HW and SW power management?

By SW power management, I mean that the HW works with the SW through a power management policy engine. For example, on Windows systems running on a Core 2 Duo, the OS has a low-level power management engine that decides at what points to drop the system into a lower C-state. And I am not talking about the high-level Windows Power Profiles.

To add some specificity to this question, let’s look at the Core i7, aka Nehalem. It has the Core 2 Duo’s ability to drop into both various P-states (lower voltage and frequency states) and C-states (lower power idle states). Also, its sophisticated modern pipeline is able to perform out-of-order execution, work with multipurpose ALUs, do sophisticated branch prediction, etc.
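As a rough illustration of why these states complicate matters, consider a sketch in which the average power of a run depends on how long the processor sits in each state. The state labels follow the usual P-state and C-state naming, but the per-state power numbers and the two residency profiles are invented for illustration only; they are not Core i7 figures.

```python
# Hypothetical sketch: per-state power values and residency fractions are
# made-up placeholders, not measurements of any real processor.

STATE_POWER_W = {
    "P0": 30.0,  # assumed: active, highest voltage/frequency
    "P1": 20.0,  # assumed: active, reduced voltage/frequency
    "C1": 8.0,   # assumed: shallow idle
    "C3": 2.0,   # assumed: deeper idle
}


def average_power_w(residency):
    """Residency-weighted average power; fractions must sum to 1."""
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("residency fractions must sum to 1")
    return sum(fraction * STATE_POWER_W[state]
               for state, fraction in residency.items())


def energy_j(residency, duration_s):
    """Energy over a run: average power times wall-clock duration."""
    return average_power_w(residency) * duration_s


# Two hypothetical workloads with the same duration but different behavior.
busy = {"P0": 0.9, "P1": 0.1, "C1": 0.0, "C3": 0.0}
bursty = {"P0": 0.2, "P1": 0.1, "C1": 0.3, "C3": 0.4}

for label, profile in (("busy", busy), ("bursty", bursty)):
    print(label, round(average_power_w(profile), 2), "W")
```

In this toy model the residency fractions are inputs. On a real system they are decided at run time by the HW and the OS policy engine, which is precisely what a fixed set of looping kernels does not capture.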

How must these methodologies for estimating application energy usage be modified to account for newer architectures? Must we select a different set of kernels that will more accurately characterize power consumption? Is it even possible to derive the energy consumption from such a set of kernels?

Next: My embarrassing attempt to motivate discussion by putting forth some ideas.