# A useful power-performance metric (Part IIa, the goal)

GOALS AND THE ULTIMATE QUESTION

What is our ultimate goal? What do we want to do with our life? Is life important or just a meaningless exercise in futility? These are very important questions but generally irrelevant to this discussion.

So what is our ultimate goal here in this series of articles? We want to come up with a useful measure of power-performance for an application, such as joules per instruction for a given app. But as we will see, finding this answer is much less clear cut than it first appears.

Sometimes a useful artifice is to throw away reality and all its encumbrances. Eventually, we’ll have to return to earth. We’re boring engineers, are we not?

I’m going to be getting into some stochastic integrals and other types of Lebesgue integration here, so be prepared. Just kidding. (Oh my gosh! I really do know the difference between Riemann and Lebesgue integration. I almost surely have way too much math in my background.) But we are going to get into some math as it’s a compact and convenient notation.

WHAT DO WE REALLY WANT?

This isn’t as obvious as it first appears. Do we want the performance per power (W)? Or performance per energy (J)? Do we focus on the power-performance of only a given application or of a suite of applications? Are we satisfied with the power-performance of the application as a whole or do we want to break it down further? Is our goal to be able to derive the power-performance or just measure it? We’ve got to spend at least a couple of years pondering this question before really being able to delve into its far reaching implications. Of course, we don’t have the time for this intellectual introspection, so we’ll just dive in and wing it.
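To see why the watts-versus-joules question matters, here is a minimal sketch. Everything in it is an assumption made for illustration: the two runs, their times and average powers, the instruction count, and the choice to define "performance" as 1 / run time for a fixed amount of work. None of it is measured data. The point is only that the two candidate metrics can rank the same pair of runs in opposite order.

```python
# Hypothetical numbers only: two runs of the same fixed workload.
# "Performance" is taken to be 1 / run time, which is an assumption,
# not a definition from this series.

def energy_joules(seconds: float, avg_watts: float) -> float:
    """Energy is average power multiplied by time (1 W = 1 J/s)."""
    return avg_watts * seconds

def perf_per_watt(seconds: float, avg_watts: float) -> float:
    """Performance (1/s) divided by average power (W)."""
    return (1.0 / seconds) / avg_watts

def perf_per_joule(seconds: float, avg_watts: float) -> float:
    """Performance (1/s) divided by total energy (J)."""
    return (1.0 / seconds) / energy_joules(seconds, avg_watts)

def joules_per_instruction(seconds: float, avg_watts: float,
                           instructions: float) -> float:
    """Total energy divided by the number of instructions executed."""
    return energy_joules(seconds, avg_watts) / instructions

# Run A is fast and hungry, run B is slow and frugal (made-up values).
RUN_A = {"seconds": 10.0, "avg_watts": 20.0}   # 200 J
RUN_B = {"seconds": 20.0, "avg_watts": 8.0}    # 160 J
INSTRUCTIONS = 1e9                             # assumed identical for both runs

if __name__ == "__main__":
    for name, run in (("A", RUN_A), ("B", RUN_B)):
        print(name,
              energy_joules(**run),
              perf_per_watt(**run),
              perf_per_joule(**run),
              joules_per_instruction(**run, instructions=INSTRUCTIONS))
```

With these made-up numbers, run B wins on performance per watt (it simply uses less energy), while run A wins on performance per joule (which also rewards finishing sooner). So the choice of denominator is not cosmetic.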

Let’s address the first issue: What do we want to measure the power-performance of? Well, it’s for an application, isn’t it? Yes, but this isn’t specific enough. Do we want to find it for a suite of apps? For all possible apps? For apps running in a given language, say in JavaScript on a browser? I can go on and on, but I’d just confuse myself further.

To get a handle on this, let’s consider my ultimate goal. (There’s that pesky “goal” word again.) Being a selfish soul, I’m interested in applications, specifically Windows applications. So, what is the power-performance of a given application. Compared to other applications. On Intel hardware. Compared to other unnamed competitors and their hardware? Though I would like to show how Intel hardware is superior in power-performance, I’m intellectually honest here – neglecting my own natural ignorance, of course. I want a fair and useful comparison.

So here are my constraints. (If any of you want to propose different constraints, please do. I’m not god (note the little “g”) nor even the least bit omniscient.)

Constraint 1: Be able to compare the power-performance of one general application against another general application

Constraint 2: Be able to compare the total power-performance of an application, meaning we have application granularity

Constraint 3: Be able to compare the power-performance across two different pieces of HW

Constraint 4: The HW is limited to the processor

Constraint 5: We’re considering only the entire processor (meaning we’re not going down to individual cores or other processor components)

NEXT TIME: THE METRIC

PS If you have any references on this or any other relevant topic (excepting Lebesgue integration), let me know. I average reading about 2 to 3 papers a week, but that’s way too little for this topic.

This sounds interesting. As a developer I am interested in benchmarks for software performance. I realize the importance of reducing the power consumption of hardware as we miniaturize it. Battery life has been a frustration for me on portable devices.

One other thing you might want to place a constraint on is whether or not to consider processor-specific instructions. By this I mean instructions that exist on only one processor and are not considered standard instructions.

Another concern is the offloading of CPU usage to the GPU. With Windows DirectX 11 they have added the Compute Shader that simplifies performing calculations on the GPU. Here is a clip from the Windows DirectX Graphics Documentation (August 2009)

A compute shader is a programmable shader stage that expands Direct3D 11 beyond graphics programming. Like other programmable shaders (vertex and geometry shaders for example), a compute shader is designed and implemented with HLSL but that is just about where the similarity ends. A compute shader provides high-speed general purpose computing and takes advantage of the large numbers of parallel processors on the GPU. The compute shader provides memory sharing and thread synchronization features to allow more effective parallel programming methods.

As you can see this could pull computations from the CPU making the power load measurement between two applications difficult. An inefficient program using the GPU could have a greater inefficiency with both the CPU and GPU. If you compare only the CPU, it might appear more efficient.

Now you could add a constraint that programs can’t make use of the GPU for the test. I do not think this is a good idea.

Our 2007 SIGMOD paper introducing the JouleSort power-performance benchmark may be of interest to you. This is different from what you're looking for in that it's a full-system benchmark with an I/O-centric workload, but the paper spends some time on benchmark design issues similar to those you've highlighted here. Link: http://portal.acm.org/citation.cfm?doid=1247480.1247522

Taylor -- Thanks for writing about a topic of high interest to many tech enthusiasts and of high value to the portable computing community. A group of us were talking about the topic of how a person can knowledgeably choose the most cost-effective portable computing device (laptop, netbook, smartphone, etc.) for their particular 'computing' needs. With portable power being a limiting factor for many scenarios, part of our discussion focused on how to measure or quantify the subsystem resources (amount and speed of RAM, 'hard drive', CPU speed, cores and voltage/watts, 'GPU resources') being consumed during typical usage sessions. It would be helpful to have a website and/or client utility that would measure these variables, keeping in mind Herr Heisenberg, and allow a person to accurately and usefully evaluate how a given computing device would perform for the particular scenario.

Based on the topic of your post and your general background and interests, it seems you might know of websites or client utilities which most closely approach the 'computing power evaluation' tools described above.

If interested in discussing this a bit more or in sharing what you know about useful sites or utilities, please contact me at bwaldron@gmail.com.

Thanks!

Bob

good

Re: Mike K and GPU (draft)

Mike,

Your point about the GPU is a very important one, though mostly on desktops. Why not on mobile? It has more to do with the nature of the market, I suspect.

What market has a special interest in GPUs? Why do they have that interest? I suspect it’s mostly performance based, such as for graphical realism.

My guess is that this performance is their primary selling point. What about smaller footprint devices? Well, I’m not really sure that the improvement provided by a GPU is that noticeable in that market. As always, correct me if I’m wrong. (I'm excluding notebooks since their footprints are often as large as desktops nowadays.)

Plus I suspect that taken in total, GPU power usage is significant even if per-processor quotes are reasonable (or even very good). What happens if the number of processors is multiplied by the per-processor power usage (or the MFLOPS/processor multiplied by the W/MFLOPS multiplied by the number of processors)? How does this impact mobile battery life?
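Spelled out, the multiplication I have in mind looks like the following. The symbols are my own shorthand, not anything from a datasheet: $N$ is the number of processors on the GPU, $P_{\text{proc}}$ the power drawn per processor, $R_{\text{proc}}$ the throughput per processor, and $C$ the power cost per unit of throughput.

$$
P_{\text{total}} = N \cdot P_{\text{proc}} = N \cdot R_{\text{proc}} \cdot C
$$

As a purely hypothetical illustration (these are invented figures, not any real part): with $N = 4$, $R_{\text{proc}} = 500$ MFLOPS per processor, and $C = 0.02$ W per MFLOPS, each processor draws $500 \times 0.02 = 10$ W and the total is $4 \times 10 = 40$ W. A per-processor number that looks modest can still add up.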

Re: Suzanne Rivoire and JouleSort reference

Suzanne,

Thanks for the reference. I'm reading it now.

(By the way, reading technical literature can be painful for me. I'm one of those who have to read something several times to grok it.)

--
Taylor

Taylor,
Yes, my point is related to the desktop. I agree that with the advent of the netbook tier of mobile computers minimizing what is in it is a priority. This is obvious with your release of the Atom Processor. I do not see this eliminating full-fledged laptops for those that need the power of a moderate desktop on the go. Your point of making sure this does not only apply to the desktop is not lost on me. I program heavily for the PC, but also for a portable CE scanner device at work.
Now I do not know if you are just concerned with the mobile computers or not. I think that even the desktop computers need to look at the power usage. As we add more cores into the processors the heat will increase if the power per core remains the same (you made reference to this in the final paragraph of your reply). I assumed that you guys were looking for a way for programmers to minimize power usage in their applications like they would memory. Also that you guys wanted some method to benchmark your R&D for improved performance with reduced power.
I believe one of the biggest issues we face with computers now is heat. I have not used netbooks so do not have any experience with them. I did have a Samsung Ultra Mobile PC that would get very hot on one side. Computers are being stacked up with fans in order to keep them running. Granted the CPU is but one component (unless of course there are multiple CPUs) in the computer. Both my CPU and GPU have fans on them. The fan on my CPU is much bigger than the fan on my GPU (I do not know how the fans relate with power consumption). This actually brings up another question on power usage. The fan is on the CPU to pull heat away from it. Now I can see the fan being outside of the scope of this.

If you guys want to limit this to mobiles, I can understand. I just think that desktops could benefit from this as well.

Re: Mike K and focus on Desktop

I agree that desktops are important, perhaps the most important from the standpoint of their global impact on the environment and their global financial impact. (I’ve seen the stats before but can’t find them. The data is there, it’s just a matter of wading through all the advertisements and other non-relevant data in Google and Bing.)

In my opinion, it all has to do with the bottom line. Server power efficiency directly impacts the bottom line of the operators (e.g. cooling and electrical infrastructure). Mobile power efficiency directly impacts the bottom line of the OEM (e.g. weight and consumer-acceptable battery life). Home users generally don’t keep track of the electrical costs, let alone power’s distant effect on the price of their computers and the impact on the environment. This is true for notebooks and desktops alike. IT departments and the desktops they service have a more direct effect on the bottom line of the companies they are part of. But I wonder if this impact isn’t also hidden by the fact that cooling and power distribution are shared between the infrastructure supporting the employees and the infrastructure supporting the computers they use.

There are some efforts to address this problem with desktop systems. Some examples are Energy Star and the CSCI (Climate Savers Computing Initiative).

Re: Mike K and limiting discussion to mobiles

Thanks for mentioning that I was focusing a lot on mobile. I don’t mean to. It’s because the mobile environment has a more direct cause and effect relationship than desktop. My interest covers both.

Suzanne Rivoire’s paper referenced above tries to address the broad spectrum of application environments from HPC/Data house to embedded mobiles. (Suzanne, please correct me if I’m wrong.) I like that she’s attempting to look at power and performance in a “fair” way across different classes of platforms.

(By the way, I haven't read Suzanne's paper in the detail it deserves. I'm starting pass two. I'm a multi-pass reader.)

Taylor,

I think we are on the same page. Sorry about the large solid block of text in my last post.