Say your boss comes to you and tells you to ensure that the software project you are working on is energy efficient. (Go ahead, I'll wait while you say it.)
There are all kinds of ideas to be found on the Power Efficiency Community site on how to accomplish this assignment. What I'd like to know is how you prove to your boss that you have accomplished the task. Or, if you've already optimized the you-know-what out of the application, what do you measure in order to label your software as energy efficient?
With serial applications, the metric of better performance is easy: less time is better. Parallel applications have similar metrics: either run the same workload in less time than an equivalent serial version or run more workloads than the serial code can process in the same time.
I've found that energy efficiency is a little more nebulous. Talking with engineers around Intel who deal with this question, I've gotten three potential answers.
Total Energy Consumed
This is simply the amount of energy used during the run of your application, or the average energy used per unit of time, that is, the average power (since the needs of the application might fluctuate based on the processing done). Certainly the less total energy used, the more efficient your application will be. On laptops, Ultrabooks, and other mobile devices, applications that use less energy will preserve battery life.
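To make the two flavors of this measure concrete, here is a minimal sketch that turns a series of power readings into total energy (joules) and average power (watts). It assumes you already have timestamped power samples from some meter or counter; the sample format and the function names are my own illustration, not the output of any particular tool.

```python
def total_energy_joules(samples):
    """Integrate power over time with the trapezoidal rule.

    samples: list of (time_in_seconds, power_in_watts) tuples sorted by time.
    This input format is an assumption made for illustration.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of the trapezoid between two neighboring readings.
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power_watts(samples):
    """Average power over the sampled interval: total energy / duration."""
    duration = samples[-1][0] - samples[0][0]
    return total_energy_joules(samples) / duration


# Hypothetical readings: a short burst of activity between two quieter moments.
readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 10.0)]
energy = total_energy_joules(readings)
power = average_power_watts(readings)
```

Comparing the energy figure for the same workload before and after a tuning step gives the relative measure; the average power figure is the one to watch when the run has no natural end.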
But, how much energy does the application NOT have to use to be considered efficient? Is there a minimum achievable level of energy consumption for a given computation? When do you know that there is no more efficiency that can be squeezed from your tuning efforts? With parallel computation we have Amdahl's Law and other theoretical models that can be used to compute an upper bound on the amount of parallelism we might be able to eke out of an application. Is there a similar model for energy efficient performance limits?
Performance per Watt
This measures how much work is done per watt of power drawn. Since a watt is one joule per second, a throughput per watt (say, transactions per second per watt) is the same thing as work per joule of energy. Clearly, if your optimizations result in more work done with the same amount of energy used or the same computation with less energy expended, your application is more efficient than it was before. Most applications will have some easy metric of performance. Things like FLOP/s or pixels rendered or transactions processed or frames per second are all common measures of the work being done in applications.
Again, a relative measure of before and after is easy to keep track of during the tuning process. If your application starts with 10 FAUXtoe-ops per watt and eventually reaches 20 FAUXtoe-ops per watt, your tuning efforts are headed in the right direction. As with the previous measure, I wonder if there is an absolute (theoretical?) value that can be held up as the goal of your optimization efforts, depending, of course, on what work unit your application uses. And would such an energy efficiency measure need to take into account the target hardware configuration (battery, power supply, processor, chipset, GPU, etc.)?
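The before-and-after bookkeeping is simple enough to sketch. The function below assumes you can count work units for a run, time it, and measure the energy it consumed; the function name and the numbers are made up for illustration.

```python
def perf_per_watt(work_units, seconds, joules):
    """Throughput divided by average power.

    work_units: whatever the application counts (frames, transactions, ...).
    seconds:    wall-clock duration of the run.
    joules:     energy consumed during the run.
    Algebraically this equals work_units / joules.
    """
    throughput = work_units / seconds    # work units per second
    average_power = joules / seconds     # watts
    return throughput / average_power


# Hypothetical runs of the same workload before and after tuning.
before = perf_per_watt(work_units=1000, seconds=10.0, joules=100.0)
after = perf_per_watt(work_units=1000, seconds=5.0, joules=50.0)
improvement = after / before
```

In this made-up case the tuned run finishes in half the time at the same average power, so the work done per watt doubles.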
Application Idle Behavior
One idea that I find repeated over and over is that the processor should be kept idle as much as possible. That is, it should reside in the deepest (lowest-power) C-state as much as possible. For an application, this rule of thumb can be summarized with the acronym HUGI (Hurry Up and Get Idle). Do whatever is needed to complete any processing as quickly as possible and then have the application sit idle (and do that as energy efficiently as possible). Thus, the measure is how efficiently an application spends its time waiting for user input or some other interrupt.
Judging how efficiently an application executes during its idle time can be as simple as measuring the percentage of time spent in the deeper C-states (C3, C6). This metric can be an absolute percentage of execution time or measured relative to the idle state of the system without any other application running, i.e., the quiescent state of the OS. That sounds good for user-interactive applications that can run faster than the user can type or click, but what about heavily compute-intensive executions? When I ran a version of my Akari application, less than 3% of the time was spent in C3 while achieving a 22.56X speedup on 80 threads. With all the parallel tasks that get spawned to inhabit all the threads/cores available, should that application be considered energy efficient?
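As a rough sketch of the residency measurement: if you can snapshot cumulative per-state idle time before and after a run, the percentage falls out of a subtraction and a division. On Linux, the cpuidle sysfs files expose such counters in microseconds; the reader below assumes that layout, and the state names it returns depend on the platform and idle driver. The function names are my own.

```python
from pathlib import Path


def read_idle_times_us(cpu=0):
    """Return {state_name: cumulative idle time in microseconds} for one CPU.

    Assumes the Linux cpuidle sysfs layout; returns an empty dict elsewhere.
    """
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle")
    times = {}
    for state_dir in sorted(base.glob("state*")):
        name = (state_dir / "name").read_text().strip()
        times[name] = int((state_dir / "time").read_text())
    return times


def residency_percent(before, after, elapsed_us):
    """Percentage of the elapsed interval spent in each idle state."""
    return {
        name: 100.0 * (after[name] - before.get(name, 0)) / elapsed_us
        for name in after
    }


# Hypothetical snapshots taken one second (1,000,000 microseconds) apart.
snap_before = {"C3": 1_000, "C6": 5_000}
snap_after = {"C3": 31_000, "C6": 505_000}
residency = residency_percent(snap_before, snap_after, 1_000_000)
```

Taking the same pair of snapshots with the application not running gives the quiescent-OS baseline to compare against.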
What do you think?
So, when you read the first two paragraphs, did you think of any of the metrics I outlined here? Was it some variation on a theme, or did you have a completely different idea?
Performance was, is, and always will be the driver of the need for software tuning and optimization. Would there ever be a trade-off of lower performance (longer execution time) in order to reduce the energy consumed? Perhaps a question for a future blog. For now, though, assuming that preserving or improving application performance is a requirement, how do you demonstrate that your application is also efficient in the energy it uses during execution?