Energy / Power measurement wish list

By Taylor Kidd (Intel) (14 posts) on September 5, 2008 at 8:34 am

I got such a good response from my previous post that I decided to pose another question to my select and invisible audience.

If you could come up with a wish list, what power-related measurements would you like to get from the (computer) platform, as well as from the processor itself?

Now, let's be a little creative. Saying something like, "I'd like to measure the power of an application," doesn't really tell us anything. Break things down a little. Make it a little more concrete.

Here's an example of a better answer: "I'd like to measure the energy consumed by my such-and-such encoding function." Or, "You know, it'd be great if I could measure such and such so that I could predict the energy performance of a certain algorithm."

So what do you want Santa to bring?

Categories: Gaming, Mobility, Parallel Prog. & Multi-Core, Visual Computing

Comments (4)

September 7, 2008 12:18 PM PDT


Chris
SIMD unit?
By the way, I really don't like the idea of losing MMX with the introduction of SSE4. I still use it to enhance performance.
October 1, 2008 3:45 PM PDT

Amanda Marvel (Intel)
Hi Taylor, good question. Additional things to consider are how the energy is measured, and whether it is measured at peak or as an average over a certain period of time while the most compute-intensive portion of the application is being pushed to its limit.
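To make the peak-versus-average distinction concrete, here is a minimal sketch. It assumes power readings in watts are already available at a fixed sampling interval; the function name and the sample values are hypothetical, and no particular measurement hardware or API is implied.

```python
def summarize_power(samples_w, interval_s):
    """Summarize power samples taken at a fixed interval.

    samples_w  -- power readings in watts (hypothetical input)
    interval_s -- time between consecutive readings, in seconds
    Returns (energy in joules, average power in watts, peak power in watts).
    """
    if not samples_w:
        raise ValueError("need at least one power sample")
    # Rectangle rule: each sample is assumed to hold for one full interval.
    energy_j = sum(samples_w) * interval_s
    average_w = sum(samples_w) / len(samples_w)
    peak_w = max(samples_w)
    return energy_j, average_w, peak_w


energy_j, average_w, peak_w = summarize_power([10.0, 20.0, 30.0], 0.5)
print(energy_j, average_w, peak_w)
```

The same run can look quite different depending on which of the three numbers is reported, which is why the measurement window matters.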
December 4, 2008 5:55 PM PST


David Snowdon
My apologies for a massive delay in my reply to this email. I'd hoped to be able to do some real work on specifically what we would need (more to follow), but have ended up being delayed by various issues (ironically, ones that would easily be avoided by small additions to the hardware).

#1 on my wish list is a performance counter, or set of performance counters, that could be used to build a model of exactly how a set of instructions would behave at a different frequency (or under different operating conditions such as a reduced cache size, etc). The paper that I talked about before (http://ertos.nicta.com.au/publications/papers/Snowdon_PH_07.pdf) tries to build these models based on the performance counters which were intended for developer feedback. These have limitations, and don't model some features of the system. In fact, a large part of my work has involved developing methods for working out which performance counters are the best predictors.
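As a rough illustration of what such a model could look like, here is a first-order sketch. It is an assumption made for illustration, not the model from the paper: run time is split into a part that scales with core frequency and a memory part that is treated as independent of frequency. All function names are hypothetical.

```python
import math


def predict_time(cpu_cycles, mem_time_s, freq_hz):
    """First-order model: time = core cycles / frequency + memory time.

    Assumes (for illustration only) that memory time does not change
    with core frequency.
    """
    return cpu_cycles / freq_hz + mem_time_s


def fit_from_two_runs(t1_s, f1_hz, t2_s, f2_hz):
    """Solve t = c / f + m for c (core cycles) and m (memory time)
    from run times measured at two different frequencies."""
    if f1_hz == f2_hz:
        raise ValueError("the two runs must use different frequencies")
    cpu_cycles = (t1_s - t2_s) / (1.0 / f1_hz - 1.0 / f2_hz)
    mem_time_s = t1_s - cpu_cycles / f1_hz
    return cpu_cycles, mem_time_s


# Hypothetical measurements: 2.5 s at 1 GHz and 1.5 s at 2 GHz.
cycles, mem = fit_from_two_runs(2.5, 1e9, 1.5, 2e9)
predicted = predict_time(cycles, mem, 4e9)
print(cycles, mem, predicted)
```

A model this simple cannot capture effects whose cost itself changes with frequency, which is where better counters would help.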

Since that paper we have done a lot of work on x86 (vs. XScale). The much more complicated x86 architecture also requires a much more complicated model. Since the information isn't actually available, and there are the same number of performance counters available, it is impossible to build an appropriate model.

An excellent example of this is the effectiveness of pre-fetching. Running at a low frequency, the pre-fetching mechanism has lots of time to go away and come back with some data, and when that data does turn out to be useful, the system runs quite quickly (i.e. the delay on memory is lower). At higher frequencies, there is less time for pre-fetching, and the system stalls, waiting on memory (i.e. the delay on memory is higher). How effective pre-fetching will be for a given workload at an alternate frequency is presently very difficult to measure. The same applies to fundamental ILP issues.

Similarly, having more performance counters available would allow for more detailed energy models of the system. As the hardware manufacturer, you guys must have the information required to do the analyses to come up with near-perfect models for both performance and power. I don't know which specific features of the processor are required, but I thought that something like the number of cycles saved via pre-fetching would be one very useful counter. It could potentially be constructed by counting the number of memory accesses which did not benefit from pre-fetching, the number of accesses that did benefit from pre-fetching, and the number of cycles associated with each of those accesses. Then you could subtract the average number of cycles for a pre-fetched access from the average number of cycles for a non-pre-fetched access.
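The arithmetic behind that proposed counter can be sketched as follows. The four inputs are hypothetical counters that, as far as this thread says, the hardware does not currently expose; the names and the example values are made up for illustration.

```python
def cycles_saved_by_prefetch(n_prefetched, cycles_prefetched,
                             n_not_prefetched, cycles_not_prefetched):
    """Estimate cycles saved by pre-fetching from four hypothetical counters.

    n_prefetched          -- accesses that benefited from pre-fetching
    cycles_prefetched     -- total cycles spent on those accesses
    n_not_prefetched      -- accesses that did not benefit
    cycles_not_prefetched -- total cycles spent on those accesses
    Returns (cycles saved per pre-fetched access, total cycles saved).
    """
    if n_prefetched <= 0 or n_not_prefetched <= 0:
        raise ValueError("both access counts must be positive")
    avg_prefetched = cycles_prefetched / n_prefetched
    avg_not_prefetched = cycles_not_prefetched / n_not_prefetched
    # Difference of the two averages, as described in the comment above.
    saved_per_access = avg_not_prefetched - avg_prefetched
    return saved_per_access, saved_per_access * n_prefetched


per_access, total = cycles_saved_by_prefetch(100, 2000, 50, 10000)
print(per_access, total)
```

The total is an estimate only: it assumes that a pre-fetched access would otherwise have cost the average non-pre-fetched access.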

Anyway. These are just ideas that I've been throwing around. We have a new paper which takes it to the next level: assuming that you have good models for both performance and energy at an alternate frequency, what do you do? I'll post here if it's accepted.

Dave.
July 27, 2009 6:02 AM PDT


nagaraju
Hi,
Is it possible to measure energy or power at the instruction level, that is, for the instructions that are generated when you run an application?
With Regards
Nagaraju
