<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogs &#187; Taylor Kidd (Intel)</title>
	<atom:link href="http://software.intel.com/en-us/blogs/author/taylor-kidd/feed/" rel="self" type="application/rss+xml" />
	<link>http://software.intel.com/en-us/blogs</link>
	<description></description>
	<lastBuildDate>Fri, 25 May 2012 22:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Use cases for Power Management</title>
		<link>http://software.intel.com/en-us/blogs/2010/06/30/use-cases-for-power-management/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/06/30/use-cases-for-power-management/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 00:39:05 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/06/30/use-cases-for-power-management/</guid>
		<description><![CDATA[I’m collecting power management (PM) use cases in a broad sense, meaning I’m not just limiting myself to client use cases. I figure that trying to limit this investigation to client is foolhardy at this point given the early state of PM technology. ]]></description>
			<content:encoded><![CDATA[<p>This is going to be a short one (after another long dramatic pause in my activity).</p>
<p>I am tasked by an industrial working group to come up with a draft set of use cases to define a client power management API.</p>
<p>I’m collecting power management (PM) use cases in a broad sense, meaning I’m not just limiting myself to client use cases. I figure that trying to limit this investigation to client is foolhardy at this point given the early state of PM technology. And in any event, there isn’t any way that client PM use cases are not going to be impacted by the PM policies of IT departments, servers, OSs, etc.</p>
<p>Before I pose my question, let me first give you a rough idea of what I mean by a use case, and what a client PM API is.</p>
<p><span style="text-decoration: underline">A use case</span>: It is simply an example of how something will be used and what is expected to happen when it is used. For example, a use case might be “I can drive from Grandma’s to Little Red’s house without stopping and in less than 3 hours producing no pollution.” Notice that this is not, “An electric car weighing at least 1500 kg and that uses no more than 60 kWh whose batteries weigh no more than 500 kg and can drive at least 200 km in less than 180 minutes.” Use cases are written in the language of the person using the product, in this case, Grandma. If Grandma rolls her eyes or starts snoring when the use cases are read back to her, it isn’t a use case. Afterwards, the experienced architect takes those use cases and translates them into something an engineer can design to (i.e. requirements).</p>
<p><span style="text-decoration: underline">A client power management API</span>: Ultimately, I’m interested in an API for exchanging information and actions between the underlying OS/HW and the application. But that’s too far down the road. We have to know what the problem / goal is before we can define an API that permits a solution.</p>
<p>Right now I’m satisfied with just understanding the scope and details of the PM problem.</p>
<p>This is where Grandma and her use cases come in.</p>
<p>There are a lot of grandmas in the PM world, and many of them are very sophisticated old biddies. They range from the grandma who wants her gaming system to fit into her fixed income budget and cost her no more than $50/mo, to the IT professional who has to fit as many servers as possible into 3000 sq ft distributed over two rooms with only 100k BTU of air. Both the grandma and the IT professional are the users most directly impacted by PM and so will define the use cases.</p>
<p>So far, I’ve talked to hard core server engineers, client side app writers, IT professionals, and some people involved in government regulation. What I’ve been told is that someone needs to do this work, but it hasn’t been done except in some limited cases.</p>
<p>So I’m putting this question out to the ether: You app programmers, HW designers, OS engineers, IT professionals, building air conditioning engineers, grandmas, etc., what are the PM use cases that:</p>
<ol>
<li>Most annoy you,</li>
<li>You don’t believe people are thinking about, and</li>
<li>You see coming down the pike and that are going to blindside us.</li>
</ol>
<p>If you have another observation that doesn’t fit into those categories, bring it up. This is a wide open question that hasn’t been adequately addressed. And it won’t be unless you say something.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/06/30/use-cases-for-power-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance-power in client applications: looking at JouleSort</title>
		<link>http://software.intel.com/en-us/blogs/2010/02/03/performance-power-in-client-applications-looking-at-joulesort/</link>
		<comments>http://software.intel.com/en-us/blogs/2010/02/03/performance-power-in-client-applications-looking-at-joulesort/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 23:44:31 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[client apps]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2010/02/03/performance-power-in-client-applications-looking-at-joulesort/</guid>
		<description><![CDATA[Below, I've included my usual summary of any paper I read. Yep, I go through this trouble with everything, no matter how trivial. Of course, usually, this summary is private, sequestered away behind veils of security, not the least of which is the fact that no one in their right mind would be interested in [...]]]></description>
			<content:encoded><![CDATA[<p>Below, I've included my usual summary of any paper I read. Yep, I go through this trouble with everything, no matter how trivial. Of course, usually, this summary is private, sequestered away behind veils of security, not the least of which is the fact that no one in their right mind would be interested in these summaries.</p>
<p>I’m posting this as a good starting point. It’s a well written paper (in my opinion) and a very enjoyable read. (This should show you how much of a life I have.)</p>
<p>(For those more observant, you’ll notice that this summary is missing an important section. I make it a habit of sticking to “just the facts, ma’am” with any summary that is in the least bit public. For example, that same section is missing in any summary of a paper that includes an Intel author. Yes, I admit it. I’m a coward.)</p>
<p>For those of you who are more familiar with this work than I, feel free to correct anything that is misstated or misunderstood. I appreciate the correction and promise to be only a little embarrassed.</p>
<p>After posting this, I’ll pose several questions and (hopefully) start a multisided conversation on how we might look at the power-performance of client applications.</p>
<p> ===========================</p>
<p>Rivoire, S., MA Shah, P Ranganathan, C Kozyrakis, “JouleSort: A Balanced Energy-Efficiency Benchmark,” SIGMOD ’07, June 12-14, 2007, Beijing, China.</p>
<p><em>Abstract: The energy efficiency of computer systems is an important concern in a variety of contexts. In data centers, reducing energy use improves operating cost, scalability, reliability, and other factors. For mobile devices, energy consumption directly affects functionality and usability. We propose and motivate JouleSort, an external sort benchmark, for evaluating the energy efficiency of a wide range of computer systems from clusters to handhelds. We list the criteria, challenges, and pitfalls from our experience in creating a fair energy efficiency benchmark. Using a commercial sort, we demonstrate a JouleSort system that is over 3.5x as energy-efficient as last year’s estimated winner. This system is quite different from those currently used in data centers. It consists of a commodity mobile CPU and 13 laptop drives, connected by server-style I/O interfaces.</em></p>
<p>SUMMARY:</p>
<p>The authors introduce and give the rationale for JouleSort, their proposed benchmark for looking at power efficiency from a whole system perspective. A brief history is given of other work, such as energy-delay product metrics and processor centric benchmarks. JouleSort is justified, noting that it applies to all scales of systems, ranging from notebooks to large enterprise systems. It is I/O and peak use centric and so is representative of the most common applications. It allows “fair” comparisons within system classes.</p>
<p>They then look at previous benchmarks (such as MinuteSort, PennySort and Terabyte Sort), discuss various optimization criteria (such as #/sec, #/$ and #/J), and historical trends of performance vs power efficiency (noting that improvements in energy usage have lagged significantly). A set of criteria is established for the development of JouleSort, e.g. peak-use, holistic, fair, inclusive, portable, simple and able to be used to look at trends.</p>
<p>They propose a sort based benchmark (random 100B records w/ 10B keys) and justify why it is representative (in an abstract sense) for portable, desktop and enterprise environments. Various metrics used in the past to study power-performance (fixed energy, fixed time) are discussed, and a justification presented for why the JouleSort metric is based upon fixed input size. The methodology and its justification are described (e.g. using wall power and not accounting for cooling system costs).</p>
<p>A series of experiments is presented to demonstrate the usage and validity of JouleSort (unbalanced server and balanced server). They use JouleSort to come up with a system with the best configuration of performance-power, and then compare it to pure performance (NSort), and performance-cost (PennySort) metrics. JouleSort is then used to look at how this system’s energy efficiency scales with system size. Lastly, JouleSort is used to explore various issues, such as the importance of and balance that exists between I/O, processor, memory and VR.</p>
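<p>To make the fixed-input-size idea concrete, here is a minimal sketch of how a records-per-Joule figure might be computed. It is not the benchmark’s actual harness: the in-memory sort stands in for a real external sort, and the function names and the power and timing inputs are assumptions for illustration only.</p>
<pre>
```python
import os

RECORD_SIZE = 100  # bytes per record, matching the 100B records in the summary
KEY_SIZE = 10      # the leading 10 bytes of each record act as the sort key


def make_records(count):
    """Generate `count` random 100-byte records."""
    return [os.urandom(RECORD_SIZE) for _ in range(count)]


def sort_records(records):
    """Sort records by their 10-byte key (in-memory stand-in for an external sort)."""
    return sorted(records, key=lambda rec: rec[:KEY_SIZE])


def records_per_joule(record_count, avg_wall_power_watts, elapsed_seconds):
    """Fixed input size divided by the energy drawn at the wall.

    ASSUMPTION: average wall power and elapsed time come from an external
    power meter and timer; nothing here measures them.
    """
    energy_joules = avg_wall_power_watts * elapsed_seconds
    return record_count / energy_joules
```
</pre>
<p>For example, sorting 1000 records in 2 seconds at an assumed average of 50 W uses 100 J, which works out to 10 records per Joule.</p>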
<p>At the end, they discuss related work, and briefly discuss JouleSort’s limitations (e.g. doesn’t really address multi-media), and potential uses (e.g. to meet “green” requirements).</p>
<p>NOTE: An updated and condensed version of this paper is included in “Advances in Computers, Volume 75: Computer Performance Issues,” ed. M. Zelkowitz, 2009.</p>
<p>ALSO NOTE: A much extended version of this paper exists in yet another document.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2010/02/03/performance-power-in-client-applications-looking-at-joulesort/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A useful power-performance metric (Part IIa, the goal)</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/30/a-useful-power-performance-metric-part-iia-the-goal/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/30/a-useful-power-performance-metric-part-iia-the-goal/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 22:22:59 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[client applications]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/30/a-useful-power-performance-metric-part-iia-the-goal/</guid>
		<description><![CDATA[So what is our ultimate goal here in this series of articles? We want to come up with a useful measure of power-performance for an application, such as Joules per instruction for a given app. But as we will see, finding this answer is much less clear cut than it first appears.]]></description>
			<content:encoded><![CDATA[<p>GOALS AND THE ULTIMATE QUESTION</p>
<p>What is our ultimate goal? What do we want to do with our life? Is life important or just a meaningless exercise in futility? These are very important questions but generally irrelevant to this discussion.</p>
<p>So what is our ultimate goal here in this series of articles? We want to come up with a useful measure of power-performance for an application, such as Joules per instruction for a given app. But as we will see, finding this answer is much less clear cut than it first appears.</p>
<p>Sometimes a useful artifice is to throw away reality and all its encumbrances. Eventually, we’ll have to return to earth. We’re boring engineers, are we not?</p>
<p>I’m going to be getting into some stochastic integrals and other types of Lebesgue integration here, so be prepared. Just kidding. (Oh my gosh! I really do know the difference between Riemann and Lebesgue integration. I almost surely have way too much math in my background.) But we are going to get into some math as it’s a compact and convenient notation.</p>
<p>WHAT DO WE REALLY WANT?</p>
<p>This isn’t as obvious as it first appears. Do we want the performance per power (W)? Or performance per energy (J)? Do we focus on the power-performance of only a given application or of a suite of applications? Are we satisfied with the power-performance of the application as a whole or do we want to break it down further? Is our goal to be able to derive the power-performance or just measure it? We’ve got to spend at least a couple of years pondering this question before really being able to delve into its far reaching implications. Of course, we don’t have the time for this intellectual introspection, so we’ll just dive in and wing it.</p>
<p>Let’s address the first issue: What do we want to measure the power-performance of? Well, it’s for an application, isn’t it? Yes but this isn’t specific enough. Do we want to find it for a suite of apps? For all possible apps? For apps running in a given language, say in Javascript on a browser? I can go on and on, but I’d just confuse myself further.</p>
<p>To get a handle on this, let’s consider my ultimate goal. (There’s that pesky “goal” word again.) Being a selfish soul, I’m interested in applications, specifically Windows applications. So, what is the power-performance of a given application. Compared to other applications. On Intel hardware. Compared to other unnamed competitors and their hardware? Though I would like to show how Intel hardware is superior in power-performance, I’m intellectually honest here – neglecting my own natural ignorance, of course. I want a fair and useful comparison.</p>
<p>So here are my constraints. (If any of you want to propose different constraints, please do. I’m not god (note the little “g”) nor even the least bit omniscient.)</p>
<p>Constraint 1: Be able to compare the power-performance of one general application against another general application</p>
<p>Constraint 2: Be able to compare the total power-performance of an application, meaning we have application granularity</p>
<p>Constraint 3: Be able to compare the power-performance across two different pieces of HW</p>
<p>Constraint 4: The HW is limited to the processor</p>
<p>Constraint 5: We’re considering only the entire processor (meaning we’re not going down to individual cores or other processor components)</p>
<p>NEXT TIME: THE METRIC</p>
<p>PS If you have any references on this or any other relevant topic (excepting Lebesgue integration), let me know. I average reading about 2 to 3 papers a week, but that’s way too little for this topic.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/30/a-useful-power-performance-metric-part-iia-the-goal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Question: Estimating power performance on newer architectures</title>
		<link>http://software.intel.com/en-us/blogs/2009/12/18/question-estimating-power-performance-on-newer-architectures/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/12/18/question-estimating-power-performance-on-newer-architectures/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 21:01:17 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/12/18/question-estimating-power-performance-on-newer-architectures/</guid>
		<description><![CDATA[How must these methodologies for estimating application energy usage be modified to account for newer architectures? Must we select a different set of kernels that will more accurately characterize power consumption? Is it even possible to derive the energy consumption from such a set of kernels?]]></description>
			<content:encoded><![CDATA[<p>One of my greatest problems, outside of my tendency toward wordiness, is over researching a topic. Sometimes this research gets to the point where I not only find the answer myself, but have had so much time pass that the topic is no longer relevant.</p>
<p>So in an attempt to break this cycle of obsessive behavior, I’m going to wing it instead. I’m going to pose several questions, and then put forth poorly thought out and embarrassing propositions. In the meantime, behind the scenes, I’ll continue my obsessive behavior and pursue preparing a more didactic and pontifical blog entry (looking at which benchmarks are useful for measuring performance when considering power).</p>
<p>BACKGROUND:</p>
<p>There have been many attempts in the last twenty years to come up with a methodology for estimating the power usage of an application. (Yes indeed, power is a fairly old topic. Lately, it has become increasingly important with the expansive use of mobile and un-tethered computers, such as notebooks, smart phones and intelligent embedded systems.) Though all of these methodologies claim success, some have been more successful than others. The usual methodology defines classes of kernels. A kernel is a simple program used to illustrate a certain use or characteristic. In the case of power, these kernels represent a given common sequence of one or more machine language operations, such as taking values from two registers, performing an integer add, and then placing the result into a register. These operations are performed in (what is essentially) an infinite loop and the current draw (as in Amps) measured. From this, power and total energy are derived. These methodologies use the infinite loop to not only emphasize the kernel being studied, but to also avoid any transients at startup. They also define various “sub” kernels to measure power when the processor performs the instruction under different memory scenarios. The usual scenarios are (1) register to register, (2) pure memory to memory, and (3) cached references. These are then used to estimate the energy usage of a program, such as bcopy. Lastly, the resulting estimate is validated using experiment.</p>
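<p>The kernel approach described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not any published methodology: the kernel names, the per-operation energies and the operation counts are all hypothetical placeholders.</p>
<pre>
```python
# HYPOTHETICAL per-operation energies in nanojoules, one per kernel class.
# Real values would come from measuring each kernel's current draw in a loop.
KERNEL_ENERGY_NJ = {
    "add_reg_reg": 1.0,  # register to register
    "add_mem_mem": 6.0,  # pure memory to memory
    "add_cached": 2.5,   # cached references
}


def kernel_energy_per_op(current_amps, supply_volts, loop_seconds, ops_executed):
    """Derive energy per operation from a steady-state current measurement."""
    power_watts = current_amps * supply_volts
    total_energy_joules = power_watts * loop_seconds
    return total_energy_joules / ops_executed


def estimate_program_energy_nj(op_counts, kernel_energy_nj=KERNEL_ENERGY_NJ):
    """Sum count * energy-per-op over the kernel classes a program exercises."""
    return sum(count * kernel_energy_nj[name] for name, count in op_counts.items())
```
</pre>
<p>With these made-up numbers, a copy routine that performs 1000 memory-to-memory operations and 2000 register-to-register operations would be estimated at 8000 nJ. The validation step in the text amounts to comparing such an estimate against a direct measurement of the same program.</p>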
<p>These types of methodologies have been applied to traditional embedded, mobile and general purpose computer environments. It has been done for Pentium, ARM and other architectures.</p>
<p>Most of the works that I’ve read concerning these methodologies have been applied to architectures predating the current generation that has all this sophisticated HW and SW power management. Certainly, the kernel and validating applications have been almost all compute intensive and relatively simple.</p>
<p>Here is the basic question that I’m tossing on the table for discussion:</p>
<p>THE BIG QUESTION: How do these methodologies (put references here) apply to modern architectures, with sophisticated HW and SW power management?</p>
<p>By SW power management, I mean that the HW works with the SW through a power management policy engine. For example, in Windows Core 2 Duo systems, the OS has a low-level power management engine that decides at what points to drop the system into a lower C-state. And I am not talking about the high-level Windows Power Profiles.</p>
<p>To add some specificity to this question, let’s look at the Core i7, aka Nehalem. It has the Core 2 Duo’s ability to drop into both various P-states (lower voltage and frequency states) and C-states (lower power idle states). Also, its sophisticated modern pipeline is able to perform out of order execution, work with multipurpose ALUs, do sophisticated branch prediction, etc.</p>
<p>How must these methodologies for estimating application energy usage be modified to account for newer architectures? Must we select a different set of kernels that will more accurately characterize power consumption? Is it even possible to derive the energy consumption from such a set of kernels?</p>
<p>Next: My embarrassing attempt to motivate discussion by putting forth some ideas.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/12/18/question-estimating-power-performance-on-newer-architectures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance per Watt: Hey, I already know it’s important, don’t I? (The intro, part I)</title>
		<link>http://software.intel.com/en-us/blogs/2009/10/22/performance-per-watt-hey-i-already-know-its-important-dont-i-the-intro-part-i/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/10/22/performance-per-watt-hey-i-already-know-its-important-dont-i-the-intro-part-i/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 19:41:10 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/10/22/performance-per-watt-hey-i-already-know-its-important-dont-i-the-intro-part-i/</guid>
		<description><![CDATA[What is performance per Watt? Performance per Watt is pretty straightforward when you first look at it. Then you begin to sink in the quicksand you’ve blithely walked into. The panic sets in as you sink lower and lower. Eventually you decide to ignore the whole complicated mess and go back to saying to [...]]]></description>
			<content:encoded><![CDATA[<p>What is performance per Watt?</p>
<p>Performance per Watt is pretty straightforward when you first look at it. Then you begin to sink in the quicksand you’ve blithely walked into. The panic sets in as you sink lower and lower. Eventually you decide to ignore the whole complicated mess and go back to saying to yourself how straightforward it is. Of course, deep within your heart of hearts, you know that it’s not.</p>
<p>For most of us, performance per Watt is nothing more than how much our computer can get done on a given battery charge.</p>
<p>Let’s dissect this a little further and try to get down to something a little more concrete. The real problem with the above very general description is that it makes intuitive sense but not engineering sense. We need to take it apart and put it in more engineering terms.</p>
<p>A Watt is how much energy you’re using per second. It’s the rate of energy consumption. Why is this important? Well, are we asking about how much our computer can get done given so much energy (Joules)? Or are we asking how much our computer can get done when fed energy at a certain rate (Watts)? What’s the difference? The first is easier to understand. Let’s say we’re using a laptop. Then the first asks how much can we get done for a certain battery size.</p>
<p>So what’s wrong with the second? It’s a rate. You might say that to get performance per Watt, all we have to do is to divide the number of cycles executed over the life of our battery by the energy in the battery. Even neglecting the fact that we haven’t quantified what “performance” is, we run into a problem. Rates are good when considering steady state situations, but typical client usage – servers are different – is anything but steady state. This means that performance per Watt is dependent upon a whole lot of factors. These are things like the type of user / application suite you typically run, the OS you use and its power policy, your processor architecture, the peripherals you have, etc. It gets messy fast.</p>
<p>And we haven’t even tried to figure out what “performance” means in the context of power.</p>
<p>So what’s the conclusion? Do we forget performance per rate of energy usage (Watt) and just go with how much we can get done given so much energy (e.g. how big of a battery you have)? Unfortunately not. If we can quantify the rate of energy consumption then we can theoretically calculate the energy consumed by a whole host of different users, e.g. business vs home users, nerd vs coffee shop users, etc.</p>
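<p>The relationship between the rate (Watts) and the total (Joules) can be sketched numerically. The following is a minimal illustration, assuming we somehow have a trace of (time, power) samples for one usage pattern; the function names and the sample values are made up.</p>
<pre>
```python
def energy_joules(samples):
    """Trapezoidal integration of (time_seconds, power_watts) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_watts(samples):
    """Average rate of energy use over the whole trace."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration


def work_per_joule(work_units, samples):
    """How much got done per unit of energy for this trace."""
    return work_units / energy_joules(samples)


# HYPOTHETICAL bursty client trace: idle at 10 W, a burst at 30 W, idle again.
trace = [(0, 10.0), (1, 10.0), (2, 30.0), (3, 30.0), (4, 10.0)]
print(energy_joules(trace), average_power_watts(trace))
```
</pre>
<p>For this trace the energy is 80 J over 4 seconds, an average of 20 W, even though the system never actually sits at 20 W. That is the non-steady-state problem in miniature: the average rate hides the shape of the usage, while integrating the rate over a given user’s pattern recovers the energy that user consumes.</p>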
<p>Next: A high-level look at performance</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/10/22/performance-per-watt-hey-i-already-know-its-important-dont-i-the-intro-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance per Watt: Hey, I already know it’s important, don’t I? (The preface)</title>
		<link>http://software.intel.com/en-us/blogs/2009/10/14/performance-per-watt-hey-i-already-know-its-important-dont-i-the-preface/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/10/14/performance-per-watt-hey-i-already-know-its-important-dont-i-the-preface/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 23:23:26 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/10/14/performance-per-watt-hey-i-already-know-its-important-dont-i-the-preface/</guid>
		<description><![CDATA[One of the big advantages of blogging is that I can write about anything I want, as long as it doesn’t violate any Intel conduct rules, refer to any competitors, reveal any confidential information, insult anyone, compliment people (yeah, I said compliment), and so on and so on and so on. Outside of that, I [...]]]></description>
			<content:encoded><![CDATA[<p>One of the big advantages of blogging is that I can write about anything I want, as long as it doesn’t violate any Intel conduct rules, refer to any competitors, reveal any confidential information, insult anyone, compliment people (yeah, I said compliment), and so on and so on and so on. Outside of that, I have a free hand.</p>
<p>This next series of articles is a case in point. I’m going to talk about performance per Watt. Why? Because I’m thinking about it. And I’m working on setting up some experiments. And I want to exploit…uh, I mean take advantage…of all you wonderful people out there to figure out what the heck I’m doing.</p>
<p>In this first article, I’m going to provide motivation on why performance per Watt, or something equivalent, is important. And I’m going to write it even if many (or most) of you think it’s obvious.</p>
<p>This brings up the flip side of blogging. A big advantage of being a blog reader is that you don’t have to listen to me repeat the apparently obvious or the boring.</p>
<p>So let’s have at it.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/10/14/performance-per-watt-hey-i-already-know-its-important-dont-i-the-preface/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why P scales as C*V^2*f is so obvious (pt 2)</title>
		<link>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-2-2/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-2-2/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 23:06:14 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[cpu power]]></category>
		<category><![CDATA[processor power]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-2-2/</guid>
		<description><![CDATA[THE GORY DETAILS Let’s continue from where we left off last time. Let’s figure out the why of the equation, P = C * V^2 * (a * f) To do this, we’re going to have to look at what is going on in one of the fundamental building blocks (a CMOS inverter) of an [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">THE GORY DETAILS</p>
<p class="MsoNormal">Let’s continue from where we left off last time. Let’s figure out the why of the equation,</p>
<p class="MsoNormal" align="left" style='left'>
<i>P = C * V^2 * (a * f)</i></p>
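<p class="MsoNormal">Before the why, a quick numeric sketch of what the equation says about scaling. The values below are hypothetical and chosen only to show the trend; <i>a</i> is treated as a plain multiplier on the frequency, as in the formula.</p>
<pre>
```python
def dynamic_power_watts(capacitance_farads, voltage_volts, activity, frequency_hz):
    """P = C * V^2 * (a * f), the formula under discussion."""
    return capacitance_farads * voltage_volts ** 2 * activity * frequency_hz


# HYPOTHETICAL values, not taken from any real processor.
base = dynamic_power_watts(1e-9, 1.2, 0.5, 2e9)
half_freq = dynamic_power_watts(1e-9, 1.2, 0.5, 1e9)
half_volt = dynamic_power_watts(1e-9, 0.6, 0.5, 2e9)

print(half_freq / base)  # halving f halves P
print(half_volt / base)  # halving V cuts P to a quarter
```
</pre>
<p class="MsoNormal">The quadratic dependence on voltage is what the rest of this article sets out to explain.</p>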
<p class="MsoNormal">To do this, we’re going to have to look at what is going on in one of the fundamental building <br />
blocks (a CMOS inverter) of an integrated circuit (IC).</p>
<p class="MsoNormal">So when and how does this circuit dissipate power? </p>
<p><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/08/logicgate1_org.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/08/logicgate1_org.jpg" alt="" width="267" height="279" class="alignnone size-medium wp-image-9274" /></a></p>
<p class="MsoNormal">Before getting into the math, let’s get our variables right.</p>
<p class="MsoNormal"><span class="SpellE"><i>V<sub>dd</sub></i></span> is the voltage across the gate</p>
<p class="MsoNormal"><span class="SpellE"><i>I<sub>peak</sub></i></span> is the peak short circuit current going through the gate when it switches state (0 to 1 or 1 to 0)</p>
<p class="MsoNormal"><span class="SpellE"><i>I<sub>leakage</sub></i></span> is the current through the gate even when it is reverse biased (i.e. in a 0 or a 1 state)</p>
<p class="MsoNormal"><i>C<sub>L</sub></i> is the capacitance of one transistor</p>
<p class="MsoNormal"><span class="SpellE"><span class="GramE"><i>t<sub>s</sub></i></span></span> is the switching time needed to change the state of the switch</p>
<p class="MsoNormal"><span class="SpellE"><span class="GramE"><i>f<sub>g</sub></i></span></span><span class="GramE"><i>=</i></span><i>1/T<sub>g</sub></i><span style='italic'> is the maximum rate that the gate can cycle at in our processor. In other words, it is the gate’s clock frequency.</span></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Let’s start out with the basic CMOS gate, see above. There are three current paths, one going through the gate, one charging the capacitance of the gate, and one resulting from leakage through a reverse biased gate. </p>
<p class="MsoNormal">The one going through the gate results from the brief time that both transistors are conducting (closed), causing a short circuit. In an ideal world, the switch would be instantaneous and there would be no current flow, and hence no power loss. But this isn’t a perfect world. There is a brief period of time, the switching time <span class="SpellE"><i>t<sub>s</sub></i></span>, when we’ve a short circuit. The power is going to be the voltage across the short circuit, <span class="SpellE"><i>V<sub>dd</sub></i></span>, multiplied by the current, <span class="SpellE"><i>I<sub>peak</sub></i></span>. (We’re using the peak current since this is going to give us an upper bound on the power dissipated.) Say the short circuit exists for <span class="SpellE"><i>t<sub>s</sub></i></span>. Then the total energy lost is bounded by</p>
<p class="MsoNormal"><i>Energy loss due to short circuit &lt;= <span class="SpellE">V<sub>dd</sub></span> * <span class="SpellE">I<sub>peak</sub></span> * <span class="SpellE">t<sub>s</sub></span></i></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Let’s now look at the energy lost resulting from leakage through a reverse biased gate. Since <span class="SpellE"><i>t<sub>s</sub></i></span><span style='italic'> is small compared to </span><span class="SpellE"><span class="GramE"><i>T<sub>g</sub></i></span></span><span style='italic'>, we can approximate the energy loss over one cycle as,</span></p>
<p class="MsoNormal"><i>Energy loss due to reverse biased gate ~ <span class="SpellE">V<sub>dd</sub></span>*<span class="SpellE">I<sub>leakage</sub></span>*<span class="SpellE"><span class="GramE">T<sub>g</sub></span></span></i></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">What about the total energy loss? If the energy loss due to the reverse bias leakage and short circuit current are so small, where is all that energy coming from that is heating our processor?</p>
<p class="MsoNormal">To get this, we need to look more closely at what a gate looks like in an analog sense. A reverse biased transistor is basically a capacitor, that is, two plates separated by an insulator / dielectric. From the figure above, it’s <i>C<sub>L</sub></i>.<i> </i>A forward biased transistor is a short. These plates charge and discharge like a capacitor because of the design of a gate. In one state, one transistor is conducting and the other is the acting capacitor. In the other state, the roles reverse and the other transistor is the acting capacitor. What I’m trying to say is that even though there is never a steady path from supply to ground, current still flows from one transistor / capacitor to the other at every transition. This current flow is going to cause resistive heating, and so consume power.</p>
<p class="MsoNormal">The equation for the energy stored in a capacitor (<i>C</i>) is</p>
<p class="MsoNormal"><i>Energy in a capacitor = ½ * C * V<sup>2</sup></i></p>
<p class="MsoNormal">At each transition, the capacitor dumps the energy stored in it either to ground or to the complementary transistor, giving us the following.</p>
<p class="MsoNormal"><i>Energy flow due to a state transition = ½ * C<sub>L</sub> * V<sub>dd</sub><sup>2</sup></i></p>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Remember that one cycle has two state transitions. So the complete equation for the energy loss caused by one cycle, which we’ll call <span class="SpellE"><i>E<sub>tr</sub></i></span>, is,</p>
<p class="MsoNormal"><span class="SpellE"><i>E<sub>tr</sub></i></span><i>=C<sub>L</sub>*V<sup>2</sup><sub>dd</sub>+ 2*<span class="SpellE">V<sub>dd</sub></span>*<span class="SpellE">I<sub>peak</sub></span>*<span class="SpellE">t<sub>s</sub>+V<sub>dd</sub></span>*<span class="SpellE">I<sub>leakage</sub></span>*<span class="SpellE"><span class="GramE">T<sub>g</sub></span></span><sub></sub></i></p>
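<p class="MsoNormal">To get a feel for how the three terms compare, here is a minimal Python sketch of the equation above. The function name and every numeric value in it are assumptions I made up purely to exercise the formula; they are not measurements of any real process.</p>

```python
def cycle_energy_terms(c_load, v_dd, i_peak, t_switch, i_leak, t_cycle):
    """Return the three terms of E_tr for one full cycle, in joules."""
    capacitive = c_load * v_dd ** 2               # two transitions * (1/2 * C_L * Vdd^2)
    short_circuit = 2 * v_dd * i_peak * t_switch  # upper bound, two transitions
    leakage = v_dd * i_leak * t_cycle             # leakage over the whole cycle T_g
    return capacitive, short_circuit, leakage


# Hypothetical values, for illustration only.
terms = cycle_energy_terms(c_load=1e-15, v_dd=1.0, i_peak=1e-6,
                           t_switch=1e-11, i_leak=1e-9, t_cycle=1e-9)
e_tr = sum(terms)
print(terms, e_tr)
```

<p class="MsoNormal">With these made-up inputs the capacitive term dominates, but that is a property of the numbers chosen, not a claim about any particular chip.</p>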
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">I’m now going to do something that is blatantly wrong. It’s also a huge topic on its own that I’m completely unqualified to talk about. I’m going to ignore the last two terms. I’m pretty sure that they were 2<sup>nd</sup> order in effect maybe 10 years ago. And the first, the one with <span class="SpellE"><i>I<sub>peak</sub></i></span>, may still be. But that last term, oh that last term. Volumes have been written about it, generally in a language incomprehensible to us mere mortals. (Request: can anyone out there tell us more?) </p>
<p class="MsoNormal">After dropping those last two terms, we’re left with,</p>
<p class="MsoNormal"><span class="SpellE"><i>E<sub>tr</sub></i></span><i><sub> </sub>~ C<sub>L</sub>*V<sup>2</sup><sub>dd</sub></i></p>
<p class="MsoNormal">This is almost what we want. We’re missing that annoying little “<i>f</i>”. The little equation we wrote above is for one cycle of a gate. True, modern processors can do one heck of a lot in one cycle, but a one-cycle application is still pretty uninteresting. Our gates above are switching all the time at a rate related to the frequency of the processor, which we’ll call “<i>a * f</i>”, where <i>f</i> is the frequency of the processor and <span class="GramE"><i>a</i> is</span> some constant. </p>
<p class="MsoNormal"><i>Energy output of a gate/sec ~ C<sub>L</sub> * V<sub>dd</sub><sup>2</sup> * (a*f)</i></p>
<p class="MsoNormal">And how many gates are in a high-end Intel processor today? Close to a billion for 45 nm. (And the next generation is 32 nm.) So we’ve 1.0E9 (1 billion) transistors per processor, running at frequencies of 3E9 Hz (3 billion). Let’s see, 1E9*3E9 is – scientific notation always confuses me – 3E18 transitions per second. Is there even a name for 1E18?</p>
<p class="MsoNormal"><i>Energy output of a processor/sec ~ C<sub>L</sub> * V<sub>dd</sub><sup>2</sup> * (a*f) * &lt;number of transistors&gt;</i></p>
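<p class="MsoNormal">The relationship above is easy to play with in a few lines of Python. This is only a sketch: the function name, the load capacitance, the value of <i>a</i> and the gate count are all hypothetical numbers picked for illustration.</p>

```python
def dynamic_power(c_load, v_dd, activity, freq_hz, n_gates):
    """Approximate switching power in watts: C_L * Vdd^2 * (a * f) * gate count."""
    return c_load * v_dd ** 2 * (activity * freq_hz) * n_gates


# Hypothetical inputs, for illustration only.
baseline = dynamic_power(c_load=1e-15, v_dd=1.0, activity=0.01, freq_hz=3e9, n_gates=1e9)
lower_v = dynamic_power(c_load=1e-15, v_dd=0.8, activity=0.01, freq_hz=3e9, n_gates=1e9)
print(baseline, lower_v / baseline)
```

<p class="MsoNormal">The point of the second call is the <i>V<sup>2</sup></i> term: in this simplified model, dropping the voltage by 20% at the same frequency cuts the switching power by about a third.</p>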
<p class="MsoNormal">Now before those of <span class="GramE">you</span> who actually know this stuff start crabbing, let me make it clear that I am only attempting to help people understand where the equation comes from. Yes, the effect of the short circuit current contributes noticeably to the processor’s heating when we’re talking about, say, a billion transistors. And there’s that aforementioned leakage current. And there are the leakage and voltage issues related to smaller and smaller junctions. And there are a lot of circuit elements that aren’t strictly logic gates that contribute to the power. And there have been a lot of developments related to reducing the short circuit and leakage currents of a logic gate. And yes, I’m an ignorant software hack.</p>
<p class="MsoNormal">But for those of you who are neither processor architects nor researchers into modern IC materials, mayhap this gives you a little better understanding of where this (I hope) formerly mysterious relationship comes from.</p>
<p class="MsoNormal">Of course, I could be just blowing smoke, but then, one of you less than gentle readers out there will let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-2-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why P scales as C*V^2*f is so obvious (pt 1)</title>
		<link>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-1/</link>
		<comments>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-1/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 17:42:12 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[energy efficiency]]></category>
		<category><![CDATA[Power Efficiency]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-1/</guid>
		<description><![CDATA[[apologies gentle readers if you've seen this twice. I've moved this over from my mysterious twin account (taylor-kidd-2) over to this account (taylor-kidd)] [Warning: Math and physics alert! Math and physics alert!] I think that you've all seen this equation before: P = a * C * V^2 * f Where P is power, C [...]]]></description>
			<content:encoded><![CDATA[<p>[apologies gentle readers if you've seen this twice. I've moved this over from my mysterious twin account (taylor-kidd-2) over to this account (taylor-kidd)]</p>
<p>[Warning: Math and physics alert! Math and physics alert!] </p>
<p>I think that you've all seen this equation before:</p>
<p><em>P = a * C * V<sup>2</sup> * f</em></p>
<p>Where <em>P</em> is power, <em>C</em> is capacitance, <em>V</em> is the voltage across the gate (typically, <em>V<sub>dd</sub></em>), <em>f</em> is the clock frequency, and <em>a</em> is some constant.</p>
<p>Doing my reading, I see this equation a lot, usually prefaced with something like, “And everyone knows…” or “And of course…” When I read, “And it’s obvious,” I get really steamed because it’s never obvious to me. Maybe I’m dumb. Maybe they are way smarter than me. Or maybe they don’t know why either. Well, I’m here to say that it isn’t obvious, at least not to me. </p>
<p>So let’s see where this sucker comes from.</p>
<p>This is going to get into a little bit of math, but hey, we are all pretty comfortable with math. It comes with being a nerd. (OK, I didn’t mean to stereotype. I’m sure there are nerds that aren’t comfortable with math, at least theoretically. Not that I’m a nerd. I’m a perfectly well adjusted individual who just happens to be uncomfortable around non-engineers, and who likes to hide in my cube and code all day.)</p>
<p>CHECKING THE UNITS</p>
<p>Before we go further, let’s check to see if the units match. This is always a good thing to do just to make sure we aren’t barking up the wrong tree.</p>
<p>Now, <em>P</em> is power (Watts). <em>V</em> is voltage (Volts). <em>f</em> is the frequency (cycles/sec), or equivalently, the reciprocal of the period (seconds/cycle). And <em>C</em> is capacitance (Farads). A Farad is Coulombs/Volt, the charge divided by the EMF (Electromotive Force) potential.</p>
<p>Putting it all together and noting that <em>P=V*I</em> where <em>I</em> is the current in Amps, we’ve, </p>
<p><em>P = a * C * V<sup>2</sup> * f</em><br />
<em>Volts*Amps = Coulombs /Volt * Volts<sup>2</sup> * cyc/sec </em></p>
<p>Since amperage is charge flowing per unit time, i.e. Amps=Coulombs/sec, we’ve got</p>
<p><em>Volts* Coulombs/sec = Coulombs/Volts * Volts<sup>2</sup> * cyc/sec<br />
Volts*Coulombs/sec = Coulombs*Volts/sec</em></p>
<p>(Note that both cycles and <em>a</em> are unitless.)</p>
<p>Hey, it works! So at least we know that the units are correct. We’re not obviously barking up the wrong tree.</p>
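<p>If you don't trust my algebra, the same bookkeeping can be handed to a few lines of Python. This is just a sketch: each unit is written as a dictionary of exponents over Volts, Coulombs and seconds, and the helper name <em>multiply</em> is my own invention, not part of any library.</p>

```python
from collections import Counter


def multiply(*units):
    """Multiply units by adding the exponents of their base dimensions."""
    total = Counter()
    for unit in units:
        for base, power in unit.items():
            total[base] += power
    # Drop dimensions that cancelled out.
    return {base: power for base, power in total.items() if power != 0}


VOLT = {"V": 1}
AMP = {"C": 1, "s": -1}     # Coulombs per second
FARAD = {"C": 1, "V": -1}   # Coulombs per Volt
HERTZ = {"s": -1}           # cycles are unitless

lhs = multiply(VOLT, AMP)                 # P = V * I
rhs = multiply(FARAD, VOLT, VOLT, HERTZ)  # C * V^2 * f
print(lhs == rhs)
```

<p>Both sides reduce to Volts * Coulombs / sec, which is the same thing the hand calculation found.</p>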
<p>So that’s it folks. There you have it. Let’s now go onto something new.</p>
<p>What? You also want to know why? Wow, you are a hard crowd. OK, I’ll relent and explain why. Next time, that is.</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2009/08/25/why-p-scales-as-cv2f-is-so-obvious-pt-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Energy / Power measurement wish list</title>
		<link>http://software.intel.com/en-us/blogs/2008/09/05/energy-power-measurement-wish-list/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/09/05/energy-power-measurement-wish-list/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 16:34:15 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[platform]]></category>
		<category><![CDATA[power measurement]]></category>
		<category><![CDATA[processor]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/09/05/energy-power-measurement-wish-list/</guid>
		<description><![CDATA[I got such a good response from my previous post, that I decided to pose another question to my select and invisible audience. If you could come up with a wish list, what power related measurements would you like to get from the (computer) platform, as well as from the processor itself? Now, let's be [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;"><span style="AR-SA;">I got such a good response from my previous post, that I decided to pose another question </span>to my select and invisible audience.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">If you could come up with a wish list, what power related measurements would you like to get from the (computer) platform, as well as from the processor itself?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">Now, let's be a little creative. Saying something like, "I'd like to measure the power of an application," doesn't really tell us anything. Break things down a little. Make it a little more concrete.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">Here's an example of a better question. "I'd like to measure the energy consumed by my such and such encoding function." Or, "you know, it'd be great if I could measure such and such so that I could predict the energy performance of a certain algorithm."</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">So what do you want Santa to bring?</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"> </p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/09/05/energy-power-measurement-wish-list/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>So how are P-states related to power management?</title>
		<link>http://software.intel.com/en-us/blogs/2008/08/15/so-how-are-p-states-related-to-power-management/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/08/15/so-how-are-p-states-related-to-power-management/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 20:16:19 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[power management]]></category>
		<category><![CDATA[processor]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/08/15/so-how-are-p-states-related-to-power-management/</guid>
		<description><![CDATA[This relationship between P-states, voltage and frequency is well and good, but how does this relate to power management? Power is literally, energy usage per unit of time. To get the total energy usage, you integrate the instantaneous power over the interval you're interested in, i.e. get the area under the curve. If there are [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">This relationship between P-states, voltage and frequency is well and good, but how does this relate to power management? Power is, literally, energy usage per unit of time. To get the total energy usage, you integrate the instantaneous power over the interval you're interested in, i.e. get the area under the curve. If there are energy savings to be had (see the previous discussion), we want to reduce the frequency, and with it the voltage, until the CPU is just shy of being 100% utilized. At this operating point, there is (ideally) no increase in the execution time of your application. This is possible because we're only getting rid of the idle time. By minimizing the voltage, we've minimized leakage current and so minimized the instantaneous power. (See comments in my blog entry, "Can P-states save overall energy?" http://software.intel.com/en-us/blogs/2008/07/31/can-p-states-save-overall-energy/.) Integrating over the entire execution time of the application, we find the total energy used is less. (Disclaimer: this applies under ideal circumstances. Actual mileage may vary.)</span></span></p>
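<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">The "area under the curve" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name, the sample times and both power traces are invented, and they only show how two traces doing the same work could be compared.</span></span></p>

```python
def energy_joules(times, powers):
    """Trapezoidal integral of sampled power (watts) over time (seconds)."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (powers[i] + powers[i - 1]) * (times[i] - times[i - 1])
    return total


# Hypothetical one-second traces sampled at the same instants.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
burst_trace = [30.0, 30.0, 10.0, 10.0, 10.0]   # burst at full speed, then idle
spread_trace = [16.0, 16.0, 16.0, 16.0, 16.0]  # same work spread out at lower voltage

burst_energy = energy_joules(times, burst_trace)
spread_energy = energy_joules(times, spread_trace)
print(burst_energy, spread_energy)
```

<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">Whether the spread-out trace really comes in lower on actual hardware is exactly the open question; the sketch only shows how you would compare the two once you have measurements.</span></span></p>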
<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">Similarly, spreading out the work the CPU has to perform over a longer interval reduces the peak power over the interval. As we mentioned earlier, this reduces cooling requirements with the aforementioned reduction in costs. </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">So in conclusion, we know that in clients, processor usage is very bursty, consisting of long idle periods punctuated by bursts of furious activity. (The usage profile is different for servers, but then when are you going to be running servers without an AC power tether?) By choosing an appropriate P-state, we can minimize this idle time, reducing peak power, and (potentially) increasing power efficiency. </span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">(This brings up an interesting point. What is the effect of P-states on processor efficiency? By entering a higher P-state (lower frequency), we've effectively downgraded the processor. If we think of processor efficiency as being the amount of work done over a given interval, there is no effect on processor efficiency. Why? Because we're doing the same amount of work in a given period -- we're just filling in the idle time.)</span></span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="small;"><span style="Times New Roman;">Let's look at the type of environments that P-states might be useful within. Mobile environments, of course. But how about environments where you have significant limits in the cost and size of cooling equipment? The embedded environment generally has these constraints. Embedded processors often have to deal with severe space, weight and cooling limitations. Several more advanced embedded and special purpose processors use P-states. (No, I can't really mention what those "other processors" are except to state, unequivocally, that Intel processors are the best thing since sliced bread.) </span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/08/15/so-how-are-p-states-related-to-power-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can P-states save overall energy?</title>
		<link>http://software.intel.com/en-us/blogs/2008/07/31/can-p-states-save-overall-energy/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/07/31/can-p-states-save-overall-energy/#comments</comments>
		<pubDate>Thu, 31 Jul 2008 20:48:41 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[energy]]></category>
		<category><![CDATA[Power Efficiency]]></category>
		<category><![CDATA[power management]]></category>
		<category><![CDATA[watt]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/07/31/can-p-states-save-overall-energy/</guid>
		<description><![CDATA[Part of the reason I've been so silent recently is because I've been really really busy. (Aren’t we all?) But it also has to do with a short segment I had written on P-states and energy savings. This brief segment outlined a relationship between processor voltage, leakage current, frequency and power. My conclusion was that [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">Part of the reason I've been so silent recently is because I've been really really busy. (Aren’t we all?) But it also has to do with a short segment I had written on P-states and energy savings. This brief segment outlined a relationship between processor voltage, leakage current, frequency and power. My conclusion was that P-states not only reduced your peak thermal power but also allowed you to reduce your total energy usage.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">So there I was, finger poised above the "post" button, when a nagging thought raised its ugly head. What if I was wrong? </span><span style="Times New Roman;"> </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">I set out to find the original references that supplied the basis of my post-to-be. Unfortunately, all the references seemed to have vanished from the face of the earth. Thinking about it, I could see only two possibilities of how this had happened. Perhaps I didn't remember the articles correctly, recalling a relationship where there was none. "Nah," I said to myself in response. The other possibility was that there existed a far-ranging government conspiracy to erase all such references to this relationship. "Uh huh, that's got to be it." Fortunately, I soon got my damaged ego under control. I realized that I had to try harder to validate my facts.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">Here I am, a couple of months later, and I have yet to be able to validate the relationship I recall existing between voltage, frequency, leakage current and power.</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">So here's what I’m going to do. I'm posing this question to you, gentle readers. </span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">QUESTION: Can operating in a higher P-state (i.e. lower voltage and lower frequency) result in any total energy savings? (I'm assuming that the processor doesn't have C-states, meaning it always operates in C0.)</span></p>
<p class="MsoNormal" style="0in 0in 0pt;"><span style="Times New Roman;">Constructive comments, yea or nay, are welcome. </span><span style="Times New Roman;"> </span></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/07/31/can-p-states-save-overall-energy/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>What exactly is a P-state? (Pt. 1)</title>
		<link>http://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1/#comments</comments>
		<pubDate>Thu, 29 May 2008 16:57:58 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[energy]]></category>
		<category><![CDATA[penryn]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1/</guid>
		<description><![CDATA[A P-state is a voltage and frequency operating point What is a P-state? When someone refers to a P-state, generally only the frequency is talked about. For example, on my Intel Core Duo, P0 is 2.3 GHz, and P1 is 980 MHz. In truth, a P-state is both a frequency and voltage operating point. Both [...]]]></description>
			<content:encoded><![CDATA[<h1><font size="5"><font face="Arial">A P-state is a voltage and frequency operating point</font></font></h1>
<h2><a name="PublishedToHere" title="PublishedToHere"></a><em><font face="Arial">What is a P-state?</font></em></h2>
<p><font face="Times New Roman">When someone refers to a P-state, generally only the frequency is talked about. For example, on my Intel Core Duo, P0 is 2.3 GHz, and P1 is 980 MHz. In truth, a P-state is both a frequency and voltage operating point. Both are scaled down as the P-state number increases.</font><font face="Times New Roman"> </font></p>
<h2><em><font face="Arial">The effect of reducing frequency</font></em></h2>
<p><font face="Times New Roman"> </font><font face="Times New Roman">It's obvious that performance is directly related to frequency. We all know that increasing the frequency increases a processor's performance. The same applies for decreasing the frequency. If you halve the frequency, a compute bound task runs half as fast. For example, if your task is compute bound, requiring 100% of the CPU for 1 second at 2 GHz, it will take 2 seconds to execute at 1 GHz. (This is roughly correct. There are a host of other factors influencing runtime, such as cache size and speed, interrupts, etc.)</font><font face="Times New Roman"> </font></p>
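<p><font face="Times New Roman">The arithmetic behind that example, as a minimal Python sketch. It assumes the idealized case where a compute bound task is a fixed number of clock cycles and nothing else (cache, memory, interrupts) matters; the function name is made up for illustration.</font></p>

```python
def runtime_seconds(cycles, freq_hz):
    """Idealized runtime of a compute bound task: cycles divided by clock rate."""
    return cycles / freq_hz


# A task that keeps the CPU 100% busy for 1 second at 2 GHz.
cycles = 1.0 * 2e9

at_2ghz = runtime_seconds(cycles, 2e9)
at_1ghz = runtime_seconds(cycles, 1e9)
print(at_2ghz, at_1ghz)
```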
<p><font face="Times New Roman">Now wait a moment. If we reduce the frequency, increasing the runtime of an application, how does this increase battery life? If we are decreasing the frequency, we are increasing the CPU utilization and reducing the % idle time. See the frequency half of Figure A. This shouldn't have any effect on the power usage of the processor. It's running all that time anyway.</font><font face="Times New Roman"> </font></p>
<p><font face="Times New Roman">You're right, historically. This was the case years ago with those ancient, generations-old processors. Though I don't know this for a fact, I suspect that in some cases, decreasing the percent idle time might have increased energy usage since increased processor activity increases instantaneous power, and we're decreasing the % idle time (i.e. time of low activity). This is where voltage scaling comes into play. There are two primary reasons for P-states: one is to reduce the peak thermal load, and the other is to save power.</font><font face="Times New Roman"> </font></p>
<h2><em><font face="Arial">Reducing peak thermal load</font></em></h2>
<p><font face="Times New Roman"> </font><font face="Times New Roman">The reason why you want to reduce the peak thermal load is pretty obvious. The instantaneous energy usage (power) of the processor is related to its activity. If the processor is very busy, requiring a lot of gates to do a lot of switching, it runs hotter. So reducing the frequency reduces the peak thermal output even if the total energy usage is not reduced. The advantage of reducing peak thermal load has to do with cost. Your cooling has to be sized for peak power, not average power. (I'm neglecting the effect of thermal inertia.) So if you can reduce peak power, you reduce the cost and size of the equipment having to do the cooling.</font><font face="Times New Roman"> </font><font face="Times New Roman"> </font></p>
<p><font face="Times New Roman"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2008/05/pstatepeakenergy.jpg" title="pstatepeakenergy.jpg"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2008/05/pstatepeakenergy.jpg" alt="pstatepeakenergy.jpg" /></a></font></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>There&#039;s got to be a catch</title>
		<link>http://software.intel.com/en-us/blogs/2008/04/29/theres-got-to-be-a-catch/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/04/29/theres-got-to-be-a-catch/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 17:24:14 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/04/29/theres-got-to-be-a-catch/</guid>
		<description><![CDATA[I hate moving. Nothing ever goes as it should. It takes 10 times longer than you expected. And that last box is finally unpacked just before you end up moving again. There's got to be a catch There are 5 CC-states and, depending upon how you count, 6 PC-states in the Penryn line of Intel [...]]]></description>
			<content:encoded><![CDATA[<p><font face="Times New Roman">I hate moving. Nothing ever goes as it should. It takes 10 times longer than you expected. And that last box is finally unpacked just before you end up moving again.</font></p>
<h1><font size="5"><font face="Arial">There's got to be a catch</font></font></h1>
<p><font face="Times New Roman">There are 5 CC-states and, depending upon how you count, 6 PC-states in the Penryn line of Intel processors. And, in Microsoft Windows XP, there are 4 OS C-states. So are there 5 C-states, 6 C-states, 4 C-states, or 15 C-states? Choose the number that you are least uncomfortable with. Personally, I first imagine a 3-set Venn diagram with overlapping elements and added transition annotations. Then I get confused and give up.</font><font face="Times New Roman"> </font></p>
<p><font face="Times New Roman">Given what we've talked about above, it seems as if we should always drop a core into the lowest permissible CC-state, right?</font><font face="Times New Roman"> </font></p>
<p><font face="Times New Roman">There are a few reasons for not doing this. First off, the OS's Power Management (PM) policy, and not the hardware, determines when a core enters a CC-state. From our standpoint as a hardware manufacturer, we have little to do with this. I'll talk about why this is important later. Secondly, there is always a cost for dropping into a lower C-state. That cost is the amount of time required for the core to transition from an idle state, e.g. CC5, to C0. As you start using deeper CC-states, latency becomes significant. For example, the latency to go from CC3 to CC0 is around 20 usec, literally ages when we're talking about 3 GHz processors.</font><font face="Times New Roman"> </font></p>
<p><font face="Times New Roman">This latency penalty is even worse once you realize that the phrase, "in a given C-state," is misleading. As I've mentioned above, it's easy to think of a core as descending as a waterfall from C0 into C1 into C2 into C3. (See Figure A.) If this were the case, you'd suffer only one 20 usec penalty. No, it's oscillating between C0 and C3 hundreds, if not thousands, of times a second until the OS's PM code decides that the percentage residency merits ascending / descending to the next C-state (e.g. CC3 to CC2). In Windows, the C-state that a core transitions to is based on the % idle over a certain interval. This means that each transition exacts that 20 usec penalty, and there are hundreds of transitions. Doing the math, experiencing a 20 usec delay 100 times a second is a whopping 2 msec of added latency per second. (See Figure B.)</font><font face="Times New Roman"> </font></p>
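<p><font face="Times New Roman">Here is that math as a minimal Python sketch. The 20 usec exit latency and the 100 transitions a second come from the text above; the function name and the 1000 transitions a second case are my own illustrative additions, not measured values.</font></p>

```python
def added_latency_per_second(exit_latency_s, transitions_per_second):
    """Total wake-up latency accumulated in one second of wall-clock time."""
    return exit_latency_s * transitions_per_second


EXIT_LATENCY_S = 20e-6  # roughly 20 usec per wake-up, as quoted in the text

hundreds = added_latency_per_second(EXIT_LATENCY_S, 100)    # 100 transitions a second
thousands = added_latency_per_second(EXIT_LATENCY_S, 1000)  # hypothetical busier case
print(hundreds, thousands)
```

<p><font face="Times New Roman">The cost scales linearly with the transition rate, which is why the oscillation in Figure B matters so much more than the single descent in Figure A.</font></p>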
<p><font face="Times New Roman">Even if it is possible to drop a core into a deeper CC-state, the OS has to ask itself various questions, such as how likely it is that processes are going to be doing more work very soon, in which case dropping into a deeper CC-state might actually incur an unacceptable penalty. Similarly, the processor has to ask whether dropping a core into a lower CC-state is going to cause incorrect operation, say whether the delay in the processing of an interrupt will cause an event to be lost.</font></p>
<p><font face="Times New Roman"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2008/04/waterfall_080107_12.gif" alt="waterfall_080107_12.gif" /></font></p>
<p><font face="Times New Roman">Figure A. The waterfall misconception.</font></p>
<p><font face="Times New Roman"><img src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2008/04/percent_0801071.gif" alt="percent_0801071.gif" /></font></p>
<p><font face="Times New Roman">Figure B. What actually happens "in a given C-state".</font></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/04/29/theres-got-to-be-a-catch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>(update) C-states, C-states and even more C-states</title>
		<link>http://software.intel.com/en-us/blogs/2008/03/27/update-c-states-c-states-and-even-more-c-states/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/03/27/update-c-states-c-states-and-even-more-c-states/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 19:56:05 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Game Development]]></category>
		<category><![CDATA[Graphics & Media]]></category>
		<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[client]]></category>
		<category><![CDATA[core duo]]></category>
		<category><![CDATA[efficiency]]></category>
		<category><![CDATA[power management]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/03/27/update-c-states-c-states-and-even-more-c-states/</guid>
		<description><![CDATA[As I said before, a C-state is an idle state. The processor isn't doing anything useful, so why not shut some things off? Think of it in terms of your house. If you're not at home, why keep the lights, radio, and those 6 televisions going? Modern processors have several different C-states representing increasing amounts [...]]]></description>
			<content:encoded><![CDATA[<p>As I said before, a C-state is an idle state. The processor isn't doing anything useful, so why not shut some things off? Think of it in terms of your house. If you're not at home, why keep the lights, radio, and those 6 televisions going? Modern processors have several different C-states representing increasing amounts of "stuff" shut down. C0 is the operational state, meaning that the CPU is doing useful work. C1 is the first idle state. The clock running to the processor is gated, i.e. the clock is prevented from reaching the core, effectively shutting it down in an operational sense. C2 is the 2nd idle state. The external I/O Controller Hub blocks interrupts to the processor. And so on with C3, C4, etc. I'll discuss this further down in this paper. By the way, there is nothing preventing the OS from busy waiting in its idle state, and thus keeping the processor in C0, as did older operating systems. From the OS's standpoint, the processor is idling; it's just chewing up energy for no useful reason other than being an ineffectual heater.</p>
<p>So what's this thing about "C-states, C-states and even more C-states"? It turns out that there are different kinds of C-states depending upon what part of your system you are talking about. There are core C-states, processor C-states, and OS C-states. All are similar and are idle states (I'm excluding C0, of course.) They are also different in some substantial ways.</p>
<p><u>A core C-state</u> is a hardware C-state. There are several core idle states, e.g. CC1 and CC3. As we know, a modern state-of-the-art processor has multiple cores, such as the recently released Core Duo T5000/T7000 mobile processors, known as Penryn in some circles. What we used to think of as a CPU / processor actually has multiple general purpose CPUs inside of it. The Intel Core Duo has 2 cores in the processor chip. The Intel Core 2 Quad has 4 such cores per processor chip. Each of these cores has its own idle state. This makes sense as one core might be idle while another is hard at work on a thread. So a core C-state is the idle state of one of those cores.</p>
<p><u>A processor C-state</u> is related to a core C-state. At some point, cores share resources, e.g. the L2 cache or the clock generators. When one idle core, say core 0, is ready to enter CC3 but the other, say core 1, is still in C0, we don't want the fact that core 0 is ready to descend into CC3 to prevent core 1 from executing because we just happened to shut down the clock generators. Thus we have the processor / package C-state, or PC-state. The processor can only enter a PC-state, say PC3, if both cores are ready to enter the corresponding CC-state, e.g. both cores are ready to step into CC3. I'll talk more about this in a subsequent section.</p>
<p><u>A logical C-state</u>: The last C-state is the OS's view of the processors' C-states. In Windows, a processor's C-state is pretty much equivalent to a core C-state. In fact, the OS's lower level power management software determines when and if a given core enters a given CC-state using the MWAIT instruction. There is one important difference. When an application, such as Intel's PowerInformer, thinks it's interrogating a processor core CC-state, what is returned is the C-state of what is called a "logical core". (A logical core is technically not the same as a physical core. In my experience, a logical core is almost always the same as a physical core, but it doesn't have to be.) Logical cores don't have to worry about little things such as the hardware the OS is running on. For example, the C-state of a logical core doesn't worry about the barriers imposed by shared resources, such as the clock generators, I talked about earlier. Logical Core 0 can be in C3 while Logical Core 1 is in C0.</p>
<p>This seems a little confusing, doesn't it? So how do logical core C-states, core C-states and processor C-states relate to each other? Take the situation above: From the OS perspective, logical core 0 is in C3 and logical core 1 is in C0. Since C3, from the hardware perspective, actually shuts down a shared resource, the clock generators, (physical) core 0 must be held at CC2 since core 1 is in C0 and using the clock generators. The processor, in a global sense, is not idle since core 1 is in C0, so the processor's C-state is C0. To use a little bit of that intimidating mathematics:</p>
<p align="center">Processor C-state = Min(core C-states)</p>
<p align="center">Core C-state = Minimum barrier(set of all logical C-states)</p>
<p align="center">Logical C-state = anything the OS wants </p>
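<p>As a rough illustration of those three rules, here is a toy model of the two-core example. It assumes a single barrier, namely that C3 is the first state that shuts down a shared resource (the clock generators). The constant and function names are hypothetical, and real hardware resolves more states and more shared resources than this.</p>

```python
# Toy model only. Assumption: C3 is the first state that needs a shared
# resource (the clock generators) to be shut down.
SHARED_RESOURCE_STATE = 3


def core_c_states(logical_states):
    """Resolve the C-states the OS asked for into physical core C-states."""
    all_ready = min(logical_states) >= SHARED_RESOURCE_STATE
    if all_ready:
        # Every core is idle enough, so the shared resource can be shut down.
        return list(logical_states)
    # Some core still needs the shared resource: hold the others one state above the barrier.
    cap = SHARED_RESOURCE_STATE - 1
    return [min(requested, cap) for requested in logical_states]


def processor_c_state(core_states):
    """Processor C-state = Min(core C-states)."""
    return min(core_states)


logical = [3, 0]                    # logical core 0 in C3, logical core 1 in C0
cores = core_c_states(logical)      # core 0 is held at CC2, core 1 stays in C0
package = processor_c_state(cores)  # the processor as a whole is in C0
print(logical, cores, package)
```

<p>With both logical cores in C3, the same functions let both physical cores reach CC3 and the processor reach PC3.</p>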
<p>Next: There has got to be a catch</p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/03/27/update-c-states-c-states-and-even-more-c-states/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>C-states and P-states are very different</title>
		<link>http://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different/</link>
		<comments>http://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 23:46:05 +0000</pubDate>
		<dc:creator>Taylor Kidd (Intel)</dc:creator>
				<category><![CDATA[Mobility]]></category>
		<category><![CDATA[Parallel Programming]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[energy efficient]]></category>
		<category><![CDATA[Power Efficiency]]></category>

		<guid isPermaLink="false">http://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different/</guid>
		<description><![CDATA[C-states are idle states and P-states are operational states. This difference, though obvious once you know, can be initially confusing. With the exception of C0, where the CPU is active and busy doing something, a C-state is an idle state. Since an idle CPU isn't doing anything (i.e. any useful work), why not shut it [...]]]></description>
			<content:encoded><![CDATA[<p><font face="Times New Roman">C-states are idle states and P-states are operational states. This difference, though obvious once you know, can be initially confusing. </font></p>
<p><font face="Times New Roman">With the exception of C0, where the CPU is active and busy doing something, a C-state is an idle state. Since an idle CPU isn't doing anything (i.e. any useful work), why not shut it down? No one is going to notice since there's no one using it. (Letting a Penryn run at full bore when idle is like driving in circles very fast; all you're doing is going nowhere quickly.)</font></p>
<p><font face="Times New Roman">A P-state is an operational state, meaning that the core / processor can be doing useful work in any P-state. The most obvious example is when your laptop is using a low power profile and operating on battery. The OS will lower the C0 operating frequency and voltage, i.e. enter a higher P-state. Reducing the operating frequency reduces the speed at which the processor operates, and so the energy usage per second (i.e. power). Reducing the voltage decreases the leakage current from the CPU's transistors, making the processor more energy efficient and resulting in further gains. The net result is a significant reduction in the energy usage per second of the processor. On the flip side, an application will take longer to run. This may or may not be a problem from a power perspective. I'll talk about this issue in some depth in a later blog.</font></p>
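<p><font face="Times New Roman">Why "may or may not"? Energy is power multiplied by time, so a slower P-state only saves energy if the drop in power outweighs the longer run time. The sketch below uses made-up wattages and run times purely for illustration; none of them are measurements of a real processor.</font></p>

```python
def energy_joules(average_power_watts, runtime_seconds):
    """Energy used by a task: average power multiplied by run time."""
    return average_power_watts * runtime_seconds


# Illustrative numbers only (assumptions, not measurements).
full_speed = energy_joules(35.0, 10.0)    # fast P-state: 35 W for 10 s
good_tradeoff = energy_joules(15.0, 20.0) # slower P-state: 15 W for 20 s
poor_tradeoff = energy_joules(20.0, 20.0) # slower P-state: 20 W for 20 s

print(full_speed, good_tradeoff, poor_tradeoff)
```

<p><font face="Times New Roman">In the second case the task takes twice as long but uses less total energy; in the third it takes twice as long and uses more.</font></p>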
<p><font face="Times New Roman">C-states and P-states are also orthogonal. This is a fancy mathematical term meaning that each can vary independently of the other. This doesn't mean that in the higher C-states, the voltage doesn't change. It only means that when you resume C0, you go back to the operating frequency and voltage defined by that P-state. </font></p>
<p><font face="Times New Roman">Next time: C-states, C-states and even more C-states</font></p>
]]></content:encoded>
			<wfw:commentRss>http://software.intel.com/en-us/blogs/2008/03/12/c-states-and-p-states-are-very-different/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

