Intel® Xeon Phi™ coprocessor Power Management Turbo Part 3: How can I design my program to make use of turbo?

Previous blogs on power management, along with a host of other power management resources, can be found in “List of Useful Power and Power Management Articles, Blogs and References”; see [LIST] in the reference section below.


Let us cut to the chase and ask the two most important questions:

  1. Is there any way we can write programs to make better use of turbo?
  2. When is it worthwhile to redesign my program for turbo?

You can redesign your application to exploit turbo but, depending upon your circumstances, it may not be worthwhile in some cases and not practical in others. In a mainstream computing environment, turbo can be well worth it, boosting the performance of critical compute code significantly. The HPC world is different: for example, code is compute bound for longer periods of time, and there are more stringent thermal requirements.

For turbo to be useful, the temperature distribution has to start out in a state similar to Figure COOL. This low level of activity is ideal for supporting a period of turbo processing. That period of turbo processing ends when the temperature distribution reaches a steady state similar to Figure HOT.



Figure COOL. Temperature distribution across a chip: Cool and low activity





Figure HOT. Temperature distribution across a chip: Hot and high activity


If we plot the Tjunction (some meaningful average of temperatures across the silicon, not an actual junction temperature) across a hypothetical core over time, we get something like that shown in Figure PLOT. The silicon starts out cool at time t0, at which point a compute intensive application starts executing on the cores. At time t1, the junction temperatures across the chip reach Tturbo_MAX, the maximum permissible temperature under which the power management policy allows turbo. The compute intensive phase continues after that, but with the frequency reduced to P0 to ensure that the chip continues to function correctly in a steady state. At t2, the processor once again is idle or nearly so, allowing the junction temperatures of the silicon to drop. At some future time, the temperature drops to a point where turbo is again possible.+



Figure PLOT. Relationship between compute mode, temperature, and frequency / p-state


From Figure PLOT, you can see that the best type of application for turbo is one that cycles through short bursts of intense computation separated by longer periods of inactivity. (By the way, SS in the figure is short for steady state.) In such a steady-state condition, the processor should be able to run indefinitely without failure.
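The cycle in Figure PLOT can be sketched as a toy model: the die temperature relaxes exponentially toward a steady-state value while computing and back toward ambient while idle, and turbo is permitted only while the temperature stays below Tturbo_MAX. All parameters here are arbitrary illustrative values of my own choosing, not Intel Xeon Phi data:

```python
# Toy model of the turbo/temperature cycle in Figure PLOT.
# All constants are made-up illustrative values, not real silicon data.

T_AMBIENT = 40.0      # idle die temperature, deg C
T_TURBO_MAX = 90.0    # Tturbo_MAX: above this, turbo is disallowed
T_STEADY = 100.0      # steady-state temperature under full load
HEAT_RATE = 0.08      # fraction of the remaining gap closed per step (busy)
COOL_RATE = 0.05      # fraction of the remaining gap closed per step (idle)

def simulate(phases, t0=T_AMBIENT):
    """phases: list of (busy, n_steps). Returns (temp, state) per step."""
    temp, trace = t0, []
    for busy, steps in phases:
        for _ in range(steps):
            if busy:
                # exponential approach toward the full-load temperature
                temp += HEAT_RATE * (T_STEADY - temp)
            else:
                # exponential cooling back toward ambient
                temp += COOL_RATE * (T_AMBIENT - temp)
            state = ("turbo" if busy and temp < T_TURBO_MAX
                     else "P0" if busy else "idle")
            trace.append((round(temp, 1), state))
    return trace

# idle (t<t0), compute (t0..t2), idle again (t>t2), as in Figure PLOT
trace = simulate([(False, 10), (True, 60), (False, 40)])
```

Running this, the compute phase starts out in the "turbo" state, transitions to "P0" once the modeled temperature crosses Tturbo_MAX (the t1 of Figure PLOT), and the final idle phase cools the die back down.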

What applications fit this profile? Graphics, media decoding, and other applications of a bursty nature. Unfortunately, this does not describe many applications in the mainstream HPC domain. That is not to say that turbo is not useful in HPC. It is. For example, some implementations of SMP LINPACK show notable gains. There are also the scalar sections of code, such as those in the OS, which can benefit notably from acceleration. Amdahl's law tells us that such sections can come to dominate execution time and prevent increasing parallelism from improving the performance of an application. If turbo can speed up those sections, your application can benefit appreciably.
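To put a number on the Amdahl's law point, here is a small worked example. The fractions, core count, and turbo speedup below are illustrative values I picked for the sake of the arithmetic, not measurements:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the work and n is the number of cores.
def amdahl(parallel_fraction, n_cores, serial_speedup=1.0):
    serial = (1.0 - parallel_fraction) / serial_speedup
    parallel = parallel_fraction / n_cores
    return 1.0 / (serial + parallel)

# A 95%-parallel application on 60 cores: the 5% serial fraction
# caps the speedup far below 60x.
base = amdahl(0.95, 60)                          # ~15.2x
# If turbo accelerates the serial sections by 1.2x, the overall
# speedup improves noticeably even though the parallel code is unchanged.
boosted = amdahl(0.95, 60, serial_speedup=1.2)   # ~17.4x
```

The point is that a modest frequency boost applied only to the serial bottleneck moves the whole application, which is exactly where turbo can help an otherwise well-parallelized HPC code.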


Almost always, you are going to get the most performance by using the processor continuously at a high level of (useful) activity. It is true that in such a scenario you cannot use turbo, but the useful work done during what would otherwise be those necessary lengthy periods of inactivity more than makes up for it.

Turbo is useful when your application naturally has “rest” periods. A rest period does not necessarily mean that the processor is not doing anything, just that the activity level is low, such as when your program is executing a section of serial code. Since such code occupies only a small number of cores, perhaps only one, the rest of the processor has a chance to cool off. When the next chunk of highly parallel and vectorizable code executes, the processor can once again make use of turbo until it reaches a steady state as illustrated in Figure HOT.


I can imagine circumstances where you might want to redesign your code so that, for example, your application spreads out its serial sections: whenever a serial section executes, the processor has a chance to cool down, and when the next computationally intense phase starts up, it can make use of the boost provided by turbo. Alternatively, you might want to consolidate your sections of serial code. By doing this, the surrounding inactive silicon may act as a heat sink, allowing the serial code to increase the amount of time it spends in a turbo state.


It is true that many HPC clusters disable turbo. But this has little to do with turbo not benefiting applications. Clusters have a variety of requirements and constraints they must satisfy in their design, independent of raw performance. Even if turbo substantially benefits the facility's suite of HPC applications, enabling it may still not make sense. These non-performance issues can include thermal output limits (i.e., minimizing the cost of the cooling infrastructure), sensitivity to interrupt-driven event variation (e.g., jitter can change certain types of event timing), or a need for very predictable performance (e.g., turbo may accelerate an application but induce run-to-run variation). Turbo affects all of these.


One important factor influencing new designs is, not surprisingly, performance. But other issues, such as OS jitter and facility cooling technologies and costs, are also important. Power management in general, turbo included, is an important consideration in the design of future HPC-focused processors. Such future processors, including the coming Intel Xeon Phi coprocessor generations, will address many of these issues and so change the power management equation. If that was a little too long-winded for you, I will paraphrase: power management is going to change and improve in future processors, be it in the Intel Xeon Phi, Intel® Atom™ or Intel® Xeon® family.


This completes my series on the use of turbo for the Intel Xeon Phi coprocessor. This is not to say that you are finished with me. My next series of humble articles will be on how to measure power on the coprocessor.

NOTE: As previously in my blogs, any illustrations can be blamed solely on me as no copyright has been infringed or artistic ability shown.


[LIST] Kidd, Taylor, “List of Useful Power and Power Management Articles, Blogs and References,” Intel Corporation, October 23, 2013.

+Unfortunately, this plot is not based upon any design or experimental data. Any real data would be of a highly proprietary nature and not publishable. Its purpose is to convey the idea and the logic behind how turbo relates to temperature and frequency. If you are really persnickety and insist on something quantitative, I suggest solving and plotting over time a 1-dimensional heat equation with appropriate boundary conditions.

