Intel® Xeon Phi™ coprocessor Power Management Turbo Part 2: Hot and Cold Running Silicon

The previous blog in this series, “Intel® Xeon Phi™ coprocessor Power Management Turbo Part 1: What is turbo? And how will it affect my horsepower?” can be found at http://software.intel.com/en-us/blogs/2013/09/26/intel-xeon-phi-coprocessor-power-management-turbo-part-1-what-is-turbo-and-how-will.

TEMPERATURE DISTRIBUTION

MODERATE ACTIVITY

Figure WARM shows a fictitious distribution of temperatures across a moderately active fictitious chip. (You have to use your imagination given my rather limited drawing skills.) We immediately see that there are 8 hotspots in the central region of the chip, surrounded by cooler areas across the substrate. As you get further away from the hotspot, the substrate cools even further. An actual chip is going to have a temperature distribution much more complex than this as a chip supports caches, buses, and a whole host of other activities. Even so, this simple figure illustrates the concepts we want to bring across.

  Illustration of warm multicore silicon

Figure WARM. Temperature distribution across a chip: Moderate activity

 

In this case, the Tjunction MAX might be 150 °C. Ambient might be 40 °C. Again, these are fictitious values to go along with the fictitious distributions.

The number of thermal sensors depends on the sophistication of the processor’s hardware power management. If you only want to shut down the processor in a thermal overload situation, you might be able to get away with only one carefully placed sensor. If you want to do more, such as dropping the processor into a higher-numbered (slower) P-state during a thermal overload, distinguishing between more and less thermally sensitive parts of the processor, or supporting turbo P-states, you need more sensors.
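To make the sensor discussion a little more concrete, here is a minimal sketch of the kind of decision that several sensor readings make possible. Everything in it is my own illustration and not how any real coprocessor firmware works: the function name `choose_action`, the `Action` names, and the two margin values are assumptions, and Tjunction MAX is the fictitious 150 °C from above. Real hardware also weighs power and current limits, which this sketch ignores.

```python
from enum import Enum


class Action(Enum):
    SHUTDOWN = "shutdown"   # thermal overload: stop the processor
    THROTTLE = "throttle"   # move to a higher-numbered (slower) P-state
    NOMINAL = "nominal"     # stay at the steady-state P-state
    TURBO = "turbo"         # enough headroom to try a turbo P-state


TJ_MAX_C = 150.0  # fictitious Tjunction MAX used in the text


def choose_action(sensor_temps_c, tj_max_c=TJ_MAX_C,
                  throttle_margin_c=10.0, turbo_margin_c=40.0):
    """Pick an action from a non-empty list of sensor readings (deg C).

    The margins are hypothetical thresholds chosen only for illustration.
    """
    hottest = max(sensor_temps_c)      # the hottest spot limits the chip
    if hottest >= tj_max_c:
        return Action.SHUTDOWN
    headroom = tj_max_c - hottest      # unused thermal budget
    if headroom < throttle_margin_c:
        return Action.THROTTLE
    if headroom >= turbo_margin_c:
        return Action.TURBO
    return Action.NOMINAL
```

With a single sensor you can really only implement the first test. The headroom comparisons are only trustworthy if the readings actually cover the hottest parts of the die, which is why the more ambitious policies need more sensors.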

LOW ACTIVITY

In this scenario, the cores on the chip are relatively inactive. In terms of C-states, this means that the cores are in the deeper idle C-states, occasionally entering C0 to perform some activity. As such, the chip is relatively cool. The slightly warmer core (reddish) is moderately active, perhaps hosting the operating system in an Intel® Xeon Phi™ like coprocessor.

This low level of activity is ideal for supporting a period of turbo processing.

  Cooler Silicon

Figure COOL. Temperature distribution across a chip: Cool and low activity

 

HIGH ACTIVITY

In Figure HOT, all the cores are executing some type of computationally intense activity. An example is a matrix operation that has both a high degree of functional parallelism spread across most or all of the cores, and vectorization that maximizes the SIMD parallelism, such as performing n×16 packed floating point adds on an Intel® Xeon Phi™ coprocessor. The cores and the package are all in C0, and the P-state for each core is P1 (the steady-state voltage/frequency pairing when turbo is available).

This high level of activity does not permit any turbo processing (P0) since there is no unused thermal budget. Note that P0 is defined as the highest voltage/frequency pairing. This means that in systems with turbo capability, P0 is “turbo” and the nominal P-state is P1. In SKUs without turbo, P0 is the nominal P-state.
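The P0/P1 naming rule trips people up, so here is a small sketch of it. The helper `label_pstates` and the frequencies in the usage note are made up for illustration. They are not taken from any real SKU.

```python
def label_pstates(frequencies_mhz, has_turbo):
    """Label frequencies as P0, P1, ... from fastest to slowest.

    Returns (table, nominal), where table maps a P-state name to its
    frequency and nominal is the name of the steady-state P-state.
    """
    ordered = sorted(set(frequencies_mhz), reverse=True)
    if not ordered:
        raise ValueError("need at least one frequency")
    if has_turbo and len(ordered) < 2:
        raise ValueError("a turbo part needs a P0 (turbo) and a P1 (nominal)")
    # P0 is always the highest pairing, whether or not it is a turbo state.
    table = {f"P{i}": freq for i, freq in enumerate(ordered)}
    # With turbo, P0 is the turbo state, so nominal shifts down to P1.
    nominal = "P1" if has_turbo else "P0"
    return table, nominal
```

For example, `label_pstates([1100, 1300, 1200], has_turbo=True)` labels 1300 as P0 (turbo) and reports P1 (1200) as nominal, while `label_pstates([1100, 1200], has_turbo=False)` reports P0 (1200) as nominal.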

  Illustration of hot multicore silicon

Figure HOT. Temperature distribution across a chip: Hot and high activity

 

OPPORTUNITIES FOR TURBO

In this example, we have a very simple Intel® Xeon Phi™ like coprocessor with 4 cores, each executing 4 HW threads.

Figure HOT shows a thermal profile that has no power budget left over for turbo. Even though there are no turbo opportunities, this is the best scenario since the entire processor is being utilized to maximize its computational throughput. Turbo is a way to speed up processing when you have the less than ideal situation where the processor has been relatively inactive and is now (in part or in whole) entering a more active phase.

Figure COOL shows a thermal profile with a good fraction of its thermal budget available for use by turbo. This is a state that you do not want to be in since the coprocessor isn’t doing you much good sitting idle. Even so, this state is often unavoidable, such as between offloads and when an MPI process is waiting for a task. Nevertheless, when you enter a more active computational state, turbo provides a means for accelerating the execution of code on the processor for a period of time.

Figure WARM shows an intermediate profile. Surprisingly, this state is less desirable than COOL in an Intel® Xeon Phi™ like coprocessor. Being in this state means that you are using the processor inefficiently. To say it another way, you are doing some computation but it is obviously not using the processor’s cores and SIMD engines effectively.
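The three profiles and their ranking can be summarized in a few lines of code. This is only a sketch of the argument above: the function `classify_profile`, the use of average per-core utilization as a stand-in for temperature, and the 0.2 and 0.8 thresholds are all my assumptions.

```python
# Most to least desirable, following the discussion above.
PREFERENCE = ["HOT", "COOL", "WARM"]


def classify_profile(core_utilization, low=0.2, high=0.8):
    """Map per-core utilization (0.0 to 1.0) to HOT, WARM, or COOL.

    The thresholds are hypothetical and chosen only for illustration.
    """
    mean = sum(core_utilization) / len(core_utilization)
    if mean >= high:
        return "HOT"    # fully used: no turbo headroom, best throughput
    if mean <= low:
        return "COOL"   # mostly idle: turbo budget available on wake-up
    return "WARM"       # partly used: inefficient use of cores and SIMD


def is_preferable(profile_a, profile_b):
    """True if profile_a ranks ahead of profile_b in PREFERENCE."""
    return PREFERENCE.index(profile_a) < PREFERENCE.index(profile_b)
```

For the 4-core example, `classify_profile([0.95, 0.9, 1.0, 0.92])` gives "HOT" and `classify_profile([0.5, 0.4, 0.6, 0.5])` gives "WARM". `is_preferable("COOL", "WARM")` is True, which is the surprising part of the ranking.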

 

NEXT: HOW CAN I DESIGN MY PROGRAM TO MAKE USE OF TURBO?

REFERENCES

For a list of previous blogs in this series, as well as other related blogs on power and power management, see the article at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references.

NOTE: As previously in my blogs, any illustrations can be blamed solely on me as no copyright has been infringed or artistic ability shown.

 
