Intel® Xeon Phi™ coprocessor Power Management Part 1: P-States, Reducing power consumption without impacting performance

Right up front, I am going to tell you that P-states are irrelevant, meaning they will not impact the performance of your HPC application. Nevertheless, they are important to your application in a more roundabout way. Since most of you belong to a group of untrusting and always questioning skeptics (i.e. engineers and scientists), I am going to go through the unnecessary exercise of justifying my claim.

The P-states are voltage-frequency pairs that set the speed and power consumption of the coprocessor. The lower the voltage, the lower the power consumption. The lower the frequency, the slower the computation. I can see the thought bubble appearing over your head: “Under what situations would I ever want to enable P-states and, so, possibly reduce the performance of my HPC app?” Having P-states is less important in the HPC domain than in less computationally intense environments, such as client-based computing and data-bound servers. But even in the coprocessor’s HPC environment, longer quiescent states are common between large computational tasks. For example, if you are using the offload model, the coprocessor is likely unused during the periods between offloads. Also, a native application executing on the coprocessor will often be in a quiescent phase for many reasons, such as their having to wait for the next chunk of data to process.

The P-states the coprocessor Power Management (PM) supports is P0 through Pn. The number of P-states a given coprocessor SKU (i.e. type) supports will vary, but there are always at least two. Similarly, some SKUs will support turbo P-states. The coprocessor’s PM SW handles the transition from one P-state to another. The host’s PM SW has little if any involvement.

A natural thing to wonder is how much this is going to impact performance; we do happen to be working in an HPC environment. The simple answer is that there is, in any practical sense, no impact on HPC performance. I am sure that, at this point, you are asking yourself a series of important questions:

(a)    “Wait a minute; how can this be? If the coprocessor is slowing down the processor by reducing the frequency, how can this not affect the performance of my application?”

(b)   “I just want my application to run as fast as possible. Why would I want to reduce power consumption at all?”

Let us first consider (b). I understand that power is not a direct priority for you as an application writer. Even so, it does impact your application’s performance indirectly. More power consumption has to do with all that distant stuff, like higher facility costs in terms of electricity usage due to greater air-conditioning demands, facility space needs, etc. It is simply part of that unimportant stuff administrators, system architects, and facilities management worry about.

Truth be told, you need to worry about it also. This does impact your application in a very important way, though how is not initially obvious. If the facility can get power consumption down while not losing performance, this means that it can pack more processors in the same amount of space all while using the same power budget. To use another American idiom, you get more “bang for the buck.” And this is a good thing for you as a programmer/scientist. When you get right down to it, lower power requirements mean that you can have more processors running in a smaller space, which means that you, as an application designer/scientist, can run not only bigger problems (more cores) but run them faster (less communication latency between those more cores).

Let us go back to P-states. P-states will impact upon performance in a theoretical sense, but not in a way that is relevant to an HPC application. How is this possible? It is because of how transitions between the P-states happen. It all has to do with processor utilization. The PM SW periodically monitors the processor utilization. If that utilization is less than a certain threshold, it increases the P-state, that is, it enters the next higher power efficiency state. The word term in the previous sentence is “utilization”. When executing your computationally intensive HPC task on the coprocessor, what do you want your utilization to be? Ideally, you want it to be as close to 100% as you can get. Given this maximal utilization, what do you imagine is the P-state your app is executing in? Well, it is P0, the fastest P-state (ignoring turbo mode). Ergo, the more energy efficient P-states are irrelevant to your application since there is almost never a situation where a processor supporting your HPC app will enter one of them.

So in summary, the “HPC” portions of an HPC application will run near 100% utilization. Near 100% utilization pretty much guarantees always using the fastest P-state, P0. Ergo, P-states have essentially no impact upon the performance of an HPC application.

How do I get my application to run in one of those turbo modes? You cannot as it is just too dangerous: it is too easy to make a minor mistake resulting in overheating and damaging the coprocessor.

Next: P-states: Of package states and turbo mode.

For more complete information about compiler optimizations, see our Optimization Notice.