Intel® Advisor XE Helps You Understand Parallel Efficiency

Today, tuning software isn’t just about making an application run faster; it is also about making sure it runs efficiently. Across the wide range of hardware platforms, from mobile processors to supercomputers, performance per watt is becoming an increasingly important consideration in software design. Parallel programming is often a means of achieving the best performance and, if done well, can also improve your overall efficiency (performance per watt). Intel® Advisor XE is a tool designed to help developers intelligently and easily find where to add parallelism to their applications. One of the key features of Advisor XE is the ability to predict how well a serial application’s performance could scale once it has been parallelized. After you run the Suitability tool, you can interactively model your application’s potential parallel behavior by selecting the number of cores and the parallel framework in the Suitability tool’s report (Figure 1).

Figure 1

The graph shown by the Suitability tool in Advisor XE is divided into three sections: red for slowdowns, yellow for moderate speedups, and green for energy-efficient speedups (more on that in a minute).

As a design goal for parallel execution, Advisor XE encourages programmers to focus on parallelism that will scale efficiently with additional cores. Any speedup the program can achieve will improve its performance, but it does come at a cost. If you get only a 1.1x speedup by using 32 cores, your power usage went up by approximately 32x while your benefit was only ~10%, resulting in a power-efficiency of only 1.1/32 ≈ 3.4% of the original program. Ouch! This is a very expensive way of improving performance. At the other extreme, if you do achieve a 32x speedup by using 32 cores, you have the same power-efficiency in the parallel program as in the original serial program, but now it runs 32 times faster. If this were the whole story, you could easily conclude that a parallel program is always less power-efficient than a serial program, because it’s almost impossible to achieve the theoretical maximum of a perfect linear speedup.
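The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in the text, not something Advisor XE exposes: the function name `power_efficiency` is hypothetical, and it assumes power usage grows linearly with the number of cores in use.

```python
def power_efficiency(speedup: float, cores: int) -> float:
    """Performance per watt relative to the serial run.

    Assumption (from the rule of thumb in the text): using `cores` cores
    draws roughly `cores` times the power of the serial run.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


if __name__ == "__main__":
    # The two extremes discussed in the text.
    for speedup, cores in [(1.1, 32), (32.0, 32)]:
        eff = power_efficiency(speedup, cores)
        print(f"{speedup}x on {cores} cores: {eff:.1%} of serial efficiency")
```

A result of 1.0 means the parallel run is as power-efficient as the serial one; anything below 1.0 means each unit of work costs more energy than before.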

Where parallel computing can help with power-efficiency is when the serial and parallel machines you are targeting are different. Quite often the parallel machines are designed to run at a lower clock frequency than the available serial machine. You might think this is a bad choice, but it has quite important benefits. The power usage of the CPU in a machine has a roughly quadratic relationship with the voltage used, and the voltage has a roughly linear relationship with the frequency. As a rule of thumb, this implies that you can expect a roughly quadratic relationship between power usage and CPU frequency. The way the Suitability tool presents its results reflects this relationship.
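To see why a lower clock can pay off, here is a minimal sketch of the rule of thumb above. Both function names are hypothetical, and the sketch additionally assumes that single-core performance scales linearly with clock frequency, which real workloads only approximate.

```python
def relative_power(frequency_ratio: float) -> float:
    """Power relative to the baseline clock, using the article's
    rule of thumb that power grows roughly with frequency squared."""
    return frequency_ratio ** 2


def relative_efficiency(frequency_ratio: float) -> float:
    """Performance per watt relative to the baseline clock.

    Assumption: performance is proportional to frequency, so
    efficiency = frequency / power = 1 / frequency_ratio.
    """
    return frequency_ratio / relative_power(frequency_ratio)


if __name__ == "__main__":
    for ratio in (1.0, 0.5):
        print(f"clock x{ratio}: power x{relative_power(ratio)}, "
              f"perf/watt x{relative_efficiency(ratio)}")
```

Under these assumptions, a core running at half the clock draws about a quarter of the power and delivers about twice the performance per watt, which is the headroom that parallelism can then spend on additional cores.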

The yellow and green regions are separated by a curve proportional to the square root of the number of processors. If the amount of available parallelism is in the green region, the program will continue to see energy-efficient speedups as processors become more parallel. If the available parallelism is in the yellow region, then it would be more power-efficient to increase the serial throughput via program changes or faster processors. However, sometimes this isn’t possible because of the code structure or because faster processors are not available. In the yellow region, there is still parallelism available; however, it comes at the expense of much lower power-efficiency.
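One way to read the square-root curve: under the quadratic rule of thumb, getting a speedup of S from a faster serial processor costs roughly S² in power, while getting it from N cores costs roughly N, so parallelism is the cheaper route when S exceeds √N. The sketch below classifies a predicted speedup on that basis. It is not Advisor XE's actual implementation: the function name is hypothetical, the constant of proportionality of the curve is assumed to be 1, and the handling of values exactly on a boundary is an arbitrary choice.

```python
import math


def suitability_region(speedup: float, cores: int) -> str:
    """Classify a predicted speedup into the three regions of the graph.

    Assumptions (not taken from the tool itself):
      - red means a slowdown, i.e. speedup below 1.0
      - the yellow/green boundary is exactly sqrt(cores)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if speedup < 1.0:
        return "red"      # slower than the serial program
    if speedup > math.sqrt(cores):
        return "green"    # energy-efficient speedup
    return "yellow"       # faster, but at a power-efficiency cost


if __name__ == "__main__":
    for s in (0.8, 1.1, 4.0, 8.0):
        print(f"{s}x on 32 cores: {suitability_region(s, 32)}")
```

With 32 cores the assumed boundary sits at about 5.7x, so the 1.1x example from earlier lands in the yellow region, while an 8x speedup would be green.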

When making decisions about power, performance, and parallelization, everyone will have different goals and requirements. In the mobile space, performance per watt may be the most important metric. In that case, parallel solutions that fall in the yellow section of the graph may not be reasonable. However, an HPC developer may want to squeeze the last bit of speedup out of an application and be willing to overclock, parallelize, add more hardware, and max out the power supply if the application can complete a fraction sooner. In that case, anything above the red section could be considered. Advisor XE tries to provide information that will be helpful to all developers looking to add parallelism in order to take advantage of multi-core processors.

You can download a fully-featured evaluation copy of Intel Advisor XE here. You can find more information about Intel Advisor XE here, and talk with other users in our forum here.

 

For more complete information about compiler optimizations, see our Optimization Notice.