Intel is Number 1 with a Milky Way

No, not the candy bar (though I could really go for a Milky Way Midnight bar right now). I'm thinking of the Milky Way 2 (Tianhe-2) computer system at the National Supercomputing Center in Guangzhou, China. This machine incorporates 32,000 12-core Intel Xeon processors (E5-2600 v2) and 48,000 Intel Xeon Phi coprocessors. It has been declared the fastest computer on the planet (maybe even the galaxy?), as it sits on top of the June 2013 TOP500 list of supercomputer systems released at the ISC2013 conference.

This is the second time a Chinese system has occupied the number one slot on a TOP500 list. The system achieved a computation speed of 33.2 petaflop/s on the Linpack benchmark. The first Chinese system to reach the number one position did so in November 2010, when the Tianhe-1A system attained 2.57 petaflop/s. That system used over 14,000 Intel Xeon processors and over 7,000 NVIDIA Tesla GPUs.

Comparing these two machines shows off the latest trend in achieving top-notch HPC performance: the use of accelerators. The petaflop/s rate increased 13X in the 31 months between the appearance of Tianhe-1A and Tianhe-2 on the lists. From the “rule of thumb” guidance attributed to Gordon Moore, one might have expected an increase of around 4X. Between the two generations of Tianhe, the number of processors has doubled and the number of accelerators has grown almost 7X. I think the growth in accelerators has more to do with the performance difference than the raw increase in the number of processors and cores. As we march toward exascale machines, I believe the use of accelerator technology will become more common.
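
The ratios above are easy to check by hand. Here is a minimal Python sketch, using only the figures quoted in this post. The 18-month and 24-month doubling periods are assumptions (both are commonly quoted versions of the rule of thumb), and the name `expected_growth` is made up for illustration.

```python
# Linpack results in petaflop/s, as quoted in the post.
TIANHE_1A_PFLOPS = 2.57   # November 2010 list
TIANHE_2_PFLOPS = 33.2    # June 2013 list

# Months between the November 2010 and June 2013 lists.
MONTHS_BETWEEN = (2013 - 2010) * 12 + (6 - 11)


def expected_growth(months, doubling_period_months):
    """Growth factor if performance doubles every doubling_period_months.

    The doubling period is an assumption supplied by the caller; the post
    does not say which version of the rule of thumb it has in mind.
    """
    return 2 ** (months / doubling_period_months)


# Observed speedup between the two number one systems.
observed_speedup = TIANHE_2_PFLOPS / TIANHE_1A_PFLOPS

# Approximate component counts from the post ("over 14,000", "over 7,000").
processor_ratio = 32_000 / 14_000
accelerator_ratio = 48_000 / 7_000

print(f"Months between lists:     {MONTHS_BETWEEN}")
print(f"Observed speedup:         {observed_speedup:.1f}X")
print(f"Doubling every 18 months: {expected_growth(MONTHS_BETWEEN, 18):.1f}X")
print(f"Doubling every 24 months: {expected_growth(MONTHS_BETWEEN, 24):.1f}X")
print(f"Processor count ratio:    {processor_ratio:.1f}X")
print(f"Accelerator count ratio:  {accelerator_ratio:.1f}X")
```

Under either assumed doubling period the expected growth comes out below the 4X guess (roughly 3.3X and 2.4X), which only widens the gap to the observed speedup.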

One other point I’d like to note is the energy efficiency of the top machines. The number 2 system on the most recent list, Titan at Oak Ridge National Laboratory, has a power rating of 8.2 MW. The Tianhe-2 system uses about twice that much power and achieves almost two times the petaflop/s. I can only imagine that the flops per watt rating of future machines will increase as processor manufacturers release even more power-efficient chips in the coming years.
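
For a rough feel of what flops per watt looks like for these two machines, here is a small Python sketch. Several inputs are assumptions: the Tianhe-2 power draw is taken as exactly twice Titan's 8.2 MW (the post only says "about twice"), and Titan's Linpack result of 17.59 petaflop/s does not appear in this post; it is recalled from the published June 2013 list and worth re-checking there. The function name `gflops_per_watt` is made up for illustration.

```python
def gflops_per_watt(pflops, megawatts):
    """Convert a Linpack rate and a power rating to gigaflop/s per watt.

    1 petaflop/s = 1e6 gigaflop/s and 1 MW = 1e6 W, so the two scale
    factors cancel; they are written out to make the units explicit.
    """
    return (pflops * 1e6) / (megawatts * 1e6)


# Titan: power rating from the post; the Linpack figure is an assumption
# (not stated in the post).
TITAN_MW = 8.2
TITAN_PFLOPS = 17.59

# Tianhe-2: Linpack figure from the post; power is an assumption
# ("about twice" Titan's rating, taken here as exactly 2x).
TIANHE_2_PFLOPS = 33.2
TIANHE_2_MW = 2 * TITAN_MW

titan_efficiency = gflops_per_watt(TITAN_PFLOPS, TITAN_MW)
tianhe_2_efficiency = gflops_per_watt(TIANHE_2_PFLOPS, TIANHE_2_MW)

print(f"Titan:    {titan_efficiency:.2f} gigaflop/s per watt")
print(f"Tianhe-2: {tianhe_2_efficiency:.2f} gigaflop/s per watt")
```

With these inputs both machines land at roughly 2 gigaflop/s per watt, so the result is sensitive to how loosely "about twice" is read.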


2 comments

Clay B.

The whole LINPACK benchmark is artificial to some degree, but it does still involve basic computations that HPC systems are frequently called on to do. It's been used all these years, so there's not much impetus to change now. While a more "real-life" workload could give a better picture of true system speed for actual computations, which one should be chosen? Though it may not be perfect, LINPACK does exercise modern processor features like vectorization and parallel execution. My hope is that future processor design decisions aren't based on just being able to execute floating-point multiply-add operations, but will consider the more realistic computations being seen today and tomorrow.

I don't know of any standard benchmarks for power consumption on the software side. As with the LINPACK benchmark, flops per watt may not reflect the overall mix of instructions that HPC systems execute, but it is easy to measure and easy to understand. The mix of instructions that applications execute is going to be different for each HPC installation because each machine runs different workloads, so picking something a bit more sophisticated will provoke more controversy than it might be worth. And then there is all the auxiliary equipment (cooling, monitoring, etc.) that is needed to run an HPC system. Should that be added to the machine's power consumption profile?

One measurement worth taking is the resting power consumption of the machine. That is, while the whole machine sits idling, how much power is being drawn? HPC systems are rarely idle, but the difference between idle power and fully loaded power gives a good idea of how much power actually running jobs requires.

Dmitry Oganezov (Intel)

Clay is back, Hurrah!

To be honest, I don't think the energy efficiency is measured in an accurate way. It's based on LINPACK, which is a synthetic benchmark, so a real-life workload would give a different picture. BTW, are there any standard benchmarks for HPC power consumption?
