The Last 25 Years of Parallel Computing

I am back from a very interesting 25th-anniversary IPDPS conference in Anchorage, Alaska, where I was able to interact with a number of professors and bounce around ideas on parallel education.

To learn more about what occurred at IPDPS, take a look at Lauren Dankiewicz's blog, where she lays out the conference proceedings with links to video coverage of various keynotes and panels.

I listened attentively to the panel that looked back on 25 years of parallel and distributed computing. I cannot adequately summarize each panelist's views, but I have included a link to the broadcast so you can see and hear these opinions first hand. The panel consisted of:
Moderator: Yves Robert, Ecole Normale Supérieure de Lyon, France
William (Bill) Dally, Stanford & NVIDIA
Jack Dongarra, University of Tennessee & Oak Ridge National Laboratory
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Rob Schreiber, HP Labs, Palo Alto, CA
Arnold Rosenberg, University of Massachusetts, Amherst
Uzi Vishkin, University of Maryland

Speakers were asked to address what went right, what went wrong, and what the striking events and big surprises of the past 25 years were. My thoughts on the panel session are included below, and I encourage readers to watch the video of the panel discussion to formulate their own takeaways.

Some of the positive vibes from this panel included how the last 25 years have seen impressive performance gains as a result of parallelism. LINPACK numbers have improved from the 70-80 flops of a single 1980s-vintage 6800-based processor 30 years ago to 1.2 petaflops today - roughly a 15-trillion-fold increase. Advances over these same years have demonstrated the value of parallel computing to an audience much wider than the typical IPDPS crowd, and the need for parallel education was noted by key panelists at the conference.

Rob Schreiber, a mathematician, gave a list of ideas that perhaps had to be tried historically but which he contends turned out to be bad ideas. Among them: Amdahl's law, "weak scaling" (a.k.a. the Gustafson-Barsis law), automatic parallelization via compilers, High Performance Fortran, RISC and VLIW architectures, and external accelerators (GPGPU).
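For readers who want the formula behind the first item on Rob's list: Amdahl's law bounds the speedup of a fixed-size problem by its serial fraction. A minimal sketch (the 5% serial fraction below is purely an illustrative assumption, not a number from the panel):

```python
def amdahl_speedup(p, serial_fraction):
    """Amdahl's law: speedup on p processors for a fixed-size problem,
    where serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Even with only 5% serial work, 1024 processors yield under 20x,
# because the bound as p grows is 1 / serial_fraction = 20.
print(amdahl_speedup(1024, 0.05))  # ~19.6
```

This pessimistic bound is presumably why the law ends up on a "bad ideas" list: taken at face value, it argues against massive parallelism.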

I disagree with Rob's contention that the "weak scaling" supported by Gustafson-Barsis was a bad idea. I think history has proven Gustafson and Barsis right: the clear trend in computing has been to make simulations more realistic and more detailed, and to solve larger, more complicated problems. Yes, we all strive to do this in real time as well. Bill Dally of Stanford also took exception to "weak scaling". Dally said that people don't want a bigger version of "Angry Birds", the popular smartphone app. Since most of our personal compute devices have many cores nowadays - he cited the example of a phone with 70-80 cores, depending on how you define a core - he contends that we need to tackle "strong scaling", making existing applications faster, to make parallelism useful to users. His contention is that strong scaling is the big challenge going forward, and that we cannot continue to count on weak scaling to come to our rescue. Again, I disagree with him on this point. I think the ability to do new things, to solve harder problems, and to make apps behave in ever more realistic ways at greater resolution has been a clear historical trend and is ultimately what end users care about. My counter to Bill's "Angry Birds" argument is that I believe people don't care whether MS Word runs any faster, but they do care about access to detailed medical imaging that might spare them invasive surgical procedures.
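The disagreement can be made concrete by putting the two scaling laws side by side. Under weak scaling, the problem grows with the machine, so the Gustafson-Barsis scaled speedup grows nearly linearly; under strong scaling, the fixed-size problem hits Amdahl's ceiling. A sketch, again assuming an illustrative 5% serial fraction:

```python
def strong_scaling_speedup(p, serial_fraction):
    """Amdahl's law: fixed problem size spread over p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def weak_scaling_speedup(p, serial_fraction):
    """Gustafson-Barsis scaled speedup: problem size grows with p."""
    return serial_fraction + p * (1.0 - serial_fraction)

for p in (16, 256, 4096):
    print(p, strong_scaling_speedup(p, 0.05), weak_scaling_speedup(p, 0.05))
```

At 4096 processors the strong-scaling speedup is still under 20x, while the scaled speedup is nearly 3900x - which is why "solve a bigger, more detailed problem" has historically been the easier win, and why Dally's call to crack strong scaling is the harder challenge.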

Bill did make what I thought was an excellent point: most parallel systems today are really serial or nearly serial processors bolted together through their I/O channels, and as a result communication carries microsecond-level latency that programmers have to contend with. This makes programming parallel systems much harder, because programmers now have four parallel programming challenges to deal with rather than three. The "easy" three are parallelism, locality, and load balance. The tough fourth is programming around latency issues that are man-made artifacts. He pointed to his own work at MIT years ago, which demonstrated that short-latency communication is possible at the architectural level (see his J-Machine work).
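Why microsecond latency hurts can be seen with the standard latency-bandwidth (alpha-beta) cost model for message passing. The machine numbers below are assumptions chosen only to illustrate the effect, not measurements from any system Dally discussed:

```python
def exchange_time_us(messages, bytes_per_msg, latency_us, bytes_per_us):
    """Alpha-beta model: each message pays a fixed latency (alpha) plus
    its size divided by bandwidth (beta)."""
    return messages * (latency_us + bytes_per_msg / bytes_per_us)

# Illustrative machine: 1 us per-message latency, ~1 GB/s bandwidth.
# Sending 1000 small 8-byte messages is latency-dominated (~1008 us),
# while one aggregated 8000-byte message takes ~9 us.
print(exchange_time_us(1000, 8, 1.0, 1000.0))
print(exchange_time_us(1, 8000, 1.0, 1000.0))
```

The two calls move the same 8000 bytes, yet differ by two orders of magnitude - exactly the man-made artifact programmers must design around when latency is microseconds rather than nanoseconds.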

I also agree with Dally's premise that academia, even at his own institution, Stanford, has not focused appropriately on parallel computing, and that the state of affairs has been "appalling". He argued that a single course on parallel computing is insufficient and cited the example of a typical algorithms course that still teaches complexity theory as if floating-point operations were what mattered. He says FLOPs are NOT important! What matters is data movement. This implies there is much to be done to revamp today's algorithms courses to make them useful for real applications on real architectures.
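One way to see Dally's "data movement, not FLOPs" point is a back-of-the-envelope roofline-style bound: a kernel's attainable performance is capped by memory bandwidth times its arithmetic intensity (flops per byte moved) until it reaches the machine's compute peak. The hardware figures below are illustrative assumptions only:

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: performance is limited by the smaller of
    peak compute and bandwidth * arithmetic intensity."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Illustrative machine: 100 GFLOP/s peak, 25 GB/s memory bandwidth.
# A dot product does 2 flops per 16 bytes loaded (intensity 0.125):
print(attainable_gflops(100.0, 25.0, 0.125))  # 3.125 -> memory-bound
# A well-blocked matrix multiply can reach far higher intensity:
print(attainable_gflops(100.0, 25.0, 8.0))    # 100.0 -> compute-bound
```

On this illustrative machine the dot product can never exceed ~3% of peak no matter how the flops are counted - the operation-counting analysis of a classic algorithms course says nothing about this, which is Dally's complaint.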

Rob's inclusion of external accelerators (think GPGPU) among the bad ideas of the past may need some explanation. His argument rests on how difficult they have been to program - his term was "PITB". He said they have historically always been faster than the CPUs of the day but have also been a PITB to use. He did note, however, that when such accelerators were integrated into the main CPU they became easier to program - floating-point accelerators being a case in point.

The past 25 or so years in computing have been an exciting time, and new, lofty challenges are now cropping up: the ever-increasing impact of power consumption, and the need to minimize data movement, improve communication speed, and lower latency, among others. Stay tuned for my review of the panelists' views on the next 25 years.

Bob C
