The Last 25 Years of Parallel Computing

I am back from a very interesting 25th anniversary of the IPDPS conference in Anchorage, Alaska. I was able to interact with a number of professors and bounce ideas around on parallel computing education.

To learn more about what occurred at IPDPS, take a look at Lauren Dankiewicz's blog, where she lays out the conference proceedings with links to video coverage of the various keynotes and panels.

I listened attentively to the panel that looked back on 25 years of parallel and distributed computing. I cannot adequately summarize each panelist's view, but I have included a link to the broadcast so you can see and hear their opinions first hand. The panel consisted of:
Moderator: Yves Robert, Ecole Normale Supérieure de Lyon, France
Panelists:
William (Bill) Dally, Stanford & NVIDIA
Jack Dongarra, University of Tennessee & Oak Ridge National Laboratory
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Rob Schreiber, HP Labs, Palo Alto, CA
Arnold Rosenberg, University of Massachusetts, Amherst
Uzi Vishkin, University of Maryland

Speakers were asked to address what went right, what went wrong, and what the striking events and big surprises of the past 25 years have been. My thoughts on this panel session are included below, and I encourage readers to watch the video of the panel discussion to form their own takeaways.

Some of the positive notes from this panel concerned the impressive performance gains parallelism has delivered over the last 25 years. LINPACK numbers have improved from the 70-80 flops of a single 1980s-vintage 6800-based processor 30 years ago to 1.2 petaflops today, roughly a 15-trillion-fold increase. Advances over these same years have demonstrated the value of parallel computing to an audience much wider than the typical IPDPS audience, and the need for parallel computing education was noted by key panelists at the conference.

Rob Schreiber, a mathematician, gave a list of ideas that he contends had to be tried historically but turned out to be bad ideas. Among them were Amdahl's law, "weak scaling" (a.k.a. the Gustafson-Barsis law), automatic parallelization via compilers, High Performance Fortran, RISC and VLIW architectures, and external accelerators (GPGPU).

I disagree with Rob's contention that "weak scaling," as supported by the Gustafson-Barsis law, was a bad idea. I think history has proven Gustafson and Barsis right: the clear trend in computing has been to make simulations more realistic and more detailed and to solve larger, more complicated problems. Yes, we all strive to do this in real time as well. Bill Dally of Stanford also took exception to "weak scaling." Dally said that people don't want a bigger version of Angry Birds, the popular smartphone app. Since most of our personal compute devices now have many cores (he cited the example of a phone with 70-80 cores, depending on how you define a core), he contends that we need to tackle "strong scaling", making existing applications faster, to make parallelism useful to users. His contention is that strong scaling is the big challenge going forward and that we cannot continue to count on weak scaling to come to our rescue. Again, I disagree with him on this point. I think the ability to do new things, solve harder problems, and make applications behave in ever more realistic ways with greater resolution has been a clear historical trend and ultimately is what end users care about. My counter to Bill's Angry Birds argument is that I believe people don't care whether MS Word runs any faster, but they do care about access to detailed medical imaging that might prevent invasive surgical procedures.
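To make the disagreement concrete, here is a minimal sketch (my own illustration, not from the panel) of the two scaling models. Amdahl's law fixes the problem size and asks how much faster p processors make it; the Gustafson-Barsis view fixes the run time and asks how much more work p processors can do in it. The 5% serial fraction below is an assumed, purely illustrative value.

#include <cstdio>

// Amdahl (strong scaling): fixed problem size, serial fraction s.
//   speedup(p) = 1 / (s + (1 - s) / p)
double amdahl_speedup(double s, double p) {
    return 1.0 / (s + (1.0 - s) / p);
}

// Gustafson-Barsis (weak scaling): fixed run time, problem grows with p.
//   scaled speedup(p) = s + (1 - s) * p
double gustafson_speedup(double s, double p) {
    return s + (1.0 - s) * p;
}

int main() {
    const double s = 0.05;  // assumed 5% serial fraction, illustrative only
    const double procs[] = {2.0, 16.0, 128.0, 1024.0};
    for (double p : procs) {
        std::printf("p = %6.0f   Amdahl = %8.2f   Gustafson-Barsis = %9.2f\n",
                    p, amdahl_speedup(s, p), gustafson_speedup(s, p));
    }
    return 0;
}

With a 5% serial fraction, Amdahl's curve flattens out near 1/s = 20 no matter how many processors you add, while the scaled speedup keeps growing with p, which is essentially the argument for letting problem sizes grow with the machine.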

Bill did make what I thought was an excellent point: most parallel systems today are really serial or nearly serial processors bolted together through their I/O channels, and as a result communication has microsecond-level latency that programmers have to contend with. This makes programming parallel systems much harder, because programmers now have four parallel programming challenges to deal with rather than three. The "easy" three are parallelism, locality, and load balance. The tough fourth challenge is programming around latency differences that are man-made artifacts. He pointed to his own work at MIT years ago, which demonstrated that short-latency communication is possible at an architectural level (see his J-Machine work).

I also agree with Dally's premise that academia, even at his own institution, Stanford, has not focused appropriately on parallel computing, and that the state of affairs has been "appalling". He argued that a single course on parallel computing is insufficient and cited the example of a typical algorithms course that still teaches complexity theory as if floating-point operations were what mattered. He says FLOPs are NOT important; what matters is data movement. This implies there is much to be done to revamp today's algorithms courses to make them useful for real applications on real architectures.
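As a small illustration of the FLOPs-versus-data-movement point (my own sketch, not Dally's), the two loops below perform exactly the same number of additions, so a FLOP-counting analysis rates them identically, yet the column-major walk strides across memory and is typically several times slower because of the extra data movement. The matrix size N is an arbitrary illustrative choice.

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int N = 4096;                                   // illustrative size
    std::vector<double> a((size_t)N * N, 1.0);            // N x N matrix, row-major storage

    for (int pass = 0; pass < 2; ++pass) {
        const bool row_major = (pass == 0);
        double sum = 0.0;
        const auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                sum += row_major ? a[(size_t)i * N + j]   // unit stride through memory
                                 : a[(size_t)j * N + i];  // stride-N accesses
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                            std::chrono::steady_clock::now() - t0).count();
        std::printf("%-12s sum = %.0f in %lld ms\n",
                    row_major ? "row-major" : "column-major", sum, (long long)ms);
    }
    return 0;
}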

Rob's inclusion of external accelerators (think GPGPU) as a bad idea from the past may need some explanation. His argument rests on how difficult they have been to program; his term is that they are a PITB to program. He said they have historically always been faster than the CPUs of the day but have also been a PITB to use. He did say, however, that when these accelerators have been integrated into the main CPU they became easier to program - case in point, floating-point accelerators.

The past 25 or so years in computing have been an exciting time, and new and lofty challenges are now cropping up, such as the ever-increasing impact of power consumption and the need to minimize data movement, improve communication speed, and lower latency, among others. Stay tuned for my review of the panelists' views of the next 25 years.

Bob C


Comments

Bob, why do you think academia has not focused appropriately on parallel computing? What will need to change to put a bigger focus on its importance?


David, thanks for the question. Here's my take on this.

Some folks in academia are focusing on parallel computing, but many have not yet addressed the issue. The Intel Academic Community has been working with academics throughout the world for several years. In some command economies, such as China and to some extent India, we have seen more rapid adoption of parallel computing. In Europe and here in the US, challenges remain.

One challenge is that in the US there is still a large degree of autonomy among CS departments. Adoption at one school does not necessarily carry over to others, which slows broad adoption. At some schools it can take years for committees to argue about changes, and then many months of effort to craft a new scope and sequence, set new budgets based on newer hardware, and find up-to-date textbooks (there still are not many texts covering parallelism for CS1 and CS2, algorithms, etc.).
In some cases, teachers have taught their courses for many years and have them tuned as is. Another challenge is determining the core set of topics that most adequately prepare young computer scientists to be ready to change the world. The range of topics covered in many CS curricula has exploded: architecture, CS1, CS2, algorithms, data structures, OS, gaming, networks, network security, web programming, database design, etc. Sometimes I am asked, "What do I NOT teach if you want me to add parallelism?" The answer is not always so obvious.

To address the autonomy issue more head-on, our team has transitioned over the last few years to working with ACM and IEEE to help guide overarching curriculum standards. We are working with subcommittees such as TCPP to help drive early adopters of parallel curricula.

We have worked with schools such as UC Berkeley and the University of Illinois at Urbana-Champaign, and of course they are adopting these topics in their courses. Other schools, such as Arizona State University and USC, are adapting parallelism into some of their featured CS areas, particularly gaming. Liberal arts schools such as St. Olaf in Minnesota and Kent State in Ohio, as well as faculty at state universities, such as Matt Wolf and others at Georgia Tech and Dan Grossman at the University of Washington, are adopting parallelism in their broader curricula. Professor Dick Brown at St. Olaf has done some very interesting coursework around MapReduce, which he has been sharing with the broader community. Charlie Peck of Earlham College and Tom Murphy of Contra Costa College have been championing lower-cost clusters of tiny Atom-based machines so that smaller schools can have hardware platforms to use.

What I have observed is that the schools making progress in adopting parallelism are the ones tweaking (not wholesale revamping) an existing curriculum. These schools typically begin by mentioning parallelism in CS1 and CS2 and then provide more parallel exposure in architecture, algorithms, and OS courses. This incremental curriculum approach usually does not require as much heavy lifting to convince department chairs or deans to adopt a wholesale change to the curriculum.


Bob, this is a comment on your "SFTS007 - Explicit Approaches to Parallelization and Vectorization" IDF13 presentation (I couldn't find a direct contact, hence this somewhat random choice of place). The code snippet on page 12:


cilk_for (int i=0; i < size; i+=s) {
     int m = std::min(s, size-i);
seems to be incorrect. Shouldn't it be:

cilk_for (int i=0; i < size; i+=s) {
     int m = std::min(i+s, size);

Best regards,
Paul
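
Which clamp is right depends on how m is used in the rest of the snippet, which isn't quoted above. Here is a minimal sketch of the two readings, using plain for loops in place of cilk_for (so it compiles without Cilk Plus) and a made-up chunk body; the function and variable names are hypothetical.

#include <algorithm>
#include <vector>

// Hypothetical strip-mined update of a vector in chunks of size s.
void process_in_chunks(std::vector<double>& a, int s) {
    const int size = (int)a.size();

    // Reading 1: m is the LENGTH of the current chunk, so
    // std::min(s, size - i) correctly clamps the final partial chunk.
    for (int i = 0; i < size; i += s) {
        int m = std::min(s, size - i);
        for (int j = 0; j < m; ++j)
            a[i + j] += 1.0;              // placeholder chunk body
    }

    // Reading 2: m is the exclusive END INDEX of the current chunk, so
    // std::min(i + s, size) is the clamp that keeps it in bounds.
    for (int i = 0; i < size; i += s) {
        int m = std::min(i + s, size);
        for (int j = i; j < m; ++j)
            a[j] += 1.0;                  // placeholder chunk body
    }
}

If the slide's inner loop runs for m iterations starting at i, the quoted min(s, size-i) is already correct; if it runs from i up to index m, Paul's min(i+s, size) is the right fix.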