On the shoulders of giants in parallel computing education

The first tangible step in the recent surge of parallelism on our campus came five years ago last month, when three students (it’s all undergraduates at St. Olaf, a Minnesota liberal-arts college) assembled our first Beowulf cluster as their work for a one-month team project course.  They constructed the cluster from a few Pentium IIs and an old 10/100 megabit switch, all being discarded by our IT services department, and used open-source software for everything; I don’t think we spent a dime on it.  The next term a senior physics student used the “Castaway” cluster to produce a convincing MPI-based simulation of planetary motion in the solar system. Later in 2006, we used grant funding (HHMI) to build a “production” cluster of 16 Sun servers. Undergraduate researchers did all the work.  At first, we focused on science applications.  Things started taking off after hearing the calls from Intel (Michael Wrinn) and Google at SIGCSE 2008 for teaching more parallelism.  We built a third cluster from virtual machines running on our classroom and lab computers, and taught our first course in parallel computing (January 2009), collaborating with Libby Shoop of Macalester.  Then Libby and I received NSF funding to produce flexible modules for teaching parallelism at all levels of the undergraduate curriculum, our csinparallel.org project.  Now we have some larger multicore machines in our cluster room, with Infiniband networking on the way, and lots of activity – it’s all come so fast!

It couldn’t have happened quickly without collaboration.  Working with Libby for the last two years has been remarkably productive, and since the beginning a succession of talented undergraduate research students have become full collaborators in all of this work.

But none of this could have happened at all without the pioneering work of many more folks about 20 years ago.

This is a great time to put up a Beowulf cluster.  Computers are inexpensive (cast-off machines are plentiful at a college), they all come with networking, Ethernet switches are cheap, the open-source software you need is readily available, and there’s plenty of documentation online.  This made it possible for my three inexperienced undergraduates to build one independently and install and benchmark some useful applications in less than a month.  Once you have a cluster, you can start creating interesting parallel programs with MPI, or install the open-source Hadoop map-reduce computing framework and do some scalable computing with massive data sets.  We owe a lot to Donald Becker and Thomas Sterling for conceiving the idea of Beowulf clusters (1993), and for the hundreds of folks who developed and refined the software and practices of cluster computing with commodity components.
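To give a flavor of the message-passing style of programming that MPI embodies, here is a minimal sketch using Python's standard multiprocessing module rather than MPI itself (a real MPI program would need an installed MPI runtime such as Open MPI; the function names and the partial-sum task here are my own illustration, not drawn from any particular course module):

```python
# Message passing in the spirit of MPI, sketched with Python's
# standard multiprocessing module: each "rank" computes a local
# partial sum, then the parent performs a reduce by collecting them.
from multiprocessing import Process, Queue

def worker(rank, nprocs, n, queue):
    # Each process sums its own strided slice of 1..n, like an MPI
    # rank computing a local result before MPI_Reduce.
    local = sum(range(rank + 1, n + 1, nprocs))
    queue.put(local)

def parallel_sum(n, nprocs=4):
    queue = Queue()
    procs = [Process(target=worker, args=(r, nprocs, n, queue))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    # "Reduce" step: collect one partial sum per worker.
    total = sum(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(1000))  # 500500, matching the sequential sum
```

The same decomposition carries over directly to an MPI version on a cluster, where each rank runs on its own node and the collection step becomes a reduce operation over the network.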

Having some parallel systems available to use (whether Beowulf clusters or multicore computers) is invaluable for teaching parallelism, because hands-on learning is so effective.  But having a body of knowledge about parallelism is even more essential for a CS educator.  There again, we owe a great debt to predecessors. Such topics as concurrent programming and the use of parallelism in computer architecture have appeared in national curricular recommendations for CS majors since ACM’s Curriculum ’68, reaching something of a high point in the 1990s.  For example, the ACM/IEEE joint curricular recommendations in 1991 called for at least three hours of instruction in parallel algorithms, and at least three more hours on distributed and parallel programming constructs, including study of the “promise of functional, logic, object-oriented or other special-purpose languages on highly parallel or distributed architectures,” accompanied by parallel programming experience in a language such as Ada, Concurrent Pascal, Occam, or Parlog.  But few institutions incorporated this push towards parallel computing into their CS academic programs.  (Note:  I use parallel computing as a blanket term, encompassing concurrency, parallel architectures, distributed systems, etc.)  The 2001 ACM/IEEE joint curricular recommendations reduced the minimum requirement in parallel algorithms to zero hours, and dropped the knowledge unit on parallelism in programming languages altogether (although they did call for web-inspired client-server applications).

The industry shift to multicore architectures for commodity computers means that software products cannot stay solely sequential for long and remain competitive in performance.  Our customary exponential improvement in hardware performance will now primarily be achieved by multiplying the number of cores per computer, and this creates a natural mandate for virtually all CS students to learn more about parallelism.  The lack of such a mandate undermined efforts to expand parallelism in CS curricula 20 years ago, in spite of the presence of an intellectual body of work that could have supported such an expansion.  Fortunately, we have that foundation to build on now as we prepare for the challenge of teaching every undergraduate CS student more about parallel computation.

Isaac Newton famously remarked, “If I have seen a little further it is by standing on the shoulders of Giants.”  We now see a new parallel future, and we in CS education owe a debt of appreciation to our predecessors in parallelism, on whose shoulders we stand.  We can also use their help to the extent it’s available (in fact, we need the collaboration of everyone in CS education) as we all face the monumental and immediate task of preparing our CS students for the advance of parallel computation.
