Teach Parallel! #6. Professor Wen-mei Hwu, Strategies for Parallelism

Professor Wen-mei Hwu, University of Illinois Urbana-Champaign. Common Strategies for Parallelism.

It was a treat, indeed an honor, to talk with Professor Wen-mei Hwu on our recent Teach Parallel broadcast.  It was also a bit frustrating in that we really only had about 20 minutes of substantive discussion for a topic on which we could have spent hours.

In a pre-broadcast call, we decided to tackle six topics, using Professor Hwu's ECE498al course as an exemplar:

1. Solving data-intensive problems using bulk synchronous parallel processors, where regular data access patterns and regular instruction execution patterns are key to high performance.

2. The "common strategies" (tricks) that practitioners use to achieve regular patterns: tiling, data structure padding, data transposition, data binning, locality-based layout, hierarchical data structures, and loop transformations. These tricks manifest themselves differently in different types of applications and on different types of hardware architectures.

3. The course relies on three sets of case studies to teach these common strategies. The micro case studies include matrix-matrix multiplication, reduction, and prefix scan to illustrate the concepts involved in the strategies. The application case studies include an MRI reconstruction example, a molecular dynamics example, and a computational fluid dynamics example to illustrate how one can go from sequential code to high-performance parallel code by applying the strategies. The project case studies are selected by students and often mentored by domain experts to give students real parallel application development experience.

4. The course is currently taken mostly by graduate students from a wide range of disciplines: Physics, Chemistry, and Mechanical, Civil, and Electrical Engineering. Many students take this course because of their thesis research needs.

5. The tricks often conflict with software engineering practice. Currently, it is hard to see an elegant way of injecting the course into the mainstream undergraduate CS curriculum. At the same time, these tricks cannot be ignored by mainstream CS education for much longer since they are the reality of effective parallel programming.

6. Some of the work that we are doing at the Intel/Microsoft Illinois UPCRC aims to automate these tricks. With automation, programmers focus on giving the compilers and tools accurate information about the shapes and boundaries of their data structures so that the tools can perform these tricks reliably. Such advances may be what it takes to move courses such as ECE498al into the mainstream CS undergraduate curriculum.
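To make the tiling trick from item 2 a little more concrete, here is a minimal CPU-side sketch in C built on the matrix-matrix multiplication micro case study from item 3. It is not code from ECE498al: the tile size `TILE = 4` and the function names `matmul_naive`, `matmul_tiled`, and `tiled_matches_naive` are hypothetical, chosen only for illustration. On a bulk synchronous parallel processor the tiles would typically be staged in fast on-chip memory and processed by many threads at once; this sequential version only shows the loop restructuring that makes that kind of reuse possible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile size; a real choice depends on the size of the
   cache or on-chip memory of the target hardware. */
#define TILE 4

/* Reference version: C = A * B, all n x n, stored row-major. */
static void matmul_naive(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Tiled version: walk the matrices in TILE x TILE blocks so that each
   block of A and B is reused many times while it is still in fast memory. */
static void matmul_tiled(int n, const float *A, const float *B, float *C) {
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0f;
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                /* The "&& < n" checks handle n not divisible by TILE. */
                for (int i = ii; i < ii + TILE && i < n; i++)
                    for (int k = kk; k < kk + TILE && k < n; k++) {
                        float a = A[i * n + k];
                        for (int j = jj; j < jj + TILE && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

/* Runs both versions on an n x n problem filled with small integer values,
   so every float result is exact and can be compared with ==.
   Returns 1 if the tiled result matches the reference, 0 otherwise. */
static int tiled_matches_naive(int n) {
    size_t bytes = (size_t)n * (size_t)n * sizeof(float);
    float *A = malloc(bytes), *B = malloc(bytes);
    float *C1 = malloc(bytes), *C2 = malloc(bytes);
    assert(A && B && C1 && C2);
    for (int i = 0; i < n * n; i++) {
        A[i] = (float)(i % 7 - 3);
        B[i] = (float)(i % 5 - 2);
    }
    matmul_naive(n, A, B, C1);
    matmul_tiled(n, A, B, C2);
    int same = 1;
    for (int i = 0; i < n * n; i++)
        if (C1[i] != C2[i])
            same = 0;
    free(A); free(B); free(C1); free(C2);
    return same;
}
```

The tiled loops perform exactly the same multiply-adds as the reference version, only in a different order; the point of the trick is that the reordering turns scattered accesses into regular, block-local ones, which is what item 1 identifies as the key to high performance.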

What would particularly intrigue me is diving deeper into the specifics of Wen-mei's "tricks" and beginning to understand how they could be incorporated into the undergraduate curriculum.

Fortunately, Wen-mei has already agreed to return in the Fall, when Tom and I kick off our deep-dive series, which we have tentatively titled TeachParallel++ (pace Bjarne).

Until then, please watch the video or listen to the podcast and let us know your thoughts.