Celebrating 20 Years of OpenMP*
The OpenMP* application programming interface turns 20 this year. To celebrate, we tapped Michael Klemm (the current CEO of the OpenMP Architecture Review Board, or ARB) and some of his colleagues to give an overview of the newest features in the specification, particularly enhancements to task-based parallelism and to offloading computations to specialized accelerators.
Our feature article covers The Present and Future of the OpenMP API Specification, so I’ll say a little about its past. I half-jokingly refer to the early to mid-1990s as the bad old days of high-performance computing (HPC). There were many, many different parallel programming models and parallel architectures dotting a fast-changing HPC landscape. For distributed-memory architectures, there were low-level, message-passing methods like SHMEM; high-level methods like PVM or MPI; and even higher levels of abstraction with High Performance Fortran and Unified Parallel C. For shared-memory architectures, there were low-level threading methods like Pthreads or higher-level compiler-directed threading. One thing was clear: There were no magic compilers that could automatically parallelize real applications. Parallel compiler directives were the next best thing.
For those of us who remember parallel compiler directives before OpenMP, there were many vendor-specific sets to choose from (e.g., Cray, SGI, Intel, Kuck and Associates, Inc.), each doing the same thing but with different syntaxes. In exasperation, several large governmental HPC facilities demanded a unified syntax for parallel compiler directives.
OpenMP was born in 1997. Most of the original vendors are still on the ARB, and many more members have been added since (the ARB currently has 29 members). It remains the gold standard for portable, vendor-neutral parallel programming directives because it never lost sight of its original purpose.
Today, MPI and OpenMP cover most application requirements in HPC. There are still challenges. Memory subsystems are as unbalanced as ever, different processor architectures now commonly exist within the same system, and keeping data coherent among these different processing elements is an additional burden on the programmer. But MPI and OpenMP continue to evolve with these challenges, so the HPC future looks bright.
New Tools for Tuning Serial Performance
Parallelism is great, but would you parallelize code that has not been properly tuned? No, you wouldn’t. So this issue of The Parallel Universe also looks at tuning serial performance. My first supercomputer was a Cray X-MP, so I learned early the importance of vectorization. Vectorization Opportunities for Improved Performance with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) gives a good overview of tuning code with the new Intel AVX-512 instruction set and shows how to use these instructions to expose vectorization opportunities that were not previously possible. The new Intel® Advisor Roofline and Intel® VTune™ Amplifier Memory Analysis features help visualize performance optimization tradeoffs and how memory access is affecting an application’s performance. These features are demonstrated in Intel® Advisor Roofline Analysis and Identify Scalability Problems in Parallel Applications. We round out this issue with tips for optimizing general matrix-matrix multiplication operations in the Intel® Math Kernel Library (Reducing Packing Overhead in Matrix-Matrix Multiplication) and a brief overview of Intel software support for machine learning (Intel-Powered Deep Learning Frameworks).
Hello, I’m New Here
Finally, I’d like to introduce myself as the new editor of The Parallel Universe. I’ve been doing HPC since about 1990, but I was originally doing research in computational life science. Each successive research project required more computing power. To stay relevant, I had to learn about performance tuning and parallel programming. My academic background is in biochemistry and genetics, so I resented the intrusion of computer science into my scientific domain. But my initial resistance gave way to fascination when I saw how HPC could change my research and make it possible to answer new and bigger research questions. Hardware and software advances allow me to quickly run simulations on my laptop that once took days on a circa 1995 supercomputer. I used to dread the heterogeneous parallel computing future. Now, I welcome it with the same fascination I had as a young graduate student.