In a previous blog post, I mentioned that I used to dread the heterogeneous parallel computing future. Ordinary parallelism was hard enough. Spreading concurrent operations across different processor architectures would add a level of complexity beyond my programming ability. Fortunately, as the University of California at Berkeley Parallel Computing Laboratory predicted in 2006, this increasing complexity would force a greater separation of concerns between domain experts and tuning experts. (See The Landscape of Parallel Computing Research: A View from Berkeley for details.) For example, I know how to apply the Fast Fourier Transform in my scientific domain, but I would never dream of writing an FFT myself because experts have already done it for me. I can just use their libraries to get all the benefit of their expertise.
With that in mind, James Reinders, our editor emeritus, joins us again to continue his series on FPGA programming. This time, he shows us how to exploit heterogeneous parallelism using Intel’s new OpenVINO™ toolkit (the name stands for open visual inference and neural network optimization). OpenVINO™ Toolkit and FPGAs describes how to use this toolkit to incorporate computer vision into applications that span processor architectures, including FPGAs, from edge devices all the way to the cloud and data center. The OpenVINO toolkit encapsulates the expertise of computer vision and hardware experts and makes it accessible to application developers. (I also interviewed James recently about the future of Intel® Threading Building Blocks (Intel® TBB). Among other things, we discuss how the Intel TBB parallel abstraction embodies the separation of concerns between application developers and parallel runtime developers, and how the Intel TBB Flow Graph API could provide a path to heterogeneous parallelism. You can find this interview on the Tech.Decoded knowledge hub.)
The remaining articles in this issue start close to the metal and gradually move up the hardware/software stack. Floating-Point Reproducibility in Intel® Software Tools discusses the inexactness of binary floating-point representations and how to deal with it using the Intel® compilers and performance libraries. Comparing C++ Memory Allocation Libraries does just what the title says. It compares the performance of various C++ memory allocation libraries using two off-the-shelf benchmarks, then digs into the profiles using Intel® VTune™ Amplifier to explain the often significant performance differences. Moving a little higher up the stack, LIBXSMM: An Open-Source-Based Inspiration for Hardware and Software Development at Intel describes a library that's part research tool and part just-in-time code generator for high-performance small matrix multiplication, an important computational kernel in convolutional neural networks and many other algorithms.
At the application level of the hardware/software stack, Advancing the Performance of Astrophysics Simulations with ECHO-3DHPC, from our collaborators at CEA Saclay and LRZ, describes how they optimized one of their critical applications using various tools in Intel® Parallel Studio XE. Finally, at the top of the stack, we have Your Guide to Understanding System Performance. This article gives an overview of the Platform Profiler tech preview feature in Intel® VTune™ Amplifier. As the name implies, Platform Profiler monitors the entire platform to help diagnose system configuration issues that affect performance.
Future issues of The Parallel Universe will bring you articles on parallel computing using Python*, new approaches to large-scale distributed data analytics, new features in Intel® software tools, and much more. In the meantime, check out Tech.Decoded for more information on Intel solutions for code modernization, visual computing, data center and cloud computing, data science, and systems and IoT development.
Henry A. Gabb