The Parallel Universe Issue #34: Edge-to-Cloud Heterogeneous Parallelism with OpenVINO™ Toolkit

By Henry A. Gabb, Published: 09/26/2018, Last Updated: 09/26/2018

In a previous blog, I mentioned that I used to dread the heterogeneous parallel computing future. Ordinary parallelism was hard enough. Spreading concurrent operations across different processor architectures would add a level of complexity beyond my programming ability. Fortunately, as the University of California at Berkeley Parallel Computing Laboratory predicted in 2006, this increasing complexity would force a greater separation of concerns between domain experts and tuning experts. (See The Landscape of Parallel Computing Research: A View from Berkeley for details.) For example, I know how to apply the Fast Fourier Transform in my scientific domain, but I would never dream of writing an FFT myself because experts have already done it for me. I can just use their libraries to get all the benefit of their expertise.

With that in mind, James Reinders, our editor emeritus, joins us again to continue his series on FPGA programming, this time to show us how to exploit heterogeneous parallelism using Intel’s new OpenVINO™ toolkit (which stands for open visual inference and neural network optimization). OpenVINO™ Toolkit and FPGAs describes how to use this toolkit to incorporate computer vision into applications that span processor architectures, including FPGAs, from edge devices all the way to the cloud and data center. The OpenVINO toolkit encapsulates the expertise of computer vision and hardware experts and makes it accessible to application developers. (I also interviewed James recently about the future of Intel® Threading Building Blocks (Intel® TBB). Among other things, we discussed how the Intel TBB parallel abstraction embodies the separation of concerns between application developers and parallel runtime developers, and how the Intel TBB Flow Graph API could provide a path to heterogeneous parallelism. You can find this interview on the Tech.Decoded knowledge hub.)

The remaining articles in this issue start close to the metal and gradually move up the hardware/software stack. Floating-Point Reproducibility in Intel® Software Tools discusses the inexactness of binary floating-point representations and how to deal with it using the Intel® compilers and performance libraries. Comparing C++ Memory Allocation Libraries does just what the title says. It compares the performance of various C++ memory allocation libraries using two off-the-shelf benchmarks, then digs into the profiles using Intel® VTune™ Amplifier to explain the often significant performance differences. Moving a little higher up the stack, LIBXSMM: An Open-Source-Based Inspiration for Hardware and Software Development at Intel describes a library that's part research tool and part just-in-time code generator for high-performance small matrix multiplication, an important computational kernel in convolutional neural networks and many other algorithms.

At the application level of the hardware/software stack, Advancing the Performance of Astrophysics Simulations with ECHO-3DHPC, from our collaborators at CEA Saclay and LRZ, describes how they optimized one of their critical applications using various tools in Intel® Parallel Studio XE. Finally, at the top of the stack, we have Your Guide to Understanding System Performance. This article gives an overview of the Platform Profiler tech preview feature in Intel® VTune™ Amplifier. As the name implies, Platform Profiler monitors the entire platform to help diagnose system configuration issues that affect performance.

Future issues of The Parallel Universe will bring you articles on parallel computing using Python*, new approaches to large-scale distributed data analytics, new features in Intel® software tools, and much more. In the meantime, check out Tech.Decoded for more information on Intel solutions for code modernization, visual computing, data center and cloud computing, data science, and systems and IoT development.

Henry A. Gabb
October 2018
