Boost Deep Learning with Intel® Advanced Vector Extensions 512

Learn how Intel® Advanced Vector Extensions 512 can accelerate deep learning within Intel® Xeon® Scalable processors.

Hello, my name is Alberto Villarreal. In this short video, I want to give you an introduction to a new feature in the Intel® Xeon® Scalable processor that is designed to accelerate deep learning use cases. Deep learning has gained significant attention in the industry by achieving state-of-the-art results in image classification, speech recognition, language translation, object detection, and other applications. Second-generation Intel Xeon Scalable processors lead to increased performance of deep learning applications, from cloud to edge devices, while using the same hardware for many other types of workloads.

This is because of new features in these processors such as Intel® Advanced Vector Extensions 512, or Intel® AVX-512, which is a set of instructions that can accelerate performance for demanding computational tasks. Intel AVX-512 now includes Intel® AVX-512 Deep Learning Boost, which has new instructions that accelerate deep learning inference workloads such as image classification, object detection, and others. Let's see how this new technology works.

Research has shown that both deep learning training and inference can be performed with lower numerical precision, using 16-bit multipliers for training and 8-bit multipliers or fewer for inference, with minimal to no loss in accuracy. The previous generation of Intel Xeon Scalable processors enabled lower precision for inference using the Intel AVX-512 instruction set. These instructions enable lower-precision multiplies with higher-precision accumulates. As shown in this figure, multiplying two 8-bit values and accumulating the result into 32 bits requires three instructions, with the accumulation in Int32 format.
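To make the three-instruction sequence concrete, here is a minimal scalar sketch in plain Python. The transcript does not name the instructions; the mnemonics in the comments (VPMADDUBSW, VPMADDWD, VPADDD) are an assumption based on Intel's public instruction documentation, and all function names are hypothetical. The sketch models one 32-bit lane per four 8-bit pairs rather than a full 512-bit register.

```python
def saturate_int16(x):
    """Clamp to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def wrap_int32(x):
    """Wrap to the signed 32-bit range (two's complement)."""
    return (x + 2**31) % 2**32 - 2**31

def step1_mul_u8_s8_pairs(a_u8, b_s8):
    # Assumed to model VPMADDUBSW: multiply unsigned bytes by signed bytes,
    # add adjacent products, and saturate each sum to 16 bits.
    return [saturate_int16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
            for i in range(0, len(a_u8), 2)]

def step2_widen_pairs(v_s16):
    # Assumed to model VPMADDWD with a vector of ones: multiply each
    # 16-bit value by 1 and add adjacent pairs into 32 bits.
    return [v_s16[i] * 1 + v_s16[i + 1] * 1 for i in range(0, len(v_s16), 2)]

def step3_accumulate(acc_s32, v_s32):
    # Assumed to model VPADDD: add into the 32-bit accumulator.
    return [wrap_int32(c + v) for c, v in zip(acc_s32, v_s32)]

def int8_dot_three_steps(acc_s32, a_u8, b_s8):
    """Chain the three steps: 8-bit multiplies, 32-bit accumulation."""
    s16 = step1_mul_u8_s8_pairs(a_u8, b_s8)
    s32 = step2_widen_pairs(s16)
    return step3_accumulate(acc_s32, s32)

a = [1, 2, 3, 4, 10, 20, 30, 40]   # unsigned 8-bit values
b = [5, -6, 7, -8, 1, 1, 1, 1]     # signed 8-bit values
acc = [100, -7]                    # one 32-bit accumulator per 4 pairs
print(int8_dot_three_steps(acc, a, b))  # [82, 93]
```

Note that the 16-bit intermediate in the first step can saturate: with inputs `[255, 255, 0, 0]` and `[127, 127, 0, 0]` the sketch returns 32767 instead of the exact sum 64770.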

The new generation of Intel Xeon Scalable processors now includes Intel AVX-512 Deep Learning Boost, which enables 8-bit multiplies with 32-bit accumulates in a single instruction. The three instructions used in the previous generation are now fused into the new instruction. This allows for significantly more performance with lower memory requirements.
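The fused operation can be sketched in scalar Python as follows. This is an illustration only: the transcript does not name the instruction, and the sketch assumes the non-saturating form (VPDPBUSD in Intel's public documentation), where each 32-bit lane receives the sum of four unsigned-by-signed 8-bit products with no 16-bit intermediate. The function names are hypothetical.

```python
def wrap_int32(x):
    """Wrap to the signed 32-bit range (two's complement)."""
    return (x + 2**31) % 2**32 - 2**31

def int8_dot_fused(acc_s32, a_u8, b_s8):
    """For each 32-bit lane, add four u8*s8 products to the accumulator.

    Assumed to model the fused 8-bit multiply / 32-bit accumulate
    instruction described above; one lane covers four 8-bit pairs.
    """
    assert len(a_u8) == len(b_s8) == 4 * len(acc_s32)
    out = []
    for lane, c in enumerate(acc_s32):
        base = 4 * lane
        dot = sum(a_u8[base + k] * b_s8[base + k] for k in range(4))
        out.append(wrap_int32(c + dot))
    return out

a = [1, 2, 3, 4, 10, 20, 30, 40]   # unsigned 8-bit values
b = [5, -6, 7, -8, 1, 1, 1, 1]     # signed 8-bit values
acc = [100, -7]                    # 32-bit accumulators
print(int8_dot_fused(acc, a, b))   # [82, 93]
```

Because the products go straight into the 32-bit accumulator, large inputs such as `[255, 255, 0, 0]` and `[127, 127, 0, 0]` yield the exact sum 64770. The memory saving mentioned above comes from the data format: an 8-bit value takes a quarter of the space of a 32-bit float.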

We can use this new functionality in several ways. First, let me show you how to take advantage of the Intel AVX-512 Deep Learning Boost via functionality available in the Intel® Math Kernel Library for Deep Neural Networks or Intel® MKL-DNN. Intel MKL-DNN is an open-source performance library for deep learning applications intended for acceleration of deep learning frameworks on Intel architecture. It contains vectorized and threaded building blocks that you can use to implement Deep Neural Networks.

This is a good way to make use of the deep learning primitives that are already optimized to run on Intel processors. You can simply use any of the deep learning frameworks or libraries that use Intel MKL-DNN to benefit from the performance gains offered by Intel Deep Learning Boost. Many are listed here, with more coming soon.

You can also link your application to Intel MKL-DNN via C or C++ APIs. This way, you can take advantage of deep learning primitives and performance-critical functions that are already optimized to use Intel® Deep Learning Boost. This allows you to develop your own optimized software products or to optimize existing ones.

For example, let us suppose we want to use the C++ API in Intel MKL-DNN to implement a convolution with a rectified linear unit from the AlexNet topology using lower-precision primitives. This diagram shows the flow of operations and data for this example. Notice that we start by performing a quantization step to get low-precision representations of data, weights, and biases for the convolution layer. Then we perform the convolution operation using lower precision, and at the end, the output of the computation is dequantized from 8-bit integers into the original floating-point format.
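The quantize, convolve, dequantize flow can be illustrated with a small plain-Python sketch. It does not use the Intel MKL-DNN API and is not the repository example. It assumes a 1-D convolution instead of the AlexNet layer, symmetric linear scale factors chosen from the maximum absolute value, non-negative input data stored as unsigned 8-bit, and weights stored as signed 8-bit. As a simplification, it dequantizes directly from the 32-bit accumulator rather than first converting the output to 8 bits. All names are hypothetical.

```python
def quantize(values, scale, lo, hi):
    """Scale, round, and clamp floats into an integer range."""
    return [max(lo, min(hi, round(v * scale))) for v in values]

def conv1d_relu_f32(src, weights, bias):
    """Floating-point reference: valid 1-D convolution followed by ReLU."""
    k = len(weights)
    return [max(0.0, bias + sum(src[i + j] * weights[j] for j in range(k)))
            for i in range(len(src) - k + 1)]

def conv1d_relu_int8(src, weights, bias):
    """Same computation using 8-bit operands and 32-bit accumulation."""
    # Quantization step: pick scales so the largest value fills the range.
    # Assumes src is non-negative with at least one positive entry.
    src_scale = 255.0 / max(src)
    w_scale = 127.0 / max(abs(w) for w in weights)
    src_u8 = quantize(src, src_scale, 0, 255)
    w_s8 = quantize(weights, w_scale, -127, 127)
    # The bias is kept in the accumulator's scale (src_scale * w_scale).
    bias_s32 = round(bias * src_scale * w_scale)

    # Low-precision convolution: 8-bit multiplies, 32-bit accumulates.
    k = len(w_s8)
    out_s32 = []
    for i in range(len(src_u8) - k + 1):
        acc = bias_s32
        for j in range(k):
            acc += src_u8[i + j] * w_s8[j]
        out_s32.append(max(0, acc))  # ReLU on the integer result

    # Dequantization step: back to floating point.
    return [v / (src_scale * w_scale) for v in out_s32]

src = [0.0, 0.5, 1.0, 1.5, 2.0]
weights = [0.25, -0.5, 1.0]
bias = 0.1
print(conv1d_relu_f32(src, weights, bias))
print(conv1d_relu_int8(src, weights, bias))
```

The two printed lists should agree closely but not exactly; the small difference is the rounding error introduced by quantization, which is the accuracy cost traded for faster 8-bit arithmetic.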

The source code for this example can be found in the Intel MKL-DNN repository. You can go to the main page in the repository and click on the SimpleNet example, where you can find an introduction to 8-bit integer computations, including the quantization process, which converts a given input into a lower-precision format. On this page, you will find a walkthrough of the source code that implements the convolution operation in this example, showing the different steps involved in implementation. You can use this code sample as a basis to create your own network and take advantage of the new Intel AVX-512 Deep Learning Boost functionality.

The complete source code for this example, as well as other examples, tutorials, and installation directions for Intel MKL-DNN, can be downloaded from the GitHub* repository listed in the links section. The code sample that I just showed illustrates how you can use the new Intel AVX-512 Deep Learning Boost feature to accelerate your applications. Of course, you can also take advantage of these new features by using frameworks and libraries that have already been optimized for Intel AVX-512 Deep Learning Boost.

I hope this information was useful for you. Remember to check out the links provided for resources that you can use to make your artificial intelligence applications run faster. Thanks for watching.