Windows* Machine Learning: AI Acceleration on Intel® Hardware

Introduction

Artificial intelligence (AI) now spans the server, desktop, and edge markets, and Intel offers AI solutions across all three. From the CPU foundation of AI processing (Intel® Xeon® processors, Intel® Core™ processors, and Intel Atom® processors), to dedicated acceleration provided by Intel Iris® Graphics, the low-wattage Intel® Movidius™ vision processing unit (VPU), the new Intel® Gaussian & Neural Accelerator (GNA), Mobileye* automotive technology, and custom integration with Intel® field-programmable gate arrays (FPGAs), the Intel® AI product offerings cover the full range of applications.

Windows* Machine Learning (ML) is an inference engine that runs at the edge on the Windows operating system (OS). It provides a simple developer interface that is optimized under the hood for Intel hardware.

Intel is working closely with Microsoft to ensure that the hardware optimizations behind Windows ML deliver state-of-the-art acceleration of model evaluation.

Windows* ML: Developer Benefits

The Windows ML API, available starting with the April 2018 Windows® 10 update, has a simple programming model. The application only needs to load the pretrained model, bind data to the model, and evaluate the model against the data. Acceleration is handled in the underlying layers to give the best performance on Intel hardware.
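
The three steps (load, bind, evaluate) can be illustrated with a small toy sketch. This is not the Windows ML API: ToyModel, ToyBinding, load_model, and evaluate are hypothetical names invented for this illustration, and the "model" is a plain linear function rather than a file on disk. The sketch only shows the shape of the programming model, in which the application never touches the acceleration layer.

```python
# Toy illustration of the load / bind / evaluate pattern.
# ToyModel, ToyBinding, load_model, and evaluate are hypothetical stand-ins,
# NOT the Windows ML API.
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """A 'pretrained' linear model: y = sum(w_i * x_i) + bias."""
    weights: list
    bias: float
    input_name: str = "input"
    output_name: str = "output"


@dataclass
class ToyBinding:
    """Maps the model's named input to application data."""
    model: ToyModel
    values: dict = field(default_factory=dict)

    def bind(self, name, data):
        if name != self.model.input_name:
            raise KeyError(f"model has no input named {name!r}")
        if len(data) != len(self.model.weights):
            raise ValueError("input length does not match the model")
        self.values[name] = list(data)


def load_model(weights, bias):
    # Step 1: load. A real engine would parse a model file here.
    return ToyModel(weights=list(weights), bias=bias)


def evaluate(binding):
    # Step 3: evaluate the model against the bound data.
    model = binding.model
    x = binding.values[model.input_name]
    y = sum(w * v for w, v in zip(model.weights, x)) + model.bias
    return {model.output_name: y}


model = load_model([0.5, -1.0, 2.0], bias=0.25)  # 1. load
binding = ToyBinding(model)
binding.bind("input", [2.0, 1.0, 0.5])           # 2. bind
result = evaluate(binding)                        # 3. evaluate
print(result)
```

In this sketch, how evaluate computes its result is hidden from the caller, which mirrors the article's point that hardware-specific optimization happens below the application.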

 

Windows ML on Intel Hardware

On Intel hardware specifically, the Windows ML stack (featured below) leverages both the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) CPU instruction set and the DirectX* 12 compute pipeline, which accelerates execution on the integrated graphics.

This stack features the Windows ML inference engine, which interprets the model, similar to other inference engines on the market today. The DirectML abstraction layer selects the target hardware for evaluation. It then either executes the model with a hand-optimized CPU implementation based on the Intel AVX-512 instruction set, or issues shaders written in High-Level Shading Language (HLSL) to the underlying hardware through the Direct3D* DirectX 12 interface.

Intel will be adding dedicated low-power AI accelerators to the mix, in the form of the newly announced Intel® Movidius™ VPU. In addition, for the Windows 10 fall OS update, Intel and Microsoft are working closely together to offer model-level operator acceleration through a new Meta Command interface in the DirectX 12 layer.

 

Using Per Operator Acceleration (Meta Command)

Intel and Microsoft are working closely together to bring the best performance to Intel® integrated graphics devices. As the model is evaluated, the Windows ML software stack queries the driver to determine whether an optimized version of a specific model operation is available. If so, it uses the accelerated operation, with no changes required to the application.
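
The query-then-fall-back behavior described above can be sketched as follows. This is a conceptual illustration only: ToyDriver, its query method, GENERIC_OPS, and run_model are hypothetical names and do not correspond to the DirectX 12 Meta Command interface or to any driver API. The sketch assumes each operator has a generic implementation and that the driver may or may not supply a tuned replacement that produces the same result.

```python
# Sketch of per-operator dispatch with a generic fallback.
# ToyDriver, GENERIC_OPS, and run_model are hypothetical; they are not
# the DirectX 12 Meta Command interface.


class ToyDriver:
    """Stands in for a graphics driver that may expose tuned operators."""

    def __init__(self, optimized_ops):
        self._optimized_ops = dict(optimized_ops)

    def query(self, op_name):
        # Returns a tuned implementation, or None if the driver has none.
        return self._optimized_ops.get(op_name)


def generic_relu(xs):
    return [max(0.0, x) for x in xs]


def generic_negate(xs):
    return [-x for x in xs]


# Implementations the stack can always fall back to.
GENERIC_OPS = {"relu": generic_relu, "negate": generic_negate}


def tuned_relu(xs):
    # Same result as generic_relu; a real tuned kernel would differ in speed.
    return [x if x > 0.0 else 0.0 for x in xs]


def run_model(op_names, data, driver):
    """Runs operators in order, preferring the driver's version of each."""
    trace = []
    for name in op_names:
        impl = driver.query(name)
        if impl is None:
            impl = GENERIC_OPS[name]
            trace.append((name, "generic"))
        else:
            trace.append((name, "driver"))
        data = impl(data)
    return data, trace


driver = ToyDriver({"relu": tuned_relu})
output, trace = run_model(["negate", "relu"], [1.0, -2.0, 3.0], driver)
print(output)
print(trace)
```

The caller of run_model passes the same operator list regardless of what the driver supports, which is the property the article highlights: new accelerated operators can appear without application changes.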

The operations reside inside the driver and become available to the application with the next OS update and a driver update. They are made possible by hand-optimized kernels that make the best use of Intel integrated graphics: the kernels are highly efficient on the execution units and employ caching strategies suited to the hardware. These kernels are tuned by Intel experts for Intel hardware, and the developer does not need to change the application to use the new accelerated capability.

Based on actual AI imaging models, the first accelerated operation is convolution. This operator is essentially a large matrix multiplication applied across the entire image and, as such, is computationally expensive. The initial results are promising and are expected to improve over time.
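
To make the matrix-multiplication view of convolution concrete, here is a minimal pure-Python sketch of the common "im2col" formulation: every image patch under the kernel is unrolled into a row of a matrix, and that matrix is multiplied by the flattened kernel. This is a general technique chosen for illustration, and the article does not state that Intel's kernels are implemented this way. The function names and the sample image and kernel are made up for the example, and, as in most deep learning frameworks, "convolution" here means cross-correlation (the kernel is not flipped).

```python
# Convolution expressed as a matrix product (im2col), checked against
# a direct nested-loop version. Illustrative only; names and data are
# invented for this example.


def conv2d_direct(image, kernel):
    """Valid-mode 2D cross-correlation using nested loops."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def im2col(image, kh, kw):
    """One row per output pixel, holding the flattened patch under the kernel."""
    ih, iw = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(ih - kh + 1)
            for c in range(iw - kw + 1)]


def conv2d_as_matmul(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)                    # (oh*ow) x (kh*kw)
    flat_kernel = [v for row in kernel for v in row]   # length kh*kw
    # Matrix-vector product: one dot product per output pixel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    # Reshape the flat result back into output rows.
    return [flat_out[i:i + ow] for i in range(0, len(flat_out), ow)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]

direct = conv2d_direct(image, kernel)
via_matmul = conv2d_as_matmul(image, kernel)
print(via_matmul)
```

The patch matrix has one row per output pixel, so its size grows with the image. With many filters and channels the product becomes a large matrix-matrix multiplication, which is why the operator is expensive and a natural first target for acceleration.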

The overall results will also improve over time as more operators are accelerated. Further OS and driver-level improvements will extend acceleration to existing ecosystem models through simple OS and driver updates.

To get the best performance over time on Intel integrated graphics, use the new Windows ML API. Applications gain performance with no code changes, and additional optimizations through Meta Commands are coming later this year.

The next article presents Windows ML Performance Improvements on Intel Integrated Graphics.

For more complete information about compiler optimizations, see our Optimization Notice.