Introduction

The current version of the Inference Engine comprises a core library, hardware-specific plugin libraries, and a few third-party libraries. The plugin libraries include a plugin for Intel® Xeon® and Intel® Core™ Processors with Intel® AVX2 and for Intel Atom® Processors ("CPU Plugin"), a plugin for Intel® HD Graphics ("GPU Plugin"), and a plugin for Intel® Arria® 10 discrete cards ("FPGA Plugin").

Terminology

The following acronyms and terms are used in this document:

Acronym/Term    Description
DL              Deep Learning
FP32 format     Single-precision floating-point format
FP16 format     Half-precision floating-point format
CPU Plugin      The plugin for Intel® Processors
GPU Plugin      The plugin for Intel® HD Graphics
FPGA Plugin     The plugin for Intel® Arria® 10 discrete cards

Deployment Problem

Deploying convolutional neural networks (CNNs) from the training environment to embedded platforms for inference can be a complex task that introduces a number of technical challenges that must be addressed:

  • There are a number of deep learning frameworks widely used in the industry, such as Caffe, TensorFlow, CNTK, and MXNet.

  • Typically, training of deep learning networks is performed in data centers or server farms, while inference might take place on embedded platforms that are optimized for performance and power consumption. Such platforms are typically limited both from the software perspective (programming languages, third-party dependencies, memory consumption, supported operating systems) and from the hardware perspective (different data types, limited power envelope), so it is usually not recommended, and sometimes simply impossible, to use the original training framework for network inference. An alternative solution is to use dedicated inference APIs that are well optimized for specific hardware platforms.

  • Additional complications of the deployment process include supporting various layer types and networks that keep growing in complexity. Ensuring the accuracy of the transformed networks is not a trivial task.

Deployment Workflow

The process assumes that you have a network model trained using one of the supported frameworks: Caffe*, TensorFlow*, or MXNet*. Before starting, make sure that the proper deep learning framework is installed on your development machine; if it is not, refer to the corresponding framework installation guide.

To install MXNet, make sure you have Python* 3.5 installed, then install the package with pip. The installation command could be:
pip3 install -Iv mxnet==0.11
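
To verify the installation, you can check the MXNet version that Python sees (this check is a suggestion, not part of the original instructions):
python3 -c "import mxnet; print(mxnet.__version__)"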

The diagram below illustrates the typical workflow for performing inference on a trained deep neural network model:

The steps are:

  1. Configure Model Optimizer for the framework that was used to train your model.
  2. Run Model Optimizer to produce an optimized Intermediate Representation (IR) of the model based on the network topology, the weights and biases values, and other parameters.
  3. Test the model in IR format in the target environment using the provided Inference Engine sample applications (an example command sequence is sketched after this list).
  4. Integrate the Inference Engine into your application to deploy the model in the target environment.
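
The following command-line sketch illustrates steps 1 through 3 for a Caffe model. The entry point mo.py, its --input_model and --output_dir options, and the classification_sample application with its -m, -i, and -d options are assumptions based on typical layouts of the toolkit and may differ in your release; check the release-specific documentation for the exact names.

# Step 1: configure Model Optimizer for the training framework
#         (see the framework-specific configuration pages; script names vary by release).
# Step 2: convert the trained Caffe model to IR (assumption: Python entry point mo.py).
python3 mo.py --input_model bvlc_alexnet.caffemodel --output_dir ./ir
# Step 3: run a bundled sample against the produced IR on the CPU plugin
#         (assumption: the sample accepts -m for the model, -i for an image, -d for the device).
./classification_sample -m ./ir/bvlc_alexnet.xml -i ./cat.png -d CPU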

Model Optimizer 

Model Optimizer is a cross-platform command line tool that facilitates the transition between the training and deployment environment, performs static model analysis and automatically adjusts deep learning models for optimal execution on end-point target devices.

Model Optimizer is designed to support multiple deep learning frameworks; Caffe, TensorFlow, and MXNet are currently enabled.

Model Optimizer Workflow

The process assumes that you have a network model trained using one of the supported frameworks. Before starting, make sure that the proper deep learning framework is installed on your development machine; if it is not, refer to the corresponding framework installation guide.

To install MXNet, make sure you have Python* 3.5 installed, then install the package with pip. The installation command could be:
pip3 install -Iv mxnet==0.11

The Model Optimizer workflow can be described as follows:

  • Configure Model Optimizer for the supported deep learning framework that was used to train the model: Caffe or TensorFlow. Model Optimizer for MXNet only requires MXNet framework version 0.11 to be installed.
  • Provide a trained network as input. It contains the network topology, parameters, and the adjusted weights and biases.
  • Run Model Optimizer to perform specific model optimizations (for example, horizontal fusion of certain network layers). The exact optimizations are framework-specific; refer to the appropriate documentation pages: Model Optimizer for Caffe*, Model Optimizer for TensorFlow*, Model Optimizer for MXNet*.
  • Model Optimizer produces an Intermediate Representation (IR) of the network as output, which is used as input for the Inference Engine. The IR is a pair of files that describe the whole model (an example conversion is sketched after this list):
    • Topology file - an XML file that describes the network topology
    • Trained data file - a .bin file that contains the weights and biases binary data
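
As an illustration of how framework-specific entry points map to the IR pair, consider the sketch below. The script names mo_caffe.py, mo_tf.py, and mo_mxnet.py and the exact output layout are assumptions based on later releases of the tool; your package may expose different names.

# Convert a trained model with the framework-specific entry point (names are assumptions):
python3 mo_caffe.py --input_model model.caffemodel     # Caffe
python3 mo_tf.py --input_model frozen_model.pb         # TensorFlow
python3 mo_mxnet.py --input_model model-0000.params    # MXNet
# Each run produces the IR pair consumed by the Inference Engine:
#   model.xml - topology file
#   model.bin - trained data file (weights and biases)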

Files in the Intermediate Representation (IR) format can be loaded and inferred with the Inference Engine, which offers a unified API across a number of supported Intel® platforms.

Model Optimizer for Caffe* supports a wide range of deep learning topologies:

  • Classification models:
    • AlexNet
    • VGG-16, VGG-19
    • SqueezeNet v1.0, SqueezeNet v1.1
    • ResNet-50, ResNet-101, ResNet-152
    • Inception v1, Inception v2, Inception v3, Inception v4
    • CaffeNet
    • MobileNet
  • Object detection models:
    • SSD300-VGG16, SSD500-VGG16
    • Faster-RCNN
    • Yolo v2, Yolo Tiny
  • Face detection models:
    • VGG Face
  • Semantic segmentation models:
    • FCN8

Model Optimizer for TensorFlow* and MXNet* is available in Technical Preview mode. Support for these frameworks is provided for evaluation purposes only, and the current interfaces of Model Optimizer for TensorFlow* and MXNet* are subject to change. Currently, only the following models are supported:

Supported Models

Model Name          TensorFlow Slim Models (rev. 09f32cea15)
VGG-16              Code, Checkpoint
VGG-19              Code, Checkpoint
ResidualNet-50 V1   V1 Code, V1 Checkpoint
ResidualNet-101 V1  V1 Code, V1 Checkpoint
ResidualNet-152 V1  V1 Code, V1 Checkpoint
Inception v1        Code, Checkpoint
Inception v3        Code, Checkpoint
Inception v4        Code, Checkpoint

Model Name          MXNet Models
VGG-16              Symbol, Params
VGG-19              Symbol, Params
SqueezeNet_v1.1     Symbol, Params
Inception BN        Symbol, Params
CaffeNet            Symbol, Params
SSD-ResNet-50       Repo, Model
Fast MRF CNN        Repo

Inference Engine  

Inference Engine is a runtime that delivers a unified API to integrate the inference with application logic:

  • Takes a model as input. The model must be presented in the Intermediate Representation (IR) format produced by the Model Optimizer.
  • Optimizes inference execution for target hardware.
  • Delivers inference solution with reduced footprint on embedded inference platforms.

The current version of the Inference Engine supports inference of multiple image classification networks (including AlexNet, GoogLeNet, and the VGG and ResNet families of networks), fully convolutional networks like FCN8 used for image segmentation, and object detection networks like Faster R-CNN.

The current version of the Inference Engine supports inference on Intel® Xeon® Processors with Intel® AVX2 and Intel® AVX-512, Intel® Core™ Processors with Intel® AVX2, and Intel Atom® Processors with Intel® SSE ("CPU Plugin"), on Intel® HD Graphics ("GPU Plugin"), and on Intel® Arria® 10 discrete cards ("FPGA Plugin"). For more details, refer to the Supported Devices topic.

The Inference Engine can run inference on models in the FP16 and FP32 formats; the supported configurations are given in the table below, and an example of producing IR in each precision follows it:

Plugin        FP32                      FP16
CPU Plugin    Supported and preferred   Not supported
GPU Plugin    Supported                 Supported and preferred
FPGA Plugin   Not supported             Supported
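
As a sketch of how the precision choice enters the workflow: recent releases of Model Optimizer expose a --data_type option for producing FP32 or FP16 IR. The script name and option below are assumptions; check your release documentation for the exact interface.

# FP32 IR for the CPU plugin (assumed to be the default precision):
python3 mo.py --input_model model.caffemodel --data_type FP32 --output_dir ./ir_fp32
# FP16 IR for the GPU or FPGA plugins:
python3 mo.py --input_model model.caffemodel --data_type FP16 --output_dir ./ir_fp16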

The Inference Engine package contains headers, runtime libraries, and sample console applications demonstrating how you can use the Inference Engine in your applications.
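
For orientation only, a typical way to build an application against the package is to add the Inference Engine include directory and link the runtime library. The placeholder <IE_DIR> and the library name inference_engine are assumptions that may differ between releases.

# Compile and link a C++ application against the Inference Engine runtime:
g++ -std=c++11 main.cpp -I<IE_DIR>/include -L<IE_DIR>/lib -linference_engine -o my_app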

See Also

For more complete information about compiler optimizations, see our Optimization Notice.