The current version of the Inference Engine comprises a core library, a set of hardware-specific libraries, a plugin for Intel® Xeon® and Intel® Core™ processors with Intel® AVX2 and Intel Atom® processors ("CPU Plugin"), a plugin for Intel® HD Graphics ("GPU Plugin"), a plugin for Intel® Arria® 10 discrete cards ("FPGA Plugin"), and a few third-party libraries.
The following acronyms and terms are used in this document:
| Term | Description |
|---|---|
| FP32 | Single-precision floating-point format |
| FP16 | Half-precision floating-point format |
| CPU Plugin | The plugin for Intel® Processors |
| GPU Plugin | The plugin for Intel® HD Graphics |
| FPGA Plugin | The plugin for Intel® Arria® 10 discrete cards |
Deploying convolutional neural networks (CNNs) from the training environment to embedded platforms for inference can be a complex task that introduces a number of technical challenges that must be addressed:
There are a number of deep learning frameworks widely used in the industry, such as Caffe, TensorFlow, CNTK, MXNet, etc.
Typically, training of deep learning networks is performed in data centers or server farms, while inference may take place on embedded platforms optimized for performance and power consumption. Such platforms are usually constrained both from a software perspective (programming languages, third-party dependencies, memory consumption, supported operating systems) and from a hardware perspective (different data types, limited power envelope), so using the original training framework for inference is generally not recommended and sometimes simply impossible. An alternative is to use dedicated inference APIs that are well optimized for specific hardware platforms.
The deployment process is further complicated by the need to support various layer types in networks that are becoming more and more complex. Ensuring the accuracy of the transformed networks is not trivial.
The process assumes that you have a network model trained using one of the supported frameworks: Caffe*, TensorFlow*, or MXNet*. Before starting, make sure that the proper deep learning framework is installed on your development machine. If not, see the following topics:
MXNet* can be installed using pip. Then the installation command could be:
pip3 install -Iv mxnet==0.11
The scheme below illustrates the typical workflow for performing inference of a trained deep neural network model:
The steps are:
- Configure Model Optimizer for the specific framework (used to train your model).
- Run Model Optimizer to produce an optimized Intermediate Representation (IR) of the model, based on the network topology, the weights and biases values, and other parameters.
- Test the model in the IR format using the Inference Engine in the target environment via provided Inference Engine sample applications.
- Integrate Inference Engine in your application to deploy the model in the target environment.
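Taken together, the four steps above can be sketched as a command sequence. The following is pseudocode for illustration only: the script names and flags in angle brackets are assumptions, not actual tool names, so consult the toolkit documentation for the exact commands for your version.

```
# Step 1: configure Model Optimizer for the training framework (Caffe here)
<configure-model-optimizer-script> --framework caffe

# Step 2: convert the trained model into an IR (produces model.xml + model.bin)
<model-optimizer-script> --input-model model.caffemodel

# Step 3: validate the IR with one of the bundled sample applications
<sample-application> --model model.xml

# Step 4: link the Inference Engine library into your own application and
#         load model.xml / model.bin through its API
```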
Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and automatically adjusts deep learning models for optimal execution on end-point target devices.
Model Optimizer is designed to support multiple deep learning frameworks; Caffe, TensorFlow, and MXNet are currently supported.
Model Optimizer Workflow
The process assumes that you have a network model trained using one of the supported frameworks. Before starting, make sure that the proper deep learning framework is installed on your development machine. If not, see the following topics:
MXNet* can be installed using pip. Then the installation command could be:
pip3 install -Iv mxnet==0.11
The Model Optimizer workflow can be described as follows:
- Configure Model Optimizer for the supported deep learning framework that was used to train the model: Caffe or TensorFlow. Model Optimizer for MXNet requires only MXNet framework version 0.11 to be installed.
- Provide as input a trained network that contains a certain network topology, parameters, and the adjusted weights and biases.
- Run Model Optimizer to perform specific model optimizations (for example, horizontal fusion of certain network layers). The exact optimizations are framework-specific; refer to the appropriate documentation pages: Model Optimizer for Caffe*, Model Optimizer for TensorFlow*, Model Optimizer for MXNet*.
- Model Optimizer produces as output an Intermediate Representation (IR) of the network, which is used as the input for the Inference Engine. The IR is a pair of files that describe the whole model:
- Topology file - an XML file that describes the network topology
- Trained data file - a .bin file that contains the weights and biases binary data
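To illustrate the nature of the topology file, the following sketch parses a minimal, hypothetical IR-style XML fragment (heavily simplified; not the complete IR schema) using only Python's standard library:

```python
# Minimal sketch: extracting layer names and types from an IR-style
# topology XML. The XML below is an illustrative, simplified fragment,
# not a real IR file.
import xml.etree.ElementTree as ET

ir_xml = """
<net name="example" version="2">
  <layers>
    <layer id="0" name="data" type="Input"/>
    <layer id="1" name="conv1" type="Convolution"/>
  </layers>
</net>
"""

root = ET.fromstring(ir_xml)
layers = [(layer.get("name"), layer.get("type")) for layer in root.iter("layer")]
print(layers)  # [('data', 'Input'), ('conv1', 'Convolution')]
```

The weights themselves live in the companion .bin file; the XML only describes the graph structure and references into that binary blob.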
Intermediate Representation (IR) format files can be loaded and inferred with the Inference Engine, which offers a unified API for a number of supported Intel® platforms.
Model Optimizer for Caffe* supports a wide range of deep learning topologies:
- Classification models:
- VGG-16, VGG-19
- SqueezeNet v1.0, SqueezeNet v1.1
- ResNet-50, ResNet-101, ResNet-152
- Inception v1, Inception v2, Inception v3, Inception v4
- Object detection models:
- SSD300-VGG16, SSD500-VGG16
- Yolo v2, Yolo Tiny
- Face detection models:
- VGG Face
- Semantic segmentation models:
Model Optimizer for TensorFlow* and MXNet* is available in Technical Preview mode. Support for these frameworks is provided for evaluation only, and the current interfaces of Model Optimizer for TensorFlow* and MXNet* are subject to change. Currently, only the following specific models are supported:
| Model Name | TensorFlow Slim Models |
|---|---|
| ResidualNet-50 | V1 Code, V1 Checkpoint |
| ResidualNet-101 | V1 Code, V1 Checkpoint |
| ResidualNet-152 | V1 Code, V1 Checkpoint |
| Inception v1 | Code, Checkpoint |
| Inception v3 | Code, Checkpoint |
| Inception v4 | Code, Checkpoint |
| Model Name | MXNet Models |
|---|---|
| Inception BN | Symbol, Params |
| Fast MRF CNN | Repo |
Inference Engine is a runtime that delivers a unified API to integrate the inference with application logic:
- Takes a model as input. The model must be presented in the specific form of Intermediate Representation (IR) produced by Model Optimizer.
- Optimizes inference execution for target hardware.
- Delivers inference solution with reduced footprint on embedded inference platforms.
The current version of the Inference Engine supports inference of multiple image classification networks, including AlexNet, GoogLeNet, VGG and ResNet families of networks, fully convolutional networks like FCN8 used for image segmentation, and object detection networks like Faster R-CNN.
The current version of the Inference Engine supports inference on Intel® Xeon® processors with Intel® AVX2 and AVX512, Intel® Core™ processors with Intel® AVX2, and Intel Atom® processors with Intel® SSE ("CPU Plugin"); on Intel® HD Graphics ("GPU Plugin"); and on Intel® Arria® 10 discrete cards ("FPGA Plugin"). For more details, refer to the Supported Devices topic.
The Inference Engine can run inference on models in the FP16 and FP32 formats; the supported configurations are given in the table below:
| Plugin | FP32 | FP16 |
|---|---|---|
| CPU Plugin | Supported and preferred | Not supported |
| GPU Plugin | Supported | Supported and preferred |
| FPGA Plugin | Not supported | Supported |
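The format preference matters because FP16 halves the memory and bandwidth cost of weights at the price of reduced precision and range relative to FP32. A minimal sketch of that trade-off (NumPy is an assumption here; it is not part of the toolkit):

```python
# Sketch of the FP32 vs. FP16 trade-off: half-precision storage uses
# half the bytes but introduces rounding error on most values.
import numpy as np

w32 = np.array([0.1, 1e-5, 65504.0], dtype=np.float32)  # illustrative "weights"
w16 = w32.astype(np.float16)                            # half-precision copy

print(w32.nbytes, w16.nbytes)        # FP16 uses half the bytes: 12 6
print(float(w32[0]), float(w16[0]))  # a small rounding error appears in FP16
```

Note that 65504 is the largest normal FP16 value; anything bigger overflows to infinity when cast down, which is one reason a plugin may support only one of the two formats.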
The Inference Engine package contains headers, runtime libraries, and sample console applications demonstrating how you can use the Inference Engine in your applications.