Inference Engine Developer Guide

Deployment Challenges

Deploying deep learning networks from the training environment to embedded platforms for inference is a complex task that introduces technical challenges, such as:

  • Several deep learning frameworks are widely used in the industry, such as Caffe*, TensorFlow*, and MXNet*, among others.
  • Training deep learning networks is typically performed in data centers or server farms, while inference often takes place on embedded platforms that are optimized for performance and power consumption.
    These platforms are typically limited from the software perspective:
    • programming languages
    • third party dependencies
    • memory consumption
    • supported operating systems
    and the platforms are limited from the hardware perspective:
    • different data types
    • limited power envelope
    Because of these limitations, it is usually not recommended, and sometimes not possible, to use the original training framework for inference. As an alternative, use dedicated inference APIs that are optimized for specific hardware platforms.

For these reasons, ensuring the accuracy of the transformed networks can be a complex task.

Deployment Workflow

The Inference Engine deployment process assumes you used the Model Optimizer to convert your trained model to an Intermediate Representation. The scheme below illustrates the typical workflow for deploying a trained deep learning model.

Intel Computer Vision Basic Workflow

A summary of the steps for optimizing and deploying a trained model:

  1. Configure the Model Optimizer for your framework.
  2. Convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and biases values.
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment with the Validation application or the sample applications.
  4. Integrate the Inference Engine in your application to deploy the model in the target environment.

Introduction to the Inference Engine

After you have created an Intermediate Representation of your model with the Model Optimizer, use the Inference Engine to infer input data.

The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.

To learn about how to use the Inference Engine API for your application, see the Integrating Inference Engine in Your Application section.

The complete API reference is available in the offline package documentation:

  1. Go to <INSTALL_DIR>/deployment_tools/documentation/ where <INSTALL_DIR> is the installation directory of the Intel® Distribution of OpenVINO™ toolkit.
  2. Open index.html in an Internet browser.
  3. Select API References from the menu at the top of the screen.
  4. From the API References page, select Inference Engine API References.

NOTE: For information about the "legacy" Inference Engine API from previous releases (earlier than 2018 R1), see the Integrating Inference Engine in Your Application (legacy API) section. It is best to stop using the legacy API, since it will be removed in a future product release.

The Inference Engine uses a plugin architecture. An Inference Engine plugin is a software component that contains a complete implementation for inference on a particular Intel® hardware device: for example, CPU, GPU, VPU, or FPGA. Each plugin implements the unified API and provides additional hardware-specific APIs.

Modules in the Inference Engine Package

Your application must link to the core Inference Engine library and include the C++ header files from the include directory.

The core library file is:

  • Linux* OS: libinference_engine.so
  • Windows* OS: inference_engine.dll

Device-Specific Plugin Libraries

Each supported target device has a plugin, which is a DLL/shared library that contains a complete implementation for inference on that particular device. The following plugins are available:

Plugin | Device type
GPU plugin | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
CPU plugin | Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE
FPGA plugin | Intel® Arria® 10 GX FPGA Development Kit, Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA
MYRIAD plugin | Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2 VPU, Intel® Movidius™ Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU
HDDL plugin | Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
GNA plugin | Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver processor J5005, Intel® Celeron® processor J4005, Intel® Core™ i3-8121U processor
HETERO plugin | Enables computing for inference of one network on several Intel® devices

NOTE: When using the HETERO plugin, pass the literal strings from the Target column as the device name in the getPluginByDevice method. For more information, see the getPluginByDevice API in the in-package documentation.
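
For illustration, the following minimal sketch shows how a device string is passed to getPluginByDevice to obtain a plugin. It assumes the C++ API of this release as used in the samples; the createPluginFor helper name, the empty search path, and the device strings are placeholders rather than a definitive implementation.

#include <inference_engine.hpp>
#include <string>

using namespace InferenceEngine;

// Searches the given plugin directories (an empty string falls back to the
// default search path, so the plugin libraries must be reachable via
// LD_LIBRARY_PATH or PATH) and loads the plugin matching the device string.
InferencePlugin createPluginFor(const std::string &device) {
    PluginDispatcher dispatcher({""});
    // Plain devices use strings such as "CPU" or "GPU"; for heterogeneous
    // execution, pass a literal such as "HETERO:FPGA,CPU".
    InferencePlugin plugin = dispatcher.getPluginByDevice(device);
    return plugin;
}

The resulting InferencePlugin is then used with LoadNetwork(), as shown in the Common Workflow section below.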

The table below shows the relationship between libraries and targets.

Target | Linux Library Name | Linux Dependency Libraries | Windows Library Name | Windows Dependency Libraries
CPU | libMKLDNNPlugin.so | libmklml_tiny.so, libiomp5md.so | MKLDNNPlugin.dll | mklml_tiny.dll, libiomp5md.dll
GPU | libclDNNPlugin.so | libclDNN64.so | clDNNPlugin.dll | clDNN64.dll
FPGA | libdliaPlugin.so | libdla_compiler_core.so | dliaPlugin.dll | No dependencies
MYRIAD | libmyriadPlugin.so | No dependencies | myriadPlugin.dll | No dependencies
HDDL | libHDDLPlugin.so | libbsl.so, libhddlapi.so, libmvnc-hddl.so | HDDLPlugin.dll | bsl.dll, hddlapi.dll, json-c.dll, libcrypto-1_1-x64.dll, libssl-1_1-x64.dll, mvnc-hddl.dll
GNA | libGNAPlugin.so | libgna_api.so | GNAPlugin.dll | gna.dll
HETERO | libHeteroPlugin.so | Same as for selected plugins | HeteroPlugin.dll | Same as for selected plugins

Make sure those libraries are in your system path or in the location you specified in the plugin loader. Make sure each plugin-related dependency is added to the appropriate environment variable:

  • Linux: LD_LIBRARY_PATH
  • Windows: PATH

On Linux, use the script bin/setupvars.sh to set the environment variables.

On Windows, run the bin\setupvars.bat file to set the environment variables.

Common Workflow for the Inference Engine API

  1. Read the Intermediate Representation - Using the InferenceEngine::CNNNetReader class, read an Intermediate Representation file into a CNNNetwork class. This class represents the network in host memory.
  2. Prepare input and output formats - After loading the network, specify the input and output precision and the layout for the network. For these specifications, use the CNNNetwork::getInputsInfo() and CNNNetwork::getOutputsInfo() methods.
  3. Select Plugin - Select the plugin on which to load your network. Create the plugin with the InferenceEngine::PluginDispatcher load helper class. Pass the per-device loading configuration specific to this device, and register extensions for this device.
  4. Compile and Load - Use the plugin interface wrapper class InferenceEngine::InferencePlugin to call the LoadNetwork() API to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.
  5. Set input data - With the network loaded, you have an ExecutableNetwork object. Use this object to create an InferRequest in which you signal the buffers to use for input and output. Either copy your data into the device-allocated memory directly, or tell the device to use your application memory to save a copy.
  6. Execute - With the input and output memory now defined, choose your execution mode:
    • Synchronously - Infer() method. Blocks until inference finishes.
    • Asynchronously - StartAsync() method. Check the status with the Wait() method (0 timeout), block with Wait(), or specify a completion callback.
  7. Get the output - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the InferRequest GetBlob API.
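
The sketch below strings these seven steps together for a simple synchronous run. It is a minimal illustration based on the API names referenced above; the model file names, the "CPU" device, and the data-handling comments are placeholders rather than a definitive implementation.

#include <inference_engine.hpp>

using namespace InferenceEngine;

int main() {
    // 1. Read the Intermediate Representation (hypothetical file names)
    CNNNetReader reader;
    reader.ReadNetwork("model.xml");
    reader.ReadWeights("model.bin");
    CNNNetwork network = reader.getNetwork();

    // 2. Prepare input and output formats
    InputsDataMap inputs = network.getInputsInfo();
    for (auto &item : inputs) {
        item.second->setPrecision(Precision::U8);
        item.second->setLayout(Layout::NCHW);
    }
    OutputsDataMap outputs = network.getOutputsInfo();
    for (auto &item : outputs) {
        item.second->setPrecision(Precision::FP32);
    }

    // 3. Select a plugin for the target device
    InferencePlugin plugin = PluginDispatcher({""}).getPluginByDevice("CPU");

    // 4. Compile and load the network on the device
    ExecutableNetwork executable = plugin.LoadNetwork(network, {});

    // 5. Create an infer request and set the input data
    InferRequest request = executable.CreateInferRequest();
    Blob::Ptr input = request.GetBlob(inputs.begin()->first);
    // ... copy preprocessed image data into input->buffer() ...

    // 6. Execute synchronously (or use StartAsync()/Wait() for async mode)
    request.Infer();

    // 7. Get the output
    Blob::Ptr output = request.GetBlob(outputs.begin()->first);
    float *scores = output->buffer().as<float *>();
    // ... post-process scores ...
    return 0;
}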

For more information about integrating the Inference Engine in your application, see How to Integrate the Inference Engine in Your Application.

Using Inference Engine Samples

The Inference Engine sample applications are simple console applications that demonstrate how to use Intel's Deep Learning Inference Engine in your applications.

Samples in the Samples Directory

The following sample applications are available in the samples directory in the Inference Engine installation directory:

Sample | Description
CPU Extensions | Library with topology-specific layers, like the DetectionOutput layer used in SSD networks
Image Classification Sample | Inference of image classification networks like AlexNet* and GoogLeNet*. This sample supports only images as inputs.
Image Classification Sample Async | Maximizes performance via pipelined execution. This sample supports only images as inputs.
Security Barrier Camera Demo | Vehicle and License Plate Detection network followed by the Vehicle Attributes and License Plate Recognition networks
Object Detection for Faster R-CNN Demo | Inference of object detection networks like Faster R-CNN. This demo supports only images as inputs.
Image Segmentation Demo | Inference of image segmentation networks like FCN8. This demo supports only images as inputs.
Object Detection for SSD Demo, Async API Performance Showcase | Demo application for SSD-based object detection networks, the new Async API performance showcase, and simple OpenCV* interoperability. This demo supports video and camera inputs.
Object Detection for SSD Sample | Inference of object detection networks based on the SSD. This sample is a simplified version of the Object Detection for SSD Demo. This sample supports only images as inputs.
TensorFlow* Object Detection Mask R-CNNs Segmentation Demo | Inference of image segmentation networks created with the TensorFlow* Object Detection API. This demo supports only images as inputs.
Automatic Speech Recognition Sample | Acoustic model inference based on Kaldi* neural networks and speech feature vectors.
Neural Style Transfer Sample | Style transfer sample. This sample supports only images as inputs.
Hello Infer Request Classification Sample | Inference of image classification networks via the Infer Request API. This sample supports only images as inputs.
Interactive Face Detection Demo | Face Detection coupled with Age-Gender and Head-Pose estimation. This demo supports both video and camera inputs.
Crossroad Camera Demo | Person Detection followed by the Person Attributes Recognition and Person Reidentification Retail networks. This demo supports images, videos, and camera inputs.
Multi-Channel Face Detection Demo | Face Detection network inference pipeline for multi-channel face detection. This demo supports only cameras as inputs.
Hello Autoresize Classification Sample | Image classification network inference using the input autoresize API of the Inference Engine. This sample supports only images as inputs.
Hello Shape Infer SSD Sample | Object detection network inference using the Shape Inference feature of the Inference Engine. This sample supports only images as inputs.
Human Pose Estimation Demo | Human Pose Estimation network inference for predicting a pose. This demo supports videos and cameras as inputs.
Object Detection YOLO* V3 Demo, Async API | Object detection network inference with YOLOv3* and the Async API. This demo supports only videos as inputs.
Pedestrian Tracker Demo | Person Detection network inference followed by the Person Reidentification network. This demo supports videos and folders with images as inputs.
Smart Classroom Demo | Face Detection network inference followed by Landmarks Regression, Face Reidentification, and Person Detection Action Recognition. This demo supports images, videos, and cameras as inputs.
Super Resolution Demo | Single Image Super Resolution network inference. This demo supports only images as inputs.
Validation Application | Infers a pack of images, resulting in total accuracy. This application supports only images as inputs.
Calibration Tool | Calibrates an FP32 model so that it can be run in low-precision 8-bit integer mode while keeping the input data in the original precision
Benchmark Application Demo | Performs inference using convolutional networks and outputs benchmark results
LeNet Network Graph Builder Sample | Demonstrates constructing the LeNet network using the Graph Builder API
Text Detection Demo | Detects multi-oriented scene text on an input image and puts a bounding box around the detected area
Perfcheck Sample | Estimates performance by calculating minimum, average, and maximum FPS

Media Files Available for Samples

To run the sample applications, you can use images and videos from the media files collection available at https://github.com/intel-iot-devkit/sample-videos.

Samples that Support Pre-Trained Models Shipped with the Product

Several pre-trained models are provided with the product. The table below shows the correlation between models and samples/plugins. For the correlation between plugins and supported devices, see the Supported Devices section. The samples are available in <INSTALL_DIR>/deployment_tools/inference_engine/samples.

Model | Samples Supported on the Model | CPU | GPU | HETERO:FPGA,CPU | MYRIAD
face-detection-adas-0001 | Interactive Face Detection Demo | Supported | Supported | Supported | Supported
age-gender-recognition-retail-0013 | Interactive Face Detection Demo | Supported | Supported | Supported | Supported
head-pose-estimation-adas-0001 | Interactive Face Detection Demo | Supported | Supported | Supported | Supported
emotions-recognition-retail-0003 | Interactive Face Detection Demo | Supported | Supported | Supported | Supported
facial-landmarks-35-adas-0001 | Interactive Face Detection Demo | Supported | Supported | Supported |
vehicle-license-plate-detection-barrier-0106 | Security Barrier Camera Demo | Supported | Supported | Supported | Supported
vehicle-attributes-recognition-barrier-0039 | Security Barrier Camera Demo | Supported | Supported | Supported | Supported
license-plate-recognition-barrier-0001 | Security Barrier Camera Demo | Supported | Supported | Supported | Supported
person-vehicle-bike-detection-crossroad-0078 | Crossroad Camera Demo | Supported | Supported | Supported | Supported
person-attributes-recognition-crossroad-0200 | Crossroad Camera Demo | Supported | Supported | |
person-reidentification-retail-0031 | Crossroad Camera Demo, Pedestrian Tracker Demo | Supported | Supported | Supported | Supported
person-reidentification-retail-0076 | Crossroad Camera Demo | Supported | Supported | Supported | Supported
person-reidentification-retail-0079 | Crossroad Camera Demo | Supported | Supported | Supported | Supported
road-segmentation-adas-0001 | Image Segmentation Demo | Supported | Supported | |
semantic-segmentation-adas-0001 | Image Segmentation Demo | Supported | Supported | |
person-detection-retail-0013 | Any demo that supports SSD*-based models, Pedestrian Tracker Demo | Supported | Supported | Supported | Supported
person-detection-retail-0002 | Any demo that supports SSD*-based models | Supported | Supported | Supported | Supported
face-detection-retail-0004 | Any demo that supports SSD*-based models | Supported | Supported | Supported | Supported
face-person-detection-retail-0002 | Any demo that supports SSD*-based models | Supported | Supported | Supported | Supported
pedestrian-detection-adas-0002 | Any demo that supports SSD*-based models | Supported | Supported | Supported |
vehicle-detection-adas-0002 | Any demo that supports SSD*-based models | Supported | Supported | Supported | Supported
pedestrian-and-vehicle-detector-adas-0001 | Any demo that supports SSD*-based models | Supported | Supported | Supported |
person-detection-action-recognition-0003 | Smart Classroom Demo | Supported | Supported | Supported |
landmarks-regression-retail-0009 | Smart Classroom Demo | Supported | Supported | Supported |
face-reidentification-retail-0095 | Smart Classroom Demo | Supported | Supported | |
human-pose-estimation-0001 | Human Pose Estimation Demo | Supported | Supported | Supported |
single-image-super-resolution-0063 | Super Resolution Demo | Supported | | |
single-image-super-resolution-1011 | Super Resolution Demo | Supported | | |
single-image-super-resolution-1021 | Super Resolution Demo | Supported | | |
text-detection-0001 | Text Detection Demo | Supported | Supported | |

Inferring Your Model with the Inference Engine Samples

Set Your Environment Variables

Use these steps to make sure your application can find the Inference Engine libraries.

Execute the setupvars script to set the environment variables:

  • For Linux:
    source <INSTALL_DIR>/bin/setupvars.sh
  • For Windows:
    <INSTALL_DIR>\bin\setupvars.bat

where <INSTALL_DIR> is the Intel® Distribution of OpenVINO™ toolkit installation directory.

NOTE: The Intel® Distribution of OpenVINO™ toolkit environment variables are removed when you close the shell. For instructions on permanently setting the environment variables, refer to the Set the Environment Variables section of the Intel® Distribution of OpenVINO™ toolkit installation guide for Linux OS.

To debug or run the samples on Windows in Microsoft Visual Studio, make sure you have properly configured the Debugging environment settings for the Debug and Release configurations. Set correct paths to the OpenCV* libraries and to the debug and release versions of the Inference Engine libraries. For example, for the Debug configuration, go to the Debugging category in the project's Configuration Properties and set the PATH variable in the Environment field to the following:

PATH=<INSTALL_DIR>\deployment_tools\inference_engine\bin\intel64\Debug;<INSTALL_DIR>\opencv\bin;%PATH%

Where <INSTALL_DIR> is the directory in which the Intel Distribution of OpenVINO toolkit is installed.

Building the Sample Applications on Linux* OS

Supported Linux build environment:

  • Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit
  • GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4)
  • CMake* version 2.8 or higher

Use these steps to prepare your Linux computer for the samples:

NOTE: If you have installed the product as a root user, switch to root mode before you continue: sudo -i

NOTE: Make sure you have set environment variables before building the samples.

  1. Navigate to a directory that you have write access to and create a samples build directory. This example uses a directory named build:
    mkdir build

    NOTE: If you ran the Image Classification demo script, the samples/build/ directory was already created: <INSTALL_DIR>/deployment_tools/inference_engine/samples/build/.

  2. Go to the new directory:
    cd <path_to_build_directory>
  3. Run CMake to generate the Make files for release or debug configuration:
    • For release configuration:
      cmake -DCMAKE_BUILD_TYPE=Release <INSTALL_DIR>/deployment_tools/inference_engine/samples/
    • For debug configuration:
      cmake -DCMAKE_BUILD_TYPE=Debug <INSTALL_DIR>/deployment_tools/inference_engine/samples/
  4. Build the application:
    make

For the release configuration, the sample application binaries are in <path_to_build_directory>/intel64/Release/. For the debug configuration, they are in <path_to_build_directory>/intel64/Debug/.

Building the Sample Applications on Windows* OS

Supported Windows build environment:

  • Microsoft Windows* 10
  • Microsoft Visual Studio* 2017 or Microsoft Visual Studio* 2015 Community
  • CMake* 2.8 or later

Follow these steps to prepare your Windows computer for the samples:

  1. Go to the <INSTALL_DIR>\deployment_tools\inference_engine\samples\ directory.
  2. Double-click create_msvc<version>_solution.bat, where <version> is 2015 or 2017 to match your Visual Studio version. For example, for Microsoft Visual Studio 2017, use create_msvc2017_solution.bat. This file generates a Microsoft Visual Studio solution.
  3. Open Microsoft Visual Studio*.
  4. Build C:\Users\<username>\Documents\Intel\OpenVINO\inference_engine_samples_<version>\Samples.sln, where <version> is 2015 or 2017 depending on your Visual Studio version.
  5. Find the sample application binaries in the C:\Users\<username>\Documents\Intel\OpenVINO directory.

NOTE: When building either release or debug configurations in Microsoft Visual Studio, make sure you select the corresponding build configuration, Release or Debug, in the configuration panel.

NOTE: To debug or run samples in Microsoft Visual Studio, make sure you have properly configured Debugging settings for the Debug and Release configurations. For more information, refer to the Get Ready for Running the Sample Applications section.

Running the Samples

Image Classification Sample

Description

This topic demonstrates how to run the Image Classification sample application, which does inference using image classification networks like AlexNet* and GoogLeNet*.

How It Works

Upon start-up, the sample application reads command-line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running

Running the application with the -h option yields the following usage message:

./classification_sample -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

classification_sample [OPTION]
Options:

    -h                        Print a usage message.
    -i "<path1>" "<path2>"    Required. Path to a folder with images or path to an image files: a .ubyte file for LeNet
                              and a .bmp file for the other networks.
    -m "<path>"               Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"  Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"  Optional. Absolute path to clDNN (GPU) custom layers config (*.xml).
    -pp "<path>"              Path to a plugin folder.
    -d "<device>"             Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -nt "<integer>"           Number of top results (default 10)
    -ni "<integer>"           Number of iterations (default 1)
    -pc                       Enables per-layer performance report
    -p_msg                    Enables messages from a plugin

Running the application with an empty list of options yields the usage message given above and an error message.

To run the sample, you can use AlexNet* and GoogLeNet* models that can be downloaded with the OpenVINO Model Downloader, or other image classification models.

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

For example, to perform inference of an AlexNet model (previously converted to the Inference Engine format) on CPU, use the following command:

./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml
Sample Output

By default, the application outputs top-10 inference results. Add the -nt option to the previous command to modify the number of top output results.
For example, to get the top-5 results on Intel® HD Graphics, use the following command:

./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d GPU

Image Classification Sample Async

Description

This sample demonstrates how to build and execute inference in pipelined mode, using image classification networks as an example.

The pipelined mode can increase the overall throughput. The latency of a single inference remains the same as for synchronous execution. The throughput increases for the following reasons:

  • Some plugins are internally heterogeneous: data transfer and execution happen on a remote device, while pre-processing and post-processing happen on the host
  • The explicit heterogeneous (HETERO) plugin executes different parts of the network on different devices

When two or more devices are involved in the inference process of one picture, creating several infer requests and starting asynchronous inference is the most efficient way to utilize the devices. If two devices are involved in execution, the optimal value for the -nireq option is 2. To be efficient, the Image Classification Sample Async uses a round-robin algorithm for infer requests: the sample starts execution for the current infer request and switches to waiting for the results of the previous inference. After the wait completes, it swaps the infer requests and repeats the procedure.

Another aspect of good throughput is the number of iterations. Only with a large number of iterations can you emulate real application behavior and see meaningful performance numbers.

Batch mode is independent of the pipelined mode. The pipelined mode works efficiently with any batch size.

How It Works

Upon start-up, the sample application reads command-line parameters and loads a network and an image to the Inference Engine plugin. Then the application creates the number of infer requests specified by the -nireq parameter and loads pictures for inference.

Then, in a loop, it starts inference for the current infer request and switches to waiting for the results of another one. When the results are ready, the infer requests are swapped.

When inference is done, the application outputs data to the standard output stream.
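
The round-robin scheme can be sketched as follows. This is a simplified illustration, not the sample's actual code: it assumes an already loaded ExecutableNetwork, two infer requests (as with -nireq 2), and a hypothetical fillInput() helper that copies the next picture into a request.

#include <inference_engine.hpp>
#include <utility>

using namespace InferenceEngine;

void fillInput(InferRequest &request);   // hypothetical helper: writes the next picture into the request

void pipelinedClassification(ExecutableNetwork &executable, int iterations) {
    InferRequest current = executable.CreateInferRequest();
    InferRequest next = executable.CreateInferRequest();

    fillInput(current);
    current.StartAsync();                                     // kick off the first request
    for (int i = 1; i < iterations; ++i) {
        fillInput(next);
        next.StartAsync();                                    // start the next request
        current.Wait(IInferRequest::WaitMode::RESULT_READY);  // wait for the previous one
        // ... read current.GetBlob(<output_name>) and accumulate the results ...
        std::swap(current, next);                             // swap the requests and repeat
    }
    current.Wait(IInferRequest::WaitMode::RESULT_READY);      // drain the last request
    // ... process the final results ...
}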

Running

Running the application with the -h option results in the message:

./classification_sample_async -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

classification_sample_async [OPTION]
Options:

    -h
                            Print a usage message.
    -i "<path1>" "<path2>"
                            Required. Path to a folder with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to clDNN (GPU) custom layers config (*.xml).
    -pp "<path>"
                            Path to a plugin folder.
    -d "<device>"
                            Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -nt "<integer>"
                            Number of top results (default 10)
    -ni "<integer>"
                            Number of iterations (default 1)
    -pc
                            Enables per-layer performance report
    -nireq "<integer>"
                            Number of infer request for pipelined mode (default 1)
    -p_msg
                            Enables messages from a plugin

Running the application with an empty list of options yields the usage message given above and an error message.

You can do inference on an image using a trained AlexNet* network on FPGA with fallback to Intel® Processors using the following command:

./classification_sample_async -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d HETERO:FPGA,CPU -nireq 2 -ni 200

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

By default, the application outputs top-10 inference results for each infer request. In addition, it reports the throughput value measured in frames per second.


Security Barrier Camera Demo

Description

This demo showcases Vehicle and License Plate Detection network followed by the Vehicle Attributes and License Plate Recognition applied on top of Vehicle Detection results. The corresponding topologies are shipped with the product:

  • vehicle-license-plate-detection-barrier-0106, which is a primary detection network to find the vehicles and license plates
  • vehicle-attributes-recognition-barrier-0039, which is executed on top of the results from the first network and reports general vehicle attributes, for example, vehicle type (car/van/bus/truck) and color
  • license-plate-recognition-barrier-0001, which is executed on top of the results from the first network and reports a string per recognized license plate

For more details on the topologies, please refer to their descriptions in the deployment_tools/intel_models folder of the Intel® Distribution of OpenVINO™ toolkit installation directory.

Other demo objectives are:

  • Video/Camera as inputs, via OpenCV*
  • Example of a simple asynchronous networks pipelining: Vehicle Attributes and License Plate Recognition networks are executed on top of the Vehicle Detection results
  • Visualization of Vehicle Attributes and License Plate information for each detected object
How It Works

On start-up, the application reads command line parameters and loads the specified networks. The Vehicle and License Plate Detection network is required; the other two are optional.

Upon getting a frame from the OpenCV VideoCapture, the application performs inference with the Vehicle and License Plate Detection network, then performs two more inferences using the Vehicle Attributes Recognition and License Plate Recognition networks (if they are specified on the command line), and displays the results.

Running

Running the application with the -h option yields the following usage message:

./security_barrier_camera_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

interactive_vehicle_detection [OPTION]
Options:

    -h                         Print a usage message.
    -i "<path1>" "<path2>"     Required. Path to video or image files. Default value is "cam" to work with cameras.
    -m "<path>"                Required. Path to the Vehicle and License Plate Detection model .xml file.
    -m_va "<path>"             Optional. Path to the Vehicle Attributes model .xml file.
    -m_lpr "<path>"            Optional. Path to the License Plate Recognition model .xml file.
      -l "<absolute_path>"     Optional. For CPU custom layers, if any. Absolute path to a shared library with the kernels implementation.
          Or
      -c "<absolute_path>"     Optional. For GPU custom kernels, if any. Absolute path to an .xml file with the kernels description.
    -d "<device>"              Optional. Specify the target device for Vehicle Detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_va "<device>"           Optional. Specify the target device for Vehicle Attributes (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_lpr "<device>"          Optional. Specify the target device for License Plate Recognition (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -pc                        Optional. Enable per-layer performance statistics.
    -r                         Optional. Output inference results as raw values.
    -t                         Optional. Probability threshold for vehicle and license plate detections.
    -no_show                   Optional. Do not show processed video.
    -auto_resize               Optional. Enable resizable input with support of ROI crop and auto resize.
    -nireq                     Optional. Number of infer request for pipelined mode (default value is 1)
    -nc                        Optional. Number of processed cameras (default value is 1) if the input (-i) is specified as camera.

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use public models or a set of pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/vehicle-license-plate-detection-barrier-0106
  • <INSTALL_DIR>/deployment_tools/intel_models/vehicle-attributes-recognition-barrier-0039
  • <INSTALL_DIR>/deployment_tools/intel_models/license-plate-recognition-barrier-0001

For example, to do inference on a GPU with the Intel Distribution of OpenVINO toolkit pre-trained models, run the following command:

./security_barrier_camera_demo -i <path_to_video>/inputVideo.mp4 -m vehicle-license-plate-detection-barrier-0106.xml -m_va vehicle-attributes-recognition-barrier-0039.xml -m_lpr license-plate-recognition-barrier-0001.xml -d GPU

To do inference for two video inputs using two asynchronous infer requests on FPGA with the Intel Distribution of OpenVINO toolkit pre-trained models, run the following command:

./security_barrier_camera_demo -i <path_to_video>/inputVideo_0.mp4 <path_to_video>/inputVideo_1.mp4 -m vehicle-license-plate-detection-barrier-0106.xml -m_va vehicle-attributes-recognition-barrier-0039.xml -m_lpr license-plate-recognition-barrier-0001.xml -d HETERO:FPGA,CPU -d_va HETERO:FPGA,CPU -d_lpr HETERO:FPGA,CPU -nireq 2

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Optimization Hints for Heterogeneous Scenarios with FPGA
  • OMP_NUM_THREADS: Specifies the number of threads to use. For heterogeneous scenarios with FPGA, when several inference requests are used asynchronously, limiting the number of CPU threads with OMP_NUM_THREADS helps avoid resource competition between threads. For the Security Barrier Camera Demo, the recommended value is OMP_NUM_THREADS=1.
  • KMP_BLOCKTIME: Sets the time, in milliseconds, that a thread should wait after completing the execution of a parallel region before sleeping. The default value is 200 ms, which is not optimal for the demo. The recommended value is KMP_BLOCKTIME=1.
Demo Output

The demo uses OpenCV to display the resulting frame with detections rendered as bounding boxes and text:

License plate detection


Object Detection for Faster R-CNN Demo

Description

This topic demonstrates how to run the Object Detection demo application, which does inference using object detection networks like Faster R-CNN on Intel® Processors and Intel® HD Graphics.

How It Works

Upon start-up, the demo application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Downloading and Converting a Caffe* Model

VGG16-Faster-RCNN is a public CNN that can be easily obtained from GitHub:

  1. Download test.prototxt from https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
  2. Download the pretrained models from https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0
  3. Unpack the archive. You will need the VGG16_faster_rcnn_final.caffemodel file.

To convert the source model correctly, run the Model Optimizer. You can use the following command:

python3 ${MO_ROOT_PATH}/mo_caffe.py --input_model <path_to_model>/VGG16_faster_rcnn_final.caffemodel --input_proto <path_to_model>/deploy.prototxt

For documentation on how to convert Caffe models, refer to Using the Model Optimizer to Convert Caffe* Models

Running

Running the application with the -h option yields the following usage message:

./object_detection_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

object_detection_demo [OPTION]
Options:

    -h                        Print a usage message.
    -i "<path>"               Required. Path to an .bmp image.
    -m "<path>"               Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
      -c "<absolute_path>"    Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -pp "<path>"              Path to a plugin folder.
    -d "<device>"             Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. The demo will look for a suitable plugin for a specified device.
    -pc                       Enables per-layer performance report
    -ni "<integer>"           Number of iterations (default 1)
    -bbox_name "<string>"     The name of output box prediction layer (default: bbox_pred)
    -proposal_name "<string>" The name of output proposal layer (default: proposal)
    -prob_name "<string>"     The name of output probability layer (default: cls_prob)
    -p_msg                    Enables messages from a plugin

Running the application with an empty list of options yields the usage message given above and an error message.

Use the following command to do inference on Intel® Processors on an image using a trained Faster R-CNN network:

$ ./object_detection_demo -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster-rcnn.xml -d CPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.


Object Detection SSD Demo, Async API Performance Showcase

Description

This demonstration showcases Object Detection with SSD and the new Async API. Using the Async API can improve the overall frame rate of the application: rather than waiting for inference to complete, the application can continue doing work on the host while the accelerator is busy. Specifically, this demonstration keeps two parallel infer requests, and while the current one is processed, the input frame for the next one is being captured. This essentially hides the latency of capturing, so that the overall frame rate is determined by MAX(detection time, input capturing time) rather than by the SUM(detection time, input capturing time).

The technique can be generalized to any available parallel slack, such as doing inference while simultaneously encoding the resulting (previous) frames, or running further inference, like emotion detection on top of the face detection results.

Be aware of performance caveats, though. When running tasks in parallel, avoid over-using shared compute resources. For example, if performing inference on the FPGA with a mostly idle CPU, run parallel tasks on the CPU. When doing inference on Intel® Integrated Graphics, you gain little from tasks like encoding the resulting video on the same GPU in parallel, because the device is already busy.

For more information about the performance implications of the Async API and related tips, see the Optimization Guide.

Other demonstration objectives:

  • Video as input support via OpenCV*
  • Visualization of the resulting bounding boxes and text labels (from the .labels file) or class number (if no file is provided)
  • OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine samples helpers into your application.
  • Demonstrate the Async API in action. For this, the demonstration features two modes with a Tab key toggle.
    • Old-style "Sync" way - The frame capturing with OpenCV* executes back-to-back with Detection
    • "Truly Async" way - The Detection is performed on the current frame, while the OpenCV* captures the next frame.
How It Works

On start-up, the application reads command-line parameters and loads a network to the Inference Engine. Upon getting a frame from the OpenCV VideoCapture, it performs inference and displays the results.

New "Async API" operates with new notion of the "Infer Request" that encapsulates the inputs/outputs and separates scheduling and waiting for result. The difference between performance is as follows:

  1. In the default "Sync" mode, the frame is captured and then immediately processed. In pseudo-code, it looks the following way:
    while(true) {
        capture frame
        populate CURRENT InferRequest
        start CURRENT InferRequest //this call is async and returns immediately
        wait for the CURRENT InferRequest
        display CURRENT result
    }
    This is a reference implementation in which the new Async API is used in a serialized/synchronous fashion.
  2. In true "Async" mode, the frame is captured and then immediately processed:
    while(true) {
            capture frame
            populate NEXT InferRequest
            start NEXT InferRequest //this call is async and returns immediately
                wait for the CURRENT InferRequest (processed in a dedicated thread)
                display CURRENT result
            swap CURRENT and NEXT InferRequests
        }
    In this case, the NEXT request is populated in the main (application) thread, while the CURRENT request is processed. This is handled in the dedicated thread, internal to the Inference Engine runtime.
Async API

In this release, the Inference Engine offers a new API based on the notion of Infer Requests. With this API, requests encapsulate input and output allocation. You access the blob with the GetBlob method.

You can execute a request asynchronously in the background and wait until you need the result. In the meantime your application can continue:

// Load a plugin for the device as usual
auto enginePtr = PluginDispatcher({"../../../lib/intel64", ""}).getSuitablePlugin(
                getDeviceFromStr("GPU"));
InferencePlugin plugin(enginePtr);
// Read the network from the Intermediate Representation
CNNNetReader network_reader;
network_reader.ReadNetwork("Model.xml");
network_reader.ReadWeights("Model.bin");
// Load the network to the plugin and create an infer request
auto executable_network = plugin.LoadNetwork(network_reader.getNetwork(), {});
auto async_infer_request = executable_network.CreateInferRequest();
// Populate the inputs
auto input = async_infer_request.GetBlob(input_name);
...
// Start the async infer request (puts the request to the queue and immediately returns)
async_infer_request.StartAsync();
// Continue execution on the host until you need the request results
// ...
async_infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
auto output = async_infer_request.GetBlob(output_name);

NOTE: There is no direct way to measure the execution time of an infer request that is running asynchronously, unless you measure a Wait executed immediately after the StartAsync. But this would essentially mean serialization and synchronous execution.

This is what the demo does for the default "Sync" mode and reports as the Detection time/fps message on the screen. In the truly asynchronous ("Async") mode, the host continues execution in the master thread, in parallel to the infer request. If the request is completed before Wait is called in the main thread (that is, earlier than OpenCV has decoded a new frame), reporting the time between StartAsync and Wait would obviously be incorrect. That is why the inference speed is not reported in the "Async" mode.
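
For illustration, a sync-style measurement of the kind described above might look like the sketch below (a minimal sketch, not the demo's actual code; the measureDetectionMs helper is hypothetical). Timing StartAsync() immediately followed by Wait() yields a per-frame detection time, but it also serializes execution, which is why this number is only meaningful in the Sync mode.

#include <chrono>
#include <inference_engine.hpp>

using namespace InferenceEngine;

// Returns the detection time in milliseconds for one already populated request.
double measureDetectionMs(InferRequest &request) {
    auto start = std::chrono::high_resolution_clock::now();
    request.StartAsync();
    request.Wait(IInferRequest::WaitMode::RESULT_READY);   // blocks, so execution is serialized
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}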

For more information on the new requests-based Inference Engine API, including Async execution, refer to How to Integrate the Inference Engine in Your Application.

Running

Running the application with the -h option yields the following usage message:

./object_detection_demo_ssd_async -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

object_detection_demo_ssd_async [OPTION]
Options:

    -h                        Print a usage message.
    -i "<path>"               Required. Path to a video file (specify "cam" to work with camera).
    -m "<path>"               Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Optional. Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"    Optional. Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -d "<device>"             Optional. Specify the target device to infer on (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
    -pc                       Optional. Enables per-layer performance report.
    -r                        Optional. Inference results as raw values.
    -t                        Optional. Probability threshold for detections.
    -auto_resize              Optional. Enables resizable input with support of ROI crop & auto resize.

Running the application with an empty list of options results in an error message and the usage list above.

You can use the following command to do inference on GPU with a pre-trained object detection model:

./object_detection_demo_ssd_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

The only GUI knob is using 'Tab' to switch between the synchronized execution and the true Async mode.

Demo Output

The output uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and labels, if provided. In default mode, the demo reports:

  • OpenCV* time: Frame decoding + time to render the bounding boxes, labels, and display of the results.
  • Detection time: Inference time for the object detection network. This is reported in the Sync mode.
  • Wallclock time: The combined application-level performance.

Object Detection with SSD-VGG Sample

Description

This topic demonstrates how to run the Object Detection sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics.

How It Works

Upon start-up, the sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running

Running the application with the -h option yields the following usage message:

./object_detection_sample_ssd -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

object_detection_sample_ssd [OPTION]
Options:

    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .bmp image.
    -m "<path>"             Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"    Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -pp "<path>"            Path to a plugin folder.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -pc                     Enables per-layer performance report
    -ni "<integer>"         Number of iterations (default 1)
    -p_msg                  Enables messages from a plugin

Running the application with an empty list of options yields the usage message given above and an error message.

To run the sample, you can use a set of pre-trained and optimized models delivered with the package or a Caffe* public model.

For example, to do inference on a CPU with the Intel Distribution of OpenVINO toolkit person detection SSD models, run one of the following commands:

./object_detection_sample_ssd -i <path_to_image>/inputImage.bmp -m <INSTALL_DIR>/deployment_tools/intel_models/person-detection-retail-0013/FP32/person-detection-retail-0013.xml -d CPU

or

./object_detection_sample_ssd -i <path_to_image>/inputImage.jpg -m <INSTALL_DIR>/deployment_tools/intel_models/person-detection-retail-0002/FP32/person-detection-retail-0002.xml -d CPU

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.


TensorFlow* Object Detection Mask R-CNNs Segmentation Demo

Description

This topic demonstrates how to run the Segmentation demo application, which does inference using image segmentation networks created with the TensorFlow* Object Detection API. Note that only batch size 1 is supported.

The demo has a post-processing part that gathers mask arrays corresponding to bounding boxes with high probability taken from the Detection Output layer. The demo then produces a picture with the identified masks.

How It Works

Upon start-up, the demo application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image.

Running

Running the application with the -h option yields the following usage message:

./mask_rcnn_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

mask_rcnn_demo [OPTION]
Options:

    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .bmp image.
    -m "<path>"             Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"    Required for clDNN (GPU)-targeted custom kernels.Absolute path to the xml file with the kernels desc.
    -pp "<path>"            Path to a plugin folder.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. The demo will look for a suitable plugin for a specified device (CPU by default)
    -ni "<integer>"         Number of iterations (default 1)
    -detection_output_name "<string>" The name of detection output layer (default: detection_output)
    -masks_name "<string>" The name of masks layer (default: masks)
    -pc                     Enables per-layer performance report

Running the application with an empty list of options yields the usage message given above and an error message.

You can use the following command to do inference on Intel® Processors on an image using a trained network:

./mask_rcnn_demo -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster_rcnn.xml
Demo Output

The application output is a segmented image (out.png).


Automatic Speech Recognition Sample

This topic shows how to run the speech sample application, which demonstrates acoustic model inference based on Kaldi* neural networks and speech feature vectors.

How It Works

Upon start-up, the Automatic Speech Recognition sample application reads command line parameters and loads a Kaldi-trained neural network along with a Kaldi ARK speech feature vector file to the Inference Engine plugin. It then performs inference on all speech utterances stored in the input ARK file. Context-windowed speech frames are processed in batches of 1-8 frames according to the -bs parameter. Batching across utterances is not supported by this sample. When inference is done, the application creates an output ARK file. If the -r option is given, error statistics are provided for each speech utterance, as shown in the Sample Output section below.

GNA-Specific Details

Quantization

If the GNA device is selected (for example, using the -d GNA_AUTO flag), the GNA Inference Engine plugin quantizes the model and input feature vector sequence to integer representation before performing inference. Several parameters control neural network quantization:

  • The -q flag determines the quantization mode. Three modes are supported:
    • Static - In the static quantization mode, the first utterance in the input ARK file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used for all subsequent inputs. The neural network is quantized to accommodate the scaled input dynamic range.
    • Dynamic - In the dynamic quantization mode, the scale factor for each input batch is computed just before inference on that batch. The input and network are (re)quantized on-the-fly using an efficient procedure.
    • User-defined - In the user-defined quantization mode, the user may specify a scale factor via the -sf flag that will be used for static quantization.
  • The -qb flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers. For example, when -qb 8 is specified, the plugin will use 8-bit weights wherever possible in the network. Note that it is not always possible to use 8-bit weights due to GNA hardware limitations. For example, convolutional layers always use 16-bit weights (GNA hardware versions 1 and 2). This limitation will be removed in GNA hardware version 3 and higher.
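
As an illustration of the static quantization mode described above, the scale factor could be derived from the first utterance roughly as follows. This is only a sketch of the description, not the GNA plugin's actual quantization code; the computeStaticScaleFactor helper is hypothetical.

#include <algorithm>
#include <cmath>
#include <vector>

// Scale the largest absolute feature value of the first utterance to 16384 (15 bits).
float computeStaticScaleFactor(const std::vector<float> &firstUtterance) {
    float maxAbs = 0.0f;
    for (float value : firstUtterance)
        maxAbs = std::max(maxAbs, std::fabs(value));
    return maxAbs > 0.0f ? 16384.0f / maxAbs : 1.0f;
}
// The same kind of factor can be supplied explicitly with -sf in the user-defined mode.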

Execution Modes

Several execution modes are supported via the -d flag:

  • If the device is set to CPU and the GNA plugin is selected, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_AUTO, the GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_HW, the GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
  • If the device is set to GNA_SW, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_SW_EXACT, the GNA device is emulated in bit-exact mode.

Loading and Saving Models

The GNA plugin supports loading and saving the GNA-optimized model (non-IR) via the -rg and -wg flags. This makes it possible to avoid the cost of full model quantization at run time. The GNA plugin also supports export of firmware-compatible embedded model images for the Intel® Speech Enabling Developer Kit and Amazon Alexa* Premium Far-Field Voice Development Kit via the -we flag (save only).

In addition to performing inference directly from a GNA model file, these options make it possible to:

  • Convert from IR format to GNA format model file (-m, -wg)
  • Convert from IR format to embedded format model file (-m, -we)
  • Convert from GNA format to embedded format model file (-rg, -we)
Running

Running the application with the -h option yields the following usage message:

$ ./speech_sample -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

speech_sample [OPTION]
Options:

    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .ark file.
    -m "<path>"             Required. Path to an .xml file with a trained model (required if -rg is missing).
    -o "<path>"             Output file name (default name is scores.ark).
    -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, GNA_AUTO, GNA_HW, GNA_SW, GNA_SW_EXACT is acceptable. Sample will look for a suitable plugin for device specified
    -p                      Plugin name. For example MKLDNNPlugin. If this parameter is pointed, the sample will look for this plugin only
    -pp                     Path to a plugin folder.
    -pc                     Enables performance report
    -q "<mode>"             Input quantization mode:  static (default), dynamic, or user (use with -sf).
    -qb "<integer>"         Weight bits for quantization:  8 or 16 (default)
    -sf "<double>"          Optional user-specified input scale factor for quantization (use with -q user).
    -bs "<integer>"         Batch size 1-8 (default 1)
    -r "<path>"             Read reference score .ark file and compare scores.
    -rg "<path>"            Read GNA model from file using path/filename provided (required if -m is missing).
    -wg "<path>"            Write GNA model to file using path/filename provided.
    -we "<path>"            Write GNA embedded model to file using path/filename provided.
    -nthreads "<integer>"   Optional. Number of threads to use for concurrent async inference requests on the GNA

Running the application with an empty list of options yields the usage message given above and an error message.

Model Preparation

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

The following pretrained models are available:

  • wsj_dnn5b_smbr
  • rm_lstm4f
  • rm_cnn4a_smbr

You can download them from https://download.01.org/openvinotoolkit/2018_R3/models_contrib/GNA/.

You can use the following Model Optimizer command to convert a Kaldi nnet1 or nnet2 neural network to Intel IR format:

python3 mo.py --framework kaldi --input_model wsj_dnn5b_smbr.nnet --counts wsj_dnn5b_smbr.counts --remove_output_softmax

Assuming that the Model Optimizer (mo.py), Kaldi-trained neural network (wsj_dnn5b_smbr.nnet), and Kaldi class counts file (wsj_dnn5b_smbr.counts) are in the working directory, this command produces the IR network consisting of wsj_dnn5b_smbr.xml and wsj_dnn5b_smbr.bin.

Speech Inference

Once the IR is created, you can use the following command to do inference on Intel® Processors with the GNA co-processor (or emulation library):

./speech_sample -d GNA_AUTO -bs 2 -i wsj_dnn5b_smbr_dev93_10.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark -r wsj_dnn5b_smbr_dev93_scores_10.ark

Here, the floating point Kaldi-generated reference neural network scores (wsj_dnn5b_smbr_dev93_scores_10.ark) corresponding to the input feature file (wsj_dnn5b_smbr_dev93_10.ark) are assumed to be available for comparison.

Sample Output

The acoustic log likelihood sequences for all utterances are stored in the Kaldi ARK file, scores.ark. If the -r option is used, a report on the statistical score error is generated for each utterance such as the following:

Utterance 0: 4k0c0301
Average inference time per frame: 6.26867 ms
         max error: 0.0667191
         avg error: 0.00473641
         avg rms error: 0.00602212
         stdev error: 0.00393488
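
The error statistics compare the sample's output scores with the Kaldi reference scores element by element for each utterance. The following is a rough sketch of how such statistics can be computed (illustrative only, not the sample's actual code; the two vectors are assumed to hold the flattened score matrix of one utterance):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorStats { double maxError = 0, avgError = 0, rmsError = 0, stdevError = 0; };

// Statistics over the absolute element-wise difference between the computed
// and the reference scores, matching the values printed when -r is used.
ErrorStats computeErrorStats(const std::vector<float>& actual,
                             const std::vector<float>& reference) {
    ErrorStats s;
    const std::size_t n = actual.size();
    if (n == 0 || n != reference.size()) return s;
    double sum = 0.0, sumSq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double e = std::fabs(actual[i] - reference[i]);
        s.maxError = std::max(s.maxError, e);          // max error
        sum += e;
        sumSq += e * e;
    }
    s.avgError = sum / n;                              // avg error
    s.rmsError = std::sqrt(sumSq / n);                 // avg rms error
    s.stdevError = std::sqrt(sumSq / n - s.avgError * s.avgError);  // stdev error
    return s;
}
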
Use of Sample in Kaldi* Speech Recognition Pipeline

The Wall Street Journal DNN model used in this example was prepared using the Kaldi s5 recipe and the Kaldi Nnet (nnet1) framework. It is possible to recognize speech by substituting speech_sample for the Kaldi nnet-forward command. Since speech_sample does not yet use pipes, it is necessary to use temporary files for speaker-transformed feature vectors and scores when running the Kaldi speech recognition pipeline. The following operations assume that feature extraction was already performed according to the s5 recipe and that the working directory within the Kaldi source tree is egs/wsj/s5.

  1. Prepare a speaker-transformed feature set given the feature transform specified in final.feature_transform and the feature files specified in feats.scp:
    nnet-forward --use-gpu=no final.feature_transform "ark,s,cs:copy-feats scp:feats.scp ark:- |" ark:feat.ark
  2. Score the feature set using the speech_sample:
    ./speech_sample -d GNA_AUTO -bs 8 -i feat.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark
  3. Run the Kaldi decoder to produce n-best text hypotheses and select the most likely text given the WFST (HCLG.fst), vocabulary (words.txt), and TID/PID mapping (final.mdl):
    latgen-faster-mapped --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.0833 --allow-partial=true --word-symbol-table=words.txt final.mdl HCLG.fst ark:scores.ark ark:-| lattice-scale --inv-acoustic-scale=13 ark:- ark:- | lattice-best-path --word-symbol-table=words.txt ark:- ark,t:-  > out.txt &
  4. Run the word error rate tool to check accuracy given the vocabulary (words.txt) and reference transcript (test_filt.txt):
    cat out.txt | utils/int2sym.pl -f 2- words.txt | sed s:\<UNK\>::g | compute-wer --text --mode=present ark:test_filt.txt ark,p:-

Neural Style Transfer Sample

Description

This topic demonstrates how to build and run the Neural Style Transfer sample (NST sample) application, which does inference using models of style transfer topology.

Running

Running the application with the -h option yields the following usage message:

./style_transfer_sample --help
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

style_transfer_sample [OPTION]
Options:

    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .bmp image.
    -m "<path>"             Required. Path to an .xml file with a trained model.
    -pp "<path>"            Path to a plugin folder.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -ni "<integer>"         Number of iterations (default 1)
    -pc                     Enables per-layer performance report
    -mean_val_r,
    -mean_val_g,
    -mean_val_b             Mean values. Required if the model needs mean values for preprocessing and postprocessing

Running the application with an empty list of options yields the usage message given above and an error message.

You can do inference on an image using a trained model of NST network on Intel® Processors using the following command:

./style_transfer_sample -i <path_to_image>/cat.bmp -m <path_to_model>/1_decoder_FP32.xml

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

The application outputs one or more styled images, starting with the name out(1).bmp, redrawn in the style of the model used for inference. The style of the output images depends on the model used for the sample.


Hello Infer Request Classification Sample

Description

This topic describes how to run the Hello Infer Classification sample application. The sample is a simplified version of the Image Classification Sample. It demonstrates how to use the new Infer Request API of the Inference Engine in applications. See How to Integrate the Inference Engine in Your Application for details.

Running

To do inference on an image using a trained AlexNet* network on Intel® Processors:

./hello_request_classification <path_to_model>/alexnet_fp32.xml <path_to_image>/cat.bmp CPU
Sample Output

The application outputs top-10 inference results.


Interactive Face Detection Demo

This demo showcases the Object Detection task applied to face recognition using a sequence of neural networks. The Async API can improve the overall frame rate of the application: rather than waiting for inference to complete, the application can continue operating on the host while the accelerator is busy. This demo executes four parallel infer requests for the Age/Gender Recognition, Head Pose Estimation, Emotions Recognition, and Facial Landmarks Detection networks that run simultaneously. The corresponding pre-trained models are delivered with the product:

  • face-detection-adas-0001, which is a primary detection network for finding faces
  • age-gender-recognition-retail-0013, which is executed on top of the results of the first model and reports estimated age and gender for each detected face
  • head-pose-estimation-adas-0001, which is executed on top of the results of the first model and reports estimated head pose in Tait-Bryan angles
  • emotions-recognition-retail-0003, which is executed on top of the results of the first model and reports an emotion for each detected face
  • facial-landmarks-35-adas-0001, which is executed on top of the results of the first model and reports normed coordinates of estimated facial landmarks

Other demo objectives are:

  • Video as input support via OpenCV*
  • Visualization of the resulting face bounding boxes from Face Detection network
  • Visualization of age/gender, head pose, emotion information, and facial landmarks positions for each detected face

OpenCV is used to draw resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine demo helpers into your application.

How It Works
  1. The application reads command-line parameters and loads up to five networks to the Inference Engine, depending on the -m... family of options.
  2. The application gets a frame from the OpenCV VideoCapture.
  3. The application performs inference on the Face Detection network.
  4. The application performs four simultaneous inferences, using the Age/Gender Recognition, Head Pose Estimation, Emotions Recognition, and Facial Landmarks Estimation networks if they are specified in the command line.
  5. The application displays the results.

The new Async API operates with a new notion of the Infer Request that encapsulates the inputs/outputs and separates scheduling and waiting for result. For more information about Async API and the difference between Sync and Async modes performance, refer to How it Works and Async API sections in Object Detection SSD, Async API Performance Showcase Demo.

Running

Running the application with the -h option yields the following usage message:

./interactive_face_detection_demo -h
InferenceEngine:
  API version ............ <version>
  Build .................. <number>

interactive_face_detection_demo [OPTION]
Options:

  -h                         Print a usage message
  -i "<path>"                Required. Path to a video file. Default value is "cam" to work with camera.
  -m "<path>"                Required. Path to an .xml file with a trained Face Detection model.
  -m_ag "<path>"             Optional. Path to an .xml file with a trained Age/Gender Recognition model.
  -m_hp "<path>"             Optional. Path to an .xml file with a trained Head Pose Estimation model.
  -m_em "<path>"             Optional. Path to an .xml file with a trained Emotions Recognition model.
  -m_lm "<path>"             Optional. Path to an .xml file with a trained Facial Landmarks Estimation model.
    -l "<absolute_path>"     Required for CPU custom layers. Absolute path to a shared library with the kernels implementation.
        Or
    -c "<absolute_path>"     Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
  -d "<device>"              Target device for Face Detection network (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
  -d_ag "<device>"           Target device for Age/Gender Recognition network (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
  -d_hp "<device>"           Target device for Head Pose Estimation network (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
  -d_em "<device>"           Target device for Emotions Recognition network (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
  -d_lm "<device>"           Target device for Facial Landmarks Estimation network (CPU, GPU, FPGA, or MYRIAD). Demo will look for a suitable plugin for device specified.
  -n_ag "<num>"              Number of maximum simultaneously processed faces for Age/Gender Recognition network (default is 16)
  -n_hp "<num>"              Number of maximum simultaneously processed faces for Head Pose Estimation network (default is 16)
  -n_em "<num>"              Number of maximum simultaneously processed faces for Emotions Recognition network (default is 16)
  -n_lm "<num>"              Number of maximum simultaneously processed faces for Facial Landmarks Estimation network (default is 16)
  -dyn_ag                    Enable dynamic batch size for Age/Gender Recognition network
  -dyn_hp                    Enable dynamic batch size for Head Pose Estimation network
  -dyn_em                    Enable dynamic batch size for Emotions Recognition network
  -dyn_lm                    Enable dynamic batch size for Facial Landmarks Estimation network
  -async                     Enable asynchronous mode
  -no_wait                   Do not wait for key press in the end
  -no_show                   Do not show processed video
  -pc                        Enable per-layer performance report
  -r                         Output inference results as raw values
  -t                         Probability threshold for detections

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use public models or a set of pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/face-detection-adas-0001
  • <INSTALL_DIR>/deployment_tools/intel_models/age-gender-recognition-retail-0013
  • <INSTALL_DIR>/deployment_tools/intel_models/head-pose-estimation-adas-0001
  • <INSTALL_DIR>/deployment_tools/intel_models/emotions-recognition-retail-0003
  • <INSTALL_DIR>/deployment_tools/intel_models/facial-landmarks-35-adas-0001

For example, to do inference on a GPU with the Intel Distribution of OpenVINO toolkit pre-trained models, run the following command:

./interactive_face_detection_demo -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/face-detection-adas-0001.xml -m_ag <path_to_model>/age-gender-recognition-retail-0013.xml -m_hp <path_to_model>/head-pose-estimation-adas-0001.xml -m_em <path_to_model>/emotions-recognition-retail-0003.xml -m_lm <path_to_model>/facial-landmarks-35-adas-0001.xml -d GPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The demo uses OpenCV to display the resulting frame with detections (rendered as bounding boxes and labels, if provided). In the default mode, the demo reports:

  • OpenCV time: frame decoding + time to render bounding boxes, labels, and display the results
  • Face Detection time: inference time for the Face Detection network.

If the Age/Gender Recognition, Head Pose Estimation, Emotions Recognition, or Facial Landmarks Estimation networks are enabled, the following additional information is reported:

  • Face Analysis Networks time: combined inference time of simultaneously executed Age/Gender Recognition, Head Pose Estimation, Emotions Recognition, and Facial Landmarks Estimation networks.

Image Segmentation Demo

Description

This topic demonstrates how to run the Image Segmentation demo application, which does inference using image segmentation networks like FCN8.

How It Works

Upon the start-up, the demo application reads command-line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image.

Running

Running the application with the -h option yields the following usage message:

./segmentation_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

segmentation_demo [OPTION]
Options:

    -h                        Print a usage message.
    -i "<path>"               Required. Path to an .bmp image.
    -m "<path>"               Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"    Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -pp "<path>"              Path to a plugin folder.
    -d "<device>"             Specify the target device to infer on: CPU, GPU, FPGA or MYRIAD is acceptable. The demo will look for a suitable plugin for a specified device (CPU by default).
    -ni "<integer>"           Number of iterations (default 1)
    -pc                       Enables per-layer performance report

Running the application with an empty list of options yields the usage message given above and an error message.

You can use the following command to do inference on Intel® Processors on an image using a trained FCN8 network:

./segmentation_demo -i <path_to_image>/inputImage.bmp -m <path_to_model>/fcn8.xml

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The application output is a segmented image named out.bmp.


Crossroad Camera Demo

This demo provides an inference pipeline for person detection, recognition, and reidentification. The demo uses the Person Detection network followed by the Person Attributes Recognition and Person Reidentification Retail networks applied on top of the detection results. The corresponding pre-trained models are delivered with the product:

  • person-vehicle-bike-detection-crossroad-0078, which is a primary detection network for finding the persons (and other objects if needed)
  • person-attributes-recognition-crossroad-0200, which is executed on top of the results from the first network and reports person attributes such as gender, presence of a hat, and presence of long-sleeved clothes
  • person-reidentification-retail-0079, which is executed on top of the results from the first network and outputs a vector of features for each detected person. This vector is used to determine whether the person has already been detected.

For details on the models, please refer to the descriptions in the deployment_tools/intel_models folder of the Intel Distribution of OpenVINO toolkit installation directory.

Other demo objectives are:

  • Images/Video/Camera as inputs, via OpenCV*
  • Example of simple network pipelining: Person Attributes Recognition and Person Reidentification networks are executed on top of the Person Detection results
  • Visualization of Person Attributes and Person Reidentification (REID) information for each detected person
How It Works

On the start-up, the application reads command-line parameters and loads the specified networks. The Person Detection network is required, the other two are optional.

Upon getting a frame from the OpenCV VideoCapture, the application performs inference of the Person Detection network, then runs two more inferences of the Person Attributes Recognition and Person Reidentification Retail networks if they were specified in the command line, and displays the results.

For the Person Reidentification Retail network, a feature vector is generated for each detected person. This vector is compared one-by-one with the vectors of all previously detected persons using the cosine similarity algorithm, as illustrated by the sketch below. If the comparison result exceeds the specified (or default) threshold, the person is considered already detected and the known REID value is assigned. Otherwise, the vector is added to a global list, and a new REID value is assigned.
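
The following is a minimal sketch of this matching step (illustrative only, not the demo's actual code; the function and variable names are assumptions):

#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two feature vectors of equal length.
float cosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, normA = 0.f, normB = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB) + 1e-6f);
}

// Assign a REID value: reuse a known ID if any stored vector is similar enough,
// otherwise register the new vector and return a new ID.
int assignReid(const std::vector<float>& descriptor,
               std::vector<std::vector<float>>& gallery, float threshold) {
    for (std::size_t id = 0; id < gallery.size(); ++id) {
        if (cosineSimilarity(descriptor, gallery[id]) > threshold)
            return static_cast<int>(id);      // already detected person
    }
    gallery.push_back(descriptor);            // new person
    return static_cast<int>(gallery.size()) - 1;
}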

Running

Running the application with the -h option yields the following usage message:

./crossroad_camera_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

crossroad_camera_demo [OPTION]
Options:

    -h                           Print a usage message.
    -i "<path>"                  Required. Path to a video or image file. Default value is "cam" to work with camera.
    -m "<path>"                  Required. Path to the Person/Vehicle/Bike Detection Crossroad model (.xml) file.
    -m_pa "<path>"               Optional. Path to the Person Attributes Recognition Crossroad model (.xml) file.
    -m_reid "<path>"             Optional. Path to the Person Reidentification Retail model (.xml) file.
      -l "<absolute_path>"       Optional. For MKLDNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"       Optional. For clDNN (GPU)-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc.
    -d "<device>"                Optional. Specify the target device for Person/Vehicle/Bike Detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_pa "<device>"             Optional. Specify the target device for Person Attributes Recognition (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_reid "<device>"           Optional. Specify the target device for Person Reidentification Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -pc                          Optional. Enables per-layer performance statistics.
    -r                           Optional. Output Inference results as raw values.
    -t                           Optional. Probability threshold for person/vehicle/bike crossroad detections.
    -t_reid                      Optional. Cosine similarity threshold between two vectors for person reidentification.
    -no_show                     Optional. No show processed video.
    -auto_resize                 Optional. Enables resizable input with support of ROI crop & auto resize.

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use public models or a set of pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/person-vehicle-bike-detection-crossroad-0078
  • <INSTALL_DIR>/deployment_tools/intel_models/person-attributes-recognition-crossroad-0200
  • <INSTALL_DIR>/deployment_tools/intel_models/person-reidentification-retail-0079

For example, to do inference on a GPU with the Intel Distribution of OpenVINO toolkit pre-trained models, run the following command:

./crossroad_camera_demo -i <path_to_video>/inputVideo.mp4 -m person-vehicle-bike-detection-crossroad-0078.xml -m_pa person-attributes-recognition-crossroad-0200.xml -m_reid person-reidentification-retail-0079.xml -d GPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The demo uses OpenCV to display the resulting frame with detections rendered as bounding boxes and text. In the default mode, the demo reports Person Detection time - inference time for the Person/Vehicle/Bike Detection network.

If the Person Attributes Recognition or Person Reidentification Retail networks are enabled, the following additional information is also reported:

  • Person Attributes Recognition time - Inference time of Person Attributes Recognition averaged by the number of detected persons.
  • Person Reidentification time - Inference time of Person Reidentification averaged by the number of detected persons.

Multi-Channel Face Detection Demo

This demo provides an inference pipeline for multi-channel face detection. The demo uses the Face Detection network. The corresponding pre-trained model delivered with the product is face-detection-retail-0004, which is a primary detection network for finding faces.

For details on the models, please refer to the descriptions in the deployment_tools/intel_models folder of the Intel Distribution of OpenVINO toolkit installation directory.

Other demo objectives are:

  • Up to 16 Cameras as inputs, via OpenCV*
  • Visualization of detected faces from all channels on single screen
How It Works

NOTE: Running the demo requires using at least one web camera attached to your machine.

On the start-up, the application reads command line parameters and loads the specified networks. The Face Detection network is required.

Running

Running the application with the -h option yields the following usage message:

./multi-channel-demo -h

multichannel_face_detection [OPTION]
Options:

    -h                           Print a usage message.
    -m "<path>"                  Required. Path to an .xml file with a trained face detection model.
      -l "<absolute_path>"       Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"       Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -d "<device>"                Specify the target device for Face Detection (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for a specified device.
    -nc                          Maximum number of processed camera inputs (web cams)
    -bs                          Processing batch size, number of frames processed per infer request
    -n_ir                        Number of infer requests
    -n_iqs                       Frame queue size for input channels
    -fps_sp                      FPS measurement sampling period. Duration between timepoints, msec
    -n_sp                        Number of sampling periods
    -pc                          Enables per-layer performance report.
    -t                           Probability threshold for detections.
    -no_show                     No show processed video.
    -show_stats                  Enable statistics output
    -duplicate_num               Enable and specify the number of channels additionally copied from real sources
    -real_input_fps              Disable input frames caching, for maximum throughput pipeline
    -i                           Specify full path to input video files

For example, to run the demo with the pre-trained face detection model on FPGA with fallback on CPU, with a single camera, use the following command:

./multi-channel-demo -m <INSTALL_DIR>/deployment_tools/intel_models/face-detection-retail-0004/FP32/face-detection-retail-0004.xml
-l <demos_build_folder>/intel64/Release/lib/libcpu_extension.so -d HETERO:FPGA,CPU -nc 1

To run the demo using two recorded video files instead of web cameras, use a command like the following:

./multi-channel-demo -m <INSTALL_DIR>/deployment_tools/intel_models/face-detection-retail-0004/FP32/face-detection-retail-0004.xml
-l <demos_build_folder>/intel64/Release/lib/libcpu_extension.so -d HETERO:FPGA,CPU -i /path/to/file1 /path/to/file2

Video files will be processed repeatedly.

You can also run the demo on web cameras and video files simultaneously by specifying both parameters: -nc <number_of_cams> -i <video files_sequentially_separated_by_space>. To run the demo with a single input source (a web camera or a video file) but several channels, specify an additional parameter: -duplicate_num 3. You will see four channels: one real and three duplicated. With several input sources, the -duplicate_num parameter duplicates each of them.
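
For example, a combined run with one web camera and two video files could look like the following (the paths are illustrative):

./multi-channel-demo -m <INSTALL_DIR>/deployment_tools/intel_models/face-detection-retail-0004/FP32/face-detection-retail-0004.xml
-l <demos_build_folder>/intel64/Release/lib/libcpu_extension.so -d HETERO:FPGA,CPU -nc 1 -i /path/to/file1 /path/to/file2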

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The demo uses OpenCV to display the resulting bunch of frames with detections rendered as bounding boxes. On the top of the screen, the demo reports throughput (in frames per second). If needed, it also reports more detailed statistics (use -show_stats option while running the demo to enable it).


Hello Autoresize Classification Sample

This topic describes how to run the Hello Autoresize Classification sample application. The sample is a simplified version of the Image Classification Sample. It demonstrates how to use the new input autoresize API of the Inference Engine in applications. Refer to How to Integrate the Inference Engine in Your Application for details.

There is also a new API to crop a ROI object and set it as input without additional memory re-allocation. Properly demonstrating this API requires running several networks in a pipeline, which is out of scope of this sample. Refer to the Object Detection SSD Demo, Async API Performance Showcase, Security Barrier Camera Demo, or Crossroad Camera Demo for examples of the new crop ROI API.

Running

You can do inference on an image using a trained AlexNet network on Intel® Processors using the following command:

./hello_autoresize_classification <path_to_model>/alexnet_fp32.xml <path_to_image>/cat.bmp CPU

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

The application outputs top-10 inference results.


Hello Shape Infer Sample

This topic demonstrates how to run the Hello Shape Infer SSD application, which does inference using object detection networks like SSD-VGG. The sample shows how to use the Shape Inference feature.

Running

You can use the following command to do inference on Intel® Processors on an image using a trained SSD network:

./hello_shape_infer_ssd <path_to_model>/ssd_300.xml <path_to_image>/500x500.bmp CPU 3

NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

The application renders an image with detected objects enclosed in rectangles. It outputs a list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.


Human Pose Estimation Demo

This demo showcases the work of a multi-person 2D pose estimation algorithm. The task is to predict a pose, that is, a body skeleton consisting of keypoints and the connections between them, for every person in an input video. The pose may contain up to 18 keypoints: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles. Potential use cases of the algorithm include action recognition and behavior understanding. The following pre-trained model is delivered with the product:

  • human-pose-estimation-0001, which is a human pose estimation network that produces two feature vectors. The algorithm uses these feature vectors to predict human poses.

The input frame height is scaled to the model height, and the frame width is scaled to preserve the initial aspect ratio and then padded to a multiple of 8.
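
A minimal sketch of how the padded input size can be derived (illustrative only; the function and variable names are assumptions, not the demo's actual code):

#include <utility>

// Scale the frame to the model input height, scale the width by the same factor
// to keep the aspect ratio, and pad the width up to the next multiple of 8.
std::pair<int, int> scaledInputSize(int frameWidth, int frameHeight, int modelHeight) {
    double scale = static_cast<double>(modelHeight) / frameHeight;
    int scaledWidth = static_cast<int>(frameWidth * scale);
    int paddedWidth = ((scaledWidth + 7) / 8) * 8;   // round up to a multiple of 8
    return {paddedWidth, modelHeight};
}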

Other demo objectives are:

  • Video/Camera as inputs, via OpenCV*
  • Visualization of all estimated poses
How It Works

On the start-up, the application reads command line parameters and loads the human pose estimation model. Upon getting a frame from the OpenCV VideoCapture, the application executes the human pose estimation algorithm and displays the results.

Running

Running the application with the -h option yields the following usage message:

./human_pose_estimation_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

human_pose_estimation_demo [OPTION]
Options:

    -h                         Print a usage message.
    -i "<path>"                Required. Path to a video. Default value is "cam" to work with camera.
    -m "<path>"                Required. Path to the Human Pose Estimation model (.xml) file.
    -d "<device>"              Optional. Specify the target device for Human Pose Estimation (CPU, GPU, FPGA or MYRIAD is acceptable). Default value is "CPU".
    -pc                        Optional. Enable per-layer performance report.
    -no_show                   Optional. Do not show processed video.
    -r                         Optional. Output inference results as raw values.

Running the application with an empty list of options yields an error message.

To run the demo, use the pre-trained and optimized human-pose-estimation-0001 model delivered with the product. The model is located at <INSTALL_DIR>/deployment_tools/intel_models/.

For example, to do inference on a CPU, run the following command:

./human_pose_estimation_demo -i <path_to_video>/input_video.mp4 -m <path_to_model>/human-pose-estimation-0001.xml -d CPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The demo uses OpenCV to display the resulting frame with estimated poses and a text report of the FPS (frames per second) performance of the human pose estimation.


Object Detection YOLO* V3 Demo, Async API Performance Showcase

This demo showcases Object Detection with YOLO* V3 and Async API.

To learn more about Async API features, please refer to Object Detection for SSD Demo, Async API Performance Showcase.

Other demo objectives are:

  • Video as input support via OpenCV*
  • Visualization of the resulting bounding boxes and text labels (from the .labels file) or class number (if no file is provided)
  • OpenCV is used to draw the resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine sample helpers into your application
  • Demonstration of the Async API in action. For this, the demo features two modes toggled by the Tab key:
    • Old-style "Sync" way, where the frame captured with OpenCV executes back-to-back with the Detection
    • Truly "Async" way, where the detection is performed on a current frame, while OpenCV captures the next frame
How It Works

On the start-up, the application reads command-line parameters and loads a network to the Inference Engine. Upon getting a frame from the OpenCV VideoCapture, it performs inference and displays the results.

Running

Running the application with the -h option yields the following usage message:

./object_detection_demo_yolov3_async -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

object_detection_demo_yolov3_async [OPTION]
Options:

    -h                        Print a usage message.
    -i "<path>"               Required. Path to a video file (specify "cam" to work with camera).
    -m "<path>"               Required. Path to an .xml file with a trained model.
      -l "<absolute_path>"    Optional. Required for CPU custom layers.Absolute path to a shared library with the layers implementation.
          Or
      -c "<absolute_path>"    Optional. Required for GPU custom kernels.Absolute path to the .xml file with the kernels description.
    -d "<device>"             Optional. Specify a target device to infer on (CPU, GPU). The demo will look for a suitable plugin for the specified device
    -pc                       Optional. Enable per-layer performance report.
    -r                        Optional. Output inference results as raw values.
    -t                        Optional. Probability threshold for detections.
    -iou_t                    Optional. Filtering intersection over union threshold for overlapping boxes.
    -auto_resize              Optional. Enable resizable input with support of ROI crop and auto resize.

Running the application with an empty list of options yields the usage message given above and an error message. You can use the following command to do inference on GPU with a pre-trained object detection model:

./object_detection_demo_yolov3_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/yolo_v3.xml -d GPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

The only GUI knob is to use Tab to switch between the synchronized execution and the true Async mode.

Demo Output

The demo uses OpenCV to display the resulting frame with detections (rendered as bounding boxes and labels, if provided). In the default mode, the demo reports:

  • OpenCV time: frame decoding + time to render the bounding boxes, labels, and to display the results
  • Detection time: inference time for the object detection network. It is reported in the "Sync" mode only.
  • Wallclock time: combined application-level performance

Pedestrian Tracker Demo

This demo showcases Pedestrian Tracking scenario: it reads frames from an input video sequence, detects pedestrians in the frames, and builds trajectories of movement of the pedestrians in a frame-by-frame manner. The corresponding pre-trained models are delivered with the product:

  • person-detection-retail-0013, which is the primary detection network for finding pedestrians
  • person-reidentification-retail-0031, which is executed on top of the results from inference of the first network and performs reidentification of the pedestrians

For more details on the topologies, refer to the descriptions in the deployment_tools/intel_models folder of the Intel Distribution of OpenVINO toolkit installation directory.

How It Works

On the start-up, the application reads command line parameters and loads the specified networks.

Upon getting a frame from the input video sequence (either a video file or a folder with images), the application performs inference of the pedestrian detector network.

After that, the bounding boxes describing the detected pedestrians are passed to the instance of the tracker class that matches the appearance of the pedestrians with the known (i.e. already tracked) persons. In obvious cases (when pixel-to-pixel similarity of a detected pedestrian is sufficiently close to the latest pedestrian image from one of the known tracks), the match is made without inference of the reidentification network. In more complicated cases, the demo uses the reidentification network to make a decision if a detected pedestrian is the next position of a known person or the first position of a new tracked person.

After that, the application displays the tracks and the latest detections on the screen and goes to the next frame.

Running

Running the application with the -h option yields the following usage message:

./pedestrian_tracker_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

pedestrian_tracker_demo [OPTION]
Options:

    -h                             Print a usage message.
    -i "<path>"                  Required. Path to a video file or a folder with images (all images should have names 0000000001.jpg, 0000000002.jpg, etc).
    -m_det "<path>"              Required. Path to the Pedestrian Detection Retail model (.xml) file.
    -m_reid "<path>"             Required. Path to the Pedestrian Reidentification Retail model (.xml) file.
    -l "<absolute_path>"         Optional. For CPU custom layers, if any. Absolute path to a shared library with the kernels implementation.
          Or
    -c "<absolute_path>"         Optional. For GPU custom kernels, if any. Absolute path to the .xml file with the kernels description.
    -d_det "<device>"            Optional. Specify the target device for pedestrian detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_reid "<device>"           Optional. Specify the target device for pedestrian reidentification (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -r                             Optional. Output pedestrian tracking results in a raw format (compatible with MOTChallenge format).
    -pc                            Optional. Enable per-layer performance statistics.
    -no_show                       Optional. Do not show processed video.
    -delay                         Optional. Delay between frames used for visualization. If negative, the visualization is turned off (like with the option 'no_show'). If zero, the visualization is made frame-by-frame.
    -out "<path>"                Optional. The file name to write output log file with results of pedestrian tracking. The format of the log file is compatible with MOTChallenge format.
    -first                         Optional. The index of the first frame of video sequence to process. This has effect only if it is positive and the source video sequence is an image folder.
    -last                          Optional. The index of the last frame of video sequence to process. This has effect only if it is positive and the source video sequence is an image folder.
[ INFO ] Execution successful

To run the demo, you can use public models or the following pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/person-detection-retail-0013
  • <INSTALL_DIR>/deployment_tools/intel_models/person-reidentification-retail-0031

For example, to run the application with the Intel Distribution of OpenVINO toolkit pre-trained models with inferencing pedestrian detector on a GPU and pedestrian reidentification on a CPU, run the following command:

./pedestrian_tracker_demo -i <path_video_file> \
                          -m_det <path_person-detection-retail-0013>/person-detection-retail-0013.xml \
                          -m_reid <path_person-reidentification-retail-0031>/person-reidentification-retail-0031.xml \
                          -d_det GPU

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The demo uses OpenCV to display the resulting frame with detections rendered as bounding boxes, curves (for trajectories displaying), and text.


Smart Classroom Demo

The demo shows an example of joint usage of several neural networks to detect three basic actions (sitting, standing, raising hand) and recognize people by their faces in a classroom environment. The demo uses the Async API for the action and face detection networks, which allows parallelizing the execution of face recognition and detection: while face recognition is running on one accelerator, face and action detection can be performed on another. The corresponding pre-trained models are delivered with the product:

  • face-detection-adas-0001, which is a primary detection network for finding faces.
  • landmarks-regression-retail-0009, which is executed on top of the results from the first network and outputs a vector of facial landmarks for each detected face.
  • face-reidentification-retail-0095, which is executed on top of the results from the first network and outputs a vector of features for each detected face.
  • person-detection-action-recognition-0003, which is a detection network for finding persons and simultaneously predicting their current actions.
How It Works

On the start-up, the application reads command line parameters and loads up to four networks to the Inference Engine for execution on different devices, depending on the -m... family of options. Upon getting a frame from the OpenCV VideoCapture, it performs inference of the Face Detection and Action Detection networks. After that, the ROIs obtained by the Face Detector are fed to the Facial Landmarks Regression network. The landmarks are then used to align the faces with an affine transform, and the aligned faces are fed to the Face Recognition network. The recognized faces are matched with the detected actions to find an action for each recognized person in each frame.

Creating a Gallery for Face Recognition

To recognize faces on a frame, the demo needs a gallery of reference images. Each image should contain a tight crop of a face. You can create the gallery from an arbitrary list of images:

  1. Put images containing tight crops of frontal-oriented faces to a separate empty folder. Each identity could have multiple images. Name images as id_name.0.png, id_name.1.png, ....
  2. Run the create_list.py <path_to_folder_with_images> command to get a list of files and identities in .json format.
Running

Running the application with the -h option yields the following usage message:

./smart_classroom_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

smart_classroom_demo [OPTION]
Options:

    -h                             Print a usage message.
    -i '<path>'                    Required. Path to a video or image file. Default value is "cam" to work with camera.
    -m_act '<path>'                Required. Path to the Person/Action Detection Retail model (.xml) file.
    -m_fd '<path>'                 Required. Path to the Face Detection Retail model (.xml) file.
    -m_lm '<path>'                 Required. Path to the Facial Landmarks Regression Retail model (.xml) file.
    -m_reid '<path>'               Required. Path to the Face Reidentification Retail model (.xml) file.
    -l '<absolute_path>'           Optional. For CPU custom layers, if any. Absolute path to a shared library with the kernels implementation.
          Or
    -c '<absolute_path>'           Optional. For GPU custom kernels, if any. Absolute path to an .xml file with the kernels description.
    -d_act '<device>'              Optional. Specify the target device for Person/Action Detection Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_fd '<device>'               Optional. Specify the target device for Face Detection Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_lm '<device>'               Optional. Specify the target device for Landmarks Regression Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_reid '<device>'             Optional. Specify the target device for Face Reidentification Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -out_v  '<path>'               Optional. File to write output video with visualization to.
    -pc                            Optional. Enables per-layer performance statistics.
    -r                             Optional. Output Inference results as raw values.
    -ad                            Optional. Output file name to save per-person action statistics in.
    -t_act                         Optional. Probability threshold for persons/actions detections.
    -t_fd                          Optional. Probability threshold for face detections.
    -inh_fd                        Optional. Input image height for face detector.
    -inw_fd                        Optional. Input image width for face detector.
    -exp_r_fd                      Optional. Expand ratio for bbox before face recognition.
    -t_reid                        Optional. Cosine distance threshold between two vectors for face reidentification.
    -fg                            Optional. Path to a faces gallery in .json format.
    -no_show                       Optional. Do not show processed video.
    -last_frame                    Optional. Last frame number to handle in demo. If negative, handle all input video.

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use public models or the following pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/face-detection-adas-0001
  • <INSTALL_DIR>/deployment_tools/intel_models/landmarks-regression-retail-0009
  • <INSTALL_DIR>/deployment_tools/intel_models/face-reidentification-retail-0095
  • <INSTALL_DIR>/deployment_tools/intel_models/person-detection-action-recognition-0003

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Example of a valid command line to run the application:

./smart_classroom_demo -m_act <path to the person/action detection retail model .xml file> \
                       -m_fd <path to the face detection retail model .xml file> \
                       -m_reid <path to the face reidentification retail model .xml file> \
                       -m_lm <path to the landmarks regression retail model .xml file> \
                       -fg <path to faces_gallery.json> \
                       -i <path to the input video>
Demo Output

The demo uses OpenCV to display the resulting frame with labeled actions and faces.


Super Resolution Demo

This topic demonstrates how to run the Super Resolution demo application, which reconstructs the high resolution image from the original low resolution one.

The corresponding pre-trained model is delivered with the product:

  • single-image-super-resolution-0034, which is the primary and only model that performs super resolution 4x upscale on a 200x200 image

For details on the model, please refer to the description in the deployment_tools/intel_models folder of the Intel Distribution of OpenVINO toolkit installation directory.

How It Works

On the start-up, the application reads command-line parameters and loads the specified network. After that, the application reads a 200x200 input image and performs 4x upscale using super resolution.

Running

Running the application with the -h option yields the following usage message:

./super_resolution_demo -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

super_resolution_demo [OPTION]
Options:

    -h                      Print a usage message.
    -i "<path>"             Required. Path to an image.
    -m "<path>"             Required. Path to an .xml file with a trained model.
    -pp "<path>"            Path to a plugin folder.
    -d "<device>"           Specify the target device to infer on (CPU, GPU, FPGA, or MYRIAD). The demo will look for a suitable plugin for the specified device.
    -ni "<integer>"         Number of iterations (default value is 1)
    -pc                     Enable per-layer performance report

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use public models or a pre-trained and optimized model delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/single-image-super-resolution-0034

To do inference on CPU using a trained model, run the following command:

./super_resolution_demo -i <path_to_image>/image.bmp -m <path_to_model>/model.xml

NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Demo Output

The application outputs a reconstructed high-resolution image and saves it in the current working directory as a .bmp file with the sr prefix.


Benchmark Application Demo

This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous and asynchronous.

NOTE: This topic describes usage of C++ implementation of the Benchmark Application.

How It Works

NOTE: To achieve benchmark results similar to the official published results, set CPU frequency to 2.9GHz and GPU frequency to 1GHz.

Upon the start-up, the application reads command-line parameters and loads a network and images to the Inference Engine plugin. The number of infer requests and execution approach depend on a mode defined with the -api command-line parameter.

Synchronous API

For synchronous mode, the primary metric is latency. The application creates one infer request and executes the Infer method. The number of executions is defined by one of two values:

  • Number of iterations defined with the -niter command-line argument
  • Predefined duration if -niter is skipped. Predefined duration value depends on device.

During the execution, the application collects two types of metrics:

  • Latency for each infer request executed with Infer method
  • Duration of all executions

The reported latency value is calculated as the mean of all collected latencies. The reported throughput value is derived from the reported latency and additionally depends on the batch size.
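
For example, with batch size B and mean latency L milliseconds, the reported throughput is on the order of B * 1000 / L frames per second. A minimal sketch of this derivation (illustrative only, not the application's actual code):

#include <numeric>
#include <vector>

// Mean latency over all collected per-request latencies, in milliseconds.
double meanLatencyMs(const std::vector<double>& latenciesMs) {
    return std::accumulate(latenciesMs.begin(), latenciesMs.end(), 0.0) / latenciesMs.size();
}

// Throughput derived from the mean latency and the batch size.
double syncThroughputFps(double meanLatency, int batchSize) {
    return batchSize * 1000.0 / meanLatency;
}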

Asynchronous API

For asynchronous mode, the primary metric is throughput in frames per second (FPS). The application creates a certain number of infer requests and executes the StartAsync method. The number of infer requests is specified with the -nireq command-line parameter. The number of executions is defined by one of two values:

  • Number of iterations defined with the -niter command-line argument
  • Predefined duration if -niter is skipped. Predefined duration value depends on device.

The infer requests are executed asynchronously. The Wait method is used to wait for a previous execution to complete. The application measures all infer request executions and reports the throughput metric based on the batch size and the total execution duration.
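
In other words, the asynchronous-mode throughput is roughly the total number of processed frames divided by the total execution time. A hedged sketch of this relation (an assumption based on the description above, not the application's exact formula):

// Total frames (iterations * batch size) divided by the total execution duration.
double asyncThroughputFps(int iterations, int batchSize, double totalDurationMs) {
    return iterations * batchSize * 1000.0 / totalDurationMs;
}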

Running

Running the application with the -h option yields the following usage message:

./benchmark_app -h
InferenceEngine:
        API version ............ <version>
        Build .................. <number>
[ INFO ] Parsing input parameters

benchmark_app [OPTION]
Options:

    -h                      Print a usage message
    -i "<path>"             Required. Path to a folder with images or to image files.
    -m "<path>"             Required. Path to an .xml file with a trained model.
    -pp "<path>"            Path to a plugin folder.
    -api "<sync/async>"     Required. Enable using sync/async API.
    -d "<device>"           Specify a target device to infer on: CPU, GPU, FPGA or MYRIAD. Use "-d HETERO:<comma separated devices list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -niter "<integer>"      Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
    -nireq "<integer>"      Optional. Number of infer requests (default value is 2).
    -l "<absolute_path>"    Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
          Or
    -c "<absolute_path>"    Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
    -b "<integer>"          Optional. Batch size value. If not specified, the batch size value is determined from IR.
  

Running the application with an empty list of options yields the usage message given above and an error message.

You can run the application with models that have a single four-dimensional input and support images as input, for example, the public AlexNet and GoogLeNet models that can be downloaded with the OpenVINO Model Downloader.

NOTE: Before running the application with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

For example, to perform inference on CPU in the synchronous mode and get estimated performance metrics for AlexNet model, run the following command:

./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api sync

For the asynchronous mode:

./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api async
Demo Output

The application output depends on the API used. For the synchronous API, the application outputs latency and throughput:

[ INFO ] Start inference synchronously (60000 ms duration)

[ INFO ] Latency: 37.91 ms
[ INFO ] Throughput: 52.7566 FPS

For the asynchronous API, the application outputs only throughput:

[ INFO ] Start inference asynchronously (60000 ms duration, 2 inference requests in parallel)

[ INFO ] Throughput: 48.2031 FPS

Validation Application

The Inference Engine Validation Application is a tool that allows you to infer deep learning models with a standard input and output configuration and to collect simple validation metrics for topologies. It supports the top-1 and top-5 metrics for Classification networks and the 11-point mAP metric for Object Detection networks.
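
For reference, the 11-point mAP metric is commonly computed with the PASCAL VOC-style interpolation (assumed here to match the tool): for each class, AP = (1/11) * sum over r in {0, 0.1, ..., 1.0} of the maximum precision at any recall r' >= r, and mAP is the mean of AP over all classes.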

NOTE: Before running the application on a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Possible use cases of the tool:

  • Check if the Inference Engine infers the public topologies well (the engineering team uses the Validation Application for regular testing)
  • Verify if a custom model is compatible with the default input/output configuration and compare its accuracy with the public models
  • Use Validation Application as another sample: although the code is much more complex than in classification and object detection samples, the source code is open and can be re-used
Validation Application Options

The Validation Application provides the following command-line interface (CLI):

Usage: validation_app [OPTION]

Available options:

    -h                        Print a help message
    -t <type>                 Type of an inferred network ("C" by default)
      -t "C" for classification
      -t "OD" for object detection
    -i <path>                 Required. Folder with validation images. Path to a directory with validation images. For Classification models, the directory must contain folders named as labels with images inside or a .txt file with a list of images. For Object Detection models, the dataset must be in VOC format.
    -m <path>                 Required. Path to an .xml file with a trained model
    -lbl <path>               Labels file path. The labels file contains names of the dataset classes
    -l <absolute_path>        Required for CPU custom layers. Absolute path to a shared library with the kernel implementations
    -c <absolute_path>        Required for GPU custom kernels. Absolute path to an .xml file with the kernel descriptions.
    -d <device>               Target device to infer on: CPU (default), GPU, FPGA, or MYRIAD. The application looks for a suitable plugin for the specified device.
    -b N                      Batch size value. If not specified, the batch size value is taken from IR
    -ppType <type>            Preprocessing type. Options: "None", "Resize", "ResizeCrop"
    -ppSize N                 Preprocessing size (used with ppType="ResizeCrop")
    -ppWidth W                Preprocessing width (overrides -ppSize, used with ppType="ResizeCrop")
    -ppHeight H               Preprocessing height (overrides -ppSize, used with ppType="ResizeCrop")
    --dump                    Dump file names and inference results to a .csv file

    Classification-specific options:
      -Czb true               "Zero is a background" flag. Some networks are trained with a modified dataset where the class IDs  are enumerated from 1, but 0 is an undefined "background" class (which is never detected)

    Object detection-specific options:
      -ODkind <kind>          Type of an Object Detection model. Options: SSD
      -ODa <path>             Required for Object Detection models. Path to a directory containing an .xml file with annotations for images.
      -ODc <file>             Required for Object Detection models. Path to a file containing a list of classes
      -ODsubdir <name>        Directory between the path to images (specified with -i) and image name (specified in the .xml file). For VOC2007 dataset, use JPEGImages.

The tool options are divided into two categories:

  • Common options named with a single letter or a word, such as -b or --dump. These options are the same in all Validation Application modes.
  • Network type-specific options named as an acronym of the network type (C or OD) followed by a letter or a word.
General Workflow

When executed, the Validation Application performs the following steps:

  1. Loads a model to an Inference Engine plugin
  2. Reads validation set (specified with the -i option):
    • If you specified a directory, the application tries to load labels first. To do this, it searches for a file with the same name as the model but with the .labels extension (instead of .xml). Then it searches the specified folder, detects its sub-folders named as known labels, and adds all images from these sub-folders to the validation set. When there are no such sub-folders, the validation set is considered empty.
    • If you specified a .txt file, the application reads this file expecting every line to be in the correct format. For more information about the format, refer to the Preparing the Dataset section below.
  3. Reads the batch size value specified with the -b option and loads this number of images to the plugin.

    NOTE: Image loading time is not a part of the inference time reported by the application.

  4. The plugin infers the model, and the Validation Application collects the statistics.

You can also dump inference results by specifying the --dump option; however, it generates a report only for Classification models. This CLI option enables creation (if possible) of an inference report in the .csv format.

The structure of the report is a set of lines, each of them contains semicolon-separated values:

  • Image path
  • A flag representing correctness of prediction
  • ID of Top-1 class
  • Probability that the image belongs to the Top-1 class, in percent
  • ID of Top-2 class
  • Probability that the image belongs to the Top-2 class, in percent

This is an example line from such a report:

"ILSVRC2012_val_00002138.bmp";1;1;8.5;392;6.875;123;5.875;2;5.5;396;5;

It means that the given image was predicted correctly. The most probable prediction is that this image represents class 1 with a probability of 0.085 (8.5%).
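
If you need to post-process a dump report, the semicolon-separated lines are easy to split. The sketch below parses the example line above; it is illustrative only and not part of the tool.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Splits one line of the --dump report into its semicolon-separated fields:
    // image path, correctness flag, then (class ID, probability in percent) pairs.
    std::vector<std::string> splitDumpLine(const std::string& line) {
        std::vector<std::string> fields;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ';')) {
            fields.push_back(field);
        }
        return fields;
    }

    int main() {
        std::string line = "\"ILSVRC2012_val_00002138.bmp\";1;1;8.5;392;6.875;123;5.875;2;5.5;396;5;";
        auto fields = splitDumpLine(line);
        bool correct = (fields[1] == "1");
        int topClass = std::stoi(fields[2]);
        double topProbability = std::stod(fields[3]) / 100.0;  // 8.5% -> 0.085
        std::cout << "correct=" << correct << " top1=" << topClass << " p=" << topProbability << "\n";
    }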

The next section shows how to use the Validation Application in classification mode to score a classification CNN on a set of images.

Prepare a Dataset

You must prepare the dataset before running the Validation Application. The dataset format depends on the type of the model you are going to validate. Make sure that the dataset format is applicable for the chosen model type.

Dataset Format for Classification: Folders as Classes

In this case, a dataset has the following structure:

|-- <path>/dataset
    |-- apron
        |-- apron1.bmp
        |-- apron2.bmp
    |-- collie
        |-- a_big_dog.jpg
    |-- coral reef
        |-- reef.bmp
    |-- Siamese
        |-- cat3.jpg

This structure means that each folder in dataset directory must have the name of one of the classes and contain all images of this class. In the given example, there are two images that represent the class apron, while three other classes have only one image each.

NOTE: A dataset can contain images of both .bmp and .jpg formats.

The correct way to use such dataset is to specify the path as -i <path>/dataset.

Dataset Format for Classification: List of Images (ImageNet*-like)

If you want to use this dataset format, create a single file with a list of images. In this case, the correct set of files must be similar to the following:

|-- <path>/dataset
    |-- apron1.bmp
    |-- apron2.bmp
    |-- a_big_dog.jpg
    |-- reef.bmp
    |-- cat3.jpg
    |-- labels.txt

Where labels.txt looks like:

apron1.bmp 411
apron2.bmp 411
cat3.jpg 284
reef.bmp 973
a_big_dog.jpg 231

Each line of the file must contain the name of the image and the ID of the class that it represents, in the format <image_name><tab><class_id>. For example, apron1.bmp represents the class with ID 411.

NOTE: A dataset can contain images of both .bmp and .jpg formats.

The correct way to use such dataset is to specify the path as -i <path>/dataset/labels.txt.
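
Because the annotation format is a simple name/ID pair per line, it is straightforward to read or generate programmatically. A minimal sketch that reads such a file (illustrative only; the Validation Application performs this parsing itself, and the helper name is hypothetical):

    #include <fstream>
    #include <iostream>
    #include <map>
    #include <string>

    // Reads an ImageNet-like annotation file with one "<image_name><tab><class_id>" pair per line.
    std::map<std::string, int> readLabels(const std::string& path) {
        std::map<std::string, int> labels;
        std::ifstream file(path);
        std::string imageName;
        int classId;
        while (file >> imageName >> classId) {  // image names must not contain spaces
            labels[imageName] = classId;
        }
        return labels;
    }

    int main() {
        auto labels = readLabels("labels.txt");
        std::cout << "apron1.bmp -> class " << labels["apron1.bmp"] << "\n";  // expects 411
    }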

Dataset Format for Object Detection (VOC-like)

Object Detection SSD models can be inferred on the original dataset that was used as a testing dataset during the model training. To prepare the VOC dataset, follow the steps below:

  1. Download the pre-trained SSD-300 model from the SSD GitHub* repository at https://github.com/weiliu89/caffe/tree/ssd.
  2. Download VOC2007 testing dataset:
    $ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    $ tar -xvf VOCtest_06-Nov-2007.tar
  3. Convert the model with the Model Optimizer.
  4. Create a proper .txt class file from the original labelmap_voc.prototxt. The new file must be in the following format:
    none_of_the_above 0
    aeroplane 1
    bicycle 2
    bird 3
    boat 4
    bottle 5
    bus 6
    car 7
    cat 8
    chair 9
    cow 10
    diningtable 11
    dog 12
    horse 13
    motorbike 14
    person 15
    pottedplant 16
    sheep 17
    sofa 18
    train 19
    tvmonitor 20

    Save this file as VOC_SSD_Classes.txt.

Validate Classification Models

Once you have prepared the dataset (refer to the Prepare a Dataset section above), run the following command to infer a classification model on the selected dataset:

./validation_app -t C -i <path_to_images_directory_or_txt_file> -m <path_to_classification_model>/<model_name>.xml -d <CPU|GPU>
Validate Object Detection Models

NOTE: The Validation Application was validated with the SSD CNN. Any network that can be inferred by the Inference Engine and has the same input and output format as the SSD CNN should be supported as well.

Once you have prepared the dataset (refer to the Prepare a Dataset section above), run the following command to infer an Object Detection model on the selected dataset:

./validation_app -d CPU -t OD -ODa "<path_to_VOC_dataset>/VOCdevkit/VOC2007/Annotations" -i "<path_to_VOC_dataset>/VOCdevkit" -m "<path_to_model>/vgg_voc0712_ssd_300x300.xml" -ODc "<path_to_classes_file>/VOC_SSD_Classes.txt" -ODsubdir JPEGImages
Understand Validation Application Output

During the validation process, you can see the interactive progress bar that represents the current validation stage. When it is full, the validation process is over, and you can analyze the output.

Key data from the output:

  • Network loading time - time spent on topology loading in ms
  • Model - path to a chosen model
  • Model Precision - precision of the chosen model
  • Batch size - specified batch size
  • Validation dataset - path to a validation set
  • Validation approach - type of the model: Classification or Object Detection
  • Device - device type

Below is an example output for Classification models, which reports the average infer time and the Top-1 and Top-5 metric values:

Average infer time (ms): 588.977 (16.98 images per second with batch size = 10)

Top1 accuracy: 70.00% (7 of 10 images were detected correctly, top class is correct)
Top5 accuracy: 80.00% (8 of 10 images were detected correctly, top five classes contain required class)

Below you can find the example output for Object Detection models:

Progress: [....................] 100.00% done
[ INFO ] Processing output blobs
Network load time: 27.70ms
Model: /home/user/models/ssd/withmean/vgg_voc0712_ssd_300x300/vgg_voc0712_ssd_300x300.xml
Model Precision: FP32
Batch size: 1
Validation dataset: /home/user/Data/SSD-data/testonly/VOCdevkit
Validation approach: Object detection network

Average infer time (ms): 166.49 (6.01 images per second with batch size = 1)
Average precision per class table:

Class   AP
1   0.796
2   0.839
3   0.759
4   0.695
5   0.508
6   0.867
7   0.861
8   0.886
9   0.602
10  0.822
11  0.768
12  0.861
13  0.874
14  0.842
15  0.797
16  0.526
17  0.792
18  0.795
19  0.873
20  0.773

Mean Average Precision (mAP): 0.7767

This output shows the resulting mAP metric value for the SSD300 model used to prepare the dataset. This value matches the result stated in the SSD GitHub* repository and in the original arXiv paper.


Calibration Tool

The Inference Engine Calibration Tool calibrates a given FP32 model so that it can be run in low-precision 8-bit integer mode while keeping the input data of this model in the original precision.
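
Conceptually, 8-bit inference maps the observed floating-point value range of each tensor onto the integer range through a scale factor collected during calibration. The sketch below shows only this general idea with a hypothetical per-tensor scale; it is not the Calibration Tool's actual algorithm, which is described in the 8-bit Inference Introduction.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Illustrative only: maps a float value to int8 using a scale derived from an
    // observed maximum absolute value (a simplified picture of low-precision inference).
    int8_t quantize(float value, float observedAbsMax) {
        float scale = observedAbsMax / 127.0f;        // hypothetical per-tensor scale
        float q = std::round(value / scale);
        q = std::min(127.0f, std::max(-128.0f, q));   // clamp to the int8 range
        return static_cast<int8_t>(q);
    }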

Calibration Tool Options

The core command-line options for the Calibration Tool are the same as for Validation Application. However, the Calibration Tool has the following specific options: -t, -subset, -output, and -threshold.

Running the Calibration Tool with the -h option yields the following usage message with all CLI options listed:

Usage: calibration_tool [OPTION]

Available options:

    -h                        Print a help message
    -t <type>                 Type of an inferred network ("C" by default)
      -t "C" to calibrate Classification network and write the calibrated network to IR
      -t "OD" to calibrate Object Detection network and write the calibrated network to IR
      -t "RawC" to collect only statistics for Classification network and write statistics to IR. With this option, a model is not calibrated. For calibration and statisctics collection, use "-t C" instead.
      -t "RawOD" to collect only statistics for Object Detection network and write statistics to IR. With this option, a model is not calibrated. For calibration and statisctics collection, use "-t OD" instead
    -i <path>                 Required. Path to a directory with validation images. For Classification models, the directory must contain folders named as labels with images inside or a .txt file with a list of images. For Object Detection models, the dataset must be in VOC format.
    -m <path>                 Required. Path to an .xml file with a trained model, including model name and extension.
    -lbl <path>               Labels file path. The labels file contains names of the dataset classes
    -l <absolute_path>        Required for CPU custom layers. Absolute path to a shared library with the kernel implementations.
    -c <absolute_path>        Required for GPU custom kernels. Absolute path to an .xml file with the kernel descriptions.
    -d <device>               Target device to infer on: CPU (default), GPU, FPGA, or MYRIAD. The application looks for a suitable plugin for the specified device.
    -b N                      Batch size value. If not specified, the batch size value is taken from IR
    -ppType <type>            Preprocessing type. Options: "None", "Resize", "ResizeCrop"
    -ppSize N                 Preprocessing size (used with ppType="ResizeCrop")
    -ppWidth W                Preprocessing width (overrides -ppSize, used with ppType="ResizeCrop")
    -ppHeight H               Preprocessing height (overrides -ppSize, used with ppType="ResizeCrop")
    --dump                    Dump file names and inference results to a .csv file
    -subset                   Number of pictures from the whole validation set to create the calibration dataset. Default value is 0, which stands for the whole provided dataset
    -output <output_IR>       Output name for calibrated model. Default is <original_model_name>_i8.xml|bin
    -threshold                Threshold for a maximum accuracy drop of the quantized model. Must be an integer number (in percent) without a percent sign. Default value is 1, which stands for an accepted accuracy drop of 1%
    -stream_output            Flag for printing progress as plain text. When used, the interactive progress bar is replaced with multiline output


    Classification-specific options:
      -Czb true               "Zero is a background" flag. Some networks are trained with a modified dataset where the class IDs  are enumerated from 1, but 0 is an undefined "background" class (which is never detected)

    Object detection-specific options:
      -ODkind <kind>          Type of an Object Detection model. Options: SSD
      -ODa <path>             Required for Object Detection models. Path to a directory containing an .xml file with annotations for images.
      -ODc <file>             Required for Object Detection models. Path to a file with a list of classes
      -ODsubdir <name>        Directory between the path to images (specified with -i) and image name (specified in the .xml file). For VOC2007 dataset, use JPEGImages.

The tool options are divided into two categories:

  1. Common options named with a single letter or a word, such as -b or --dump. These options are the same in all calibration tool modes.
  2. Network type-specific options named as an acronym of the network type (C or OD) followed by a letter or a word.
Calibrate a Classification Model

To calibrate a classification convolutional neural network (CNN) on a subset of images (first 2000 images) from the given dataset (specified with the -i option), run the following command:

./calibration_tool -t C -i <path_to_images_directory_or_txt_file> -m <path_to_classification_model>/<model_name>.xml -d <CPU|GPU> -subset 2000

The dataset must have the correct format. Classification models support two formats: folders named as labels that contain all images of the corresponding class, and the ImageNet*-like format with a .txt file that lists images and class IDs.

For more information on the structure of the datasets, refer to the Prepare a Dataset section of the Validation Application document.

If you decide to use a subset of the given dataset, use the ImageNet-like format instead of the folders-as-classes format. This produces a more accurate calibration because the subset is more likely to contain images representing different classes.

To run the tool, you can use classification models downloaded with the OpenVINO Model Downloader or other image classification models.

For example, to calibrate the trained Caffe* resnet-50 classification model, run the following command:

./calibration_tool -t C -m resnet-50.xml -i ILSVRC2012_val.txt -Czb false -ppType "ResizeCrop" -ppSize 342 -b 1 -d CPU -subset 2000

NOTE: Before running the tool on a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Calibrate Object Detection Model

This topic demonstrates how to run the Calibration Tool on the Object Detection CNN on a set of images. Please review the list of Object Detection models used for validation of the Calibration Tool in the 8-bit Inference Introduction. Any network that can be inferred with the Inference Engine and has the same input and output format as the SSD CNN should be supported as well.

Run SSD Network on the VOC dataset

Before you start calibrating the model, make sure your dataset is in the correct format. For more information, refer to the Prepare a Dataset section of the Validation Application document.

Once you have prepared the dataset, you can calibrate the model on it by running the following command:

./calibration_tool -d CPU -t OD -ODa "<path_to_image_annotations>/VOCdevkit/VOC2007/Annotations" -i "<path_to_image_directory>/VOCdevkit" -m "<path_to_model>/vgg_voc0712_ssd_300x300.xml" -ODc "<path_to_classes_list>/VOC_SSD_Classes.txt" -ODsubdir JPEGImages -subset 500

Text Detection Demo

The demo shows an example of using a single neural network to detect printed text rotated at any angle in various environments. The corresponding pre-trained model is delivered with the product:

  • text-detection-0001, which is a detection network for finding text.
How It Works

On start-up, the application reads command-line parameters and loads one network to the Inference Engine for execution. Upon getting an image, it performs text detection inference and prints the result as four points (x1, y1), (x2, y2), (x3, y3), (x4, y4) for each text bounding box.
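
For reference, each detection can be rendered by connecting the four returned points in order. A minimal OpenCV sketch (the function and variable names are illustrative, not taken from the demo source):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Draws one detected text region given its four corner points in order.
    void drawTextBox(cv::Mat& frame, const std::vector<cv::Point>& pts) {
        for (int i = 0; i < 4; ++i) {
            cv::line(frame, pts[i], pts[(i + 1) % 4], cv::Scalar(0, 255, 0), 2);
        }
    }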

Running

Running the application with the -h option yields the following usage message:

./text_detection_demo -h

text_detection_demo [OPTION]
Options:

    -h                           Print a usage message.
    -i "<path>"                  Required. Path to an image file.
    -m "<path>"                  Required. Path to the Text Detection model (.xml) file.
    -d "<device>"                Optional. Specify the target device to infer on: CPU, GPU, FPGA, or MYRIAD. The demo will look for a suitable plugin for a specified device.
    -l "<absolute_path>"         Optional. Absolute path to a shared library with the CPU kernels implementation for custom layers.
    -c "<absolute_path>"         Optional. Absolute path to the GPU kernels implementation for custom layers.
    -no_show                     Optional. If it is true, then detected text will not be shown on image frame. By default, it is false.
    -r                           Optional. Output Inference results as raw values.

Running the application with an empty list of options yields the usage message given above and an error message.

To run the demo, you can use the following pre-trained and optimized model delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/text-detection-0001

For example, use the following command line command to run the application:

./text_detection_demo -m <path_to_model> -i <path_to_image>
Demo Output

The demo uses OpenCV to display the resulting frame with detections rendered as bounding boxes.


LeNet Number Classifications Network Using Graph Builder API

This sample demonstrates how to execute inference using the Inference Engine Graph Builder API to build a network, using the LeNet classification network as an example. The Graph Builder API allows building a network "on the fly" from source code, so an XML file is not required. The sample uses one-channel ubyte pictures as input.

How It Works

Upon start-up, the sample reads command-line parameters and builds a network using the Graph Builder API and the passed weights file. Then the application loads the built network and an image to the Inference Engine plugin.

When inference is done, the application outputs inference results to the standard output stream.

Running

Running the application with the -h option yields the following usage message:

./lenet_network_graph_builder -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>

lenet_network_graph_builder [OPTION]
Options:

    -h                      Print a usage message.
    -m "<path>"             Path to a .bin file with weights for trained model
    -i "<path>"             Required. Path to image or folder with images
    -d "<device>"           Specify the target device to infer on this. Sample will look for a suitable plugin for device specified(default value is CPU)
    -pp "<path>"            Path to a plugin folder
    -pc                     Enables per-layer performance report
    -nt "<integer>"         Number of top results (default 10)
    -ni "<integer>"         Number of iterations (default 1)

Running the application with an empty list of options yields the usage message given above.

For example, to run inference on a ubyte image on a GPU, run the following command:

./lenet_network_graph_builder -i <path_to_image> -m <path_to_weights_file> -d GPU
Demo Output

By default, the application outputs top-10 inference results for each infer request. In addition to this, it provides throughput value measured in frames per second.


Perfcheck Sample

This topic demonstrates how to build and run the Perfcheck sample application, which estimates performance by calculating minimum, average, and maximum FPS.

How It Works

Upon start-up, the sample application reads command-line parameters and loads a network and its inputs from the given directory to the Inference Engine plugin. Then the application starts infer requests in asynchronous mode until the specified number of iterations is finished.

After the inference stage, the Perfcheck sample computes the total execution time, divides it into 10 intervals, and evaluates the minimum, average, and maximum FPS among these intervals.
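
In other words, the total run is split into 10 equal time slices and FPS is evaluated per slice. The sketch below shows that computation on hypothetical per-interval frame counts; it is not the sample's source code.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        // Hypothetical frames processed in each of the 10 intervals and the interval length.
        std::vector<int> framesPerInterval = {112, 111, 110, 113, 112, 111, 112, 110, 111, 112};
        double intervalSeconds = 0.9;  // total execution time / 10, illustrative value

        std::vector<double> fps;
        for (int frames : framesPerInterval) {
            fps.push_back(frames / intervalSeconds);
        }
        double minFps = *std::min_element(fps.begin(), fps.end());
        double maxFps = *std::max_element(fps.begin(), fps.end());
        double avgFps = std::accumulate(fps.begin(), fps.end(), 0.0) / fps.size();
        std::cout << "Min fps: " << minFps << "  Avg fps: " << avgFps << "  Max fps: " << maxFps << "\n";
    }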

Running

Running the application with the -h option yields the following usage message:

./perfcheck -h
[ INFO ] Inference Engine:
        API version ............ <version>
        Build .................. <number>

perfcheck [OPTIONS]
[OPTIONS]:
        -m                       <value>        Required. Path to an .xml file with a trained model.
        -h                                      Optional. Print a usage message.
        -d                       <value>        Optional. Specify the target device to infer on. Sample will look for a suitable plugin for device specified. Default value: CPU.
        -pp                      <value>        Optional. Path to a plugin folder.
        -l                       <value>        Optional. Required for CPU custom layers. Absolute path to a shared library with the kernels implementation.
        -c                       <value>        Optional. Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
        -inputs_dir              <value>        Optional. Path to a folder with images and binaries for inputs. Default value: ".".
        -config                  <value>        Optional. Path to a configuration file.
        -num_iterations          <value>        Optional. Specify number of iterations. Default value: 1000. Must be greater than or equal to 1000.
        -batch                   <value>        Optional. Specify batch. Default value: 1.
        -num_networks            <value>        Optional. Specify number of networks. Default value: 1. Must be less than or equal to 16.
        -num_requests            <value>        Optional. Specify number of infer requests. Default value depends on specified device.
        -num_fpga_devices        <value>        Optional. Specify number of FPGA devices. Default value: 1.

Running the application with an empty list of options yields an error message.

For example, you can use the following command to do inference on CPU on images from a folder using a trained Faster R-CNN network:

./perfcheck -m <path_to_model>/faster_rcnn.xml -inputs_dir <path_to_inputs> -d CPU

NOTE: Public models should be first converted to the Inference Engine format (.xml + .bin) using the Model Optimizer tool.

Sample Output

The application outputs performance statistics that show the total execution time (in milliseconds), number of iterations, batch size, and minimum, average, and maximum FPS.

Example of sample output:

[ INFO ] Inference Engine:
	API version ............ <version>
	Build .................. <number>
[ INFO ] Loading network files:
[ INFO ] 	<path_to_model_xml_file>
[ INFO ] 	<path_to_model_bin_file>
[ INFO ] Loading network 0
[ INFO ] All networks are loaded

Total time:     8954.61 ms
Num iterations: 1000
Batch:          1
Min fps:        110.558
Avg fps:        111.674
Max fps:        112.791

How to Integrate the Inference Engine in Your Application

  • This section covers the current API. For more information about the APIs, see the offline documentation included in your package. To locate the current API Developer Guide topics:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ where <INSTALL_DIR> is the directory in which the Intel® Distribution of OpenVINO™ toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application (current API) from the contents.
  • This document refers to APIs from previous releases as "legacy" API. It is best to stop using the legacy API since it will be removed in a future product release. To locate the legacy API Developer Guide topics:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® Distribution of OpenVINO™ toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application (legacy API) from the contents.
  • Complete API documentation is also in the full offline package documentation.
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® Distribution of OpenVINO™ toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Open Data Structures from the menu at the top of the screen.

Integrate the Inference Engine API with Your Application

This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.

Using the Inference Engine API in Your Code

The core libinference_engine.so library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:

  • InferenceEngine::PluginDispatcher - This class finds a suitable plugin for a specified device in given directories.
  • InferenceEngine::Blob, InferenceEngine::TBlob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputInfo, InferenceEngine::InputsDataMap
  • InferenceEngine::OutputsDataMap

The C++ Inference Engine API wraps the capabilities of the core library:

  • InferenceEngine::CNNNetReader
  • InferenceEngine::CNNNetwork
  • InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this interface. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
  • InferenceEngine::ExecutableNetwork
  • InferenceEngine::InferRequest

Integration Process

The integration process consists of the following steps:

  1. Load a plugin by creating an instance of InferenceEngine::InferenceEnginePluginPtr. Wrap it by creating an instance of InferenceEngine::InferencePlugin from the C++ Inference Engine API. Specify the plugin or let the Inference Engine choose it with InferenceEngine::PluginDispatcher.
    InferenceEnginePluginPtr engine_ptr = PluginDispatcher(pluginDirs).getSuitablePlugin(TargetDevice::eGPU);
    InferencePlugin plugin(engine_ptr);
  2. Create an Intermediate Representation reader by creating an instance of InferenceEngine::CNNNetReader and read a model Intermediate Representation (IR):
    CNNNetReader network_reader;
    network_reader.ReadNetwork("Model.xml");
    network_reader.ReadWeights("Model.bin");
  3. Configure input and output. Request input and output information using the InferenceEngine::CNNNetReader::getNetwork(), InferenceEngine::CNNNetwork::getInputsInfo(), and InferenceEngine::CNNNetwork::getOutputsInfo() methods:
    auto network = network_reader.getNetwork();
    /** Taking information about all topology inputs **/
    InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
    /** Taking information about all topology outputs **/
    InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());
    

    Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the Supported Devices section to choose the relevant configuration:

    /** Iterating over all input info**/
    for (auto &item : input_info) {
        auto input_data = item.second;
        input_data->setPrecision(Precision::U8);
        input_data->setLayout(Layout::NCHW);
    }
    /** Iterating over all output info**/
    for (auto &item : output_info) {
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);
        output_data->setLayout(Layout::NC);
    }
    

    If you skip this step, the following default values are set:

    • Input and output precision - Precision::FP32
    • Input layout - Layout::NCHW
    • Output layout depends on the number of its dimensions:

      Number of Dimensions:    5        4       3      2     1
      Layout:                  NCDHW    NCHW    CHW    NC    C
  4. Load the model to the plugin using InferenceEngine::InferencePlugin::LoadNetwork():
    auto executable_network = plugin.LoadNetwork(network, {});

    It creates an executable network from the network object. The executable network is associated with a single hardware device. You can create as many networks as needed and use them simultaneously (up to the limit of the hardware resources). The second parameter is a configuration for the plugin: a map of pairs (parameter name, parameter value). Refer to your device in the Supported Devices section for details about supported configuration parameters:

    /** Optional config. E.g. this enables profiling of performance counters. **/
    std::map<std::string, std::string> config = {{ PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }};
    auto executable_network = plugin.LoadNetwork(network, config);
  5. Create an infer request using the InferenceEngine::ExecutableNetwork::CreateInferRequest() method:
    auto infer_request = executable_network.CreateInferRequest();
    
  6. Prepare input. There are three options to prepare input:
    • Optimal way for single network. Get blobs allocated by infer request using InferenceEngine::InferRequest::GetBlob() and feed an image and the input data to the blobs:
      /** Iterating over all input blobs **/
      for (auto & item : input_info) {
          auto input_name = item.first;
          /** Getting input blob **/
          auto input = infer_request.GetBlob(input_name);
          /** Fill input tensor with planes. First b channel, then g and r channels **/
          ...
      }
      
    • Optimal way for a cascade of networks (the output of one network is the input for another). Get the output blob from the first request using InferenceEngine::InferRequest::GetBlob() and set it as input for the second request using InferenceEngine::InferRequest::SetBlob():
      auto output = infer_request1.GetBlob(output_name);
      infer_request2.SetBlob(input_name, output);
      
    • Allocate input blobs of the appropriate types, feed an image and the input data to the blobs and call InferenceEngine::InferRequest::SetBlob() to set these blobs for infer request:
      /** Iterating over all input blobs **/
      for (auto & item : input_info) {
          auto input_data = item.second;
          /** Creating input blob **/
          InferenceEngine::TBlob<unsigned char>::Ptr input;
          // assuming input precision was set to U8 in the previous step
          input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(InferenceEngine::Precision::U8, input_data->getDims());
          input->allocate();
          infer_request.SetBlob(item.first, input);
          /** Fill input tensor with planes. First b channel, then g and r channels **/
          ...
      }

    The SetBlob() method compares the precision and layout of the blob with the corresponding precision and layout defined in step 3 and throws an exception if they do not match. The blob can be filled either before or after the SetBlob() call.

  7. Do inference by calling the InferenceEngine::InferRequest::StartAsync and InferenceEngine::InferRequest::Wait methods for asynchronous request:
    infer_request.StartAsync();
    infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
    

    or by calling the InferenceEngine::InferRequest::Infer method for synchronous request:

    sync_infer_request.Infer();
    

    StartAsync returns immediately and starts inference without blocking the main thread, while Infer blocks the main thread and returns when inference is completed. Call Wait to wait for the result of an asynchronous request to become available.

    There are three ways to use Wait (a combined example is shown after these integration steps):

    • Specify the maximum duration in milliseconds to block for. The method blocks until the specified timeout has elapsed or the result becomes available, whichever comes first.
    • InferenceEngine::IInferRequest::WaitMode::RESULT_READY - Waits until the inference result becomes available
    • InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY - Immediately returns the request status. It does not block or interrupt the current thread.

    Both requests are thread-safe: they can be called from different threads without risking corruption or failures.

    Multiple requests for a single ExecutableNetwork are executed sequentially, one by one, in FIFO order.

    While a request is ongoing, all its methods except InferenceEngine::InferRequest::Wait throw an exception.

  8. Go over the output blobs and process the results. Note that casting Blob to TBlob via std::dynamic_pointer_cast is not recommended; it is better to access the data via the buffer() and as() methods as follows:
    for (auto &item : output_info) {
        auto output_name = item.first;
        auto output = infer_request.GetBlob(output_name);
        {
            auto const memLocker = output->cbuffer(); // use const memory locker
            // output_buffer is valid as long as memLocker is alive
            const float *output_buffer = memLocker.as<const float *>();
            /** output_buffer[] - accessing output blob data **/
        }
    }
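
The sketch below ties the Wait modes from step 7 together: it starts an asynchronous request, polls the status without blocking, optionally blocks with a timeout, and finally blocks until the result is ready. It assumes the infer_request object created in step 5; error handling and output processing are omitted.

    // Start the request and return immediately (step 7).
    infer_request.StartAsync();

    // Poll the status without blocking or interrupting the current thread.
    auto status = infer_request.Wait(InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY);

    // Block for at most 100 ms: returns on timeout or when the result is ready, whichever comes first.
    status = infer_request.Wait(100);

    // Block until the inference result becomes available, then read the outputs (step 8).
    infer_request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);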

Building Your Application

For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory: <INSTALL_DIR>/deployment_tools/inference_engine/samples/

Running the Application

Before running compiled binary files, make sure your application can find the Inference Engine libraries. On Linux* operating systems, including Ubuntu* and CentOS*, the LD_LIBRARY_PATH environment variable is usually used to specify directories to be looked for libraries. You can update the LD_LIBRARY_PATH with paths to the directories in the Inference Engine installation directory where the libraries reside.

Add a path to the directory containing the core and plugin libraries:

  • For the Inference Engine installed within the Intel Distribution of OpenVINO toolkit package:
    $ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
  • For Intel® Deep Learning Deployment Toolkit installation:
    $ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH

Add paths to the directories containing the required third-party libraries:

  • For Inference Engine installed within the Intel Distribution of OpenVINO toolkit package:
    $ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH
    $ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
  • For Intel® Deep Learning Deployment Toolkit installation:
    $ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/mklml_lnx/lib:$LD_LIBRARY_PATH
    $ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/cldnn/lib:$LD_LIBRARY_PATH

Alternatively, you can use the following scripts that reside in the Inference Engine directory of the Intel Distribution of OpenVINO toolkit and Intel® Deep Learning Deployment Toolkit installation folders respectively:

  • /opt/intel/computer_vision_sdk_<version>/bin/setupvars.sh
  • /opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/bin/setvars.sh

To run compiled applications on Windows* OS, make sure that the Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and that the <INSTALL_DIR>/bin/intel64/Release/*.dll files are placed in the application folder or are accessible via the %PATH% environment variable.


Integration With the Legacy API

NOTE: The subject of this section is Legacy APIs. Legacy APIs are deprecated and will be removed in a future release. It is best to use the current APIs.

This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.

Using the Inference Engine API in Your Code

The core libinference_engine.so library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:

  • InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this method. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
  • InferenceEngine::PluginDispatcher - This class finds the suitable plugin for a specified device in given directories.
  • InferenceEngine::CNNNetReader
  • InferenceEngine::CNNNetwork
  • InferenceEngine::Blob, InferenceEngine::TBlob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputInfo, InferenceEngine::InputsDataMap

The Integration Process

  1. Load a plugin by creating an instance of InferenceEngine::InferenceEnginePluginPtr.
  2. Specify the plugin or let the Inference Engine choose it with InferenceEngine::PluginDispatcher. See the selectPlugin() function in the samples.
    InferenceEngine::PluginDispatcher dispatcher(pluginDirs);
    InferenceEngine::InferenceEnginePluginPtr enginePtr(dispatcher.getSuitablePlugin(TargetDevice::eCPU));
  3. Create an Intermediate Representation reader by creating an instance of InferenceEngine::CNNNetReader and read a model Intermediate Representation:
    auto netBuilder = new InferenceEngine::CNNNetReader();
    netBuilder->ReadNetwork("Model.xml");
    netBuilder->ReadWeights("Model.bin");
  4. Request information about inputs (an image and any other input data required) using the InferenceEngine::CNNNetReader::getNetwork() and InferenceEngine::CNNNetwork::getInputsInfo() methods. Set the input number format (precision) using InferenceEngine::InputInfo::setInputPrecision to match the input data format (precision). Allocate input blobs of the appropriate types and feed an image and the input data to the blobs:
    /** Taking information about all topology inputs **/
    InferenceEngine::InputsDataMap inputInfo(netBuilder->getNetwork().getInputsInfo());
    /** Stores all input blobs data **/
    InferenceEngine::BlobMap inputBlobs;
    /** Iterating over all input blobs **/
    for (auto & item : inputInfo) {
        /** Creating input blob **/
        item.second->setInputPrecision(Precision::U8);
        InferenceEngine::TBlob<unsigned char>::Ptr input;
        input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(Precision::U8, item.second->getDims());
        input->allocate();
        inputBlobs[item.first] = input;
        /** Fill input tensor with planes. First b channel, then g and r channels **/
        ...
    }
  5. Request information about outputs, using the InferenceEngine::CNNNetReader::getNetwork() and InferenceEngine::CNNNetwork::getOutputsInfo() methods. Allocate output blobs of the appropriate types:
    InferenceEngine::OutputsDataMap outputInfo(netBuilder->getNetwork().getOutputsInfo());
    InferenceEngine::BlobMap outputBlobs;
    for (auto & item : outputInfo) {
        InferenceEngine::TBlob<float>::Ptr output;
        output = InferenceEngine::make_shared_blob<float, InferenceEngine::SizeVector>(Precision::FP32, item.second->dims);
        output->allocate();
        outputBlobs[item.first] = output;
    }
  6. Load the model to the plugin using InferenceEngine::IInferencePlugin::LoadNetwork():
    InferenceEngine::StatusCode status = enginePtr->LoadNetwork(netBuilder->getNetwork(), &resp);
    if (status != InferenceEngine::OK) {
        throw std::logic_error(resp.msg);
    }
  7. Do inference by calling the InferenceEngine::IInferencePlugin::Infer method:
    enginePtr->Infer(inputBlobs, outputBlobs, &resp);
    
  8. Go over the output blobs and process the results.
    /** Pointer to the output blob **/
    const TBlob<float>::Ptr fOutput = std::dynamic_pointer_cast<TBlob<float>>(outputBlobs.begin()->second);
    /** fOutput->data()[] - accessing output blob data **/

Building Your Application

For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory.

Running the Application

Before running compiled binary files:

Make sure your application can find the Inference Engine libraries. On Linux* operating systems, the LD_LIBRARY_PATH environment variable specifies the library directories.

Update LD_LIBRARY_PATH with directory paths under the Inference Engine installation directory in which the libraries reside.

Add a path to the directory containing the core and plugin libraries:

  • For Inference Engine installed within the Intel® Distribution of OpenVINO™ toolkit package:
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH

Add paths to the directories containing the required third-party libraries:

  • For Inference Engine installed within the Intel® Distribution of OpenVINO™ toolkit package:
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
    

As an alternative, use scripts under the Inference Engine directory for the Intel® Distribution of OpenVINO™ toolkit installation:

<INSTALL_DIR>/bin/setupvars.sh

To run compiled applications on Microsoft* Windows* OS, make sure that Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and <INSTALL_DIR>\deployment_tools\inference_engine\bin\intel64\Release\*.dll files are in the application directory or accessible through the PATH environment variable.

Adding Your Own Kernels in the Inference Engine

A layer is a CNN building block implemented in the training framework, for example, Convolution in Caffe*. A kernel is the corresponding implementation in the Inference Engine.

Plug your kernel implementations into the Inference Engine and map them to the layers in the original framework. See the Model Optimizer Developer Guide for information about how a mapping between framework's layers and Inference Engine kernels is registered.

The rest of the section covers custom kernels and how to integrate them into the Inference Engine.

Example of Custom Kernels Support in the Samples

Every sample uses the Inference Engine API to load custom kernels depending on the device type. Specifically, for the CPU, this is a shared library that exports a certain interface that registers the kernels. For GPU or MYRIAD, it is an .xml file that lists the kernels along with the parameters that the kernels accept and how these map to the specific Intermediate Representation values.

Example Custom Kernels

The extension folder in the samples directory comes with a few real examples of CPU-targeted kernels, for example, DetectionOutput (used in SSD*).

The GPU-targeted kernels are bundled with the binaries when the samples are compiled so that the sample applications can load them easily. See the cldnn_global_custom_kernels directory in the GPU plugin installation directory.

How to Implement Custom GPU Layers

You must provide the kernel code in OpenCL C and a configuration file that connects the kernel and its parameters to the parameters of the layer.

You have two options for using the custom layer configuration file:

  • Include a section with your kernels into the global auto-loading file cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml
  • Provide a separate configuration file and load it using the IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value, before loading the network that features the custom layers:
    // Load the Intel® Integrated Graphics plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin({…, "GPU"}));
    InferencePlugin plugin(plugin_ptr);
    // Load the Intel® Integrated Graphics Extensions
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});

For details about the configuration parameters and OpenCL kernel, see the Custom Layers Support in Inference Engine.

How to Implement Custom CPU Layers

The instructions below are a brief summary of the CPU Layers section in the Custom Layers support in Inference Engine.

For more details, see the sample source.

  1. Create a custom layer factory CustomLayerFactory class.
    // custom_layer.h
    // A CustomLayerFactory class is an example layer that squares the input values (raises each to the power of 2) and does not change the dimensions
    class CustomLayerFactory {
    };
  2. Inherit it from the abstract class InferenceEngine::ILayerImplFactory:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    };
  3. Create a constructor, a virtual destructor, and a data member to keep the layer info:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}
    private:
        CNNLayer cnnLayer;
    };
  4. Overload and implement the abstract methods (getShapes, getImplementations) of the InferenceEngine::ILayerImplFactory class
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        // ... constructor and destructor
        StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
            if (cnnLayer == nullptr) {
                std::string errorMsg = "Cannot get cnn layer!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            if (inShapes.size() != 1) {
                std::string errorMsg = "Incorrect input shapes!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            outShapes.clear();
            outShapes.emplace_back(inShapes[0]);
            return OK;
        }
        StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
            // You can pass cnnLayer to the implementation if it is necessary.
            impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl()));
            return OK;
        }
    };
  5. Create your custom layer implementation CustomLayerImpl class:
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl {
    };
  6. Because the layer uses the execute method to change data, inherit it from the abstract class InferenceEngine::ILayerExecImpl, and overload and implement the abstract methods of this class.
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl: public ILayerExecImpl {
    public:
        explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}
        StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;
        StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;
        StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;
    private:
        CNNLayer cnnLayer;
    };
  7. Implement getSupportedConfigurations, which returns all supported configurations for this implementation. To specify data formats, use InferenceEngine::TensorDesc:
    // custom_layer.cpp
    virtual StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {
        try {
            // This layer can be in-place but not constant!!!
            if (cnnLayer == nullptr)
                THROW_IE_EXCEPTION << "Cannot get cnn layer";
            if (cnnLayer->insData.size() != 1 || cnnLayer->outData.empty())
                THROW_IE_EXCEPTION << "Incorrect number of input/output edges!";
            LayerConfig config;
            DataPtr dataPtr = cnnLayer->insData[0].lock();
            if (!dataPtr)
                THROW_IE_EXCEPTION << "Cannot get input data!";
            DataConfig dataConfig;
            dataConfig.inPlace = -1;
            dataConfig.constant = false;
            SizeVector order;
            for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            // Planar formats for N dims
            dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),
                                         dataPtr->getTensorDesc().getDims(),
                                         {dataPtr->getTensorDesc().getDims(), order});
            config.inConfs.push_back(dataConfig);
            DataConfig outConfig;
            outConfig.constant = false;
            outConfig.inPlace = 0;
            order.clear();
            for (size_t i = 0; i < cnnLayer->outData[0]->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            outConfig.desc = TensorDesc(cnnLayer->outData[0]->getTensorDesc().getPrecision(),
                                        cnnLayer->outData[0]->getDims(),
                                        {cnnLayer->outData[0]->getDims(), order});
            config.outConfs.push_back(outConfig);
            config.dynBatchSupport = 0;
            conf.push_back(config);
            return OK;
        } catch (InferenceEngine::details::InferenceEngineException& ex) {
            std::string errorMsg = ex.what();
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
    }
  8. Implement the init and execute methods. init is needed to retrieve the selected configuration and check its parameters:
    // custom_layer.cpp
    virtual StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {
        StatusCode rc = OK;
        if (config.dynBatchSupport) {
            config.dynBatchSupport = 0;
            rc = NOT_IMPLEMENTED;
        }
        for (auto& input : config.inConfs) {
            if (input.inPlace >= 0) {
                input.inPlace = -1;
                rc = NOT_IMPLEMENTED;
            }
            for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (input.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {
                if (input.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        for (auto& output : config.outConfs) {
            if (output.inPlace < 0) {
                // NOT in-place
            }
            for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (output.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {
                if (output.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        return rc;
    }
    virtual StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {
        if (inputs.size() != 1 || outputs.empty()) {
            std::string errorMsg = "Incorrect number of input or output edges!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        const float* src_data = inputs[0]->buffer();
        float* dst_data = outputs[0]->buffer();
        for (size_t o = 0; o < outputs[0]->size(); o++) {
            if (dst_data == src_data) {
                dst_data[o] *= dst_data[o];
            } else {
                dst_data[o] = src_data[o] * src_data[o];
            }
        }
        return OK;
    }
    }
  9. Create a factory for your own primitives, inherited from the abstract class InferenceEngine::IExtension
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    };
    Implement the utility methods Unload, Release, SetLogCallback:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // could be used to cleanup resources
        void Unload() noexcept override {
        }
        // is used when destruction happens
        void Release() noexcept override {
            delete this;
        }
        // logging is used to track what is going on inside
        void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
    };
  10. Implement the utility method GetVersion:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    private:
        static InferenceEngine::Version ExtensionDescription = {
            {1, 0},             // extension API version
            "1.0",
            "CustomExtention"   // extension description message
        };
    public:
        // gets extension version information
        void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
            versionInfo = &ExtensionDescription;
        }
    };
    Implement main extension methods:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // ... utility methods
        StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
            std::string type_name = "CustomLayer";
            types = new char *[1];
            size = 1;
            types[0] = new char[type_name.size() + 1];
            std::copy(type_name.begin(), type_name.end(), types[0]);
            types[0][type_name.size()] = '\0';
            return OK;
        }
        StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
            if (cnnLayer->type != "CustomLayer") {
                std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return NOT_FOUND;
            }
            factory = new CustomLayerFactory(cnnLayer);
            return OK;
        }
    };
  11. To use your custom layers, compile the code as a shared library, and then use the AddExtension method of the general plugin interface to load your primitives:
    auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<shared lib path>");
    // Add the extension to the plugin's list
    plugin.AddExtension(extension_ptr);

How to Implement Custom MYRIAD Layers

  1. Since the OpenCL™ toolchain for MYRIAD supports only offline compilation, OpenCL C code should first be compiled using the standalone clc compiler with the following command:
    ./clc --strip-binary-header custom_layer.cl -o custom_layer.bin
  2. Write a configuration file with a kernel parameter description and bindings.

    For example, for the following OpenCL kernel signature:

    __kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);

    the configuration file might look like the following:

    <CustomLayer name="ReorgYolo" type="MVCL" version="1">
        <Kernel entry="reorg_nhwc">
            <Source filename="reorg.bin"/>
        </Kernel>
        <Parameters>
            <Tensor arg-name="src"    type="input"  port-index="0"                format="BYXF"/>
            <Tensor arg-name="out"    type="output" port-index="0"                format="BYXF"/>
            <Scalar arg-name="w"      type="int"    port-index="0" source="I.X"                />
            <Scalar arg-name="h"      type="int"    port-index="0" source="I.Y"                />
            <Scalar arg-name="c"      type="int"    port-index="0" source="I.F"                />
            <Scalar arg-name="stride" type="int"                   source="stride"             />
        </Parameters>
        <WorkSizes dim="input,0" global="(Y+7)/8*8,1,1" local="8,1,1"/>
    </CustomLayer>

    Each custom layer is described with a CustomLayer node. The following nodes and attributes must be specified:

    • Root node CustomLayer:
      • Attribute name is the name of the Inference Engine layer to bind the kernel with.
      • Attributes type and version. Leave them as MVCL and 1 for now.
    • Sub-node Kernel:
      • Attribute entry, which is the name of the kernel function as it is written in the source file (reorg_nhwc in the example above)
      • Node Source with attribute filename, which is the path to the compiled binary relative to the .xml binding file
    • Sub-node Parameters, which describes the parameter bindings.
    • Sub-node WorkSizes, which describes the local and global work group sizes and the source for dimension deduction as a direction,port pair. The example above describes a work group relative to the dimensions of the input tensor that comes through port 0 of the IR. Any simple math expression with +, -, *, / and () over B (batch), Y (height), X (width), and F (channels) is supported for the global and local work group configuration.

    The parameter description format is the following:

    • Tensor and Scalar nodes are supported
    • Each Tensor node must contain the following attributes:
      • arg-name, which is the name of the kernel parameter in the kernel signature
      • type, which is input or output as in the IR
      • port-index, which is the number of the input/output port as in the IR
      • format, which specifies the channel order in the tensor. Optional repacks are generated if the custom layer format is not compatible with the formats of neighboring layers.
    • Each Scalar node must contain the following attributes:
      • arg-name, which is the name of the kernel parameter in the kernel signature
      • type, which is int or float and is used for correct argument extraction from the IR parameters
      • source, which might contain the name of a parameter in the IR file or an input/output reference (I/O, In/On, where n is a port number) followed by a dimension: B (batch), Y (height), X (width), or F (channels).
  3. Provide a separate configuration file and load it using the IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value, before loading the network that features the custom layers:
    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr("libmyriadPlugin.so");
    InferencePlugin plugin(plugin_ptr);
    // Load custom layers
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});

    Optionally, you can set the path to the custom layers description by passing the VPU_CUSTOM_LAYERS key with /path/to/your/customLayers.xml as its value in the network configuration:

    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr myriad("libmyriadPlugin.so");
    std::map<std::string, std::string> networkConfig;
    networkConfig["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml";
    // Load custom layers in the network config
    IECALL(myriad->LoadNetwork(exeNetwork, cnnNetwork, networkConfig, &resp));

    NOTE: If both native and custom layer implementations are present, the custom kernel takes priority over the native implementation.

Cross Check Tool

Cross Check Tool is a console application that enables comparing accuracy and performance metrics for two successive model inferences that are performed on two different supported Intel® devices or with different precisions. The Cross Check Tool can compare metrics per layer or all over the model.

On Linux* OS, before running the Cross Check Tool binary, make sure your application can find the Deep Learning Inference Engine libraries. Navigate to the <INSTALL_DIR>/deployment_tools/inference_engine/bin folder and run the setvars.sh script to set all necessary environment variables:

source setvars.sh

Running the Cross Check Tool

Cross Check Tool is distributed as a binary file, so there is no need to build it. To run the Cross Check Tool, execute the tool's binary file with the necessary parameters. Note that the Inference Engine assumes that the weights file is in the same folder as the .xml file.

You can get the list of all available options using the -h option:

$./cross_check_tool -h
InferenceEngine:
  API version ............ 1.0
  Build .................. ###
[ INFO ] Parsing input parameters

./cross_check_tool [OPTION]
Options:

    -h                     Prints a usage message.
    -i "<path>"            Optional. Path to an input image file or multi-input file to infer. Generates input(s) from normal distribution if empty
    -m "<path>"            Required. Path to an .xml file that represents the first IR of the trained model to infer.
      -l "<absolute_path>" Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels implementation.
          Or
      -c "<absolute_path>" Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels description.
    -conf "<path>"         Optional. Path to config file for -d device plugin
    -ref_conf "<path>"     Optional. Path to config file for -ref_d device plugin
    -pp "<path>"           Optional. Path to a plugin folder.
    -d "<device>"          Required. The first target device to infer the model specified with the -m option. CPU, GPU, HDDL or MYRIAD is acceptable.
    -ref_m "<path>"        Optional. Path to an .xml file that represents the second IR in different precision to compare the metrics.
    -ref_d "<device>"      Required. The second target device to infer the model and compare the metrics. CPU, GPU, HDDL or MYRIAD is acceptable.
    -layers "<options>"    Defines layers to check. Accepted values: all (check all layers), None (check output layers only), or a comma-separated list of layer names. Default value is None.
    -eps "<float>"         Optional. Threshold for filtering out those blob statistics that do not satisfy the condition: max_abs_diff < eps.
    -dump                  Enables blobs statistics dumping
    -load "<path>"         Path to a file to load blobs from

Examples

  • To check per-layer accuracy and performance of inference in FP32 precision on the CPU against the GPU, run:
    ./cross_check_tool -i <path_to_input_image_or_multi_input_file> \
             -m <path_to_FP32_xml>    \
             -d CPU                   \
             -ref_d GPU               \
             -layers all
                 

    The output looks as follows:

    InferenceEngine:
      API version ............ 1.0
      Build .................. ###
    [ INFO ] Parsing input parameters
        The same IR on both devices: <path_to_IR>
    
    [ INFO ] No extensions provided
    
      API version ............ 1.0
      Build .................. lnx_20180510
      Description ....... MKLDNNPlugin
    
      API version ............ 0.1
      Build .................. ci-main-03659
      Description ....... clDNNPlugin
    [ INFO ] Inputs detected: Placeholder
    [ INFO ] Statistics will be dumped for X layers: <layer_1_name>, <layer_2_name>, ... , <layer_X_name>
    [ INFO ] Layer <layer_1_name> statistics
        Max absolute difference: 1.52588e-05
        Min absolute difference: 0
        Max relative difference: 0.000288028%
        Min relative difference: 0%
                      Blob size: 1000
    
                        Devices:            CPU_FP32            GPU_FP32
                         Status:            EXECUTED            EXECUTED
                     Layer type:             Reshape             Reshape
            Real time, microsec:                  20                 154
                 Execution type:             unknown                 GPU
                  Number of NAN:                   0                   0
                  Number of INF:                   0                   0
                 Number of ZERO:                   0                   0
    ...
    <list_of_layer_statistics>
    ...
    
    [ INFO ] Overall max absolute difference 2.81334e-05 was reached by <layer_name> layer
    [ INFO ] Overall min absolute difference 0 was reached by <layer_name> layer
    [ INFO ] Overall max relative difference 0.744893% was reached by <layer_name> layer
    [ INFO ] Overall min relative difference -2.47948% was reached by <layer_name> layer
    [ INFO ] Execution successful
          
  • To check the overall accuracy and performance of inference on the CPU in FP32 precision against the Intel® Movidius™ Myriad™ device in FP16 precision, run:
    ./cross_check_tool -i <path_to_input_image_or_multi_input_file> \
             -m <path_to_FP16_xml>    \
             -ref_d CPU               \
             -ref_m <path_to_FP32_xml>\
             -d MYRIAD                \

    The output looks as follows:

            InferenceEngine:
              API version ............ 1.0
              Build .................. ###
            
            [ INFO ] Parsing input parameters
            [ INFO ] MYRIAD vs CPU
                IR for MYRIAD : <path_to_FP16_xml>
                IR for CPU : <path_to_FP32_xml>
            
            [ INFO ] No extensions provided
            [ INFO ] Loading plugins
            
              API version ............ 0.1
              Build .................. ###
              Description ....... myriadPlugin
            
            
              API version ............ 1.0
              Build .................. ###
              Description ....... MKLDNNPlugin
            
            [ INFO ] Inputs detected: <list_of_input_layers>
            [ INFO ] Statistics will be dumped for 1 layers: <output_layer_name(s)>
            [ INFO ] Layer <output_layer_name> statistics
                Max absolute difference: 0.003889
                Min absolute difference: 2.49778e-12
                Max relative difference: 290.98%
                Min relative difference: 0.0327804%
                                Devices:         MYRIAD_FP16            CPU_FP32
                    Real time, microsec:        69213.978946         4149.904940
            [ INFO ] Execution successful
            
  • To dump layer statistics from specific list of layers, run:
    ./cross_check_tool -i <path_to_input_image_or_multi_input_file> \
             -m <path_to_FP16_xml>                        \
             -d MYRIAD                                    \
             -dump                                        \
             -layers <comma_separated_list_of_layers>

    The output looks as follows:

      InferenceEngine:
        API version ............ 1.0
        Build .................. ###
      [ INFO ] Blob and statistics dumping enabled
      [ INFO ] No extensions provided
      
        API version ............ 0.1
        Build .................. custom_releases/cvsdk-2018-r2_e28ec0278fb749d6b999c688a8e90a8a25c0f2b5
        Description ....... myriadPlugin
      
      [ INFO ] Inputs detected: <list_of_input_layers>
      [ INFO ] Statistics will be dumped for X layers: <comma_separated_list_of_layers>
      [ INFO ] Dump path: <path_where_dump_will_be_saved>
      [ INFO ] <layer_1_name> layer processing
      ...
      [ INFO ] <layer_X_name> layer processing
      [ INFO ] Execution successful
      

    If you do not provide the -i key, the Cross Check Tool generates an input from normally distributed noise and saves it in a multi-input file format with the filename <path_to_xml>_input_layers_dump.txt in the same folder as the IR.

  • To check the overall accuracy and performance of inference on the CPU in FP32 precision against dumped results, run:
        ./cross_check_tool -i <path_to_input_image_or_multi_input_file> \
                           -m <path_to_FP32_xml>                        \
                           -d CPU                                       \
                           -load <path_to_dump>                         \
                           -layers all
    

    The output looks as follows:

      InferenceEngine:
        API version ............ 1.0
        Build .................. ###
      [ INFO ] Blob and statistics loading enabled. File /localdisk/models/FP16/icv_squeezenet_v1.0_MYRIAD_FP16_dump.txt
          The same IR on both devices: <path_to_FP32_xml>
      
      [ INFO ] No extensions provided
      
        API version ............ 0.1
        Build .................. ###
        Description ....... myriadPlugin
      
      [ INFO ] Inputs detected: <list_of_input_layers>
      [ INFO ] Statistics will be dumped for X layers: <layer_1_name>, <layer_2_name>, ... , <layer_X_name>
      [ INFO ] <layer_1_name> layer processing
      [ INFO ] Layer <layer_1_name> statistics
          Max absolute difference: 0
          Min absolute difference: 0
          Max relative difference: 0%
          Min relative difference: 0%
                        Blob size: 1000
      
                          Devices:         MYRIAD_FP16  MYRIAD_FP16_loaded
                           Status:            EXECUTED            EXECUTED
                       Layer type:             SoftMax             SoftMax
              Real time, microsec:                  43                  43
                   Execution type:             SoftMax             SoftMax
                    Number of NAN:                   0                   0
                    Number of INF:                   0                   0
                   Number of ZERO:                   0                   0
      ...
      <list_of_layer_statistics>
      ...
      [ INFO ] Overall max absolute difference 0
      [ INFO ] Overall min absolute difference 0 was reached by <layer_1_name> layer
      [ INFO ] Overall max relative difference 0%
      [ INFO ] Overall min relative difference 0% was reached by <layer_1_name> layer
      [ INFO ] Execution successful
    

Multi-Input and Dump File Experimental Format

The text file contains a description of each layer in the following structure:

  • 1st line: layer name (required)
  • 2nd line: shape, for example "(1,224,224,3)" (required)
  • 3rd line: device and precision information, for example CPU_FP32 (optional for a multi-input file)
  • 4th line: execution status. Options are: EXECUTED, OPTIMIZED_OUT (optional for a multi-input file)
  • 5th line: type of layer (optional for a multi-input file)
  • 6th line: execution time in microseconds (optional for a multi-input file)
  • 7th line: type of execution (optional for a multi-input file)
  • 8th line: the word CONTENT, which means that the next line or lines consist of blob elements
  • Next line or lines: blob elements. They may be separated with one or several spaces, tabs, and new lines.

Multi-Input File Example

Input_1
(1,10)
CONTENT
0 0.000628471375 0.00185108185
0.000580787659
0.00137138367
0.000561237335 0.0040473938 0 0 0
Input_2
(1,8)
CONTENT
0 0 0.00194549561 0.0017490387 7.73072243e-05 0.000135779381 0.000186920166 0 7.52806664e-05

Dump file example

Softmax
(1,10)
MYRIAD_FP16
EXECUTED
SoftMax
43
SoftMax
CONTENT
7.44462013e-05
0
0.000810623169
0.000361680984
0
9.14335251e-05
0
0
8.15987587e-05
0

Configuration file

There is an option to pass a configuration file to a plugin by providing the -conf and/or -ref_conf keys.

The configuration file is a text file containing key-value pairs.

Structure of configuration file:

KEY VALUE
ANOTHER_KEY ANOTHER_VALUE,VALUE_1
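
For example, a configuration file passed with the -conf option for a CPU run might look like the following; the keys shown are illustrative and must match parameters supported by the target plugin (see the CPU Plugin section below):

PERF_COUNT YES
CPU_THREADS_NUM 4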

Advanced Topics

Key terms in this section:

Acronym/Term | Description
DL | Deep Learning
FP16 format | Half-precision floating-point format
FP32 format | Single-precision floating-point format
I16 format | 2-byte signed integer format
I8 format | 1-byte signed integer format
U16 format | 2-byte unsigned integer format
U8 format | 1-byte unsigned integer format
NCHW, NHWC | Image data layout. Refers to the representation of batches of images: N - number of images in a batch, H - number of pixels in the vertical dimension, W - number of pixels in the horizontal dimension, C - channels.
C, CHW, NC | Tensor memory layout. For example, the CHW value at index (c,h,w) is physically located at index (c * H + h) * W + w, and similarly for the other layouts.
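
For example, in a CHW blob with H = 2 and W = 3, the element at (c = 1, h = 0, w = 2) is stored at linear offset (1 * 2 + 0) * 3 + 2 = 8.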

Understanding Inference Engine Memory Primitives

Blobs

InferenceEngine::Blob is the main class intended for working with memory. This class lets you read and write memory and get information about the memory structure, among other tasks.

To create Blob objects with a specific layout, use constructors with InferenceEngine::TensorDesc.

InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 3, 227, 227}, InferenceEngine::Layout::NCHW);
InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
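
A minimal sketch of what typically follows blob creation, assuming the C++ API shown above (std::fill_n requires the <algorithm> header): the blob memory is allocated and its buffer is filled with input values before inference.

    blob->allocate();                               // allocate the underlying memory
    float* data = blob->buffer().as<float*>();      // get a typed pointer to the buffer
    std::fill_n(data, blob->size(), 0.0f);          // for example, zero-initialize the input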

Layouts

InferenceEngine::TensorDesc is a special class that provides layout format description.

This class allows you to create planar layouts using the standard formats (such as InferenceEngine::Layout::NCDHW, InferenceEngine::Layout::NCHW, InferenceEngine::Layout::NC, InferenceEngine::Layout::C, and so on) as well as non-planar layouts using InferenceEngine::BlockingDesc.

To create a complex layout, use InferenceEngine::BlockingDesc, which allows to define the blocked memory with offsets and strides.

Examples

  • Define a blob with dimensions, {N: 1, C: 25, H: 20, W: 20}, and format, NHWC:
    InferenceEngine::BlockingDesc({1, 20, 20, 25}, {0, 2, 3, 1}); // or
    InferenceEngine::BlockingDesc({1, 20, 20, 25}, InferenceEngine::Layout::NHWC);
  • If you have memory with real dimensions {N: 1, C: 25, H: 20, W: 20}, but with channels that are blocked by 8, define the memory with parameters:
    InferenceEngine::BlockingDesc({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1})
  • Set strides and offsets if the layout contains them. If your blob layout is complex and you don't want to calculate the real offset to data, use InferenceEngine::TensorDesc::offset(size_t l) or InferenceEngine::TensorDesc::offset(SizeVector v).
    For example:
    InferenceEngine::BlockingDesc blk({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
    InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 25, 20, 20}, blk);
    tdesc.offset(0); // = 0
    tdesc.offset(1); // = 8
    tdesc.offset({0, 0, 0, 2}); // = 16
    tdesc.offset({0, 1, 0, 2}); // = 17
  • If you want to create a TensorDesc with a planar format and for N dimensions (N can be different: for example, 1, 2, 4), you can use the InferenceEngine::TensorDesc::getLayoutByDims method:
        InferenceEngine::TensorDesc::getLayoutByDims({1}); // InferenceEngine::Layout::C
        InferenceEngine::TensorDesc::getLayoutByDims({1, 2}); // InferenceEngine::Layout::NC
        InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4}); // InferenceEngine::Layout::NCHW
        InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3}); // InferenceEngine::Layout::BLOCKED
        InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5}); // InferenceEngine::Layout::NCDHW
        InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5, ...}); // InferenceEngine::Layout::BLOCKED

Supported Devices

The Inference Engine can infer models in different formats with various input and output formats. This section provides supported and optimal configurations per device.

The Inference Engine provides unique capabilities to infer deep learning models on the following device types with corresponding plugins:

Plugin | Device type
GPU plugin | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
CPU plugin | Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE
FPGA plugin | Intel® Arria® 10 GX FPGA Development Kit, Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA
MYRIAD plugin | Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2 VPU, Intel® Movidius™ Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU
HDDL plugin | Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
GNA plugin | Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver processor J5005, Intel® Celeron® processor J4005, Intel® Core™ i3-8121U processor
HETERO plugin | Enables computing for inference on one network on several Intel® devices

Supported Configurations

The Inference Engine can infer models in different formats with various input and output formats. This chapter provides supported and optimal configurations for each plugin.

NOTE: VPU plugins include MYRIAD and HDDL plugins.

Supported Model Formats
Plugin | FP32 | FP16
CPU plugin | Supported and preferred | Not supported
GPU plugin | Supported | Supported and preferred
FPGA plugin | Supported | Supported
VPU plugins | Not supported | Supported
GNA plugin | Supported | Not supported

Supported Input Precision

Plugin | FP32 | FP16 | U8 | U16 | I8 | I16
CPU plugin | Supported | Not supported | Supported | Supported | Not supported | Supported
GPU plugin | Supported | Supported* | Supported* | Supported* | Not supported | Supported*
FPGA plugin | Supported | Supported* | Supported | Supported | Not supported | Supported
VPU plugins | Supported | Supported | Supported | Not supported | Not supported | Not supported
GNA plugin | Supported | Not supported | Not supported | Not supported | Supported | Supported

* - Supported through SetBlob only. GetBlob returns FP32. Supported without mean image.

Supported Output Precision

Plugin | FP32 | FP16
CPU plugin | Supported | Not supported
GPU plugin | Supported | Supported
FPGA plugin | Supported | Supported
VPU plugins | Supported | Supported
GNA plugin | Supported | Not supported

Supported Input Layout

Plugin | NCDHW | NCHW | NHWC | NC
CPU plugin | Supported | Supported | Supported | Supported
GPU plugin | Not supported | Supported | Supported | Supported
FPGA plugin | Not supported | Supported | Supported | Not supported
VPU plugins | Not supported | Supported | Supported | Supported
GNA plugin | Not supported | Not supported | Not supported | Supported

Supported Output Layout

Number of dimensions | 5 | 4 | 3 | 2 | 1
Layout | NCDHW | NCHW | CHW | NC | C

For setting relevant configuration, refer to the Integrate the Inference Engine API with Your Application section (step 3 "Configure input and output").

Supported Layers 

The following layers are supported by the plugins and by the Shape Inference feature:

Layers | GPU | CPU | VPU | GNA | FPGA | ShapeInfer
Activation-Clamp | Supported | Supported*** | Supported | Supported | Supported | Supported
Activation-ELU | Supported | Supported*** | Supported | Not Supported | Supported | Supported
Activation-Leaky ReLU | Supported | Supported*** | Supported | Supported | Supported | Supported
Activation-PReLU | Supported | Supported*** | Supported | Not Supported | Supported | Supported
Activation-ReLU | Supported | Supported*** | Supported | Supported | Supported | Supported
Activation-ReLU6 | Supported | Supported*** | Supported | Not Supported | Not Supported | Supported
Activation-Sigmoid/Logistic | Supported | Supported*** | Supported | Supported | Not Supported | Supported
Activation-TanH | Supported | Supported*** | Supported | Supported | Not Supported | Supported
ArgMax | Supported | Supported** | Not Supported | Not Supported | Not Supported | Supported
BatchNormalization | Supported | Supported | Supported | Not Supported | Supported* | Supported
Concat | Supported | Supported*** | Supported | Supported | Supported | Supported
Const | Supported | Supported | Supported | Not Supported | Not Supported | Not Supported
Convolution-Dilated | Supported | Supported | Supported | Not Supported | Not Supported | Supported
Convolution-Dilated 3D | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Convolution-Grouped | Supported | Supported | Supported | Not Supported | Supported | Supported
Convolution-Grouped 3D | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Convolution-Ordinary | Supported | Supported | Supported | Supported* | Supported | Supported
Convolution-Ordinary 3D | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Crop | Supported | Supported | Supported | Supported | Not Supported | Supported
CTCGreedyDecoder | Supported** | Supported** | Supported* | Not Supported | Not Supported | Supported
Deconvolution | Supported | Supported | Supported | Not Supported | Supported* | Supported
Deconvolution 3D | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
DetectionOutput | Supported | Supported** | Supported* | Not Supported | Not Supported | Supported
Eltwise-Max | Supported | Supported*** | Supported | Not Supported | Not Supported | Supported
Eltwise-Mul | Supported | Supported*** | Supported | Supported | Not Supported | Supported
Eltwise-Sum | Supported | Supported*** | Supported | Supported | Supported | Supported
Flatten | Supported | Supported | Supported | Not Supported | Not Supported | Supported
FullyConnected (Inner Product) | Supported | Supported*** | Supported | Supported | Supported | Supported
Gather | Not Supported | Supported** | Not Supported | Not Supported | Not Supported | Supported
Gemm | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Supported
GRN | Supported** | Supported** | Supported | Not Supported | Not Supported | Supported
Interp | Supported** | Supported** | Supported | Not Supported | Not Supported | Supported*
LRN (Norm) | Supported | Supported | Supported* | Not Supported | Supported | Supported
LSTMCell | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Memory | Not Supported | Supported | Not Supported | Supported | Not Supported | Supported
MVN | Supported | Supported** | Supported* | Not Supported | Not Supported | Supported
Normalize | Supported | Supported** | Supported* | Not Supported | Not Supported | Supported
Pad | Supported | Supported** | Supported* | Not Supported | Not Supported | Supported
Permute | Supported | Supported | Supported | Supported | Not Supported | Supported
Pooling(AVG,MAX) | Supported | Supported | Supported | Supported | Supported | Supported
Pooling(AVG,MAX) 3D | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Power | Supported | Supported** | Supported | Supported | Supported* | Supported
PriorBox | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
PriorBoxClustered | Supported** | Supported** | Supported | Not Supported | Not Supported | Supported
Proposal | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
PSROIPooling | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
RegionYolo | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
ReorgYolo | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
Resample | Supported | Supported** | Supported | Not Supported | Not Supported | Supported
Reshape | Supported | Supported*** | Supported | Supported | Not Supported | Supported*
RNN | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
ROIPooling | Supported* | Supported | Supported | Supported | Not Supported | Supported
ScaleShift | Supported | Supported*** | Supported* | Supported | Supported | Supported
SimplerNMS | Supported | Supported** | Not Supported | Not Supported | Not Supported | Supported
Slice | Supported | Supported*** | Supported | Supported | Supported* | Supported
SoftMax | Supported | Supported*** | Supported | Not Supported | Not Supported | Supported
SpatialTransformer | Not Supported | Supported** | Not Supported | Not Supported | Not Supported | Supported
Split | Supported | Supported*** | Supported | Supported | Supported* | Supported
TensorIterator | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Tile | Supported** | Supported*** | Supported | Not Supported | Not Supported | Supported
Unpooling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported
Upsampling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported

* - Support is limited to specific parameters. Refer to the Known Layers Limitations section for each device from the list of supported devices.

** - Support is implemented via the custom kernels mechanism.

*** - Supports the NCDHW layout.


CPU Plugin 

The CPU plugin provides high-performance scoring of neural networks on Intel® CPUs using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).

The CPU plugin uses OpenMP* to parallelize calculations.

For the information about the layers supported by CPU plugin, refer to the corresponding table in the Supported Devices section. You can expand the set of supported layers with the extensibility library. To add a new layer in this library, use the extensibility mechanism.

Supported Platforms

The Intel® Distribution of OpenVINO™ toolkit is supported and validated on these platforms:

Host | 64-bit OS
Development | Ubuntu* 16.04, CentOS* 7.4, Windows* 10
Target | Ubuntu* 16.04, CentOS* 7.4, Windows* 10

The CPU plugin supports inference on Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel Atom® Processors with Intel® SSE.

Use the -pc flag with the samples to see which configuration is used by a layer. The -pc flag shows per-layer execution statistics: the layer name, execution status, layer type, execution time, and the type of the execution primitive.

Internal CPU Plugin Optimizations

The CPU Plugin supports several graph optimization algorithms:

  • Merging of grouped convolutions. If the topology contains a pipeline like the one shown in the figure "Merging of group convolution", the CPU plugin merges it into a single Convolution with the group parameter (the Convolutions must have the same parameters).
  • Fusing Convolution with ReLU or ELU. The CPU plugin fuses a Convolution with a ReLU or ELU layer if that layer is located right after the Convolution layer.
  • Removing the Power layer. The CPU plugin removes a Power layer from the topology if it has the following parameters: power = 1, scale = 1, offset = 0.
  • Fusing Convolution + Sum or Convolution + Sum + ReLU. To improve performance, the CPU plugin fuses the structure shown in the figure "Fusing Convolution + Sum or Convolution + Sum + ReLU", which upgrades the graph to the structure shown in the figure "Upgrade the graph optimization algorithm graph".

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork(). For usage examples, refer to Benchmark Application Demo.

CPU plugin supports general parameters, which are also supported by other plugins:

Parameter Name | Parameter Values | Default | Description
KEY_EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | Forces async requests (also from different executable networks) to execute serially. This prevents potential oversubscription.
KEY_PERF_COUNT | YES/NO | NO | Enables gathering performance counters.

It also supports CPU-specific parameters:

Parameter Name | Parameter Values | Default | Description
KEY_CPU_THREADS_NUM | positive integer values | 0 | Specifies the number of threads that the CPU plugin should use for inference. Zero (default) means using all (logical) cores.
KEY_CPU_BIND_THREAD | YES/NO | YES | Binds inference worker threads to CPU cores. The binding is usually performance-friendly, especially in server scenarios. The option also limits the number of OpenMP* or Intel® TBB threads to the number of hardware cores.
KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or positive integer values | 1 | Specifies the number of streams: an upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior with all available cores processing requests one by one. KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties. KEY_CPU_THROUGHPUT_AUTO creates the bare minimum of streams to improve performance; this is the most portable option if you do not know how many cores your target machine has (and what the optimal number of streams would be). A positive integer value creates the requested number of streams.
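
A minimal sketch (assuming the InferencePlugin C++ API used elsewhere in this guide, with plugin and network objects already created) of passing CPU parameters when loading a network:

    std::map<std::string, std::string> config = {
        { PluginConfigParams::KEY_CPU_THROUGHPUT_STREAMS, PluginConfigParams::CPU_THROUGHPUT_AUTO },
        { PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }
    };
    auto executableNetwork = plugin.LoadNetwork(network, config);
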
CPU Extensions

The CPU extensions library contains code of important layers that do not come with the CPU plugin. You should compile this library and use the AddExtension method in your application to load the extensions for models featuring layers from this library. See other samples for AddExtension code examples.
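
For example, a minimal sketch of loading the compiled library into the plugin (the library file name depends on your build and platform):

    // Load the compiled CPU extensions library and register it with the plugin
    auto extensionPtr = InferenceEngine::make_so_pointer<InferenceEngine::IExtension>("libcpu_extension.so");
    plugin.AddExtension(extensionPtr);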

When you compile the entire list of the samples, the cpu_extension library is also compiled.

For performance, the library's cmake script detects your computer configuration and enables platform optimizations. Alternatively, you can explicitly use cmake flags: -DENABLE_AVX2=ON, -DENABLE_AVX512F=ON or -DENABLE_SSE42=ON when cross-compiling this library for another platform.
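
For example, a hypothetical cross-compilation configuration that forces the AVX2 code path might look like:

    cmake -DENABLE_AVX2=ON <path_to_cpu_extension_sources>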

List of layers that come in the library:

  • ArgMax
  • CTCGreedyDecoder
  • DetectionOutput
  • GRN
  • Interp
  • MVN
  • Normalize
  • PowerFile
  • PReLU
  • PriorBox
  • PriorBoxClustered
  • Proposal
  • PSROIPooling
  • Resample
  • SimplerNMS
  • SpatialTransformer

Use the extensibility mechanism to add a layer. For information, see Adding Your Own Kernels in the Inference Engine.


GPU Plugin 

The GPU plugin uses the Intel® Compute Library for Deep Neural Networks to infer deep neural networks. This is an open source performance library for Deep Learning applications intended for acceleration of deep learning inference on Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics.

Supported Layers

For the information about the layers supported by GPU plugin, refer to the corresponding table in the Supported Devices section.

Supported Optimizations

The plugin supports the following optimizations:

  • Fused layers:
    • Convolution - Activation
    • Deconvolution - Activation
    • Eltwise - Activation
    • Fully Connected - Activation
  • Layers optimized out when conditions allow:
    • Crop
    • Concatenate
    • Reshape
    • Flatten
    • Split
    • Copy
  • Layers executed during load time (not during inference):
    • PriorBox

CPU Executed Layers

The GPU plugin does not accelerate the following layers. They are executed on the host CPU instead.

  • Proposal
  • SimplerNMS
  • PriorBox
  • DetectionOutput

Known Layers Limitations

  • ROIPooling is supported for max value of the method attribute.

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork().

Parameter Name | Parameter Value | Default Value | Description
KEY_PERF_COUNT | YES / NO | NO | Collect performance counters during inference
KEY_CONFIG_FILE | "file1 [file2 ...]" | "" | Load custom layer configuration files
KEY_DUMP_KERNELS | YES / NO | NO | Dump the final kernels used for custom layers
KEY_TUNING_MODE | TUNING_DISABLED / TUNING_CREATE / TUNING_USE_EXISTING | TUNING_DISABLED | TUNING_DISABLED disables inference kernel tuning; TUNING_CREATE creates a tuning file (expect a much longer runtime); TUNING_USE_EXISTING uses an existing tuning file
KEY_TUNING_FILE | "<filename>" | "" | Tuning file to create / use
KEY_PLUGIN_PRIORITY | <0-3> | 0 | OpenCL queue priority
KEY_PLUGIN_THROTTLE | <0-3> | 0 | OpenCL queue throttling
KEY_CLDNN_GRAPH_DUMPS_DIR | "<dump_dir>" | "" | clDNN graph optimizer stages dump output directory (in GraphViz format)
KEY_CLDNN_SOURCES_DUMPS_DIR | "<dump_dir>" | "" | Final optimized clDNN OpenCL sources dump output directory

Debug Capabilities in the GPU Plugin

The Inference Engine GPU plugin can dump the user custom OpenCL™ kernels to files to help you debug compilation issues in your custom kernels.

The application can use the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and value: PluginConfigParams::YES. Then during network loading, all custom layers print their OpenCL kernels with the JIT instrumentation added by the plugin. The kernels are stored in the working directory under files named in the format: clDNN_program0.cl, clDNN_program1.cl

The Debug option is disabled by default. Additionally, the application can call the SetConfig() function with the PluginConfigParams::KEY_DUMP_KERNELS key and value: PluginConfigParams::NO before network loading.
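
A minimal sketch (assuming the InferencePlugin C++ API) of toggling the dump option before the network is loaded:

    // Enable dumping of custom layer kernels; clDNN_program*.cl files appear in the working directory
    plugin.SetConfig({{PluginConfigParams::KEY_DUMP_KERNELS, PluginConfigParams::YES}});
    // ... or disable it explicitly
    plugin.SetConfig({{PluginConfigParams::KEY_DUMP_KERNELS, PluginConfigParams::NO}});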

To verify that Debug option is disabled:

  1. Delete all clDNN_program*.cl files from the current directory
  2. Run your application to load a network
  3. Examine the working directory for the presence of any kernel file, such as clDNN_program0.cl

    FPGA Plugin 

    The FPGA plugin is developed for high performance scoring of neural networks on Intel® FPGA devices.

    NOTE: Before using the FPGA plugin, ensure that you have installed and configured either the Intel® Arria® 10 GX FPGA Development Kit or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA by following the instructions in Installing the Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support.

    Supported Platforms

    The Intel® Distribution of OpenVINO™ toolkit is officially supported and validated on the following FPGA setup:

    Host | OS (64-bit) | Platform
    Development | Ubuntu* 16.04, CentOS* 7.4 | 6th-8th Generation Intel® Core™ Processors, Intel® Xeon® v5 family, Xeon® v6 family
    Target | Ubuntu* 16.04, CentOS* 7.4 | Intel® Arria® 10 GX FPGA Development Kit; Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA; Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (preview)
    Supported Layers

    For the full list of supported layers, refer to the Supported Layers section.

    Heterogeneous Execution

    If a topology contains layers that are not supported by the FPGA plugin, use the HETERO plugin with a dedicated fallback device.

    If a network has layers that are not supported in the FPGA plugin or in a fallback plugin, you can implement a custom layer on the CPU/GPU and use the extensibility mechanism described in the Adding Your Own Kernels in the Inference Engine section.
    In addition to implementing custom kernels, you must point to the CPU plugin or the GPU plugin as fallback devices for the heterogeneous plugin.

    Supported Networks

    The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices.

    IMPORTANT: Use only bitstreams from the current version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Toolkit. For example, you cannot use the 1-0-1_A10DK_FP16_Generic bitstream, when the Intel Distribution of OpenVINO toolkit supports the 5-0_A10DK_FP16_Generic bitstream.

    The supported bitstreams are listed below for each network, per board:

    • AlexNet
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_AlexNet_GoogleNet, 5-0_A10DK_FP11_AlexNet_GoogleNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG, 5-0_RC_FP11_AlexNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_AlexNet_GoogleNet
    • GoogleNet v1
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_AlexNet_GoogleNet, 5-0_A10DK_FP11_AlexNet_GoogleNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG, 5-0_RC_FP11_GoogleNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_AlexNet_GoogleNet
    • VGG-16
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_SqueezeNet_VGG, 5-0_A10DK_FP11_VGG
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_VGG_Generic, 5-0_PL1_FP11_VGG
    • VGG-19
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_SqueezeNet_VGG, 5-0_A10DK_FP11_VGG
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_VGG_Generic, 5-0_PL1_FP11_VGG
    • SqueezeNet v 1.0
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_SqueezeNet_VGG, 5-0_A10DK_FP11_SqueezeNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_SqueezeNet, 5-0_RC_FP11_SqueezeNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_SqueezeNet
    • SqueezeNet v 1.1
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_SqueezeNet_VGG, 5-0_A10DK_FP11_SqueezeNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_SqueezeNet, 5-0_RC_FP11_SqueezeNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_SqueezeNet
    • ResNet-18
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_ResNet_TinyYolo, 5-0_A10DK_FP11_ResNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_ResNet_ELU, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_ResNet_TinyYolo_ELU, 5-0_PL1_FP11_ResNet
    • ResNet-50
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_ResNet_TinyYolo, 5-0_A10DK_FP11_ResNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_ResNet_ELU, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_ResNet_TinyYolo_ELU, 5-0_PL1_FP11_ResNet
    • ResNet-101
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_ResNet_TinyYolo, 5-0_A10DK_FP11_ResNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_ResNet_ELU, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_ResNet_TinyYolo_ELU, 5-0_PL1_FP11_ResNet
    • ResNet-152
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_ResNet_TinyYolo, 5-0_A10DK_FP11_ResNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_ResNet_ELU, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_ResNet_TinyYolo_ELU, 5-0_PL1_FP11_ResNet
    • MobileNet (Caffe*)
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_MobileNet_Clamp, 5-0_A10DK_FP11_MobileNet_Clamp
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_MobileNet_Clamp, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_MobileNet_Clamp, 5-0_PL1_FP11_MobileNet_Clamp
    • MobileNet (TensorFlow*)
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_MobileNet_Clamp, 5-0_A10DK_FP11_MobileNet_Clamp
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_MobileNet_Clamp, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_MobileNet_Clamp, 5-0_PL1_FP11_MobileNet_Clamp
    • SqueezeNet-based variant of the SSD*
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_SqueezeNet_VGG, 5-0_A10DK_FP11_SqueezeNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_SqueezeNet, 5-0_RC_FP11_SqueezeNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_SqueezeNet
    • GoogLeNet-based variant of SSD
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_AlexNet_GoogleNet, 5-0_A10DK_FP11_AlexNet_GoogleNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG, 5-0_RC_FP11_GoogleNet
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet, 5-0_PL1_FP11_AlexNet_GoogleNet
    • ResNet-based variant of SSD
      • Intel® Arria® 10 GX FPGA Development Kit: 5-0_A10DK_FP16_ResNet_TinyYolo, 5-0_A10DK_FP11_ResNet
      • Intel® PAC with Intel® Arria® 10 GX FPGA: 5-0_RC_FP16_ResNet_ELU, 5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp
      • Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA: 5-0_PL1_FP16_ResNet_TinyYolo_ELU, 5-0_PL1_FP11_ResNet

    In addition to the topologies listed above, any topology that consists of layers supported by the FPGA plugin is recommended to be executed on the FPGA.

    Choosing a Proper Pre-compiled FPGA Bitstream File 

    Various pre-compiled bitstream samples for the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA and the Intel® Arria® 10 GX FPGA Development Kit are available with the Intel® Distribution of OpenVINO™ toolkit installation:

    Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support
    5-0_A10DK_FP11_AlexNet_GoogleNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | Yes | Yes | Yes | No | No
    5-0_A10DK_FP11_ELU | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | Yes
    5-0_A10DK_FP11_Generic | Intel® Arria® 10 GX FPGA Development Kit | FP11 | Yes | Yes | Yes | No | No
    5-0_A10DK_FP11_MobileNet_Clamp | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | Yes | No
    5-0_A10DK_FP11_ResNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | No
    5-0_A10DK_FP11_ResNet18 | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | No
    5-0_A10DK_FP11_RMNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | Yes
    5-0_A10DK_FP11_SqueezeNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | No
    5-0_A10DK_FP11_TinyYolo_SSD300 | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | No
    5-0_A10DK_FP11_VGG | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | Yes | Yes | No | No
    5-0_A10DK_FP16_AlexNet_GoogleNet | Intel® Arria® 10 GX FPGA Development Kit | FP16 | Yes | Yes | Yes | No | No
    5-0_A10DK_FP16_ELU | Intel® Arria® 10 GX FPGA Development Kit | FP16 | Yes | Yes | Yes | No | Yes
    5-0_A10DK_FP16_Generic | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | No | No
    5-0_A10DK_FP16_MobileNet_Clamp | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | Yes | No
    5-0_A10DK_FP16_ResNet_TinyYolo | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | No | No
    5-0_A10DK_FP16_RMNet | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | No | Yes
    5-0_A10DK_FP16_SqueezeNet_VGG | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | No | No
    5-0_A10DK_FP16_SSD300 | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | Yes | Yes | No | No
    5-0_PL1_FP11_AlexNet_GoogleNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | Yes | Yes | Yes | No | No
    5-0_PL1_FP11_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | Yes
    5-0_PL1_FP11_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | Yes | Yes | Yes | No | No
    5-0_PL1_FP11_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | Yes | No
    5-0_PL1_FP11_ResNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | No
    5-0_PL1_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | Yes
    5-0_PL1_FP11_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | No
    5-0_PL1_FP11_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | No
    5-0_PL1_FP11_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | Yes | Yes | No | No
    5-0_PL1_FP16_AlexNet_GoogleNet_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | Yes | Yes | Yes | No | No
    5-0_PL1_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | Yes | Yes | Yes | No
    5-0_PL1_FP16_ResNet_TinyYolo_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | Yes | Yes | No | Yes
    5-0_PL1_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | Yes | Yes | No | Yes
    5-0_PL1_FP16_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | Yes | Yes | No | No
    5-0_PL1_FP16_VGG_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | Yes | Yes | Yes | No | No
    5-0_RC_FP11_AlexNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | Yes | Yes | No | No
    5-0_RC_FP11_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | Yes | Yes | No | Yes
    5-0_RC_FP11_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | Yes | Yes | No | No
    5-0_RC_FP11_GoogleNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | Yes | Yes | No | No
    5-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | Yes | Yes | Yes | No
    5-0_RC_FP11_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | Yes | Yes | No | Yes
    5-0_RC_FP11_SqueezeNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | Yes | Yes | No | No
    5-0_RC_FP11_TinyYolo_SSD300 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | Yes | Yes | No | No
    5-0_RC_FP16_AlexNet_GoogleNet_Generic_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | Yes | Yes | Yes | No | No
    5-0_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | Yes | Yes | No
    5-0_RC_FP16_ResNet_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | Yes | No | Yes
    5-0_RC_FP16_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | Yes | No | Yes
    5-0_RC_FP16_SqueezeNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | Yes | No | No
    5-0_RC_FP16_TinyYolo_SSD300 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | Yes | No | No

    To learn about how to program the bitstream (.aocx file), see the programming instructions in the installation guide. 

    Multiple FPGA Devices Support

    The Inference Engine FPGA plugin provides an ability to load different networks on multiple FPGA devices. For example, to load two networks AlexNet and MobileNet v2 on two different FPGA devices, follow the steps below:

    1. Program each FPGA device with a corresponding bitstream:

      For more information about bitstream programming instructions, refer to the programming instructions in the installation guide.

      • First device:
        aocl program acl0 5-0_A10DK_FP16_AlexNet_GoogleNet.aocx
      • Second device:
        aocl program acl1 5-0_A10DK_FP16_MobileNet_Clamp.aocx
    2. All FPGA devices are enumerated with unique ID starting from 0. By default, all networks are loaded to the default device with ID 0. If you want to load a network on a particular non-default device, specify the KEY_DEVICE_ID parameter for C++ and DEVICE_ID parameter for Python*. The following code snippets demonstrate how to load the AlexNet network on the FPGA device with ID 0 and the MobileNet v2 network on the device with ID 1:
      • With C++:
        // Load the AlexNet network on the first FPGA device, programmed with a bitstream supporting AlexNet
        CNNNetReader reader1;
        reader1.ReadNetwork("alexnet.xml");
        reader1.ReadWeights("alexnet.bin");
        
        CNNNetwork network1 = reader1.getNetwork();
        IExecutableNetwork::Ptr exeNetwork1;
        InferenceEngine::ResponseDesc response;
        
        StatusCode sts = plugin->LoadNetwork(exeNetwork1, network1, { { KEY_DEVICE_ID, "0" } }, &response);
        
        // Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
        
        CNNNetReader reader2;
        reader2.ReadNetwork("mobilenet_v2.xml");
        reader2.ReadWeights("mobilenet_v2.bin");
        
        CNNNetwork network2 = reader2.getNetwork();
        IExecutableNetwork::Ptr exeNetwork2;
        
        sts = plugin->LoadNetwork(exeNetwork2, network2, { { KEY_DEVICE_ID, "1" } }, &response);
            
      • With Python:
                # Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet
        net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin")
        plugin.load(network=net1, config={"DEVICE_ID": "0"})
        
        # Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
        net2 = IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin")
        plugin.load(network=net2, config={"DEVICE_ID": "1"})
        

    NOTE: You have to use asynchronous infer requests to utilize several FPGA devices, otherwise the execution on devices is performed sequentially.
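
    A minimal sketch (assuming the legacy IExecutableNetwork/IInferRequest API from the C++ snippet above) of running both networks asynchronously so that the two FPGA devices work in parallel:

        IInferRequest::Ptr request1, request2;
        exeNetwork1->CreateInferRequest(request1, &response);
        exeNetwork2->CreateInferRequest(request2, &response);
        // ... fill the input blobs of both requests here
        request1->StartAsync(&response);   // runs on the device with ID 0
        request2->StartAsync(&response);   // runs on the device with ID 1 in parallel
        request1->Wait(IInferRequest::WaitMode::RESULT_READY, &response);
        request2->Wait(IInferRequest::WaitMode::RESULT_READY, &response);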

    How to Interpret Performance Counters

    As a result of collecting performance counters using InferenceEngine::IInferencePlugin::GetPerformanceCounts, you can find performance data about execution on FPGA, preprocessing and post-processing data, and data transferring from/to FPGA card.

    If network is divided into two parts that are executed on CPU, you can find performance data about Intel® MKL-DNN kernels, their types, and other useful information.

    FPGA Support Limitations for CNN

    There are certain limitations for the network topologies, kernel parameters and batch size.

    • Depending on the bitstream loaded on the target device, the FPGA device actually performs calculations with precision rates ranging from FP11 to FP16. This may have potential accuracy implications. Use the Validation Application to verify the network accuracy on validation data set.
    • If a network has many layers that are not supported by the FPGA plugin and that sit between supported layers in the topology, the graph is divided into many subgraphs. This might cause a CL_OUT_OF_HOST_MEMORY error. Such topologies are not FPGA-friendly for this release.
    • When you use the heterogeneous plugin, the affinity and distribution of nodes by devices depends on the FPGA bitstream that you use. Some layers might not be supported by a bitstream or parameters of the layer are not supported by the bitstream.
    • Any Fully-Connected layer can only be followed by another Fully-Connected (possibly with the ReLU) layer. No Convolution layer can follow a Fully-Connected layer, otherwise the graph verification fails and returns an error message.
    • Always consider batching for performance conclusions. Note that depending on the bitstream loaded on the FPGA, the batch size is typically limited to 96.
    • Multiple FPGA devices cannot be used from several CPU processes because of Intel® FPGA RTE for OpenCL™ implementation limitations. If a single process has initialized Intel® FPGA RTE for OpenCL™ libraries, other processes don't have an ability to access the FPGA devices.

    VPU Plugins

    This chapter provides information on the Inference Engine plugins that enable inferencing of deep learning models on the supported VPU devices:

    • Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2 supported by the MYRIAD Plugin
    • Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X supported by the MYRIAD Plugin
    • Intel® Vision Accelerator Design with Intel® Movidius™ VPUs supported by the HDDL Plugin

    For the list of layers supported by VPU plugins, refer to the corresponding table in the Supported Devices section.

    Known Layers Limitations
    • 'ScaleShift' layer is supported for zero value of broadcast attribute only.
    • 'Bias' works for inputs with equal dimensions.
    • 'CTCGreedyDecoder' works with ctc_merge_repeated attribute equal 1.
    • 'DetectionOutput' works with zero values of interpolate_orientation and num_orient_classes parameters only.
    • 'MVN' uses fixed value for eps parameters (1e-9).
    • 'LRN' is supported for region params equal across.
    • 'Normalize' uses fixed value for eps parameters (1e-9) and is supported for zero value of across_spatial only.
    VPU Common Configuration Parameters

    The VPU plugin supports the configuration parameters listed below. The parameters are passed as std::map<std::string, std::string> to InferenceEngine::InferencePlugin::LoadNetwork or InferenceEngine::InferencePlugin::SetConfig.

    Parameter Name | Parameter Values | Default Value | Description
    KEY_VPU_HW_STAGES_OPTIMIZATION | YES/NO | YES | Turn on hardware stages usage (applicable for Intel Movidius Myriad X devices only)
    KEY_VPU_NETWORK_CONFIG | VPU network configuration | Empty string | Extra configuration for network compilation and optimization
    KEY_VPU_COMPUTE_LAYOUT | VPU_AUTO, VPU_NCHW, VPU_NHWC | VPU_AUTO | Specify internal input and output layouts for network layers
    KEY_VPU_LOG_LEVEL | LOG_WARNING, LOG_INFO, LOG_DEBUG | LOG_NONE | Set log level for devices
    KEY_VPU_PRINT_RECEIVE_TENSOR_TIME | YES/NO | NO | Add device-side time spent waiting for input to PerformanceCounts
    KEY_VPU_INPUT_NORM | Real number | 1.0 | Deprecated*. Normalization coefficient for the network input
    KEY_VPU_INPUT_BIAS | Real number | 0.0 | Deprecated*. Bias value that is added to each element of the network input

    *Instead, use Model Optimizer options.
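
    A minimal sketch (assuming the InferencePlugin C++ API and that the raw string forms of the keys match the KEY_ constants listed above) of passing VPU parameters when loading a network:

        std::map<std::string, std::string> vpuConfig;
        vpuConfig["VPU_HW_STAGES_OPTIMIZATION"] = "YES";   // keep hardware stages enabled
        vpuConfig["VPU_LOG_LEVEL"] = "LOG_DEBUG";          // verbose device-side logging
        auto executableNetwork = plugin.LoadNetwork(network, vpuConfig);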

    VPU Network Configuration

    The VPU network configuration mechanism allows you to override the VPU network compiler behavior and tune its optimizations. This mechanism is optional: by default, the VPU network compiler uses automatic heuristics for network optimizations. The KEY_VPU_NETWORK_CONFIG configuration parameter allows you to specify the exact compiler behavior.

    Terminology used for VPU network configuration:

    • VPU network compiler - entity that translates network IR into special representation that is used for execution on the device.
    • Compiler pass - compiler step, that performs some dedicated optimization.
    • Layer - network layer from original IR.
    • Data - object that is used as input or output of network layers.
    • Stage - execution entity for device. Network layers are mapped to stages, but there is no 1-to-1 correspondence between them, one layer might be represented as several stages, several layers might be merged into single stage, additional stages might be added to executable network.

    The KEY_VPU_NETWORK_CONFIG parameter is a list of key/value pairs separated by ,:

    <key>=<value>,<key>=<value>,<key>=<value>,...

    Supported <key> options:

    • file - <value> is path to XML file with configuration, the format of the file is described below.
    • data - <value> is the name of a Data object; the following options are applied to this Data:
      • scale - <value> is a SCALE factor. See the Data section.

    The VPU network compiler treats the configuration as a hard requirement and fails if it cannot satisfy it.
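
    For example, based on the syntax above, a configuration value that applies a SCALE factor of 64 to a Data object named input (a hypothetical name) would be:

        data=input,scale=64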

    Network Configuration File

    The KEY_VPU_NETWORK_CONFIG parameter allows you to use a separate file with the network configuration. The file is in .xml format and must have the following structure:

    <?xml version="1.0" ?>
    <vpu_net_config version="1">
        [passes section]
        [data section]
        [layers section]
        [stages section]
    </vpu_net_config>

    The version attribute specifies the file format version (currently only 1 is supported). The configuration is divided into sections for passes, data, layers, and stages. Each section is optional.

    Passes Section

    The passes section allows you to configure compiler passes. Example of such a section:

    <passes>
        <pass name="passName1">
            <enable>true</enable>
        </pass>
        <pass name="passName2">
            <enable>false</enable>
        </pass>
    </passes>

    The enable property allows you to turn the specified pass on or off.

    Available passes:

    • packPostOps - merges ReLU with Bias and makes them in-place; turned on by default.
    • eliminateReshapeStages - tries to eliminate reshape operations and make them in-place; turned on by default.
    • swapConcatAndPool - tries to replace the Convolution->Concat->Pooling pattern with Convolution->Pooling->Concat; turned on by default.
    • splitLargeConvolution - tries to split a large Convolution into tiles along the output channels; turned on by default.
    • splitDepthConvolution - tries to split a Depth Convolution into tiles and replace them with the HW analog; turned off by default.
    • eliminateCopyStages - tries to eliminate extra Copy stages; turned on by default.
    • tryHCWLayoutForHW - tries to apply the HCW layout between HW stages; turned off by default.
    • injectSwOps - tries to merge HW stages with independent SW stages to execute them in parallel; turned on by default.

    Data Section

    The data section allows you to configure properties for data objects. Example of such a section:

    <data>
        <data name="input">
            <scale>64</scale>
        </data>
    </data>

    The data name corresponds to its producer layer from the original IR (the layer that declares this data as output). If the original layer has only one output, the output data name is equal to the layer name. If the original layer has more than one output, each output data has the name <layer name>.<port id>, where <port id> corresponds to the <port id="3"> XML node in the IR.

    The scale property allows you to apply a SCALE factor to the specified data object. The SCALE factor is used to increase the data range to avoid floating-point math errors on HW. The SCALE factor is propagated across the network until its end or until a layer that cannot propagate it.

    If the data section is missing in the network configuration file, the network compiler tries to estimate the SCALE factor automatically based on the range of the layer weights. Manual configuration might be used if the automatic estimation did not work or did not give the desired accuracy.

    NOTE: It is better to use power-of-two values for SCALE factors.

    Layers Section

    The layers section allows you to configure compiler behavior for layer optimization. Per-layer configuration is applied to all stages implementing the selected layer. Example of such a section:

    <layers>
        <layer name="conv1">
            <hw>
                [HW options]
            </hw>
        </layer>
    </layers>
    

    The layer name corresponds to the original IR.

    For now, layer configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected layers.

    Layer HW Section

    The HW optimization configuration section consists of the following options:

    • enable turns on/off HW optimization of the selected layer.
    • depth_conv controls HW optimization of Depth Convolution.
    • tiling controls HW tiling behavior.
    • inputs and outputs control behavior for layer input and output data.
    • sw_injections controls behavior of HW and SW stages merge optimization.

    The enable option has the following syntax:

    <enable>true</enable>
    <enable>false</enable>

    By default, the HW optimization is turned on for all supported layers.

    The depth_conv configuration is effective only for Depth Convolution layers (the input and output have the same number of channels and the group parameter is equal to that number) and only when the splitDepthConvolution pass is enabled.

    The depth_conv option has the following syntax:

    <depth_conv>
        <split>NONE</split>
        <!--OR-->
        <split>SINGLE</split>
        <!--OR-->
        <split>COMBINED</split>
        <tile_size>`integer value > 0`</tile_size>
        <!--OR-->
        <num_tiles>`integer value > 0`</num_tiles>
    </depth_conv>

    The split parameter controls the split over channels optimization. It can accept the following modes:

    • NONE - no split over channels, the depth convolution will be executed as single HW convolution.
    • SINGLE - the depth convolution will be split over channels, each tile will be executed as single HW convolution.
    • COMBINED - the compiler will split the current depth convolution along with its predecessor convolution over channels.

    The tile_size and num_tiles parameters are optional and allow you to manually specify the desired tile size for the split. The tile_size parameter specifies the exact tile size, while num_tiles specifies the desired number of tiles. Only one of these parameters can be used at a time.

    Note: For now, the SINGLE split mode requires manual tile size/number configuration. The COMBINED mode can select the tile size automatically.

    The tiling option allows choosing the tile size for HW Convolution and Pooling. The compiler can split the HW layer into tiles along all three axes (width, height, and channels).

    The tiling option has the following syntax:

    <tiling>
        <input_tile>
            <dims>
                <dim_w>FULL | AUTO | `integer value > 0`</dim_w>
                <dim_h>FULL | AUTO | `integer value > 0`</dim_h>
                <dim_c>FULL | AUTO | `integer value > 0`</dim_c>
            </dims>
            <!--OR-->
            <nums>
                <num_w>FULL | AUTO | `integer value > 0`</num_w>
                <num_h>FULL | AUTO | `integer value > 0`</num_h>
                <num_c>FULL | AUTO | `integer value > 0`</num_c>
            </nums>
        </input_tile>
        <!--OR-->
        <output_tile>
            <dims>
                <dim_w>FULL | AUTO | `integer value > 0`</dim_w>
                <dim_h>FULL | AUTO | `integer value > 0`</dim_h>
                <dim_c>FULL | AUTO | `integer value > 0`</dim_c>
            </dims>
            <!--OR-->
            <nums>
                <num_w>FULL | AUTO | `integer value > 0`</num_w>
                <num_h>FULL | AUTO | `integer value > 0`</num_h>
                <num_c>FULL | AUTO | `integer value > 0`</num_c>
            </nums>
        </output_tile>
    </tiling>

    You can specify either the input tile or the output tile; the compiler will update the other tile accordingly. To choose the tile size, specify either its exact size (dims) or the desired number of tiles (nums). Both dims and nums accept special values:

    • FULL - tile size is equal to the input/output size on selected axis (in other words, no tiling for this axis).
    • AUTO - lets the compiler choose tile size for selected axis automatically.

    If some dimension is missing, AUTO mode is assumed.

    The inputs and outputs options control the layout and location for HW layer inputs and outputs. They have the following syntax:

    <inputs>
        <input ind="0">
            <copy_child>true | false</copy_child>
            <location>AUTO | CMX | DDR</location>
            <layout>AUTO | HCW | CHW</layout>
        </input>
    </inputs>
    <outputs>
        <output ind="0">
            <copy_child>true | false</copy_child>
            <location>AUTO | CMX | DDR</location>
            <layout>AUTO | HCW | CHW</layout>
        </output>
    </outputs>

    You need to specify which input/output is configured:

    • <input ind="0"> - layer input
    • <input ind="1"> - layer weights (not applicable for Pooling)
    • <input ind="2"> - layer biases
    • <output ind="0"> - layer output

    Available configuration options:

    • copy_child forces the compiler to insert Copy stage for selected input/output before/after the current layer.
    • location sets the desired location of the input/output.
    • layout sets the desired layout of the input/output (makes sense only for the ind="0" input and output).

    The sw_injections option allows disabling the merging of SW stages into the current layer. The syntax of the sw_injections option:

    <sw_injections>
        <enable>false</enable>
    </sw_injections>

    Stages Section

    The stages section allows you to configure compiler behavior for a specific stage. Example of such a section:

    <stages>
        <stage name="conv0@HW@soh=0/6+ReLU+Bias">
            <hw>
                [HW options]
            </hw>
        </stage>
    </stages>

    The stage name is created from its base layer from the original IR plus some meta information added by the compiler.

    NOTE: The meta information embedded into the stage name is subject to change. It is better to use per-layer configuration instead. You can get the exact stage names from the GetPerformanceCounts output.

    For now, stage configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected stages.

    The HW optimization configuration section consists of the following options:

    • inputs and outputs control behavior for stage input and output data.
    • sw_injections control behavior of HW and SW stages merge optimization.

    The inputs and outputs options have the same meaning as for per-layer configuration (see Layer HW section).

    The sw_injections option for stage has the following syntax:

    <sw_injections>
        <enable>true | false</enable>
        <injected_stages>
            <stage>`stage name`</stage>
            <stage>`stage name`</stage>
        </injected_stages>
    </sw_injections>

    It allows disabling the merging of SW stages into the current stage or manually specifying which SW stages should be merged into the current HW stage.

    Example of Network Configuration File

    This is an example of a network configuration file:

    <?xml version="1.0" ?>
    <vpu_net_config version="1">
        <passes>
            <pass name="splitDepthConvolution">
                <enable>true</enable>
            </pass>
            <pass name="tryHCWLayoutForHW">
                <enable>true</enable>
            </pass>
        </passes>
        <data>
            <data name="input">
                <scale>128</scale>
            </data>
        </data>
        <layers>
            <layer name="conv5/dw">
                <hw>
                    <depth_conv>
                        <split>SINGLE</split>
                        <num_tiles>8</num_tiles>
                    </depth_conv>
                    <tiling>
                        <input_tile>
                            <dims>
                                <dim_w>FULL</dim_w>
                                <dim_h>38</dim_h>
                                <dim_c>FULL</dim_c>
                            </dims>
                        </input_tile>
                    </tiling>
                    <inputs>
                        <input ind="0">
                            <layout>HCW</layout>
                            <location>DDR</location>
                        </input>
                    </inputs>
                    <outputs>
                        <output ind="0">
                            <layout>CHW</layout>
                            <location>CMX</location>
                        </output>
                    </outputs>
                </hw>
            </layer>
            <layer name="conv5">
                <hw>
                    <tiling>
                        <input_tile>
                            <dims>
                                <dim_w>FULL</dim_w>
                                <dim_h>12</dim_h>
                                <dim_c>FULL</dim_c>
                            </dims>
                        </input_tile>
                    </tiling>
                    <inputs>
                        <input ind="0">
                            <layout>CHW</layout>
                            <location>CMX</location>
                        </input>
                    </inputs>
                    <outputs>
                        <output ind="0">
                            <force_copy>true</force_copy>
                            <layout>HCW</layout>
                            <location>CMX</location>
                        </output>
                    </outputs>
                </hw>
            </layer>
        </layers>
    </vpu_net_config>

    MYRIAD Plugin 

    The Inference Engine MYRIAD plugin is developed to infer deep learning models on the following VPU devices:

    • Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2
    • Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X

    For the Get Started page for Intel® Movidius™ Neural Compute Stick 2, refer to https://software.intel.com/en-us/neural-compute-stick/get-started.

    For the full list of supported layers, refer to the corresponding table of the Supported Layers section, column for VPU plugin.

    Hardware Setup   

    For installation instructions on Linux*, refer to the Installation Guide for Linux*.

    For installation instructions on Windows*, refer to the Installation Guide for Windows*.

    Supported Networks

    The Inference Engine MYRIAD plugin supports the following networks:

    Caffe*:

    • AlexNet
    • CaffeNet
    • GoogleNet (Inception) v1, v2, v4
    • VGG family (VGG16, VGG19)
    • SqueezeNet v1.0, v1.1
    • ResNet v1 family (18** ***, 50, 101, 152)
    • MobileNet
    • Inception ResNet v2
    • DenseNet family** (121,161,169,201)
    • SSD-300, SSD-512, SSD-MobileNet, SSD-GoogleNet, SSD-SqueezeNet

    TensorFlow*:

    • AlexNet
    • Inception v1, v2, v3, v4
    • Inception ResNet v2
    • MobileNet v1, v2
    • ResNet v1 family (50, 101, 152)
    • SqueezeNet v1.0, v1.1
    • VGG family (VGG16, VGG19)

    MXNet*:

    • AlexNet and CaffeNet
    • DenseNet family** (121,161,169,201)
    • SqueezeNet v1.1
    • MobileNet v1, v2
    • NiN
    • ResNet v1 (101, 152)
    • SqueezeNet v1.1
    • VGG family (VGG16, VGG19)
    • SSD-Inception-v3, SSD-MobileNet, SSD-ResNet-50, SSD-300

    ** Network is tested on Intel® Movidius™ Neural Compute Stick with BatchNormalization fusion optimization disabled during Model Optimizer import.

    *** Network is tested on Intel® Neural Compute Stick 2 with BatchNormalization fusion optimization disabled during Model Optimizer import

    Supported Configuration Parameters

    For the common configuration parameters, refer to VPU Plugins section.

    In addition to the common parameters, the MYRIAD plugin supports the following options:

    Parameter Name | Parameter Values | Default | Description
    KEY_VPU_PLATFORM | VPU_2450/VPU_2480 | - | If set, the plugin will use a device with the specific platform to allocate a network.
    KEY_VPU_FORCE_RESET | YES/NO | YES | Reset stalled devices on plugin initialization; must be used with the SetConfig method

    Device Allocation

    Each IExecutableNetwork instance tries to allocate a new device on InferenceEngine::InferencePlugin::LoadNetwork. If all devices are in use already, it will use the one with the minimal number of uploaded networks. The maximum number of networks that a single device can handle depends on the device memory capacity and the size of the networks.

    By default, the plugin resets all stalled devices on initialization for exclusive usage. This behavior can be changed by setting the KEY_VPU_FORCE_RESET option to NO. This option must be passed via InferenceEngine::InferencePlugin::SetConfig before the first network is loaded into the plugin. Several applications may run simultaneously using different devices, but each application must set the KEY_VPU_FORCE_RESET option to NO. A single device cannot be shared across multiple processes.
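
    As an illustration, the sketch below disables the forced reset so that several applications can run side by side on different devices. It assumes the key constant is available from a VPU plugin configuration header (shown as vpu/vpu_plugin_config.hpp) and that network is a CNNNetwork read from the IR; the header and namespace names are assumptions and may differ between releases.

    #include <inference_engine.hpp>
    // Assumption: KEY_VPU_FORCE_RESET is declared here; the header name and
    // namespace may differ between releases.
    #include <vpu/vpu_plugin_config.hpp>

    using namespace InferenceEngine;

    ...
    PluginDispatcher dispatcher({ "" });
    InferencePlugin plugin(dispatcher.getPluginByDevice("MYRIAD"));

    // Must be called before the first network is loaded into the plugin.
    plugin.SetConfig({ { VPUConfigParams::KEY_VPU_FORCE_RESET, PluginConfigParams::NO } });

    // network is a CNNNetwork read from the IR (see the CNNNetReader examples in this guide)
    auto executable_network = plugin.LoadNetwork(network, {});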


    HDDL Plugin

    The Inference Engine HDDL plugin is developed for inference of neural networks on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, which is designed for use cases that require high throughput of deep learning inference. It provides many times the throughput of the MYRIAD plugin.

    Hardware Setup   

    For installation instructions on Linux* OS, refer to the Installation Guide for Linux*.

    For installation instructions on Windows* OS, refer to the Installation Guide for Windows*.

    Supported networks

    For the list of supported networks, refer to the MYRIAD Plugin section.

    Supported Configuration Parameters

    For the common configuration parameters, refer to VPU Plugins section.

    In addition to the common parameters, the HDDL plugin supports the following options:

    Parameter Name | Parameter Values | Default | Description
    KEY_PERF_COUNT | YES/NO | NO | Enable the performance counter option
    KEY_VPU_HDDL_GRAPH_TAG | string | empty string | Allows executing the network on a specified number of devices
    KEY_VPU_HDDL_STREAM_ID | string | empty string | Allows executing networks with the same STREAM_ID on the same device

    Heterogeneous Plugin

    The Heterogeneous plugin enables inference of a single network on several devices. Typical reasons to execute networks in heterogeneous mode are:

    • To utilize the accelerator's power by calculating the heaviest parts of the network on the accelerator and executing unsupported layers on fallback devices such as the CPU
    • To utilize all available hardware more efficiently during one inference

    The execution through the Heterogeneous plugin can be divided into two steps:

    • Setting of affinity to layers (binding them to devices in InferenceEngine::ICNNNetwork)
    • Loading the network to the Heterogeneous plugin, which splits the network into parts and executes them through the dedicated plugins.

    These steps are decoupled. The setting of affinity can be done automatically using fallback policy or in manual mode.

    The automatic fallback policy uses greedy behavior: it assigns every layer that can be executed on a certain device to that device, following the device priorities.

    Some topologies are not friendly to heterogeneous execution on some devices, or cannot be executed in this mode at all. These networks might have activation layers that are not supported on the primary device. If transmitting data from one part of the network to another is time-consuming, heterogeneous execution on these devices does not make sense. In that case, define the heaviest part manually and set the affinity to avoid sending data back and forth several times during one inference.

    Annotation of Layers per Device and Default Fallback Policy

    Default fallback policy decides which layer goes to which device automatically according to the support in dedicated plugins (FPGA, GPU, CPU).

    Another way to annotate a network is setting affinity manually using CNNLayer::affinity field. This field accepts string values of devices like "CPU" or "FPGA".

    The fallback policy does not work if even one layer has an initialized affinity. The recommended sequence is to set affinities automatically and then adjust them manually.

    // This example demonstrates how to perform the default affinity initialization and then
    // correct the affinity manually for some layers
    InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
    InferenceEngine::InferenceEnginePluginPtr enginePtr;
    enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
    HeteroPluginPtr hetero(enginePtr);
    hetero->SetAffinity(network, { }, &resp);
    network.getLayerByName("qqq")->affinity = "CPU";
    InferencePlugin plugin(enginePtr);
    auto executable_network = plugin.LoadNetwork(network, {});

    If you rely on the default affinity distribution, you can avoid calling IHeteroInferencePlugin::SetAffinity by calling InferenceEngine::InferencePlugin::LoadNetwork instead:

    InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
    InferenceEngine::InferenceEnginePluginPtr enginePtr;
    enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
    InferencePlugin plugin(enginePtr);
    CNNNetReader reader;
    reader.ReadNetwork("Model.xml");
    reader.ReadWeights("Model.bin");
    auto network = reader.getNetwork();   // obtain the CNNNetwork object from the reader
    auto executable_network = plugin.LoadNetwork(network, {});

    Splitting the Network and Execution

    While being loaded to the Heterogeneous plugin, the network is divided into several parts that are loaded to the dedicated plugins. Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.

    Execution Precision

    Precision for inference in the Heterogeneous plugin is defined by:

    • Precision of the Intermediate Representation
    • Ability of final plugins to execute in precision defined in the Intermediate Representation

    Examples:

    • To execute on Intel® Integrated Graphics with a CPU fallback, with FP16 used on Intel® Integrated Graphics, use only FP16 for the Intermediate Representation. The Heterogeneous plugin converts the weights from FP16 to FP32 for execution on the CPU.
    • To execute on FPGA with a CPU fallback, use any precision for the Intermediate Representation. The execution on FPGA is defined by bitstream, the execution on CPU happens in FP32.

    Use these samples with the command:

     ./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:FPGA,CPU

    where:

    • HETERO is the Heterogeneous plugin 
    • FPGA,CPU is the fallback policy with the priority on FPGA and the fallback to the CPU

    You can specify more than two devices: for example, -d HETERO:FPGA,GPU,CPU

    Analyzing With the Heterogeneous Execution

    After you enable the KEY_HETERO_DUMP_GRAPH_DOT config key, the Heterogeneous plugin dumps GraphViz* .dot files with per-layer device annotations.

    The Heterogeneous plugin can generate two files:

    • hetero_affinity_<network_name>.dot - annotation of affinities per layer. This file is written to the disk only if the default fallback policy is executed.
    • hetero_subgraphs_<network_name>.dot - annotation of affinities per graph. This file is written to the disk during the execution of ICNNNetwork::LoadNetwork() for the Heterogeneous plugin.

    To enable the dump, set the KEY_HETERO_DUMP_GRAPH_DOT configuration key before loading the network:

    #include "ie_plugin_config.hpp"
    #include "hetero/hetero_plugin_config.hpp"
    using namespace InferenceEngine::PluginConfigParams;
    using namespace InferenceEngine::HeteroConfigParams;
    ...
    enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
    InferencePlugin plugin(enginePtr);
    plugin.SetConfig({ {KEY_HETERO_DUMP_GRAPH_DOT, YES} });

    Use the GraphViz* utility or converters to create .png images. On the Ubuntu* operating system, you can use the following commands:

    • sudo apt-get install xdot
    • xdot hetero_subgraphs.dot

    Besides the generation of .dot files, you can use the error listening mechanism:

    class FPGA_ErrorListener : public InferenceEngine::IErrorListener
    {
    public:
        virtual void onError(const char *msg) noexcept override {
            std::cout << msg;
        }
    };
    ...
    FPGA_ErrorListener err_listener;
    plugin->SetLogCallback(err_listener);
    

    If, during network loading, some layers are decided to be executed on a fallback plugin, the following message is printed:

    Layer (Name: detection_out, Type: DetectionOutput) is not supported:
    	custom or unknown.
    	Has (3) sets of inputs, must be 1, or 2.
    	Input dimensions (2) should be 4.
    

    You can use the performance data (in the samples, this is the -pc option) to get performance counters for each subgraph.

    Output example for Googlenet v1 running on FPGA with a fallback to the CPU:

    subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED       layerType:                    realTime: 129        cpu: 129            execType:
    subgraph1: 2. input transfer to DDR:EXECUTED       layerType:                    realTime: 201        cpu: 0              execType:
    subgraph1: 3. FPGA execute time:EXECUTED       layerType:                    realTime: 3808       cpu: 0              execType:
    subgraph1: 4. output transfer from DDR:EXECUTED       layerType:                    realTime: 55         cpu: 0              execType:
    subgraph1: 5. FPGA output postprocessing:EXECUTED       layerType:                    realTime: 7          cpu: 7              execType:
    subgraph1: 6. copy to IE:EXECUTED       layerType:                    realTime: 2          cpu: 2              execType:
    subgraph2: out_prob:          NOT_RUN        layerType: Output             realTime: 0          cpu: 0              execType: unknown
    subgraph2: prob:              EXECUTED       layerType: SoftMax            realTime: 10         cpu: 10             execType: ref
    Total time: 4212     microseconds

    GNA Plugin 

    The GNA plugin is developed for low power scoring of neural networks on the Intel® Speech Enabling Developer Kit, the Amazon Alexa* Premium Far Field Developer Kit, Intel® Pentium® Silver Processor J5005, Intel® Celeron® Processor J4005, Intel® Core™ i3-8121U Processor, and others.

    For the full list of layers supported by GNA plugin, refer to the corresponding table of the Supported Layers section.

    Supported Networks

    The following networks have been tested in this release:

    • Kaldi* Nnet framework:
      • wsj_dnn5b_smbr
      • wsj_cnn4b_smbr
      • rm_lstm4f
      • rm_cnn4a_smbr
      • tedlium_dnn4_smbr
      • tedlium_lstm4f
    • TensorFlow* framework: Not tested in this release

    NOTE: The DNN networks only support batch size greater than 1.

    BIOS, Library, and Drivers

    This release was tested on Intel® NUC7CJYH with BIOS Update [JYGLKCPX.86A] Version: 0037, GNA library version 01.00.00.1317, and GNA driver version 01.00.00.1310 (for Windows* and Linux* OSs).

    Supported Configuration Parameters

    The plugin supports the configuration parameters listed below. The parameters are passed as std::map<std::string, std::string> on InferenceEngine::InferencePlugin::LoadNetwork.

    Parameter Name | Parameter Values | Default | Description
    GNA_COMPACT_MODE | YES/NO | YES | Reuse I/O buffers to save space (makes debugging harder)
    GNA_SCALE_FACTOR | FP32 number | 1.0 | Scale factor to use for input quantization
    KEY_GNA_DEVICE_MODE | CPU/GNA_AUTO/GNA_HW/GNA_SW/GNA_SW_EXACT | GNA_AUTO | Execution mode (CPU, GNA, and emulation modes)
    KEY_GNA_FIRMWARE_MODEL_IMAGE | string | "" | Name for the embedded model binary dump file
    KEY_GNA_PRECISION | I16/I8 | I16 | Hint to the GNA plugin: preferred integer weight resolution for quantization
    KEY_PERF_COUNT | YES/NO | NO | Turn on performance counter reporting
    KEY_GNA_LIB_N_THREADS | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes
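
    For illustration, a configuration map like the one below could be passed to LoadNetwork. The namespace and header that hold the GNA key constants (shown here as GNAConfigParams in gna/gna_config.hpp) and the device name "GNA" are assumptions that may differ between releases; the parameter values are only examples.

    #include <map>
    #include <string>
    #include <inference_engine.hpp>
    // Assumption: GNA configuration keys are declared here; the exact header
    // name and namespace may differ between releases.
    #include <gna/gna_config.hpp>

    using namespace InferenceEngine;

    ...
    PluginDispatcher dispatcher({ "" });
    InferencePlugin plugin(dispatcher.getPluginByDevice("GNA"));

    // Prefer GNA hardware with software fallback, use two library worker threads,
    // and enable performance counter reporting.
    std::map<std::string, std::string> gna_config = {
        { GNAConfigParams::KEY_GNA_DEVICE_MODE, "GNA_AUTO" },
        { GNAConfigParams::KEY_GNA_LIB_N_THREADS, "2" },
        { PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }
    };
    auto executable_network = plugin.LoadNetwork(network, gna_config);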

    How to Interpret Performance Counters

    As a result of collecting performance counters using InferenceEngine::IInferencePlugin::GetPerformanceCounts, you can find various performance data about execution on GNA.

    The returned map stores a counter description as a key, and the counter value is stored in the realTime_uSec field of the InferenceEngineProfileInfo structure. The current GNA implementation calculates counters for whole-utterance scoring and does not provide per-layer information. The API reports counter units in cycles, which can be converted to seconds as follows:

    seconds = cycles/GNA frequency
    

    The Intel Core i3-8121U processor includes GNA running at 400 MHz, while the Intel Pentium Silver J5005 and Intel Celeron J4005 processors run GNA at 200 MHz.

    Performance counters currently provided:

    • Scoring request performance results:
      • number of total cycles spent on scoring in hardware (including compute and memory stall cycles)
      • number of stall cycles spent in hardware
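
    As an illustration, the sketch below retrieves the counters from an already executed infer request and converts the cycle counts to seconds, using the 400 MHz figure quoted above; infer_request is assumed to be a completed InferenceEngine::InferRequest.

    #include <inference_engine.hpp>
    #include <iostream>
    #include <map>
    #include <string>

    using namespace InferenceEngine;

    ...
    // infer_request has already completed a scoring request
    std::map<std::string, InferenceEngineProfileInfo> counters =
            infer_request.GetPerformanceCounts();

    // On GNA the counter values are reported in cycles through the realTime_uSec
    // field; divide by the GNA frequency to obtain seconds.
    const double gna_frequency_hz = 400e6;   // Intel Core i3-8121U
    for (const auto &counter : counters) {
        double seconds = static_cast<double>(counter.second.realTime_uSec) / gna_frequency_hz;
        std::cout << counter.first << ": " << seconds << " s" << std::endl;
    }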

    Multithreading Support in GNA Plugin

    GNA plugin supports the following configuration parameters for multithreading management:

    • KEY_GNA_LIB_N_THREADS

      By default, the GNA plugin uses one worker thread for inference computations. This parameter allows you to create up to 127 threads for software modes.

    NOTE: Multithreading mode does not guarantee the same computation order as the order of issuing. Additionally, in this case, software modes do not implement any serializations.

    Using Shape Inference 

    The Shape Inference feature enables resizing a network before loading it to a plugin.
    It makes it possible to specify differently-sized input when the model is read by the Inference Engine, without going back to the Model Optimizer.
    The feature also replaces InferenceEngine::ICNNNetwork::SetBatchSize,
    as setting the batch is a special case of setting the whole input shape.

    Usage

    The primary method of the feature is InferenceEngine::CNNNetwork::reshape.
    It takes new input shapes and propagates them from input to output through all intermediate layers of the given network.
    The method accepts InferenceEngine::ICNNNetwork::InputShapes - a map of pairs: name of input data and its dimensions.

    The algorithm for resizing network is the following:

    1. Collect the map of input names and shapes from Intermediate Representation (IR) using the helper method InferenceEngine::CNNNetwork::getInputShapes.
    2. Set new input shapes.
    3. Call reshape().

    Here is a code example:

    // ------------- 0. Read IR and image ----------------------------------------------
    CNNNetReader network_reader;
    network_reader.ReadNetwork("path/to/IR/xml");
    CNNNetwork network = network_reader.getNetwork();
    cv::Mat image = cv::imread("path/to/image");
    // ---------------------------------------------------------------------------------
    
    // ------------- 1. Collect the map of input names and shapes from IR---------------
    auto input_shapes = network.getInputShapes();
    // ---------------------------------------------------------------------------------
    
    // ------------- 2. Set new input shapes -------------------------------------------
    std::string input_name;
    SizeVector input_shape;
    std::tie(input_name, input_shape) = *input_shapes.begin(); // let's consider first input only
    input_shape[0] = batch_size; // set batch size to the first input dimension
    input_shape[2] = image.rows; // changes input height to the image one
    input_shape[3] = image.cols; // changes input width to the image one
    input_shapes[input_name] = input_shape;
    // ---------------------------------------------------------------------------------
    
    // ------------- 3. Call reshape ---------------------------------------------------
    network.reshape(input_shapes);
    // ---------------------------------------------------------------------------------
    
    ...
    
    // ------------- 4. Loading model to the plugin ------------------------------------
    ExecutableNetwork executable_network = plugin.LoadNetwork(network, {});
    // ---------------------------------------------------------------------------------
    

    Shape Inference feature is used in the Smart Classroom Demo.

    Extensibility

    Custom Shape Inference functions are registered by calling InferenceEngine::ICNNNetwork::AddExtension with an implemented InferenceEngine::IShapeInferExtension - the holder of the custom implementations.

    The holder must implement two key methods:

    • InferenceEngine::IShapeInferExtension::getShapeInferImpl - To return the custom shape infer implementation for the given type
    • InferenceEngine::IShapeInferExtension::getShapeInferTypes - To provide all custom types

    A custom shape infer implementation is represented by InferenceEngine::IShapeInferImpl::inferShapes.

    It is not possible to override the built-in shape infer functions (see Supported Layer Types below). A custom type must be different from the supported ones. The extensibility mechanism of the Shape Inference feature is demonstrated in the Hello Shape Infer SSD Sample.

    Supported Layer Types

    • Activation
    • ArgMax
    • BatchNormalization
    • CTCGreedyDecoder
    • Clamp
    • Concat
    • Const
    • Convolution
    • Copy
    • Crop
    • Deconvolution
    • DetectionOutput
    • ELU
    • Eltwise
    • Flatten
    • FullyConnected/InnerProduct
    • GRN
    • Input
    • Interp
    • LRN/Norm
    • Logistic
    • MVN
    • Memory
    • Normalize
    • PReLU
    • PSROIPooling
    • Permute
    • Pooling
    • Power
    • PowerFile
    • PriorBox
    • PriorBoxClustered
    • Proposal
    • ROIPooling
    • ReLU
    • ReLU6
    • RegionYolo
    • ReorgYolo
    • Resample
    • Reshape
    • ScaleShift
    • Sigmoid
    • SimplerNMS
    • Slice
    • SoftMax
    • SpatialTransformer
    • Split
    • TanH
    • Tile
    • Upsampling

    Limitations

    Shape Inference is a preview feature with a set of limitations:

    • The Reshape layer might not work correctly for TensorFlow* models when its shape and parameters dynamically depend on other layers (for example, GoogleNet-V3-TF or vehicle-license-plate-detection-barrier-0107).
    • Models with fixed dimensions in the dim attribute of the Reshape layer cannot be resized.
    • Shape inference for the Interp layer works for almost all cases, except for Caffe* models with fixed width and height parameters (for example, semantic-segmentation-adas-0001).

    Using Dynamic Batching

    The Dynamic Batching feature allows you to dynamically change the batch size for inference calls within a preset batch size limit. This feature might be useful when the batch size is unknown beforehand and using an extra-large batch size is undesirable or impossible due to resource limitations. For example, face detection with person age, gender, or mood recognition is a typical usage scenario.

    Usage

    You can activate Dynamic Batching by setting the KEY_DYN_BATCH_ENABLED flag to YES in a configuration map that is passed to a plugin while loading a network. This configuration creates an ExecutableNetwork object that allows setting the batch size dynamically in all of its infer requests using the SetBatch() method. The batch size that was set in the passed CNNNetwork object is used as the maximum batch size limit.

    Here is a code example:

    int dynBatchLimit = FLAGS_bl;   //take dynamic batch limit from command line option
    CNNNetReader networkReader;
    // Read network model
    networkReader.ReadNetwork(modelFileName);
    networkReader.ReadWeights(weightFileName);
    CNNNetwork network = networkReader.getNetwork();
    
    // enable dynamic batching and prepare for setting max batch limit
    const std::map<std::string, std::string> dyn_config =
    { { PluginConfigParams::KEY_DYN_BATCH_ENABLED, PluginConfigParams::YES } };
    network.setBatchSize(dynBatchLimit);
    
    // create executable network and infer request
    auto executable_network = plugin.LoadNetwork(network, dyn_config);
    auto infer_request = executable_network.CreateInferRequest();
    
    
    ...
    
    
    // process a set of images
    // dynamically set batch size for subsequent Infer() calls of this request
    size_t batchSize = imagesData.size();
    infer_request.SetBatch(batchSize);
    infer_request.Infer();
    
    ...
    
    // process another set of images
    batchSize = imagesData2.size();
    infer_request.SetBatch(batchSize);
    infer_request.Infer();
    

    Limitations

    Currently, certain limitations for using Dynamic Batching exist:

    • Use Dynamic Batching with CPU and GPU plugins only.

    • Use Dynamic Batching on topologies that consist of certain layers only:

      • Convolution
      • Deconvolution
      • Activation
      • LRN
      • Pooling
      • FullyConnected
      • SoftMax
      • Split
      • Concatenation
      • Power
      • Eltwise
      • Crop
      • BatchNormalization
      • Copy

    Do not use layers that might arbitrarily change the tensor shape (such as Flatten, Permute, Reshape), layers specific to object detection topologies (ROIPooling, PriorBox, DetectionOutput), or custom layers. Topology analysis is performed while loading a network into the plugin, and if the topology is not applicable, an exception is generated.
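
    The following sketch illustrates one way to handle that case: it wraps the LoadNetwork call from the example above in a try/catch block, under the assumption that the reported error derives from std::exception.

    #include <iostream>
    ...
    try {
        // dyn_config contains { KEY_DYN_BATCH_ENABLED, YES } as in the example above
        auto executable_network = plugin.LoadNetwork(network, dyn_config);
        auto infer_request = executable_network.CreateInferRequest();
    } catch (const std::exception &ex) {
        // Reported when the topology contains layers that are not supported with
        // Dynamic Batching (for example, Reshape, DetectionOutput, or custom layers)
        std::cerr << "Dynamic Batching is not applicable: " << ex.what() << std::endl;
    }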

    Low-Precision 8-bit Integer Inference

    Disclaimer

    Inference Engine with low-precision 8-bit integer inference is in a feature preview and requires the following prerequisites to be satisfied:

    • Inference Engine CPU Plugin must be built with the Intel® Math Kernel Library (Intel® MKL) dependency. This requirement is satisfied by default in the Intel Distribution of OpenVINO. It mostly applies if you use the open-source version of OpenVINO, because open-source OpenVINO can also be built with OpenBLAS, which is not suitable for 8-bit integer inference.
    • Intel® platforms that support at least one extension to x86 instruction set from the following list:
      • Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
      • Intel® Advanced Vector Extensions 2.0 (Intel® AVX2)
      • Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2)
    • A model must contain at least one activation layer of the Rectified Linear Unit (ReLU) type. If this requirement is not satisfied, 8-bit inference is unavailable for this particular model in the Inference Engine.

    The 8-bit inference feature was validated on the following topologies:

    • Classification models:
      • Caffe Inception v1, Inception v4
      • Caffe ResNet-50 v1, ResNet-101 v1
      • Caffe MobileNet
      • Caffe SqueezeNet v1.0, SqueezeNet v1.1
      • Caffe VGG16, VGG19
      • Caffe DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201
      • TensorFlow Inception v3, Inception v4, Inception ResNet v2
    • Object detection models:
      • Caffe SSD_SqueezeNet
      • Caffe SSD_MobileNet
      • Caffe SSD_Vgg16_300

    Introduction

    Much research in deep learning has explored using low-precision computations during inference to speed up deep learning pipelines and achieve higher performance. For example, one popular approach is to shrink the precision of activation and weight values from fp32 to smaller formats, such as fp11 or int8. For more information about this approach, refer to the Brief History of Lower Precision in Deep Learning section in this whitepaper.

    8-bit computations (referred to as int8) offer better performance than inference in higher precision (for example, fp32), because they allow loading more data into a single processor instruction. The usual cost of this significant boost is reduced accuracy. However, the drop in accuracy has been shown to be negligible in many cases and depends on task requirements, so the application engineer can define the maximum accuracy drop that is acceptable.

    The current Inference Engine solution for low-precision inference uses Intel MKL-DNN, which supports inference of the following layers in 8-bit integer computation mode:

    • Convolution
    • ReLU
    • Pooling
    • Eltwise
    • Concat

    This means that 8-bit inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in the format supported by the CPU plugin: 32-bit floating point format (fp32).

    Low-Precision 8-bit Integer Inference Workflow

    For 8-bit integer computations, the original model (or its Intermediate Representation) must be in the fp32 format. To perform layer calculations in the int8 format, the input data (input blob) and the weights of the given layer (also biases and/or other blobs of the layer) must be quantized - transitioned from the fp32 to the int8 format. The quantization process converts model input into a lower-precision format. The precision and accuracy factors are specified by the scale and the rounding mode, respectively. Read more about the mathematical computations under the hood in the white paper.
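
    As a simplified illustration of this idea (not the exact scheme used by the CPU plugin, whose scales and rounding mode come from the calibration statistics), quantizing a value amounts to multiplying it by a scale factor, rounding, and clamping to the 8-bit range:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Illustrative scale-and-round quantization; the actual scheme used by the
    // CPU plugin is defined by the calibration statistics and the white paper.
    int8_t quantize(float x, float scale) {
        float q = std::round(x * scale);              // apply the scale factor, then round
        q = std::min(std::max(q, -128.0f), 127.0f);   // clamp to the signed 8-bit range
        return static_cast<int8_t>(q);
    }

    float dequantize(int8_t q, float scale) {
        return static_cast<float>(q) / scale;         // approximate inverse transform
    }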

    8-bit inference pipeline includes two stages (also refer to the figure below):

    1. Offline stage, or model calibration. During this stage, scale factors and execution profiles are defined for each layer in a way that low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a calibrated model.
    2. Run-time stage. This stage is an internal procedure of the CPU Plugin. During this stage, the calibrated model is loaded to the plugin. For each layer that has the corresponding execution profile, the plugin normalizes the weights (and biases, if present). It also adds scale factors at particular places of the model, defined by the internal algorithm with regard to maximum performance and a minimum number of extra layout manipulations.

    Int8 flow on CPU plugin

    Offline Stage: Model Calibration

    One of the vital components for successful data quantization is a set of scale factors for each layer that supports 8-bit computations. These scales are obtained from statistics of layers activations collected by the Calibration Tool on a calibration dataset. The calibration dataset contains images and can be a subset of the validation set. A small fraction of images from validation dataset (1-5%) is enough to create a calibration dataset. For more information on the dataset preparation, refer to the Validation Application.

    To calibrate a model, the Calibration Tool performs the following steps:

    1. Collecting layer statistics (minimum and maximum values of layers activations) and baseline of accuracy metric for fp32 inference. Note that accuracy metric depends on the type of the calibrated model. For classification networks, top-1 metric is used; for object detection models, mAP metric is used.
    2. Collecting accuracy metric for 8-bit inference. During this step, different filters are applied to the collected activations statistics to remove activation outliers (isolated values that are very different from the majority of known values). If the resulting accuracy satisfies the required level with respect to the accepted accuracy drop delta, the Calibration Tool stops the calibration process.
    3. Collecting accuracy drop information on the calibration dataset for each layer that supports 8-bit computations, using the Normalized Root-Mean-Square Deviation metric. This metric allows putting all layers in decreasing order, so it is clear which layers bring the biggest accuracy drop.
    4. Eliminating the layers with the largest accuracy drop from 8-bit computation by switching them back to fp32 mode. After eliminating one layer, the Calibration Tool computes the accuracy of this configuration. Until the resulting accuracy satisfies the required level with respect to the accepted accuracy drop delta (which equals 1% by default), the tool continues switching layers back to fp32 computations in the order defined in step 3. However, calibration of a model with all layers returned to fp32 computations is meaningless, so this serves as a hard stop for the whole calibration process.

    When the calibration completes, the tool writes the resulting statistics and the modified Intermediate Representation (IR) to the .xml file. The tool does not change the IR structure, so the layer hierarchy is the same. However, the layers that are chosen to be executed in 8-bit format are marked with the appropriate profile attribute, and their statistics are stored at the end of the .xml file.

    When you pass the calibrated IR to the CPU plugin, the plugin automatically recognizes it as calibrated and performs the 8-bit inference. At the same time, other plugins do not support 8-bit inference, so if you pass the calibrated model to them, statistics and additional attributes are ignored and the model is inferred in the precision that this plugin supports.

    Run-Time Stage: Quantization

    This is the second stage of the 8-bit integer inference. After you load the calibrated model IR to the CPU plugin, it performs quantization for 8-bit inference:

    • Inserts the corresponding scale factors to transform the layer input precision to the unsigned int8 data type and to normalize layer outputs to the unsigned 8-bit integer type, the signed 8-bit integer type, or the 32-bit floating point type
    • Normalizes the weights of convolution layers to fit the signed 8-bit integer data type
    • Normalizes the biases of convolution layers to fit the signed 32-bit integer data type

    Performance Counters

    Information about layer precision is stored in the performance counters that are available from the Inference Engine API. The layers have the following marks:

    • Suffix I8 for layers that had 8-bit data type input and were computed in 8-bit precision
    • Suffix FP32 for layers computed in 32-bit precision

    For example, the performance counters table for the Inception model can look as follows:

    inception_5b/5x5_reduce       EXECUTED       layerType: Convolution        realTime: 417        cpu: 417            execType: gemm_blas_I8
      inception_5b/output           EXECUTED       layerType: Concat             realTime: 34         cpu: 34             execType: ref_I8
      inception_5b/output_U8_nhw... EXECUTED       layerType: Reorder            realTime: 33092      cpu: 33092          execType: reorder_I8
      inception_5b/output_oScale... EXECUTED       layerType: ScaleShift         realTime: 1390       cpu: 1390           execType: jit_avx2_FP32
      inception_5b/output_oScale... EXECUTED       layerType: Reorder            realTime: 143        cpu: 143            execType: reorder_FP32
      inception_5b/pool             EXECUTED       layerType: Pooling            realTime: 59301      cpu: 59301          execType: ref_any_I8

    The execType column of the table includes inference primitives with specific suffixes.
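
    For example, a sketch like the one below could count how many layers were executed with 8-bit primitives, assuming performance counting was enabled (KEY_PERF_COUNT set to YES) and that infer_request is a completed InferenceEngine::InferRequest.

    #include <inference_engine.hpp>
    #include <iostream>
    #include <string>

    using namespace InferenceEngine;

    ...
    auto counters = infer_request.GetPerformanceCounts();

    size_t int8_layers = 0, other_layers = 0;
    for (const auto &counter : counters) {
        const std::string exec_type = counter.second.exec_type;   // e.g. "gemm_blas_I8"
        if (exec_type.size() >= 3 && exec_type.compare(exec_type.size() - 3, 3, "_I8") == 0)
            ++int8_layers;
        else
            ++other_layers;
    }
    std::cout << "I8 layers: " << int8_layers << ", other layers: " << other_layers << std::endl;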

    Overview of Inference Engine Python* API

    NOTE: This is a preview version of the Inference Engine Python* API for evaluation purposes only. The module structure and the API itself will be changed in future releases.

    This API provides a simplified interface for the Inference Engine functionality that allows you to:

    • Handle the models
    • Load and configure Inference Engine plugins based on device names
    • Perform inference in synchronous and asynchronous modes with arbitrary number of infer requests (the number of infer requests may be limited by target device capabilities)

    Supported OSes

    Currently, the Inference Engine Python* API is supported on Ubuntu* 16.04, Microsoft Windows* 10, and CentOS* 7.3 OSes.

    Supported Python* versions:

    • On Ubuntu 16.04: 2.7, 3.5, 3.6
    • On Windows 10: 3.5, 3.6
    • On CentOS 7.3: 3.4, 3.5, 3.6

    Setting Up the Environment

    To configure the environment for the Inference Engine Python* API, run:

    • On Ubuntu 16.04: source <INSTALL_DIR>/bin/setupvars.sh
    • On Windows 10: call <INSTALL_DIR>\deployment_tools\inference_engine\python_api\setenv.bat

    The script automatically detects the latest installed Python* version and configures the environment if the latest installed Python version is supported.

    If you want to use a specific supported Python* version, set the environment variable PYTHONPATH=<INSTALL_DIR>/deployment_tools/inference_engine/python_api/<desired_python_version> after running the environment configuration script.

    IENetLayer Class

    This class contains the main information about a layer and allows you to modify some layer parameters.

    Class Attributes

    • name - Layer name
    • type - Layer type
    • precision - Layer base operating precision. Provides getter and setter interfaces.
    • layout - Returns the layout of the layer shape.
    • shape - Returns the shape of the layer as a list.
    • parents - Returns a list that contains the names of layers preceding this layer.
    • children - Returns a list that contains the names of layers following this layer.
    • weights - Dictionary with layer weights, biases, or custom blobs if any
    • params - Layer specific parameters. Provides getter and setter interfaces to get and modify layer parameters.

      NOTE: Some modifications can be ignored and overwritten by target plugin (for example, modification of convolution kernel size will be reflected in layer parameters but finally the plugin will ignore it and will use initial kernel size).

    • affinity - Layer affinity set by a user or default affinity set by the IEPlugin.set_initial_affinity() method

      The affinity attribute provides an interface to get and set the layer affinity, so you can modify the layer affinity directly. For example:

      >>> net = IENetwork(model=<path_to_xml_file>, weights=<path_to_bin_file>)
      >>> plugin = IEPlugin(device="HETERO:FPGA,CPU")
      >>> plugin.set_config({"TARGET_FALLBACK": "HETERO:FPGA,CPU"})
      >>> plugin.set_initial_affinity(net)
      >>> for l in net.layers.values():
      ...     if l.type == "Convolution":
      ...         l.affinity = "CPU"
      

      To correctly set affinity for the network, you must first initialize and properly configure the HETERO plugin:

      • set_config({"TARGET_FALLBACK": "HETERO:FPGA,GPU"}) function configures the plugin fallback devices and their order.
      • plugin.set_initial_affinity(net) function sets affinity parameter of model layers according to its support on specified devices.

      After default affinity is set by the plugin, override the default values by setting affinity manually as described in the example above.

      To understand how default and non-default affinities are set:

      1. Call net.layers function right after model loading and check that layer affinity parameter is empty.
      2. Call plugin.set_initial_affinity(net).
      3. Call net.layers and check the layer affinity parameters to see how the plugin set the default affinity.
      4. Set layer affinity manually as described above.
      5. Call net.layers again and check the layer affinity parameters to see how they changed after manual affinity setting.

      For the full usage pipeline, refer to affinity_setting_sample.py.

    IENetwork Class

    This class contains the information about the network model read from the IR and allows you to manipulate some model parameters, such as layer affinities and output layers.

    Class Constructor

    • __init__(model: str, weights: str)
      • Parameters:
        • model - Path to .xml file of the IR
        • weights - Path to .bin file of the IR

    Class Attributes

    • name - Name of the loaded network
    • inputs - A dictionary that maps input layer names to InputInfo objects. For example, to get a shape of the input layer:
      >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.inputs
      {'data': <inference_engine.ie_api.InputInfo object at 0x7efe042dedd8>}
      >>> net.inputs['data'].shape
      [1, 3, 224, 224]
      
    • outputs - A dictionary that maps output layer names to OutputInfo objects. For example, to get a shape of the output layer:
      >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.outputs
      {'prob': <inference_engine.ie_api.OutputInfo object at 0x7efe03ab95d0>}
      >>> net.outputs['prob'].shape
      [1, 1000]
      
    • batch_size - Batch size of the network. Provides getter and setter interfaces to get and modify the network batch size. For example:
      >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.batch_size
      1
      >>> net.batch_size = 4
      >>> net.batch_size
      4
      >>> net.inputs['data'].shape
          [4, 3, 224, 224]
      
    • layers - Returns a dictionary that maps network layer names to IENetLayer objects containing layer properties in topological order. For example, to list all network layers:
      >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.layers
      {'conv0': <inference_engine.ie_api.IENetLayer object at 0x7f3a4c102370>
      ...
      }
      
    • stats - Returns a LayersStatsMap object containing a dictionary that maps network layer names to calibration statistics represented by LayerStats objects. The LayersStatsMap class is inherited from the built-in Python dict and overrides the default update() method to allow setting or modifying layer calibration statistics.
      >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.stats.update({
      "conv1_2d" : LayserStats(min=(-25, -1, 0), max=(63, 124, 70)),
      "conv2_2d" : LayserStats(min=(-5, -1, 0, 1, -7, 2), max=(63, 124, 70, 174, 99, 106)),
      })
      

      For more details about low-precision inference, refer to the Low-Precision 8-bit Integer Inference section.

    Class Methods

    • from_ir(model: str, weights: str)
      • Description:

        NOTE: The function is deprecated. Use the IENetwork() class constructor to create a valid instance of IENetwork.

        The class method serves to read the model from the .xml and .bin files of the IR.

      • Parameters:
        • model - path to .xml file of the IR
        • weights - path to .bin file of the IR
      • Return value: An instance of the IENetwork class
      • Usage example:
        >>> net = IENetwork(model=<path_to_xml_file>, weights=<path_to_bin_file>)
        >>> net
        <inference_engine.ie_api.IENetwork object at 0x7fd7dbce54b0>

    Instance Methods

    • add_outputs(outputs):
      • Description:

        The method serves to mark any intermediate layer as output layer to retrieve the inference results from the specified layers.

      • Parameters:
        • outputs - a list of layer names to be set as model outputs. If you are setting a single layer as output, you can provide its name as a string.
      • Return value: None
      • Usage example:
        >>> net = IENetwork(model=<path_to_xml_file>, weights=<path_to_bin_file>)
        >>> net.add_outputs(["conv5_1/dwise", "conv2_1/expand"])
        >>> net.outputs
        ['prob', 'conv5_1/dwise', 'conv2_1/expand']

        NOTE: The last layers (nodes without successors in graph representation of the model) are set as output by default. In the case above, prob layer is a default output and conv5_1/dwise, conv2_1/expand are user-defined outputs.

    • reshape(input_shapes: dict):
      • Description:

        The method reshapes the network to change spatial dimensions, batch size, or any dimension.

        NOTE: Before using this method, make sure that the target shape is applicable for the network. Changing the network shape to an arbitrary value may lead to unpredictable behavior.

      • Parameters:
        • input_shapes - The dictionary that maps input layer names to tuples with the target shape
      • Return value: None
      • Usage example:
        >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
        >>> input_layer = next(iter(net.inputs))
        >>> n, c, h, w = net.inputs[input_layer].shape
        >>> net.reshape({input_layer: (n, c, h*2, w*2)})
              
    • serialize(<path_to_xml>, <path_to_bin>):
      • Description:

        The method serializes the network and stores it in files.

      • Parameters:
        • <path_to_xml> - path to a file, where a serialized model will be stored.
        • <path_to_bin> - path to a file, where serialized weights will be stored.
      • Return value: None
      • Usage example:
        >>> net = IENetwork(model=<path_to_model>, weights=<path_to_weights>)
        >>> net.serialize(<path_to_xml>, <path_to_bin>)       

    LayerStats Class

    This class is a layer calibration statistic container.

    Class Constructor

    • __init__(min: tuple = (), max: tuple = ())
      • Parameters:
        • min - Tuple with per-channel minimum layer activation values
        • max - Tuple with per-channel maximum layer activation values

    InputInfo Class

    This class contains the information about the network input layers

    Class Attributes

    • precision - Precision of the input data provided by user. Provides setter and getter interfaces to get and modify input layer precision.

      Applicable precisions: FP32, FP16, I32, I16, I8, U32, U16

      NOTE: Support of any calculation precision depends on the target plugin

    • layout - Layout of the input data provided by user. Provides setter and getter interfaces to get and modify input layer layout.

      Applicable layouts: NCHW, NHWC, OIHW, C, CHW, HW, NC, CN, BLOCKED

    • shape - Input layer data shape

    OutputInfo Class

    This class contains the information about the network output layers.

    Class Attributes

    • precision - Precision of output data. Provides setter and getter interfaces to get and modify output layer precision.
    • layout - Layout of the output data provided by user
    • shape - Output layer data shape

    IEPlugin Class

    This class is the main plugin interface and serves to initialize and configure the plugin.

    Class Constructor

    • __init__(device: str, plugin_dirs=None)
      • Parameters:
        • device - target device name. Supported devices: CPU, GPU, FPGA, MYRIAD, HETERO
        • plugin_dirs - list of paths to plugin directories

    Properties

    • device - a name of the device that was specified to initialize IEPlugin
    • version - a version of the plugin

    Instance Methods

    • load(network: IENetwork, num_requests: int=1, config=None)
      • Description:

        Loads a network that was read from the IR to the plugin and creates an executable network from a network object. You can create as many networks as you need and use them simultaneously (up to the limitation of the hardware resources).

      • Parameters:
        • network - A valid IENetwork instance
        • num_requests - The number of infer requests to be created (a positive integer). The number of infer requests may be limited by device capabilities.
        • config - A dictionary of plugin configuration keys and their values
      • Return value: None
      • Usage example:
        >>> net = IENetwork(model=<path_to_xml_file>, weights=<path_to_bin_file>)
        >>> plugin = IEPlugin(device="CPU")
        >>> exec_net = plugin.load(network=net, num_requests=2)
        >>> exec_net
        <inference_engine.ie_api.ExecutableNetwork object at 0x7f5140bbcd38>
    • set_initial_affinity(net: IENetwork)
      • Description:

        Sets initial affinity for model layers according to the HETERO plugin logic. Applicable only if IEPlugin was initialized for HETERO device.

      • Parameters:
        • net - A valid instance of IENetwork
      • Return value: None
      • Usage example:

        See affinity attribute of the IENetLayer class.
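
        For illustration, a minimal sketch is shown below; the HETERO device string and the layer name are placeholders, and the manual override relies on the affinity attribute of the IENetLayer class referenced above:

        >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
        >>> plugin = IEPlugin(device="HETERO:FPGA,CPU")
        >>> plugin.set_initial_affinity(net)
        >>> net.layers['layer_name'].affinity = "CPU"    # optionally override affinity for a specific layer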

    • add_cpu_extension(extension_path: str)
      • Description:

        Loads an extensions library to the plugin. Applicable only for the CPU device and the HETERO device with CPU.

      • Parameters:
        • extension_path - a full path to CPU extensions library
      • Return value: None
      • Usage example:
        >>> plugin = IEPlugin(device="CPU")
        >>> plugin.add_cpu_extension(ext_lib_path)
    • set_config(config: dict)
      • Description:

        Sets a configuration for the plugin. Refer to SetConfig() in the Inference Engine C++ documentation for the list of acceptable keys and values.

      • Parameters:
        • config - a dictionary of keys and values of acceptable configuration parameters
      • Return value: None
      • Usage examples: See affinity attribute of the IENetLayer class.
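
        For illustration, a minimal sketch is shown below; it assumes the PERF_COUNT configuration key, which enables per-layer performance counters on plugins that support them (see also get_perf_counts() of the InferRequest class):

        >>> plugin = IEPlugin(device="CPU")
        >>> plugin.set_config({"PERF_COUNT": "YES"})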
    • get_supported_layers(net: IENetwork)
      • Description:

        Returns a set of layers supported by the plugin. Note that for the CPU plugin, support of a layer may depend on an extension loaded with the add_cpu_extension() method.

      • Parameters:
        • net - a valid instance of IENetwork
      • Return value: Set of layers supported by the plugin
      • Usage example: See affinity attribute of the IENetLayer class.
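
        For illustration, a minimal sketch of checking a network for unsupported layers is shown below; it assumes the layers attribute of IENetwork, which maps layer names to layer objects:

        >>> plugin = IEPlugin(device="CPU")
        >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
        >>> supported_layers = plugin.get_supported_layers(net)
        >>> unsupported = [l for l in net.layers.keys() if l not in supported_layers]
        >>> if unsupported:
        ...     print("The following layers are not supported: {}".format(unsupported))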

    ExecutableNetwork Class

    This class represents a network instance loaded to plugin and ready for inference.

    Class Constructor

    There is no explicit class constructor. To make a valid instance of ExecutableNetwork, use load() method of the IEPlugin class.

    Class Attributes

    • requests - a tuple of InferRequest instances

      • Usage example:
        >>> net = IENetwork(model=path_to_xml_file, weights=path_to_bin_file)
        >>> plugin = IEPlugin(device="CPU")
        >>> exec_net = plugin.load(network=net, num_requests=3)
        >>> exec_net.requests
        (<inference_engine.ie_api.InferRequest object at 0x7f66f56c57e0>,
        <inference_engine.ie_api.InferRequest object at 0x7f66f56c58b8>,
        <inference_engine.ie_api.InferRequest object at 0x7f66f56c5900>)
        

    Instance Methods

    • infer(inputs=None)
      • Description:

        Starts synchronous inference for the first infer request of the executable network and returns output data. Wraps the infer() method of the InferRequest class.

      • Parameters:
        • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
      • Return value: A dictionary of output layer name as a key and numpy.ndarray with output data of the layer as a value
      • Usage example:
        >>> net = IENetwork(model=<path_to_xml_file>, weights=<path_to_bin_file>)
        >>> plugin = IEPlugin(device="CPU")
        >>> exec_net = plugin.load(network=net, num_requests=2)
        >>> res = exec_net.infer({'data': img})
        >>> res
        {'prob': array([[[[2.83426580e-08]],
                         [[2.40166020e-08]],
                         [[1.29469613e-09]],
                         [[2.95946148e-08]]
                         ......
                      ]])}

      For illustration of input data preparation, please see samples (for example, classification_sample.py).

    • start_async(request_id, inputs=None)
      • Description:

        Starts asynchronous inference for the specified infer request. Wraps the async_infer() method of the InferRequest class.

      • Parameters:
        • request_id - index of infer request to start inference
        • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
      • Return value: A handle to the specified infer request, which is an instance of the InferRequest class.
      • Usage example:
        >>> infer_request_handle = exec_net.start_async(request_id=0, inputs={input_blob: image})
        >>> infer_status = infer_request_handle.wait()
        >>> res = infer_request_handle.outputs[out_blob]

        For more details about infer request processing, see the classification_sample_async.py (simplified case) and object_detection_demo_ssd_async.py (real asynchronous use case) samples.

    InferRequest Class

    This class provides an interface to the infer requests of an ExecutableNetwork instance and serves to handle infer request execution and to set and get output data.

    Class Constructor

    There is no explicit class constructor. To make a valid InferRequest instance, use the load() method of the IEPlugin class with a specified number of requests to get an ExecutableNetwork instance, which stores the infer requests.

    Class Attributes

    • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
    • outputs - a dictionary of output layer name as a key and numpy.ndarray with output data of the layer as a value

    Usage example

    >>> exec_net.requests[0].inputs['data'][:] = image
    >>> exec_net.requests[0].infer()
    >>> res = exec_net.requests[0].outputs['prob']
    >>> np.flip(np.sort(np.squeeze(res)),0)
    array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
    	   5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
    	   2.26027006e-03, 2.12283316e-03 ...])
    

    Instance Methods

    Running inference directly on an InferRequest instance is not recommended. To run inference, use the simplified infer() and start_async() methods of ExecutableNetwork.

    • infer(inputs=None)
      • Description:

        Starts synchronous inference of the infer request and fills the outputs array.

      • Parameters:
        • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
      • Return value: None
      • Usage example:
        >>> exec_net = plugin.load(network=net, num_requests=2)
        >>> exec_net.requests[0].infer({input_blob: image})
        >>> res = exec_net.requests[0].outputs['prob']
        >>> np.flip(np.sort(np.squeeze(res)),0)
        array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
               5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
               2.26027006e-03, 2.12283316e-03 ...]) 
    • async_infer(inputs=None)
      • Description:

        Starts asynchronous inference of the infer request and fills the outputs array.

      • Parameters:
        • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
      • Return value: None
      • Usage example:
        >>> exec_net = plugin.load(network=net, num_requests=2)
        >>> exec_net.requests[0].async_infer({input_blob: image})
        >>> exec_net.requests[0].wait()
        >>> res = exec_net.requests[0].outputs['prob']
        >>> np.flip(np.sort(np.squeeze(res)),0)
        array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
               5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
               2.26027006e-03, 2.12283316e-03 ...]) 
    • wait(timeout=-1)
      • Description:

        Waits for the result to become available. Blocks until specified timeout elapses or the result becomes available, whichever comes first.

        There are special values of the timeout parameter:

        • 0 - immediately returns the inference status without blocking or interrupting execution. For the meaning of the status codes, refer to InferenceEngine::StatusCode in the Inference Engine C++ documentation
        • -1 - waits until the inference result becomes available (default value)
      • Parameters:
        • timeout - time to wait in milliseconds, or one of the special values (0, -1) described above. If not specified, the timeout is set to -1 by default.
      • Usage example:

        See the async_infer() method of the InferRequest class.
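
        For illustration, a minimal polling sketch is shown below; input_blob, image, and out_blob are placeholders carried over from the earlier examples:

        >>> exec_net.start_async(request_id=0, inputs={input_blob: image})
        >>> status = exec_net.requests[0].wait(0)    # returns immediately with the current status
        >>> status = exec_net.requests[0].wait()     # blocks until the result is available (timeout=-1)
        >>> res = exec_net.requests[0].outputs[out_blob]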

    • get_perf_counts()
      • Description:

        Queries per-layer performance measures to identify the most time-consuming layers.

        NOTE: Performance counters data and format depend on the plugin.

      • Parameters: None
      • Usage example:
        >>> exec_net = plugin.load(network=net, num_requests=2)
        >>> exec_net.requests[0].infer({input_blob: image})
        >>> exec_net.requests[0].get_perf_counts()
        {'Conv2D': {'exec_type': 'jit_avx2_1x1',
        		   'real_time': 154,
        		   'cpu_time': 154,
        		   'status': 'EXECUTED',
        		   'layer_type': 'Convolution'},
                 'Relu6':  {'exec_type': 'undef',
        			'real_time': 0,
        			'cpu_time': 0,
        			'status': 'NOT_RUN',
        			'layer_type': 'Clamp'}
        ...
        }
        	
    • set_batch(size)
      • Description:

        Sets a new batch size for the infer request when dynamic batching is enabled in the executable network that created this request.

        NOTE: Support of dynamic batch size depends on the target plugin.

      • Parameters:
        • size - new batch size to be used by all subsequent inference calls for this request.
      • Usage example:
        >>> plugin.set_config({"DYN_BATCH_ENABLED": "YES"})
        >>> exec_net = plugin.load(network=net)
        >>> exec_net.requests[0].set_batch(inputs_count)

        Please refer to dynamic_batch_demo.py to see the full usage example.

    Introduction to the Inference Engine Deep Neural Network Builder

    NOTE: This is a preview version of the Inference Engine Deep Neural Network Builder API, provided for evaluation purposes only. The module structure and the API itself may change in future releases.

    This API extends the Inference Engine functionality and allows you to create and modify topologies in source code.

    Network Builder

    InferenceEngine::Builder::Network allows you to create and modify graphs. When used to modify a graph, this class does not change the original graph. Instead, it creates a copy of the original graph and works with the copied object. Using this class also helps you avoid invalid graphs, because it checks:

    • The structure of the graph
    • The absence of cycles in the graph
    • All parameters for each layer
    • All shapes, performing shape inference if required

    If a graph contains custom layers and shape inference is required, add shape inference functions to the network builder through a custom Context.

    The network builder provides the following methods for graph modification:

    • addLayer(...) adds a new layer builder to the network builder. This method creates a copy of the original layer builder, puts the copy into the network builder, and returns the ID of the added layer builder.
    • removeLayer(...) removes a layer builder from the network builder by ID.
    • connect(...) connects two layer builders using layer builder IDs and port indexes.
    • disconnect(...) removes a connection from the network builder.
    • getLayer(...) returns a layer builder from the network builder by ID.
    • getLayerConnections(...) returns all connections of a layer builder by ID.
    • getLayers() returns all layer builders.
    • build() generates an Inference Engine network. This method validates each layer builder and the graph structure and creates an INetwork.

    The function convertToICNNNetwork(...) converts an INetwork to an ICNNNetwork.

    Layer Builder

    The InferenceEngine::Builder::Layer class creates and modifies layers. It allows you to modify all layer parameters, add new constant data, change the type and name of the layer, and create a valid layer object.

    Builders for Standard layers

    Each default Inference Engine layer has a dedicated builder that simplifies layer creation. These builders hide the methods that are unnecessary for the specific layer and add layer-specific methods.

    Below you can see the list of builders for default layers:

    • InferenceEngine::Builder::ArgMax
    • InferenceEngine::Builder::BatchNormalization
    • InferenceEngine::Builder::Clamp
    • InferenceEngine::Builder::Concat
    • InferenceEngine::Builder::Const
    • InferenceEngine::Builder::Convolution
    • InferenceEngine::Builder::Crop
    • InferenceEngine::Builder::CTCGreedyDecoder
    • InferenceEngine::Builder::Deconvolution
    • InferenceEngine::Builder::DetectionOutput
    • InferenceEngine::Builder::Eltwise
    • InferenceEngine::Builder::ELU
    • InferenceEngine::Builder::FullyConnected
    • InferenceEngine::Builder::GRN
    • InferenceEngine::Builder::Input
    • InferenceEngine::Builder::Memory
    • InferenceEngine::Builder::MVN
    • InferenceEngine::Builder::Norm
    • InferenceEngine::Builder::Normalize
    • InferenceEngine::Builder::Output
    • InferenceEngine::Builder::Permute
    • InferenceEngine::Builder::Pooling
    • InferenceEngine::Builder::Power
    • InferenceEngine::Builder::PReLU
    • InferenceEngine::Builder::PriorBoxClustered
    • InferenceEngine::Builder::PriorBox
    • InferenceEngine::Builder::Proposal
    • InferenceEngine::Builder::PSROIPooling
    • InferenceEngine::Builder::RegionYolo
    • InferenceEngine::Builder::ReLU6
    • InferenceEngine::Builder::ReLU
    • InferenceEngine::Builder::ReorgYolo
    • InferenceEngine::Builder::Reshape
    • InferenceEngine::Builder::ROIPooling
    • InferenceEngine::Builder::ScaleShift
    • InferenceEngine::Builder::Sigmoid
    • InferenceEngine::Builder::SimplerNMS
    • InferenceEngine::Builder::SoftMax
    • InferenceEngine::Builder::Split
    • InferenceEngine::Builder::TanH
    • InferenceEngine::Builder::Tile

    Known Limitations

    The Inference Engine Deep Neural Network Builder API does not support the TensorIterator layer.

    How to Use

    To use the DNN Builder API, include the ie_builders.hpp header, which includes all Inference Engine builders.

    After that, all builders are available for use.

    The DNN Builder can be created in different ways:

    // Get network from the reader
    InferenceEngine::CNNNetwork cnnNetwork = networkReader.getNetwork();
    
    // Create DNN builder with a name
    InferenceEngine::Builder::Network graph1("Example1");
    // Create DNN builder from CNNNetwork
    InferenceEngine::Builder::Network graph2(cnnNetwork);
    
    // Build a network
    InferenceEngine::INetwork::Ptr iNetwork = graph2.build();
    // Create DNN builder from INetwork
    InferenceEngine::Builder::Network graph3(*iNetwork);
    
    // Create an Inference Engine context
    InferenceEngine::Context customContext;
    // Add shape infer extension
    customContext.addExtension(customShapeInferExtension);
    
    // Create a DNN builder with a custom context (the other constructors above can also take a custom context)
    InferenceEngine::Builder::Network graph4(customContext, *iNetwork);
    

    You can modify a graph with the DNN Builder:

    // Create DNN builder with a name
    InferenceEngine::Builder::Network graph("Example1");
    
    // Add new layers
    
    // Add an input layer builder in place
    idx_t inputLayerId = graph.addLayer(Builder::InputLayer("in").setPort(Port({1, 3, 22, 22})));
    
    // Add a ReLU layer builder in place with a negative slope 0.1 and connect it with output port 0 of the Input layer builder
    // In this example, layerId is equal to new Input layer builder ID, port index is not set, because 0 is a default value ({layerId} == {layerId, 0})
    idx_t relu1Id = graph.addLayer({{inputLayerId}}, Builder::ReLULayer("relu1").setNegativeSlope(0.1f));
    
    // Add a ScaleShift layer builder in place
    InferenceEngine::Blob::Ptr blobWithScaleShiftBiases = make_shared_blob<float>(TensorDesc(Precision::FP32, {3}, Layout::C));
    blobWithScaleShiftBiases->allocate();
    auto *data = blobWithScaleShiftBiases->buffer().as< float *>();
    data[0] = 1;
    data[1] = 2;
    data[2] = 3;
    idx_t scaleShiftId = graph.addLayer(Builder::ScaleShiftLayer("scaleShift1").setBiases(blobWithScaleShiftBiases));
    
    // Connect ScaleShift layer in place with relu1
    graph.connect({relu1Id}, {scaleShiftId}); // Also port indexes could be defined (0 is default value) builder.connect({layerId, outPortIdx}, {scaleShiftId, inPortIdx});
    
    // Create a ReLU layer builder in place with a negative slope 0.2 using generic layer builder and connect it with scaleShift
    idx_t relu2Id = graph.addLayer({{scaleShiftId}}, Builder::Layer("ReLU", "relu2").setParameters({{"negative_slope", 0.2f}}).setOutputPorts({Port()}).setInputPorts({Port()}));
    
    // All branches in the graph should end with the Output layer. The following line creates the Output layer
    idx_t outId = graph.addLayer({{relu2Id, 0}}, Builder::OutputLayer("out"));
    
    // Build a network
    InferenceEngine::INetwork::Ptr finalNetwork = graph.build();
    std::shared_ptr<InferenceEngine::ICNNNetwork> cnnNetwork = InferenceEngine::Builder::convertToICNNNetwork(finalNetwork);
    
    // Remove the relu2 layer from the topology
    std::vector<InferenceEngine::Connection> connections = graph.getLayerConnections(relu2Id);
    for (const auto& connection : connections) {
        graph.disconnect(connection);
    }
    graph.removeLayer(relu2Id);
    
    // Connect scaleShift1 and out
    graph.connect({scaleShiftId}, {outId});
    // Build a network without relu2
    InferenceEngine::INetwork::Ptr changedNetwork = graph.build();
    

    Known Issues

    Multiple OpenMP Loadings

    If the application uses the Inference Engine with third-party components that depend on Intel® OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This might happen if the application uses the Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel® MKL after loading the Inference Engine plugin.

    Error log report:

    OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
    OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, see http://www.intel.com/software/products/support/.

    Possible workarounds:

    • Preload the OpenMP runtime using the LD_PRELOAD variable:
      This eliminates multiple loadings of libiomp, and makes all components use this specific version of OpenMP.
      LD_PRELOAD=<path_to_libiomp5.so> <path_to_your_executable>
    • Set KMP_DUPLICATE_LIB_OK=TRUE. This option might result in performance degradation or incorrect results.

    Old proto Compiler Breaks protobuf Library

    With the Python protobuf library version 3.5.1, the following incompatibility can occur. A known case is CentOS 7.4.

    The error log looks as follows:

    File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_
    return _message.default_pool.AddSerializedFile(serialized_pb)
    TypeError: expected bytes, str found
    

    A possible workaround is to upgrade the default protobuf compiler (libprotoc 2.5.0) to a newer version, for example, libprotoc 2.6.1.

    Dynamic Batching

    Refer to the Limitations section of the Dynamic Batching documentation.

    Static Shape Inference

    Refer to the Limitations section of Using Shape Inference.

    Image Pre-Processing Performance Optimization Issue

    As described in the Integrate the Inference Engine API with Your Application section, you can set an image blob of any size to an infer request using resizable input. Resize is executed during inference using the configured resize algorithm.

    However, the resize algorithms are not currently fully optimized, so expect performance degradation if resizable input is specified and an input blob to be resized is set with SetBlob(). The required performance is met for the CPU plugin only, because enabled OpenMP* provides parallelism.

    Another limitation is that the resize algorithms currently support the NCHW layout only. If you set the NHWC layout for an input blob, it is converted to NCHW before resize and back to NHWC after resize.

    Legal Information

    You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

    No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

    All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

    The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

    No computer system can be absolutely secure.

    Intel, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

    OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

    *Other names and brands may be claimed as the property of others.

    Copyright © 2019 Intel Corporation. All rights reserved.

    For more complete information about compiler optimizations, see our Optimization Notice.