Inference Engine Developer Guide

Deployment Challenges

Deploying deep learning networks from the training environment to embedded platforms for inference is a complex task that introduces technical challenges, such as:

  • Several deep learning frameworks are widely used in the industry, such as Caffe*, TensorFlow*, MXNet*, among others
  • Training deep learning networks is typically performed in data centers or server farms, while inference often takes place on embedded platforms that are optimized for performance and power consumption.
    These platforms are typically limited from the software perspective:
    • programming languages
    • third party dependencies
    • memory consumption
    • supported operating systems
    and the platforms are limited from the hardware perspective:
    • different data types
    • limited power envelope
    Because of these limitations, it is usually not recommended, and sometimes not possible, to use the original training framework for inference. As an alternative, use dedicated inference APIs that are optimized for specific hardware platforms.

For these reasons, ensuring the accuracy of the transformed networks can be a complex task.

Deployment Workflow

The Inference Engine deployment process assumes you used the Model Optimizer to convert your trained model to an Intermediate Representation. The scheme below illustrates the typical workflow for deploying a trained deep learning model.

Intel Computer Vision Basic Workflow

A summary of the steps for optimizing and deploying a trained model:

  1. Configure the Model Optimizer for your framework.
  2. Convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and biases values.
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via the Validation application or the sample applications.
  4. Integrate the Inference Engine in your application to deploy the model in the target environment.

Introduction to the Inference Engine

After you have used the Model Optimizer to create an Intermediate Representation, use the Inference Engine to infer input data.

The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.

NOTES:

  • This section covers the API at a high level. For complete API information, see the offline documentation included in your package. To locate the current API documentation:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ where <INSTALL_DIR> is the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application from the contents.
  • This document refers to APIs from previous releases as "legacy" API. It is best to stop using the legacy API since it will be removed in a future product release. To locate the legacy API:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application (legacy API) from the contents.
  • Complete API documentation is also in the full offline package documentation.
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Open Data Structures from the menu at the top of the screen.

Modules in the Inference Engine Package

Your application must link against the core Inference Engine library and include the C++ header files from the include directory.

The core library file is:

  • Linux: libinference_engine.so
  • Windows: inference_engine.dll

Using Plugins, Depending on the Target

Each supported target device has a plugin, and the Heterogeneous plugin lets you distribute a calculation workload across devices. Each plugin is a DLL/shared library. Make sure those libraries are on your system path or in the location you pointed the plugin loader to, and make sure each plugin's dependencies are listed in:

  • Linux: LD_LIBRARY_PATH
  • Windows: PATH

On Linux, use the script bin/setupvars.sh to set the environment variables.

The table below shows the relationship between libraries and targets.

Target | Linux Library Name | Linux Dependency Libraries | Windows Library Name | Windows Dependency Libraries
CPU | libMKLDNNPlugin.so | libmklml_tiny.so, libiomp5md.so | MKLDNNPlugin.dll | mklml_tiny.dll, libiomp5md.dll
Intel® Integrated Graphics | libclDNNPlugin.so | libclDNN64.so | clDNNPlugin.dll | clDNN64.dll
FPGA | libdliaPlugin.so | libdla.so | Not supported | Not supported
Intel® Movidius™ Myriad™ 2 Vision Processing Unit (VPU) | libmyriadPlugin.so | No dependencies | Not supported | Not supported
Heterogeneous | libHeteroPlugin.so | Same as selected plugins | HeteroPlugin.dll | Same as selected plugins

When using the Heterogeneous plugin, use the literal strings in the Target column in the getPluginByDevice method. For more information, see the getPluginByDevice API.
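
As an illustration only (a minimal sketch assuming the legacy PluginDispatcher API described in this guide; the empty string makes the loader search the default plugin paths), selecting a plugin by device name can look like this:

#include <inference_engine.hpp>
using namespace InferenceEngine;

// Load the CPU plugin by its device name
InferencePlugin cpu_plugin(PluginDispatcher({""}).getPluginByDevice("CPU"));

// Load the Heterogeneous plugin, with FPGA as the primary device and CPU as the fallback
InferencePlugin hetero_plugin(PluginDispatcher({""}).getPluginByDevice("HETERO:FPGA,CPU"));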

Common Workflow for Using the Inference Engine API

  1. Read the Intermediate Representation - Using the InferenceEngine::CNNNetReader class, read an Intermediate Representation file into a CNNNetwork class. This class represents the network in host memory.
  2. Prepare inputs and outputs format - After loading the network, specify the input and output precision and the layout of the network. For these specifications, use the CNNNetwork::getInputsInfo() and CNNNetwork::getOutputsInfo() methods.
  3. Select Plugin - Select the plugin on which to load your network. Create the plugin with the InferenceEngine::PluginDispatcher load helper class. Pass the per-device loading configuration specific to this device, and register extensions for this device.
  4. Compile and Load - Use the plugin interface wrapper class InferenceEngine::InferencePlugin to call the LoadNetwork() API to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.
  5. Set input data - With the network loaded, you have an ExecutableNetwork object. Use this object to create an InferRequest, in which you specify the buffers to use for input and output. Either have the device allocate the memory and copy your data into it directly, or tell the device to use your application memory, saving a copy.
  6. Execute - With the input and output memory now defined, choose your execution mode:
    • Synchronously - Infer() method. Blocks until inference finishes.
    • Asynchronously - StartAsync() method. Check the status with the Wait() method (zero timeout), wait for the result, or specify a completion callback.
  7. Get the output - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the InferRequest GetBlob API.
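
The following minimal sketch strings these steps together using the legacy API classes named above. It is illustrative only: the file names, device name, and the input-filling step are placeholders, and error handling is omitted.

#include <inference_engine.hpp>
using namespace InferenceEngine;

// 1. Read the Intermediate Representation into a CNNNetwork
CNNNetReader network_reader;
network_reader.ReadNetwork("Model.xml");
network_reader.ReadWeights("Model.bin");
CNNNetwork network = network_reader.getNetwork();

// 2. Prepare the input and output formats (precision and layout)
InputsDataMap input_info = network.getInputsInfo();
input_info.begin()->second->setPrecision(Precision::U8);
OutputsDataMap output_info = network.getOutputsInfo();
output_info.begin()->second->setPrecision(Precision::FP32);

// 3. Select a plugin for the target device
InferencePlugin plugin(PluginDispatcher({""}).getPluginByDevice("CPU"));

// 4. Compile and load the network on the device
ExecutableNetwork executable_network = plugin.LoadNetwork(network, {});

// 5. Create an infer request and set the input data
InferRequest infer_request = executable_network.CreateInferRequest();
Blob::Ptr input = infer_request.GetBlob(input_info.begin()->first);
// ... fill input->buffer() with the image data here ...

// 6. Execute synchronously
infer_request.Infer();

// 7. Get the output
Blob::Ptr output = infer_request.GetBlob(output_info.begin()->first);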

For more information about integrating the Inference Engine in your application, see How to integrate the Inference Engine in your application.

Using Inference Engine Samples

The Inference Engine sample applications are simple console applications that demonstrate how to use Intel's Deep Learning Inference Engine in your applications.

Samples in the Samples Directory

The following sample applications are available in the samples directory in the Inference Engine installation directory:

Sample | Description
CPU Extensions | Library with topology-specific layers, like DetectionOutput used in the SSD
Image Classification Sample | Inference of image classification networks like AlexNet* and GoogLeNet*. This sample supports only images as inputs.
Image Classification Sample, pipelined | Maximizes performance via pipelined execution. This sample supports only images as inputs.
Security Barrier Camera Sample | Vehicle Detection followed by Vehicle Attributes recognition
Object Detection for Faster R-CNN Sample | Inference of object detection networks like Faster R-CNN. This sample supports only images as inputs.
Image Segmentation Sample | Inference of image segmentation networks like FCN8. This sample supports only images as inputs.
Object Detection for SSD Demonstration, Async API Performance Showcase | Demonstration application for SSD-based object detection networks, the new Async API performance showcase, and simple OpenCV* interoperability. This sample supports video and camera inputs.
Object Detection for SSD Sample | Inference of object detection networks based on SSD. This sample is a simplified version of the Object Detection for SSD Demonstration. It supports only images as inputs.
Automatic Speech Recognition Sample | Acoustic model inference based on Kaldi neural networks and speech feature vectors.
Neural Style Transfer Sample | Style transfer sample. This sample supports only images as inputs.
Hello Infer Request Classification Sample | Inference of image classification networks via the Infer Request API. This sample supports only images as inputs.
Interactive Face Detection Sample | Face Detection coupled with Age-Gender and Head-Pose estimation. This sample supports both video and camera inputs.
Validation Application | Infers a pack of images and reports total accuracy. This sample supports only images as inputs.
Crossroad Camera Sample | Person Detection followed by Person Attributes Recognition and Person Reidentification Retail. This sample supports images, video, and camera inputs.

Samples That Support Pre-Trained Models Shipped With the Product

Several pre-trained models are provided with the product. The table below shows the correlation between models and samples. For the correlation between plugins and supported devices, see the Supported Devices section. The samples are available in <INSTALL_DIR>/deployment_tools/inference_engine/samples.

Model | Sample Supported on the Model | CPU | GPU | HETERO:FPGA,CPU | MYRIAD
face-detection-adas-0001 | Interactive Face Detection Sample | Supported | Supported |  | Supported
age-gender-recognition-retail-0013 | Interactive Face Detection Sample | Supported | Supported | Supported | Supported
head-pose-estimation-adas-0001 | Interactive Face Detection Sample | Supported | Supported | Supported | Supported
vehicle-license-plate-detection-barrier-0007 | Security Barrier Camera Sample | Supported | Supported | Supported | Supported
vehicle-attributes-recognition-barrier-0039 | Security Barrier Camera Sample | Supported | Supported | Supported | Supported
license-plate-recognition-barrier-0001 | Security Barrier Camera Sample | Supported | Supported | Supported | Supported
person-detection-retail-0001 | Object Detection Sample | Supported | Supported | Supported | 
person-detection-retail-0013 | Any sample that supports SSD-based models | Supported | Supported |  | Supported
face-detection-retail-0004 | Any sample that supports SSD-based models | Supported | Supported | Supported | Supported
person-vehicle-bike-detection-crossroad-0078 | Crossroad Camera Sample | Supported | Supported |  | Supported
person-attributes-recognition-crossroad-0031 | Crossroad Camera Sample | Supported | Supported |  | 
person-reidentification-retail-0079 | Crossroad Camera Sample | Supported | Supported |  | Supported
person-reidentification-retail-0076 | Crossroad Camera Sample | Supported | Supported |  | Supported
face-person-detection-retail-0002 | Any sample that supports SSD-based models | Supported | Supported |  | Supported
pedestrian-detection-adas-0002 | Any sample that supports SSD-based models | Supported | Supported |  | Supported
vehicle-detection-adas-0002 | Any sample that supports SSD-based models | Supported | Supported |  | Supported
pedestrian-and-vehicle-detector-adas-0001 | Any sample that supports SSD-based models | Supported | Supported |  | Supported
emotions-recognition-retail-0003 | Interactive Face Detection Sample | Supported | Supported | Supported | 
road-segmentation-adas-0001 | Image Segmentation Sample | Supported | Supported |  | 
semantic-segmentation-adas-0001 | Image Segmentation Sample | Supported | Supported |  | 

Inferring Your Model with the Inference Engine Samples

Set Your Environment Variables  

Use these steps to make sure your application can find the Inference Engine libraries.

For Linux, execute the following command to set the environment variable:

source <INSTALL_DIR>/bin/setupvars.sh

where <INSTALL_DIR> is the OpenVINO toolkit installation directory.

NOTE: The OpenVINO™ environment variables are removed when you close the shell. Permanently setting the environment variables is optional and outside the scope of this document.

Building the Sample Applications on Linux

Supported Linux build environment:

  • Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit
  • GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4)
  • CMake* version 2.8 or higher.
  • OpenCV* 3.3 or later is required for some samples and demonstrations. Use the OpenVINO toolkit installation download and instructions to complete this installation.

Use these steps to prepare your Linux computer for the samples:

NOTE: If you have installed the product as a root user, switch to root mode before you continue: sudo -i

NOTE: Make sure you have set environment variables before building the samples.

  1. Navigate to a directory that you have write access to and create a samples build directory. This example uses a directory named build:
    mkdir build

    NOTE: If you ran the Image Classification demo script, the samples build directory was already created: <INSTALL_DIR>/deployment_tools/inference_engine/samples/build/ .

  2. Go to the new directory:
    cd build
  3. Run CMake to generate the Make files with or without debug information:
    • Without debug information:
      cmake -DCMAKE_BUILD_TYPE=Release <INSTALL_DIR>/deployment_tools/inference_engine/samples/
    • With debug information:
      cmake -DCMAKE_BUILD_TYPE=Debug <INSTALL_DIR>/deployment_tools/inference_engine/samples/
  4. Build the application:
    make

The sample application binaries are in <path_to_build_directory>/intel64/Release/.

Building the Sample Applications on Windows*

Supported Windows build environment:

  • Microsoft Windows* 10
  • Microsoft Visual Studio* 2015. As an option, you can install the free Visual Studio 2015 Community
  • CMake* 2.8 or later
  • OpenCV* 3.4 or later. Use the OpenVINO toolkit installation download and instructions to complete this installation.
  • Intel C++ Compiler 2017 Redistributable package for Windows

Follow these steps to prepare your Windows computer for the samples:

  1. Go to the <INSTALL_DIR>\deployment_tools\inference_engine\samples\ directory.
  2. Double-click create_msvc_solution.bat.
  3. Open Microsoft Visual Studio* 2015.
  4. Build the <INSTALL_DIR>\deployment_tools\inference_engine\samples\build\Samples.sln solution.

Running the Samples

Image Classification Sample

Description

The Image Classification sample application does inference using image classification networks like AlexNet* and GoogLeNet*. The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running the Application

Running the application with the -h option results in the message:

$ ./classification_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
classification_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet*
                            and a .bmp file for the other networks.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on: CPU, GPU, or MYRIAD. Sample will look for a suitable plugin for device specified
    -nt "<integer>"         
                            Number of top results (default 10)
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an image using a trained AlexNet network on Intel® processors:

./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml

Output Description

By default the application outputs the top-10 inference results. Add the -nt option to the previous command to modify the number of top output results. For example, to get the top-5 results on Intel® HD Graphics, use the command:

./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d GPU

Image Classification Sample Async

Description

This sample demonstrates how to build and execute inference in pipelined mode, using classification networks as an example.

The pipelined mode can increase overall picture throughput. The latency of a single inference is the same as for synchronous execution, but throughput increases for the following reasons:

  • Some plugins are internally heterogeneous: data transfer and execution happen on a remote device, while pre-processing and post-processing happen on the host
  • Explicit use of the Heterogeneous plugin executes different parts of the network on different devices

When two or more devices are involved in the inference of one picture, creating several infer requests and starting asynchronous inference is the most efficient way to utilize the devices. If two devices are involved in execution, 2 is the optimal value for the -nireq option. To be efficient, the Classification Sample Async uses a round-robin algorithm for infer requests: the sample starts execution of the current infer request and switches to waiting for the results of the previous one. After the wait completes, it swaps the infer requests and repeats the procedure.

The number of iterations is an important factor for good throughput. With a large number of iterations, you can emulate real application behavior and see the resulting performance.

Batch mode is independent of the pipelined mode; the pipelined mode works efficiently with any batch size.

Upon start-up, the sample application reads the command line parameters and loads a network and an image to the Inference Engine plugin. Then the application creates the number of infer requests specified by the -nireq parameter and loads pictures for inference.

Then, in a loop, it starts inference for the current infer request and switches to waiting on another one. When the results are ready, the infer requests are swapped.

When inference is done, the application outputs data to the standard output stream.
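
For illustration, the round-robin loop for -nireq 2 might be shaped like the sketch below (assuming an executable_network created as in the common workflow section; the frame-filling and result-reading steps, and num_iterations, are placeholders):

// Two infer requests created from an already loaded ExecutableNetwork
InferRequest current_request  = executable_network.CreateInferRequest();
InferRequest previous_request = executable_network.CreateInferRequest();
bool previous_started = false;

for (int i = 0; i < num_iterations; ++i) {   // num_iterations corresponds to the -ni option
    // ... fill the input blob of current_request with the next picture here ...
    current_request.StartAsync();            // start inference for the current request
    if (previous_started) {
        // wait for the previously started request and consume its results
        previous_request.Wait(IInferRequest::WaitMode::RESULT_READY);
        // ... read the output blob of previous_request here ...
    }
    previous_started = true;
    std::swap(current_request, previous_request);   // swap the requests and repeat
}
// collect the result of the request that is still in flight
previous_request.Wait(IInferRequest::WaitMode::RESULT_READY);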

Running the Application

Running the application with the -h option results in the message:

./classification_sample_async -h
InferenceEngine:
    API version ............ <version>
    Build .................. 
classification_sample_async [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path1>" "<path2>"
                            Required. Path to a folder with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to clDNN (GPU) custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin folder.
    -d "<device>"           
                            Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -nt "<integer>"         
                            Number of top results (default 10)
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report
    -nireq "<integer>"
                            Number of infer request for pipelined mode (default 1)

Output Description

By default the application outputs the top-10 inference results for each infer request. In addition, it reports a throughput value measured in frames per second.


Security Barrier Camera Sample

Description

This sample showcases Vehicle Detection, followed by Vehicle Attributes recognition and License Plate Recognition applied on top of the Vehicle Detection results. The corresponding pre-trained models are in the <INSTALL_DIR>/deployment_tools/intel_models/ directory:

  • vehicle-license-plate-detection-barrier-0007: The primary detection network that finds vehicles and license plates
  • vehicle-attributes-recognition-barrier-0010: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. This network reports general vehicle attributes, such as the vehicle type (car, van, bus) and color.
  • license-plate-recognition-barrier-0001: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. This network reports a string for each recognized license plate. For topology details, see the descriptions in the <INSTALL_DIR>/deployment_tools/intel_models/ directory.

Other demonstration objectives:

  • Show images/video/camera as inputs, via OpenCV*
  • Show an example of simple network pipelining: Attributes and LPR networks are executed on top of the Vehicle Detection results
  • Show vehicle attributes and license plate information for each detected vehicle

How it Works

The application reads command line parameters and loads the specified networks. The Vehicle/License-Plate Detection network is required, and the other two are optional.

Upon getting a frame from OpenCV's VideoCapture, the application performs inference with the Vehicle/License-Plate Detection network, then performs two more inferences using the Vehicle Attributes and LPR networks (if they are specified on the command line) and displays the results.

Running the Application

Running the application with the -h option results in the message:

$ ./security_barrier_sample -h 
InferenceEngine:
        API version ............ 1.0
    [ INFO ] Parsing input parameters
    interactive_vehicle_detection [OPTION]
    Options:
        -h                         Print a usage message.
        -i "<path>"                Required. Path to a video or image file. Default value is "cam" to work with camera.
        -m "<path>"                Required. Path to the Vehicle/License-Plate Detection model (.xml) file.
        -m_va "<path>"             Optional. Path to the Vehicle Attributes model (.xml) file.
        -m_lpr "<path>"            Optional. Path to the License-Plate Recognition model (.xml) file.
          -l "<absolute_path>"     For Intel® MKL-DNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl.
              Or
          -c "<absolute_path>"     For GPU-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc.
        -d "<device>"              Specify the target device for Vehicle Detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
        -d_va "<device>"           Specify the target device for Vehicle Attributes (CPU, GPU, FPGA, MYRIAD, or HETERO).
        -d_lpr "<device>"          Specify the target device for License Plate Recognition (CPU, GPU, FPGA, MYRIAD, or HETERO).
        -pc                        Enables per-layer performance statistics.
        -r                         Output Inference results as raw values.
        -t                         Probability threshold for Vehicle/Licence-Plate detections.

Running the application with an empty list of options results in an error message and the usage list above.

Demonstration Output

The demonstration uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and text:

License plate detection


Object Detection for Faster R-CNN Sample

Description

VGG16-Faster-RCNN is a public CNN that can be easily obtained from GitHub. 

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Downloading and Converting a Caffe* Model

  1. Download test.prototxt from https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
  2. Download the pretrained models from https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0
  3. Unpack the archive and make sure you have the file named VGG16_faster_rcnn_final.caffemodel.

To convert the source model correctly, run the Model Optimizer with the extension for the Python proposal layer:

python3 ${MO_ROOT_PATH}/mo_caffe.py --input_model <path_to_model>/VGG16_faster_rcnn_final.caffemodel --input_proto <path_to_model>/deploy.prototxt --extensions <path_to_object_detection_sample>/fasterrcnn_extensions

Running the Application

Running the application with the -h option results in the message:

$ ./object_detection_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
object_detection_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path>"
                            Required. Path to an image file.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the device specified
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

Use the following command to do inference on Intel® Processors on an image using a trained Faster R-CNN network:

$ ./object_detection_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster-rcnn.xml -d CPU

Output Description

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.

Using this Sample with the Intel Person Detection Model

This model has a non-default (for Faster-RCNN) output layer name. To score it correctly, add the option --bbox_name detector/bbox/ave_pred to the command line.

Usage example:

./object_detection_sample -i <path_to_image>/people.jpg -m /<INSTALL_DIR>/deployment_tools/intel_models/person-detection-retail-0001/FP32/person-detection-retail-0001.xml --bbox_name detector/bbox/ave_pred -d CPU

Object Detection SSD, Async API Performance Showcase Sample

Description

This demonstration showcases Object Detection with SSD and new Async API. Async API usage can improve overall frame-rate of the application, because rather than wait for inference to complete, the app can continue doing things on the host, while accelerator is busy. Specifically, this demonstration keeps two parallel infer requests and while the current is processed, the input frame for the next is being captured. This essentially hides the latency of capturing, so that the overall framerate is rather determined by the MAXIMUM(detection time, input capturing time) and not the SUM(detection time, input capturing time).
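
For example (the numbers are purely illustrative): if detection takes 30 ms per frame and capturing takes 20 ms, the back-to-back approach needs roughly 30 + 20 = 50 ms per frame (about 20 FPS), while the overlapped approach is limited only by the slower stage, roughly 30 ms per frame (about 33 FPS).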

The technique can be generalized to any available parallel slack, such as doing inference while simultaneously encoding the resulting (previous) frames, or running further inference, like emotion detection on top of the face detection results.

Be aware of performance caveats, though. When running tasks in parallel, avoid over-using shared compute resources. For example, if you perform inference on the FPGA with a mostly idle CPU, run parallel tasks on the CPU. When doing inference on an Intel® Integrated Graphics device, you gain little from running tasks such as video encoding on the same GPU in parallel, because the device is already busy.

For more performance implications and tips for the Async API, see the Optimization Guide

Other demonstration objectives:

  • Video as input support via OpenCV*
  • Visualization of the resulting bounding boxes and text labels (from the .labels file) or class number (if no file is provided)
  • OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine samples helpers into your application.
  • Demonstrate the Async API in action. For this, the demonstration features two modes with a Tab key toggle.
    • Old-style "Sync" way - The frame capturing with OpenCV* executes back-to-back with Detection
    • "Truly Async" way - The Detection is performed on the current frame, while the OpenCV* captures the next frame.

How it Works 

The application reads command line parameters and loads a network to the Inference Engine. Upon getting a frame from the OpenCV*'s VideoCapture it performs inference and displays the results.

New "Async API" operates with new notion of the "Infer Request" that encapsulates the inputs/outputs and separates scheduling and waiting for result, next section. And here what makes the performance look different:

  1. In the default ("Sync") mode the frame is captured and then immediately processed, below in pseudo-code:
    while(true) {
        capture frame
        populate CURRENT InferRequest
        start CURRENT InferRequest //this call is async and returns immediately
        wait for the CURRENT InferRequest
        display CURRENT result
    }
    This is a reference implementation in which the new Async API is used in a serialized/synchronous fashion.
  2. In the "true" Async mode, the NEXT frame is captured while the CURRENT frame is being processed:
    while(true) {
            capture frame
            populate NEXT InferRequest
            start NEXT InferRequest //this call is async and returns immediately
                wait for the CURRENT InferRequest (processed in a dedicated thread)
                display CURRENT result
            swap CURRENT and NEXT InferRequests
        }
    In this case, the NEXT request is populated in the main (app) thread, while the CURRENT request is processed. This is handled in the dedicated thread, internal to the Inference Engine runtime.

Async API

In this release, the Inference Engine offers a new API based on the notion of Infer Requests. With this API, requests encapsulate input and output allocation. You access the blob with the GetBlob method.

You can execute a request asynchronously in the background and wait until you need the result. In the meantime your application can continue:

// load a plugin for the device as usual
auto enginePtr = PluginDispatcher({"../../../lib/intel64", ""}).getSuitablePlugin(
              getDeviceFromStr("GPU"));
InferencePlugin plugin(enginePtr);
// read the network from the Intermediate Representation
CNNNetReader network_reader;
network_reader.ReadNetwork("Model.xml");
network_reader.ReadWeights("Model.bin");
// compile and load the network, then create an asynchronous infer request
ExecutableNetwork executable_network = plugin.LoadNetwork(network_reader.getNetwork(), {});
InferRequest async_infer_request = executable_network.CreateInferRequest();
// populate inputs etc
auto input = async_infer_request.GetBlob(input_name);
...
// start the async infer request (puts the request into the queue and immediately returns)
async_infer_request.StartAsync();
// Continue execution on the host until you need the request results
//...
async_infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
auto output = async_infer_request.GetBlob(output_name);

You have no direct way to measure the execution time of an infer request that is running asynchronously, unless you measure the Wait() executed immediately after the StartAsync(). But that essentially means serialized, synchronous execution.

This is what the sample does for the default "SYNC" mode and reports as the Detection time/fps message on the screen. In the truly asynchronous ("ASYNC") mode, the host continues execution in the master thread, in parallel to the infer request. If the request is completed before Wait() is called in the main thread (i.e. earlier than OpenCV* decodes a new frame), reporting the time between StartAsync and Wait would obviously be incorrect. That is why in the "ASYNC" mode the inference speed is not reported.
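
In code, the SYNC-mode measurement is essentially a timer wrapped around the StartAsync/Wait pair, roughly as in this sketch (names are placeholders; this is not the demo's exact source):

#include <chrono>

auto t0 = std::chrono::high_resolution_clock::now();
async_infer_request.StartAsync();
// Waiting right away serializes execution, which is what makes the interval measurable
async_infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
auto t1 = std::chrono::high_resolution_clock::now();
double detection_ms =
    std::chrono::duration_cast<std::chrono::duration<double, std::milli>>(t1 - t0).count();
// In ASYNC mode the same interval would also contain unrelated host work,
// which is why the inference speed is not reported in that mode.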

Running the Application

Running the application with the -h option results in the message:

$ ./object_detection_demo_ssd_async -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
object_detection_demo_ssd_async [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path>"
                            Required. Path to an video file. Use "cam" to capture input from the camera).
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with Intel® MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -d "<device>"
                            Specify the target device to infer on; CPU, GPU, FPGA, and Intel® Movidius™ Myriad™ 2 Vision Processing Unit are accepted.
    -pc
                            Enables per-layer performance report.
    -t
                            Probability threshold for detections (default is 0.5).
    -r
                            Output inference results as raw values to the console.

Running the application with an empty list of options results in an error message and the usage list above.

Use the following command to do inference on Intel® Integrated Graphics with an example pre-trained GoogleNet based SSD* available at https://software.intel.com/file/609199/download

Command Description

After reading through this demonstration, use this command to perform inference on a GPU with the SSD you download from https://software.intel.com/file/609199/download

$ ./object_detection_demo_ssd_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU

The network must be converted from the Caffe* format (*.prototxt + *.caffemodel) to the Inference Engine format (*.xml + *.bin) before using this command. See the Model Optimizer Developer Guide.

The only GUI knob is using 'Tab' to switch between the synchronized execution and the true Async mode.

Output Description

The output uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and labels, if provided. In default mode, the sample reports:

  • OpenCV* time: Frame decoding + time to render the bounding boxes, labels, and display of the results.
  • Detection time: Inference time for the object detection network. This is reported in SYNC mode.
  • Wallclock time: The combined application-level performance.

Object Detection with SSD-VGG Sample

Description

How to run the Object Detection sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics.

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running the Application

Running the application with the -h option results in the message:

$./object_detection_sample_ssd -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
object_detection_sample_ssd [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path>"
                            Required. Path to an image file.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU, GPU or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device.
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

Use the following command to do inference on Intel® Processors on an image using a trained SSD network:

$ ./object_detection_sample_ssd -i <path_to_image>/inputImage.bmp -m <path_to_model>/VGG_ILSVRC2016_SSD.xml -d CPU

Output Description

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.


Automatic Speech Recognition Sample

This topic shows how to run the speech sample application, which demonstrates acoustic model inference based on Kaldi neural networks and speech feature vectors.

Running

Usage

Running the application with the -h option yields the following usage message:

$ ./speech_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
speech_sample [OPTION]
Options:
    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .ark file.
    -m "<path>"             Required. Path to an .xml file with a trained model (required if -rg is missing).
    -o "<path>"             Output file name (default name is scores.ark).
    -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, GNA_AUTO, GNA_HW, GNA_SW, GNA_SW_EXACT are acceptable. Sample will look for a suitable plugin for device specified
    -p                      Plugin name. For example MKLDNNPlugin. If this parameter is pointed, the sample will look for this plugin only
    -pp                     Path to a plugin folder.
    -pc                     Enables performance report
    -q "<mode>"             Input quantization mode:  static (default), dynamic, or user (use with -sf).
    -qb "<integer>"         Weight bits for quantization:  8 or 16 (default)
    -sf "<double>"          Optional user-specified input scale factor for quantization (use with -q user).
    -bs "<integer>"         Batch size 1-8 (default 1)
    -r "<path>"             Read reference score .ark file and compare scores.
    -rg "<path>"            Read GNA model from file using path/filename provided (required if -m is missing).
    -wg "<path>"            Write GNA model to file using path/filename provided.
    -we "<path>"            Write GNA embedded model to file using path/filename provided.

Running the application with the empty list of options yields the usage message given above and an error message.

Model Preparation

You can use the following Model Optimizer command to convert a Kaldi nnet1 or nnet2 neural network to Intel IR format:

$ python3 mo.py --framework kaldi --input_model wsj_dnn5b_smbr.nnet --counts wsj_dnn5b_smbr.counts --remove_output_softmax

Assuming that the Model Optimizer (mo.py), Kaldi-trained neural network (wsj_dnn5b_smbr.nnet), and Kaldi class counts file (wsj_dnn5b_smbr.counts) are in the working directory, this command produces the Intel IR network consisting of wsj_dnn5b_smbr.xml and wsj_dnn5b_smbr.bin.

NOTE: wsj_dnn5b_smbr.nnet and other sample Kaldi models and data will be available in July 2018 in the OpenVINO Open Model Zoo.

Speech Inference

Once the IR is created, you can use the following command to do inference on Intel® Processors with the GNA co-processor (or emulation library):

$ ./speech_sample -d GNA_AUTO -bs 2 -i wsj_dnn5b_smbr_dev93_10.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark -r wsj_dnn5b_smbr_dev93_scores_10.ark

Here, the floating point Kaldi-generated reference neural network scores (wsj_dnn5b_smbr_dev93_scores_10.ark) corresponding to the input feature file (wsj_dnn5b_smbr_dev93_10.ark) are assumed to be available for comparison.

Sample Output

The acoustic log likelihood sequences for all utterances are stored in the Kaldi ARK file, scores.ark. If the -r option is used, a report on the statistical score error is generated for each utterance such as the following:

Utterance 0: 4k0c0301
   Average inference time per frame: 6.26867 ms
         max error: 0.0667191
         avg error: 0.00473641
     avg rms error: 0.00602212
       stdev error: 0.00393488

How It Works

Upon the start-up, the speech_sample application reads command line parameters and loads a Kaldi-trained neural network along with Kaldi ARK speech feature vector file to the Inference Engine plugin. It then performs inference on all speech utterances stored in the input ARK file. Context-windowed speech frames are processed in batches of 1-8 frames according to the -bs parameter. Batching across utterances is not supported by this sample. When inference is done, the application creates an output ARK file. If the -r option is given, error statistics are provided for each speech utterance as shown above.

GNA-Specific Details

Quantization

If the GNA device is selected (for example, using the -d GNA_AUTO flag), the GNA Inference Engine plugin quantizes the model and input feature vector sequence to integer representation before performing inference. Several parameters control neural network quantization:

  • The -q flag determines the quantization mode. Three modes are supported:
    • Static - In static quantization mode, the first utterance in the input ARK file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used for all subsequent inputs. The neural network is quantized to accommodate the scaled input dynamic range.
    • Dynamic - In dynamic quantization mode, the scale factor for each input batch is computed just before inference on that batch. The input and network are (re)quantized on-the-fly using an efficient procedure.
    • User-defined - In user-defined quantization mode, the user may specify a scale factor via the -sf flag that will be used for static quantization.
  • The -qb flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers. For example, when -qb 8 is specified, the plugin will use 8-bit weights wherever possible in the network. Note that it is not always possible to use 8-bit weights due to GNA hardware limitations. For example, convolutional layers always use 16-bit weights (GNA hardware versions 1 and 2). This limitation will be removed in GNA hardware version 3 and higher.
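
For example (an illustration only, not a value from a real model): if the largest absolute input value observed in the first utterance is 2.0, static mode computes a scale factor of 16384 / 2.0 = 8192, and all subsequent inputs are multiplied by 8192 before being quantized to integers. Passing -q user -sf 8192 would apply the same scale factor explicitly.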

Execution Modes

Several execution modes are supported via the -d flag:

  • If the device is set to CPU and the GNA plugin is selected, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_AUTO, the GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_HW, the GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
  • If the device is set to GNA_SW, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_SW_EXACT, the GNA device is emulated in bit-exact mode.

Loading and Saving Models

The GNA plugin supports loading and saving the GNA-optimized model (non-IR) via the -rg and -wg flags. This makes it possible to avoid the cost of full model quantization at run time. The GNA plugin also supports export of firmware-compatible embedded model images for the Intel® Speech Enabling Developer Kit and Amazon Alexa Premium Far-Field Voice Development Kit via the -we flag (save only).

In addition to performing inference directly from a GNA model file, these options make it possible to:

  • Convert from IR format to GNA format model file (-m, -wg)
  • Convert from IR format to embedded format model file (-m, -we)
  • Convert from GNA format to embedded format model file (-rg, -we)
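
For instance, using the Wall Street Journal files from this section, the conversions might look as follows (the .gna and embedded output file names are made up for illustration, and an input ARK file is still supplied because the sample always runs inference):

$ ./speech_sample -d GNA_SW_EXACT -i wsj_dnn5b_smbr_dev93_10.ark -m wsj_dnn5b_smbr_fp32.xml -wg wsj_dnn5b_smbr.gna
$ ./speech_sample -d GNA_SW_EXACT -i wsj_dnn5b_smbr_dev93_10.ark -m wsj_dnn5b_smbr_fp32.xml -we wsj_dnn5b_smbr_embedded.bin
$ ./speech_sample -d GNA_SW_EXACT -i wsj_dnn5b_smbr_dev93_10.ark -rg wsj_dnn5b_smbr.gna -we wsj_dnn5b_smbr_embedded.bin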

Use of Sample in Kaldi Speech Recognition Pipeline

The Wall Street Journal DNN model used in this example was prepared using the Kaldi s5 recipe and the Kaldi Nnet (nnet1) framework. It is possible to recognize speech by substituting speech_sample for Kaldi's nnet-forward command. Because speech_sample does not yet use pipes, it is necessary to use temporary files for speaker-transformed feature vectors and scores when running the Kaldi speech recognition pipeline. The following operations assume that feature extraction was already performed according to the s5 recipe and that the working directory within the Kaldi source tree is egs/wsj/s5.

  1. Prepare a speaker-transformed feature set given the feature transform specified in final.feature_transform and the feature files specified in feats.scp:
    nnet-forward --use-gpu=no final.feature_transform "ark,s,cs:copy-feats scp:feats.scp ark:- |" ark:feat.ark
  2. Score the feature set using the speech_sample:
    ./speech_sample -d GNA_AUTO -bs 8 -i feat.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark
  3. Run the Kaldi decoder to produce n-best text hypotheses and select most likely text given the WFST (HCLG.fst), vocabulary (words.txt), and TID/PID mapping (final.mdl):
    latgen-faster-mapped --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.0833 --allow-partial=true --word-symbol-table=words.txt final.mdl HCLG.fst ark:scores.ark ark:- | lattice-scale --inv-acoustic-scale=13 ark:- ark:- | lattice-best-path --word-symbol-table=words.txt ark:- ark,t:-  > out.txt &
  4. Run the word error rate tool to check accuracy given the vocabulary (words.txt) and reference transcript (test_filt.txt):
    cat out.txt | utils/int2sym.pl -f 2- words.txt | sed s:<UNK>::g | compute-wer --text --mode=present ark:test_filt.txt ark,p:-

Neural Style Transfer Sample

Description

How to build and run the Neural Style Transfer sample (NST sample) application, which does inference using models of style transfer topology.

Running the Application

Running the application with the -h option results in the message:

$ ./style_transfer_sample --h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>
style_transfer_sample [OPTION]
Options:
    -h
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"
                            Path to a plugin directory.
    -p "<name>"
                            Plugin name. For example Intel® MKL-DNN. If this parameter is pointed, the sample looks for this plugin only
    -d "<device>"
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the specified device.
    -nt "<integer>"
                            Number of top results (default 10)
    -ni "<integer>"
                            Number of iterations (default 1)
    -pc
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an image using a trained NST network on Intel® Processors, use the following command:

$ ./style_transfer_sample -i <path_to_image>/cat.bmp -m <path_to_model>/1_decoder_FP32.xml

Output Description

The application outputs one or more styled images, starting with one named out1.bmp, that are redrawn in the style of the model used for inference. The style of the output images depends on the model used for the sample.


Hello Infer Request Classification

Description

How to run the Hello Infer Request Classification sample application. The sample is a simplified version of the Image Classification Sample and demonstrates how to use the new Infer Request API of the Inference Engine in applications. See Integrate with customer application New Request API for details.

Running the Application

To do inference on an image using a trained AlexNet network on Intel® Processors:

$ ./hello_request_classification <path_to_model>/alexnet_fp32.xml <path_to_image>/cat.bmp CPU

Output Description

The application outputs the top-10 inference results.


Interactive Face Detection

Description

This sample showcases the Object Detection task applied to face recognition using a sequence of neural networks.

The Async API can improve the overall frame rate of the application: rather than wait for inference to complete, the application can continue operating on the host while the accelerator is busy. This sample maintains three parallel infer requests for Age/Gender Recognition, Head Pose Estimation, and Emotions Recognition that run simultaneously.

Other sample objectives:

  • Video as input support via OpenCV*
  • Visualization of the resulting face bounding boxes from Face Detection network
  • Visualization of age/gender, head pose, and emotion information for each detected face
  • OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine sample helpers into your application

How it Works

  1. The application reads command line parameters and loads up to four networks, depending on the -d... family of options, to the Inference Engine.
  2. The application gets a frame from OpenCV's VideoCapture.
  3. The application performs inference with the Face Detection network on the frame.
  4. The application performs three simultaneous inferences using the Age/Gender, Head Pose, and Emotions detection networks, if they are specified on the command line.
  5. The application displays the results.

The new Async API operates with the notion of an Infer Request, which encapsulates the inputs/outputs and separates scheduling from waiting for the result. For more information about the Async API and the performance difference between Sync and Async modes, refer to Object Detection SSD, Async API Performance Showcase Sample.

Running the Application

Running the application with the -h option results in the following usage message:

./interactive_face_detection -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
interactive_face_detection [OPTION]
Options:
    -h                               Print a usage message.
    -i "<path>"                Optional. Path to an video file. Default value is "cam" to work with camera.
    -m "<path>"                Required. Path to an .xml file with a trained face detection model.
    -m_ag "<path>"             Optional. Path to an .xml file with a trained age gender model.
    -m_hp "<path>"             Optional. Path to an .xml file with a trained head pose model.
    -m_em "<path>"             Optional. Path to an .xml file with a trained emotions model.
      -l "<absolute_path>"     Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"     Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -d "<device>"              Specify the target device for Face Detection (CPU, GPU, FPGA, or MYRIAD). The sample looks for a suitable plugin for a specified device.
    -d_ag "<device>"           Specify the target device for Age Gender Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for a specified device.
    -d_hp "<device>"           Specify the target device for Head Pose Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for a specified device.
    -d_em "<device>"           Specify the target device for Emotions Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for device specified.
    -n_ag "<num>"              Specify number of maximum simultaneously processed faces for Age Gender Detection (default is 16).
    -n_hp "<num>"              Specify number of maximum simultaneously processed faces for Head Pose Detection (default is 16).
    -n_em "<num>"              Specify number of maximum simultaneously processed faces for Emotions Detection (default is 16).
    -no_wait                         No wait for key press in the end.
    -no_show                         No show processed video.
    -pc                              Enables per-layer performance report.
    -r                               Inference results as raw values.
    -t                               Probability threshold for detections.

Running the application with an empty list of options results in an error message and the usage list above.

You can use the following command to do inference on a GPU with an example pre-trained GoogleNet based SSD*:

NOTE: The network should first be converted from the Caffe* format (*.prototxt + *.caffemodel) to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

./interactive_face_detection -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU

Sample Output

The sample uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and labels, if provided. In default mode, the sample reports:

  • OpenCV* time: frame decoding + time to render the bounding boxes, labels, and displaying the results
  • Face Detection time: inference time for the Face Detection network

If Age/Gender recognition, Head Pose estimation, or Emotions recognition are enabled, the additional information is reported:

  • Age/Gender + Head Pose + Emotions Detection time: combined inference time of simultaneously executed age gender, head pose and emotion recognition networks.

Image Segmentation Sample

Description

How to run the Image Segmentation sample application, which does inference using image segmentation networks like FCN8.

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image.

Running the Application

Running the application with the -h option results in the message:

$ ./segmentation_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
segmentation_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the specified device.
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an Intel® processor using an image and a trained FCN8 network:

$ ./segmentation_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/fcn8.xml

Output Description

The application output is a segmented image named out.bmp.


Crossroad Camera Sample

This sample provides an inference pipeline for person detection, recognition, and reidentification. The sample uses the Person Detection network, followed by the Person Attributes Recognition and Person Reidentification Retail networks applied on top of the detection results. The corresponding pre-trained models are delivered with the product:

  • person-vehicle-bike-detection-crossroad-0078, the primary detection network for finding persons (and other objects if needed)
  • person-attributes-recognition-crossroad-0031, executed on top of the results from the first network, which reports person attributes such as gender, presence of a hat, and presence of long-sleeved clothes
  • person-reidentification-retail-0079, executed on top of the results from the first network, which produces a vector of features for each detected person. This vector is used to decide whether a person has already been detected.

For details on the models, please refer to the descriptions in the deployment_tools/intel_models folder of the OpenVINO™ toolkit installation directory.

Other sample objectives are:

  • Images/Video/Camera as inputs, via OpenCV*
  • Example of simple networks pipelining: Person Attributes and Person Reidentification networks are executed on top of the Person Detection results
  • Visualization of Person Attributes and Person Reidentification (REID) information for each detected person

How It Works

On start-up, the application reads the command line parameters and loads the specified networks. The Person Detection network is required; the other two are optional.

Upon getting a frame from the OpenCV VideoCapture, the application performs inference with the Person Detection network, then runs two more inferences with the Person Attributes Recognition and Person Reidentification Retail networks if they were specified on the command line, and displays the results. For the Person Reidentification Retail network, a feature vector is generated for each detected person. This vector is compared one-by-one with the vectors of all previously detected persons using the cosine similarity algorithm. If the comparison result is greater than the specified (or default) threshold value, the person is considered already detected and the known REID value is assigned. Otherwise, the vector is added to a global list, and a new REID value is assigned.
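The reidentification decision reduces to a cosine similarity between two feature vectors. A minimal sketch of that comparison (plain C++, not taken from the sample; the helper name is illustrative):

#include <cmath>
#include <vector>

// Cosine similarity between two REID feature vectors of equal length.
// Values close to 1 mean the two detections likely belong to the same person.
float CosineSimilarity(const std::vector<float> &a, const std::vector<float> &b) {
    float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
    for (size_t i = 0; i < a.size(); i++) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b) + 1e-6f);
}

A new feature vector is compared against every stored vector; if the best similarity exceeds the -t_reid threshold, the existing REID is reused, otherwise a new REID is created.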

Running

Running the application with the -h option yields the following usage message:

./crossroad_camera_sample -h
InferenceEngine:
    API version ............ 1.0
crossroad_camera_sample [OPTION]
Options:
    -h                           Print a usage message.
    -i "<path>"                  Required. Path to a video or image file. Default value is "cam" to work with camera.
    -m "<path>"                  Required. Path to the Person/Vehicle/Bike Detection Crossroad model (.xml) file.
    -m_pa "<path>"               Optional. Path to the Person Attributes Recognition Crossroad model (.xml) file.
    -m_reid "<path>"             Optional. Path to the Person Reidentification Retail model (.xml) file.
      -l "<absolute_path>"       For MKLDNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"       For clDNN (GPU)-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc.
    -d "<device>"                Specify the target device for Person/Vehicle/Bike Detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_pa "<device>"             Specify the target device for Person Attributes Recognition (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_reid "<device>"           Specify the target device for Person Reidentification Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -no_show                     Do not show the processed video.
    -pc                          Enables per-layer performance statistics.
    -r                           Output Inference results as raw values.
    -t                           Probability threshold for person/vehicle/bike crossroad detections.
    -t_reid                      Cosine similarity threshold between two vectors for person reidentification.

Sample Output

The sample uses OpenCV to display the resulting frame with detections rendered as bounding boxes and text. In the default mode, the sample reports Person Detection time: the inference time for the Person/Vehicle/Bike Detection network.

If Person Attributes Recognition or Person Reidentification Retail are enabled, the following additional information is also reported:

  • Person Attributes Recognition time - Inference time of the Person Attributes Recognition network, averaged over the number of detected persons.
  • Person Reidentification time - Inference time of the Person Reidentification network, averaged over the number of detected persons.

How to Integrate the Inference Engine in Your Application

  • This section talks about API information. For more information about APIs, see the offline documentation that was included in your package. To locate the current API Developer Guide topics:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ where <INSTALL_DIR> is the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application (current API) from the contents.
  • This document refers to APIs from previous releases as "legacy" API. It is best to stop using the legacy API since it will be removed in a future product release. To locate the legacy API Developer Guide topics:
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Integrating Inference Engine in Your Application (legacy API) from the contents.
  • Complete API documentation is also in the full offline package documentation.
    1. Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the OpenVINO toolkit is installed.
    2. Open index.html in an Internet browser.
    3. Select Open Data Structures from the menu at the top of the screen.

Integration With the API

This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.

Using the Inference Engine API in Your Code

The core libinference_engine.so library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:

  • InferenceEngine::PluginDispatcher - This class finds a suitable plugin for a specified device in given directories.
  • InferenceEngine::Blob, InferenceEngine::TBlob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputInfo, InferenceEngine::InputsDataMap
  • InferenceEngine::OutputsDataMap

The C++ Inference Engine API wraps the capabilities of the core library:

  • InferenceEngine::CNNNetReader
  • InferenceEngine::CNNNetwork
  • InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this interface. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
  • InferenceEngine::ExecutableNetwork
  • InferenceEngine::InferRequest

The Integration Process

The integration process consists of the following steps:

1. Load a Plugin

Load a plugin by creating an instance of InferenceEngine::InferenceEnginePluginPtr. Wrap it by creating an instance of InferenceEngine::InferencePlugin from the C++ Inference Engine API. Specify the plugin or let the Inference Engine choose it with InferenceEngine::PluginDispatcher.

InferenceEnginePluginPtr engine_ptr = PluginDispatcher(pluginDirs).getSuitablePlugin(TargetDevice::eGPU);
InferencePlugin plugin(engine_ptr);
2. Read the Model Intermediate Representation (IR)

Create an Intermediate Representation reader by creating an instance of InferenceEngine::CNNNetReader and read a model Intermediate Representation:

CNNNetReader network_reader;
network_reader.ReadNetwork("Model.xml");
network_reader.ReadWeights("Model.bin");
3. Configure Input and Output

Request input and output information using the InferenceEngine::CNNNetReader::getNetwork(), InferenceEngine::CNNNetwork::getInputsInfo(), and InferenceEngine::CNNNetwork::getOutputsInfo() methods:

auto network = network_reader.getNetwork();
/** Taking information about all topology inputs **/
InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
/** Taking information about all topology outputs **/
InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());

Optionally set the number format (precision) and memory layout for inputs and outputs. Refer to the Supported Devices section to choose the relevant configuration:

/** Iterating over all input info**/
for (auto &item : input_info) {
    auto input_data = item.second;
    input_data->setPrecision(Precision::U8);
    input_data->setLayout(Layout::NCHW);
}
/** Iterating over all output info**/
for (auto &item : output_info) {
    auto output_data = item.second;
    output_data->setPrecision(Precision::FP32);
    output_data->setLayout(Layout::NC);
}

If you skip this step, the default values are used:

  • Input and output precision - Precision::FP32
  • Input layout - Layout::NCHW
  • Output layout depends on the number of its dimensions:
    Number of dimensions:  4     3    2   1
    Layout:                NCHW  CHW  NC  C
4. Load the Model

Load the model to the plugin using InferenceEngine::InferencePlugin::LoadNetwork():

auto executable_network = plugin.LoadNetwork(network, {});

This call creates an executable network from the network object. The executable network is associated with a single hardware device. You can create as many executable networks as needed and use them simultaneously (up to the limit of the hardware resources). The second parameter is a configuration for the plugin: a map of (parameter name, parameter value) pairs. Refer to the Supported Devices section for details about the configuration parameters supported by each device:

/** Optional config. For example, this enables profiling of performance counters. **/
std::map<std::string, std::string> config = {{ PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }};
auto executable_network = plugin.LoadNetwork(network, config);
5. Create Infer Request

Create an infer request using the InferenceEngine::ExecutableNetwork::CreateInferRequest() method:

auto infer_request = executable_network.CreateInferRequest();
6. Prepare Input

There are three options to prepare input:

  • Optimal way for a single network. Get the blobs allocated by an infer request using InferenceEngine::InferRequest::GetBlob() and feed an image and the input data to the blobs:
    /** Iterating over all input info **/
    for (auto & item : input_info) {
        auto input_name = item.first;
        /** Getting the input blob **/
        auto input = infer_request.GetBlob(input_name);
        /** Fill input tensor with planes. First b channel, then g and r channels **/
        ...
    }
    
  • Optimal way for a cascade of networks (the output of one network is the input of another). Get the output blob from the first request using InferenceEngine::InferRequest::GetBlob() and set it as the input for the second request using InferenceEngine::InferRequest::SetBlob():
    auto output = infer_request1.GetBlob(output_name);
    infer_request2.SetBlob(input_name, output);
    
  • Allocate input blobs of the appropriate types, feed an image and the input data to the blobs, and call InferenceEngine::InferRequest::SetBlob() to set these blobs for the infer request:
    /** Iterating over all input info **/
    for (auto & item : input_info) {
        auto input_data = item.second;
        /** Creating an input blob **/
        InferenceEngine::TBlob<unsigned char>::Ptr input;
        // assuming the input precision was set to U8 in the previous step
        input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(InferenceEngine::Precision::U8, input_data->getDims());
        input->allocate();
        infer_request.SetBlob(item.first, input);
        /** Fill input tensor with planes. First b channel, then g and r channels **/
        ...
    }
    

The SetBlob() method compares the precision and layout of the blob with the precision and layout defined in step 3 and throws an exception if they do not match. The blob can be filled either before or after SetBlob().
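The "fill input tensor with planes" comments above refer to converting interleaved image bytes (for example, BGR data from OpenCV) into the planar NCHW layout of the blob. A minimal sketch, assuming a U8 blob named input with NCHW layout; bgr_data, H, and W are illustrative names, not part of the sample:

// Assumes: input is a Blob::Ptr with Precision::U8 and Layout::NCHW,
// and bgr_data points to H * W * 3 interleaved BGR bytes (all illustrative).
unsigned char *blob_data = input->buffer().as<unsigned char *>();
const size_t channels = 3;
for (size_t c = 0; c < channels; c++) {
    for (size_t h = 0; h < H; h++) {
        for (size_t w = 0; w < W; w++) {
            blob_data[c * H * W + h * W + w] = bgr_data[(h * W + w) * channels + c];
        }
    }
}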

7. Perform Inference

Do inference by calling the InferenceEngine::InferRequest::StartAsync and InferenceEngine::InferRequest::Wait methods for an asynchronous request:

infer_request.StartAsync();
infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);

or by calling the InferenceEngine::InferRequest::Infer method for a synchronous request:

sync_infer_request.Infer();

StartAsync returns immediately and starts inference without blocking the main thread; Infer blocks the main thread and returns when inference is complete. For an asynchronous request, call Wait to wait for the result to become available.

There are three ways to use Wait:

  • Specify the maximum duration in milliseconds to block for. The method blocks until the specified timeout elapses or the result becomes available, whichever comes first.
  • InferenceEngine::IInferRequest::WaitMode::RESULT_READY - Waits until the inference result becomes available.
  • InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY - Immediately returns the request status. It does not block or interrupt the current thread.

Both synchronous and asynchronous requests are thread-safe: they can be called from different threads without risk of corruption or failure.

Multiple requests for a single ExecutableNetwork are executed sequentially, one by one, in FIFO order.

While a request is ongoing, all its methods except InferenceEngine::InferRequest::Wait throw an exception.
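A minimal sketch of the three Wait usages described above, assuming the infer_request object created in step 5 (the 100 ms timeout is an illustrative value):

infer_request.StartAsync();

// Option 1: block for at most 100 ms, whichever comes first - timeout or result
infer_request.Wait(100);

// Option 2: block until the inference result is ready
infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);

// Option 3: poll the request status without blocking the current thread
StatusCode state = infer_request.Wait(IInferRequest::WaitMode::STATUS_ONLY);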

8. Process Output

Go over the output blobs and process the results. Note that casting a Blob to a TBlob via std::dynamic_pointer_cast is not recommended; it is better to access the data via the buffer() and as() methods, as follows:

for (auto &item : output_info) {
    auto output_name = item.first;
    auto output = infer_request.GetBlob(output_name);
    {
        auto const memLocker = output->cbuffer(); // use const memory locker
        // output_buffer is valid as long as the lifetime of memLocker
        const float *output_buffer = memLocker.as<const float *>();
        /** output_buffer[] - accessing output blob data **/
    }
}

Building Your Application

For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory: <INSTALL_DIR>/deployment_tools/inference_engine/samples

Running the Application

Before running compiled binary files, make sure your application can find the Inference Engine libraries. On Linux* operating systems, including Ubuntu* and CentOS*, the LD_LIBRARY_PATH environment variable is typically used to specify the directories to be searched for libraries. Update LD_LIBRARY_PATH with the paths to the directories in the Inference Engine installation directory where the libraries reside.

  • Add a path to the directory containing the core and plugin libraries:
    • For Inference Engine installed within the OpenVINO package:
      export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
      
  • Add paths to the directories containing the required third-party libraries:
    • For Inference Engine installed within the OpenVINO toolkit package:
      export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH
      export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
      

As an alternative, use the following script in the OpenVINO™ toolkit installation directory:

<INSTALL_DIR>/bin/setupvars.sh

To run compiled applications on Microsoft* Windows* OS, make sure that the Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed, and that the <INSTALL_DIR>\deployment_tools\inference_engine\bin\intel64\Release\*.dll files are placed in the application directory or accessible via the PATH environment variable.


Integration With the Legacy API

NOTE: The subject of this section is Legacy APIs. Legacy APIs are deprecated and will be removed in a future release. It is best to use the current APIs.

This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.

Using the Inference Engine API in Your Code

The core libinference_engine.so library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:

  • InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this interface. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
  • InferenceEngine::PluginDispatcher - This class finds the suitable plugin for a specified device in given directories.
  • InferenceEngine::CNNNetReader
  • InferenceEngine::CNNNetwork
  • InferenceEngine::Blob, InferenceEngine::TBlob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputInfo, InferenceEngine::InputsDataMap

The Integration Process

  1. Load a plugin by creating an instance of InferenceEngine::InferenceEnginePluginPtr.
  2. Specify the plugin or let the Inference Engine choose it with InferenceEngine::PluginDispatcher. See the selectPlugin() function in the samples.
    InferenceEngine::PluginDispatcher dispatcher(pluginDirs);
    InferenceEngine::InferenceEnginePluginPtr enginePtr(dispatcher.getSuitablePlugin(TargetDevice::eCPU));
  3. Create an Intermediate Representation reader by creating an instance of InferenceEngine::CNNNetReader and read a model Intermediate Representation:
    InferenceEngine::CNNNetReader netBuilder;
    netBuilder.ReadNetwork("Model.xml");
    netBuilder.ReadWeights("Model.bin");
  4. Request information about inputs (an image and any other input data required) using the InferenceEngine::CNNNetReader::getNetwork() and InferenceEngine::CNNNetwork::getInputsInfo() methods. Set the input number format (precision) using InferenceEngine::InputInfo::setInputPrecision to match the input data format (precision). Allocate input blobs of the appropriate types and feed an image and the input data to the blobs:
    /** Taking information about all topology inputs **/
    InferenceEngine::InputsDataMap inputInfo(netBuilder.getNetwork().getInputsInfo());
    /** Stores all input blobs data **/
    InferenceEngine::BlobMap inputBlobs;
    /** Iterating over all input blobs **/
    for (auto & item : inputInfo) {
        /** Creating input blob **/
        item.second->setInputPrecision(Precision::U8);
        InferenceEngine::TBlob<unsigned char>::Ptr input;
        input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(Precision::U8, item.second->getDims());
        input->allocate();
        inputBlobs[item.first] = input;
        /** Fill input tensor with planes. First b channel, then g and r channels **/
        ...
    }
  5. Request information about outputs, using the InferenceEngine::CNNNetReader::getNetwork() and InferenceEngine::CNNNetwork::getOutputsInfo() methods. Allocate output blobs of the appropriate types:
    InferenceEngine::OutputsDataMap outputInfo(netBuilder.getNetwork().getOutputsInfo());
    InferenceEngine::BlobMap outputBlobs;
    for (auto & item : outputInfo) {
        InferenceEngine::TBlob<float>::Ptr output;
        output = InferenceEngine::make_shared_blob<float, InferenceEngine::SizeVector>(Precision::FP32, item.second->dims);
        output->allocate();
        outputBlobs[item.first] = output;
    }
  6. Load the model to the plugin using InferenceEngine::IInferencePlugin::LoadNetwork():
    InferenceEngine::ResponseDesc resp;
    InferenceEngine::StatusCode status = enginePtr->LoadNetwork(netBuilder.getNetwork(), &resp);
    if (status != InferenceEngine::OK) {
        throw std::logic_error(resp.msg);
    }
  7. Do inference by calling the InferenceEngine::IInferencePlugin::Infer method:
    enginePtr->Infer(inputBlobs, outputBlobs, &resp);
    
  8. Go over the output blobs and process the results.
    /** Pointer to the output blob **/
    const TBlob<float>::Ptr fOutput = std::dynamic_pointer_cast<TBlob<float>>(outputBlobs.begin()->second);
    /** fOutput->data()[] - accessing output blob data **/

Building Your Application

For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory.

Running the Application

Before running compiled binary files:

Make sure your application can find the Inference Engine libraries. On Linux* operating systems, the LD_LIBRARY_PATH environment variable specifies the library directories.

Update LD_LIBRARY_PATH with directory paths under the Inference Engine installation directory in which the libraries reside.

Add a path to the directory containing the core and plugin libraries:

  • For Inference Engine installed within the OpenVINO toolkit package:
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH

Add paths to the directories containing the required third-party libraries:

  • For Inference Engine installed within the OpenVINO toolkit package:
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=<INSTALL_DIR>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
    

As an alternative, use the following script in the OpenVINO toolkit installation directory:

<INSTALL_DIR>/bin/setupvars.sh

To run compiled applications on Microsoft* Windows* OS, make sure that Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and <INSTALL_DIR>\deployment_tools\inference_engine\bin\intel64\Release\*.dll files are in the application directory or accessible through the PATH environment variable.

Adding Your Own Kernels in the Inference Engine

A layer is a CNN building block implemented in the training framework, such as Convolution in Caffe*. A kernel is the corresponding implementation in the Inference Engine.

Plug your kernel implementations into the Inference Engine and map them to the layers in the original framework. See the Model Optimizer Developer Guide for information about how the mapping between framework layers and Inference Engine kernels is registered.

The rest of the section covers custom kernels and how to integrate them into the Inference Engine.

Example of Custom Kernels Support in the Samples

Every sample uses the Inference Engine API to load custom kernels depending on the device type. For the CPU, a custom kernel is a shared library that exports a certain interface for registering the kernels. For the GPU, it is an .xml file that lists the kernels along with the parameters that the kernels accept and how these map to the specific Intermediate Representation values.
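A condensed sketch of how a sample might branch on the device type when loading custom kernels; custom_cpu_library and custom_gpu_config stand for the parsed -l and -c command-line values and are illustrative names:

// CPU custom layers: load the shared library and register it with the plugin
if (!custom_cpu_library.empty()) {
    auto extension_ptr = InferenceEngine::make_so_pointer<InferenceEngine::IExtension>(custom_cpu_library);
    plugin.AddExtension(extension_ptr);
}
// GPU custom kernels: pass the XML file that describes the kernels
if (!custom_gpu_config.empty()) {
    plugin.SetConfig({{InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, custom_gpu_config}});
}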

Example Custom Kernels

The "extension" directory in the "samples" dir comes with few real example of CPU-targeted kernels, like DetectionOutput (used in SSD*), etc.

The GPU-targeted kernels are bundled with the binaries when the samples are compiled so the sample applications can easily load them. See the cldnn_global_custom_kernels directory in the GPU plugin installation directory.

How to Implement Custom Intel® Integrated Graphics Layers

You must provide the kernel code in OpenCL C and a configuration file that connects the kernel and its parameters to the parameters of the layer.

You have two options for using the custom layer configuration file:

  • Include a section with your kernels into the global auto-loading file cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml
  • Provide a separate configuration file and load it using the IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value, before loading the network that features the custom layers:
    // Load the Intel® Integrated Graphics (GPU) plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin({…, "GPU"}));
    InferencePlugin plugin(plugin_ptr);
    // Load the Intel® Integrated Graphics extensions
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});

For details about the configuration parameters and OpenCL kernel see the tutorial.

How to Implement Custom CPU Layers

The instructions below are a brief summary of the Custom Layers tutorial.

For more details, see the sample source.

  1. Create a custom layer factory CustomLayerFactory class.
    // custom_layer.h
    // A CustomLayerFactory class produces an example layer that raises each input value to the power of 2 and does not change dimensions
    class CustomLayerFactory {
    };
  2. Inherit it from the abstract class InferenceEngine::ILayerImplFactory:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    };
  3. Create a constructor, a virtual destructor, and a data member to keep the layer info:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}
    private:
        CNNLayer cnnLayer;
    };
  4. Overload and implement the abstract methods (getShapes, getImplementations) of the InferenceEngine::ILayerImplFactory class
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        // ... constructor and destructor
        StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
            if (cnnLayer == nullptr) {
                std::string errorMsg = "Cannot get cnn layer!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            if (inShapes.size() != 1) {
                std::string errorMsg = "Incorrect input shapes!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            outShapes.clear();
            outShapes.emplace_back(inShapes[0]);
            return OK;
        }
        StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
            // Pass cnnLayer to the implementation, which needs the layer information.
            impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl(&cnnLayer)));
            return OK;
        }
    };
  5. Create your custom layer implementation CustomLayerImpl class:
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl {
    };
  6. Because the layer uses the execute method to change data, inherit it from the abstract class InferenceEngine::ILayerExecImpl, and overload and implement the abstract methods of this class.
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl: public ILayerExecImpl {
    public:
        explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}
        StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;
        StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;
        StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;
    private:
        CNNLayer cnnLayer;
    };
  7. Implement the getSupportedConfigurations method to return all supported configurations for this implementation. To specify data formats, use InferenceEngine::TensorDesc:
    // custom_layer.cpp
    StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {
        try {
            // This layer can be in-place but not constant!!!
            if (cnnLayer == nullptr)
                THROW_IE_EXCEPTION << "Cannot get cnn layer";
            if (cnnLayer->insData.size() != 1 || cnnLayer->outData.empty())
                THROW_IE_EXCEPTION << "Incorrect number of input/output edges!";
            LayerConfig config;
            DataPtr dataPtr = cnnLayer->insData[0].lock();
            if (!dataPtr)
                THROW_IE_EXCEPTION << "Cannot get input data!";
            DataConfig dataConfig;
            dataConfig.inPlace = -1;
            dataConfig.constant = false;
            SizeVector order;
            for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            // Planar formats for N dims
            dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),
                                         dataPtr->getTensorDesc().getDims(),
                                         {dataPtr->getTensorDesc().getDims(), order});
            config.inConfs.push_back(dataConfig);
            DataConfig outConfig;
            outConfig.constant = false;
            outConfig.inPlace = 0;
            order.clear();
            for (size_t i = 0; i < cnnLayer->outData[0]->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            outConfig.desc = TensorDesc(cnnLayer->outData[0]->getTensorDesc().getPrecision(),
                                        cnnLayer->outData[0]->getDims(),
                                        {cnnLayer->outData[0]->getDims(), order});
            config.outConfs.push_back(outConfig);
            config.dynBatchSupport = 0;
            conf.push_back(config);
            return OK;
        } catch (InferenceEngine::details::InferenceEngineException& ex) {
            std::string errorMsg = ex.what();
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
    }
  8. Implement the init and execute methods. init is necessary to get the selected configuration and to check the parameters:
    // custom_layer.cpp
    StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {
        StatusCode rc = OK;
        if (config.dynBatchSupport) {
            config.dynBatchSupport = 0;
            rc = NOT_IMPLEMENTED;
        }
        for (auto& input : config.inConfs) {
            if (input.inPlace >= 0) {
                input.inPlace = -1;
                rc = NOT_IMPLEMENTED;
            }
            for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (input.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {
                if (input.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        for (auto& output : config.outConfs) {
            if (output.inPlace < 0) {
                // NOT in-place
            }
            for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (output.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {
                if (output.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        return rc;
    }
    StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {
        if (inputs.size() != 1 || outputs.empty()) {
            std::string errorMsg = "Incorrect number of input or output edges!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        const float* src_data = inputs[0]->buffer();
        float* dst_data = outputs[0]->buffer();
        for (size_t o = 0; o < outputs[0]->size(); o++) {
            if (dst_data == src_data) {
                dst_data[o] *= dst_data[o];
            } else {
                dst_data[o] = src_data[o]*src_data[o];
            }
        }
        return OK;
    }
  9. Create a factory for your own primitives, inherited from the abstract class InferenceEngine::IExtension
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    }; 
    Implement the utility methods Unload, Release, SetLogCallback:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // could be used to cleanup resources
        void Unload() noexcept override {
        }
        // is used when destruction happens
        void Release() noexcept override {
            delete this;
        }
        // logging is used to track what is going on inside
        void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
    };
  10. Implement the utility method GetVersion:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    private:
        static InferenceEngine::Version ExtensionDescription = {
            {1, 0},             // extension API version
            "1.0",              
            "CustomExtention"   // extension description message
        };
    public:
        // gets extension version information
        void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
            versionInfo = &ExtensionDescription;
        }
    }; 
    Implement main extension methods:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // ... utility methods
        StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
            std::string type_name = "CustomLayer";
            types = new char *[1];
            size = 1;
            types[0] = new char[type_name.size() + 1];
            std::copy(type_name.begin(), type_name.end(), types[0]);
            types[0][type_name.size()] = '\0';
            return OK;
        }
        StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
            if (cnnLayer->type != "CustomLayer") {
                std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return NOT_FOUND;
            }
            factory = new CustomLayerFactory(cnnLayer);
            return OK;
        }
    };
  11. To use your custom layers, compile the code as a shared library, and then use the AddExtension method of the general plugin interface to load your primitives:
    auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<shared lib path>");
    // Add the extension to the plugin's list
    plugin.AddExtension(extension_ptr);

Using the Validation Application to Check Accuracy on a Dataset

The Inference Engine Validation application lets you score common topologies with standard inputs and outputs configuration. These topologies include AlexNet and SSD. The Validation application allows the user to collect simple validation metrics for the topologies. It supports Top-1/Top-5 counting for classification networks and 11-points mAP calculation for object detection networks.

Possible Validation application uses:

  • Check if the Inference Engine scores the public topologies well
  • Verify if a custom topology is compatible with the default input/output configuration and compare its accuracy with that of the public topologies
  • Use the Validation application as another sample: although the code is much more complex than in the classification and object detection samples, it is still open and can be re-used

The application loads a network to the Inference Engine plugin. Then:

  1. The application reads the validation set (the -i option):
    • If -i specifies a directory, the application tries to load labels first. To do so, the application searches for a file with the same base name as the model, but with a .labels extension. The application then searches the specified directory and adds all images from sub-directories whose names are equal to a known label to the validation set. If there are no sub-directories whose names are equal to known labels, the validation set is considered empty.
    • If -i specifies a .txt file, the application reads the .txt file, considering every line that has the format: <relative_path_from_txt_to_img> <ID>, where ID is the image number that the network should classify.
  2. The application reads the number of images specified by -b and loads the images to the plugin. When all images are loaded, the plugin does inference and the Validation application collects the statistics.

NOTE: Image load time is not part of the inference time reported by the application.

Optionally, use the --dump option to retrieve the inference results. This option creates an inference report named dumpfileXXXX.csv with the following semicolon-separated values:

  • Image_path
  • Flag representing correctness of prediction
  • ID of the Top-1 class
  • Probability that the image belongs to the Top-1 class
  • ID of the Top-2 class
  • Probability that the image belongs to the Top-2 class, and so on for the remaining top classes

CLI Options

Usage: validation_app [OPTION]
Available options:
    -h                        Print a usage message
    -t                  Type of the network being scored ("C" by default)
      -t "C" for classification
      -t "OD" for object detection
    -i [path]                 Required. Directory with validation images, directories grouped by labels, or a .txt file list for classification networks, or a VOC-formatted dataset for object detection networks
    -m [path]                 Required. Path to an .xml file with a trained model
    -l [absolute_path]        Required for Intel® MKL-DNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernel implementations
    -c [absolute_path]        Required for GPU-targeted custom kernels. Absolute path to the xml file with the kernel descriptions
    -d [device]               Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device. The plugin is CPU by default.
    -b N                      Batch size value. If not specified, the batch size value is determined from IR
    -ppType             Preprocessing type. One of "None", "Resize", "ResizeCrop"
    -ppSize N                 Preprocessing size (used with ppType="ResizeCrop")
    -ppWidth W                Preprocessing width (overrides -ppSize, used with ppType="ResizeCrop")
    -ppHeight H               Preprocessing height (overrides -ppSize, used with ppType="ResizeCrop")
    --dump                    Dump filenames and inference results to a csv file

    Classification-specific options:
      -Czb true               "Zero is a background" flag. Some networks are trained with a modified dataset where the class IDs are enumerated from 1, but 0 is an undefined "background" class (which is never detected)

    Object detection-specific options:
      -ODkind           Kind of an object detection network: SSD
      -ODa [path]             Required for OD networks. Path to the directory containing .xml annotations for images
      -ODc              Required for OD networks. Path to the file containing classes list
      -ODsubdir         Directory between the image path (-i) and image name, specified in the .xml. Use JPEGImages for VOC2007

Option Categories

  • Common options are usually named with a single letter or word, such as -b or --dump. These options have the same meaning in all validation_app modes.
  • Network type-specific options are named as an acronym of the network type (such as C or OD), followed by a letter or a word addendum. These options are specific to the network type. For instance, ODa makes sense only for an object detection network.

The next section shows how to use the Validation application in classification mode to score a classification CNN on a pack of images.

Running the Application in Classification Mode

This section demonstrates how to run the Validation application in classification mode to score a classification CNN on a pack of images.

To do inference of a chosen pack of images:

$ ./validation_app -t C -i <path to images main directory or .txt file> -m <model to use for classification> -d <CPU|GPU>

Source dataset format: directories as classes

A correct list of files looks similar to:

<path>/dataset
    /apron
        /apron1.bmp
        /apron2.bmp
    /collie
        /a_big_dog.jpg
    /coral reef
        /reef.bmp
    /Siamese
        /cat3.jpg

To score this dataset put the -i <path>/dataset option in the command line.

Source dataset format: a list of images

This example uses a single list file in the format image_name-tabulation-class_index. The correct list of files:

<path>/dataset
    /apron1.bmp
    /apron2.bmp
    /a_big_dog.jpg
    /reef.bmp
    /cat3.jpg
    /labels.txt

where labels.txt:

apron1.bmp 411
apron2.bmp 411
cat3.jpg 284
reef.bmp 973
a_big_dog.jpg 231

To score this dataset put the -i <path>/dataset/labels.txt option in the command line.

Output Description

A progress bar shows the inference progress. Upon completion, the following common information is displayed:

Network load time: time spent on topology load in ms
Model: path to chosen model
Model Precision: precision of a chosen model
Batch size: specified batch size
Validation dataset: path to a validation set
Validation approach: Classification networks
Device: device type

You see statistics such as the average inference time, and top-1 and top-5 accuracy:

Average infer time (ms): 588.977 (16.98 images per second with batch size = 10)

Top1 accuracy: 70.00% (7 of 10 images were detected correctly, top class is correct)
Top5 accuracy: 80.00% (8 of 10 images were detected correctly, top five classes contain required class)

Using Object Detection with the Validation Application

Description

This topic shows how to run the Validation application in object detection mode to score an SSD object detection CNN on a pack of images.

Running SSD on the VOC Dataset

Use these steps to score SSD on the original dataset that was used to test it during its training.

  1. Go to the SSD author's github page to select the pre-trained SSD-300.
  2. From the same page, download the VOC2007 test dataset:
    $ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    tar -xvf VOCtest_06-Nov-2007.tar
  3. Use the Model Optimizer to convert the model. For help, see the Model Optimizer Developer Guide.
  4. Create a proper class file (made from the original labelmap_voc.prototxt):
    none_of_the_above 0
    aeroplane 1
    bicycle 2
    bird 3
    boat 4
    bottle 5
    bus 6
    car 7
    cat 8
    chair 9
    cow 10
    diningtable 11
    dog 12
    horse 13
    motorbike 14
    person 15
    pottedplant 16
    sheep 17
    sofa 18
    train 19
    tvmonitor 20
  5. Save it as VOC_SSD_Classes.txt.
  6. Score the model on the dataset:
    ./validation_app -d CPU -t OD -ODa "<...>/VOCdevkit/VOC2007/Annotations" -i "<...>/VOCdevkit" -m "<...>/vgg_voc0712_ssd_300x300.xml" -ODc "<...>/VOC_SSD_Classes.txt" -ODsubdir JPEGImages

  7. You see a progress bar followed by your data:
    Progress: [....................] 100.00% done    
    [ INFO ] Processing output blobs
    Network load time: 27.70ms
    Model: /home/user/models/ssd/withmean/vgg_voc0712_ssd_300x300/vgg_voc0712_ssd_300x300.xml
    Model Precision: FP32
    Batch size: 1
    Validation dataset: /home/user/Data/SSD-data/testonly/VOCdevkit
    Validation approach: Object detection network
    
    Average infer time (ms): 166.49 (6.01 images per second with batch size = 1)
    Average precision per class table: 
    
    Class   AP
    1   0.796
    2   0.839
    3   0.759
    4   0.695
    5   0.508
    6   0.867
    7   0.861
    8   0.886
    9   0.602
    10  0.822
    11  0.768
    12  0.861
    13  0.874
    14  0.842
    15  0.797
    16  0.526
    17  0.792
    18  0.795
    19  0.873
    20  0.773
    Mean Average Precision (mAP): 0.7767

The Mean Average Precision (mAP) value is also reported in a table on the SSD author's page and in the arXiv paper.

Advanced Topics

Key terms in this section

Acronym/Term      Description
DL                Deep Learning
FP16 format       Half-precision floating-point format
FP32 format       Single-precision floating-point format
I16 format        2-byte signed integer format
I8 format         1-byte signed integer format
U16 format        2-byte unsigned integer format
U8 format         1-byte unsigned integer format
NCHW, NHWC

Image data layout. Refers to the representation of batches of images.

  • N - Number of images in a batch
  • H - Number of pixels in the vertical dimension
  • W - Number of pixels in the horizontal dimension
  • C - Channels
C, CHW, NC        Tensor memory layout. For example, the CHW value at index (c,h,w) is physically located at index (c * H + h) * W + w, and similarly for the other layouts.
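To make the formula concrete, here is a small sketch of the CHW offset computation (not from the toolkit; the values in the usage comment are illustrative):

#include <cstddef>

// Physical offset of element (c, h, w) in a CHW-laid-out tensor with dimensions C x H x W.
size_t ChwOffset(size_t c, size_t h, size_t w, size_t H, size_t W) {
    return (c * H + h) * W + w;
}
// Example: for H = 20 and W = 20, ChwOffset(1, 0, 2, 20, 20) == 402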

Understanding Inference Engine Memory Primitives

Blobs

InferenceEngine::Blob is the main class intended for working with memory. This class lets you read and write memory and get information about the memory structure, among other tasks.

To create Blob objects with a specific layout, use constructors with InferenceEngine::TensorDesc.

InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 3, 227, 227}, InferenceEngine::Layout::NCHW);
InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
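Once a blob is created from a TensorDesc, its memory can be allocated and accessed through buffer(). A short sketch continuing the example above (the written value is illustrative):

blob->allocate();                                // reserve memory for the 1x3x227x227 FP32 tensor
float *blob_data = blob->buffer().as<float *>(); // writable pointer to the blob memory
blob_data[0] = 1.0f;                             // write the first element
size_t element_count = blob->size();             // 1 * 3 * 227 * 227 elements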

Layouts

InferenceEngine::TensorDesc is a special class that provides layout format description.

This class allows you to create planar layouts using the standard formats, like InferenceEngine::Layout::NCHW, InferenceEngine::Layout::NC, InferenceEngine::Layout::C, and non-planar layouts using InferenceEngine::BlockingDesc.

To create a complex layout, use InferenceEngine::BlockingDesc, which allows you to define blocked memory with offsets and strides.

Examples

  • Define a blob with dimensions, {N: 1, C: 25, H: 20, W: 20}, and format, NHWC:
    InferenceEngine::BlockingDesc({1, 20, 20, 25}, {0, 2, 3, 1}); // or
    InferenceEngine::BlockingDesc({1, 20, 20, 25}, InferenceEngine::Layout::NHWC);
  • If you have memory with real dimensions {N: 1, C: 25, H: 20, W: 20}, but with channels that are blocked by 8, define the memory with parameters:
    InferenceEngine::BlockingDesc({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1})
  • Set strides and offsets if the layout contains them. If your blob layout is complex and you don't want to calculate the real offset to data, use InferenceEngine::TensorDesc::offset(size_t l) or InferenceEngine::TensorDesc::offset(SizeVector v).
    For example:
    InferenceEngine::BlockingDesc blk({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
    InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 25, 20, 20}, blk);
    tdesc.offset(0); // = 0
    tdesc.offset(1); // = 8
    tdesc.offset({0, 0, 0, 2}); // = 16
    tdesc.offset({0, 1, 0, 2}); // = 17
  • If you want to create a TensorDesc with a planar format for N dimensions (N can be 1, 2, 4, and so on), use
    InferenceEngine::TensorDesc::getLayoutByDims:
    InferenceEngine::TensorDesc::getLayoutByDims({1}); // InferenceEngine::Layout::C
    InferenceEngine::TensorDesc::getLayoutByDims({1, 2}); // InferenceEngine::Layout::NC
    InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4}); // InferenceEngine::Layout::NCHW
    InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3}); // InferenceEngine::Layout::BLOCKED
    InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, ...}); // InferenceEngine::Layout::BLOCKED

Supported Devices 

The Inference Engine can infer models in different formats with various input and output formats. This section provides supported and optimal configurations per device.

The Inference Engine provides unique capabilities to infer deep learning models on the following device types with corresponding plugins:

Plugin                  Device type
GPU plugin              Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
CPU plugin              Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE
FPGA plugin             Intel® Arria® 10 GX FPGA Development Kit, Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA
MYRIAD plugin           Intel® Movidius™ Myriad™ 2 Vision Processing Unit
GNA plugin              Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver processor J5005, Intel® Celeron® processor J4005, Intel® Core™ i3-8121U processor
Heterogeneous plugin    Enables computing for inference of one network on several Intel® devices

Supported Configurations

The Inference Engine can infer models in different formats with various input and output formats. This chapter provides supported and optimal configurations for each plugin.

Supported Model Formats

Plugin            FP32                       FP16
CPU plugin        Supported and preferred    Not supported
GPU plugin        Supported                  Supported and preferred
FPGA plugin       Supported                  Supported
MYRIAD plugin     Not supported              Supported
GNA plugin        Supported                  Not supported

Supported Input Precision

Plugin            FP32         FP16                               U8                         U16            I8             I16
CPU plugin        Supported    Not supported                      Supported                  Supported      Not supported  Supported
GPU plugin        Supported    Supported* (see the NOTE below)    Supported*                 Supported*     Not supported  Supported*
FPGA plugin       Supported    Supported* (see the NOTE below)    Supported                  Supported      Not supported  Supported
MYRIAD plugin     Supported    Supported                          Supported and preferred    Not supported  Not supported  Not supported
GNA plugin        Supported    Not supported                      Not supported              Not supported  Supported      Supported

NOTE: Supported through SetBlob only. GetBlob returns FP32. Supported without a mean image.

Supported Output Precision

Plugin            FP32         FP16
CPU plugin        Supported    Not supported
GPU plugin        Supported    Supported
FPGA plugin       Supported    Supported
MYRIAD plugin     Supported    Supported and preferred
GNA plugin        Supported    Not supported

 

Supported Input Layout

Plugin            NCHW             NHWC                       NC
CPU plugin        Supported        Not supported              Not supported
GPU plugin        Supported        Not supported              Not supported
FPGA plugin       Supported        Not supported              Not supported
MYRIAD plugin     Supported        Supported and preferred    Not supported
GNA plugin        Not supported    Not supported              Supported

 

Supported Output Layout

Number of dimensions:  4     3    2   1
Layout:                NCHW  CHW  NC  C

CPU Plugin 

The CPU plugin provides an opportunity for high-performance scoring of neural networks on the Intel® CPU devices using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).

The CPU plugin uses OpenMP* to parallelize calculations.

Supported Layers

  • BatchNorm
  • Clamp
  • Concat
  • Convolution
  • Crop
  • Deconvolution
  • Eltwise
  • ELU
  • FullyConnected
  • Logistic
  • LRN
  • Permute
  • Pooling
  • Power
  • ReLU
  • Reshape
  • ROIPooling
  • ScaleShift
  • Softmax
  • Split
  • TanH
  • Tile

The set of supported layers can be expanded with the extensibility library. To add a new layer in this library, use the extensibility mechanism.

Supported Platforms

The OpenVINO toolkit is supported and validated on these platforms:

Host              64-bit OS
Development       Ubuntu* 16.04, CentOS* 7.4, Microsoft* Windows* 10
Target            Ubuntu* 16.04, CentOS* 7.4, Microsoft* Windows* 10

The CPU plugin supports inference on Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel Atom® Processors with Intel® SSE.

Use the -pc flag with the samples to learn which configuration is used by a layer. The -pc flag shows execution statistics with information about the layer name, execution status, layer type, execution time, and the type of the execution primitive.

Internal CPU Plugin Optimizations

The CPU Plugin supports several graph optimization algorithms:

  • Merging of group convolutions. If the topology contains a sequence of grouped convolutions with the same parameters, the CPU plugin merges it into a single Convolution with the group parameter.
    Merging of group convolution
  • Fusing Convolution with ReLU or ELU. The CPU plugin fuses a Convolution with a ReLU or ELU layer if that layer directly follows the Convolution layer.
  • Removing the Power layer. The CPU plugin removes a Power layer from the topology if it has the following parameters: power = 1, scale = 1, offset = 0.
  • Fusing Convolution + Sum or Convolution + Sum + ReLU. To improve performance, the CPU plugin fuses the following structure:
    Fusing Convolution + Sum or Convolution + Sum + ReLU
    This fusion transforms the graph into the following structure:
    Upgraded graph after the fusing optimization

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork().

Parameter Name              Parameter Values    Default               Description
KEY_CPU_BIND_THREAD         YES / NO            YES                   Binds OpenMP threads to hardware cores. If the value is YES, the number of OpenMP threads equals the number of hardware cores.
KEY_DYN_BATCH_LIMIT         number              Network batch size    Sets the batch size for all following Infer calls. For example, if the input blob has size 32x3x224x224, after plugin.SetConfig({KEY_DYN_BATCH_LIMIT, 10}) the Inference Engine primitives process only the first sub-blobs of size 10x3x224x224. The value can be changed before any Infer call to specify a new limit.
EXCLUSIVE_ASYNC_REQUESTS    YES / NO            NO                    Enables exclusive mode for async requests of different executable networks and the same plugin.
PERF_COUNT                  YES / NO            NO                    Enables the performance counters option.
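A short sketch of setting two of these parameters before loading the network, following the LoadNetwork pattern shown in the integration section (the chosen values are illustrative):

std::map<std::string, std::string> config = {
    { PluginConfigParams::KEY_CPU_BIND_THREAD, PluginConfigParams::NO },   // do not pin OpenMP threads
    { PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }        // collect per-layer statistics
};
auto executable_network = plugin.LoadNetwork(network, config);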
CPU Extensions

The CPU extensions library contains code for important layers that do not come with the CPU plugin. Compile this library and use the AddExtension method in your application to load the extensions for models featuring layers from this library. See the other samples for AddExtension code examples.

When you compile the entire list of the samples, the cpu_extension library is also compiled.

For performance, the library's cmake script detects your computer configuration and enables platform-specific optimizations. Alternatively, you can explicitly use the cmake flags -DENABLE_AVX2=ON, -DENABLE_AVX512F=ON, or -DENABLE_SSE42=ON when cross-compiling this library for another platform.

List of layers that come in the library:

  • ArgMax
  • CTCGreedyDecoder
  • DetectionOutput
  • GRN
  • Interp
  • MVN
  • Normalize
  • PowerFile
  • PReLU
  • PriorBox
  • PriorBoxClustered
  • Proposal
  • PSROIPooling
  • Resample
  • SimplerNMS
  • SpatialTransformer

Use the extensibility mechanism to add a layer. For information, see Adding Your Own Kernels in the Inference Engine.


GPU Plugin 

The GPU plugin uses the Intel® Compute Library for Deep Neural Networks to infer deep neural networks. This is an open source performance library for Deep Learning applications intended for acceleration of deep learning inference on Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics.

Supported Layers

  • Activation (ReLU, Sigmoid, Logistic, TanH, ELU, Clamp)
  • BatchNormalization
  • Concatenate
  • Convolution
  • Copy
  • Crop
  • Deconvolution
  • DetectionOutput
  • Eltwise
  • Flatten
  • FullyConnected
  • LRN
  • Normalize
  • Permute
  • Pooling
  • Power
  • PReLU
  • PriorBox
  • Proposal
  • PSROIPooling
  • Reshape
  • ROIPooling
  • ScaleShift
  • SimplerNMS
  • SoftMax
  • Split
  • Upsampling

Supported Optimizations

  • Fused layers:
    • Convolution - Activation
    • Deconvolution - Activation
    • Eltwise - Activation
    • Fully Connected - Activation
  • Layers optimized out when conditions allow:
    • Crop
    • Concatenate
    • Reshape
    • Flatten
    • Split
    • Copy
  • Layers executed during load time (not during inference):
    • PriorBox

CPU Executed Layers

The following layers are not accelerated by the GPU plugin and are executed on the host CPU instead.

  • Proposal
  • SimplerNMS
  • PriorBox
  • DetectionOutput

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork().

Name | Value | Default | Description
KEY_PERF_COUNT | YES / NO | NO | Collect performance counters during inference
KEY_CONFIG_FILE | "file1 [file2 ...]" | "" | Load custom layer configuration files
KEY_DUMP_KERNELS | YES / NO | NO | Dump the final kernels used for custom layers
KEY_TUNING_MODE | TUNING_DISABLED / TUNING_CREATE / TUNING_USE_EXISTING | TUNING_DISABLED | Disable inference kernel tuning / Create a tuning file (expect a much longer runtime) / Use an existing tuning file
KEY_TUNING_FILE | "filename" | "" | Tuning file to create / use
KEY_PLUGIN_PRIORITY | <0-3> | 0 | OpenCL queue priority
KEY_PLUGIN_THROTTLE | <0-3> | 0 | OpenCL queue throttling

Debug Capabilities in the GPU Plugin

The GPU plugin can dump user custom OpenCL™ kernels to files so that you can debug compilation issues in your custom kernels.

The application can use the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and the value PluginConfigParams::YES. Then, during network loading, all custom layers print their OpenCL kernels with the JIT instrumentation added by the plugin. The kernels are stored in the working directory in files named in the format clDNN_program0.cl, clDNN_program1.cl, and so on.

The debug option is disabled by default. The application can also explicitly disable it by calling SetConfig() with the PluginConfigParams::KEY_DUMP_KERNELS key and the PluginConfigParams::NO value before network loading.
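
For example, a minimal sketch of enabling the dump before loading the network, following the SetConfig() pattern used elsewhere in this document (dispatcher and network are assumed to be set up as in the other examples):

// Sketch: enabling the custom-kernel dump in the GPU plugin (legacy API).
InferencePlugin plugin(dispatcher.getPluginByDevice("GPU"));
plugin.SetConfig({ { InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS,
                     InferenceEngine::PluginConfigParams::YES } });
auto executable_network = plugin.LoadNetwork(network, {});  // kernels are dumped during loading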

To verify that the debug option is disabled:

  1. Delete all clDNN_program*.cl files from the current directory
  2. Run your application to load a network
  3. Examine the working directory for the presence of any kernel file, such as clDNN_program0.cl

FPGA Plugin 

The FPGA plugin was developed for high performance scoring of neural networks on Intel® FPGA devices.

NOTE: It is assumed that you have already set up either the Intel® Arria® 10 GX FPGA Development Kit or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA with instructions from Installing the OpenVINO™ Toolkit for Linux* with FPGA Beta Support.

Supported Platforms

The OpenVINO™ toolkit is officially supported and validated on the following FPGA setup:

Host | OS (64-bit) | Platform
Development | Ubuntu* 16.04, CentOS* 7.4 | 6th-8th Generation Intel® Core™ Processors, Intel® Xeon® v5 family, Intel® Xeon® v6 family
Target | Ubuntu* 16.04, CentOS* 7.4 | Intel® Arria® 10 GX FPGA Development Kit or Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA

Supported Layers

  • BatchNorm (converted to a ScaleShift layer by the Model Optimizer)
  • Concat
  • Convolution (dilated convolutions are supported; depthwise convolutions are not)
  • Eltwise (the sum operation is supported)
  • Fully Connected
  • LRN Normalization
  • Pooling
  • Power (the scale and offset parameters are supported)
  • ReLU (with negative slope)
  • ScaleShift

NOTE: Support is limited to the specific parameters (depending on the bitstream).

Heterogeneous Execution

If a topology contains layers that are not supported on FPGA, use the Heterogeneous plugin with a dedicated fallback device.

If the network has layers that are supported neither by the FPGA plugin nor by the fallback plugin, you can implement a custom layer for the CPU or GPU and use the extensibility mechanism described in the Adding Your Own Kernels in the Inference Engine section.
In addition to implementing custom kernels, you still have to specify the CPU plugin or GPU plugin as a fallback device for the Heterogeneous plugin.

Supported Networks

Network | Bitstreams (Intel® Arria® 10 GX DevKit) | Bitstreams (Programmable Acceleration Card with Intel® Arria® 10 GX FPGA)
AlexNet | arch4, arch6, arch7, arch8, arch16 | arch2, arch3, arch10, arch11, arch12, arch16, arch17, arch23, arch25, arch26
GoogleNet v1 | arch6, arch7, arch8, arch16 | arch2, arch3, arch11, arch12, arch16, arch17, arch23, arch25, arch26
VGG-16 | arch6, arch16 | arch2, arch3, arch16
VGG-19 | arch6, arch16 | arch2, arch3, arch16
SqueezeNet v 1.0 | arch6, arch7, arch8, arch14 | arch2, arch3, arch9, arch11, arch12, arch16, arch17, arch18, arch20, arch23, arch25, arch26
SqueezeNet v 1.1 | arch6, arch7, arch8, arch14 | arch2, arch3, arch9, arch11, arch12, arch16, arch17, arch18, arch20, arch23, arch25, arch26
ResNet-18 | arch8, arch9, arch19 | arch2, arch3, arch9, arch16, arch20
ResNet-50 | arch9, arch19 | arch3, arch9, arch20
ResNet-101 | arch9, arch19 | arch3, arch9, arch20
ResNet-152 | arch9, arch19 | N/A
SqueezeNet-based variant of SSD* | arch9 | N/A
GoogLeNet-based variant of SSD | arch6, arch7, arch8, arch9 | N/A
VGG-based variant of SSD | arch6 | N/A

Translation from Architecture to FPGA Bitstream Files 

In addition to the network topologies listed above, arbitrary topologies that have large continuous subgraphs consisting of layers supported by the FPGA plugin are also recommended for execution on the FPGA plugin.

Various FPGA bitstreams that support CNN are available in the OpenVINO™ toolkit package for FPGA.

To select the correct bitstream (.aocx) file for an architecture, follow the steps below:

  • Select a network (for example, ResNet-18) from the table above for either the Intel Arria 10 GX Development Kit or the Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA and note the corresponding architecture. For example, for the Intel Arria 10 GX Development Kit, the suitable architectures for ResNet-18 are arch8, arch9, and arch19.
  • Pick a bitstream to program. A few bitstreams are available in the package:
    Board | Location of Bitstreams
    Intel® Arria® 10 GX FPGA Development Kit | <INSTALL_DIR>/a10_devkit_bitstreams/
    Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | <INSTALL_DIR>/rush_creek_bitstreams/

    The rest of the bitstreams can be downloaded from the Web:

    Board | Location of Bitstreams
    Intel® Arria® 10 GX FPGA Development Kit | http://registrationcenter-download.intel.com/akdlm/irc_nas/12954/A10_DevKit_bitstreams.zip
    Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | http://registrationcenter-download.intel.com/akdlm/irc_nas/12954/A10_bitstreams.zip

Programming the Bitstream into the FPGA 

After picking the right bitstream, program it to the FPGA. The FPGA must be set up prior to this step. Refer to the FPGA installation page for instructions on how to set up the Intel® Arria® 10 GX FPGA Development Kit or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.

Run the following commands:

source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
aocl program <bitstream file from above section>.aocx

Environment for Running the FPGA Plugin

To run the FPGA plugin directly or through the Heterogeneous plugin, you need to set up the environment:

  • Set up an environment to access Intel® FPGA Linux* x86-64 Runtime Environment for OpenCL™:

    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
  • Set additional environment variables for the FPGA plugin from the table below:
    Variable | Setting
    DLA_AOCX | Path to the bitstream that can be programmed to the card. See the Translation from Architecture to FPGA Bitstream Files section for choosing a bitstream for your network and board. Programming the bitstream during runtime is not advised; program it beforehand. If you want to program the bitstream during runtime, you also need to set CL_CONTEXT_COMPILER_MODE_INTELFPGA=1.
    CL_CONTEXT_COMPILER_MODE_INTELFPGA | To prevent the host application from programming the FPGA, set this variable to a value of 3. Program the bitstream in advance. Refer to Programming the Bitstream into the FPGA and the FPGA installation page.
    ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to a value of 1 to force FPGA reprogramming using JTAG.

How to Interpret Performance Counters

As a result of collecting performance counters using InferenceEngine::IInferencePlugin::GetPerformanceCounts, you can find performance data about execution on the FPGA, pre-processing and post-processing of data, and data transfer to and from the FPGA card.

If part of the network is executed on the CPU, you can also find performance data about the Intel® MKL-DNN kernels, their types, and other useful information.
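
For illustration, the following is a minimal sketch of reading and printing the counters through the interface named above. It assumes enginePtr is the InferenceEnginePluginPtr that was used to load the network (as in the Heterogeneous plugin examples later in this document); adapt it to your application.

// Sketch: retrieving and printing per-layer performance counters (legacy API).
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> perfCounts;
InferenceEngine::ResponseDesc resp;
enginePtr->GetPerformanceCounts(perfCounts, &resp);
for (const auto &item : perfCounts) {
    std::cout << item.first << ": realTime " << item.second.realTime_uSec
              << " us, cpu " << item.second.cpu_uSec << " us" << std::endl;
}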

FPGA Support Limitations for CNN

The FPGA Beta release imposes certain limitations on network topologies, kernel parameters, and batch size.

  • Depending on the bitstream loaded on the target device, the FPGA performs calculations with precision ranging from FP11 to FP16. This may have accuracy implications. Use the Validation App to verify the network accuracy on a validation data set.
  • If a network has many layers that are not supported on FPGA and that sit in the topology between supported layers, the graph is divided into many subgraphs, which might lead to a CL_OUT_OF_HOST_MEMORY error. Such topologies are not FPGA-friendly for this release.
  • When using the Heterogeneous plugin, the affinity and the distribution of nodes across devices depend on the bitstream. Some layers, or some of their parameters, might not be supported by the selected bitstream.
  • A Fully-Connected layer can only be followed by another Fully-Connected layer (possibly with a ReLU). No Convolution layer can follow a Fully-Connected layer; otherwise, the graph verification fails and returns an error message.
  • Only a single output from a Fully-Connected layer (potentially coupled with a ReLU) is supported.
  • Several outputs from a Convolution layer (and other layers except Fully-Connected) are supported, but these outputs cannot be passed to other layers on the FPGA.
  • When executing on the FPGA, the first iteration is almost always much slower than subsequent iterations. Perform multiple iterations when assessing inference performance.
  • Always consider batching for performance conclusions. Note that depending on the bitstream loaded on the FPGA, the batch size is typically limited to 96.

MYRIAD Plugin 

The MYRIAD plugin was developed for high performance scoring of neural networks on Intel® Movidius™ Myriad™ 2 Vision Processing Unit.

Supported Layers

  • BatchNormalization
  • Bias
  • Concatenate
  • Convolution
  • Copy
  • Crop
  • CTCDecoder
  • Deconvolution
  • DepthwiseConvolution
  • DetectionOutput
  • Eltwise (SUM, MAX, MUL)
  • ELU
  • Flatten
  • FullyConnected
  • Leaky ReLU
  • LRN
  • Normalize
  • Permute
  • Pooling (MAX, AVG)
  • Power
  • PReLU
  • PriorBox
  • PriorBoxClustered
  • ReLU
  • Reshape
  • Scale
  • ScaleShift
  • Sigmoid
  • Slice
  • SoftMax
  • Split
  • TanH
  • Tile

Installing USB Rules

To do inference on the Intel® Movidius™ Myriad™ 2 Vision Processing Unit, install the USB rules by running the following commands:

cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-usbboot.rules

Supported Configuration Parameters

Name | Values | Default | Description
KEY_VPU_LOG_LEVEL | LOG_WARNING, LOG_INFO, LOG_DEBUG | LOG_NONE | Set log level for devices
KEY_VPU_INPUT_NORM | real number | 1.0 | Normalization coefficient for the network input
KEY_VPU_INPUT_BIAS | real number | 0.0 | Bias value that is added to each element of the network input
KEY_VPU_PRINT_RECEIVE_TENSOR_TIME | YES/NO | NO | Add device-side time spent to receive input to PerformanceCounts

 

Heterogeneous Plugin

The Heterogeneous plugin enables inference of one network on several devices. The purposes of executing networks in heterogeneous mode are:

  • To utilize the power of accelerators: calculate the heaviest parts of the network on the accelerator and execute layers that are not supported there on fallback devices such as the CPU
  • To utilize all available hardware more efficiently during one inference

The execution through the Heterogeneous plugin can be divided into two steps:

  • Setting affinity for layers (binding them to devices in InferenceEngine::ICNNNetwork)
  • Loading the network to the Heterogeneous plugin, which splits the network into parts and executes them through the dedicated plugins

These steps are decoupled. Affinity can be set automatically using the fallback policy or manually.

The automatic fallback policy is greedy: following the device priorities, it assigns each layer to the highest-priority device that can execute it.

Some topologies are not friendly to heterogeneous execution on some devices, or cannot be executed in this mode at all. These networks might have activation layers that are not supported on the primary device. If transferring data from one part of the network to another is time-consuming, executing those parts in heterogeneous mode on those devices may not make sense. In that case, define the heaviest part manually and set the affinity to avoid sending data back and forth several times during one inference.

Annotation of Layers per Device and Default Fallback Policy

The default fallback policy automatically decides which layer goes to which device according to the layer support in the dedicated plugins (FPGA, GPU, CPU).

Another way to annotate a network is to set affinity manually using the CNNLayer::affinity field. This field accepts string values of devices, such as "CPU" or "FPGA".

The fallback policy does not work if even one layer has an initialized affinity. The recommended sequence is to set affinities automatically first and then fix them manually.

// This example demonstrates how to perform default affinity initialization and then
// correct the affinity manually for some layers
InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
InferenceEngine::InferenceEnginePluginPtr enginePtr;
enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
HeteroPluginPtr hetero(enginePtr);
hetero->SetAffinity(network, { }, &resp);
network.getLayerByName("qqq")->affinity = "CPU";
InferencePlugin plugin(enginePtr);
auto executable_network = plugin.LoadNetwork(network, {});

If you rely on the default affinity distribution, you can avoid calling IHeteroInferencePlugin::SetAffinity and simply load the network:

InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
InferenceEngine::InferenceEnginePluginPtr enginePtr;
enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
InferencePlugin plugin(enginePtr);
CNNNetReader reader;
reader.ReadNetwork("Model.xml");
reader.ReadWeights("Model.bin");
auto executable_network = plugin.LoadNetwork(reader.getNetwork(), {});

Splitting the Network and Execution

While loading to the Heterogeneous plugin, the network is divided into several parts and loaded to the dedicated plugins. Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.

Execution Precision

Precision for inference in the Heterogeneous plugin is defined by:

  • Precision of the Intermediate Representation
  • Ability of final plugins to execute in precision defined in the Intermediate Representation

Examples:

  • To execute on Intel® Integrated Graphics with a CPU fallback using FP16 on the Integrated Graphics, use FP16 for the Intermediate Representation. The Heterogeneous plugin converts the weights from FP16 to FP32 for execution on the CPU.
  • To execute on FPGA with a CPU fallback, use any precision for the Intermediate Representation. The execution on FPGA is defined by the bitstream; the execution on the CPU happens in FP32.

Use the samples with a command like the following:

 ./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:FPGA,CPU

where:

  • HETERO is the Heterogeneous plugin 
  • FPGA,CPU is the fallback policy with the priority on FPGA and the fallback to the CPU

You can specify more than two devices, for example: -d HETERO:FPGA,GPU,CPU

Analyzing the Heterogeneous Execution

After enabling the KEY_HETERO_DUMP_GRAPH_DOT configuration key, you can dump GraphViz* .dot files with per-layer device annotations.

The Heterogeneous plugin can generate two files:

  • hetero_affinity.dot - annotation of affinities per layer. This file is written to the disk only if the default fallback policy is executed.
  • hetero_subgraphs.dot - annotation of affinities per graph. This file is written to the disk during the execution of ICNNNetwork::LoadNetwork() for the heterogeneous plugin.
    #include "ie_plugin_config.hpp"
    #include "hetero/hetero_plugin_config.hpp"
    using namespace InferenceEngine::PluginConfigParams;
    using namespace InferenceEngine::HeteroConfigParams;
    ...
    enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
    InferencePlugin plugin(enginePtr);
    plugin.SetConfig({ {KEY_HETERO_DUMP_GRAPH_DOT, YES} });

Use the GraphViz* utility or a converter to create .png files. On the Ubuntu* operating system, you can use the following commands:

  • sudo apt-get install xdot
  • xdot hetero_subgraphs.dot

Use the -pc option with the samples to get performance data for each subgraph.

Output example for Googlenet v1 running on FPGA with a fallback to the CPU:

subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED       layerType:                    realTime: 129        cpu: 129            execType:
subgraph1: 2. input transfer to DDR:EXECUTED       layerType:                    realTime: 201        cpu: 0              execType:
subgraph1: 3. FPGA execute time:EXECUTED       layerType:                    realTime: 3808       cpu: 0              execType:
subgraph1: 4. output transfer from DDR:EXECUTED       layerType:                    realTime: 55         cpu: 0              execType:
subgraph1: 5. FPGA output postprocessing:EXECUTED       layerType:                    realTime: 7          cpu: 7              execType:
subgraph1: 6. softmax/copy:   EXECUTED       layerType:                    realTime: 2          cpu: 2              execType:
subgraph2: out_prob:          NOT_RUN        layerType: Output             realTime: 0          cpu: 0              execType: unknown
subgraph2: prob:              EXECUTED       layerType: SoftMax            realTime: 10         cpu: 10             execType: ref
Total time: 4212     microseconds

GNA Plugin 

The GNA plugin is developed for low power scoring of neural networks on the Intel® Speech Enabling Developer Kit, the Amazon Alexa* Premium Far Field Developer Kit, Intel® Pentium® Silver Processor J5005, Intel® Celeron® Processor J4005, Intel® Core™ i3-8121U Processor, and others.

Supported Layers

The following layers are supported by the plugin:

  • Bias
  • Convolution
  • Copy
  • Eltwise
  • FullyConnected
  • Leaky ReLU
  • Pooling
  • PReLU
  • Recurrent
  • ScaleShift
  • Sigmoid
  • TanH

Supported Networks

The following networks have been tested in this release:

  • Kaldi* Nnet framework:
    • wsj_dnn5b_smbr
    • wsj_cnn4b_smbr
    • rm_lstm4f
    • rm_cnn4a_smbr
    • tedlium_dnn4_smbr
    • tedlium_lstm4f
  • TensorFlow* framework: Not tested in this release

BIOS, Library, and Drivers

This release was tested on Intel® NUC7CJYH with BIOS Update [JYGLKCPX.86A] Version: 0037, GNA library version 01.00.00.1317, and GNA driver version 01.00.00.1310 (for Windows* and Linux* OSs).

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. The parameters are passed as a std::map<std::string, std::string> to InferenceEngine::InferencePlugin::LoadNetwork.

Parameter Name | Parameter Values | Default | Description
GNA_COMPACT_MODE | YES/NO | YES | Reuse I/O buffers to save space (makes debugging harder)
GNA_SCALE_FACTOR | FP32 number | 1.0 | Scale factor to use for input quantization
KEY_GNA_DEVICE_MODE | CPU/GNA_AUTO/GNA_HW/GNA_SW/GNA_SW_EXACT | GNA_AUTO | Execution mode (CPU, GNA, and emulation modes)
KEY_GNA_FIRMWARE_MODEL_IMAGE | string | "" | Name for the embedded model binary dump file
KEY_GNA_PRECISION | I16/I8 | I16 | Hint to the GNA plugin: preferred integer weight resolution for quantization
KEY_PERF_COUNT | YES/NO | NO | Turn on performance counter reporting
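
For illustration, a minimal sketch of passing the configuration map when loading a network. The key names are copied verbatim from the table above and the values are arbitrary examples; your release may also expose these keys as constants in a GNA configuration header, which is preferable in real code. plugin and network are assumed to be set up as in the other examples in this document.

// Sketch: passing GNA plugin configuration to LoadNetwork() (legacy API).
// Key strings are taken from the table above; the values shown are illustrative only.
std::map<std::string, std::string> gnaConfig = {
    { "GNA_SCALE_FACTOR", "2048" },
    { "KEY_GNA_DEVICE_MODE", "GNA_SW_EXACT" }
};
auto executable_network = plugin.LoadNetwork(network, gnaConfig);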

Overview of Inference Engine Python* API

NOTE: This is a preview version of the Inference Engine Python* API for evaluation purposes only. The module structure and the API itself will change in future releases.

This API provides a simplified interface for the Inference Engine functionality that allows you to:

  • Handle the models
  • Load and configure Inference Engine plugins based on device names
  • Perform inference in synchronous and asynchronous modes with arbitrary number of infer requests (the number of infer requests may be limited by target device capabilities)

Supported OSes

Currently the Inference Engine Python* API is supported on Ubuntu* 16.04 and Microsoft Windows* 10 64-bit OSes.
Supported Python* versions:

  • On Ubuntu 16.04: 2.7, 3.5
  • On Windows 10: 2.7, 3.5, 3.6

Setting Up the Environment

To configure the environment for the Inference Engine Python* API, run:

  • On Ubuntu 16.04: source <INSTALL_DIR>/bin/setupvars.sh
  • On Windows 10: call <INSTALL_DIR>\python\setenv.bat

The script automatically detects the latest installed Python* version and configures the environment if the latest installed Python version is supported.

If you want to use a specific supported Python* version, set the environment variable manually after you run the environment configuration script. Use the command for your operating system and replace python<python_version> with python2.7, python3.5, or python3.6 on Windows, or python2.7 or python3.5 on Linux.

  • Ubuntu 16.04:
    export PYTHONPATH=<INSTALL_DIR>/python/python<python_version>:<INSTALL_DIR>/deployment_tools/model_optimizer:$PYTHONPATH
    export PATH=<INSTALL_DIR>/deployment_tools/model_optimizer:$PATH
  • Windows: 
    set PYTHONPATH=<INSTALL_DIR>\python\python<python_version>
    set PATH=<INSTALL_DIR>\deployment_tools\inference_engine\bin\intel64\Release;<INSTALL_DIR>\opencv\x64\vc14\bin;%PATH%

IENetwork Class

This class contains information about the network model read from the IR and allows you to manipulate some model parameters, such as layer affinity and output layers.

Class Constructor

There is no explicit class constructor. Use from_ir class method to read the Intermediate Representation (IR) and initialize a correct instance of the IENetwork class.

Class Methods

  • from_ir(model: str, weights: str)

    • Description:

      The class method serves to read the model from the .xml and .bin files of the IR.

    • Parameters:

      • model - path to .xml file of the IR
      • weights - path to .bin file of the IR
    • Return value:

      An instance of the IENetwork class

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net
      <inference_engine.ie_api.IENetwork object at 0x7fd7dbce54b0>

Instance Methods

  • get_layers()

    • Description:

      Gets all layers of the model and their attributes, such as type, precision, name, affinity, and layer-specific parameters

    • Parameters:

      None

    • Return value:

      Returns a dictionary with a layer name as the key and a dictionary of the layer attributes as the value

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file) 
      >>> net.get_layers() 
      {'conv5_1/dwise': {'type': 'Convolution', 'affinity': '', 'precision': 'FP32', 'name': 'conv5_1/dwise', 'params': {'pad-x': '1', 'output': '576', 'group': '576', 'pad-y': '1', 'stride-x': '1', 'stride': '1,1,1,1', 'dilation-y': '1', 'kernel-x': '3', 'kernel-y': '3', 'dilation-x': '1', 'stride-y': '1'} } }
  • add_outputs(outputs):

    • Description:

      The method marks any intermediate layer as an output layer so that inference results can be retrieved from the specified layers.

    • Parameters:

      • outputs - a list of layer names to be set as model outputs. If you set a single layer as output, you can provide a string with the layer name.
    • Return value:

      None

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.add_outputs(["conv5_1/dwise", "conv2_1/expand"])
      >>> net.outputs
      ['prob', 'conv5_1/dwise', 'conv2_1/expand']

      Note that the last layers (nodes without successors in graph representation of the model) are set as output by default. In the case above, prob layer is a default output and conv5_1/dwise, conv2_1/expand are user-defined outputs.

  • set_affinity(types_affinity_map: dict={}, layers_affinity_map: dict={})

    • Description:

      The method defines an affinity (target execution device) of certain layers based on the layer type or name. Affinity set by name has higher priority than affinity set by type and overrides it. Affinity setting is applicable only when using the HETERO plugin (see the IEPlugin description).

    • Parameters:

      • types_affinity_map - dictionary of layer types as key and target affinity as value
      • layers_affinity_map - dictionary of layer names as key and target affinity as value
    • Return value:

      None

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> plugin = IEPlugin(device="HETERO:FPGA,CPU")
      >>> plugin.set_config({"TARGET_FALLBACK": "HETERO:FPGA,CPU"})
      >>> plugin.set_initial_affinity(net) 
      >>> net.set_affinity(types_affinity_map={"Convolution": "CPU", "Concat": "CPU"}, layers_affinity_map={"fire4/expand3x3/Conv2D": "FPGA"})

      To correctly set affinity for the network, you must first initialize and properly configure the HETERO plugin. set_config({"TARGET_FALLBACK": "HETERO:FPGA,CPU"}) configures the plugin fallback devices and their order. plugin.set_initial_affinity(net) sets the affinity parameter of the model layers according to their support on the specified devices.

      After the default affinity is set by the plugin, override the default values by calling net.set_affinity(). All Convolution and Concat layers will be offloaded to the CPU despite being supported on FPGA. However, the layer named fire4/expand3x3/Conv2D will still be processed on FPGA, because setting affinity by layer name has higher priority than setting affinity by layer type.

      To understand how default and non-default affinities are set:
      1. Call net.get_layers() right after reading the model and check the layer affinity parameters in the output.
      2. Call plugin.set_initial_affinity(net).
      3. Call net.get_layers() and check the layer affinity parameters to see how the plugin set the default affinity.
      4. Call net.set_affinity(types_affinity_map={"Convolution": "CPU", "Concat": "CPU"}).
      5. Call net.get_layers() again and check the layer affinity parameters to see how they changed after the net.set_affinity() call.

      Please refer to affinity_setting_sample.py to see the full usage pipeline.

Properties

  • inputs - a dictionary of input layer name as a key and input data shape as a value

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.inputs
      {'data': [1, 3, 224, 224]}
  • outputs - a list of output layer names

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> net.outputs
      ['prob']

IEPlugin Class

This class is the main plugin interface and serves to initialize and configure the plugin.

Class Constructor

  • __init__(device: str, plugin_dirs=None)

    • Parameters:

      • device - target device name. Supported devices: CPU, GPU, FPGA, MYRIAD, HETERO
      • plugin_dirs - list of paths to plugin directories

Instance Methods

  • load(network: IENetwork, num_requests: int=1, config=None)

    • Description:

      Loads a network that was read from the IR to the plugin and creates an executable network from a network object. You can create as many networks as you need and use them simultaneously (up to the limitation of the hardware resources).

    • Parameters:

      • network - a valid IENetwork instance created by IENetwork.from_ir() method
      • num_requests - a positive integer value of infer requests to be created. Number of infer requests may be limited by device capabilities.
      • config - a dictionary of plugin configuration keys and their values
    • Return value:

      None

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> plugin = IEPlugin(device="CPU")
      >>> exec_net = plugin.load(network=net, num_requests=2)
      >>> exec_net
      <inference_engine.ie_api.ExecutableNetwork object at 0x7f5140bbcd38>
  • set_initial_affinity(net: IENetwork)
    • Description:

      Sets initial affinity for model layers according to the HETERO plugin logic. Applicable only if IEPlugin was initialized for HETERO device.

    • Parameters:

      • net - a valid instance of IENetwork
    • Return value:

      None

    • Usage example:

      See set_affinity method of the IENetwork class.

  • add_cpu_extension(extension_path: str)

    • Description:

      Loads the extensions library to the plugin. Applicable only for the CPU device and the HETERO device with CPU.

    • Parameters:

      • extension_path - a full path to CPU extensions library
    • Return value:

      None

    • Usage example:

      >>> plugin = IEPlugin(device="CPU")
      >>> plugin.add_cpu_extension(ext_lib_path)
  • set_config(config: dict)

    • Description:

      Sets a configuration for the plugin. Refer to SetConfig() in Inference Engine C++ documentation for acceptable keys and values list.

    • Parameters:
      • config - a dictionary of keys and values of acceptable configuration parameters
    • Return value:

      None

    • Usage examples:

      See set_affinity method of the IENetwork class.

Properties

  • device - a name of the device that was specified to initialize IEPlugin

ExecutableNetwork Class

This class represents a network instance loaded to plugin and ready for inference.

Class Constructor

To make a valid instance of ExecutableNetwork, use load() method of the IEPlugin class.

Instance Methods

  • infer(inputs=None)

    • Description:

      Starts synchronous inference for the first infer request of the executable network and returns output data. Wraps infer() method of the InferRequest class

    • Parameters:
      • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
    • Return value:

      A dictionary of output layer name as a key and numpy.ndarray with output data of the layer as a value

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> plugin = IEPlugin(device="CPU")
      >>> exec_net = plugin.load(network=net, num_requests=2)
      >>> res = exec_net.infer({'data': img})
      >>> res
      {'prob': array([[[[2.83426580e-08]],
                       [[2.40166020e-08]],
                       [[1.29469613e-09]],
                       [[2.95946148e-08]]
                       ......
                    ]])}

    For illustration of input data preparation, please see samples (for example, classification_sample.py).

  • start_async(request_id, inputs=None)

    • Description:

      Starts asynchronous inference for specified infer request. Wraps async_infer() method of the InferRequest class

    • Parameters:

      • request_id - index of infer request to start inference
      • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
    • Return value:

      A handle to the specified infer request, which is an instance of the InferRequest class.

    • Usage example:

      >>> infer_request_handle = exec_net.start_async(request_id=0, inputs={input_blob: image})
      >>> infer_status = infer_request_handle.wait()
      >>> res = infer_request_handle.outputs[out_blob]

      For more details about infer requests processing, see classification_sample_async.py (simplified case) and object_detection_demo_ssd_async.py (real asynchronous use case) samples.

Properties

  • requests - a tuple of InferRequest instances

    • Usage example:

      >>> net = IENetwork.from_ir(model=path_to_xml_file, weights=path_to_bin_file)
      >>> plugin = IEPlugin(device="CPU")
      >>> exec_net = plugin.load(network=net, num_requests=2)
      >>> exec_net.requests
      (<inference_engine.ie_api.InferRequest object at 0x7f66f56c57e0>, <inference_engine.ie_api.InferRequest object at 0x7f66f56c58b8>, <inference_engine.ie_api.InferRequest object at 0x7f66f56c5900>)

InferRequest Class

This class provides an interface to infer requests of ExecutableNetwork and serves to handle infer requests execution and to set and get output data.

Class Constructor

To make a valid InferRequest instance, use load() method of the IEPlugin class with specified number of requests.

Instance Methods

It is not recommended to run inference directly on an InferRequest instance. To run inference, use the simplified infer() and start_async() methods of ExecutableNetwork.

  • infer(inputs=None)

    • Description:

      Starts synchronous inference of the infer request and fills the outputs array

    • Parameters:

      • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
    • Return value:

      None

    • Usage example:

      >>> exec_net = plugin.load(network=net, num_requests=2)
      >>> exec_net.requests[0].infer({input_blob: image})
      >>> res = exec_net.requests[0].outputs['prob']
      >>> np.flip(np.sort(np.squeeze(res)),0) 
      array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
             5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
             2.26027006e-03, 2.12283316e-03 ...]) 
  • async_infer(inputs=None)

    • Description:

      Starts asynchronous inference of the infer request and fills the outputs array

    • Parameters:

      • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value
    • Return value:

      None

    • Usage example:

      >>> exec_net = plugin.load(network=net, num_requests=2)
      >>> exec_net.requests[0].async_infer({input_blob: image})
      >>> exec_net.requests[0].wait()
      >>> res = exec_net.requests[0].outputs['prob']
      >>> np.flip(np.sort(np.squeeze(res)),0) 
      array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
             5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
             2.26027006e-03, 2.12283316e-03 ...]) 
  • wait(timeout=None)

    • Description:

      Waits for the result to become available. Blocks until specified timeout elapses or the result becomes available, whichever comes first.

      Note:

      There are special values of the timeout parameter:
      • 0 - immediately returns the inference status. It does not block or interrupt execution. For the meaning of the statuses, refer to InferenceEngine::StatusCode in the Inference Engine C++ documentation
      • -1 - waits until inference result becomes available
    • Parameters:

      • timeout - time to wait in milliseconds or special (0, -1) cases described above. If not specified, timeout value is set to -1 by default.
    • Usage example:

      See the async_infer() method of the InferRequest class.

Properties

  • inputs - a dictionary of input layer name as a key and numpy.ndarray of proper shape with input data for the layer as a value

  • outputs - a dictionary of output layer name as a key and numpy.ndarray with output data of the layer as a value

  • Usage example:

    >>> exec_net.requests[0].inputs['data'][:] = image
    >>> exec_net.requests[0].infer()
    >>> res = exec_net.requests[0].outputs['prob']
    >>> np.flip(np.sort(np.squeeze(res)),0) 
    array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
           5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
           2.26027006e-03, 2.12283316e-03 ...])

Known Issues

Multiple OpenMP Loadings

If the application uses the Inference Engine with third-party components that depend on Intel® OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This might happen if the application uses the Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel® MKL after loading the Inference Engine plugin.

Error log report:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, see http://www.intel.com/software/products/support/.

Possible workarounds:

  • Preload the OpenMP runtime using the LD_PRELOAD variable:
    This eliminates multiple loadings of libiomp, and makes all components use this specific version of OpenMP.
    LD_PRELOAD=<path_to_libiomp5.so> <path_to_your_executable>
  • Set KMP_DUPLICATE_LIB_OK=TRUE. This option might result in performance degradation or incorrect results.

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.

For more complete information about compiler optimizations, see our Optimization Notice.