Inference Engine Samples

Image Classification Sample

Description

The Image Classification sample application does inference using image classification networks, like AlexNet* and GoogLeNet*. The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running the Application

Running the application with the -h option results in the message:

$ ./classification_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
classification_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet*
                            and a .bmp file for the other networks.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPUs custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on: CPU, GPU, or Myriad. Sample will look for a suitable plugin for device specified
    -nt "<integer>"         
                            Number of top results (default 10)
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an image using a trained AlexNet network on Intel® Processors:

$ ./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml

Output Description

By default, the application outputs the top-10 inference results. Add the -nt option to the previous command to modify the number of top results. For example, to get the top-5 results on Intel® HD Graphics, use the command:

$ ./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d GPU

Image Classification Sample Async

Description

This sample demonstrates how to build and execute inference in pipelined mode, using classification networks as an example.

Pipelined mode can increase overall picture throughput. The latency of a single inference is the same as for synchronous execution. Throughput increases for the following reasons:

  • Some plugins are internally heterogeneous: data transfer, execution on a remote device, and pre-processing and post-processing on the host can overlap across requests
  • Use of the explicit heterogeneous plugin, which executes different parts of the network on different devices

When two or more devices are involved in the inference of one picture, creating several infer requests and starting asynchronous inference is the most efficient way to utilize the devices. If two devices are involved in execution, 2 is the optimal value for the -nireq option. To be efficient, the Classification Sample Async uses a round-robin algorithm for its infer requests: the sample starts execution of the current infer request and switches to waiting for the results of the previous one. After the wait completes, the infer requests are swapped and the procedure repeats.

The number of iterations is also important for good throughput. With a large number of iterations, you can emulate real application work and observe the performance.

Batch mode is independent of pipelined mode. Pipelined mode works efficiently with any batch size.

Upon start-up, the sample application reads the command line parameters and loads a network and an image to the Inference Engine plugin. Then, the application creates the number of infer requests specified by the -nireq parameter and loads pictures for inference.

Then, in a loop, it starts inference for the current infer request and switches to waiting for the results of another one. When the results are ready, the infer requests are swapped.
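
A minimal sketch of this round-robin loop for two infer requests is shown below. It is illustrative only, not the sample's source: it assumes the infer requests were already created from the loaded network and reuses only calls shown in the Async API snippet later in this document (StartAsync, Wait, GetBlob).

// Minimal sketch of the round-robin pipelining described above (not the sample's actual code).
// Assumes at least two InferenceEngine::InferRequest objects were already created from the
// loaded network.
#include <utility>
#include <vector>
#include <inference_engine.hpp>

void pipelinedInference(std::vector<InferenceEngine::InferRequest> &requests, int iterations) {
    size_t current = 0, previous = 1;            // -nireq 2: two requests in flight
    for (int i = 0; i < iterations; ++i) {
        // ... fill the input blob of requests[current] via GetBlob() ...
        requests[current].StartAsync();          // asynchronous call, returns immediately
        if (i > 0) {
            // wait for the request started on the previous iteration and read its results
            requests[previous].Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
            // ... read the output blob of requests[previous] via GetBlob() ...
        }
        std::swap(current, previous);            // round-robin swap of the two requests
    }
    // drain the last in-flight request
    requests[previous].Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
}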

When inference is done, the application outputs data to the standard output stream.

Running the Application

Running the application with the -h option results in the message:

./classification_sample_async -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>
classification_sample_async [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "" ""
                            Required. Path to a folder with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m ""             
                            Required. Path to an .xml file with a trained model.
        -l ""
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c ""
                            Optional. Absolute path to clDNN (GPU) custom layers config (*.xml).
    -pp ""            
                            Path to a plugin folder.
    -d ""           
                            Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified
    -nt ""         
                            Number of top results (default 10)
    -ni ""         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report
    -nireq ""
                            Number of infer request for pipelined mode (default 1)

Output Description

By default, the application outputs the top-10 inference results for each infer request. In addition, it reports the throughput value measured in frames per second.


Security Barrier Camera Sample

Description

This sample showcases Vehicle Detection, followed by Vehicle Attributes recognition and License Plate Recognition applied on top of the Vehicle Detection results. The corresponding pre-trained models are located in the <INSTALL_DIR>/deployment_tools/intel_models directory:

  • vehicle-license-plate-detection-barrier-0007: The primary detection network that finds the vehicles and license plates
  • vehicle-attributes-recognition-barrier-0010: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. This network reports the general vehicle attributes, such as the vehicle type (car, van, bus) and color.
  • license-plate-recognition-barrier-0001: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. This network reports a string for each recognized license plate. For topology details, see the descriptions in the <INSTALL_DIR>/deployment_tools/intel_models directory.

Other demonstration objectives:

  • Show images/video/camera as inputs, via OpenCV*
  • Show an example of simple network pipelining: Attributes and LPR networks are executed on top of the Vehicle Detection results
  • Show vehicle attributes and licence plate information for each detected vehicle

How it Works

The application reads command line parameters and loads the specified networks. The Vehicle/License-Plate Detection network is required, and the other two are optional.

Upon getting a frame from OpenCV's VideoCapture, the application performs inference of the Vehicle/License-Plate Detection network, then performs two more inferences using the Vehicle Attributes and LPR networks (if they are specified in the command line), and displays the results.
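
The per-frame flow can be sketched as follows. This is a simplified illustration rather than the sample's code: runVehicleDetection, runVehicleAttributes, and runLPR are hypothetical stand-ins for the three Inference Engine networks.

#include <vector>
#include <opencv2/opencv.hpp>

// Hypothetical helpers standing in for the three networks the sample uses; the real
// sample performs these steps with Inference Engine infer requests.
std::vector<cv::Rect> runVehicleDetection(const cv::Mat &) { return {}; }  // primary network
void runVehicleAttributes(const cv::Mat &) {}   // only if -m_va is specified
void runLPR(const cv::Mat &) {}                 // only if -m_lpr is specified

void processStream(cv::VideoCapture &capture) {
    cv::Mat frame;
    while (capture.read(frame)) {                          // get the next frame
        for (const cv::Rect &box : runVehicleDetection(frame)) {
            // clip the detection to the frame and run the secondary networks on the crop
            cv::Mat crop = frame(box & cv::Rect(0, 0, frame.cols, frame.rows));
            runVehicleAttributes(crop);
            runLPR(crop);
            cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);  // render the detection
        }
        cv::imshow("Security Barrier Camera Sample", frame);
        if (cv::waitKey(1) == 27) break;                   // Esc stops the demo
    }
}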

Running the Application

Running the application with the -h option results in the message:

$ ./security_barrier_sample -h 
InferenceEngine:
        API version ............ 1.0
    [ INFO ] Parsing input parameters
    interactive_vehicle_detection [OPTION]
    Options:
        -h                         Print a usage message.
        -i "<path>"                Required. Path to a video or image file. Default value is "cam" to work with camera.
        -m "<path>"                Required. Path to the Vehicle/License-Plate Detection model (.xml) file.
        -m_va "<path>"             Optional. Path to the Vehicle Attributes model (.xml) file.
        -m_lpr "<path>"            Optional. Path to the License-Plate Recognition model (.xml) file.
          -l "<absolute_path>"     For Intel® MKL-DNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl.
              Or
          -c "<absolute_path>"     For GPU-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc.
        -d "<device>"              Specify the target device for Vehicle Detection (CPU, GPU, FPGA, MYRYAD, or HETERO).
        -d_va "<device>"           Specify the target device for Vehicle Attributes (CPU, GPU, FPGA, MYRYAD, or HETERO).
        -d_lpr "<device>"          Specify the target device for License Plate Recognition (CPU, GPU, FPGA, MYRYAD, or HETERO).
        -pc                        Enables per-layer performance statistics.
        -r                         Output Inference results as raw values.
        -t                         Probability threshold for Vehicle/License-Plate detections.

Running the application with an empty list of options results in an error message and the usage list above.

Demonstration Output

The demonstration uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and text:

License plate detection


Object Detection for Faster R-CNN Sample

Description

VGG16-Faster-RCNN is a public CNN that can be easily obtained from GitHub. 

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Downloading and Converting a Caffe* Model

  1. Download test.prototxt from https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
  2. Download the pretrained models from https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0
  3. Unzip the archive and make sure you have the file named VGG16_faster_rcnn_final.caffemodel.

To convert the source model correctly, run the Model Optimizer with the extension for the Python proposal layer:

python3 ${MO_ROOT_PATH}/mo_caffe.py --input_model <path_to_model>/VGG16_faster_rcnn_final.caffemodel --input_proto <path_to_model>/deploy.prototxt --extensions <path_to_object_detection_sample>/fasterrcnn_extensions

Running the Application

Running the application with the -h option results in the message:

$ ./object_detection_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
object_detection_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path>"
                            Required. Path to an image file.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the device specified
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

Use the following command to do inference on Intel® Processors on an image using a trained Faster R-CNN network:

$ ./object_detection_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster-rcnn.xml -d CPU

Output Description

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.
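
For reference, the detection output of such networks is typically a blob in which each detection is described by seven values: [image_id, label, confidence, x_min, y_min, x_max, y_max] with normalized coordinates. The following minimal sketch (an assumption about the layout, not code taken from the sample) shows how such a buffer maps to the rectangles and console output described above:

#include <cstdio>

// Minimal sketch: walk a DetectionOutput-style buffer and print classes, confidences
// and pixel coordinates. The row layout and normalized coordinates are assumptions
// typical for such networks, not taken from the sample source.
void printDetections(const float *data, int maxDetections,
                     int imageWidth, int imageHeight, float threshold = 0.5f) {
    for (int i = 0; i < maxDetections; ++i) {
        const float *det = data + i * 7;
        if (det[0] < 0) break;                 // an image_id of -1 marks the end of detections
        float confidence = det[2];
        if (confidence < threshold) continue;
        int label = static_cast<int>(det[1]);
        int xmin = static_cast<int>(det[3] * imageWidth);
        int ymin = static_cast<int>(det[4] * imageHeight);
        int xmax = static_cast<int>(det[5] * imageWidth);
        int ymax = static_cast<int>(det[6] * imageHeight);
        std::printf("class %d, confidence %.3f, box [%d, %d, %d, %d]\n",
                    label, confidence, xmin, ymin, xmax, ymax);
    }
}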

Using this Sample with the Intel Person Detection Model

This model has a non-default (for Faster-RCNN) output layer name. To score it correctly, add the option --bbox_name detector/bbox/ave_pred to the command line.

Usage example:

./object_detection_sample -i /home/user/people.jpg -m <INSTALL_DIR>/deployment_tools/intel_models/person-detection-retail-0001/FP32/person-detection-retail-0001.xml --bbox_name detector/bbox/ave_pred -d CPU

Object Detection SSD, Async API Performance Showcase Sample

Description

This demonstration showcases Object Detection with SSD and the new Async API. The Async API can improve the overall frame rate of the application: rather than waiting for inference to complete, the application can keep working on the host while the accelerator is busy. Specifically, this demonstration keeps two parallel infer requests: while the current one is processed, the input frame for the next one is captured. This essentially hides the latency of capturing, so the overall frame rate is determined by MAXIMUM(detection time, input capturing time) rather than by SUM(detection time, input capturing time). For example, if detection takes 30 ms and capturing takes 10 ms, the serialized pipeline spends about 40 ms per frame, while the pipelined one spends about 30 ms.

The technique can be generalized to any available parallel slack, such as doing inference while simultaneously encoding the resulting (previous) frames, or running further inference, like emotion detection on top of the face detection results.

Be aware of performance caveats, though. When running tasks in parallel, avoid over-using shared compute resources. For example, if inference is performed on the FPGA and the CPU is mostly idle, it makes sense to run parallel tasks on the CPU. When doing inference on Intel® Integrated Graphics, there is little gain from running tasks such as encoding of the resulting video on the same GPU in parallel, because the device is already busy.

For more performance implications and tips for the Async API, see the Optimization Guide.

Other demonstration objectives:

  • Video as input support via OpenCV*
  • Visualization of the resulting bounding boxes and text labels (from the .labels file) or class number (if no file is provided)
  • OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine samples helpers into your application.
  • Demonstrate the Async API in action. For this, the demonstration features two modes with a Tab key toggle.
    • Old-style "Sync" way - The frame capturing with OpenCV* executes back-to-back with Detection
    • "Truly Async" way - The Detection is performed on the current frame, while the OpenCV* captures the next frame.

How it Works

The application reads command line parameters and loads a network to the Inference Engine. Upon getting a frame from OpenCV's VideoCapture, it performs inference and displays the results.

New "Async API" operates with new notion of the "Infer Request" that encapsulates the inputs/outputs and separates scheduling and waiting for result, next section. And here what makes the performance look different:

  1. In the default ("Sync") mode the frame is captured and then immediately processed, below in pseudo-code:
    while(true) {
        capture frame
        populate CURRENT InferRequest
        start CURRENT InferRequest //this call is async and returns immediately
        wait for the CURRENT InferRequest
        display CURRENT result
    }
    This is a reference implementation in which the new Async API is used in a serialized/synchronous fashion.
  2. In "true" Async mode, the frame is captured and then immediately processed:
    while(true) {
            capture frame
            populate NEXT InferRequest
            start NEXT InferRequest //this call is async and returns immediately
                wait for the CURRENT InferRequest (processed in a dedicated thread)
                display CURRENT result
            swap CURRENT and NEXT InferRequests
        }
    In this case, the NEXT request is populated in the main (app) thread, while the CURRENT request is processed. This is handled in the dedicated thread, internal to the Inference Engine runtime.

Async API

In this release, the Inference Engine offers a new API based on the notion of Infer Requests. With this API, requests encapsulate input and output allocation. You access the blob with the GetBlob method.

You can execute a request asynchronously in the background and wait until you need the result. In the meantime your application can continue:

// load plugin for the device as usual
auto enginePtr = PluginDispatcher({"../../../lib/intel64", ""}).getSuitablePlugin(
    getDeviceFromStr("GPU"));
// load network
CNNNetReader network_reader;
network_reader.ReadNetwork("Model.xml");
network_reader.ReadWeights("Model.bin");
// create an executable network and an infer request from it (not shown), then populate the inputs
auto input = async_infer_request.GetBlob(input_name);
...
// start the async infer request (puts the request to the queue and immediately returns)
async_infer_request.StartAsync();
// continue execution on the host until you need the request results
//...
async_infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
auto output = async_infer_request.GetBlob(output_name);

There is no direct way to measure execution time of an infer request that is running asynchronously, unless you measure the Wait executed immediately after StartAsync. But this essentially means serialization and synchronous execution.

This is what the sample does for the default "SYNC" mode and reports as the Detection time/fps message on the screen. In the truly asynchronous ("ASYNC") mode, the host continues execution in the master thread in parallel to the infer request. If the request is completed before Wait is called in the main thread (that is, earlier than OpenCV* decoded a new frame), reporting the time between StartAsync and Wait would obviously be incorrect. That is why the inference speed is not reported in the "ASYNC" mode.
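
A minimal sketch of such a serialized measurement is shown below, assuming an InferRequest created as in the snippet above; the timing code is illustrative, not taken from the demo.

#include <chrono>
#include <cstdio>
#include <inference_engine.hpp>

// Serialized ("SYNC"-style) measurement: calling Wait immediately after StartAsync makes
// the measured interval cover the whole inference. In the true ASYNC mode the same interval
// would also include unrelated host work, so the demo does not report it there.
double measureDetectionMs(InferenceEngine::InferRequest &async_infer_request) {
    auto t0 = std::chrono::high_resolution_clock::now();
    async_infer_request.StartAsync();
    async_infer_request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
    auto t1 = std::chrono::high_resolution_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("Detection time: %.2f ms (%.1f fps)\n", ms, 1000.0 / ms);
    return ms;
}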

Running the Application

Running the application with the -h option results in the message:

$ ./object_detection_demo_ssd_async -h
InferenceEngine: 
    API version ............ [version]
    Build .................. [number]
object_detection_demo_ssd_async [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "[path]"
                            Required. Path to a video file. Use "cam" to capture input from the camera.
    -m "[path]"             
                            Required. Path to an .xml file with a trained model.
        -l "[absolute_path]"    
                            Optional. Absolute path to library with Intel® MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "[absolute_path]"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -d "[device]"
                            Specify the target device to infer on; CPU, GPU, FPGA, and Intel® Movidius™ Myriad™ 2 Vision Processing Unit are accepted.
    -pc
                            Enables per-layer performance report.
    -t
                            Probability threshold for detections (default is 0.5).
    -r
                            Output inference results as raw values to the console.

Running the application with an empty list of options results in an error message and the usage list above.

You can do inference on an Intel® Integrated Graphics device with an example pre-trained GoogleNet-based SSD available at https://software.intel.com/file/609199/download, as described in the next section.

Command Description

After reading through this demonstration, use this command to perform inference on a GPU with the SSD model you downloaded from https://software.intel.com/file/609199/download:

$ ./object_detection_demo_ssd_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU

The network must be converted from the Caffe* format (*.prototxt + *.caffemodel) to the Inference Engine format (*.xml + *.bin) before using this command. See the Model Optimizer Developer Guide.

The only GUI knob is the Tab key, which switches between the synchronous execution and the true Async mode.

Output Description

The output uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and labels, if provided. In default mode, the sample reports:

  • OpenCV* time: Frame decoding + time to render the bounding boxes, labels, and display of the results.
  • Detection time: Inference time for the object detection network. This is reported only in SYNC mode.
  • Wallclock time: The combined application-level performance.

Object Detection with SSD-VGG Sample

Description

How to run the Object Detection sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics.

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.

Running the Application

Running the application with the -h option results in the message:

$ ./object_detection_sample_ssd -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
object_detection_sample_ssd [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path>"
                            Required. Path to an image file.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU, GPU or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device.
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

Use the following command to do inference on Intel® Processors on an image using a trained SSD network:

$ ./object_detection_sample_ssd -i <path_to_image>/inputImage.bmp -m <path_to_model>/VGG_ILSVRC2016_SSD.xml -d CPU

Output Description

The application outputs an image named out_0.bmp with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.


TensorFlow* Object Detection Mask R-CNNs Segmentation Sample

Description

This topic demonstrates how to run the Segmentation sample application, which does inference using image segmentation networks created with the TensorFlow* Object Detection API. Note that only batch size 1 is supported.

The sample has a post-processing part that gathers the mask arrays corresponding to bounding boxes with high probability, taken from the Detection Output layer. The sample then produces a picture with the identified masks.

Running the Application

Running the application with the -h option yields the following usage message:

./mask_rcnn_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>

mask_rcnn_sample [OPTION]
Options:

    -h                      
                            Print a usage message.
    -i "<path1>"
                            Required. Path to a folder with an image or path to an image file.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to clDNN (GPU) custom layers config (*.xml).
    -pp "<<path>"            
                            Path to a plugin folder.
    -d "<device>"           
                            Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified

    -ni "<integer>"         
                            Number of iterations (default 1)
    -detection_output_name "<string>" 
                            Optional. The name of detection output layer (default: detection_output)
    -masks_name "<string>" 
                            Optional. The name of masks layer (default: masks)
    -pc                     
                            Enables per-layer performance report

Running the application with the empty list of options yields the usage message given above and an error message.

You can use the following command to do inference on Intel® Processors on an image using a trained network:

./mask_rcnn_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster_rcnn.xml

Output Description

The application output is a segmented image (out.png).

How it works

Upon the start-up the sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image.


Automatic Speech Recognition Sample

This topic shows how to run the speech sample application, which demonstrates acoustic model inference based on Kaldi neural networks and speech feature vectors.

Running

Usage

Running the application with the -h option yields the following usage message:

./speech_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
speech_sample [OPTION]
Options:
    -h                      Print a usage message.
    -i "<path>"             Required. Path to an .ark file.
    -m "<path>"             Required. Path to an .xml file with a trained model (required if -rg is missing).
    -o "<path>"             Output file name (default name is scores.ark).
    -l "<absolute_path>"    Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
    -d "<device>"           Specify the target device to infer on; CPU, GPU, GNA_AUTO, GNA_HW, GNA_SW, GNA_SW_EXACT are acceptable. Sample will look for a suitable plugin for device specified
    -p                      Plugin name. For example MKLDNNPlugin. If this parameter is pointed, the sample will look for this plugin only
    -pp                     Path to a plugin folder.
    -pc                     Enables performance report
    -q "<mode>"             Input quantization mode:  static (default), dynamic, or user (use with -sf).
    -qb "<integer>"         Weight bits for quantization:  8 or 16 (default)
    -sf "<double>"          Optional user-specified input scale factor for quantization (use with -q user).
    -bs "<integer>"         Batch size 1-8 (default 1)
    -r "<path>"             Read reference score .ark file and compare scores.
    -rg "<path>"            Read GNA model from file using path/filename provided (required if -m is missing).
    -wg "<path>"            Write GNA model to file using path/filename provided.
    -we "<path>"            Write GNA embedded model to file using path/filename provided.

Running the application with the empty list of options yields the usage message given above and an error message.

Model Preparation

You can use the following Model Optimizer command to convert a Kaldi nnet1 or nnet2 neural network to Intel IR format:

python3 mo.py --framework kaldi --input_model wsj_dnn5b_smbr.nnet --counts wsj_dnn5b_smbr.counts --remove_output_softmax

Assuming that the Model Optimizer (mo.py), Kaldi-trained neural network (wsj_dnn5b_smbr.nnet), and Kaldi class counts file (wsj_dnn5b_smbr.counts) are in the working directory, this command produces the Intel IR network consisting of wsj_dnn5b_smbr.xml and wsj_dnn5b_smbr.bin.

NOTE: wsj_dnn5b_smbr.nnet and other sample Kaldi models and data will be available in July 2018 in the OpenVINO Open Model Zoo.

Speech Inference

Once the IR is created, you can use the following command to do inference on Intel® Processors with the GNA co-processor (or emulation library):

./speech_sample -d GNA_AUTO -bs 2 -i wsj_dnn5b_smbr_dev93_10.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark -r wsj_dnn5b_smbr_dev93_scores_10.ark

Here, the floating point Kaldi-generated reference neural network scores (wsj_dnn5b_smbr_dev93_scores_10.ark) corresponding to the input feature file (wsj_dnn5b_smbr_dev93_10.ark) are assumed to be available for comparison.

Sample Output

The acoustic log likelihood sequences for all utterances are stored in the Kaldi ARK file, scores.ark. If the -r option is used, a report on the statistical score error is generated for each utterance such as the following:

Utterance 0: 4k0c0301
   Average inference time per frame: 6.26867 ms
         max error: 0.0667191
         avg error: 0.00473641
     avg rms error: 0.00602212
       stdev error: 0.00393488

How It Works

Upon start-up, the speech_sample application reads command line parameters and loads a Kaldi-trained neural network along with a Kaldi ARK speech feature vector file to the Inference Engine plugin. It then performs inference on all speech utterances stored in the input ARK file. Context-windowed speech frames are processed in batches of 1-8 frames according to the -bs parameter. Batching across utterances is not supported by this sample. When inference is done, the application creates an output ARK file. If the -r option is given, error statistics are provided for each speech utterance as shown above.

GNA-Specific Details

Quantization

If the GNA device is selected (for example, using the -d GNA_AUTO flag), the GNA Inference Engine plugin quantizes the model and input feature vector sequence to integer representation before performing inference. Several parameters control neural network quantization:

  • The -q flag determines the quantization mode. Three modes are supported:
    • Static - In static quantization mode, the first utterance in the input ARK file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used for all subsequent inputs. The neural network is quantized to accommodate the scaled input dynamic range (see the sketch after this list).
    • Dynamic - In dynamic quantization mode, the scale factor for each input batch is computed just before inference on that batch. The input and network are (re)quantized on-the-fly using an efficient procedure.
    • User-defined - In user-defined quantization mode, the user may specify a scale factor via the -sf flag that will be used for static quantization.
  • The -qb flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers. For example, when -qb 8 is specified, the plugin will use 8-bit weights wherever possible in the network. Note that it is not always possible to use 8-bit weights due to GNA hardware limitations. For example, convolutional layers always use 16-bit weights (GNA hardware versions 1 and 2). This limitation will be removed in GNA hardware version 3 and higher.
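
A minimal sketch of the static scale factor computation described above (illustrative, not the GNA plugin's actual code); in user-defined mode, a factor obtained this way could be passed via the -sf flag:

#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative only: scale the largest absolute input value of the first utterance
// to 16384 (15 bits), as described for static quantization mode above.
float computeStaticScaleFactor(const std::vector<float> &firstUtterance) {
    float maxAbs = 0.0f;
    for (float v : firstUtterance)
        maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs > 0.0f ? 16384.0f / maxAbs : 1.0f;  // same factor reused for all later inputs
}
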
Execution Modes

Several execution modes are supported via the -d flag:

  • If the device is set to CPU and the GNA plugin is selected, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_AUTO, the GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_HW, the GNA hardware is used if available and the driver is installed. Otherwise, an error will occur.
  • If the device is set to GNA_SW, the GNA device is emulated in fast-but-not-bit-exact mode.
  • If the device is set to GNA_SW_EXACT, the GNA device is emulated in bit-exact mode.
Loading and Saving Models

The GNA plugin supports loading and saving the GNA-optimized model (non-IR) via the -rg and -wg flags. Thereby, it is possible to avoid the cost of full model quantization at run time. The GNA plugin also supports export of firmware-compatible embedded model images for the Intel® Speech Enabling Developer Kit and Amazon Alexa Premium Far-Field Voice Development Kit via the -we flag (save only).

In addition to performing inference directly from a GNA model file, these options make it possible to:

  • Convert from IR format to GNA format model file (-m, -wg)
  • Convert from IR format to embedded format model file (-m, -we)
  • Convert from GNA format to embedded format model file (-rg, -we)

Use of Sample in Kaldi Speech Recognition Pipeline

The Wall Street Journal DNN model used in this example was prepared using the Kaldi s5 recipe and the Kaldi Nnet (nnet1) framework. It is possible to recognize speech by substituting speech_sample for Kaldi's nnet-forward command. Since speech_sample does not yet use pipes, it is necessary to use temporary files for speaker-transformed feature vectors and scores when running the Kaldi speech recognition pipeline. The following operations assume that feature extraction was already performed according to the s5 recipe and that the working directory within the Kaldi source tree is egs/wsj/s5.

  1. Prepare a speaker-transformed feature set given the feature transform specified in final.feature_transform and the feature files specified in feats.scp:
    nnet-forward --use-gpu=no final.feature_transform "ark,s,cs:copy-feats scp:feats.scp ark:- |" ark:feat.ark
  2. Score the feature set using the speech_sample:
    ./speech_sample -d GNA_AUTO -bs 8 -i feat.ark -m wsj_dnn5b_smbr_fp32.xml -o scores.ark
  3. Run the Kaldi decoder to produce n-best text hypotheses and select the most likely text given the WFST (HCLG.fst), vocabulary (words.txt), and TID/PID mapping (final.mdl):
    latgen-faster-mapped --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.0833 --allow-partial=true --word-symbol-table=words.txt final.mdl HCLG.fst ark:scores.ark ark:- | lattice-scale --inv-acoustic-scale=13 ark:- ark:- | lattice-best-path --word-symbol-table=words.txt ark:- ark,t:- > out.txt &
  4. Run the word error rate tool to check accuracy given the vocabulary (words.txt) and reference transcript (test_filt.txt):
    cat out.txt | utils/int2sym.pl -f 2- words.txt | sed s:<UNK>::g | compute-wer --text --mode=present ark:test_filt.txt ark,p:-

Neural Style Transfer Sample

Description

How to build and run the Neural Style Transfer sample (NST sample) application, which does inference using models of the style transfer topology.

Running the Application

Running the application with the -h option results in the message:

$ ./style_transfer_sample -h
InferenceEngine:
    API version ............ <version>
    Build .................. <number>
style_transfer_sample [OPTION]
Options:
    -h
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"
                            Path to a plugin directory.
    -p "<name>"
                            Plugin name. For example, Intel® MKL-DNN. If this parameter is specified, the sample looks for this plugin only
    -d "<device>"
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the specified device.
    -nt "<integer>"
                            Number of top results (default 10)
    -ni "<integer>"
                            Number of iterations (default 1)
    -pc
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an image using a trained NST model on Intel® Processors, use the following command:

$ ./style_transfer_sample -i <path_to_image>/cat.bmp -m <path_to_model>/1_decoder_FP32.xml

Output Description

The application outputs one or more styled images, starting with out1.bmp, redrawn in the style of the model used for inference. The style of the output images depends on the models used by the sample.


Hello Infer Request Classification

Description

How to run the Hello Infer Request Classification sample application. The sample is a simplified version of the Image Classification Sample. It demonstrates how to use the new Infer Request API of the Inference Engine in applications. See Integrate with Customer Application New Request API for details.
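
A rough sketch of the flow this sample demonstrates is given below, based on the Async API snippet shown earlier in this document; the setup details (plugin paths, input preprocessing) are omitted and the code is illustrative rather than the sample's source.

// Illustrative sketch only: read an IR, load it to a plugin, and run a synchronous
// infer request. File names are placeholders; input image preprocessing is omitted.
#include <inference_engine.hpp>
using namespace InferenceEngine;

int main() {
    // read the IR produced by the Model Optimizer
    CNNNetReader network_reader;
    network_reader.ReadNetwork("alexnet_fp32.xml");
    network_reader.ReadWeights("alexnet_fp32.bin");
    CNNNetwork network = network_reader.getNetwork();

    // load the network to a plugin for the chosen device (CPU here)
    InferencePlugin plugin(PluginDispatcher({""}).getSuitablePlugin(TargetDevice::eCPU));
    ExecutableNetwork executable_network = plugin.LoadNetwork(network, {});

    // create an infer request, fill its input blob (omitted), run synchronously
    InferRequest infer_request = executable_network.CreateInferRequest();
    // ... fill infer_request.GetBlob(<input_name>) with the image data ...
    infer_request.Infer();
    // ... read the top-10 results from infer_request.GetBlob(<output_name>) ...
    return 0;
}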

Running the Application

To do inference on an image using a trained AlexNet network on Intel® Processors:

$ ./hello_request_classification <path_to_model>/alexnet_fp32.xml <path_to_image>/cat.bmp CPU

Output Description

The application outputs the top-10 inference results.


Interactive Face Detection

Description

This sample showcases the Object Detection task applied to face recognition using a sequence of neural networks.

The Async API can improve the overall frame rate of the application: rather than waiting for inference to complete, the application can continue operating on the host while the accelerator is busy. This sample maintains three parallel infer requests for Age/Gender Recognition, Head Pose Estimation, and Emotions Recognition that run simultaneously.

Other sample objectives:

  • Video as input support via OpenCV*
  • Visualization of the resulting face bounding boxes from Face Detection network
  • Visualization of age/gender, head pose, and emotion information for each detected face
  • OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine sample helpers into your application

How it Works

  1. The application reads command line parameters and loads up to four networks, depending on the -m... options family, to the Inference Engine.
  2. The application gets a frame from the OpenCV's VideoCapture.
  3. The application performs inference of the Face Detection network on the frame.
  4. The application performs three simultaneous inferences using the Age/Gender, Head Pose, and Emotions detection networks, if they are specified in the command line.
  5. The application displays the results.

The new Async API operates with the new notion of the Infer Request that encapsulates the inputs/outputs and separates scheduling from waiting for the result. For more information about the Async API and the performance difference between Sync and Async modes, refer to Object Detection SSD, Async API Performance Showcase Sample.

Running the Application

Running the application with the -h option results in the following usage message:

./interactive_face_detection -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
interactive_face_detection [OPTION]
Options:
    -h                               Print a usage message.
    -i "<path>"                Optional. Path to an video file. Default value is "cam" to work with camera.
    -m "<path>"                Required. Path to an .xml file with a trained face detection model.
    -m_ag "<path>"             Optional. Path to an .xml file with a trained age gender model.
    -m_hp "<path>"             Optional. Path to an .xml file with a trained head pose model.
    -m_em "<path>"             Optional. Path to an .xml file with a trained emotions model.
      -l "<absolute_path>"     Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"     Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels desc.
    -d "<device>"              Specify the target device for Face Detection (CPU, GPU, FPGA, or MYRIAD). The sample looks for a suitable plugin for a specified device.
    -d_ag "<device>"           Specify the target device for Age Gender Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for a specified device.
    -d_hp "<device>"           Specify the target device for Head Pose Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for a specified device.
    -d_em "<device>"           Specify the target device for Emotions Detection (CPU, GPU, FPGA, or MYRIAD). The sample will look for a suitable plugin for device specified.
    -n_ag "<num>"              Specify number of maximum simultaneously processed faces for Age Gender Detection (default is 16).
    -n_hp "<num>"              Specify number of maximum simultaneously processed faces for Head Pose Detection (default is 16).
    -n_em "<num>"              Specify number of maximum simultaneously processed faces for Emotions Detection (default is 16).
    -no_wait                         No wait for key press in the end.
    -no_show                         No show processed video.
    -pc                              Enables per-layer performance report.
    -r                               Inference results as raw values.
    -t                               Probability threshold for detections.

Running the application with an empty list of options results in an error message and the usage list above.

To run the demo, you can use public models or a set of pre-trained and optimized models delivered with the package:

  • <INSTALL_DIR>/deployment_tools/intel_models/face-detection-adas-0001
  • <INSTALL_DIR>/deployment_tools/intel_models/age-gender-recognition-retail-0013
  • <INSTALL_DIR>/deployment_tools/intel_models/head-pose-estimation-adas-0001
  • <INSTALL_DIR>/deployment_tools/intel_models/emotions-recognition-retail-0003

For example, to do inference on a GPU with the OpenVINO™ toolkit pre-trained models, run the following command:

./interactive_face_detection -i <path_to_video>/inputVideo.mp4 -m face-detection-adas-0001.xml -m_ag age-gender-recognition-retail-0013.xml -m_hp head-pose-estimation-adas-0001.xml -m_em emotions-recognition-retail-0003.xml -d GPU

NOTE: If you want to use public models, they must first be converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.

Sample Output

The sample uses OpenCV* to display the resulting frame with detections rendered as bounding boxes with labels, if provided. In default mode, the sample reports:

  • OpenCV* time: frame decoding + time to render the bounding boxes, labels, and display the results
  • Face Detection time: inference time for the Face Detection network

If Age/Gender recognition, Head Pose estimation, or Emotions recognition are enabled, the additional information is reported:

  • Age/Gender + Head Pose + Emotions Detection time: combined inference time of simultaneously executed age gender, head pose and emotion recognition networks.

Image Segmentation Sample

Description

Performs inference using image segmentation networks like FCN8.

The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application provides an output image.

Running the Application

Running the application with the -h option results in the message:

$ ./segmentation_sample -h
InferenceEngine: 
    API version ............ <version>
    Build .................. <number>
segmentation_sample [OPTION]
Options:
    -h                      
                            Print a usage message.
    -i "<path1>" "<path3>"
                            Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet
                            and a .bmp file for the other networks.
    -m "<path>"             
                            Required. Path to an .xml file with a trained model.
        -l "<absolute_path>"    
                            Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so).
        Or
        -c "<absolute_path>"
                            Optional. Absolute path to GPU custom layers config (*.xml).
    -pp "<path>"            
                            Path to a plugin directory.
    -d "<device>"           
                            Specify the target device to infer on; CPU or GPU is acceptable. The sample looks for a suitable plugin for the specified device.
    -ni "<integer>"         
                            Number of iterations (default 1)
    -pc                     
                            Enables per-layer performance report

Running the application with an empty list of options results in an error message and the usage list above.

To do inference on an image using a trained FCN8 network on Intel® Processors:

$ ./segmentation_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/fcn8.xml

Output Description

The application output is a segmented image named out.bmp.


Crossroad Camera Sample

This sample provides an inference pipeline for person detection, recognition, and reidentification. The sample uses a Person Detection network followed by the Person Attributes Recognition and Person Reidentification Retail networks applied on top of the detection results. The corresponding pre-trained models are delivered with the product:

  • person-vehicle-bike-detection-crossroad-0078, which is a primary detection network for finding the persons (and other objects if needed)
  • person-attributes-recognition-crossroad-0031, which is executed on top of the results from the first network and reports person attributes like gender, has hat, has long-sleeved clothes
  • person-reidentification-retail-0079, which is executed on top of the results from the first network and prints a vector of features for each detected person. This vector is used to conclude whether the person was already detected or not.

For details on the models, please refer to the descriptions in the deployment_tools/intel_models folder of the OpenVINO™ toolkit installation directory.

Other sample objectives are:

  • Images/Video/Camera as inputs, via OpenCV*
  • Example of simple networks pipelining: Person Attributes and Person Reidentification networks are executed on top of the Person Detection results
  • Visualization of Person Attributes and Person Reidentification (REID) information for each detected person

How It Works

On the start-up, the application reads command line parameters and loads the specified networks. The Person Detection network is required, the other two are optional.

Upon getting a frame from the OpenCV VideoCapture, the application performs inference of the Person Detection network, then performs two more inferences of the Person Attributes Recognition and Person Reidentification Retail networks if they were specified in the command line, and displays the results. In the case of the Person Reidentification Retail network, a resulting vector is generated for each detected person. This vector is compared one-by-one with the vectors of all previously detected persons using the cosine similarity algorithm. If the comparison result is greater than the specified (or default) threshold value, it is concluded that the person was already detected and a known REID value is assigned. Otherwise, the vector is added to a global list, and a new REID value is assigned.
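
A minimal sketch of this matching step is shown below (illustrative, not the sample's code), where the gallery holds the vectors of all previously detected persons and the threshold corresponds to -t_reid.

#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two reidentification vectors of equal length.
float cosineSimilarity(const std::vector<float> &a, const std::vector<float> &b) {
    float dot = 0.f, normA = 0.f, normB = 0.f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB) + 1e-6f);
}

// Returns an existing REID if the best match exceeds the threshold,
// otherwise stores the vector and assigns a new REID.
int resolveReid(std::vector<std::vector<float>> &gallery,
                const std::vector<float> &candidate, float threshold) {
    for (size_t id = 0; id < gallery.size(); ++id)
        if (cosineSimilarity(gallery[id], candidate) > threshold)
            return static_cast<int>(id);          // person already seen
    gallery.push_back(candidate);                  // new person
    return static_cast<int>(gallery.size()) - 1;
}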

Running

Running the application with the -h option yields the following usage message:

./crossroad_camera_sample -h
InferenceEngine:
    API version ............ 1.0
crossroad_camera_sample [OPTION]
Options:
    -h                           Print a usage message.
    -i "<path>"                  Required. Path to a video or image file. Default value is "cam" to work with camera.
    -m "<path>"                  Required. Path to the Person/Vehicle/Bike Detection Crossroad model (.xml) file.
    -m_pa "<path>"               Optional. Path to the Person Attributes Recognition Crossroad model (.xml) file.
    -m_reid "<path>"             Optional. Path to the Person Reidentification Retail model (.xml) file.
      -l "<absolute_path>"       For MKLDNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"       For clDNN (GPU)-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc.
    -d "<device>"                Specify the target device for Person/Vehicle/Bike Detection (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_pa "<device>"             Specify the target device for Person Attributes Recognition (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -d_reid "<device>"           Specify the target device for Person Reidentification Retail (CPU, GPU, FPGA, MYRIAD, or HETERO).
    -no_show                     No show processed video.
    -pc                          Enables per-layer performance statistics.
    -r                           Output Inference results as raw values.
    -t                           Probability threshold for person/vehicle/bike crossroad detections.
    -t_reid                      Cosine similarity threshold between two vectors for person reidentification.

Sample Output

The sample uses OpenCV to display the resulting frame with detections rendered as bounding boxes and text. In the default mode, the sample reports Person Detection time - inference time for the Person/Vehicle/Bike Detection network.

If Person Attributes Recognition or Person Reidentification Retail are enabled, the following additional information is also reported:

  • Person Attributes Recognition time - Inference time of Person Attributes Recognition averaged by the number of detected persons.
  • Person Reidentification time - Inference time of Person Reidentification averaged by the number of detected persons.

Multi-Channel Face Detection Sample

This sample provides an inference pipeline for multi-channel face detection. The sample uses a Face Detection network. The corresponding pre-trained model delivered with the product is face-detection-retail-0004, which is a primary detection network for finding faces.

For details on the models, please refer to the descriptions in the deployment_tools/intel_models folder of the OpenVINO™ toolkit installation directory.

Other sample objectives are:

  • Up to 16 Cameras as inputs, via OpenCV*
  • Visualization of detected faces from all channels on single screen

How It Works

NOTE: Running the sample requires using at least one web camera attached to your machine.

On the start-up, the application reads command line parameters and loads the specified networks. The Face Detection network is required.

Running

Running the application with the -h option yields the following usage message:

./multichannel_face_detection_sample -h

multichannel_face_detection [OPTION]
Options:

    -h                           Print a usage message.
    -m "<path>"                  Required. Path to an .xml file with a trained face detection model.
      -l "<absolute_path>"       Required for MKLDNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl.
          Or
      -c "<absolute_path>"       Required for clDNN (GPU)-targeted custom kernels.Absolute path to the xml file with the kernels desc.
    -d "<device>"                Specify the target device for Face Detection (CPU, GPU, FPGA, or MYRIAD). Sample will look for a suitable plugin for device specified.
    -nc                          Maximum number of processed camera inputs (web cams). If not specified, 4 cameras are expected by default. 
    -bs                          Processing batch size, number of frames processed per infer request
    -n_ir                        Number of infer requests
    -n_iqs                       Frame queue size for input channels
    -fps_sp                      FPS measurement sampling period. Duration between timepoints, msec
    -num_sp                      Exit after N sampling periods in performance testing(No show) mode
    -t                           Probability threshold for detections.
    -no_show                     No show processed video.
    -show_stats                  Enable statistics output
    -duplicate_num               Enable and specify the number of channels additionally copied from real sources
    -real_input_fps              Disable input frames caching, for maximum throughput pipeline

For example, to run the sample with the pre-trained face detection model on FPGA with fallback on CPU, with one single camera, use the following command:

./multi-channel-sample -m <INSTALL_DIR>/deployment_tools/intel_models/face-detection-retail-0004/FP32/face-detection-retail-0004.xml \
-l <samples_build_folder>/intel64/Release/lib/libcpu_extension.so -d HETERO:FPGA,CPU -nc 1

To run with a single camera but several channels, specify additional parameter: -duplicate_num 3. You will see 4 channels: 1 real and 3 duplicated.

Sample Output

The sample uses OpenCV to display the resulting frames with detections rendered as bounding boxes. At the top of the screen, the sample reports throughput (in frames per second). If needed, it prints more detailed statistics on the screen.


Using the Validation Application to Check Accuracy on a Dataset

The Inference Engine Validation application lets you score common topologies with standard inputs and outputs configuration. These topologies include AlexNet and SSD. The Validation application allows the user to collect simple validation metrics for the topologies. It supports Top-1/Top-5 counting for classification networks and 11-points mAP calculation for object detection networks.

Possible Validation application uses:

  • Check if Inference Engine scores the public topologies well
  • Verify whether the user's custom topology is compatible with the default input/output configuration and compare its accuracy with that of the public topologies
  • Using Validation application as another sample: although the code is much more complex than in classification and object detection samples, it's still open and could be re-used

The application loads a network to the Inference Engine plugin. Then:

  1. The application reads the validation set (the -i option):
    • If -i specifies a directory, the application tries to load labels first. To do so, the application searches for a file with the same base name as the model, but with a .labels extension. The application then searches the specified directory and adds all images from sub-directories whose names are equal to a known label to the validation set. If there are no sub-directories whose names are equal to known labels, the validation set is considered empty.
    • If -i specifies a .txt file, the application reads the .txt file, considering every line that has the format <relative_path_from_txt_to_img> <ID>, where ID is the image number that the network should classify (see the example after this list).
  2. The application reads the number of images specified by -b and loads the images to the plugin. When all images are loaded, the plugin does inference and the Validation application collects the statistics.
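
For example, a .txt validation list might look like the following (file names and class IDs here are purely illustrative):

images/cat/cat1.bmp 284
images/dog/dog1.bmp 207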

NOTE: Image load time is not part of the inference time reported by the application.

Optionally, use the --dump option to retrieve the inference results. This option creates an inference report named dumpfileXXXX.csv with the following semicolon-separated values:

  • Image_path
  • Flag representing the correctness of the prediction
  • ID of the Top-1 class
  • Probability that the image belongs to the Top-1 class
  • ID of the Top-2 class
  • Probability that the image belongs to the Top-2 class, and so on for each remaining top class
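
A minimal Python sketch for reading such a report, assuming the column order above; the exact encoding of the correctness flag is also an assumption.

import csv

def top1_accuracy_from_dump(dump_path):
    """Recompute Top-1 accuracy from a dumpfileXXXX.csv-style report."""
    total = 0
    correct = 0
    with open(dump_path) as f:
        for row in csv.reader(f, delimiter=";"):
            if not row:
                continue
            # row[0] = image path, row[1] = correctness flag,
            # row[2:] = alternating Top-x class IDs and probabilities.
            total += 1
            if row[1].strip().lower() in ("1", "true", "yes"):  # flag encoding is an assumption
                correct += 1
    return 100.0 * correct / total if total else 0.0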

CLI Options

Usage: validation_app [OPTION]
Available options:
    -h                        Print a usage message
    -t                  Type of the network being scored ("C" by default)
      -t "C" for classification
      -t "OD" for object detection
    -i [path]                 Required. Directory with validation images (directories grouped by labels), or a .txt file list for classification networks, or a VOC-formatted dataset for object detection networks
    -m [path]                 Required. Path to an .xml file with a trained model
    -l [absolute_path]        Required for Intel® MKL-DNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernel implementations
    -c [absolute_path]        Required for GPU-targeted custom kernels. Absolute path to the .xml file with the kernel descriptions
    -d [device]               Specify the target device to infer on; CPU, GPU, FPGA or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device. The plugin is CPU by default.
    -b N                      Batch size value. If not specified, the batch size value is determined from IR
    -ppType             Preprocessing type. One of "None", "Resize", "ResizeCrop"
    -ppSize N                 Preprocessing size (used with ppType="ResizeCrop")
    -ppWidth W                Preprocessing width (overrides -ppSize, used with ppType="ResizeCrop")
    -ppHeight H               Preprocessing height (overrides -ppSize, used with ppType="ResizeCrop")
    --dump                    Dump filenames and inference results to a csv file

    Classification-specific options:
      -Czb true               "Zero is a background" flag. Some networks are trained with a modified dataset where the class IDs are enumerated from 1, but 0 is an undefined "background" class (which is never detected)

    Object detection-specific options:
      -ODkind           Kind of an object detection network: SSD
      -ODa [path]             Required for OD networks. Path to the directory containing .xml annotations for images
      -ODc              Required for OD networks. Path to the file containing classes list
      -ODsubdir         Directory between the image path (-i) and image name, specified in the .xml. Use JPEGImages for VOC2007

Option Categories

  • Common options are usually named with a single letter or word, such as -b or -dump. These options have the same meaning in all validation_app modes.
  • Network type-specific options are named as an acronym of the network type (C or OD) followed by a letter or a word. These options are specific to the network type. For instance, -ODa makes sense only for an object detection network.

The next section shows how to use the Validation application in classification mode to score a classification CNN on a pack of images.

Running the Application in Classification Mode

This section demonstrates how to run the Validation application in classification mode to score a classification CNN on a pack of images.

To do inference on a chosen pack of images:

$ ./validation_app -t C -i <path to images main directory or .txt file> -m <model to use for classification> -d <CPU|GPU>

Source dataset format: directories as classes

A correct list of files looks similar to:

<path>/dataset
    /apron
        /apron1.bmp
        /apron2.bmp
    /collie
        /a_big_dog.jpg
    /coral reef
        /reef.bmp
    /Siamese
        /cat3.jpg

To score this dataset, add the -i <path>/dataset option to the command line.

Source dataset format: a list of images

This example uses a single list file in the format image_name-tabulation-class_index. A correct list of files looks similar to:

<path>/dataset
    /apron1.bmp
    /apron2.bmp
    /a_big_dog.jpg
    /reef.bmp
    /cat3.jpg
    /labels.txt

where labels.txt:

apron1.bmp 411
apron2.bmp 411
cat3.jpg 284
reef.bmp 973
a_big_dog.jpg 231

To score this dataset, add the -i <path>/dataset/labels.txt option to the command line.

Output Description

A progress bar shows the inference progress. When inference completes, the common information is displayed:

Network load time: time spent on topology load in ms
Model: path to chosen model
Model Precision: precision of a chosen model
Batch size: specified batch size
Validation dataset: path to a validation set
Validation approach: Classification networks
Device: device type

You see statistics such as the average inference time and the Top-1 and Top-5 accuracy (a sketch of how these metrics are computed follows the output):

Average infer time (ms): 588.977 (16.98 images per second with batch size = 10)

Top1 accuracy: 70.00% (7 of 10 images were detected correctly, top class is correct)
Top5 accuracy: 80.00% (8 of 10 images were detected correctly, top five classes contain required class)
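
For reference, the Top-K metric reported above can be expressed with a small Python sketch; the ground-truth IDs and per-class probabilities are assumed inputs, not values produced by the application.

def top_k_accuracy(ground_truth, predictions, k):
    """ground_truth: correct class ID per image.
    predictions: per-image list of probabilities indexed by class ID."""
    hits = 0
    for true_id, probs in zip(ground_truth, predictions):
        # Take the k class IDs with the highest predicted probability.
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += int(true_id in top_k)
    return 100.0 * hits / len(ground_truth)

# Example matching the output above: 7 of 10 Top-1 hits gives 70.00%.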

Using Object Detection with the Validation Application

Description

This section demonstrates how to run the Validation application in object detection mode to score an SSD CNN on a pack of images.

Running SSD on the VOC Dataset

Use these steps to score SSD on the original dataset that was used to test it during its training.

  1. Go to the SSD author's GitHub page to select the pre-trained SSD-300 model.
  2. From the same page, download the VOC2007 test dataset:
    $ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    $ tar -xvf VOCtest_06-Nov-2007.tar
  3. Use the Model Optimizer to convert the model. For help, see https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer
  4. Create a proper class file (made from the original labelmap_voc.prototxt):
    none_of_the_above 0
    aeroplane 1
    bicycle 2
    bird 3
    boat 4
    bottle 5
    bus 6
    car 7
    cat 8
    chair 9
    cow 10
    diningtable 11
    dog 12
    horse 13
    motorbike 14
    person 15
    pottedplant 16
    sheep 17
    sofa 18
    train 19
    tvmonitor 20
  5. Save it as VOC_SSD_Classes.txt.
  6. Score the model on the dataset:
    ./validation_app -d CPU -t OD -ODa "<...>/VOCdevkit/VOC2007/Annotations" -i "<...>/VOCdevkit" -m "<...>/vgg_voc0712_ssd_300x300.xml" -ODc "<...>/VOC_SSD_Classes.txt" -ODsubdir JPEGImages
  7. You see a progress bar followed by your data:
    Progress: [....................] 100.00% done    
    [ INFO ] Processing output blobs
    Network load time: 27.70ms
    Model: /home/user/models/ssd/withmean/vgg_voc0712_ssd_300x300/vgg_voc0712_ssd_300x300.xml
    Model Precision: FP32
    Batch size: 1
    Validation dataset: /home/user/Data/SSD-data/testonly/VOCdevkit
    Validation approach: Object detection network
    
    Average infer time (ms): 166.49 (6.01 images per second with batch size = 1)
    Average precision per class table: 
    
    Class   AP
    1   0.796
    2   0.839
    3   0.759
    4   0.695
    5   0.508
    6   0.867
    7   0.861
    8   0.886
    9   0.602
    10  0.822
    11  0.768
    12  0.861
    13  0.874
    14  0.842
    15  0.797
    16  0.526
    17  0.792
    18  0.795
    19  0.873
    20  0.773
    Mean Average Precision (mAP): 0.7767

The Mean Average Precision (mAP) value is also given in a table on the SSD author's page and in the arXiv paper.
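
The 11-point mAP used here averages, over all classes, the precision sampled at the 11 recall levels 0.0, 0.1, ..., 1.0 (the PASCAL VOC 2007 scheme). A minimal sketch, assuming per-class recall/precision curves have already been computed:

def average_precision_11pt(recall, precision):
    """PASCAL VOC 2007 11-point interpolated AP for one class.
    recall/precision: equal-length lists ordered by decreasing detection confidence."""
    ap = 0.0
    for t in [i / 10.0 for i in range(11)]:  # recall thresholds 0.0 .. 1.0
        # Interpolated precision: the best precision at any recall >= t.
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap

def mean_average_precision(per_class_ap):
    return sum(per_class_ap) / len(per_class_ap)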

Hardware-accelerated Function-as-a-Service Using AWS Greengrass* (Beta)

Hardware-accelerated Function-as-a-Service (FaaS) enables cloud developers to deploy inference functionality on Intel® IoT edge devices with accelerators (Intel® Processor Graphics, Intel® FPGA, and Intel® Movidius™ Neural Compute Stick). These functions provide a great developer experience and seamless migration of visual analytics from cloud to edge in a secure manner using a containerized environment. Hardware-accelerated FaaS provides best-in-class performance by accessing optimized deep learning libraries on Intel IoT edge devices with accelerators.

This section describes the implementation of FaaS inference samples (based on Python* 2.7) using AWS Greengrass* and AWS Lambda* software. AWS Lambda functions (lambdas) can be created, modified, or updated in the cloud and deployed from cloud to edge using AWS Greengrass. The following subsections describe the samples, the prerequisites for the Intel edge device, configuring an AWS Greengrass group, creating and packaging lambda functions, deploying lambdas, and the options for consuming the inference output.

Description

greengrass_classification_sample.py

This AWS Greengrass sample classifies a video stream using classification networks such as AlexNet and GoogLeNet and publishes the top-10 results to the AWS IoT* Cloud every second.

greengrass_object_detection_sample_ssd.py

This AWS Greengrass sample detects objects in a video stream and classifies them using single-shot multi-box detection (SSD) networks such as SSD SqueezeNet, SSD MobileNet, and SSD300. The sample publishes detection outputs such as class label, class confidence, and bounding box coordinates to the AWS IoT Cloud every second.
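
Both samples follow the same publishing pattern. The snippet below is a hedged sketch of how such a sample might publish results to the AWS IoT Cloud with the AWS Greengrass Core SDK; the topic name, payload layout, and publish_results helper are illustrative assumptions, not the shipped sample code.

import json
import greengrasssdk  # AWS Greengrass Core SDK for Python

client = greengrasssdk.client("iot-data")
TOPIC = "openvino/classification"  # or "openvino/ssd" for the object detection sample

def publish_results(results):
    # results: for example, a list of (label, confidence) pairs produced by inference.
    client.publish(topic=TOPIC, payload=json.dumps({"results": results}))

def function_handler(event, context):
    # Long-lived Greengrass lambdas typically run their inference loop at module
    # scope and only publish from it; the handler itself can remain empty.
    return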

Supported Platforms

Pre-requisites

  • Download and install the OpenVINO™ toolkit from https://software.intel.com/en-us/openvino-toolkit
  • Python* 2.7 with the opencv-python, numpy, and boto3 packages. Use sudo pip2 install to install the packages in locations accessible by AWS Greengrass.
  • Download Intel's edge-optimized models available at https://github.com/intel/Edge-optimized-models. Any custom pre-trained classification or SSD model can also be used.
  • Convert the above models to the Intermediate Representation (IR) using the Model Optimizer tool from the OpenVINO™ toolkit. Follow the instructions at https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer. For best performance, convert models to the FP32 data type for CPU and to FP16 for GPU and FPGA.
  • To run the samples, the OpenVINO™ toolkit provides the pre-compiled libcpu_extension libraries available in the <INSTALL_DIR>/deployment_tools/inference_engine/lib/Ubuntu_16.04/intel64/ folder:
    • libcpu_extension_sse4.so – for Intel Atom® processors
    • libcpu_extension_avx2.so – for Intel® Core™ and Intel® Xeon® processors.

    To run the samples on other devices, it is recommended to rebuild the libraries for the specific target to get a performance gain. For build instructions, refer to the Inference Engine Developer Guide. A sketch showing how a Python sample can load one of these extension libraries follows.
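
The following is a minimal sketch, assuming the IEPlugin-based Python API shipped with this toolkit version; the model paths are placeholders.

# Illustrative only: load an IR together with a CPU extension library.
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
plugin.add_cpu_extension(
    "<INSTALL_DIR>/deployment_tools/inference_engine/lib/Ubuntu_16.04/intel64/libcpu_extension_sse4.so")

net = IENetwork(model="<MODEL_DIR>/model.xml", weights="<MODEL_DIR>/model.bin")
exec_net = plugin.load(network=net)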

Configuring an AWS Greengrass group

For each Intel edge platform, you need to create a new AWS Greengrass group and install the AWS Greengrass core software to establish the connection between cloud and edge.

Creating and Packaging Lambda Functions

  • To download the AWS Greengrass Core SDK for Python* 2.7, follow the steps 1-4 at: https://docs.aws.amazon.com/greengrass/latest/developerguide/create-lambda.html
  • Replace greengrassHelloWorld.py with an AWS Greengrass sample (greengrass_classification_sample.py or greengrass_object_detection_sample_ssd.py) and zip it, together with the AWS Greengrass SDK folders extracted in the previous step, into greengrass_sample_python_lambda.zip. The zip should contain:
    • greengrass_common
    • greengrass_ipc_python_sdk
    • greengrasssdk
    • the Greengrass sample (greengrass_classification_sample.py or greengrass_object_detection_sample_ssd.py)

    For example:

    zip -r greengrass_sample_python_lambda.zip greengrass_common greengrass_ipc_python_sdk greengrasssdk greengrass_object_detection_sample_ssd.py

Deployment of Lambdas

Configuring the Lambda function

  • After creating the AWS Greengrass group and the lambda function, start configuring the lambda function for AWS Greengrass by following the steps 1-8 in AWS Greengrass developer guide at: https://docs.aws.amazon.com/greengrass/latest/developerguide/config-lambda.html
  • In addition to the details mentioned in step 8 of the AWS Greengrass developer guide, change the Memory limit to 2048MB to accommodate large input video streams.
  • Add the following environment variables as key-value pairs when editing the lambda configuration, then click Update (a sketch showing how a lambda can read these variables follows this list):

    Key: LD_LIBRARY_PATH
    Value: <INSTALL_DIR>/opencv/share/OpenCV/3rdparty/lib:
           <INSTALL_DIR>/opencv/lib:/opt/intel/opencl:
           <INSTALL_DIR>/deployment_tools/inference_engine/external/cldnn/lib:
           <INSTALL_DIR>/deployment_tools/inference_engine/external/mkltiny_lnx/lib:
           <INSTALL_DIR>/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64:
           <INSTALL_DIR>/deployment_tools/model_optimizer/model_optimizer_caffe/bin:
           <INSTALL_DIR>/openvx/lib

    Key: PYTHONPATH
    Value: <INSTALL_DIR>/deployment_tools/inference_engine/python_api/Ubuntu_1604/python2

    Key: PARAM_MODEL_XML
    Value: <MODEL_DIR>/<IR.xml>, where <MODEL_DIR> is specified by the user and contains IR.xml, the Intermediate Representation file from the Model Optimizer

    Key: PARAM_INPUT_SOURCE
    Value: <DATA_DIR>/input.mp4, to be specified by the user. <DATA_DIR> holds both input and output data.

    Key: PARAM_DEVICE
    Value: For CPU, specify CPU. For GPU, specify GPU. For FPGA, specify HETERO:FPGA,CPU.

    Key: PARAM_CPU_EXTENSION_PATH
    Value: <INSTALL_DIR>/deployment_tools/inference_engine/lib/Ubuntu_16.04/intel64/<CPU_EXTENSION_LIB>, where CPU_EXTENSION_LIB is libcpu_extension_sse4.so for Intel Atom® processors and libcpu_extension_avx2.so for Intel® Core™ and Intel® Xeon® processors

    Key: PARAM_OUTPUT_DIRECTORY
    Value: <DATA_DIR>, to be specified by the user. Holds both input and output data.

    Key: PARAM_NUM_TOP_RESULTS
    Value: Specified by the user for the classification sample (for example, 1 for the top-1 result, 5 for top-5 results)
  • Use the following LD_LIBRARY_PATH and additional environment variables for the Intel® Arria® 10 GX FPGA Development Kit:

    Key: LD_LIBRARY_PATH
    Value: /opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_ref/linux64/lib:
           /opt/altera/aocl-pro-rte/aclrte-linux64/host/linux64/lib:
           <INSTALL_DIR>/opencv/share/OpenCV/3rdparty/lib:
           <INSTALL_DIR>/opencv/lib:/opt/intel/opencl:
           <INSTALL_DIR>/deployment_tools/inference_engine/external/cldnn/lib:
           <INSTALL_DIR>/deployment_tools/inference_engine/external/mkltiny_lnx/lib:
           <INSTALL_DIR>/deployment_tools/inference_engine/lib/ubuntu_16.04/intel64:
           <INSTALL_DIR>/deployment_tools/model_optimizer/model_optimizer_caffe/bin:
           <INSTALL_DIR>/openvx/lib

    Key: DLA_AOCX
    Value: <INSTALL_DIR>/a10_devkit_bitstreams/0-8-1_a10dk_fp16_8x48_arch06.aocx

    Key: CL_CONTEXT_COMPILER_MODE_INTELFPGA
    Value: 3
  • Add a subscription to publish or subscribe to messages from the AWS Greengrass lambda function by following steps 10-14 in the AWS Greengrass developer guide at https://docs.aws.amazon.com/greengrass/latest/developerguide/config-lambda.html. The “Optional topic filter” field should be the topic used inside the lambda function, for example, openvino/ssd or openvino/classification.
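
As a hedged illustration of how a lambda can consume the PARAM_* variables set above (not the shipped sample code), the configuration can be read from the environment:

import os

# Read the configuration injected through the lambda environment variables.
model_xml  = os.environ["PARAM_MODEL_XML"]
model_bin  = os.path.splitext(model_xml)[0] + ".bin"  # IR weights next to the .xml (assumption)
input_src  = os.environ["PARAM_INPUT_SOURCE"]
device     = os.environ.get("PARAM_DEVICE", "CPU")
cpu_ext    = os.environ.get("PARAM_CPU_EXTENSION_PATH")
output_dir = os.environ["PARAM_OUTPUT_DIRECTORY"]
num_top    = int(os.environ.get("PARAM_NUM_TOP_RESULTS", "5"))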

Local Resources

  • Add local resources and access privileges by following the instructions at https://docs.aws.amazon.com/greengrass/latest/developerguide/access-local-resources.html
  • The following local resources are needed for the various hardware (CPU, GPU, and FPGA) options:
    • General (for all hardware options):
      Name          Resource Type   Local Path                                                                  Access
      ModelDir      Volume          <MODEL_DIR>, to be specified by the user                                    Read-Only
      Webcam        Device          /dev/video0                                                                 Read-Only
      DataDir       Volume          <DATA_DIR>, to be specified by the user. Holds both input and output data.  Read and Write
      OpenVINOPath  Volume          <INSTALL_DIR>, the OpenVINO toolkit installation directory                  Read-Only
    • GPU:
      Name          Resource Type   Local Path                Access
      GPU           Device          /dev/dri/renderD128       Read and Write
    • FPGA:
      Name          Resource Type   Local Path                Access
      FPGA          Device          /dev/acla10_ref0          Read and Write
      FPGA_DIR1     Volume          /opt/Intel/OpenCL/Boards  Read and Write
      FPGA_DIR2     Volume          /etc/OpenCL/vendors       Read and Write
    • VPU:
      Intel® Movidius™ Myriad™ VPU has not been validated with AWS Greengrass yet. This section will be updated in future releases.

Deploy

To deploy the lambda function to the AWS Greengrass core device, select “Deployments” on the group page and follow the instructions at https://docs.aws.amazon.com/greengrass/latest/developerguide/configs-core.html.

Output Consumption

There are four options available for output consumption. These options report, stream, upload, or store the inference output at an interval defined by the reporting_interval variable in the AWS Greengrass samples. A sketch of the AWS Kinesis and AWS S3 calls follows the list.
  1. AWS IoT* Cloud Output
    This option is enabled by default in the AWS Greengrass samples through the enable_iot_cloud_output variable. Use it to verify that the lambda is running on the edge device. It publishes messages to the AWS IoT cloud using the subscription topic specified in the lambda (for example, openvino/classification for the classification sample and openvino/ssd for the object detection sample). For classification, the top-1 result with its class label is published to the AWS IoT cloud. For SSD object detection, detection results such as bounding box coordinates of objects, class label, and class confidence are published. To view the output on the AWS IoT cloud, follow the instructions at https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
  2. AWS Kinesis Streaming
    This option streams the inference output from the edge device to the cloud using AWS Kinesis* streams when enable_kinesis_output is set to True. The edge devices act as data producers and continually push processed data to the cloud. Users need to set up and specify the AWS Kinesis stream name, AWS Kinesis shard, and AWS region in the AWS Greengrass samples.
  3. Cloud Storage using AWS S3 Bucket
    This option enables uploading and storing processed frames (in JPEG format) in an AWS S3* bucket when the enable_s3_jpeg_output variable is set to True. The users need to set up and specify the AWS S3 bucket name in the AWS Greengrass samples to store the JPEG images. The images are named using the timestamp and uploaded to AWS S3.
  4. Local Storage
    This option enables storing processed frames (in JPEG format) on the edge device when the enable_s3_jpeg_output variable is set to True. The images are named using the timestamp and stored in a directory specified by PARAM_OUTPUT_DIRECTORY.
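
The following Python sketch shows, under stated assumptions, how the AWS Kinesis and AWS S3 options map onto boto3 calls; the region, stream name, and bucket name are placeholders, not values taken from the samples.

import datetime
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")  # region is a placeholder
s3 = boto3.client("s3")

def stream_to_kinesis(results, stream_name="openvino-inference"):  # placeholder stream name
    # Corresponds to the AWS Kinesis Streaming option above.
    kinesis.put_record(StreamName=stream_name,
                       Data=json.dumps(results),
                       PartitionKey="inference")

def upload_frame_to_s3(jpeg_bytes, bucket="openvino-output"):  # placeholder bucket name
    # Corresponds to the Cloud Storage option above: frames are named by timestamp.
    key = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S-%f") + ".jpg"
    s3.put_object(Bucket=bucket, Key=key, Body=jpeg_bytes)
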
For more complete information about compiler optimizations, see our Optimization Notice.