Using the Model Optimizer to Convert TensorFlow* Models

Introduction

The Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.

The Model Optimizer process assumes you have a network model trained using a supported framework. The scheme below illustrates the typical workflow for deploying a trained deep learning model:

Intel Computer Vision Basic Workflow

A summary of the steps for optimizing and deploying a model that was trained with the TensorFlow* framework:

  1. Configure the Model Optimizer for TensorFlow* (the framework that was used to train your model).
  2. Freeze the TensorFlow model if your model is not already frozen.
  3. Convert a TensorFlow model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and biases values.
  4. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via the provided Inference Engine validation application or sample applications.
  5. Integrate the Inference Engine in your application to deploy the model in the target environment.

Model Optimizer Workflow

The Model Optimizer process assumes you have a network model that was trained with the TensorFlow* framework. The workflow is:

  1. Configure the Model Optimizer for the TensorFlow* framework by running the configuration bash script for Linux* OS or batch file for Windows* OS from the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites folder:
    • For Linux* OS:
      install_prerequisites_tf.sh
    • For Windows* OS:
      install_prerequisites_tf.bat

    For more details on configuring the Model Optimizer, see Configure the Model Optimizer.

  2. Provide as input a trained network that contains a certain topology and the adjusted weights and biases.
  3. Convert the TensorFlow* model to an optimized Intermediate Representation.

The Model Optimizer produces as output an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine. The Inference Engine API offers a unified API across a number of supported Intel® platforms. The Intermediate Representation is a pair of files that describe the whole model:

  • .xml: Describes the network topology
  • .bin: Contains the weights and biases binary data

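As an illustration of how the resulting IR is consumed, the minimal sketch below loads an IR pair with the Inference Engine Python API and runs inference on a single dummy input. It assumes the openvino.inference_engine Python package is installed; the file names model.xml/model.bin and the input shape are hypothetical, and the exact class and method names vary between OpenVINO releases.

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin  # assumes the Inference Engine Python API is installed

# Load the IR pair produced by the Model Optimizer (hypothetical file names)
net = IENetwork(model='model.xml', weights='model.bin')
input_blob = next(iter(net.inputs))

# Load the network to a plugin for the target device (CPU in this sketch)
plugin = IEPlugin(device='CPU')
exec_net = plugin.load(network=net)

# Run inference on a dummy NCHW input; replace the shape with your model's input shape
input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = exec_net.infer(inputs={input_blob: input_data})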

Supported Topologies

Supported Unfrozen Topologies from the TensorFlow*-Slim Image Classification Model Library

Detailed information on how to convert models from the TensorFlow*-Slim Image Classification Model Library is available in the Converting TensorFlow*-Slim Image Classification Model Library Models chapter. The table below contains a list of supported TensorFlow*-Slim Image Classification models and the required mean and scale values. The mean values are specified as if the input image is read in the BGR channels order, as the Inference Engine classification sample does.

Model Name | Slim Model Checkpoint File | Mean Values (--mean_values) | Scale Values (--scale)
Inception v1 | inception_v1_2016_08_28.tar.gz | [127.5,127.5,127.5] | 127.5
Inception v2 | inception_v1_2016_08_28.tar.gz | [127.5,127.5,127.5] | 127.5
Inception v3 | inception_v3_2016_08_28.tar.gz | [127.5,127.5,127.5] | 127.5
Inception V4 | inception_v4_2016_09_09.tar.gz | [127.5,127.5,127.5] | 127.5
Inception ResNet v2 | inception_resnet_v2_2016_08_30.tar.gz | [127.5,127.5,127.5] | 127.5
MobileNet v1 128 | mobilenet_v1_0.25_128.tgz | [127.5,127.5,127.5] | 127.5
MobileNet v1 160 | mobilenet_v1_0.5_160.tgz | [127.5,127.5,127.5] | 127.5
MobileNet v1 224 | mobilenet_v1_1.0_224.tgz | [127.5,127.5,127.5] | 127.5
NasNet Large | nasnet-a_large_04_10_2017.tar.gz | [127.5,127.5,127.5] | 127.5
NasNet Mobile | nasnet-a_mobile_04_10_2017.tar.gz | [127.5,127.5,127.5] | 127.5
ResidualNet-50 v1 | resnet_v1_50_2016_08_28.tar.gz | [103.94,116.78,123.68] | 1
ResidualNet-50 v2 | resnet_v2_50_2017_04_14.tar.gz | [103.94,116.78,123.68] | 1
ResidualNet-101 v1 | resnet_v1_101_2016_08_28.tar.gz | [103.94,116.78,123.68] | 1
ResidualNet-101 v2 | resnet_v2_101_2017_04_14.tar.gz | [103.94,116.78,123.68] | 1
ResidualNet-152 v1 | resnet_v1_152_2016_08_28.tar.gz | [103.94,116.78,123.68] | 1
ResidualNet-152 v2 | resnet_v2_152_2017_04_14.tar.gz | [103.94,116.78,123.68] | 1
VGG-16 | vgg_16_2016_08_28.tar.gz | [103.94,116.78,123.68] | 1
VGG-19 | vgg_19_2016_08_28.tar.gz | [103.94,116.78,123.68] | 1

Supported Frozen Topologies from TensorFlow Object Detection Models Zoo

Detailed information on how to convert models from the Object Detection Models Zoo is available in the Converting TensorFlow Object Detection API Models chapter. The table below lists the supported models from the Object Detection Models Zoo.

Model Name | TensorFlow Object Detection API Models (Frozen)
SSD MobileNet V1 COCO* | ssd_mobilenet_v1_coco_2018_01_28.tar.gz
SSD MobileNet V1 0.75 Depth COCO | ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03.tar.gz
SSD MobileNet V1 PPN COCO | ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync_2018_07_03.tar.gz
SSD MobileNet V1 FPN COCO | ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
SSD ResNet50 FPN COCO | ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
SSD MobileNet V2 COCO | ssd_mobilenet_v2_coco_2018_03_29.tar.gz
SSD Lite MobileNet V2 COCO | ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz
SSD Inception V2 COCO | ssd_inception_v2_coco_2018_01_28.tar.gz
RFCN ResNet 101 COCO | rfcn_resnet101_coco_2018_01_28.tar.gz
Faster R-CNN Inception V2 COCO | faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
Faster R-CNN ResNet 50 COCO | faster_rcnn_resnet50_coco_2018_01_28.tar.gz
Faster R-CNN ResNet 50 Low Proposals COCO | faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz
Faster R-CNN ResNet 101 COCO | faster_rcnn_resnet101_coco_2018_01_28.tar.gz
Faster R-CNN ResNet 101 Low Proposals COCO | faster_rcnn_resnet101_lowproposals_coco_2018_01_28.tar.gz
Faster R-CNN Inception ResNet V2 COCO | faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz
Faster R-CNN Inception ResNet V2 Low Proposals COCO | faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28.tar.gz
Faster R-CNN NasNet COCO | faster_rcnn_nas_coco_2018_01_28.tar.gz
Faster R-CNN NasNet Low Proposals COCO | faster_rcnn_nas_lowproposals_coco_2018_01_28.tar.gz
Mask R-CNN Inception ResNet V2 COCO | mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz
Mask R-CNN Inception V2 COCO | mask_rcnn_inception_v2_coco_2018_01_28.tar.gz
Mask R-CNN ResNet 101 COCO | mask_rcnn_resnet101_atrous_coco_2018_01_28.tar.gz
Mask R-CNN ResNet 50 COCO | mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz
Faster R-CNN ResNet 101 Kitti* | faster_rcnn_resnet101_kitti_2018_01_28.tar.gz
Faster R-CNN Inception ResNet V2 Open Images* | faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz
Faster R-CNN Inception ResNet V2 Low Proposals Open Images* | faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz
Faster R-CNN ResNet 101 AVA v2.1* | faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz

NOTE: Faster and Mask R-CNN models are supported on CPU and GPU only with batch size equal to 1.

Other Supported Topologies

Load Non-Frozen Models to the Model Optimizer

There are three ways to store non-frozen TensorFlow models and load them to the Model Optimizer (a sketch showing how each format is typically produced with the TensorFlow* 1.x API follows this list):

  1. Checkpoint: In this case, a model consists of two files:
    • inference_graph.pb or inference_graph.pbtxt
    • checkpoint_file.ckpt

    If you do not have an inference graph file, refer to Freezing Custom Models in Python.

    To convert such a TensorFlow model:

    1. Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory.
    2. Run the mo_tf.py script with the path to the checkpoint file to convert a model:
    • If the input model is in .pb format:
       mo_tf.py --input_model <INFERENCE_GRAPH>.pb --input_checkpoint <INPUT_CHECKPOINT>
    • If the input model is in .pbtxt format:
       mo_tf.py --input_model <INFERENCE_GRAPH>.pbtxt --input_checkpoint <INPUT_CHECKPOINT> --input_model_is_text
  2. MetaGraph:

    In this case, a model consists of three or four files stored in the same directory:

    • model_name.meta
    • model_name.index
    • model_name.data-00000-of-00001 (digit part may vary)
    • checkpoint (optional)

    To convert such a TensorFlow model:

    1. Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory.
    2. Run the mo_tf.py script with a path to the MetaGraph .meta file to convert a model:
       mo_tf.py --input_meta_graph <INPUT_META_GRAPH>.meta
  3. SavedModel:

    In this case, a model consists of a special SavedModel directory.

    To convert such a TensorFlow model:

    1. Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory.
    2. Run the mo_tf.py script with a path to the SavedModel directory to convert a model:
      mo_tf.py --saved_model_dir <SAVED_MODEL_DIRECTORY>
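For reference, the sketch below shows how each of the three non-frozen formats is typically produced with the TensorFlow* 1.x Python API. The tiny graph, node names, and output directories are hypothetical and only illustrate which artifacts each API call creates.

import os
import tensorflow as tf

# A tiny example graph: one placeholder and one trainable variable
x = tf.placeholder(tf.float32, shape=[None, 4], name='input')
w = tf.Variable(tf.ones([4, 2]), name='weights')
y = tf.matmul(x, w, name='output')

for directory in ('./model', './model_meta'):
    os.makedirs(directory, exist_ok=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # 1. Checkpoint: an inference graph file plus a variables checkpoint
    tf.train.write_graph(sess.graph_def, './model', 'inference_graph.pbtxt', as_text=True)
    tf.train.Saver().save(sess, './model/checkpoint_file.ckpt')

    # 2. MetaGraph: writes model_name.meta, model_name.index, and model_name.data-* files
    tf.train.Saver().save(sess, './model_meta/model_name')

    # 3. SavedModel: a directory with saved_model.pb and a variables sub-directory
    tf.saved_model.simple_save(sess, './saved_model_dir',
                               inputs={'input': x}, outputs={'output': y})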

Freezing Custom Models in Python*

When a network is defined in Python* code, you have to create an inference graph file. Usually, graphs are built in a form that allows model training. That means that all trainable parameters are represented as variables in the graph. To use the graph with the Model Optimizer, it should be frozen.

The graph is frozen and dumped to a file with the following code:

import tensorflow as tf
from tensorflow.python.framework import graph_io
frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["<name_of_the_output_node>"])
graph_io.write_graph(frozen, './', 'inference_graph.pb', as_text=False)

Where:

  • sess is an instance of the TensorFlow* Session object where the network topology is defined.
  • ["<name_of_the_output_node>"] is a list of output node names in the graph; the frozen graph will include only those nodes from the original sess.graph_def that are directly or indirectly used to compute the given output nodes. <name_of_the_output_node> is an example of a possible output node name. You should derive the names from your own graph.
  • ./ is the directory where the inference graph file should be generated.
  • inference_graph.pb is the name of the generated inference graph file.
  • as_text specifies whether the generated file should be in human-readable text format or binary.

Once the inference graph file is ready, proceed to convert your model as described in the following section.
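As a minimal end-to-end illustration of the freezing snippet above, the sketch below builds a trivial graph, freezes it, and writes inference_graph.pb. It assumes TensorFlow* 1.x; the graph, the output node name 'output', and the file name are hypothetical.

import tensorflow as tf
from tensorflow.python.framework import graph_io

# A trivial network: the node named 'output' is the output node to freeze against
x = tf.placeholder(tf.float32, shape=[None, 4], name='input')
w = tf.Variable(tf.ones([4, 2]), name='weights')
y = tf.identity(tf.matmul(x, w), name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Replace variables with constants and keep only nodes needed to compute 'output'
    frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['output'])
    graph_io.write_graph(frozen, './', 'inference_graph.pb', as_text=False)

The resulting inference_graph.pb can then be passed to mo_tf.py with the --input_model parameter.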

Convert a TensorFlow* Model

To convert a TensorFlow model:

  1. Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory.
  2. Use the mo_tf.py script with the path to the input model .pb file to convert the model:
    python3 mo_tf.py --input_model <INPUT_MODEL>.pb

Two groups of parameters are available to convert your model: framework-agnostic parameters and TensorFlow*-specific parameters.

Use Framework-Agnostic Conversion Parameters

To adjust the conversion process, you can use the general (framework-agnostic) parameters:

	optional arguments:
  -h, --help            show this help message and exit
  --framework {tf,caffe,mxnet,kaldi,onnx}
                        Name of the framework used to train the input model.

Framework-agnostic parameters:
  --input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL
                        Tensorflow*: a file with a pre-trained model (binary
                        or text .pb file after freezing). Caffe*: a model
                        proto file with model weights
  --model_name MODEL_NAME, -n MODEL_NAME
                        Model_name parameter passed to the final create_ir
                        transform. This parameter is used to name a network in
                        a generated IR and output .xml/.bin files.
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory that stores the generated IR. By default, it
                        is the directory from where the Model Optimizer is
                        launched.
  --input_shape INPUT_SHAPE
                        Input shape(s) that should be fed to an input node(s)
                        of the model. Shape is defined as a comma-separated
                        list of integer numbers enclosed in parentheses or
                        square brackets, for example [1,3,227,227] or
                        (1,227,227,3), where the order of dimensions depends
                        on the framework input layout of the model. For
                        example, [N,C,H,W] is used for Caffe* models and
                        [N,H,W,C] for TensorFlow* models. Model Optimizer
                        performs necessary transformations to convert the
                        shape to the layout required by Inference Engine
                        (N,C,H,W). The shape should not contain undefined
                        dimensions (? or -1) and should fit the dimensions
                        defined in the input operation of the graph. If there
                        are multiple inputs in the model, --input_shape should
                        contain definition of shape for each input separated
                        by a comma, for example: [1,3,227,227],[2,4] for a
                        model with two inputs with 4D and 2D shapes.
  --scale SCALE, -s SCALE
                        All input values coming from original network inputs
                        will be divided by this value. When a list of inputs
                        is overridden by the --input parameter, this scale is
                        not applied for any input that does not match with the
                        original input of the model.
  --reverse_input_channels
                        Switch the input channels order from RGB to BGR (or
                        vice versa). Applied to original inputs of the model
                        if and only if a number of channels equals 3. Applied
                        after application of --mean_values and --scale_values
                        options, so numbers in --mean_values and
                        --scale_values go in the order of channels used in the
                        original model.
  --log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
                        Logger level
  --input INPUT         The name of the input operation of the given model.
                        Usually this is a name of the input placeholder of the
                        model.
  --output OUTPUT       The name of the output operation of the model. For
                        TensorFlow*, do not add :0 to this name.
  --mean_values MEAN_VALUES, -ms MEAN_VALUES
                        Mean values to be used for the input image per
                        channel. Values to be provided in the (R,G,B) or
                        [R,G,B] format. Can be defined for desired input of
                        the model, for example: "--mean_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --scale_values SCALE_VALUES
                        Scale values to be used for the input image per
                        channel. Values are provided in the (R,G,B) or [R,G,B]
                        format. Can be defined for desired input of the model,
                        for example: "--scale_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --data_type {FP16,FP32,half,float}
                        Data type for all intermediate tensors and weights. If
                        original model is in FP32 and --data_type=FP16 is
                        specified, all model weights and biases are quantized
                        to FP16.
  --disable_fusing      Turn off fusing of linear operations to Convolution
  --disable_resnet_optimization
                        Turn off resnet optimization
  --finegrain_fusing FINEGRAIN_FUSING
                        Regex for layers/operations that won't be fused.
                        Example: --finegrain_fusing Convolution1,.*Scale.*
  --disable_gfusing     Turn off fusing of grouped convolutions
  --move_to_preprocess  Move mean values to IR preprocess section
  --extensions EXTENSIONS
                        Directory or a comma separated list of directories
                        with extensions. To disable all extensions including
                        those that are placed at the default location, pass an
                        empty string.
  --batch BATCH, -b BATCH
                        Input batch size
  --version             Version of Model Optimizer
  --silent              Prevent any output messages except those that
                        correspond to log level equals ERROR, that can be set
                        with the following option: --log_level. By default,
                        log level is already ERROR.
  --freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE
                        Replaces input layer with constant node with provided
                        value, e.g.: "node_name->True"
  --generate_deprecated_IR_V2
                        Force to generate legacy/deprecated IR V2 to work with
                        previous versions of the Inference Engine. The
                        resulting IR may or may not be correctly loaded by
                        Inference Engine API (including the most recent and
                        old versions of Inference Engine) and provided as a
                        partially-validated backup option for specific
                        deployment scenarios. Use it at your own discretion.
                        By default, without this option, the Model Optimizer
                        generates IR V3.

NOTE: The Model Optimizer does not revert input channels from RGB to BGR by default as it did in the 2017 R3 Beta release. The command line parameter --reverse_input_channels must be specified manually to perform the reversion. For details, refer to the When to Reverse Input Channels section.

The sections below provide details on using particular parameters and examples of CLI commands.

When to Specify Mean and Scale Values

Usually, neural network models are trained with normalized input data. This means that the input data values are converted to be in a specific range, for example, [0, 1] or [-1, 1]. Sometimes mean values (mean images) are subtracted from the input data values as part of the pre-processing. There are two ways the input data pre-processing can be implemented:

  • The input pre-processing operations are a part of the topology. In this case, the application that uses the framework to infer the topology does not pre-process the input.
  • The input pre-processing operations are not a part of the topology and the pre-processing is performed within the application that feeds the model with input data.

In the first case, the Model Optimizer generates the IR with required pre-processing layers and Inference Engine samples may be used to infer the model.

In the second case, information about mean/scale values should be provided to the Model Optimizer to embed it into the generated IR. The Model Optimizer provides a number of command line parameters to specify them: --scale, --scale_values, --mean_values, --mean_file.

If both mean and scale values are specified, the mean is subtracted first and then scale is applied. Input values are divided by the scale value(s).

There is no universal recipe for determining the mean/scale values for a particular model. The steps below can help you determine them:

  1. Read the model documentation. Usually the documentation describes mean/scale values if pre-processing is required.
  2. Open the example script/application executing the model and track how the input data is read and passed to the framework.
  3. Open the model in a visualization tool and check for layers performing subtraction or multiplication (such as Sub, Mul, ScaleShift, Eltwise) of the input data. If such layers exist, the pre-processing is most probably a part of the model.
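For example, many TensorFlow*-Slim classification models normalize pixels to the [-1, 1] range as pixel / 127.5 - 1, which is the same as (pixel - 127.5) / 127.5, so they are converted with --mean_values [127.5,127.5,127.5] --scale 127.5 (see the table of Slim models above). The short sketch below only illustrates this equivalence; the numbers are taken from that table.

import numpy as np

pixel = np.array([0.0, 127.5, 255.0])          # example pixel values in the [0, 255] range

framework_preprocessing = pixel / 127.5 - 1.0   # normalization applied during training
ir_preprocessing = (pixel - 127.5) / 127.5      # what --mean_values/--scale embed: (x - mean) / scale

assert np.allclose(framework_preprocessing, ir_preprocessing)   # both map pixels to [-1, 1]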

When to Specify Input Shapes

There are situations when the input data shape for the model is not fixed, for example, for fully-convolutional neural networks. In this case, TensorFlow* models contain -1 values in the shape attribute of the Placeholder operation. The Inference Engine does not support input layers with undefined size, so if the input shapes are not defined in the model, the Model Optimizer fails to convert it.

The solution is to provide the input shape(s) using the --input_shape command line parameter for all inputs of the model, or to provide the batch size using the -b command line parameter if the model contains just one input with only the batch size undefined. In the latter case, the Placeholder shape for the TensorFlow* model looks like this: [-1, 224, 224, 3].

When to Reverse Input Channels

Inference Engine samples load input images in the BGR channels order. However, the model may be trained on images loaded in the RGB channels order. In this case, inference results produced with the Inference Engine samples will be incorrect. The solution is to provide the --reverse_input_channels command-line parameter. The Model Optimizer then modifies the weights of the first convolution (or another channel-dependent operation) so that the output of these operations is the same as if the image were passed in the RGB channels order.
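The NumPy sketch below is only an illustration of why this weight modification is equivalent to reversing the input channels (it is not Model Optimizer code): reversing the channel axis of the first convolution weights gives the same result as reversing the channels of the input.

import numpy as np

np.random.seed(0)
weights = np.random.rand(8, 3, 3, 3)   # first convolution weights: [out_channels, in_channels, kH, kW]
patch = np.random.rand(3, 3, 3)        # one input patch: [channels, kH, kW]

# Convolving the original input with channel-reversed weights ...
out_reversed_weights = np.tensordot(weights[:, ::-1, :, :], patch, axes=([1, 2, 3], [0, 1, 2]))
# ... equals convolving the channel-reversed input with the original weights
out_reversed_input = np.tensordot(weights, patch[::-1, :, :], axes=([1, 2, 3], [0, 1, 2]))

assert np.allclose(out_reversed_weights, out_reversed_input)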

Command-Line Interface (CLI) Examples Using Framework-Agnostic Parameters

  • Launching the Model Optimizer for <model_name>.pb with debug log level. Use this to better understand what is happening internally when a model is converted:
    python3 mo_tf.py --input_model <model_name>.pb --log_level DEBUG
  • Launching the Model Optimizer for <model_name>.pb with the output Intermediate Representation named result.xml and result.bin placed in the specified directory ../../models/:
    python3 mo_tf.py --input_model <model_name>.pb --model_name result --output_dir ../../models/
  • Launching the Model Optimizer for <model_name>.pb and providing scale values for a single input:
    python3 mo_tf.py --input_model <model_name>.pb --scale_values [59,59,59]
  • Launching the Model Optimizer for <model_name>.pb with two inputs with two sets of scale values for each input. The number of sets of scale/mean values should be exactly the same as the number of inputs of the given model:
    python3 mo_tf.py --input_model <model_name>.pb --input data,rois --scale_values [59,59,59],[5,5,5]
  • Launching the Model Optimizer for <model_name>.pb with a specified input layer (data), changing the shape of the input layer to [1,3,224,224], and specifying the name of the output layer:
    python3 mo_tf.py --input_model <model_name>.pb --input data --input_shape [1,3,224,224] --output pool5
  • Launching the Model Optimizer for <model_name>.pb with disabled fusing for linear operations with convolution, which is set by the --disable_fusing flag, and grouped convolutions, which is set by the --disable_gfusing flag: 
    python3 mo_tf.py --input_model <model_name>.pb --disable_fusing --disable_gfusing
  • Launching the Model Optimizer for <model_name>.pb, with reversing channels order between RGB and BGR, specifying mean values for the input and the precision of the Intermediate Representation to be FP16:
    python3 mo_tf.py --input_model <model_name>.pb --reverse_input_channels --mean_values [255,255,255] --data_type FP16
  • Launching the Model Optimizer for <model_name>.pb with extensions from the specified directories /home/ and /some/other/path/. The command also shows how to pass a mean file to the Intermediate Representation. It must be in the .binaryproto format:
    python3 mo_tf.py --input_model <model_name>.pb --extensions /home/,/some/other/path/ --mean_file mean_file.binaryproto

Using TensorFlow*-Specific Conversion Parameters

The following list provides the TensorFlow*-specific parameters. 

TensorFlow*-specific parameters:
	--input_model_is_text
                        TensorFlow*: treat the input model file as a text
                        protobuf format. If not specified, the Model Optimizer
                        treats it as a binary file by default.
  --input_checkpoint INPUT_CHECKPOINT
                        TensorFlow*: variables file to load.
  --input_meta_graph INPUT_META_GRAPH
                        Tensorflow*: a file with a meta-graph of the model
                        before freezing
  --saved_model_dir SAVED_MODEL_DIR
                        TensorFlow*: directory representing non frozen model
  --saved_model_tags SAVED_MODEL_TAGS
                        Group of tag(s) of the MetaGraphDef to load, in string
                        format, separated by ','. For tag-set contains
                        multiple tags, all tags must be passed in.
  --offload_unsupported_operations_to_tf
                        TensorFlow*: automatically offload unsupported
                        operations to TensorFlow*
  --tensorflow_subgraph_patterns TENSORFLOW_SUBGRAPH_PATTERNS
                        TensorFlow*: a list of comma separated patterns that
                        will be applied to TensorFlow* node names to infer a
                        part of the graph using TensorFlow*.
  --tensorflow_operation_patterns TENSORFLOW_OPERATION_PATTERNS
                        TensorFlow*: a list of comma separated patterns that
                        will be applied to TensorFlow* node type (ops) to
                        infer these operations using TensorFlow*.
  --tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE
                        TensorFlow*: update the configuration file with node
                        name patterns with input/output nodes information.
  --tensorflow_use_custom_operations_config TENSORFLOW_USE_CUSTOM_OPERATIONS_CONFIG
                        TensorFlow*: use the configuration file with custom
                        operation description.
  --tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG
                        TensorFlow*: path to the pipeline configuration file
                        used to generate model created with help of Object
                        Detection API.
  --tensorboard_logdir TENSORBOARD_LOGDIR
                        TensorFlow*: dump the input graph to a given directory
                        that should be used with TensorBoard.
  --tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES
                        TensorFlow*: comma separated list of shared libraries
                        with TensorFlow* custom operations implementation.
  --disable_nhwc_to_nchw
                        Disables default translation from NHWC to NCHW

Note: Models produced with TensorFlow* usually do not have fully defined shapes (they contain -1 in some dimensions). It is necessary to pass an explicit shape for the input using the --input_shape command line parameter, or -b to override just the batch dimension. If the shape is fully defined, there is no need to specify either the -b or --input_shape option.

Command-Line Interface (CLI) Examples Using TensorFlow*-Specific Parameters

  • Launching the Model Optimizer for the Inception V1 frozen model when the model file is a plain text protobuf:
    python3 mo_tf.py --input_model inception_v1.pbtxt --input_model_is_text -b 1
  • Launching the Model Optimizer for the Inception V1 frozen model and automatically offloading unsupported operations to TensorFlow*. The Model Optimizer saves part of the model GraphDef into the generated XML. For more information about this feature, refer to Offloading Computations to TensorFlow*.
    python3 mo_tf.py --input_model inception_v1.pb --offload_unsupported_operations_to_tf -b 1
  • Launching the Model Optimizer for the Inception V1 frozen model and offloading two sub-graphs of the model defined by scope (node name regular expressions) to TensorFlow*: ".*InceptionV1/Conv2d_2b_1x1.*" and ".*InceptionV1/Conv2d_2c_3x3.*". The Model Optimizer saves part of the model GraphDef into the generated .xml. For more information about this feature, refer to Offloading Computations to TensorFlow*.
    python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_subgraph_patterns .*InceptionV1/Conv2d_2b_1x1.*,.*InceptionV1/Conv2d_2c_3x3.*
  • Launching the Model Optimizer for the Inception V1 frozen model and offloading operations whose type matches specific regular expressions: "Relu,Softm.*". In this case, all operations of type Relu and Softmax are matched. The Model Optimizer saves part of the model GraphDef into the generated XML. For more information about this feature, refer to Offloading Computations to TensorFlow*.
    python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_operation_patterns Relu,Soft.*
  • Launching the Model Optimizer for the Inception V1 frozen model and updating the custom sub-graph replacement file transform.json with information about the input and output nodes of the matched sub-graph. For more information about this feature, refer to Sub-Graph Replacement in the Model Optimizer.
    python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_custom_operations_config_update transform.json
  • Launching the Model Optimizer for the Inception V1 frozen model and using the custom sub-graph replacement file transform.json for model conversion. For more information about this feature, refer to Sub-Graph Replacement in the Model Optimizer.
    python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_use_custom_operations_config transform.json
  • Launching the Model Optimizer for the Inception V1 frozen model and dumping information about the graph to the TensorBoard log directory /tmp/log_dir:
    python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorboard_logdir /tmp/log_dir
  • Launching the Model Optimizer for a model with custom TensorFlow* operations (refer to the TensorFlow* documentation) implemented in C++ and compiled into the shared library my_custom_op.so. The Model Optimizer falls back to TensorFlow* to infer the output shapes of operations implemented in the library if a custom TensorFlow* operation library is provided. If it is not provided, a custom operation with an inference function is needed. For more information about custom operations, refer to Extending the Model Optimizer with New Primitives.
    python3 mo_tf.py --input_model custom_model.pb --tensorflow_custom_layer_libraries ./my_custom_op.so

Converting TensorFlow* Object Detection API Models

What's New in the 2018 R4 Release

  • With the 2018 R4 release, the Model Optimizer supports the --input_shape command line parameter for the TensorFlow* Object Detection API topologies. Refer to the Custom Input Shape section for more information.
  • To generate IRs for SSD topologies, the Model Optimizer creates a number of PriorBoxClustered layers instead of a constant node with prior boxes calculated for the particular input image size. This change allows you to reshape the topology in the Inference Engine using the dedicated Inference Engine API. The reshaping is supported for all SSD topologies except FPNs, which contain hardcoded shapes for some operations that prevent changing the topology input shape.

How to Convert a Model

With the 2018 R3 release, the Model Optimizer introduced a new approach to convert models created using the TensorFlow* Object Detection API. Compared with the previous approach, the new process produces inference results with higher accuracy and does not require modifying any configuration files or providing intricate command line parameters.

You can download TensorFlow* Object Detection API models from the Object Detection Model Zoo.

NOTE: Before converting, make sure you have configured the Model Optimizer. For configuration steps, refer to Configuring the Model Optimizer.

To convert a TensorFlow* Object Detection API model, go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory and run the mo_tf.py script with the following required parameters:

  • --input_model <path_to_frozen.pb> - File with a pre-trained model (binary or text .pb file after freezing)
  • --tensorflow_use_custom_operations_config <path_to_subgraph_replacement_configuration_file.json> - A subgraph replacement configuration file that describes rules to convert specific TensorFlow* topologies. For the models downloaded from the TensorFlow* Object Detection API zoo, you can find the configuration files in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf directory. Use:
    • ssd_v2_support.json - for frozen SSD topologies from the models zoo
    • faster_rcnn_support.json - for frozen Faster R-CNN topologies from the models zoo
    • faster_rcnn_support_api_v1.7.json - for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 or higher
    • mask_rcnn_support.json - for frozen Mask R-CNN topologies from the models zoo
    • mask_rcnn_support_api_v1.7.json - for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 or higher up to 1.10.1 inclusively
    • mask_rcnn_support_api_v1.11.json - for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.11.0 or higher
    • rfcn_support.json - for the frozen RFCN topology from the models zoo frozen with TensorFlow* version 1.9.0 or lower.
  • --tensorflow_object_detection_api_pipeline_config <path_to_pipeline.config> - A special configuration file that describes the topology hyper-parameters and structure of the TensorFlow Object Detection API model. For the models downloaded from the TensorFlow* Object Detection API zoo, the configuration file is named pipeline.config. If you plan to train a model yourself, you can find templates for these files in the models repository.
  • --input_shape (optional) - A custom input image shape. Refer to Custom Input Shape for more information on how the --input_shape parameter is handled for the TensorFlow* Object Detection API models.

NOTE: If you convert a TensorFlow* Object Detection API model to use with the Inference Engine sample applications, you must also specify the --reverse_input_channels parameter.

In addition to the mandatory parameters listed above, you can use optional conversion parameters if needed. A full list of parameters is available in the Converting a TensorFlow* Model topic.

For example, if you downloaded the pre-trained SSD InceptionV2 topology and extracted the archive to the directory /tmp/ssd_inception_v2_coco_2018_01_28, the sample command line to convert the model looks as follows:

<INSTALL_DIR>/deployment_tools/model_optimizer/mo_tf.py --input_model=/tmp/ssd_inception_v2_coco_2018_01_28/frozen_inference_graph.pb --tensorflow_use_custom_operations_config <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config /tmp/ssd_inception_v2_coco_2018_01_28/pipeline.config --reverse_input_channels

Custom Input Shape

The Model Optimizer handles the command-line parameter --input_shape for TensorFlow* Object Detection API models in a special way depending on the image resizer type defined in the pipeline.config file. The TensorFlow* Object Detection API generates a different Preprocessor sub-graph based on the image resizer type. The Model Optimizer supports two types of image resizer:

  • fixed_shape_resizer - Stretches the input image to the specified height and width. The pipeline.config snippet below shows a fixed_shape_resizer sample definition:
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
  • keep_aspect_ratio_resizer - Resizes the input image keeping the aspect ratio to satisfy the minimum and maximum size constraints. The pipeline.config snippet below shows a keep_aspect_ratio_resizer sample definition:
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }

Fixed Shape Resizer Replacement

  • If the --input_shape command line parameter is not specified, the Model Optimizer generates an input layer with the height and width as defined in the pipeline.config.
  • If the --input_shape [1, H, W, 3] command line parameter is specified, the Model Optimizer sets the input layer height to H and width to W and converts the model. However, the conversion may fail for the following reasons:
    • The model is not reshape-able, meaning that it is not possible to change the size of the model input image. For example, SSD FPN models have Reshape operations with hard-coded output shapes, but the input size to these Reshape instances depends on the input image size. In this case, the Model Optimizer shows an error during the shape inference phase. Run the Model Optimizer with --log_level DEBUG to see the inferred output shapes of the layers and find the mismatch.
    • The custom input shape is too small. For example, if you specify --input_shape [1,100,100,3] to convert an SSD Inception V2 model, one of the convolution or pooling nodes decreases the input tensor spatial dimensions to non-positive values. In this case, the Model Optimizer shows an error message like this: [ ERROR ] Shape [ 1 -1 -1 256] is not fully defined for output X of "node_name".

Keep Aspect Ratio Resizer Replacement

  • If the --input_shape command line parameter is not specified, the Model Optimizer generates an input layer with both height and width equal to the value of parameter min_dimension in the keep_aspect_ratio_resizer.
  • If the --input_shape [1, H, W, 3] command line parameter is specified, the Model Optimizer scales the specified input image height H and width W to satisfy the min_dimension and max_dimension constraints defined in the keep_aspect_ratio_resizer. The following function calculates the input layer height and width:
def calculate_shape_keeping_aspect_ratio(H: int, W: int, min_dimension: int, max_dimension: int):
    ratio_min = min_dimension / min(H, W)
    ratio_max = max_dimension / max(H, W)
    ratio = min(ratio_min, ratio_max)
    return int(round(H * ratio)), int(round(W * ratio))
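For example (with hypothetical input dimensions), for --input_shape [1,400,600,3] and the keep_aspect_ratio_resizer snippet above (min_dimension: 600, max_dimension: 1024), the calculation works out as follows:

# ratio_min = 600 / min(400, 600) = 1.5
# ratio_max = 1024 / max(400, 600) ≈ 1.71
# ratio = min(1.5, 1.71) = 1.5, so the input layer becomes 600 x 900
print(calculate_shape_keeping_aspect_ratio(400, 600, 600, 1024))  # (600, 900)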

Models with keep_aspect_ratio_resizer were trained to recognize objects at their real aspect ratio, in contrast with most classification topologies, which are trained to recognize objects stretched vertically and horizontally as well. By default, the Model Optimizer converts topologies with keep_aspect_ratio_resizer to consume a square input image. If a non-square image is provided as input, it is stretched without keeping the aspect ratio, which decreases the object detection quality.

NOTE: It is highly recommended to specify the --input_shape command line parameter for the models with keep_aspect_ratio_resizer if the input image dimensions are known in advance.

Important Notes About Feeding Input Images to the Samples

Inference Engine comes with a number of samples that can be used with Object Detection API models. There are a number of important notes about feeding input images to the samples:

  1. Inference Engine samples stretch the input image to the size of the input layer without preserving the aspect ratio. This behavior is usually correct for most topologies (including SSDs), but incorrect for the following Faster R-CNN topologies: Inception ResNet, Inception V2, ResNet50 and ResNet101. Image pre-processing for these topologies keeps the aspect ratio. All Mask R-CNN and R-FCN topologies also require keeping the aspect ratio. The type of pre-processing is defined in the pipeline configuration file in the image_resizer section. If keeping the aspect ratio is required, resize the image before passing it to the sample (see the sketch after this list).
  2. The TensorFlow* implementation of image resizing may differ from the one implemented in the sample. Even reading the input image from a compressed format (like .jpg) can give different results in the sample and in TensorFlow*. So, if it is necessary to compare accuracy between TensorFlow* and the Inference Engine, it is recommended to pass a pre-scaled input image in a non-compressed format (like .bmp).
  3. If you want to infer the model with the Inference Engine samples, convert the model specifying the --reverse_input_channels command line parameter. The samples load images in the BGR channels order, while TensorFlow* models were trained with images in the RGB order. When the --reverse_input_channels command line parameter is specified, the Model Optimizer modifies the weights of the first convolution (or another channel-dependent operation) so that the output is the same as if the image were passed in the RGB channels order.
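For topologies that require keeping the aspect ratio, a minimal pre-processing sketch is shown below. It assumes OpenCV is available and reuses the same scaling rule as calculate_shape_keeping_aspect_ratio from the Custom Input Shape section; the file names and the min_dimension/max_dimension values are examples taken from the keep_aspect_ratio_resizer snippet above.

import cv2

def resize_keeping_aspect_ratio(image, min_dimension=600, max_dimension=1024):
    # Same scaling rule as the keep_aspect_ratio_resizer: satisfy both size constraints
    h, w = image.shape[:2]
    ratio = min(min_dimension / min(h, w), max_dimension / max(h, w))
    return cv2.resize(image, (int(round(w * ratio)), int(round(h * ratio))))

image = cv2.imread('input.bmp')               # non-compressed format, as recommended above
resized = resize_keeping_aspect_ratio(image)
cv2.imwrite('input_resized.bmp', resized)     # pass this image to the sample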

Detailed Explanations of Model Conversion Process

This section is intended for users who want to understand in detail how the Model Optimizer converts Object Detection API models. The knowledge given in this section is also useful if you have complex models that are not converted with the Model Optimizer out of the box. It is highly recommended to read the Sub-Graph Replacement in the Model Optimizer chapter first to understand the sub-graph replacement concepts used here.

Implementation of the sub-graph replacers for Object Detection API models is located in the file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ObjectDetectionAPI.py.

It is also important to open the model in TensorBoard to see the topology structure. The Model Optimizer can create an event file that can then be fed to the TensorBoard* tool. To do that, run the Model Optimizer with the following two command line parameters:

  • --input_model <path_to_frozen.pb> - Path to the frozen model
  • --tensorboard_logdir - Path to the directory where TensorBoard looks for the event files

SSD (Single Shot Multibox Detector) Topologies

The SSD topologies are the simplest ones among Object Detection API topologies, so they will be analyzed first. The sub-graph replacement configuration file ssd_v2_support.json, which should be used to convert these models, contains three sub-graph replacements: ObjectDetectionAPIPreprocessorReplacement, ObjectDetectionAPISSDPostprocessorReplacement and ObjectDetectionAPIOutputReplacement. Their implementation is described below.

Preprocessor Block

All Object Detection API topologies contain a Preprocessor block of nodes (also called a "scope") that performs two tasks:

  • Scales image to the size required by the topology.
  • Applies mean and scale values if needed.

The Model Optimizer cannot convert the part of the Preprocessor block that performs scaling because the TensorFlow implementation uses while-loops, which the Inference Engine does not support. Another reason is that the Inference Engine samples scale input images to the size of the input layer from the Intermediate Representation (IR) automatically. Given that, it is necessary to cut off the scaling part of the Preprocessor block and leave only the operations applying mean and scale values. This task is solved using the Model Optimizer sub-graph replacer mechanism.

The Preprocessor block has two outputs: the tensor with pre-processed image(s) data and a tensor with pre-processed image(s) size(s). While converting the model, Model Optimizer keeps only the nodes producing the first tensor. The second tensor is a constant which can be obtained from the pipeline.config file to be used in other replacers.

The implementation of the Preprocessor block sub-graph replacer is the following (file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ObjectDetectionAPI.py):

class ObjectDetectionAPIPreprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    """
    The class replaces the "Preprocessor" block resizing input image and applying mean/scale values. Only nodes related
    to applying mean/scaling values are kept.
    """
    replacement_id = 'ObjectDetectionAPIPreprocessorReplacement'

    def run_before(self):
        return [Pack, Sub]

    def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        new_nodes_to_remove = match.matched_nodes_names()
        # do not remove nodes that perform input image scaling and mean value subtraction
        for node_to_keep in ('Preprocessor/sub', 'Preprocessor/sub/y', 'Preprocessor/mul', 'Preprocessor/mul/x'):
            if node_to_keep in new_nodes_to_remove:
                new_nodes_to_remove.remove(node_to_keep)
        return new_nodes_to_remove

    def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        argv = graph.graph['cmd_params']
        layout = graph.graph['layout']
        if argv.tensorflow_object_detection_api_pipeline_config is None:
            raise Error(missing_param_error)
        pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config)

        sub_node = match.output_node(0)[0]
        if not sub_node.has('op') or sub_node.op != 'Sub':
            raise Error('The output op of the Preprocessor sub-graph is not of type "Sub". Looks like the topology is '
                        'not created with TensorFlow Object Detection API.')

        mul_node = None
        if sub_node.in_node(0).has('op') and sub_node.in_node(0).op == 'Mul':
            log.info('There is image scaling node in the Preprocessor block.')
            mul_node = sub_node.in_node(0)

        initial_input_node_name = 'image_tensor'
        if initial_input_node_name not in graph.nodes():
            raise Error('Input node "{}" of the graph is not found. Do not run the Model Optimizer with '
                        '"--input" command line parameter.'.format(initial_input_node_name))
        placeholder_node = Node(graph, initial_input_node_name)

        # set default value of the batch size to 1 if user didn't specify batch size and input shape
        batch_dim = get_batch_dim(layout, 4)
        if argv.batch is None and placeholder_node.shape[batch_dim] == -1:
            placeholder_node.shape[batch_dim] = 1
        if placeholder_node.shape[batch_dim] > 1:
            print("[ WARNING ] The batch size more than 1 is supported for SSD topologies only.")
        height, width = calculate_placeholder_spatial_shape(graph, match, pipeline_config)
        placeholder_node.shape[get_height_dim(layout, 4)] = height
        placeholder_node.shape[get_width_dim(layout, 4)] = width

        # save the pre-processed image spatial sizes to be used in the other replacers
        graph.graph['preprocessed_image_height'] = placeholder_node.shape[get_height_dim(layout, 4)]
        graph.graph['preprocessed_image_width'] = placeholder_node.shape[get_width_dim(layout, 4)]

        to_float_node = placeholder_node.out_node(0)
        if not to_float_node.has('op') or to_float_node.op != 'Cast':
            raise Error('The output of the node "{}" is not Cast operation. Cannot apply replacer.'.format(
                initial_input_node_name))

        # connect to_float_node directly with node performing scale on mean value subtraction
        if mul_node is None:
            create_edge(to_float_node, sub_node, 0, 0)
        else:
            create_edge(to_float_node, mul_node, 0, 1)

        print('The Preprocessor block has been removed. Only nodes performing mean value subtraction and scaling (if'
              ' applicable) are kept.')
        return {}

The run_before function defines a list of replacers that the current replacer should run before. In this case, they are Pack and Sub. The Sub operation is not supported by the Inference Engine plugins, so the Model Optimizer replaces it with a combination of the Eltwise layer (element-wise sum) and the ScaleShift layer. But the Preprocessor replacer expects to see a Sub node, so it should be called before the Sub is replaced.

The nodes_to_remove function returns the list of nodes that should be removed after the replacement happens. In this case, it removes all nodes matched in the Preprocessor scope except the Sub and Mul nodes that perform mean value subtraction and scaling.

The generate_sub_graph function performs the following actions:

  • Lines 20-24: Reads the pipeline.config configuration file to get the model hyper-parameters and other attributes.
  • Lines 25-29: Checks that the output node of the Preprocessor scope is of type Sub.
  • Lines 31-34: Checks that the input of the Sub node is of type Mul. This information is needed to correctly connect the input node of the topology later.
  • Lines 36-50: Finds the topology input (placeholder) node and sets its width and height according to the image resizer defined in the pipeline.config file and the --input_shape provided by the user. The batch size is set to 1 by default, but it is overridden if you specify a batch size using the command-line option -b. Refer to Custom Input Shape for how the Model Optimizer calculates the input layer height and width.
  • Lines 52-54: Saves the placeholder shape in the graph object for other sub-graph replacements.
  • Lines 56-59: Checks that the placeholder node is followed by the 'Cast' node, which converts model input data from UINT8 to FP32.
  • Lines 61-65: Creates edge from the placeholder node to the Mul (if present) or Sub node to a correct input port (0 for Sub and 1 for Mul).
  • Line 69: The replacer returns a dictionary with the nodes mapping that is used by other sub-graph replacement functions. In this case, it is not needed, so an empty dictionary is returned.

Postprocessor Block

A distinct feature of any SSD topology is the part performing non-maximum suppression of proposed bounding boxes. This part of the topology is implemented with dozens of primitive operations in TensorFlow, while in the Inference Engine it is a single layer called DetectionOutput. Thus, to convert an SSD model from TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implement the DetectionOutput layer with a single DetectionOutput node.

The Inference Engine DetectionOutput layer implementation consumes three tensors in the following order:

  1. Tensor with locations of bounding boxes
  2. Tensor with confidences for each bounding box
  3. Tensor with prior boxes ("anchors" in TensorFlow terminology)

The Inference Engine DetectionOutput layer implementation produces one tensor with seven numbers for each actual detection:

  • batch index
  • class label
  • class probability
  • x_1 box coordinate
  • y_1 box coordinate
  • x_2 box coordinate
  • y_2 box coordinate.

There are more output tensors in the TensorFlow Object Detection API: detection_boxes, detection_classes, detection_scores, and num_detections, but the values in them are consistent with the output values of the Inference Engine DetectionOutput layer.
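As an illustration (not part of the Model Optimizer), the sketch below shows how an application could parse the DetectionOutput tensor described above, assuming the output has been fetched into a NumPy array of shape [1, 1, N, 7]; the detection values are made up for the example.

import numpy as np

# Hypothetical DetectionOutput result: [1, 1, N, 7], seven numbers per detection
detections = np.array([[[[0, 15, 0.92, 0.10, 0.20, 0.45, 0.80],
                         [0,  1, 0.30, 0.55, 0.05, 0.95, 0.60]]]])

for batch_id, label, confidence, x_min, y_min, x_max, y_max in detections[0][0]:
    if confidence > 0.5:   # keep only confident detections
        print('batch {}: class {} ({:.0%}) box [{:.2f}, {:.2f}] - [{:.2f}, {:.2f}]'.format(
            int(batch_id), int(label), confidence, x_min, y_min, x_max, y_max))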

The sub-graph replacement by points is used in the ssd_v2_support.json file to match the Postprocessor block. The start points are defined as follows:

  • Postprocessor/Shape receives the tensor with bounding boxes
  • Postprocessor/scale_logits receives the tensor with confidences (probabilities) for each box
  • Postprocessor/Tile receives the tensor with prior boxes (anchors)
  • Postprocessor/Reshape_1 is specified only to match the whole Postprocessor scope. It is not used in the replacement code.
  • Postprocessor/ToFloat is specified only to match the whole Postprocessor scope. It is not used in the replacement code.

There are a number of differences in the layout, format, and content between the input tensors expected by the DetectionOutput layer and the tensors that TensorFlow generates, so additional tensor processing is required before creating the DetectionOutput layer. It is described below. The sub-graph replacement class for the DetectionOutput layer is given below:

class ObjectDetectionAPISSDPostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'ObjectDetectionAPISSDPostprocessorReplacement'

    def run_after(self):
        return [ObjectDetectionAPIPreprocessorReplacement]

    def run_before(self):
        # the replacer uses node of type "RealDiv" as one of the start points, but Model Optimizer replaces nodes of
        # type "RealDiv" with a new ones, so it is necessary to replace the sub-graph before replacing the "RealDiv"
        # nodes
        return [Div, StandaloneConstEraser]

    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
        # the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so create only one output
        # edge match
        return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}

    def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        argv = graph.graph['cmd_params']
        if argv.tensorflow_object_detection_api_pipeline_config is None:
            raise Error(missing_param_error)
        pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config)
        num_classes = _value_or_raise(match, pipeline_config, 'num_classes')

        # reshapes confidences to 4D before applying activation function
        expand_dims_op = Reshape(graph, {'dim': np.array([0, 1, -1, num_classes + 1])})
        # do not convert from NHWC to NCHW this node shape
        expand_dims_node = expand_dims_op.create_node([match.input_nodes(1)[0][0].in_node(0)],
                                                      dict(name='do_ExpandDims_conf'))

        activation_function = _value_or_raise(match, pipeline_config, 'postprocessing_score_converter')
        activation_conf_node = add_activation_function_after_node(graph, expand_dims_node, activation_function)
        PermuteAttrs.set_permutation(expand_dims_node, expand_dims_node.out_node(), None)

        # IE DetectionOutput layer consumes flattened tensors
        # reshape operation to flatten locations tensor
        reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
        reshape_loc_node = reshape_loc_op.create_node([match.input_nodes(0)[0][0].in_node(0)],
                                                      dict(name='do_reshape_loc'))

        # IE DetectionOutput layer consumes flattened tensors
        # reshape operation to flatten confidence tensor
        reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
        reshape_conf_node = reshape_conf_op.create_node([activation_conf_node], dict(name='do_reshape_conf'))

        if pipeline_config.get_param('ssd_anchor_generator_num_layers') is not None or \
            pipeline_config.get_param('multiscale_anchor_generator_min_level') is not None:
            # change the Reshape operations with hardcoded number of output elements of the convolution nodes to be
            # reshapable
            _relax_reshape_nodes(graph, pipeline_config)

            # create PriorBoxClustered nodes instead of a constant value with prior boxes so the model could be reshaped
            if pipeline_config.get_param('ssd_anchor_generator_num_layers') is not None:
                priors_node = _create_prior_boxes_node(graph, pipeline_config)
            elif pipeline_config.get_param('multiscale_anchor_generator_min_level') is not None:
                priors_node = _create_multiscale_prior_boxes_node(graph, pipeline_config)
        else:
            log.info('The anchor generator is not known. Save constant with prior-boxes to IR.')
            priors_node = match.input_nodes(2)[0][0].in_node(0)

        # creates DetectionOutput Node object from Op class
        detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
        detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
        detection_output_op.attrs['infer'] = __class__.do_infer
        detection_output_node = detection_output_op.create_node(
            [reshape_loc_node, reshape_conf_node, priors_node],
            dict(name=detection_output_op.attrs['type'],
                 clip=1,
                 confidence_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_score_threshold'),
                 top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_detections_per_class'),
                 keep_top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_total_detections'),
                 nms_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_iou_threshold')))

        return {'detection_output_node': detection_output_node}

The run_before and run_after functions define lists of replacers that this replacer should run before and after, respectively.

The input_edges_match and output_edges_match functions generate dictionaries describing how the input/output nodes matched with the replacer should be connected with new nodes generated in the generate_sub_graph function. Refer to Sub-Graph Replacement in the Model Optimizer for more information.

The generate_sub_graph function performs the following actions:

  • Lines 19-23: Reads the pipeline.config configuration file to get the model hyper-parameters and other attributes.
  • Lines 25-32: Makes the tensor with confidences 4D and applies the correct activation function (read from the pipeline.config file) to it.
  • Line 33: Disables permutation of expand_dims_node's attributes because they are already in the NCHW layout.
  • Lines 35-39: Makes the tensor with bounding boxes 2D, where the first dimension corresponds to the batch size.
  • Lines 41-44: Makes the tensor with confidences 2D, where the first dimension corresponds to the batch size.
  • Lines 46-59: Creates several PriorBoxClustered layers which generate prior boxes depending on the type of the grid anchor generator defined in the pipeline.config file. If the grid anchor type is not known, priors_node is initialized with the node matched by the sub-graph replacement. In the latter case it is a constant node with prior boxes calculated for a particular input image shape.
  • Lines 61-72: Creates the DetectionOutput layer with a number of layer attributes from the pipeline.config file. The inference function (the infer attribute) is also replaced with the custom __class__.do_infer function; this change is described below.
  • Line 74: Returns a dictionary with the mapping of nodes that is used in the input_edges_match and output_edges_match functions.

The paragraphs below explain why the inference function for the DetectionOutput layer is modified. Before that, it is necessary to review selected high-level steps of the Model Optimizer model conversion pipeline. Note that only the steps required to understand the change are mentioned:

  1. Model Optimizer creates a calculation graph from the initial topology where each node corresponds to an operation from the initial model.
  2. Model Optimizer performs "Front replacers" (including the one being described now).
  3. Model Optimizer adds data nodes between operation nodes to the graph.
  4. Model Optimizer performs "Middle replacers".
  5. Model Optimizer performs "shape inference" phase. During this phase the shape of all data nodes is being calculated. Model Optimizer also calculates value for data tensors which are constant, i.e. do not depend on input. For example, tensor with prior boxes (generated with MultipleGridAnchorGenerator or similar scopes) does not depend on input and is evaluated by Model Optimizer during shape inference. Model Optimizer uses inference function stored in the 'infer' attribute of operation nodes.
  6. Model Optimizer performs "Back replacers".
  7. Model Optimizer generates IR.

The do_infer function is needed to perform adjustments to the tensor with prior boxes (anchors), which is known only after the shape inference phase, and to perform the additional transformations described below. This change is performed only if the tensor with prior boxes is not constant (that is, it is produced by PriorBoxClustered layers during inference). It is possible to implement the Postprocessor block replacement as a Middle replacer (so the prior boxes tensor would already be evaluated by the time the replacer is called), but in this case it would be necessary to correctly handle the data nodes which are created between each pair of initially adjacent operation nodes. In order to inject the required modification into the inference function of the DetectionOutput node, a new function is created that performs the modifications and then calls the initial inference function. The code of the new inference function is the following:

    @staticmethod
    def do_infer(node: Node):
        prior_boxes = node.in_node(2).value
        if prior_boxes is not None:
            # these are default variances values
            variance = np.array([[0.1, 0.1, 0.2, 0.2]])
            # replicating the variance values for all prior-boxes
            variances = np.tile(variance, [prior_boxes.shape[-2], 1])
            # DetectionOutput in the Inference Engine expects the prior-boxes in the following layout: (values, variances)
            prior_boxes = prior_boxes.reshape([-1, 4])
            prior_boxes = np.concatenate((prior_boxes, variances), 0)
            # compared to the IE's DetectionOutput, the TF keeps the prior-boxes in YXYX, need to get back to the XYXY
            prior_boxes = np.concatenate((prior_boxes[:, 1:2], prior_boxes[:, 0:1],
                                          prior_boxes[:, 3:4], prior_boxes[:, 2:3]), 1)
            # adding another dimension, as the prior-boxes are expected to be a 3D tensor
            prior_boxes = prior_boxes.reshape((1, 2, -1))
            node.in_node(2).shape = np.array(prior_boxes.shape, dtype=np.int64)
            node.in_node(2).value = prior_boxes

        node.old_infer(node)
        # compared to the IE's DetectionOutput, the TF keeps the locations in YXYX, need to get back to the XYXY
        # for the last convolutions that produce the locations, swap the X and Y in the output feature weights & biases
        conv_nodes = backward_bfs_for_operation(node.in_node(0), ['Conv2D'])
        swap_weights_xy(conv_nodes)
        squeeze_reshape_and_concat(conv_nodes)

        for node_name in node.graph.nodes():
            node = Node(node.graph, node_name)
            if node.has_and_set('swap_xy_count') and len(node.out_nodes()) != node['swap_xy_count']:
                raise Error('The weights were swapped for node "{}", but this weight was used in other nodes.'.format(
                    node.name))

  • Lines 3-18: Updates the value of the tensor with prior boxes by appending variance values if the prior boxes are pre-calculated. The Inference Engine implementation of the DetectionOutput layer expects these values to be located within the tensor with prior boxes, but in TensorFlow they are applied in a different way.
  • Line 20: Executes initial inference function to calculate the output shape of this node.
  • Lines 23-24: Finds predecessor node of type "Conv2D" of the node with bounding boxes (which is node.in_node(0)) and modifies convolution weights so "X" and "Y" coordinates are swapped. In TensorFlow bounding boxes are stored in the tensors in "YXYX" order, while in the Inference Engine it is "XYXY".
  • Line 25: Executes a function that looks for Reshape operations with 4D output after the Conv2D nodes found above and removes the dimension with index 2, which should be equal to 1. This is a workaround to make the tensor 3D so that its shape is not transposed during the IR generation. The problem arises when bounding box predictions are reshaped from [1, 1, 1, X] to [1, X / 4, 1, 4]. The resulting tensor should not be transposed, because after the transpose it would have the shape [1, 4, X / 4, 1] and the concatenation over the dimension with index 2 would produce an incorrect tensor. The function also looks for Concat operations and changes the concatenation dimension from 2 to 1 (see the sketch below).
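
The following toy NumPy sketch is not part of the Model Optimizer code; it only illustrates the shape problem described in the last bullet and the 3D workaround, using a made-up X = 8:

import numpy as np

X = 8                                      # toy value: 2 bounding boxes with 4 coordinates each
locs = np.arange(X, dtype=np.float32).reshape(1, 1, 1, X)
boxes_4d = locs.reshape(1, X // 4, 1, 4)   # [1, boxes, 1, 4] -- the layout produced by the reshape
# if this 4D tensor were transposed to NCHW during IR generation, it would become [1, 4, boxes, 1],
# and concatenating such tensors over the dimension with index 2 would interleave coordinates of
# different boxes instead of stacking boxes
transposed = boxes_4d.transpose(0, 3, 1, 2)
print(boxes_4d.shape, transposed.shape)    # (1, 2, 1, 4) (1, 4, 2, 1)
# the workaround removes the dimension with index 2, producing a 3D tensor that is not transposed,
# and the concatenation is performed over the dimension with index 1 instead
boxes_3d = boxes_4d.reshape(1, X // 4, 4)
print(boxes_3d.shape)                      # (1, 2, 4)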

Faster R-CNN Topologies

The Faster R-CNN models contain several building blocks similar to building blocks from SSD models so it is highly recommended to read the section about converting them first. Detailed information about Faster R-CNN topologies is provided in the abstract.

Preprocessor Block

Faster R-CNN topologies contain a Preprocessor block similar to the one in SSD topologies. The same ObjectDetectionAPIPreprocessorReplacement sub-graph replacer is used to cut it off.

Proposal Layer

The Proposal layer is implemented with dozens of primitive operations in TensorFlow, while it is a single layer in the Inference Engine. The ObjectDetectionAPIProposalReplacement sub-graph replacer identifies the nodes corresponding to the layer and replaces them with the required new nodes.

class ObjectDetectionAPIProposalReplacement(FrontReplacementFromConfigFileSubGraph):
    """
    This class replaces sub-graph of operations with Proposal layer and additional layers transforming
    tensors from layout of TensorFlow to layout required by Inference Engine.
    Refer to comments inside the function for more information about performed actions.
    """
    replacement_id = 'ObjectDetectionAPIProposalReplacement'

    def run_after(self):
        return [ObjectDetectionAPIPreprocessorReplacement]

    def run_before(self):
        return [Sub, CropAndResizeReplacement]

    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
        return {match.output_node(0)[0].id: new_sub_graph['proposal_node'].id}

    def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        new_list = match.matched_nodes_names().copy()
        # do not remove nodes that produce box predictions and class predictions
        new_list.remove(match.single_input_node(0)[0].id)
        new_list.remove(match.single_input_node(1)[0].id)
        return new_list

    def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        argv = graph.graph['cmd_params']
        if argv.tensorflow_object_detection_api_pipeline_config is None:
            raise Error(missing_param_error)
        pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config)

        input_height = graph.graph['preprocessed_image_height']
        input_width = graph.graph['preprocessed_image_width']
        max_proposals = _value_or_raise(match, pipeline_config, 'first_stage_max_proposals')
        proposal_ratios = _value_or_raise(match, pipeline_config, 'anchor_generator_aspect_ratios')
        proposal_scales = _value_or_raise(match, pipeline_config, 'anchor_generator_scales')
        anchors_count = len(proposal_ratios) * len(proposal_scales)

        # Convolution/matmul node that produces classes predictions
        # Permute result of the tensor with classes predictions so it will be in a correct layout for Softmax
        predictions_node = backward_bfs_for_operation(match.single_input_node(1)[0], ['Add'])[0]
        permute_predictions_op = Permute(graph, dict(order=np.array([0, 2, 3, 1])))
        permute_predictions_node = permute_predictions_op.create_node([], dict(name=predictions_node.name + '/Permute'))
        insert_node_after(predictions_node, permute_predictions_node, 0)

        # creates constant input with the image height, width and scale H and scale W (if present) required for Proposal
        const_op = Const(graph, dict(value=np.array([[input_height, input_width, 1]], dtype=np.float32)))
        const_node = const_op.create_node([], dict(name='proposal_const_image_size'))

        reshape_classes_op = Reshape(graph, dict(dim=np.array([0, -1, 2])))
        reshape_classes_node = reshape_classes_op.create_node([permute_predictions_node],
                                                              dict(name='reshape_FirstStageBoxPredictor_class'))

        softmax_conf_op = Softmax(graph, dict(axis=2))
        softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node],
                                                        dict(name='FirstStageBoxPredictor_softMax_class'))
        PermuteAttrs.set_permutation(reshape_classes_node, softmax_conf_node, None)

        reshape_softmax_op = Reshape(graph, dict(dim=np.array([1, anchors_count, 2, -1])))
        reshape_softmax_node = reshape_softmax_op.create_node([softmax_conf_node], dict(name='reshape_softmax_class'))
        PermuteAttrs.set_permutation(softmax_conf_node, reshape_softmax_node, None)

        permute_reshape_softmax_op = Permute(graph, dict(order=np.array([0, 1, 3, 2])))
        permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([reshape_softmax_node], dict(
            name=reshape_softmax_node.name + '/Permute'))

        # implement custom reshape infer function because we need to know the input convolution node output dimension
        # sizes but we can know it only after partial infer
        reshape_permute_op = Reshape(graph,
                                     dict(dim=np.ones([4]), anchors_count=anchors_count, conv_node=predictions_node))
        reshape_permute_op.attrs['old_infer'] = reshape_permute_op.attrs['infer']
        reshape_permute_op.attrs['infer'] = __class__.classes_probabilities_reshape_shape_infer
        reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node],
                                                              dict(name='Reshape_Permute_Class'))

        proposal_op = ProposalOp(graph, dict(min_size=1,
                                             framework='tensorflow',
                                             pre_nms_topn=2 ** 31 - 1,
                                             box_size_scale=5,
                                             box_coordinate_scale=10,
                                             post_nms_topn=max_proposals,
                                             feat_stride=_value_or_raise(match, pipeline_config,
                                                                         'features_extractor_stride'),
                                             ratio=proposal_ratios,
                                             scale=proposal_scales,
                                             base_size=_value_or_raise(match, pipeline_config,
                                                                       'anchor_generator_base_size'),
                                             nms_thresh=_value_or_raise(match, pipeline_config,
                                                                        'first_stage_nms_iou_threshold')))

        anchors_node = backward_bfs_for_operation(match.single_input_node(0)[0], ['Add'])[0]
        proposal_node = proposal_op.create_node([reshape_permute_node, anchors_node, const_node],
                                                dict(name='proposals'))

        # the TF implementation of ROIPooling with bi-linear filtration needs proposals scaled by the image size
        proposal_scale_const = np.array([1.0, 1 / input_height, 1 / input_width, 1 / input_height, 1 / input_width],
                                        dtype=np.float32)
        proposal_scale_const_op = Const(graph, dict(value=proposal_scale_const))
        proposal_scale_const_node = proposal_scale_const_op.create_node([], dict(name='Proposal_scale_const'))

        scale_proposals_op = Eltwise(graph, dict(operation='mul'))
        scale_proposals_node = scale_proposals_op.create_node([proposal_node, proposal_scale_const_node],
                                                              dict(name='scaled_proposals'))

        proposal_reshape_4d_op = Reshape(graph, dict(dim=np.array([1, 1, max_proposals, 5]), nchw_layout=True))
        proposal_reshape_4d_node = proposal_reshape_4d_op.create_node([scale_proposals_node],
                                                                      dict(name="reshape_proposals_4d"))

        # creates the Crop operation that gets input from the Proposal layer and gets tensor with bounding boxes only
        crop_op = Crop(graph, dict(axis=np.array([3]), offset=np.array([1]), dim=np.array([4]), nchw_layout=True))
        crop_node = crop_op.create_node([proposal_reshape_4d_node], dict(name='crop_proposals'))

        proposal_reshape_3d_op = Reshape(graph, dict(dim=np.array([0, -1, 4]), nchw_layout=True))
        proposal_reshape_3d_node = proposal_reshape_3d_op.create_node([crop_node], dict(name="tf_proposals"))

        return {'proposal_node': proposal_reshape_3d_node}

    @staticmethod
    def classes_probabilities_reshape_shape_infer(node: Node):
        # now we can determine the reshape dimensions from Convolution node
        conv_node = node.conv_node
        conv_output_shape = conv_node.out_node().shape

        # update desired shape of the Reshape node
        node.dim = np.array([0, conv_output_shape[1], conv_output_shape[2], node.anchors_count * 2])
        node.old_infer(node)

The most interesting part of this replacer's implementation is the generate_sub_graph function.

Lines 26-36: Parses the pipeline.config file and gets required parameters for the Proposal layer.

Lines 38-73: Performs the following manipulations with the tensor with class predictions:

  1. TensorFlow uses the NHWC layout, while the Inference Engine uses NCHW. By default, the Model Optimizer transforms all nodes data in the inference graph to the NCHW layout. The size of the 'C' dimension of the tensor with class predictions is equal to base_anchors_count * 2, where 2 corresponds to the number of classes (background and foreground) and base_anchors_count is equal to the number of anchors applied to each position of the 'H' and 'W' dimensions. Therefore, there are H * W * base_anchors_count bounding boxes.

    Lines 54-56 apply the Softmax layer to this tensor to get class probabilities for each bounding box.

  2. The classes must be in the fastest growing dimension to apply the Softmax activation. Lines 41-43 permute the tensor to the NHWC layout first (because the Model Optimizer automatically permuted it to NCHW before) and then reshape it to [N, total_bounding_boxes, 2] (see the toy sketch after this list).
  3. After applying the Softmax activation, lines 52-64 perform the reverse actions to restore the tensor to its initial dimensions.
  4. The inference function injection (as with the DetectionOutput layer for SSD conversion) is used for the last reshape (lines 71-72), as the values of the 'H' and 'W' dimensions are unknown during the replacement (because this is a Front replacer that is performed before the shape inference).
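
A toy NumPy sketch of the first manipulations (the sizes below are made up; the real feature map dimensions are model specific, and this is only an illustration of the layout changes, not the Model Optimizer code):

import numpy as np

N, anchors_count, H, W = 1, 3, 2, 2                       # toy sizes
preds_nchw = np.random.rand(N, anchors_count * 2, H, W)   # class predictions in the NCHW layout

# permute back to NHWC so the (background, foreground) pairs become the fastest growing dimension
preds_nhwc = preds_nchw.transpose(0, 2, 3, 1)
# flatten to [N, total_bounding_boxes, 2]
pairs = preds_nhwc.reshape(N, -1, 2)
# softmax over the last dimension produces per-box class probabilities
exps = np.exp(pairs - pairs.max(axis=2, keepdims=True))
probs = exps / exps.sum(axis=2, keepdims=True)
print(probs.shape)                                        # (1, 12, 2) == (N, H * W * anchors_count, 2)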

Lines 75-92: Adds the Proposal layer to the graph. This layer has one input containing input image size (lines 46-47). The image sizes are read from the pipeline.config file.

Lines 94-106: Scales bounding boxes to [0,1] interval as required by the ROIPooling layer with a bi-linear filtration.

Lines 108-113: Crops the output from the Proposal node to remove the batch indices (the Inference Engine implementation of the Proposal layer generates tensor with shape [num_proposals, 5]). The final tensor contains just box coordinates as in the TensorFlow implementation.
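
A toy illustration (made-up values, not the Model Optimizer code) of what the Crop does with the Proposal output rows:

import numpy as np

# an illustrative Proposal output: each row is [batch_index, x1, y1, x2, y2]
proposals = np.array([[0, 0.10, 0.20, 0.60, 0.80],
                      [0, 0.15, 0.25, 0.55, 0.75]], dtype=np.float32)
# the Crop with offset=1 and dim=4 drops the batch index, leaving the box coordinates only
tf_like_proposals = proposals[:, 1:5]
print(tf_like_proposals.shape)   # (2, 4)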

Lines 118-125: The updated inference function for the Reshape layer, which restores the original shape of the tensor with class probabilities. The inference function is patched because the original shape of this tensor is known only during the shape inference phase.

SecondStagePostprocessor Block

The SecondStagePostprocessor block is similar to the Postprocessor block from SSD topologies, but there are a number of differences in its conversion.

class ObjectDetectionAPIDetectionOutputReplacement(FrontReplacementFromConfigFileSubGraph):
    """
    Replaces the sub-graph that is equal to the DetectionOutput layer from Inference Engine. This replacer is used for
    Faster R-CNN, R-FCN and Mask R-CNN topologies conversion.
    The replacer uses a value of the custom attribute 'coordinates_swap_method' from the sub-graph replacement
    configuration file to choose how to swap box coordinates of the 0-th input of the generated DetectionOutput layer.
    Refer to the code for more details.
    """
    replacement_id = 'ObjectDetectionAPIDetectionOutputReplacement'

    def run_before(self):
        return [ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement, Unpack]

    def run_after(self):
        return [ObjectDetectionAPIProposalReplacement, CropAndResizeReplacement]

    def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        new_nodes_to_remove = match.matched_nodes_names().copy()
        new_nodes_to_remove.extend(['detection_boxes', 'detection_scores', 'num_detections'])
        return new_nodes_to_remove

    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
        # the DetectionOutput in IE produces single tensor, but in TF it produces four tensors, so we need to create
        # only one output edge match
        return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}

    def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        argv = graph.graph['cmd_params']
        if argv.tensorflow_object_detection_api_pipeline_config is None:
            raise Error(missing_param_error)
        pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config)

        num_classes = _value_or_raise(match, pipeline_config, 'num_classes')
        first_stage_max_proposals = _value_or_raise(match, pipeline_config, 'first_stage_max_proposals')
        activation_function = _value_or_raise(match, pipeline_config, 'postprocessing_score_converter')

        activation_conf_node = add_activation_function_after_node(graph, match.single_input_node(1)[0].in_node(0),
                                                                  activation_function)

        # IE DetectionOutput layer consumes flattened tensors
        # reshape operation to flatten confidence tensor
        reshape_conf_op = Reshape(graph, dict(dim=np.array([1, -1])))
        reshape_conf_node = reshape_conf_op.create_node([activation_conf_node], dict(name='do_reshape_conf'))

        # TF produces locations tensor without boxes for background.
        # Inference Engine DetectionOutput layer requires background boxes so we generate them with some values
        # and concatenate with locations tensor
        fake_background_locs_blob = np.tile([[[1, 1, 2, 2]]], [first_stage_max_proposals, 1, 1])
        fake_background_locs_const_op = Const(graph, dict(value=fake_background_locs_blob))
        fake_background_locs_const_node = fake_background_locs_const_op.create_node([])

        reshape_loc_op = Reshape(graph, dict(dim=np.array([first_stage_max_proposals, num_classes, 4])))
        reshape_loc_node = reshape_loc_op.create_node([match.single_input_node(0)[0].in_node(0)],
                                                      dict(name='reshape_loc'))

        concat_loc_op = Concat(graph, dict(axis=1))
        concat_loc_node = concat_loc_op.create_node([fake_background_locs_const_node, reshape_loc_node],
                                                    dict(name='concat_fake_loc'))
        PermuteAttrs.set_permutation(reshape_loc_node, concat_loc_node, None)
        PermuteAttrs.set_permutation(fake_background_locs_const_node, concat_loc_node, None)

        # constant node with variances
        variances_const_op = Const(graph, dict(value=np.array([0.1, 0.1, 0.2, 0.2])))
        variances_const_node = variances_const_op.create_node([])

        # reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift
        reshape_loc_2d_op = Reshape(graph, dict(dim=np.array([-1, 4])))
        reshape_loc_2d_node = reshape_loc_2d_op.create_node([concat_loc_node], dict(name='reshape_locs_2'))
        PermuteAttrs.set_permutation(concat_loc_node, reshape_loc_2d_node, None)

        # element-wise multiply locations with variances
        eltwise_locs_op = Eltwise(graph, dict(operation='mul'))
        eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node],
                                                        dict(name='scale_locs'))

        # IE DetectionOutput layer consumes flattened tensors
        reshape_loc_do_op = Reshape(graph, dict(dim=np.array([1, -1])))

        custom_attributes = match.custom_replacement_desc.custom_attributes
        coordinates_swap_method = 'add_convolution'
        if 'coordinates_swap_method' not in custom_attributes:
            log.error('The ObjectDetectionAPIDetectionOutputReplacement sub-graph replacement configuration file '
                      'must contain "coordinates_swap_method" in the "custom_attributes" dictionary. Two values are '
                      'supported: "swap_weights" and "add_convolution". The first one should be used when there is '
                      'a MatMul or Conv2D node before the "SecondStagePostprocessor" block in the topology. With this '
                      'solution the weights of the MatMul or Conv2D nodes are permuted, simulating the swap of XY '
                      'coordinates in the tensor. The second could be used in any other cases but it is worse in terms '
                      'of performance because it adds the Conv2D node which performs permuting of data. Since the '
                      'attribute is not defined the second approach is used by default.')
        else:
            coordinates_swap_method = custom_attributes['coordinates_swap_method']
        supported_swap_methods = ['swap_weights', 'add_convolution']
        if coordinates_swap_method not in supported_swap_methods:
            raise Error('Unsupported "coordinates_swap_method" defined in the sub-graph replacement configuration '
                        'file. Supported methods are: {}'.format(', '.join(supported_swap_methods)))

        if coordinates_swap_method == 'add_convolution':
            swapped_locs_node = add_convolution_to_swap_xy_coordinates(graph, eltwise_locs_node, 4)
            reshape_loc_do_node = reshape_loc_do_op.create_node([swapped_locs_node], dict(name='do_reshape_locs'))
        else:
            reshape_loc_do_node = reshape_loc_do_op.create_node([eltwise_locs_node], dict(name='do_reshape_locs'))

        # find Proposal output which has the data layout as in TF: YXYX coordinates without batch indices.
        proposal_nodes_ids = [node_id for node_id, attrs in graph.nodes(data=True)
                              if 'name' in attrs and attrs['name'] == 'proposals']
        if len(proposal_nodes_ids) != 1:
            raise Error("Found the following nodes '{}' with name 'proposals' but there should be exactly 1. "
                        "Looks like ObjectDetectionAPIProposalReplacement replacement didn't work.".
                        format(proposal_nodes_ids))
        proposal_node = Node(graph, proposal_nodes_ids[0])

        swapped_proposals_node = add_convolution_to_swap_xy_coordinates(graph, proposal_node, 5)

        # reshape priors boxes as Detection Output expects
        reshape_priors_op = Reshape(graph, dict(dim=np.array([1, 1, -1])))
        reshape_priors_node = reshape_priors_op.create_node([swapped_proposals_node],
                                                            dict(name='DetectionOutput_reshape_priors_'))

        detection_output_op = DetectionOutput(graph, {})
        if coordinates_swap_method == 'swap_weights':
            # update infer function to re-pack weights
            detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
            detection_output_op.attrs['infer'] = __class__.do_infer
        detection_output_node = detection_output_op.create_node(
            [reshape_loc_do_node, reshape_conf_node, reshape_priors_node],
            dict(name=detection_output_op.attrs['type'], share_location=0, normalized=0, variance_encoded_in_target=1,
                 clip=1, code_type='caffe.PriorBoxParameter.CENTER_SIZE', pad_mode='caffe.ResizeParameter.CONSTANT',
                 resize_mode='caffe.ResizeParameter.WARP',
                 num_classes=num_classes,
                 input_height=graph.graph['preprocessed_image_height'],
                 input_width=graph.graph['preprocessed_image_width'],
                 confidence_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_score_threshold'),
                 top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_detections_per_class'),
                 keep_top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_total_detections'),
                 nms_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_iou_threshold')))
        PermuteAttrs.set_permutation(reshape_priors_node, detection_output_node, None)
        # sets specific name to the node so we can find it in other replacers
        detection_output_node.name = 'detection_output'

        output_op = Output(graph, dict(name='do_OutputOp'))
        output_op.create_node([detection_output_node])

        print('The graph output nodes "num_detections", "detection_boxes", "detection_classes", "detection_scores" '
              'have been replaced with a single layer of type "Detection Output". Refer to IR catalogue in the '
              'documentation for information about this layer.')

        return {'detection_output_node': detection_output_node}

    @staticmethod
    def do_infer(node):
        node.old_infer(node)
        # compared to the IE's DetectionOutput, the TF keeps the locations in YXYX, need to get back to the XYXY
        # for last matmul/Conv2D that operate the locations need to swap the X and Y for output feature weights & biases
        swap_weights_xy(backward_bfs_for_operation(node.in_node(0), ['MatMul', 'Conv2D']))

The differences in conversion are the following:

  • The locations tensor does not contain information about class 0 (background), but the Inference Engine DetectionOutput layer expects it. Lines 45-58 append a dummy tensor with fake coordinates.
  • The prior boxes tensor is not constant as it is in SSD models, so the same solution cannot be applied. Instead, an element-wise multiplication is added to scale the locations with the variances [0.1, 0.1, 0.2, 0.2], and the attribute variance_encoded_in_target=1 is set on the DetectionOutput layer (lines 62-74).
  • The X and Y coordinates in the tensor with bounding box location adjustments should be swapped. For some topologies this can be done by updating the weights of the preceding convolution or matmul node, but if there is no such preceding node, the Model Optimizer inserts a convolution node with a specific kernel and weights that performs the coordinates swap during topology inference (a toy illustration of such a swap follows this list).
  • A marker node of type OpOutput is added; the Model Optimizer uses it to determine the output nodes of the topology in the dead nodes elimination pass.
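
The actual swap is performed by the add_convolution_to_swap_xy_coordinates helper shown in the code above; the following simplified NumPy sketch (with made-up box values) only illustrates the underlying idea of a 1x1 convolution whose kernel is a permutation matrix:

import numpy as np

# permutation matrix mapping (y1, x1, y2, x2) -> (x1, y1, x2, y2); a 1x1 convolution with one output
# channel per row of this matrix performs the same swap during inference
swap_kernel = np.array([[0, 1, 0, 0],
                        [1, 0, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 1, 0]], dtype=np.float32)

boxes_yxyx = np.array([[0.2, 0.1, 0.8, 0.6]], dtype=np.float32)   # one box in the YXYX order
boxes_xyxy = boxes_yxyx @ swap_kernel.T                            # the matrix is symmetric
print(boxes_xyxy)                                                  # [[0.1 0.2 0.6 0.8]]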

Cutting Off Parts of the Topology

You can cut off part of the topology with the --output command-line parameter. Detailed information on why it could be useful is provided in the Cutting Off Parts of a Model topic. The Faster R-CNN models are cut at the end using the sub-graph replacer ObjectDetectionAPIOutputReplacement.

class ObjectDetectionAPIOutputReplacement(FrontReplacementFromConfigFileGeneral):
    """
    This replacer is used to cut off the network by specified nodes for models generated with the Object Detection API.
    The custom attribute for the replacer contains one value for the key "outputs". This string is a comma-separated list
    of output alternatives. Each output alternative is a '|'-separated list of node names which could be outputs. The
    first node from each alternative that exists in the graph is chosen. Others are ignored.
    For example, if the "outputs" is equal to the following string:

        "Reshape_16,SecondStageBoxPredictor_1/Conv_3/BiasAdd|SecondStageBoxPredictor_1/Conv_1/BiasAdd"

    then the "Reshape_16" will be an output if it exists in the graph. The second output will be
    SecondStageBoxPredictor_1/Conv_3/BiasAdd if it exist in the graph, if not then
    SecondStageBoxPredictor_1/Conv_1/BiasAdd will be output if it exists in the graph.
    """
    replacement_id = 'ObjectDetectionAPIOutputReplacement'

    def run_before(self):
        return [ObjectDetectionAPIPreprocessorReplacement]

    def transform_graph(self, graph: nx.MultiDiGraph, replacement_descriptions: dict):
        if graph.graph['cmd_params'].output is not None:
            log.warning('User defined output nodes are specified. Skip the graph cut-off by the '
                        'ObjectDetectionAPIOutputReplacement.')
            return
        outputs = []
        outputs_string = replacement_descriptions['outputs']
        for alternatives in outputs_string.split(','):
            for out_node_name in alternatives.split('|'):
                if graph.has_node(out_node_name):
                    outputs.append(out_node_name)
                    break
                else:
                    log.debug('A node "{}" does not exist in the graph. Do not add it as output'.format(out_node_name))
        _outputs = output_user_data_repack(graph, outputs)
        add_output_ops(graph, _outputs, graph.graph['inputs'])

This is a replacer of the "general" type, which is called just once, in comparison with other Front replacers ("scope" and "points") that are called for each matched instance. The replacer reads the node names that should become new output nodes, like specifying --output <node_names>. The only difference is that the string containing node names can contain the '|' character specifying alternatives for the output node names. A detailed explanation is provided in the class description in the code.

The detection_boxes, detection_scores, num_detections nodes are specified as outputs in the faster_rcnn_support.json file. These nodes are used to remove the part of the graph that is not needed to calculate the values of the specified output nodes.

R-FCN Topologies

The R-FCN models are based on Faster R-CNN models so it is highly recommended to read the section about converting them first. Detailed information about R-FCN topologies is provided in the abstract.

Preprocessor Block

R-FCN topologies contain a Preprocessor block similar to the one in SSD and Faster R-CNN topologies. The same ObjectDetectionAPIPreprocessorReplacement sub-graph replacer is used to cut it off.

Proposal Layer

Similar to Faster R-CNNs, R-FCN topologies contain an implementation of the Proposal layer before the SecondStageBoxPredictor block, so the ObjectDetectionAPIProposalReplacement replacement is used in the sub-graph replacement configuration file.

SecondStageBoxPredictor Block

The SecondStageBoxPredictor block differs from the block of the same name in Faster R-CNN topologies. It contains a number of CropAndResize operations consuming variously scaled boxes generated with a Proposal layer. This block of operations is converted to the Intermediate Representation as is, without using sub-graph replacements.

SecondStagePostprocessor Block

The SecondStagePostprocessor block implements the functionality of the DetectionOutput layer from the Inference Engine. The ObjectDetectionAPIDetectionOutputReplacement sub-graph replacement is used to replace the block. For this type of topology, the replacer adds a convolution node that swaps the coordinates of boxes in the 0-th input tensor of the DetectionOutput layer. The custom attribute coordinates_swap_method is set to the value add_convolution in the sub-graph replacement configuration file to enable that behaviour. The swap_weights method is not suitable for this type of topology because there are no MatMul or Conv2D operations before the 0-th input of the DetectionOutput layer.

Cutting Off Part of the Topology

The R-FCN models are cut at the end with the ObjectDetectionAPIOutputReplacement sub-graph replacer, as Faster R-CNN topologies are, using the following output node name: detection_boxes.

Mask R-CNN Topologies

The Mask R-CNN models are based on Faster R-CNN models so it is highly recommended to read the section about converting them first. Detailed information about Mask R-CNN topologies is provided in the abstract.

Preprocessor Block

Mask R-CNN topologies contain a Preprocessor block similar to the one in SSD and Faster R-CNN topologies. The same ObjectDetectionAPIPreprocessorReplacement sub-graph replacer is used to cut it off.

Proposal and ROI (Region of Interest) Pooling

Proposal and ROI Pooling layers are added to Mask R-CNN topologies like in Faster R-CNNs.

DetectionOutput

Unlike SSD and Faster R-CNN topologies, the implementation of the DetectionOutput layer in Mask R-CNN topologies is not separated into a dedicated scope. However, the matcher is defined with start/end points in the mask_rcnn_support.json file, so the replacer correctly adds the DetectionOutput layer.

One More ROIPooling

There is a second CropAndResize operation (the equivalent of the ROIPooling layer) that uses boxes produced with the DetectionOutput layer. The ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement replacer is used to replace this node:

class ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement'

    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
        return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id}

    def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        argv = graph.graph['cmd_params']
        if argv.tensorflow_object_detection_api_pipeline_config is None:
            raise Error(missing_param_error)
        pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config)
        roi_pool_size = _value_or_raise(match, pipeline_config, 'initial_crop_size')

        detection_output_nodes_ids = [node_id for node_id, attrs in graph.nodes(data=True)
                                      if 'name' in attrs and attrs['name'] == 'detection_output']
        if len(detection_output_nodes_ids) != 1:
            raise Error("Found the following nodes '{}' with 'detection_output' but there should be exactly 1.".
                        format(detection_output_nodes_ids))
        detection_output_node = Node(graph, detection_output_nodes_ids[0])

        # add reshape of Detection Output so it can be an output of the topology
        reshape_detection_output_2d_op = Reshape(graph, dict(dim=np.array([-1, 7])))
        reshape_detection_output_2d_node = reshape_detection_output_2d_op.create_node(
            [detection_output_node], dict(name='reshape_do_2d'))

        # adds special node of type "Output" that is a marker for the output nodes of the topology
        output_op = Output(graph, dict(name='do_reshaped_OutputOp'))
        output_node = output_op.create_node([reshape_detection_output_2d_node])

        # add attribute 'output_sort_order' so it will be used as a key to sort output nodes before generation of IR
        output_node.in_edge()['data_attrs'].append('output_sort_order')
        output_node.in_edge()['output_sort_order'] = [('detection_boxes', 0)]

        # creates the Crop operation that gets input from the DetectionOutput layer, cuts off slices of data with batch
        # indices and class labels producing a tensor with classes probabilities and bounding boxes only as it is
        # expected by the ROIPooling layer
        crop_op = Crop(graph, dict(axis=np.array([3]), offset=np.array([2]), dim=np.array([5]), nchw_layout=True))
        crop_node = crop_op.create_node([detection_output_node], dict(name='crop_do'))

        # reshape bounding boxes as required by ROIPooling
        reshape_do_op = Reshape(graph, dict(dim=np.array([-1, 5])))
        reshape_do_node = reshape_do_op.create_node([crop_node], dict(name='reshape_do'))

        roi_pooling_op = ROIPooling(graph, dict(method="bilinear", spatial_scale=1,
                                                pooled_h=roi_pool_size, pooled_w=roi_pool_size))
        roi_pooling_node = roi_pooling_op.create_node([match.single_input_node(0)[0].in_node(), reshape_do_node],
                                                      dict(name='ROI_pooling_2'))
        return {'roi_pooling_node': roi_pooling_node}

The Inference Engine DetectionOutput layer implementation produces one tensor with seven numbers for each actual detection:

  • batch index
  • class label
  • class probability
  • x_1 box coordinate
  • y_1 box coordinate
  • x_2 box coordinate
  • y_2 box coordinate.

The box coordinates must be fed to the ROIPooling layer, so the Crop layer is added to remove the unnecessary part (lines 37-38).

Then the resulting tensor is reshaped (lines 41-42) and the ROIPooling layer is created (lines 44-47).
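
A toy illustration (made-up values, not the Model Optimizer code) of the Crop applied to one DetectionOutput row:

import numpy as np

# one illustrative detection: [batch_index, class_label, probability, x1, y1, x2, y2]
detection = np.array([[0, 15, 0.92, 0.10, 0.20, 0.60, 0.80]], dtype=np.float32)
# the Crop with offset=2 and dim=5 keeps the class probability and the four box coordinates
roi_pooling_input = detection[:, 2:7]
print(roi_pooling_input)   # [[0.92 0.1 0.2 0.6 0.8]]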

Mask Tensors Processing

The post-processing part of Mask R-CNN topologies filters out bounding boxes with low probabilities and applies an activation function to the remaining ones. This post-processing is implemented using the Gather operation, which is not supported by the Inference Engine. A dedicated Front replacer removes this post-processing and simply adds the activation layer at the end. The filtering of bounding boxes is done in the dedicated demo mask_rcnn_demo. The code of the replacer is the following:

class ObjectDetectionAPIMaskRCNNSigmoidReplacement(FrontReplacementFromConfigFileGeneral):
    """
    This replacer is used to convert Mask R-CNN topologies only.
    Adds activation with sigmoid function to the end of the network producing masks tensors.
    """
    replacement_id = 'ObjectDetectionAPIMaskRCNNSigmoidReplacement'

    def run_after(self):
        return [ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement]

    def transform_graph(self, graph: nx.MultiDiGraph, replacement_descriptions):
        output_node = None
        op_outputs = [n for n, d in graph.nodes(data=True) if 'op' in d and d['op'] == 'OpOutput']
        for op_output in op_outputs:
            last_node = Node(graph, op_output).in_node(0)
            if last_node.name.startswith('SecondStageBoxPredictor'):
                sigmoid_op = Activation(graph, dict(operation='sigmoid'))
                sigmoid_node = sigmoid_op.create_node([last_node], dict(name=last_node.id + '/sigmoid'))
                sigmoid_node.name = 'masks'

                if output_node is not None:
                    raise Error('Identified two possible outputs from the topology. Cannot proceed.')
                # add special node of type "Output" that is a marker for the output nodes of the topology
                output_op = Output(graph, dict(name=sigmoid_node.name + '/OutputOp'))
                output_node = output_op.create_node([sigmoid_node])

        print('The predicted masks are produced by the "masks" layer for each bounding box generated with a '
              '"detection_output" layer.\n Refer to IR catalogue in the documentation for information '
              'about the DetectionOutput layer and Inference Engine documentation about output data interpretation.\n'
              'The topology can be inferred using dedicated demo "mask_rcnn_demo".')

The replacer looks for the output node whose name starts with SecondStageBoxPredictor (another node of type OpOutput is located after the DetectionOutput node). This node contains the generated masks. The replacer adds the Sigmoid activation layer after this node, as is done in the initial TensorFlow* model.

Cutting Off Parts of the Topology

The Mask R-CNN models are cut at the end with the sub-graph replacer ObjectDetectionAPIOutputReplacement using the following output node names:

SecondStageBoxPredictor_1/Conv_3/BiasAdd|SecondStageBoxPredictor_1/Conv_1/BiasAdd

One of these two nodes produces output mask tensors. The child nodes of these nodes are related to post-processing which is implemented in the Mask R-CNN demo and should be cut off.

Converting a TensorFlow* FaceNet Model

Public pre-trained FaceNet models contain both training and inference parts of the graph. Switching between these two states is performed with the phase_train value. The Intermediate Representation (IR) produced by the Model Optimizer is used by the Inference Engine only for inference, which means that the training part is redundant.

There are two inputs in this network:

  • a boolean phase_train input, which manages the state of the graph (train/infer)
  • a batch_size input, which is a part of the batch joining pattern.

To generate an Intermediate Representation (IR) of the FaceNet model, run the Model Optimizer with the following parameters:

python3 ./mo_tf.py \
--input_model path_to_model/model_name.pb \
--freeze_placeholder_with_value "phase_train->False"

The batch joining pattern is transformed into a placeholder with the model default shape if --input_shape or --batch/-b is not provided. Otherwise, the placeholder shape has custom parameters.

  • --freeze_placeholder_with_value phase_train->False switches the graph to inference mode
  • --batch/-b is applicable to override the original network batch size
  • --input_shape is applicable with or without --input

Other options are also applicable.
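
For example, to additionally override the original network batch size while freezing the phase_train placeholder, a command like the following could be used (illustrative only; adjust the model path and the batch value to your needs):

python3 ./mo_tf.py \
--input_model path_to_model/model_name.pb \
--freeze_placeholder_with_value "phase_train->False" \
--batch 4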

Convert YOLO* Models to the Intermediate Representation (IR)

This tutorial explains how to convert real-time object detection YOLOv1*, YOLOv2*, and YOLOv3* public models to the Intermediate Representation (IR). All YOLO* models are originally implemented in the DarkNet* framework and consist of two files:

  • .cfg file with model configurations
  • .weights file with model weights

Depending on a YOLO model version, the Model Optimizer converts it differently:

  • YOLOv3 has several implementations. This tutorial uses a TensorFlow implementation of YOLOv3 model, which can be directly converted to the IR.
  • YOLOv1 and YOLOv2 models must be first converted to TensorFlow* using DarkFlow*.

Convert YOLOv3 Model to IR

On GitHub*, you can find several public versions of TensorFlow YOLOv3 model implementation. This tutorial explains how to convert the YOLOv3 model from the https://github.com/mystic123/tensorflow-yolo-v3 repository (commit fb9f543) to the IR, but the process is similar for other versions of the TensorFlow YOLOv3 model.

Overview of YOLOv3 Model Architecture

Originally, the YOLOv3 model includes a feature extractor called Darknet-53 with three branches at the end that make detections at three different scales. These branches must end with the YOLO Region layer.

The Region layer was first introduced in the DarkNet framework. Other frameworks, including TensorFlow, do not have the Region layer implemented as a single operation, so every author of a public YOLOv3 model creates it using simple layers. This badly affects performance. For this reason, the main idea of YOLOv3 model conversion to the IR is to cut off these custom Region-like parts of the model and complete the model with Region layers where required.

Dump YOLOv3 TensorFlow* Model

To dump the TensorFlow model from the https://github.com/mystic123/tensorflow-yolo-v3 GitHub repository (commit fb9f543), follow the instructions below:

  1. Clone the repository:
      git clone https://github.com/mystic123/tensorflow-yolo-v3.git
      cd tensorflow-yolo-v3
  2. (Optional) Check out the commit that the conversion was tested on:
    git checkout fb9f543
  3. Open the demo.py file in a text editor and make the following changes:
    • Replace NCHW with NHWC on line 57.
    • Insert the following lines after line 64:
      from tensorflow.python.framework import graph_io
          frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['concat_1'])
          graph_io.write_graph(frozen, './', 'yolo_v3.pb', as_text=False)
    • Save all the changes and close the file.
  4. Download the yolov3.weights file from the DarkNet website https://pjreddie.com/media/files/yolov3.weights or use your own pretrained weights with the same structure as in the yolov3.weights file.
  5. Download the coco.names file from the DarkNet GitHub repository https://github.com/pjreddie/darknet/blob/master/data/coco.names or use labels that fit your task.
  6. Find an image to use as an input for the model.
  7. From the tensorflow-yolo-v3 directory, run the following command:
    python3 demo.py                      \
      --weights_file <path_to_weights_file>/yolov3.weights    \
      --class_names <path_to_labels_file>/coco.names      \
      --input_img <path_to_image>/<image>               \
      --output_img ./out.jpg

    This command creates the yolo_v3.pb file, which is a TensorFlow representation of the YOLOv3 model, in the tensorflow-yolo-v3 directory.

Convert YOLOv3 TensorFlow Model to the IR

To solve the problems explained in the YOLOv3 architecture overview section, use the yolo_v3.json configuration file with custom operations located in the <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf directory.

It consists of several attributes:

[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 80,
      "coords": 4,
      "num": 9,
      "mask": [0, 1, 2],
      "entry_points": ["detector/yolo-v3/Reshape", "detector/yolo-v3/Reshape_4", "detector/yolo-v3/Reshape_8"]
    }
  }
]

where:

  • id and match_kind are parameters that you cannot change.
  • custom_attributes is a parameter that stores all the YOLOv3 specific attributes:
    • classes, coords, num, and mask are attributes that you should copy from the configuration file that was used for model training. If you used the DarkNet officially shared weights, you can use the yolov3.cfg configuration file at https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg. Replace the default values in custom_attributes with the parameters that follow the [yolo] title in the configuration file.
    • entry_points is a list of node names used to cut off the graph and append the Region layer with the custom attributes specified above.

To generate the IR of the YOLOv3 TensorFlow model, run:

python3 mo_tf.py \
--input_model /path/to/yolo_v3.pb \
--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3.json

The Intel® Distribution of OpenVINO™ toolkit provides a demo that uses YOLOv3 model. For more information, refer to Object Detection YOLO* V3 Demo, Async API Performance Showcase.

Convert YOLOv1 and YOLOv2 Models to the IR

Before converting, choose a YOLOv1 or YOLOv2 model version that best suits your task and download the model configuration file and the corresponding weights file:

  • from the DarkFlow repository: configuration files are stored in the cfg directory, links to weight files are given in the README.md file. The files from this repository are adapted for conversion to TensorFlow using DarkFlow.
  • from the DarkNet website and repository: configuration files are stored in the cfg directory of the repository, links to weight files are given on the YOLOv1 and YOLOv2 websites.

To convert DarkNet YOLOv1 and YOLOv2 models to the IR, follow these steps:

  1. Install DarkFlow
  2. Convert DarkNet YOLOv1 or YOLOv2 model to TensorFlow using DarkFlow
  3. Convert TensorFlow YOLOv1 or YOLOv2 model to IR

Install DarkFlow*

You need DarkFlow to convert YOLOv1 and YOLOv2 models to TensorFlow. To install DarkFlow:

  1. Install DarkFlow required dependencies.
  2. Clone DarkFlow git repository:
    git clone https://github.com/thtrieu/darkflow.git
  3. Go to the root directory of the cloned repository:
    cd darkflow
  4. Install DarkFlow using the instructions from the README.md file in the DarkFlow repository.

Convert DarkNet* YOLOv1 or YOLOv2 Model to TensorFlow*

To convert YOLOv1 or YOLOv2 model to TensorFlow, go to the root directory of the cloned DarkFlow repository and run the following command:

python3 ./flow --model <path_to_model>/<model_name>.cfg --load <path_to_model>/<model_name>.weights --savepb

If the model was successfully converted, you can find the <model_name>.meta and <model_name>.pb files in the built_graph subdirectory of the cloned DarkFlow repository.

The <model_name>.pb file is a TensorFlow representation of the YOLO model.

Convert TensorFlow YOLOv1 or YOLOv2 Model to the IR

The converted TensorFlow YOLO model is missing the Region layer and its parameters. The original YOLO Region layer parameters are stored in the <path_to_model>/<model_name>.cfg configuration file under the [region] title.

To recreate the original model structure, use the yolo_v1_v2.json configuration file with custom operations and Region layer parameters when converting the model to the IR. This file is located in the <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf directory.

If the chosen model has different values for these parameters, create another configuration file with custom operations and use it for the conversion.

To generate the IR, provide the TensorFlow YOLOv1 or YOLOv2 model to the Model Optimizer with the following parameters:

python3 ./mo_tf.py \
--input_model <path_to_model>/<model_name>.pb \
--batch 1 \
--tensorflow_use_custom_operations_config <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/yolo_v1_v2.json

where:

  • --batch/-b or --input_shape defines the shape of the model input. In this example, --batch is equal to 1, but other integers larger than 1 are also applicable.
  • --tensorflow_use_custom_operations_config adds the missing Region layer to the model under the RegionYolo name. For other applicable parameters, refer to Convert Model from TensorFlow.

Converting TensorFlow*-Slim Image Classification Model Library Models

TensorFlow*-Slim Image Classification Model Library is a library to define, train and evaluate classification models in TensorFlow*. The library contains Python scripts defining the classification topologies together with checkpoint files for several pre-trained classification topologies. To convert a TensorFlow*-Slim library model, complete the following steps:

  1. Download the TensorFlow*-Slim models git repository.
  2. Download the pre-trained model checkpoint.
  3. Export the inference graph.
  4. Convert the model using the Model Optimizer.

The Example of an Inception V1 Model Conversion section below illustrates the process of converting an Inception V1 Model.

Example of an Inception V1 Model Conversion

This example demonstrates how to convert the model on Linux* OS, but it can be easily adapted to the Windows* OS.

Step 1. Create a new directory to clone the TensorFlow*-Slim git repository to:

mkdir tf_models
git clone https://github.com/tensorflow/models.git tf_models

Step 2. Download and unpack the Inception V1 model checkpoint file:

wget http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz
tar xzvf inception_v1_2016_08_28.tar.gz

Step 3. Export the inference graph - the protobuf file (.pb) containing the architecture of the topology.

NOTE: This file does not contain the neural network weights and cannot be used for inference.

python3 tf_models/research/slim/export_inference_graph.py \
    --model_name inception_v1 \
    --output_file inception_v1_inference_graph.pb

Model Optimizer comes with the summarize graph utility, which identifies graph input and output nodes. Run the utility to determine input/output nodes of the Inception V1 model:

python3 <MODEL_OPTIMIZER_INSTALL_DIR>/mo/utils/summarize_graph.py --input_model ./inception_v1_inference_graph.pb

The output looks as follows:

1 input(s) detected:
Name: input, type: float32, shape: (-1,224,224,3)
1 output(s) detected:
InceptionV1/Logits/Predictions/Reshape_1

The tool finds one input node with name input, type float32, fixed image size (224,224,3) and undefined batch size -1. The output node name is InceptionV1/Logits/Predictions/Reshape_1.

Step 4. Convert the model with the Model Optimizer:

<MODEL_OPTIMIZER_INSTALL_DIR>/mo_tf.py --input_model ./inception_v1_inference_graph.pb --input_checkpoint ./inception_v1.ckpt -b 1 --mean_value [127.5,127.5,127.5] --scale 127.5

The -b command line parameter is required because the Model Optimizer cannot convert a model with undefined input size.
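
Alternatively, the input shape reported by the summarize graph utility can be passed explicitly with the --input_shape parameter instead of -b (an equivalent, illustrative command):

<MODEL_OPTIMIZER_INSTALL_DIR>/mo_tf.py --input_model ./inception_v1_inference_graph.pb --input_checkpoint ./inception_v1.ckpt --input_shape [1,224,224,3] --mean_value [127.5,127.5,127.5] --scale 127.5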

Refer to the Mean and Scale Values for TensorFlow*-Slim Models section for information on why the --mean_values and --scale command-line parameters are used.

Mean and Scale Values for TensorFlow*-Slim Models

The TensorFlow*-Slim models were trained with normalized input data. There are several different normalization algorithms used in the Slim library. The Inference Engine classification sample does not perform image pre-processing except resizing to the input layer size. To get correct classification results, it is necessary to pass the mean and scale values to the Model Optimizer so they are embedded into the generated IR.

The preprocessing_factory.py file contains a dictionary variable preprocessing_fn_map defining the mapping between the model type and the pre-processing function to be used. The function code should be analyzed to figure out the mean/scale values.

The inception_preprocessing.py file defines the pre-processing function for the Inception models. The preprocess_for_eval function contains the following code:

    ...
    if image.dtype != tf.float32:
      image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    ...
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image

First, the image is converted to the tf.float32 data type and the values in the tensor are scaled to the [0, 1] range using the tf.image.convert_image_dtype function. Then 0.5 is subtracted from the image values and the values are multiplied by 2.0. The final range of image values is [-1, 1].

The Inference Engine classification sample reads an input image as a three-dimensional array of integer values from the [0, 255] range. To scale them to the [-1, 1] range, the mean value of 127.5 for each image channel should be specified, as well as a scale factor of 127.5.
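
For example, applying these values to the boundary pixel values confirms the mapping of [0, 255] to [-1, 1]:

(0 - 127.5) / 127.5 = -1.0
(255 - 127.5) / 127.5 = 1.0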

Similarly, the mean/scale values can be determined for other Slim models.

The exact mean/scale values for each supported model are listed in the table of supported TensorFlow*-Slim models in the Converting a TensorFlow* Model chapter.

Custom Layer Definition

Internally, when you run the Model Optimizer, it loads the model, goes through the topology, and tries to find each layer type in a list of known layers. Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in this list of known layers, the Model Optimizer classifies them as custom. For more information about custom layers, refer to TensorFlow* Models with Custom Layers.

Supported Layers and the Mapping to Intermediate Representation Layers

Some TensorFlow* operations do not map to any Inference Engine layer, but are still supported by the Model Optimizer and can be used on the constant propagation path. These operations are labeled 'Constant propagation' in the table.

Standard TensorFlow* operations:

Number | Operation Name in TensorFlow | Layer Name in the Intermediate Representation
1  | Transpose             | Permute
2  | LRN                   | Norm
3  | Split                 | Split
4  | SplitV                | Split
5  | FusedBatchNorm        | ScaleShift (can be fused into Convolution or FullyConnected)
6  | Relu6                 | Clamp
7  | DepthwiseConv2dNative | Convolution
8  | ExpandDims            | Constant propagation
9  | Slice                 | Split
10 | ConcatV2              | Concat
11 | MatMul                | FullyConnected
12 | Pack                  | Reshapes and Concat
13 | StridedSlice          | Constant propagation and several cases when StridedSlice can be expressed with Splits
14 | Prod                  | Constant propagation
15 | Const                 | Constant propagation
16 | Tile                  | Tile
17 | Placeholder           | Input
18 | Pad                   | Fused into Convolution or Pooling layers (not supported as single operation)
19 | Conv2D                | Convolution
20 | Conv2DBackpropInput   | Deconvolution
21 | Identity              | Ignored, does not appear in the IR
22 | Add                   | Eltwise(operation = sum)
23 | Mul                   | Eltwise(operation = mul)
24 | Maximum               | Eltwise(operation = max)
25 | Rsqrt                 | Power(power=-0.5)
26 | Neg                   | Power(scale=-1)
27 | Sub                   | Eltwise(operation = sum) + Power(scale=-1)
28 | Relu                  | ReLU
29 | AvgPool               | Pooling (pool_method=avg)
30 | MaxPool               | Pooling (pool_method=max)
31 | Mean                  | Pooling (pool_method=avg); only spatial dimensions are supported
32 | RandomUniform         | Not supported
33 | BiasAdd               | Fused or converted to ScaleShift
34 | Reshape               | Reshape
35 | Squeeze               | Reshape
36 | Shape                 | Constant propagation
37 | Softmax               | SoftMax
38 | SpaceToBatchND        | Supported in a pattern when converted to the Convolution layer dilation attribute, Constant propagation
39 | BatchToSpaceND        | Supported in a pattern when converted to the Convolution layer dilation attribute, Constant propagation
40 | StopGradient          | Ignored, does not appear in the IR
41 | Square                | Constant propagation
42 | Sum                   | Constant propagation
43 | Range                 | Constant propagation
44 | CropAndResize         | ROIPooling (if the method is 'bilinear')
45 | ArgMax                | ArgMax
46 | DepthToSpace          | Reshape + Permute + Reshape (works for CPU only because of 6D tensors)
47 | ExtractImagePatches   | ReorgYolo
48 | ResizeBilinear        | Interp
49 | ResizeNearestNeighbor | Resample
50 | Unpack                | Split + Reshape (removes the dimension being unpacked) if the number of parts is equal to the size along the given axis
51 | AddN                  | Several Eltwises
52 | Concat                | Concat
53 | Minimum               | Power(scale=-1) + Eltwise(operation = min) + Power(scale=-1)
54 | Unsqueeze             | Reshape
55 | RealDiv               | Power(power = -1) and Eltwise(operation = mul)
56 | SquaredDifference     | Power(scale = -1) + Eltwise(operation = sum) + Power(power = 2)
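
To see which operation types a frozen graph actually contains before attempting a conversion, the op types can be enumerated directly from the GraphDef and compared against the table above. A minimal sketch, assuming TensorFlow 1.x and the inference graph exported in the earlier example:

    import tensorflow as tf

    # Load the exported inference graph and collect the unique operation types it uses.
    graph_def = tf.GraphDef()
    with open('inception_v1_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    op_types = sorted({node.op for node in graph_def.node})
    print('\n'.join(op_types))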

See the Model Optimizer Developer Guide for information about:

  • The Model Optimizer's internal procedure for working with custom layers
  • How to convert a TensorFlow model that has custom layers
  • Custom layer implementation details

Frequently Asked Questions (FAQ)

The Model Optimizer provides explanatory messages if it is unable to run to completion due to typographical errors, incorrectly used options, or other problems. Each message describes the potential cause of the problem and gives a link to the Model Optimizer FAQ, which has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong.

Summary

In this document, you learned:

  • Basic information about how the Model Optimizer works with TensorFlow* models
  • Which TensorFlow* models are supported
  • How to freeze a TensorFlow model
  • How to convert a trained TensorFlow* model using the Model Optimizer with both framework-agnostic and TensorFlow*-specific command-line options


Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Pentium, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.

For more complete information about compiler optimizations, see our Optimization Notice.