Model Optimizer Developer Guide

Introduction

Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.

The Model Optimizer process assumes you have a network model trained using one of the supported frameworks. The diagram below illustrates the typical workflow for deploying a trained deep learning model:

Intel Computer Vision Basic Workflow

A summary of the steps for optimizing and deploying a trained model:

  1. Configure the Model Optimizer for your framework.
  2. Convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and bias values.
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via provided Inference Engine validation application or sample applications.
  4. Integrate the Inference Engine into your application to deploy the model in the target environment. See the Inference Engine Guide.

Model Optimizer Workflow

The Model Optimizer process assumes you have a network model that was trained with a supported framework. The workflow is:

  1. Configure the Model Optimizer for the framework that was used to train the network. To perform this configuration, use the configuration bash script for Linux* OS, or the batch file for Windows* OS. The script and batch file are in: <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites
    • For Linux* OS:
      install_prerequisites.sh
    • For Windows* OS:
      install_prerequisites.bat
    For more information about configuring the Model Optimizer, see Configuring the Model Optimizer.
  2. Provide as input a trained model that contains a specific topology and the adjusted weights and biases described in the framework-specific files.
  3. Convert the trained model to an optimized Intermediate Representation.

Model Optimizer produces an Intermediate Representation (IR) of the network as output. The Inference Engine reads, loads, and infers the Intermediate Representation. The Inference Engine API offers a unified API across supported Intel® platforms. Intermediate Representation is a pair of files that describe the whole model:

  • .xml: Describes the network topology
  • .bin: Contains the weights and biases binary data
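
As a minimal sketch of how this pair of files is consumed, the following Python* snippet loads an IR with the Inference Engine Python API. It is illustrative only: class and method names such as IECore, read_network, and load_network vary between Inference Engine releases, and model.xml / model.bin are placeholder paths.

# Illustrative sketch; API names depend on the Inference Engine release you use.
from openvino.inference_engine import IECore

ie = IECore()
# The IR is the .xml / .bin pair produced by the Model Optimizer.
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")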

Configuring the Model Optimizer

You must configure the Model Optimizer for the framework that was used to train the model. This section tells you how to configure the Model Optimizer either through scripts or by using a manual process.

Using Configuration Scripts

You can either configure all supported frameworks at the same time or configure one framework at a time. The scripts install all required dependencies and provide the fastest and easiest way to configure the Model Optimizer.

To configure all supported frameworks, go to the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory and run:

  • For Linux* OS:
    install_prerequisites.sh
  • For Windows* OS:
    install_prerequisites.bat

To configure a specific framework, go to the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory and run:

  • For Caffe* on Linux:
    install_prerequisites_caffe.sh
  • For Caffe on Windows:
    install_prerequisites_caffe.bat
  • For TensorFlow* on Linux:
    install_prerequisites_tf.sh
  • For TensorFlow on Windows:
    install_prerequisites_tf.bat
  • For MXNet* on Linux:
    install_prerequisites_mxnet.sh
  • For MXNet on Windows:
    install_prerequisites_mxnet.bat
  • For Kaldi* on Linux:
    install_prerequisites_kaldi.sh
  • For Kaldi on Windows:
    install_prerequisites_kaldi.bat
  • For ONNX* on Linux:
    install_prerequisites_onnx.sh
  • For ONNX on Windows:
    install_prerequisites_onnx.bat

Caffe Note: By default, you do not need to install Caffe to create an Intermediate Representation for a Caffe model unless you use Caffe for custom layer shape inference and do not write Model Optimizer extensions. To learn more about implementing Model Optimizer custom operations and the limitations of using Caffe for shape inference, see Caffe Models with Custom Layers.

TensorFlow Note: To offload part of the inference to the TensorFlow framework, additional configuration steps are required.

Using Manual Configuration Process

If you prefer, you can manually configure the Model Optimizer for one framework at a time.

  1. Go to the Model Optimizer directory:
    cd <INSTALL_DIR>/deployment_tools/model_optimizer/
  2. (Strongly recommended) Create and activate a virtual environment. While not required, this step is strongly recommended because the virtual environment creates a Python* sandbox: the Model Optimizer dependencies do not influence the global Python configuration, installed libraries, or other components. In addition, the --system-site-packages flag makes system-wide Python libraries available inside this sandbox. Skip this step only if you want to install all the Model Optimizer dependencies globally:
    • Create a virtual environment:
      virtualenv -p /usr/bin/python3.5 .env3 --system-site-packages
    • Activate the virtual environment:
      source .env3/bin/activate
  3. Install all dependencies or only the dependencies for a specific framework:
    • To install dependencies for all frameworks:
      pip3 install -r requirements.txt
    • To install dependencies only for Caffe:
      pip3 install -r requirements_caffe.txt
    • To install dependencies only for TensorFlow:
      pip3 install -r requirements_tensorflow.txt
    • To install dependencies only for MXNet:
      pip3 install -r requirements_mxnet.txt
    • To install dependencies only for Kaldi:
      pip3 install -r requirements_kaldi.txt
    • To install dependencies only for ONNX:
      pip3 install -r requirements_onnx.txt
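
After the dependencies are installed, you can run a quick sanity check from the same directory. This assumes your release of the mo.py script supports the --version flag:

python3 mo.py --version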

Using the protobuf Library in the Model Optimizer for Caffe*

These procedures require:

  • Access to GitHub and the ability to use git commands
  • Microsoft Visual Studio* 2013 for Win64*
  • C/C++

Model Optimizer uses the protobuf library to load trained Caffe* models. By default, the library executes the pure Python* language implementation, which is slow. The following steps show how to use the faster C++ implementation of the protobuf library on Windows* OS or Linux* OS.

Using the protobuf Library on Linux* OS

To use the C++ implementation of the protobuf library on Linux, set the following environment variable:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
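
To check which implementation is actually in use, you can run a quick check like the one below. It relies on the internal google.protobuf api_implementation module and prints cpp when the C++ implementation is active:

python3 -c "from google.protobuf.internal import api_implementation; print(api_implementation.Type())"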

Using the protobuf Library on Windows* OS

On Windows, pre-built protobuf packages for Python versions 3.5, 3.6, and 3.7 are provided with the installation package and can be found in the <INSTALL_DIR>\deployment_tools\model_optimizer\install_prerequisites folder. Please note that they are not installed with the install_prerequisites.bat installation script due to possible issues with pip, and you can install them at your own discretion. Make sure that you install the protobuf version that matches the Python version you use:

  • protobuf-3.5.1-py3.5-win-amd64.egg for Python 3.5
  • protobuf-3.5.1-py3.6-win-amd64.egg for Python 3.6
  • protobuf-3.5.1-py3.7-win-amd64.egg for Python 3.7

To install the protobuf package:

  1. Open the command prompt as administrator.
  2. Go to the install_prerequisites folder of the Intel Distribution of OpenVINO toolkit installation directory:
    cd <INSTALL_DIR>\deployment_tools\model_optimizer\install_prerequisites
    
  3. Run the following command to install the protobuf for Python 3.6. If you want to install the protobuf for Python 3.5 or 3.7, replace protobuf-3.5.1-py3.6-win-amd64.egg with the corresponding file name from the list above.
    python -m easy_install protobuf-3.5.1-py3.6-win-amd64.egg
    

NOTE: If the Python version you use is lower than 3.5, you need to update it or build the library manually, as the minimum required version is 3.5.

Building the protobuf Library on Windows* OS

NOTE: These steps are optional. If you use Python version 3.5, 3.6, or 3.7, you can install the protobuf library using the pre-built packages.

  1. Clone protobuf source files from GitHub:
    git clone https://github.com/google/protobuf.git
    cd protobuf
  2. Create a Visual Studio solution file. Run these commands:
    cd C:\Path\to\protobuf\cmake
    mkdir build\solution
    cd build\solution
    cmake -G "Visual Studio 12 2013 Win64" ../..
  3. Change the runtime library option for libprotobuf and libprotobuf-lite:
    1. Open the project's Property Pages dialog box
    2. Expand the C/C++ tab
    3. Select the Code Generation property page
    4. Change the Runtime Library property to Multi-threaded DLL (/MD)
  4. Build the libprotoc, protoc, libprotobuf, and libprotobuf-lite projects in the Release configuration.
  5. Add a path to the build directory to the PATH environment variable:
    set PATH=%PATH%;C:\Path\to\protobuf\cmake\build\solution\Release
  6. Go to the python directory:
    cd C:\Path\to\protobuf\python
  7. Use a text editor to open and change these setup.py options:
    • Change from libraries = ['protobuf']
      to libraries = ['libprotobuf', 'libprotobuf-lite']
    • Change from extra_objects = ['../src/.libs/libprotobuf.a', '../src/.libs/libprotobuf-lite.a']
      to extra_objects = ['../cmake/build/solution/Release/libprotobuf.lib', '../cmake/build/solution/Release/libprotobuf-lite.lib']
  8. Build the Python* package with the CPP implementation:
    python setup.py build --cpp_implementation
  9. Install the Python package with the CPP implementation:
    python -m easy_install dist/protobuf-3.5.1-py3.6-win-amd64.egg
  10. Set an environment variable to boost the protobuf performance:
    set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp

Preparing and Optimizing Your Trained Model

The Inference Engine enables deploying your network model trained with any of the supported deep learning frameworks: Caffe*, TensorFlow*, MXNet*, Kaldi*, or converted to the ONNX* format. The Inference Engine does not operate on the original model, but on its Intermediate Representation (IR), which is optimized for execution on end-point target devices. The Model Optimizer tool is used to generate an IR from your trained model.

Converting a Model to Intermediate Representation (IR)

Use the mo.py script from the <INSTALL_DIR>/deployment_tools/model_optimizer directory to run the Model Optimizer and convert the model to the Intermediate Representation (IR). The simplest way to convert a model is to run mo.py with a path to the input model file:

python3 mo.py --input_model <INPUT_MODEL>

NOTE: Some models require using additional arguments to specify conversion parameters, such as --scale, --scale_values, --mean_values, --mean_file. To learn about when you need to use these parameters, refer to Using Framework-Agnostic Conversion Parameters.
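
For example, a hypothetical conversion command that supplies mean and scale values for an image-input model might look like the following. The model path and values are placeholders; use the values your model was actually trained with:

python3 mo.py --input_model /user/models/model.caffemodel --mean_values [123.68,116.78,103.94] --scale_values [58.8,58.8,58.8]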

The mo.py script is a universal entry point that can deduce the framework that produced the input model from the standard extension of the model file:

  • .caffemodel - Caffe models
  • .pb - TensorFlow models
  • .params - MXNet models
  • .onnx - ONNX models
  • .nnet - Kaldi models.

If the model files do not have standard extensions, you can use the --framework {tf,caffe,kaldi,onnx,mxnet} option to specify the framework type explicitly.

For example, the following commands are equivalent:

python3 mo.py --input_model /user/models/model.pb
python3 mo.py --framework tf --input_model /user/models/model.pb

To adjust the conversion process, you may use the general parameters defined in Converting a Model Using General Conversion Parameters and the framework-specific parameters described in the corresponding framework sections.

How the Model Optimizer Works

Model Optimizer loads a model into memory, reads it, builds the internal representation of the model, optimizes it, and produces the Intermediate Representation. Intermediate Representation is the only format the Inference Engine accepts.

NOTE: Model Optimizer does not infer models. Model Optimizer is an offline tool that runs before the inference takes place.

Model Optimizer has two main purposes:

  • Produce a valid Intermediate Representation. If this main conversion artifact is not valid, the Inference Engine cannot run. The primary responsibility of the Model Optimizer is to produce the two files (.xml and .bin) that form the Intermediate Representation.
  • Produce an optimized Intermediate Representation. Pretrained models contain layers that are important for training, such as the Dropout layer. These layers are useless during inference and might increase the inference time.
    In many cases, these layers can be automatically removed from the resulting Intermediate Representation. However, if a group of layers can be represented as one mathematical operation, and thus as a single layer, the Model Optimizer recognizes such patterns and replaces these layers with one. The result is an Intermediate Representation that has fewer layers than the original model. This decreases the inference time.

To produce a valid Intermediate Representation, the Model Optimizer must be able to read the original model layers, handle their properties and represent them in Intermediate Representation format, while maintaining validity of the resulting Intermediate Representation.

For example, according to the catalog of Intermediate Representation layers, every layer must have an output. The layer output is represented in the Intermediate Representation by the output blob dimensions.
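
As an illustration only, a layer in the generated .xml file might look like the following fragment; the layer name, identifiers, and dimensions are placeholders. The <dim> entries of the output port are the output blob dimensions mentioned above:

<layer id="3" name="relu1" precision="FP32" type="ReLU">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>64</dim>
            <dim>112</dim>
            <dim>112</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>64</dim>
            <dim>112</dim>
            <dim>112</dim>
        </port>
    </output>
</layer>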

What You Need to Know About Your Model

Many common layers exist across known frameworks and neural network topologies. Examples of these layers are Convolution, Pooling, and Activation. To read the original model and produce the Intermediate Representation of a model, the Model Optimizer must be able to work with these layers.

The layer list varies by framework. For the full list of supported layers for each framework, refer to the framework-specific pages on supported layers.

If your topology contains only layers from the list of supported layers, as is the case for the topologies used by most users, the Model Optimizer easily creates the Intermediate Representation. After that, you can proceed to work with the Inference Engine.

However, if you use a topology with layers that are not recognized by the Model Optimizer out of the box, see Custom Layers in the Model Optimizer to learn how to work with custom layers.

Model Optimizer Directory Structure

After installation with Intel® Distribution of OpenVINO™ toolkit or Intel® Deep Learning Deployment Toolkit, the Model Optimizer folder has the following structure:

|-- model_optimizer
    |-- extensions
        |-- front/caffe
            |-- CustomLayersMapping.xml.example - example of file for registering custom Caffe layers (compatible with the 2017R3 release)
    |-- mo
        |-- back - Back-End logic: contains IR emitting logic
        |-- front - Front-End logic: contains matching between framework-specific layers and IR-specific layers, and calculation of output shapes for each registered layer
        |-- graph - Graph utilities to work with internal IR representation
        |-- middle - Graph transformations - optimizations of the model
        |-- pipeline - Sequence of steps required to create IR for each framework
        |-- utils - Utility functions
    |-- tf_call_ie_layer - Source code that enables TensorFlow fallback in Inference Engine during model inference
    |-- mo.py - Centralized entry point that can be used for any supported framework
    |-- mo_caffe.py - Entry point particularly for Caffe
    |-- mo_kaldi.py - Entry point particularly for Kaldi
    |-- mo_mxnet.py - Entry point particularly for MXNet
    |-- mo_onnx.py - Entry point particularly for ONNX
    |-- mo_tf.py - Entry point particularly for TensorFlow
    |-- ModelOptimizer - Entry point particularly for Caffe that contains same CLI as 2017R3 publicly released Model Optimizer

Custom Layers in the Model Optimizer

Model Optimizer searches for each layer of the input model in the list of known layers before building the model's internal representation, optimizing the model, and producing the Intermediate Representation.

The list of known layers is different for each of the supported frameworks. To see the layers supported by your framework, refer to the framework-specific pages on supported layers.

Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom.

Caffe* Models with Custom Layers

You have two options if your Caffe* model has custom layers:

  • Register the custom layers as extensions to the Model Optimizer. For instructions, see Extending the Model Optimizer with New Primitives. When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You only need to write a small chunk of Python* code that lets the Model Optimizer:
    • Generate a valid Intermediate Representation according to the rules you specified
    • Be independent from the availability of Caffe on your computer
  • Register the custom layers as Custom and use the system Caffe to calculate the output shape of each Custom Layer, which is required by the Intermediate Representation format. For this method, the Model Optimizer requires the Caffe Python interface on your system. When registering the custom layer in the CustomLayersMapping.xml file, you can specify if layer parameters should appear in Intermediate Representation or if they should be skipped. To read more about the expected format and general structure of this file, see Legacy Mode for Caffe* Custom Layers. This approach has several limitations:
    • If your layer output shape depends on dynamic parameters, input data, or the parameters of previous layers, the output shape calculated via Caffe can be incorrect. In this case, you need to patch Caffe on your own.
    • If the output shape calculation via Caffe fails inside the framework, the Model Optimizer cannot produce a correct Intermediate Representation, and you need to investigate the issue in the implementation of the layers in Caffe and patch it.
    • You cannot produce an Intermediate Representation on any machine that does not have Caffe installed. If you want to use the Model Optimizer on multiple machines, your topology contains Custom Layers, and you use CustomLayersMapping.xml to fall back on Caffe, you need to configure Caffe on each new machine.
    For these reasons, it is best to use the Model Optimizer extensions for Custom Layers: you do not depend on the framework and fully control the workflow.

If your model contains Custom Layers, it is important to understand the internal workflow of Model Optimizer. Consider the following example.

Example:

The network has:

  • One input layer (#1)
  • One output Layer (#5)
  • Three internal layers (#2, 3, 4)

The custom and standard layer types are:

  • Layers 2 and 5 are implemented as Model Optimizer extensions
  • Layers 1 and 4 are supported by the Model Optimizer out of the box
  • Layer 3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml

NOTE: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in the Model Optimizer FAQ.

The general process is as shown:

Example custom layer network

  1. The example model is fed to the Model Optimizer, which loads the model with a special parser built on top of the caffe.proto file. In case of failure, the Model Optimizer asks you to prepare a parser that can read the model. For more information, refer to Model Optimizer FAQ #1.
  2. Model Optimizer extracts the attributes of all layers. In particular, it goes through the list of layers and attempts to find the appropriate extractor. In order of priority, Model Optimizer checks if the layer is:
    • Registered in CustomLayersMapping.xml
    • Registered as a Model Optimizer extension
    • Registered as a standard Model Optimizer layer

    When the Model Optimizer finds the first matching condition from the list above, it extracts the attributes according to the following rules:

    • For bullet #1 - either takes all parameters or no parameters, according to the content of CustomLayersMapping.xml
    • For bullet #2 - takes only the parameters specified in the extension
    • For bullet #3 - takes only the parameters specified in the standard extractor
  3. Model Optimizer calculates the output shape of all layers. The logic is the same as it is for the priorities. Important: the Model Optimizer always takes the first available option.
  4. Model Optimizer optimizes the original model and produces the Intermediate Representation.

Extending the Model Optimizer with New Primitives

This section explains how to register a custom layer in the Model Optimizer, including how to register Proposal as a custom layer. This section also demonstrates how Proposal works as a custom layer.

Model Optimizer loads the model, goes through the topology, and tries to find each layer type in the list of known layers. If the Model Optimizer does not find a layer in that list, it looks for the layer in the list of custom layers. If the Model Optimizer fails to find the layer among the defined custom layers, it registers a Caffe fallback for the output shape inference. If the Model Optimizer does not find Caffe and cannot infer shapes, the Model Optimizer fails with an appropriate message.

You must know two things about custom layers with the Model Optimizer:

  • How to map a subgraph in a FW model to a subgraph consisting of Inference Engine layers. For Caffe, the subgraph is a one-to-one mapping of a Caffe layer to an Inference Engine layer.
  • How to infer shapes for unknown subgraphs. This can be either for a step in which the internal representation consists of framework-specific layers, or for a step in which the internal representation consists of Inference Engine layers.

You also have the option of a framework fallback for unknown sub-graphs, in which the original framework is used to infer the output shapes of operations. The example below demonstrates the case in which the framework is not available or should not be used.

Preparing an Example Topology

NOTE: Skip this section if you have a topology with a layer that is not known to the Model Optimizer.

The information in this section prepares a Caffe* model with the provided deployment-ready prototxt for a well-known topology called Faster-R-CNN to demonstrate the workflow. To use this example, you must have weights and biases for inference.

  1. Download the .caffemodel file
  2. Run the Model Optimizer on the .caffemodel file:
    python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt
    You will likely see the error message:
    Error parsing text-format caffe.NetParameter: 196:16: Message type "caffe.DropoutParameter" has no field named "scale_train".
    Whether you see the error depends on your Caffe version. For example, BVLC Caffe does not support the boolean parameter scale_train for the dropout layer. The error message does not matter because the dropout layer is needed only for training, and the Model Optimizer removes it.
  3. Comment out these lines in test.prototxt:
    ...
    layer {
      name: "drop6"
      type: "Dropout"
      bottom: "fc6"
      top: "fc6"
      dropout_param {
        dropout_ratio: 0.5
        # scale_train: false # <-------------- comment out this line
      }
    }
    ...
    layer {
      name: "drop7"
      type: "Dropout"
      bottom: "fc7"
      top: "fc7"
      dropout_param {
        dropout_ratio: 0.5
        # scale_train: false # <-------------- comment out this line
      }
    }
    ...
  4. Run the Model Optimizer on this model again:
    python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt
    

You will see the message:

[ ERROR ]  Found custom layer proposal. Model Optimizer does not support this layer.
Please, register it in CustomLayersMapping.xml or implement extension.
For more information please refer to Model Optimizer FAQ, question #45.

This message means the Model Optimizer can load the model, but is unable to infer the shape and handle the custom layer properties.

Registering a Custom Layer as a Model Optimizer Extension

In the following sections, you will learn how to make the Model Optimizer independent from Caffe* when processing a model that has a custom layer. In this example, the custom layer is referred to as the Proposal layer.

Use this section to implement the mapping rules for the Proposal layer attributes and the output shape calculation. As part of these steps, you must first create a class for the Proposal layer and inherit it from the general-purpose Op class that defines the interface of every new custom layer.

In this section, it is important to understand the Op class and its function. The implementation of this class is in <INSTALL_DIR>/deployment_tools/model_optimizer/mo/ops/op.py; it expects a graph and attributes to be passed when it is initialized.

Op keeps the attributes for each operation and contains the logic for creating nodes in the internal model representation. Op is responsible for dumping each particular operation to the .xml format of the Intermediate Representation. By inheriting from it, the technical plumbing is handled for you, and you can concentrate on the specifics of this layer: the attributes it supports and the rules for computing its output shape.

Follow these steps:

  1. Create the file python_proposal.py in the directory <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        pass
  2. Define the name of the operation and make a stub constructor:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            super().__init__(graph)
  3. Every Op must have three specific fields defined: type, op, and infer. In most cases, the type and op names are the same, and infer is defined as a function to compute the output shape. Reflect these fields in your constructor:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'infer': None
            }
            super().__init__(graph, mandatory_props, attrs)
    According to the Intermediate Representation catalog, Proposal has the attributes:
    • pre_nms_topn
    • post_nms_topn
    • nms_thresh
    • feat_stride
    • min_size
    • base_size
    • ratio
    • scale
  4. In defining supported attribute names, it is best to use the same names as in the original model. However, the names are just identifiers and have no hard connection to the model layer properties; for example, you could use the name my_ratio for ratio. Besides defining the full list of supported parameters in supported_attrs, you can define only the parameters that should appear in the Intermediate Representation in the backend_attrs method.
    Define your attributes:
    class PythonProposalOp(Op):
        # ... constructor
        def supported_attrs(self):
            return [
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh',
                'feat_stride',
                'min_size',
                'base_size',
                'ratio',
                'scale'
            ]
  5. Model Optimizer now knows how to create the layer called Proposal when it is in the topology and what attributes this layer has. However, the Model Optimizer does not know how to calculate the output shape of this operation. Define a rule to calculate the output shape:
    import numpy as np
    from mo.graph.graph import Node
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        # ... supported attrs
        @staticmethod
        def calculate_output_shape(node: Node):
            node.out_node().shape = (1, 1, 1, 1)  # every Proposal now always has the same output shape
  6. According to the Intermediate Representation catalog, the Proposal output shape dynamically depends on the post_nms_topn parameter.
    Implement the output shape calculation in Python*:
    import numpy as np
    class PythonProposalOp(Op):
        # ... static fields
        # ... constructor
        # ... supported attrs
        @staticmethod
        def calculate_output_shape(node: Node):
            input_shape = node.in_node(0).shape
            out_shape = np.array([0, 0], dtype=np.int64)
            # rois blob: holds R regions of interest, each is a 5 - tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle(x1, y1, x2, y2)
            out_shape[0] = input_shape[0] * node.post_nms_topn
            out_shape[1] = 5
            node.out_node(0).shape = out_shape
    The node does not yet contain the post_nms_topn parameter, so it should be initialized in the constructor along with the other parameters. The Inference Engine contains an implementation of a Caffe-like Proposal layer and works well with the default values from caffe.proto:
    // Message that stores parameters used by ProposalLayer
    message ProposalParameter {
      optional uint32 feat_stride = 1 [default = 16];
      optional uint32 base_size = 2 [default = 16];
      optional uint32 min_size = 3 [default = 16];
      repeated float ratio = 4;
      repeated float scale = 5;
      optional uint32 pre_nms_topn = 6 [default = 6000];
      optional uint32 post_nms_topn = 7 [default = 300];
      optional float nms_thresh = 8 [default = 0.7];
    }
  7. Change the constructor as follows:
    class PythonProposalOp(Op):
        # ... static fields
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'feat_stride': 16,
                'base_size': 16,
                'min_size': 16,
                'ratio': [0.5, 1, 2],
                'scale': [8, 16, 32],
                'pre_nms_topn': 6000,
                'post_nms_topn': 300,
                'nms_thresh': 0.7,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        # ... supported attrs
        # ... calculate output shape

Summary

In this section, you implemented support for a custom layer of type Python that represents the Proposal layer in the topology. You learned how to calculate the output shape of this layer.

The attribute values are hardcoded. In the next section, you will learn how to extract these values from the original framework model.

Registering Rules to Pass Extension Layer Properties from a Caffe* Model to the Intermediate Representation

Model Optimizer now knows how to set the shape of the PythonProposalOp operation, but it is incorrect to initialize attributes with the same values for every operation. Instead, the values should be extracted from the original topology. Model Optimizer does not know how to map the custom layer properties to the PythonProposalOp. For this, you must register the FrontExtractorOp instance.

NOTE: This step is required only if the layer requires parameters from the original model.

  1. Create the file python_proposal_ext.py in the folder <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe:
    from mo.front.extractor import FrontExtractorOp
    class PythonProposalFrontExtractor(FrontExtractorOp):
        pass
  2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
    from mo.front.extractor import FrontExtractorOp
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
  3. Register a mapping rule between the original model and the PythonProposalOp attributes, by overriding the following function:
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param # each layer has a specific parameter, take a look at caffe.proto
            python_params = str(param.param_str) # for Python layers, all params are in param_str
            attrs = {
                'feat_stride': int(python_params.split(':')[-1])
            }
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
    You have successfully extracted the parameter feat_stride from prototxt, assuming it is the only parameter in this layer.
  4. To increase the implementation's flexibility:
    import ast
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param
            attrs = PythonProposalFrontExtractor.parse_param_str(str(param.param_str))
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
        @staticmethod
        def parse_param_str(param_str: str):
            if param_str[0] != '{' and param_str[-1] != '}':
                param_str = '{' + param_str + '}'
            return ast.literal_eval(param_str)
    You can successfully convert the model. Open the .xml file and view your code:
    ...
    <layer id="42" name="proposal" precision="FP32" type="Python">
        <data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="[0.5, 1, 2]" scale="[8, 16, 32]"/>
        <input>
            <port id="0">
                <dim>1</dim>
                <dim>18</dim>
                <dim>15</dim>
                <dim>15</dim>
            </port>
            <port id="1">
                <dim>1</dim>
                <dim>36</dim>
                <dim>15</dim>
                <dim>15</dim>
            </port>
            <port id="2">
                <dim>1</dim>
                <dim>3</dim>
            </port>
        </input>
        <output>
            <port id="3">
                <dim>300</dim>
                <dim>5</dim>
            </port>
        </output>
    </layer>
    ...

Look at the output shape of the custom layer you implemented. The shape was calculated according to the rules specified in PythonProposalOp. The ratio and scale properties have the values [0.5, 1, 2] and [8, 16, 32]. They have square brackets because they are originally repeated parameters: you converted them to lists in PythonProposalOp, and the Model Optimizer cast the values to strings. According to Python* rules, a list has a string representation of opening and closing square brackets with values joined by commas.

This is not a valid notation for the Intermediate Representation specification, because repeated parameters must be separated by a comma but without the brackets. Therefore, you must override the Model Optimizer default behavior regarding how it handles those parameters during the Intermediate Representation emitting stage after the optimizations are complete. To do so, implement backend_attrs() in the PythonProposalOp class:

class PythonProposalOp(Op):
    ... other methods
     def backend_attrs(self) -> list:
            """
            Gets list of attributes that should appear in resulting IR
            Returns:
                list of attributes names or list of tuples (name of attribute, pre-processing rule)
            """
            return [
                (  # a tuple per attribute
                    'ratio',  # name of attribute
                    # pre-processing rule in a form of lambda
                    # lambda takes a PythonProposalOp node with all defined properties
                    # it translates [1,2,3] -> "1,2,3"
                    lambda node: ','.join(map(str, node['ratio']))
                ),
                (
                    'scale',
                    lambda node: ','.join(map(str, node['scale']))
                ),
                'feat_stride',
                'base_size',
                'min_size',
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh'
            ]

The model can now be successfully converted.

Open the .xml file. ratio and scale have the expected correct values 0.5,1,2 and 8,16,32.
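
With the backend_attrs() pre-processing in place, the data element of the generated Proposal layer should look similar to the following (compare with the earlier output, which still contained the square brackets):

<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="0.5,1,2" scale="8,16,32"/>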

NOTE: The Model Optimizer supports the Faster-R-CNN topology. Run the following command to get the same Intermediate Representation:

python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt --extensions <INSTALL_DIR>/deployment_tools/inference-engine/samples/object_detection_sample/fasterrcnn_extensions

Summary

In this section you learned how to:

  1. Create a framework-independent extension that implements the Intermediate Representation custom layer with unified logic for calculating output shapes and a specified set of attributes
  2. Use a framework-specific property extractor to map the original model custom layer properties to the expected properties of the framework-independent extension
  3. Manipulate the custom layer properties representation in the resulting Intermediate Representation

Files used in this section:

  • <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops/python_proposal.py:
    import numpy as np
    from mo.graph.graph import Node
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'feat_stride': 16,
                'base_size': 16,
                'min_size': 16,
                'ratio': [0.5, 1, 2],
                'scale': [8, 16, 32],
                'pre_nms_topn': 6000,
                'post_nms_topn': 300,
                'nms_thresh': 0.7,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        def supported_attrs(self):
            return [
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh',
                'feat_stride',
                'min_size',
                'base_size',
                'ratio',
                'scale'
            ]
        def backend_attrs(self) -> list:
            """
            Gets list of attributes that should appear in resulting IR
            Returns:
                list of attributes names or list of tuples (name of attribute, pre-processing rule)
            """
            return [
                (  # a tuple per attribute
                    'ratio',  # name of attribute
                    # pre-processing rule in a form of lambda
                    # lambda takes a PythonProposalOp node with all defined properties
                    # it translates [1,2,3] -> "1,2,3"
                    lambda node: ','.join(map(str, node['ratio']))
                ),
                (
                    'scale',
                    lambda node: ','.join(map(str, node['scale']))
                ),
                'feat_stride',
                'base_size',
                'min_size',
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh'
            ]
        @staticmethod
        def calculate_output_shape(node: Node):
            input_shape = node.in_node(0).shape
            out_shape = np.array([0, 0], dtype=np.int64)
            # rois blob: holds R regions of interest, each is a 5 - tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle(x1, y1, x2, y2)
            out_shape[0] = input_shape[0] * node.post_nms_topn
            out_shape[1] = 5
            node.out_node(0).shape = out_shape
  • <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/python_proposal_ext.py:
    import ast
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param
            attrs = PythonProposalFrontExtractor.parse_param_str(str(param.param_str))
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
        @staticmethod
        def parse_param_str(param_str: str):
            if param_str[0] != '{' and param_str[-1] != '}':
                param_str = '{' + param_str + '}'
            return ast.literal_eval(param_str)

Legacy Mode for Caffe* Custom Layers

Model Optimizer can register custom layers in such a way that the output shape is calculated by the Caffe* framework installed on your system. This section covers this option.

NOTE: The Caffe Python* API has an issue when a layer name does not correspond to the name of its top. The fix was implemented in BVLC Caffe*. The Caffe framework on your computer must contain this fix; otherwise, Caffe can unexpectedly fail during the fallback procedure.

NOTE: The Caffe fallback feature was validated against this GitHub revision. You may have issues with forks or later Caffe framework versions.

  1. Create a file CustomLayersMapping.xml:
    mv extensions/front/caffe/CustomLayersMapping.xml.example extensions/front/caffe/CustomLayersMapping.xml
  2. Add (register) custom layers to CustomLayersMapping.xml:
    <CustomLayer NativeType="${Type}" hasParam="${has_params}" protoParamName="${layer_param}"/>

Where:

  • ${Type} is the type of the layer in Caffe
  • ${has_params} is "true" if the layer has parameters, and "false" otherwise
  • ${layer_param} is the name of the layer parameters message in caffe.proto, if the layer has one

Example:

  1. The Proposal layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in the proposal_param property of the layer:
    <CustomLayer NativeType="Proposal" hasParam="true" protoParamName="proposal_param"/>
  2. The CustomLayer layer has no parameters:
    <CustomLayer NativeType="CustomLayer" hasParam="false"/>

For this feature, you need an appropriate version of Caffe installed on the computer on which you run the Model Optimizer.

Constraints of Using the Caffe Fallback

Several layers in the Caffe* framework can have output shapes that depend dynamically on the input data, not only on the parameters of the layer and of the layers that precede it. For example, SimplerNMS filters out bounding boxes that do not satisfy a condition. Internally, the Caffe fallback forwards the whole net with meaningless data (just noise), so it is natural to get only one bounding box (0,0,0,0) instead of the expected number (for example, 15). It is possible to patch Caffe accordingly, but then successful Intermediate Representation generation depends on the patched Caffe being available on that particular machine. To keep the solution independent from Caffe, we recommend using the extensions mechanism for such layers.

Known cases like Proposal, DetectionOutput, SimplerNMS are implemented as extensions and can be used out of the box.

A detailed description of supported layers is in the Intermediate Representation Layers Notation Reference Catalog.

Building Caffe*
  1. Build Caffe* with Python* 3.5:
    export CAFFE_HOME=PATH_TO_CAFFE
    cd $CAFFE_HOME
    rm -rf  ./build
    mkdir ./build
    cd ./build
    cmake -DCPU_ONLY=ON -DOpenCV_DIR=<your opencv install dir> -DPYTHON_EXECUTABLE=/usr/bin/python3.5 ..
    make all # also builds pycaffe
    make install
    make runtest # optional
  2. Add Caffe Python directory to PYTHONPATH to let it be imported from the Python program:
    export PYTHONPATH=$CAFFE_HOME/python:$PYTHONPATH
  3. Check the Caffe installation:
    python3
    import caffe

If Caffe was installed correctly, the Caffe module is imported without errors.

TensorFlow* Models with Custom Layers

You have three options for TensorFlow* models with custom layers:

  • Register those layers as extensions to the Model Optimizer. In this case, the Model Optimizer generates a valid and optimized Intermediate Representation.
  • If you have sub-graphs that should not be expressed with an analogous sub-graph in the Intermediate Representation, but a different sub-graph should appear in the model instead, the Model Optimizer provides such an option. This feature is helpful for many TensorFlow models. To read more, see Sub-graph Replacement in the Model Optimizer.
  • Use the experimental feature of registering certain sub-graphs of the model as sub-graphs that should be offloaded to TensorFlow during inference. In this case, the Model Optimizer produces an Intermediate Representation that:
    • Can be inferred only on CPU
    • Reflects each sub-graph as a single custom layer in the Intermediate Representation

    For more information, see Offloading Computations to TensorFlow*. This feature is for development only. It is intended for cases when you have a model with a complex structure and it is not easy to write extensions for its internal sub-graphs. In this case, you offload these complex sub-graphs to TensorFlow to make sure that the Model Optimizer and the Inference Engine can successfully execute your model. However, for each such sub-graph the TensorFlow library is called, which is not optimized for inference. You then replace each sub-graph with an extension and remove its offloading to TensorFlow until the whole model is converted by the Model Optimizer and inferred by the Inference Engine only, with maximum performance.

Sub-Graph Replacement in the Model Optimizer

Several reasons exist for why the Model Optimizer cannot generate an Intermediate Representation for a model. However, in some cases, the Intermediate Representation can be generated after providing certain hints to the tool. The examples of hints below are mostly related to TensorFlow*, but they could potentially apply to models created in any framework:

  • The topology contains an operation (or a sub-graph of operations) that is not known to the Model Optimizer, but this operation (sub-graph) could be expressed as a combination of known operations. The hint is then a description of this combination to the tool.
  • A sub-graph of operations in the topology expresses a single layer known to the Inference Engine.
  • TensorFlow and the Inference Engine use different tensor layouts, NHWC and NCHW respectively. If a tensor in NHWC layout is flattened (for example, all the dimensions are squashed into a single dimension), it is not possible to convert it to the NCHW layout required by the Inference Engine, so the Model Optimizer cannot produce a correct Intermediate Representation.

The detailed solutions for the examples above are given later; the next subsection describes what is common to all three examples.

Sub-graph Replacement

In these cases, a sub-graph (or a single node) of the initial graph is replaced with a new sub-graph (or a single node). The sub-graph replacement consists of the following steps:

  1. Identify an existing sub-graph for replacement
  2. Generate a new sub-graph
  3. Connect the new sub-graph to the graph (create input edges to the new sub-graph)
  4. Create output edges from the new sub-graph to the graph
  5. Do something with the original sub-graph (for example, remove it)

Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.

Replace a Single Operation with a Sub-graph of Operations

For example, there is an operation SquaredDifference in TensorFlow* which calculates (a - b)^2, where a and b are input tensors. Inference Engine does not support such operation. However, SquaredDifference could be expressed using two Power operations and one Eltwise Add. The Power operation calculates scale * (a ^ power) + shift, where a is a tensor and scale, power and shift are float values. The first Power operation negates the value of tensor b. The second one is used to square the result of a + (- b) which is calculated using the Eltwise Add operation applied to tensor a and tensor -b.

Given that, we can replace all SquaredDifference operations in the initial model with two Power and one Eltwise operations. The replacer is implemented in the following file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/SquaredDifference.py.

import networkx as nx
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Node
from mo.ops.eltwise import Eltwise
from mo.ops.power import Power
class SquaredDifference(FrontReplacementOp):
    """
    Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
    This class replaces a single op SquaredDifference by a sub-graph consisting of 3 lower-level ops.
    """
    op = "SquaredDifference"
    enabled = True
    def replace_op(self, graph: nx.MultiDiGraph, node: Node):
        negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
        add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_'))
        squared = Power(graph, dict(power=2, name=node.name + '/squared_'))
        out_node = squared.create_node([add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])])
        # Replace edge from out port 0 of the matched node with a edge from node out_node.id with port 0.
        # The "explicit" version of the return value is: [(out_node.id, 0)])
        return [out_node.id]

Model Optimizer internal representation of the graph uses the networkx module.

Key lines:

  • Line 1: Imports this module.
  • Line 3: Imports class FrontReplacementOp that is used to replace operation of particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identifies an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and creation of data nodes representing tensors with values. At this stage of model conversion pipeline, all nodes in the graph are operation nodes or nodes of type Const that produce tensor with fixed value embedded into the node.
  • Line 4: Imports class Node representing a single node in the computation graph.
  • Lines 5 - 6: Import classes representing operations Power and Eltwise. These classes are inherited from base class mo.ops.Op that represents operation and stores its attributes.
  • Line 9: Defines class SquaredDifference inherited from FrontReplacementOp. This is a replacer class that is automatically registered and executed by Model Optimizer. Since the class is located in the common (not framework) specific directory <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front, it is used for replacement for all supported frameworks.
  • Line 15: Defines the class variable op that stores the name of the operation to be replaced. In this case, it is SquaredDifference.
  • Line 16: Defines class variable enabled that controls whether the replacer is enabled or not. The only function that should be implemented in the class is replace_op. It gets graph to operate on and an instance of node of desired operation (SquaredDifference in this case). This function performs step two and three of the sub-graph replacement (generates a new sub-graph to replace with and connects a new sub-graph to the graph).
  • Lines 19 - 21: Create instances of operations classes with required attributes.
  • Line 23: Creates a sub-graph from the operations defined above. The create_node method of the Op class generates Node from the Op and uses single mandatory argument - the list of input nodes (represented as instances of Node class) to create input edges to the node being generated. Inputs of the SquaredDifference node are retrieved using node.in_node(0) and node.in_node(1) method calls. The Eltwise Add node gets first input as initial first input of SquaredDifference node, the second input of add is the result of negation of the second input of SquaredDifference node: [add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]. Then the result of Add node is squared. out_node node performs this calculation.

The replace_op function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. Each element of the list describes mapping between old output edge of the matched node and new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, SquaredDifference produces single tensor through output port 0, so the returned list contains single element. In general, each element is a tuple, where the first element is the name of a new node producing required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use shortcut - just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement).

Default implementation of the FrontReplacementOp class removes matched node and all its input/output edges (step 5 of the sub-graph replacement).

Another example of this kind of replacement is in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/Sub.py class, where all instances of the Sub operation are replaced with two operations: a Power to negate the second argument and an Eltwise to perform the elementwise addition. A minimal sketch of such a replacer is shown below.
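
The following sketch is modeled on the SquaredDifference replacer above; the actual Sub.py shipped with the Model Optimizer may differ in details such as attribute names and edge handling:

import networkx as nx
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Node
from mo.ops.eltwise import Eltwise
from mo.ops.power import Power
class Sub(FrontReplacementOp):
    """
    Sketch: replaces a single Sub node with Power (negate) + Eltwise (sum), that is, a - b == a + (-b).
    """
    op = "Sub"
    enabled = True
    def replace_op(self, graph: nx.MultiDiGraph, node: Node):
        negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
        add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_'))
        # Negate the second input, then sum it with the first input
        out_node = add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])
        return [out_node.id]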

Replace Sub-graph of Operations with a New Sub-graph of Operations

The previous example considered the situation in which a single node of a specific type is replaced. When it is necessary to replace a sub-graph of operations, you must tell the Model Optimizer how to identify this sub-graph. There are three ways to achieve that:

  1. Use the graph isomorphism pattern of the networkx module
  2. Use a node name pattern to identify the scope (according to TensorFlow* terminology) to be replaced
  3. Use sets of start and end node names to match all nodes "between" them

The next sections explain each option using real examples.

Replace Sub-graph of Operations Using Graph Isomorphism Pattern

The networkx Python* module provides methods to find a graph isomorphic to a given one using node and edge matching: for example, networkx.algorithms.isomorphism.categorical_node_match and networkx.algorithms.isomorphism.categorical_multiedge_match. The Model Optimizer uses these methods and provides a simple API on top of this feature.

For example, Caffe* has a layer called Mean-Variance Normalization (MVN), which is also supported by the Inference Engine. This layer is implemented with low-level operations in TensorFlow*: Mean, StopGradient, SquaredDifference, Squeeze, and FusedBatchNorm. The Model Optimizer should replace the sub-graph consisting of these operations with a single Inference Engine layer of type MVN.

The file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/mvn.py performs such a replacement. The first part of the file is:

class MVN(FrontReplacementSubgraph):
    enabled = True
    def pattern(self):
        log.debug('Enabled MVN replacement')
        return dict(
            nodes=[
                ('mean', dict(op='Mean')),
                ('stop_grad', dict(op='StopGradient')),
                ('sqdiff', dict(op='SquaredDifference')),
                ('variance', dict(op='Mean')),
                ('squeeze_mean', dict(op='Squeeze')),
                ('squeeze_variance', dict(op='Squeeze')),
                ('fbn', dict(op='FusedBatchNorm')),
            ],
            edges=[
                ('mean', 'stop_grad', {'in': 0}),
                ('stop_grad', 'sqdiff', {'in': 1}),
                ('sqdiff', 'variance', {'in': 0}),
                ('mean', 'squeeze_mean', {'in': 0}),
                ('variance', 'squeeze_variance', {'in': 0}),
                ('squeeze_mean', 'fbn', {'in': 3}),
                ('squeeze_variance', 'fbn', {'in': 4}),
            ],
            node_attrs=['op'],
            edge_attrs=['in'])

In this file:

  • Line 1: Defines the class MVN inherited from the class FrontReplacementSubgraph that performs sub-graph replacement using a sub-graph isomorphism pattern.
  • Line 3: Sets the class variable enabled to True, meaning that this replacer is enabled.
  • The function pattern defines the sub-graph constraints to be matched. It returns a dictionary with four keys:
    • the nodes key defines a list of nodes to be matched. Each element in the list is a tuple: the first element is the alias name assigned to the matched node, the second element is a dictionary with the desired attributes of the node.
    • the edges key defines a list of edges to be matched. Each element in the list is a tuple: the first and the second elements are the alias names of the start and end nodes of the edge respectively, and the third element is a dictionary with the desired edge attributes.
    • the node_attrs key contains the names of node attributes to use during the sub-graph isomorphism search.
    • the edge_attrs key contains the names of edge attributes to use during the sub-graph isomorphism search.
      The sub-graph is matched only if all provided constraints are satisfied. If at least one node with the desired attributes is missing or at least one defined edge is absent, the sub-graph is not matched.
  • Line 9: Adds a constraint that the sub-graph should contain a node with the attribute op equal to Mean. The matched node gets the alias name mean. In the same way, line 10 adds a constraint for a StopGradient node; the matched node gets the alias name stop_grad.
  • Line 18: Defines an edge from the node with alias name mean to the node with alias name stop_grad with the attribute in equal to 0. This means that the output of the node mean is connected to the node stop_grad as its first input (Model Optimizer uses zero-based indexing, which is why in is 0). Another example of defining edge constraints is in line 25, where the edge from squeeze_mean is connected to the fbn node as its fourth input.
  • Lines 26 - 27: Specify the lists of attributes to be checked. In fact, these lists are just the lists of all keys in the dictionaries with node and edge attributes.

Now that the Model Optimizer knows how to find the sub-graph (step 1 of the sub-graph replacement), it is necessary to implement the function that performs the actual sub-graph replacement (steps 2 and 3). The code for this function is:

def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict):
    fbn = match['fbn']
    input = fbn.in_node(0)
    log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name))
    if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id:
        return
    log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name))
    MVN = Op.get_op_class_by_name('MVN')
    mvn = MVN(graph, dict(
        name=fbn.name + '/MVN_',
        eps=fbn.eps,
        required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3]
    ))
    mvn.attrs['old_infer'] = mvn.attrs['infer']
    mvn.attrs['infer'] = __class__.infer
    mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_'))
    add = Eltwise(graph, dict(operation='sum', name=fbn.name + '/Add_'))
    input_gamma = fbn.in_node(1)
    input_beta = fbn.in_node(2)
    mean_reduction = match['mean'].in_node(1)
    variance_reduction = match['variance'].in_node(1)
    new_subgraph = add.create_node([
        mul.create_node([
            mvn.create_node([input, mean_reduction, variance_reduction]),
            input_gamma
        ]),
        input_beta
    ])
    replace_node(fbn, new_subgraph)

The function accepts two arguments: the graph and the match dictionary. The keys in the dictionary are the alias names of the matched nodes (defined in the nodes list in the pattern function) and the values are the matched nodes of the graph (instances of the Node class).

The function generates a new sub-graph with a node of type MVN and two nodes of type Eltwise calculating a sum and a product. The way the graph is generated and the mathematics behind it are straightforward, so attention is focused on two other aspects of this function.

The first one is the call to the replace_node function in line 36. The FusedBatchNorm node is replaced with the output node of the generated sub-graph: all input edges of the FusedBatchNorm node are re-connected to the new_subgraph node, and all consumers of the FusedBatchNorm node are updated to get their inputs from the new_subgraph node. This action connects the newly generated sub-graph with the existing graph (step 4 of the sub-graph replacement).

The second one is that the default implementation of the inference function for the MVN operation is overwritten. In line 16, the default implementation of the inference function for MVN is saved to the old_infer attribute. In line 17, the new inference function is saved to the instance of the MVN operation class. The new inference function looks the following way:

@staticmethod
def infer(node: Node):
    if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')):
        log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name))
        return
    if not(all(node.in_node(1).value == node.required_reduction_indices) and
        all(node.in_node(2).value == node.required_reduction_indices)):
        log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format(
            node.in_node(1).value,
            node.in_node(2).value,
            node.required_reduction_indices
        ))
        return
    node.graph.remove_edge(node.in_node(1).id, node.id)
    node.graph.remove_edge(node.in_node(2).id, node.id)
    node.old_infer(node)

The infer function is needed to infer the value of the node (if possible) and to infer the shapes of the output tensors of the node (mandatory). The custom infer function performs additional checks that reflect the limitations of the MVN layer implementation in the Inference Engine. For example, the reduction indices for mean and variance must be constants (line 10), while in TensorFlow* they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18), because all required information is already stored in the MVN node attributes. This is caused by the different MVN layer implementations in the Inference Engine and TensorFlow*: mean and variance are attributes of the node in the Inference Engine, while in TensorFlow they are input tensors. The edges are not removed in the replace_sub_graph function, because they are used in the infer function (lines 7-12).

The last action in the infer function (line 19) is to call the default infer function for the MVN, which is saved in the old_infer attribute of the node, to infer the output tensor shapes.

On step 5 of the sub-graph replacement, the six matched nodes are automatically removed during the dead code elimination pass that is performed after the custom sub-graph replacements are applied. After the fbn node is replaced with the newly created sub-graph node, the matched nodes are no longer connected to the inputs of the network. Since they are not marked as output nodes (using the --output command-line parameter), they can be removed.

The replacement works for all sub-graph isomorphism instances found in the network.

Replace Sub-graph of Operations Using Nodes Name Pattern

TensorFlow* uses the scope mechanism to group related operation nodes. It is good practice to put nodes performing a particular task into a scope. This approach divides the graph into logical blocks that are easier to review in TensorBoard*. The scope, in fact, just defines a common prefix for the node names in the scope.
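
For illustration only, the following TensorFlow* 1.x snippet shows how a name scope turns into a common node name prefix (the scope name here is made up):

import tensorflow as tf

with tf.name_scope('Mixed_5b'):
    a = tf.constant(1.0, name='a')  # node name: Mixed_5b/a
    b = tf.add(a, a, name='add')    # node name: Mixed_5b/add

print(b.op.name)  # -> Mixed_5b/add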

For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but located in different places of the network. For example, Inception V4 from the tensorflow.contrib.slim module has the inception blocks Mixed_5b, Mixed_5c, and Mixed_5d with exactly the same nodes and attributes.

Now consider the situation when someone has implemented these Inception blocks extremely efficiently using a single Inference Engine custom layer called InceptionBlock and would like to replace these blocks with instances of this layer to decrease inference time. Model Optimizer provides a mechanism to replace sub-graphs of operations defined by regular expressions for the node name prefixes (scope). In this particular case, some of the patterns are: .*InceptionV4/Mixed_5b, .*InceptionV4/Mixed_5c and .*InceptionV4/Mixed_5d. Each pattern starts with .*, because the InceptionV4 prefix is added to all node names during a model freeze.

Sub-graph replacement using node name patterns is a bit trickier than the replacement of a single operation or the networkx isomorphism pattern described above. In comparison with the previously described replacements, you should do the following additional steps:

  1. Prepare a configuration file template defining node name patterns and information about custom layer attributes
  2. Run the Model Optimizer with a command-line parameter to add information about input and output nodes of the specified sub-graphs

Consider the following possible configuration file for the Inception Block replacer:

[
    {
        "custom_attributes": {
            "attr1_key": "attr1_value",
            "attr2_key": 123456
        },
        "id": "InceptionBlockReplacer",
        "op": "InceptionBlock",
        "instances": [
            ".*InceptionV4/Mixed_5b",
            ".*InceptionV4/Mixed_5c",
            ".*InceptionV4/Mixed_5d"
        ],
        "match_kind": "scope"
    }
]

The .json file contains a list of dictionaries. Each dictionary defines one replacement using several keys:

  • id (mandatory) is a unique identifier of the replacer. It is used in the Python* code that implements the sub-graph replacement to link the class and the replacement description from the configuration file.
  • match_kind (mandatory) is a string that specifies the matching algorithm. Currently, scope and points are supported. In this example, the first one is considered. The points match kind is described below.
  • instances (mandatory) specifies the instances of the sub-graph to be matched. For the scope match kind, it contains a list of node name prefix patterns.
  • custom_attributes (optional) is a dictionary with static attributes of the layer to be dumped to the Inference Engine Intermediate Representation .xml file.
  • op (optional) is used only if the sub-graph replacement Python code is not needed, because the sub-graph should be replaced with a single node of type op. If this attribute is not set, it is necessary to implement Python code with the sub-graph generation logic. Both options are considered in this example.

When the configuration file is ready, run the Model Optimizer with the regular command-line parameters pointing to the model file and the input shapes (if necessary), and with the additional parameter --tensorflow_custom_operations_config_update pointing to the generated configuration file. If the file is correct, Model Optimizer adds two keys to the InceptionBlockReplacer dictionary, inputs and outputs, with the following content:

[
    {
        "id": "InceptionBlockReplacer",
        ...
        "inputs": [
            [
                {
                    "node": "Branch_2/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                },
                {
                    "node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
                    "port": 0
                },
                {
                    "node": "Branch_1/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                },
                {
                    "node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                }
            ]
        ],
        "outputs": [
            {
                "node": "concat$",
                "port": 0
            }
        ]
    }
]

The value for the inputs key is a list of lists describing the input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes the nodes consuming this tensor and the port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input node names to uniquely identify them in each instance of the sub-graph defined by instances. These nodes are denoted as the input nodes of the sub-graph.

In the InceptionV4 topology, the InceptionV4/Mixed_5b block has four input tensors from outside of the sub-graph, but all of them are produced by the node InceptionV4/Mixed_5a/concat. Therefore, the top-level list of inputs contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by the InceptionV4/Mixed_5a/concat node. In this case, all four input nodes consume the input tensor through port 0.

The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level list is important. This order defines how the Model Optimizer attaches input tensors to a newly generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using the match.single_input_node(i) call in the sub-graph replacer code. More information about the API is given below. If you need to change the order of input tensors, you can edit the configuration file in a text editor.

The value for the outputs key is a list describing the nodes of the sub-graph that produce tensors going outside of the sub-graph or that do not have child nodes. These nodes are denoted as the output nodes of the sub-graph. The order of elements in the list is important: the i-th element of the list describes the i-th output tensor of the sub-graph, which could be obtained using the match.output_node(i) call. The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.

Now that the meaning of the inputs and outputs attributes is clear, return to the replacer implementation. The InceptionBlockReplacer replacer contains the attribute op with the value InceptionBlock, which means that the identified sub-graph should be replaced with a single layer of type InceptionBlock. This layer is not known to the Model Optimizer, so it is necessary to define it. See Extending the Model Optimizer with New Primitives. You must create the file extensions/ops/InceptionBlock.py with the following content:

import numpy as np
from mo.graph.graph import Node
from mo.ops.op import Op
class InceptionBlock(Op):
    op = "InceptionBlock"
    enabled = True
    def __init__(self, graph, attrs):
        super().__init__(graph, attrs, {
            'type': __class__.op,
            'op': __class__.op,
        })

The shape inference function is not defined. In this case, Model Optimizer uses the TensorFlow* fallback to calculate the shapes of the sub-graph output tensors.
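
If you prefer to define shape inference explicitly instead of relying on the fallback, a minimal sketch could be a static method added to the class above and registered by adding 'infer': __class__.infer to the attributes dictionary in __init__. This sketch assumes the block preserves the shape of its first input (consistent with the IR example below) and that the Node class provides an out_node accessor symmetric to in_node:

@staticmethod
def infer(node: Node):
    # Hypothetical shape inference: the output shape equals the shape of the first input.
    node.out_node(0).shape = node.in_node(0).shape.copy()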

Run the Model Optimizer with the regular command-line parameters, the path to the model file and the input shape (if necessary), and the parameter --tensorflow_use_custom_operations_config pointing to the created configuration file. Model Optimizer generates an Intermediate Representation .xml file with three sequential layers of type InceptionBlock, like in the following example:

<layer id="1658" name="InceptionBlock1877" precision="FP32" type="InceptionBlock">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>384</dim>
            <dim>35</dim>
            <dim>35</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>384</dim>
            <dim>35</dim>
            <dim>35</dim>
        </port>
    </output>
</layer>

The implementation of the sub-graph replacement by scope with a single layer is complete. The next subsection explains how Model Optimizer replaces sub-graph identified by start/end nodes (points) with another sub-graph.

Replace Sub-graph of Operations Using Points

In this scenario, for the matching algorithm, you should define the sub-graph via a set of start and end nodes. Given the set, the Model Optimizer performs the following steps:

  1. Starts a graph traversal from every start node following the direction of the graph edges. The search stops at end nodes or at nodes without further children. All visited nodes are added to the matched sub-graph.
  2. Starts another graph traversal from each non-start node of the sub-graph, that is, every node except the nodes from the start set. In this step, the edges are traversed in the opposite direction. All newly visited nodes are added to the matched sub-graph. This step is needed to add the nodes required for calculating the values of internal nodes of the matched sub-graph.
  3. Checks that all end nodes were reached from the start nodes. If not, exits with an error.
  4. Checks that there are no Placeholder operations among the added nodes. If there are, some side branch of the sub-graph (added in step 2) depends on the inputs of the network. Such a configuration is not correct, so Model Optimizer exits with an error.

This algorithm finds all nodes "between" the start and end nodes. The nodes needed for calculating values of non-input nodes of the matched sub-graph produce constant values, because they do not depend on the input of the network. This sub-graph match has a limitation: each start node must have only one input. Therefore, it is not possible to specify, for example, a convolution node as a start node, because it has two inputs: the data tensor and the tensor with weights.
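
The following standalone sketch illustrates this matching algorithm on a plain networkx graph. It is illustrative only, not the actual Model Optimizer implementation; for simplicity, the names of the network input (Placeholder) nodes are passed in explicitly instead of being read from node attributes:

import networkx as nx


def match_between_points(graph: nx.MultiDiGraph, start_points: set, end_points: set,
                         placeholders: set):
    matched = set()
    # Step 1: forward traversal from every start node; stop at end nodes or leaves.
    stack = list(start_points)
    while stack:
        node = stack.pop()
        if node in matched:
            continue
        matched.add(node)
        if node not in end_points:
            stack.extend(graph.successors(node))
    # Step 2: backward traversal from every non-start node of the sub-graph to add
    # the nodes required for calculating values of its internal nodes.
    stack = [n for n in matched if n not in start_points]
    while stack:
        node = stack.pop()
        for parent in graph.predecessors(node):
            if parent not in matched:
                matched.add(parent)
                stack.append(parent)
    # Step 3: every end node must have been reached from the start nodes.
    if not end_points.issubset(matched):
        raise RuntimeError('Some end nodes were not reached from the start nodes')
    # Step 4: the nodes added in step 2 must not depend on the inputs of the network.
    if matched & placeholders:
        raise RuntimeError('The matched sub-graph depends on an input of the network')
    return matched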

For an example of replacement with points, see Case Study: Converting SSD Models Created With a TensorFlow Object Detection API.

Offloading Computations to TensorFlow*

Model Optimizer cannot generate an Intermediate Representation from unsupported TensorFlow* operations, as is the case with some custom layers. However, you can still successfully create an Intermediate Representation if you offload the unsupported operations to TensorFlow for computation.

Limitations:

  • You can only offload operations to TensorFlow from a Linux* OS computer.
  • The custom layer supports inference only on a CPU, not on a GPU or FPGA.
  • The Inference Engine uses the NCHW layout for tensors, while TensorFlow usually uses NHWC. Model Optimizer performs the conversion between these layouts to correctly infer the model: it adds transpose operations to convert the 4D input tensors of the sub-graph from the NCHW layout to NHWC and vice versa for the output nodes. These operations are embedded into the protobuf string that describes the TensorFlow sub-graph in the Intermediate Representation .xml file. A small illustration of this layout conversion is given after this list.
    Sometimes, this approach fails. For example, offloading a single convolution to TensorFlow fails, because the layout of convolution weights in TensorFlow does not correspond to the layout of weights in the Inference Engine. However, offloading a convolution node together with the node holding its weights succeeds, because the node with weights becomes a part of the offloaded sub-graph, so no transposes are inserted for the weights tensor. The nodes with weights are usually of type Const.
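
For illustration, the layout conversion mentioned above is a plain transpose of a 4D tensor; the following sketch mirrors the transpose operations that Model Optimizer inserts around the offloaded sub-graph:

import numpy as np

nchw = np.zeros((1, 3, 224, 224), dtype=np.float32)  # batch, channels, height, width
nhwc = nchw.transpose(0, 2, 3, 1)                     # NCHW -> NHWC, shape (1, 224, 224, 3)
back_to_nchw = nhwc.transpose(0, 3, 1, 2)             # NHWC -> NCHW
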
How to Build a Custom Layer to Offload Computations to TensorFlow

NOTE: You need to perform this step only once.

  1. Clone the TensorFlow* r1.4 Git repository.
  2. Set the environment variable TF_ROOT_DIR to point to the cloned directory:
    export TF_ROOT_DIR=<TENSORFLOW_DIR>
  3. Set the Intel Distribution of OpenVINO toolkit environment variables by running the setupvars.sh script:
    source <INSTALL_DIR>/bin/setupvars.sh
  4. Build an Inference Engine layer with TensorFlow runtime. This might take about 20 minutes:
    ./tf_call_ie_layer/build.sh
  5. A shared library is generated:
    $TF_ROOT_DIR/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so
    This library is the Inference Engine custom layer, which is used to offload inference to TensorFlow.
How to Run a Model with Operations Offloaded to TensorFlow
  1. Compile extensibility_sample
  2. Run extensibility_sample:
    ./extensibility_sample -i <path_to_image_file> -m <path_to_IR.xml> -d CPU -l <path_to_libtensorflow_call_layer.so>

Three command-line options are available to offload part of the inference to TensorFlow:

NOTE: Add the command-line options described below to the regular conversion command, for example: python3 mo.py --input_model model-file.pb.

  • Use node name patterns to offload a sub-graph of operations using the command-line option:
    --tensorflow_subgraph_patterns

    This option uses a comma-separated list of regular expressions to match node names. This offload has two primary characteristics:

    • All nodes that match a specific regular expression are merged into a single Inference Engine node that TensorFlow executes.
    • All patterns are applied independently, which means two nodes that match two different patterns are not merged into one node. For example, with the option --tensorflow_subgraph_patterns "Scope_1/.*,Scope_2.*", all nodes whose names start with Scope_1/ are merged into one new node, and all nodes whose names start with Scope_2 are merged into a different node.
  • Offload specific types of operations using the command-line option:
    --tensorflow_operation_patterns

    This option specifies a comma-separated list of regular expressions to match node types. This offload has a primary characteristic: all nodes that match a specific regular expression are merged into a single Inference Engine node that TensorFlow executes. For example, the following command offloads all operations of type Concat, ConcatV2, Add, and BiasAdd to TensorFlow:

    --tensorflow_operation_patterns "Concat.*,.*Add"
  • Offload all unsupported operations automatically, using the command-line option:
    --offload_unsupported_operations_to_tf

    With this option, the Model Optimizer analyzes the network graph, finds unsupported operations, and offloads the connected sub-graphs of unsupported operations to TensorFlow.

You can use each of the three options. For example (the pattern values are placeholders):

python3 mo.py --input_model model-file.pb --tensorflow_subgraph_patterns "<comma_separated_name_regexps>"
python3 mo.py --input_model model-file.pb --tensorflow_operation_patterns "<comma_separated_type_regexps>"
python3 mo.py --input_model model-file.pb --offload_unsupported_operations_to_tf

(Deprecated) Case Study: Converting SSD Models Created with TensorFlow* Object Detection API

NOTE: This is a deprecated section. Please consider reading the Converting TensorFlow* Object Detection API Models section that describes a newer approach to converting Object Detection API models, which gives results closer to TensorFlow inference.

This chapter describes how to convert the SSD MobileNet V1 and SSD Inception V2 models from the TensorFlow* Object Detection API Zoo versions prior to 1.6.0. The information on how to enable the MobileNet V2 model is given in a dedicated section.

As explained in Sub-graph Replacement in Model Optimizer, you have multiple ways to set up the sub-graph matching. This example focuses on defining the sub-graph via a set of start and end nodes. The result of matching is two buckets of nodes:

  • Nodes "between" start and end nodes
  • Nodes connected to the first list, but only via a constant path (that is, these nodes are not connected to the inputs of the entire graph).

For more information on the SSD models from the TensorFlow* detection model zoo, refer to SSD MobileNet and SSD InceptionV2.

A distinct layer of any SSD topology is the DetectionOutput layer. This layer is implemented with dozens of primitive operations in TensorFlow, while in the Inference Engine it is a single layer. Thus, to convert an SSD model from TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implements the DetectionOutput layer with a single well-known DetectionOutput node.

The Inference Engine DetectionOutput layer consumes three tensors in the following order:

  1. Tensor with locations of bounding boxes
  2. Tensor with confidences for each bounding box
  3. Tensor with prior boxes (anchors in TensorFlow terminology)

DetectionOutput layer produces one tensor with seven numbers for each actual detection. There are more output tensors in the TensorFlow Object Detection API, but the values in them are consistent with the Inference Engine ones.

The difference with other examples is that here the DetectionOutput sub-graph is replaced not with a single layer, but with a new sub-graph.

Look at the sub-graph replacement configuration file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_support.json that is used to enable the two models listed above:

[
    {
        "custom_attributes": {
            "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
            "confidence_threshold": 0.01,
            "keep_top_k": 200,
            "nms_threshold": 0.45,
            "pad_mode": "caffe.ResizeParameter.CONSTANT",
            "resize_mode": "caffe.ResizeParameter.WARP"
        },
        "id": "TFObjectDetectionAPIDetectionOutput",
        "include_inputs_to_sub_graph": true,
        "include_outputs_to_sub_graph": true,
        "instances": {
            "end_points": [
                "detection_boxes",
                "detection_scores",
                "num_detections"
            ],
            "start_points": [
                "Postprocessor/Shape",
                "Postprocessor/Slice",
                "Postprocessor/ExpandDims",
                "Postprocessor/Reshape_1"
            ]
        },
        "match_kind": "points"
    },
    {
        "custom_attributes": {
        },
        "id": "PreprocessorReplacement",
        "inputs": [
            [
                {
                    "node": "map/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
                    "port": 2
                }
            ]
        ],
        "instances": [
            ".*Preprocessor/"
        ],
        "match_kind": "scope",
        "outputs": [
            {
                "node": "sub$",
                "port": 0
            },
            {
                "node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
                "port": 0
            }
        ]
    }
]

Key lines:

  • Lines 3-10 define static attributes that will be saved to the Intermediate Representation .xml file for the DetectionOutput layer.

  • Lines 12 and 13 define values for attributes that must always be set to "true" for this release of the Model Optimizer. These two attributes are specific to sub-graph matching by points only.

  • Lines 14-26 define one instance of the sub-graph to be matched. This is an important difference between sub-graph matching by scope and by points: several instances can be specified for matching by scope, but matching by points allows specifying only one instance. Therefore, full node names (not regular expressions as in the case of matching by scope) are specified in the instances dictionary.

The second sub-graph replacer, with the identifier PreprocessorReplacement, is used to remove the Preprocessor block from the graph. The replacer removes all nodes from this scope except the nodes performing mean value subtraction and scaling (if applicable). The implementation of the replacer is in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py file.

The topologies generated with the Object Detection API include several blocks, each performing a particular task:

  • Preprocessor block resizes, scales, and subtracts mean values from the input image.
  • FeatureExtractor block is a MobileNet or other backbone to extract features.
  • MultipleGridAnchorGenerator block creates initial bounding boxes locations (anchors).
  • Postprocessor block acts as a DetectionOutput layer, so the Postprocessor block must be replaced with a DetectionOutput layer. To do this, it is necessary to add all input nodes of the Postprocessor scope to the start_points list. Consider the inputs of each of these nodes:
    • Postprocessor/Shape consumes tensor with locations.
    • Postprocessor/Slice consumes tensor with confidences.
    • Postprocessor/ExpandDims consumes tensor with prior boxes.
    • Postprocessor/Reshape_1 consumes tensor with locations similarly to the Postprocessor/Shape node. Despite the fact that the last node Postprocessor/Reshape_1 gets the same tensor as the node Postprocessor/Shape, it must be explicitly added to the list.

Object Detection API Postprocessor block generates output nodes: detection_boxes, detection_scores, num_detections, detection_classes.

Consider the implementation of the sub-graph replacer available in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/SSDs.py. The file is rather big, so only some code snippets are explained in this section:

class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'TFObjectDetectionAPIDetectionOutput'

These lines define the new PostprocessorReplacement class inherited from FrontReplacementFromConfigFileSubGraph. FrontReplacementFromConfigFileSubGraph is designed to replace a sub-graph of operations described in the configuration file. The following methods can be overridden to implement the custom replacement logic:

  • generate_sub_graph performs the new sub-graph generation and returns a dictionary where the key is an alias name for a node and the value is a Node object. The dictionary has the same format as the match parameter of the replace_sub_graph method in the example with the networkx sub-graph isomorphism pattern. This dictionary is passed as an argument to the next three methods, so it should contain entries for the nodes that these methods need.
  • input_edges_match specifies the mapping between input edges to the sub-graph before the replacement and after it. The key of the dictionary is a tuple specifying an input tensor of the sub-graph before the replacement: the sub-graph input node name and the input port number for this node. The value for this key is also a tuple specifying the node that this tensor should be attached to during the replacement: the node name (or the alias name of the node) and the input port of this node. If the port number is zero, the parameter can be omitted, so the key or value is just a node name (alias). The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges. A small illustration of the mapping format is given after this list.
  • output_edges_match returns the mapping between the old output edges of the matched nodes and the new sub-graph nodes and output edge indices. The format is similar to the dictionary returned by the input_edges_match method. The only difference is that, instead of specifying input port numbers for the nodes, it is necessary to specify output port numbers. Naturally, this mapping is required for the output nodes only. The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
  • nodes_to_remove specifies the list of nodes that the Model Optimizer should remove after the sub-graph replacement. The default implementation of the method removes all sub-graph nodes.
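
For illustration, a hypothetical return value of input_edges_match could look like the following (node names are made up; the explicit form is a (node, port) tuple, and the shortcut form is just a node name when the port is 0):

{
    ('old_input_node', 1): ('new_node', 0),     # attach this tensor to port 0 of new_node
    'old_node_with_port_0': 'another_new_node'  # shortcut form: both ports are 0
}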

Review the replacer code considering the details of the DetectionOutput layer implementation in the Inference Engine. There are several constraints on the input tensors of the DetectionOutput layer:

  • The tensor with locations must be of shape [#batch, #prior_boxes * 4] or [#batch, #prior_boxes * 5], depending on whether the locations are shared or not.
  • The tensor with confidences must be of shape [#batch, #prior_boxes * #classes], and the confidence values must be in the range [0, 1], that is, passed through a SoftMax layer.
  • The tensor with prior boxes must be of shape [#batch, 2, #prior_boxes * 4]. The Inference Engine expects that it contains variance values, which the TensorFlow* Object Detection API does not add.

To enable these models, add Reshape operations for the locations and confidences tensors and update the values for the prior boxes to include the variance constants (they are not there in the TensorFlow Object Detection API).
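
For illustration, the following sketch shows how a prior boxes tensor could be extended with a second row of variance values to obtain the [#batch, 2, #prior_boxes * 4] shape expected by the Inference Engine. The variance values below (0.1, 0.1, 0.2, 0.2) are typical SSD values and are an assumption here; verify them against the model configuration:

import numpy as np

num_prior_boxes = 8
prior_boxes = np.random.rand(1, 1, num_prior_boxes * 4).astype(np.float32)
# One variance value per box coordinate (assumed values, see the note above).
variances = np.tile(np.array([0.1, 0.1, 0.2, 0.2], dtype=np.float32), num_prior_boxes)
prior_boxes_with_variance = np.concatenate(
    [prior_boxes, variances.reshape(1, 1, -1)], axis=1)
print(prior_boxes_with_variance.shape)  # (1, 2, 32)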

Look at the generate_sub_graph method:

def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
    log.debug('PostprocessorReplacement.generate_sub_graph')
    log.debug('matched_nodes = {}'.format(match.matched_nodes_names()))
    # softmax to be applied to the confidence
    softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True})
    softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_'))
    # Inference Engine DetectionOutput layer consumes flattened tensors
    # reshape operation to flatten locations tensor
    reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_'))
    # Inference Engine DetectionOutput layer consumes flattened tensors
    # reshape operation to flatten confidence tensor
    reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_'))
    # create Node object from Op class
    detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
    detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
    detection_output_op.attrs['infer'] = __class__.do_infer
    detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_'))
    # create internal edges of the sub-graph. In this case we add edges to connect input port 0 and 1 of the
    # detection output with output of reshape of locations and reshape of confidence
    create_edge(softmax_conf_node, reshape_conf_node, 0, 0)
    create_edge(reshape_loc_node, detection_output_node, 0, 0)
    create_edge(reshape_conf_node, detection_output_node, 0, 1)
    return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node,
            'reshape_loc_node': reshape_loc_node}

The method has two inputs: the graph to operate on and an instance of the SubgraphMatch object, which describes the matched sub-graph. The latter class has several useful methods to get a particular input/output node of the sub-graph by input/output index or by node name pattern. Examples of the usage of these methods are given below.

Key lines:

  • Lines 6 and 7 create a new instance of the operation of type Softmax and the graph Node object corresponding to that operation.
  • Lines 11-12 and 16-17 create new instances of the operation of type Reshape to reshape the locations and confidences tensors respectively.
  • Lines 20-23 create a new instance of the operation DetectionOutput and the graph Node object corresponding to that operation.
  • Lines 27-29 connect the softmax node with the reshape node and connect the two reshaped locations and confidences tensors with the DetectionOutput node.
  • Lines 30-31 define a dictionary with aliases for the detection output node and the reshape locations and confidences nodes. These aliases are used in the input_edges_match and output_edges_match methods.

The input_edges_match method is the following:

def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
    conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
    priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
    # create matching nodes for locations and confidence tensors using simple scheme "old_node_name: new_node_name"
    # which in fact means "(old_node_name, 0): (new_node_name, 0)", while first '0' means old_port and the second
    # zero defines 'new_port'.
    return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
            conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
            priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
            }

The method has three parameters: the input graph, the match object describing the matched sub-graph, and the new_sub_graph dictionary with the alias names returned from the generate_sub_graph method.

Key lines:

  • Lines 2-4 initialize the Node objects and input ports for the nodes where the input tensors of the sub-graph are consumed. The method match.input_nodes(ind) returns a list of tuples, where the first element is a Node object and the second is the input port of this node which consumes the ind-th input tensor of the sub-graph. The inputs list in the configuration file defines the order of the input tensors of the sub-graph. For example, the locs_consumer_node object of type Node is a node that consumes the tensor with locations in the port with number locs_consumer_node_port.
  • Lines 8-11 define the dictionary with the mapping of tensors as described above. Note that the id attribute of the Node object contains the name of the node in the graph.

The output_edges_match method is the following:

def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    # the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so we need to create only
    # one output edge match
    return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}

The method has the same three parameters as the input_edges_match method. The returned dictionary contains a mapping for just one tensor initially produced by the first output node of the sub-graph (which is detection_boxes according to the configuration file) to the single output tensor of the created DetectionOutput node. In fact, it is possible to use any output node of the initial sub-graph in the mapping, because the sub-graph output nodes are the output nodes of the whole graph (their output is not consumed by any other nodes).

Now the Model Optimizer knows how to replace the sub-graph. The last step for enabling the model is to cut off some parts of the graph not needed for inference.

It is necessary to remove the Preprocessor block where the image is resized. The Inference Engine does not support dynamic input shapes, so the Model Optimizer must freeze the input image size, and thus resizing of the image is not necessary. This is achieved by the replacer <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py, which is executed automatically.

There are several Switch operations in the Postprocessor block without output edges. For example, Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t, Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f, Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t, Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f.

Model Optimizer marks these nodes as output nodes of the topology. Because of that, some parts of the Postprocessor block are not removed during the sub-graph replacement. To fix this issue, it is necessary to specify the output nodes of the graph manually using the --output command-line parameter.

Example Model Optimizer Command-Line for TensorFlow* SSD

The final command line to convert SSDs from the TensorFlow* Object Detection Zoo is:

./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/ssd_support.json --output="detection_boxes,detection_scores,num_detections"
Converting MobileNet V2 Model Created with TensorFlow Object Detection API

The MobileNet V2 model differs from the previous version, so converting the model requires a new sub-graph replacement configuration file and new command line parameters. The major differences are:

  • The Preprocessor block has two outputs: the pre-processed image and the pre-processed image size.
  • The Postprocessor block has one more input (in comparison to the models created with TensorFlow Object Detection API version 1.6 or lower): the pre-processed image size.
  • Some node names have been changed in the Postprocessor block.

The updated sub-graph replacement configuration file extensions/front/tf/ssd_v2_support.json reflecting these changes is the following:

[
    {
        "custom_attributes": {
            "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
            "confidence_threshold": 0.01,
            "keep_top_k": 200,
            "nms_threshold": 0.6,
            "pad_mode": "caffe.ResizeParameter.CONSTANT",
            "resize_mode": "caffe.ResizeParameter.WARP"
        },
        "id": "TFObjectDetectionAPIDetectionOutput",
        "include_inputs_to_sub_graph": true,
        "include_outputs_to_sub_graph": true,
        "instances": {
            "end_points": [
                "detection_boxes",
                "detection_scores",
                "num_detections"
            ],
            "start_points": [
                "Postprocessor/Shape",
                "Postprocessor/scale_logits",
                "Postprocessor/ExpandDims",
                "Postprocessor/Reshape_1",
                "Postprocessor/ToFloat"
            ]
        },
        "match_kind": "points"
    },
    {
        "custom_attributes": {
        },
        "id": "PreprocessorReplacement",
        "inputs": [
            [
                {
                    "node": "map/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
                    "port": 2
                }
            ]
        ],
        "instances": [
            ".*Preprocessor/"
        ],
        "match_kind": "scope",
        "outputs": [
            {
                "node": "sub$",
                "port": 0
            },
            {
                "node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
                "port": 0
            }
        ]
    }
]
Example of Model Optimizer Command Line for TensorFlow SSD MobileNet V2

The final command line to convert SSD MobileNet V2 from the TensorFlow Object Detection Zoo is the following:

./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --output="detection_boxes,detection_scores,num_detections"

(Deprecated) Case Study: Converting Faster R-CNN models created with TensorFlow* Object Detection API

NOTE: This is a deprecated section. Please consider reading the Converting TensorFlow* Object Detection API Models section that describes a newer approach to converting Object Detection API models, which gives results closer to TensorFlow inference.

This chapter describes how to convert selected Faster R-CNN models from the TensorFlow Object Detection API zoo version 1.6.0 or higher. The full list of supported models is provided in the table below. Note that currently only a batch size of 1 is supported. The only Inference Engine plugin supporting inference of these topologies is CPU.

The Faster R-CNN models contain several building blocks similar to the building blocks of the SSD models, so it is highly recommended to read the chapter about enabling TensorFlow Object Detection API SSD models first. Detailed information about Faster R-CNN topologies is provided here.

The TensorFlow network consists of a number of big blocks grouped by scope:

  • Preprocessor performs scaling/resizing of the image and converts input data to the [0, 1] interval. It has two outputs: the first one is the modified input image and the second one is a constant tensor with shape (batch_size, 3) and values (resized_image_height, resized_image_width, 3).
  • FirstStageFeatureExtractor is a backbone feature extractor.
  • FirstStageBoxPredictor calculates boxes and classes predictions.
  • GridAnchorGenerator generates anchors coordinates.
  • ClipToWindow crops anchors to the resized image size.
  • Decode decodes coordinates of boxes using anchors and data from the FirstStageBoxPredictor.
  • BatchMultiClassNonMaxSuppression performs non maximum suppression.
  • map scales coordinates of boxes to [0, 1] interval by dividing coordinates by (resized_image_height, resized_image_width).
  • map_1 scales coordinates from [0, 1] interval to resized image sizes.
  • SecondStageFeatureExtractor is a feature extractor for predicted Regions of interest (ROIs).
  • SecondStageBoxPredictor refines box coordinates according to the SecondStageFeatureExtractor.
  • SecondStagePostprocessor is the DetectionOutput layer performing the final box predictions.
Sub-graph replacements

There are three sub-graph replacements defined in the extensions/front/tf/legacy_faster_rcnn_support.json file that are used to convert these models:

  • The first one replaces the Preprocessor block. The implementation of this replacer is in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py
  • The second one replaces a number of blocks in the graph, including GridAnchorGenerator, ClipToWindow, Decode, BatchMultiClassNonMaxSuppression, Tile, Tile_1, and map, with Proposal and ROIPooling layers and some additional layers to pre-process input data
  • The third one replaces SecondStagePostprocessor with a DetectionOutput layer

The second replacer is defined using the following configuration that matches sub-graph by points:

{
    "custom_attributes": {
        "nms_threshold": 0.7,
        "feat_stride": 16,
        "max_proposals": 100,
        "anchor_base_size": 256,
        "anchor_scales": [0.25, 0.5, 1.0, 2.0],
        "anchor_aspect_ratios": [0.5, 1.0, 2.0],
        "roi_spatial_scale": 0.0625
    },
    "id": "TFObjectDetectionAPIFasterRCNNProposalAndROIPooling",
    "include_inputs_to_sub_graph": true,
    "include_outputs_to_sub_graph": true,
    "instances": {
        "end_points": [
            "CropAndResize",
            "map_1/TensorArrayStack/TensorArrayGatherV3",
            "map_1/while/strided_slice/Enter",
            "BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3"
        ],
        "start_points": [
            "FirstStageBoxPredictor/concat",
            "FirstStageBoxPredictor/concat_1",
            "GridAnchorGenerator/Identity",
            "Shape",
            "CropAndResize"
        ]
    },
    "match_kind": "points"
}

The start_points list contains the following nodes:

  • FirstStageBoxPredictor/concat node produces box coordinates predictions.
  • FirstStageBoxPredictor/concat_1 node produces classes predictions which will be used for the ROIs.
  • GridAnchorGenerator/Identity node produces anchors coordinates.
  • Shape and CropAndResize nodes are specified as inputs to correctly isolate the required sub-graph. Refer to the Sub-Graph Replacement in the Model Optimizer chapter for more information about replacements by points.

The end_points list contains the following nodes:

  • CropAndResize is the node that performs ROI pooling operation.
  • map_1/TensorArrayStack/TensorArrayGatherV3, map_1/while/strided_slice/Enter and BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3 are specified to correctly isolate the sub-graph.

The custom_attributes dictionary contains attributes where most values are taken from the topology-specific configuration file samples/configs/faster_rcnn_*.config of the TensorFlow Object Detection API repository:

  • nms_threshold is the value of the first_stage_nms_iou_threshold parameter.
  • feat_stride is the value of the height_stride and width_stride parameters. The Inference Engine supports only the case when these two values are equal, which is why the replacement configuration file contains just one parameter.
  • max_proposals is the value of the max_total_detections parameter, which is the maximum number of proposal boxes from the Proposal layer and of detected boxes.
  • anchor_base_size is the base size of the generated anchors. 256 is the default value for this parameter, and it is not specified in the configuration file.
  • anchor_scales is the value of the scales attribute.
  • anchor_aspect_ratios is the value of the aspect_ratios attribute.
  • roi_spatial_scale is needed for the Inference Engine ROIPooling layer. The value specified is the default one and is not actually used.

The identifier for this replacer is TFObjectDetectionAPIFasterRCNNProposalAndROIPooling. The Python implementation of this replacer is in the file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/FasterRCNNs.py.

The first four functions of the replacer class are the following:

class TFObjectDetectionAPIFasterRCNNProposalAndROIPooling(FrontReplacementFromConfigFileSubGraph):
    """
    This class replaces sub-graph of operations with Proposal and ROIPooling layers and additional layers transforming
    tensors from layout of TensorFlow to layout required by Inference Engine.
    Refer to comments inside the function for more information about performed actions.
    """
    replacement_id = 'TFObjectDetectionAPIFasterRCNNProposalAndROIPooling'
    def run_after(self):
        return [PreprocessorReplacement]
    def run_before(self):
        return [SecondStagePostprocessorReplacement]
    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
        return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id}
    def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
        new_list = match.matched_nodes_names().copy()
        # do not remove nodes that produce box predictions and class predictions
        new_list.remove(match.single_input_node(0)[0].id)
        new_list.remove(match.single_input_node(1)[0].id)
        return new_list

The run_after function returns a list of Python classes, inherited from one of the replacer classes (FrontReplacementOp, FrontReplacementPattern, FrontReplacementFromConfigFileSubGraph, and so on), that the current sub-graph replacement class must be run after. In this case, the replacer must be run after the Preprocessor block is removed by the PreprocessorReplacement replacer. In a similar way, the run_before function is used to tell the Model Optimizer to execute this replacer before the SecondStagePostprocessorReplacement.

The output_edges_match function describes the matching between the output nodes of the sub-graph before the replacement and after it. In this case, the only needed output node of the sub-graph is the CropAndResize node, which is identified with match.output_node(0)[0]. The new output node, which is created in the generate_sub_graph function, is identified with new_sub_graph['roi_pooling_node'].

The nodes_to_remove function takes the default list of nodes to be removed, which contains all matched nodes, and removes from it the two input nodes identified with match.single_input_node(0)[0] and match.single_input_node(1)[0]. These nodes will be connected as inputs to the new nodes generated in the generate_sub_graph function, so they should not be removed.

The code generating new sub-graph is the following:

def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
    log.debug('TFObjectDetectionAPIFasterRCNNProposal: matched_nodes = {}'.format(match.matched_nodes_names()))
    config_attrs = match.custom_replacement_desc.custom_attributes
    nms_threshold = config_attrs['nms_threshold']
    feat_stride = config_attrs['feat_stride']
    max_proposals = config_attrs['max_proposals']
    anchor_base_size = config_attrs['anchor_base_size']
    roi_spatial_scale = config_attrs['roi_spatial_scale']
    proposal_ratios = config_attrs['anchor_aspect_ratios']
    proposal_scales = config_attrs['anchor_scales']
    anchors_count = len(proposal_ratios) * len(proposal_scales)

This code gets parameters defined in the sub-graph replacement configuration file and calculates initial anchors count.

# get the ROIPool size from the CropAndResize which performs the same action
if 'CropAndResize' not in graph.nodes():
    raise Error('Failed to find node with name "CropAndResize" in the topology. Probably this is not Faster'
                ' RCNN topology or it is not supported')
roi_pool_size = Node(graph, 'CropAndResize').in_node(3).value[0]

The code above gets the ROIPooling spatial output dimension size as the value of the fourth input (port 3) of the node with the name CropAndResize.

# Convolution/matmul node that produces classes predictions
# Permute result of the tensor with classes permissions so it will be in a correct layout for Softmax
predictions_node = match.single_input_node(1)[0].in_node(0).in_node(0)
permute_predictions_op = Permute(graph, {'order': np.array([0, 2, 3, 1])})
permute_predictions_node = permute_predictions_op.create_node([], dict(name=predictions_node.name + '/Permute_'))
insert_node_after(predictions_node, permute_predictions_node, 0)
reshape_classes_op = Reshape(graph, {'dim': np.array([0, -1, 2])})
reshape_classes_node = reshape_classes_op.create_node([permute_predictions_node],
                                                      dict(name='Reshape_FirstStageBoxPredictor_Class_'))
update_attrs(reshape_classes_node, 'shape_attrs', 'dim')
softmax_conf_op = Softmax(graph, {'axis': 1})
softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node],
                                                dict(name='FirstStageBoxPredictor_SoftMax_Class_'))

The output with class predictions from the FirstStageBoxPredictor is generated with a convolution operation. The convolution output data layout in TensorFlow is NHWC, while the Inference Engine uses the NCHW layout. By default, Model Optimizer converts the weights of TensorFlow convolutions to produce an output tensor in the NCHW layout required by the Inference Engine. The issue arises because the class predictions tensor is passed through a Softmax operation to produce class probabilities. The Inference Engine Softmax is performed over the fastest-changing dimension, which is 'W' in the Inference Engine. Thus, the Softmax operation would be performed over a wrong dimension after the conversion of the convolution node producing class predictions. The solution is to add Permute and Reshape operations to prepare the input data for Softmax. The Reshape operation is required to make the size of the fastest-changing dimension equal to 2, because there are two classes being predicted: background and foreground.

Another issue is that the layout of elements in the predicted classes tensor differs between TensorFlow and the Inference Engine Proposal layer requirements. In TensorFlow, the tensor has the virtual layout [N, H, W, num_anchors, num_classes], while the Inference Engine Proposal layer requires the virtual layout [N, num_classes, num_anchors, H, W]. Thus, it is necessary to reshape, permute, and then reshape again the output of the Softmax to the shape required by the Proposal layer:

reshape_softmax_op = Reshape(graph, {'dim': np.array([1, anchors_count, 2, -1])})
reshape_softmax_node = reshape_softmax_op.create_node([softmax_conf_node], dict(name='Reshape_Softmax_Class_'))
update_attrs(reshape_softmax_node, 'shape_attrs', 'dim')
permute_reshape_softmax_op = Permute(graph, {'order': np.array([0, 1, 3, 2])})
permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([reshape_softmax_node],
                                                                      dict(name='Permute_'))
# implement custom reshape infer function because we need to know the input convolution node output dimension
# sizes but we can know it only after partial infer
reshape_permute_op = Reshape(graph, {'dim': np.ones([4]), 'anchors_count': anchors_count,
                                     'conv_node': predictions_node})
reshape_permute_op.attrs['old_infer'] = reshape_permute_op.attrs['infer']
reshape_permute_op.attrs['infer'] = __class__.classes_probabilities_reshape_shape_infer
reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node],
                                                      dict(name='Reshape_Permute_Class_'))
update_attrs(reshape_permute_node, 'shape_attrs', 'dim')

The Proposal layer has three inputs: the class probabilities, the box predictions, and the input shape of the image. The first two tensors are ready, so it is necessary to create a Const operation that produces the desired third input tensor.

# create constant input with the image height, width and scale H and scale W (if present) required for Proposal
const_value = np.array([[input_height, input_width, 1]], dtype=np.float32)
const_op = Const(graph, dict(value=const_value, shape=const_value.shape))
const_node = const_op.create_node([], dict(name='Proposal_const_image_size_'))

Now add the Proposal layer:

proposal_op = ProposalOp(graph, dict(min_size=10, framework='tensorflow', box_coordinate_scale=10,
                                     box_size_scale=5, post_nms_topn=max_proposals, feat_stride=feat_stride,
                                     ratio=proposal_ratios, scale=proposal_scales, base_size=anchor_base_size,
                                     pre_nms_topn=2**31 - 1,
                                     nms_thresh=nms_threshold))
proposal_node = proposal_op.create_node([reshape_permute_node,
                                         match.single_input_node(0)[0].in_node(0).in_node(0),
                                         const_node],
                                        dict(name=proposal_op.attrs['type'] + '_'))

The box coordinates in TensorFlow are in the "YXYX" layout, while the Inference Engine uses the "XYXY" layout, so it is necessary to swap the coordinates produced by the Proposal layer. This is implemented with the help of a convolution node with a special filter of size [5, 5]:

proposal_reshape_4d_op = Reshape(graph, {'dim': np.array([max_proposals, 1, 1, 5])})
proposal_reshape_4d_node = proposal_reshape_4d_op.create_node([proposal_node], dict(name="reshape_4d_"))
update_attrs(proposal_reshape_4d_node, 'shape_attrs', 'dim')
# create convolution node to swap X and Y coordinates in the proposals
conv_filter_const_data = np.array(np.array([[1, 0, 0, 0, 0],
                                            [0, 0, 1, 0, 0],
                                            [0, 1, 0, 0, 0],
                                            [0, 0, 0, 0, 1],
                                            [0, 0, 0, 1, 0]],
                                           dtype=np.float32).reshape([1, 1, 5, 5]), dtype=np.float32)
conv_filter_const_op = Const(graph, dict(value=conv_filter_const_data, spatial_dims=np.array([2, 3])))
conv_filter_const_node = conv_filter_const_op.create_node([], dict(name="conv_weights"))
conv_op = Op(graph, {
                'op': 'Conv2D',
                'bias_addable': False,
                'spatial_dims': np.array([1, 2]),
                'channel_dims': np.array([3]),
                'batch_dims': np.array([0]),
                'pad': None,
                'pad_spatial_shape': None,
                'input_feature_channel': 2,
                'output_feature_channel': 2,
                'output_shape': [max_proposals, 1, 1, 5],
                'dilation': np.array([1, 1, 1, 1], dtype=np.int64),
                'stride': np.array([1, 1, 1, 1]),
                'type': 'Convolution',
                'group': None,
                'layout': 'NHWC',
                'infer': __class__.fake_conv_shape_infer})
predictions_node = conv_op.create_node([proposal_reshape_4d_node, conv_filter_const_node], dict(name="conv_"))
update_ie_fields(graph.node[predictions_node.id])
proposal_reshape_2d_op = Reshape(graph, {'dim': np.array([max_proposals, 5])})
proposal_reshape_2d_node = proposal_reshape_2d_op.create_node([predictions_node], dict(name="reshape_2d_"))
# set specific name for this Reshape operation so we can use it in the DetectionOutput replacer
proposal_reshape_2d_node['name'] = 'swapped_proposals'

The ROIPooling layer in TensorFlow is implemented with an operation called CropAndResize with bi-linear filtration. The Inference Engine implementation of the ROIPooling layer with bi-linear filtration requires the input box coordinates to be scaled to the [0, 1] interval. Adding an element-wise multiplication of the box coordinates solves this issue:

# the ROIPooling implementation with bi-linear filtration needs proposals scaled by the image size
proposal_scale_const = np.array([1.0, 1 / input_height, 1 / input_width, 1 / input_height, 1 / input_width],
                                dtype=np.float32)
proposal_scale_const_op = Const(graph, dict(value=proposal_scale_const, shape=proposal_scale_const.shape))
proposal_scale_const_node = proposal_scale_const_op.create_node([], dict(name='Proposal_scale_const_'))
scale_proposals_op = Eltwise(graph, {'operation': 'mul'})
scale_proposals_node = scale_proposals_op.create_node([proposal_reshape_2d_node, proposal_scale_const_node],
                                                      dict(name='scale_proposals_'))

The last step is to create the ROIPooling node with two inputs: the identified feature maps from the FirstStageFeatureExtractor and the scaled output of the Proposal layer:

feature_extractor_output_nodes = scope_output_nodes(graph, 'FirstStageFeatureExtractor')
if len(feature_extractor_output_nodes) != 1:
    raise Error("Failed to determine FirstStageFeatureExtractor output node to connect it to the ROIPooling."
                "Found the following nodes: {}".format([node.name for node in feature_extractor_output_nodes]))
roi_pooling_op = ROIPooling(graph, dict(method="bilinear", framework="tensorflow",
                                        pooled_h=roi_pool_size, pooled_w=roi_pool_size,
                                        spatial_scale=roi_spatial_scale))
roi_pooling_node = roi_pooling_op.create_node([feature_extractor_output_nodes[0], scale_proposals_node],
                                              dict(name='ROI_Pooling_'))
return {'roi_pooling_node': roi_pooling_node}

There are two additional methods implemented in the replacer class:

  • The fake_conv_shape_infer is a simple infer function for the convolution that permutes X and Y coordinates of the Proposal output, which avoids setting a lot of internal attributes required for proper shape inference.
  • The classes_probabilities_reshape_shape_infer function is used to update the output dimensions of the reshape operation. The output spatial dimensions depend on the convolution output spatial dimensions, so they are not known until the shape inference pass, which is performed after this sub-graph replacement class. Therefore, this custom infer function is called instead of the default Reshape shape inference function: it updates the required dim attribute of the node with the convolution output spatial dimensions, which are known at the time this inference function is called, and then calls the default Reshape inference function.
@staticmethod
def fake_conv_shape_infer(node: Node):
    node.out_node(0).shape = node.in_node(0).shape
    # call functions to update internal attributes required for correct IR generation
    mark_input_bins(node)
    assign_dims_to_weights(node.in_node(1), [0, 1], node.input_feature_channel, node.output_feature_channel, 4)
@staticmethod
def classes_probabilities_reshape_shape_infer(node: Node):
    # now we can determine the reshape dimensions from Convolution node
    conv_node = node.conv_node
    conv_output_shape = conv_node.out_node().shape
    # update desired shape of the Reshape node
    node.dim = np.array([0, conv_output_shape[1], conv_output_shape[2], node.anchors_count * 2])
    node.old_infer(node)

The second replacer defined in the sub-graph replacement configuration file replaces the SecondStagePostprocessor block and is defined using scope:

{
    "custom_attributes": {
        "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
        "confidence_threshold": 0.01,
        "keep_top_k": 300,
        "nms_threshold": 0.6,
        "pad_mode": "caffe.ResizeParameter.CONSTANT",
        "resize_mode": "caffe.ResizeParameter.WARP",
        "max_detections_per_class": 100,
        "num_classes": 90
    },
    "id": "SecondStagePostprocessorReplacement",
    "inputs": [
        [
            {
                "node": "Reshape$",
                "port": 0
            }
        ],
        [
            {
                "node": "Reshape_1$",
                "port": 0
            }
        ],
        [
            {
                "node": "ExpandDims$",
                "port": 0
            }
        ]
    ],
    "instances": [
        ".*SecondStagePostprocessor/"
    ],
    "match_kind": "scope",
    "outputs": [
        {
            "node": "BatchMultiClassNonMaxSuppression/map/TensorArrayStack/TensorArrayGatherV3$",
            "port": 0
        }
    ]
}

The replacement code is similar to the SecondStagePostprocessor replacement for the SSD topologies. There are two major differences:

  • The tensor with bounding boxes does not contain locations for class 0 (the background class), but the Inference Engine DetectionOutput layer requires them. A Const node with dummy values is created and concatenated with the tensor.
  • The priors tensor is not constant like in SSDs, so the bounding boxes tensor must be scaled with the variances [0.1, 0.1, 0.2, 0.2].

The differences described above are resolved with the following code:

# TF produces locations tensor without boxes for background.
# Inference Engine DetectionOutput layer requires background boxes so we generate them with some values
# and concatenate with locations tensor
fake_background_locs_blob = np.tile([[[1, 1, 2, 2]]], [max_detections_per_class, 1, 1])
fake_background_locs_const_op = Const(graph, dict(value=fake_background_locs_blob,
                                                  shape=fake_background_locs_blob.shape))
fake_background_locs_const_node = fake_background_locs_const_op.create_node([])
reshape_loc_op = Reshape(graph, {'dim': np.array([max_detections_per_class, num_classes, 4])})
reshape_loc_node = reshape_loc_op.create_node([match.single_input_node(0)[0].in_node(0)],
                                              dict(name='Reshape_loc_'))
concat_loc_op = Concat(graph, {'axis': 1})
concat_loc_node = concat_loc_op.create_node([fake_background_locs_const_node, reshape_loc_node],
                                            dict(name='Concat_fake_loc_'))
# blob with variances
variances_blob = np.array([0.1, 0.1, 0.2, 0.2])
variances_const_op = Const(graph, dict(value=variances_blob, shape=variances_blob.shape))
variances_const_node = variances_const_op.create_node([])
# reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift
reshape_loc_2d_op = Reshape(graph, {'dim': np.array([-1, 4])})
reshape_loc_2d_node = reshape_loc_2d_op.create_node([concat_loc_node], dict(name='reshape_locs_2d_'))
# element-wise multiply locations with variances
eltwise_locs_op = Eltwise(graph, {'operation': 'mul'})
eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node],
                                                dict(name='scale_locs_'))

Example of Model Optimizer Command-Line for TensorFlow Faster R-CNNs

The final command line to convert Faster R-CNNs from the TensorFlow* Object Detection Zoo is the following:

./mo.py --input_model= --output=detection_boxes,detection_scores,num_detections --tensorflow_use_custom_operations_config extensions/front/tf/legacy_faster_rcnn_support.json

Note that there are minor changes that should be made to the sub-graph replacement configuration file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_faster_rcnn_support.json before converting a particular Faster R-CNN topology. Refer to the table below.

Sub-Graph Replacement Configuration File Parameters to Convert Different Faster R-CNN Models

Model Name                                                   Configuration File Changes
faster_rcnn_inception_v2_coco                                None
faster_rcnn_resnet50_coco                                    None
faster_rcnn_resnet50_lowproposals_coco                       None
faster_rcnn_resnet101_coco                                   None
faster_rcnn_resnet101_lowproposals_coco                      None
faster_rcnn_inception_resnet_v2_atrous_coco                  "feat_stride: 8"
faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco     "feat_stride: 8"

MXNet* Models with Custom Layers

There are two options to convert your MXNet* model that contains custom layers:

  • Register the custom layers as extensions to the Model Optimizer. For instructions, see Extending MXNet Model Optimizer with New Primitives. When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions for both MXNet layers with op Custom and layers which are not standard MXNet layers.
  • If you have sub-graphs that should not be expressed with an analogous sub-graph in the Intermediate Representation, but a different sub-graph should appear in the model, the Model Optimizer provides such an option. In MXNet, this functionality is actively used for SSD* models and provides an opportunity to replace the necessary sub-graph sequences. To read more, see Sub-Graph Replacement in the Model Optimizer.

Extending the MXNet Model Optimizer with New Primitives

  1. Create the file custom_proposal_ext.py in the folder <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/mxnet. If your MXNet layer has op Custom, create the CustomProposalFrontExtractor class inherited from MXNetCustomFrontExtractorOp:
    from mo.front.extractor import MXNetCustomFrontExtractorOp
    class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
        pass

    Otherwise, for layers that are not standard MXNet layers, create the ProposalFrontExtractor class inherited from FrontExtractorOp:

    from mo.front.extractor import FrontExtractorOp
    class ProposalFrontExtractor(FrontExtractorOp):
        pass
  2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
    from mo.front.extractor import MXNetCustomFrontExtractorOp
    class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
        op = '_contrib_Proposal'
        enabled = True
  3. Register a mapping rule between the original model and the PythonProposalOp attributes by overriding the following function:
    from mo.front.mxnet.extractors.utils import get_mxnet_layer_attrs
    from mo.front.extractor import MXNetCustomFrontExtractorOp
    from mo.ops.op import Op
    
    class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
        op = '_contrib_Proposal'
        enabled = True
        @staticmethod
        def extract(node):
            attrs = get_mxnet_layer_attrs(node.symbol_dict)
            node_attrs = {
                'feat_stride': attrs.float('feat_stride', 16)
            }
    
            # update the attributes of the node
            Op.get_op_class_by_name('Proposal').update_node_stat(node, node_attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
            return __class__.enabled
    

Generate Extensions with the extgen Tool

The Intel® Distribution of OpenVINO™ toolkit provides the extgen tool that facilitates creating Model Optimizer and Inference Engine extensions. The tool generates extension source files with stubs for the core functions. To get a workable extension, you should only add your implementation of these functions to the generated files.

Generating Extension Files

To generate extension files, you can run the extgen tool in either of two available modes:

  • Interactive mode - The tool prompts you to input information. To run the interactive mode, use the new command-line option. For example:
    python extgen.py new mo-op	
  • Silent mode - The tool reads the input information from a configuration file. To run the silent mode, pass the configuration file as the only argument. For example:
    python extgen.py config.extgen.json

You can find the sample configuration file in the extgen tool directory: <INSTALL_DIR>/deployment_tools/extension_generator/config.extgen.json.example.

To run the tool in the interactive mode, specify the following parameters:

  • mo-op - To generate a Model Optimizer operation
  • mo-caffe-ext - To generate a Model Optimizer Caffe* extractor
  • mo-mxnet-ext - To generate a Model Optimizer MXNet* extractor
  • mo-tf-ext - To generate a Model Optimizer TensorFlow* extractor
  • ie-cpu-ext - To generate an Inference Engine CPU extension
  • ie-gpu-ext - To generate an Inference Engine GPU extension
  • output_dir - To set an output directory. If not specified, the current directory is used by default.

You can use any combination of the parameters to generate Model Optimizer and/or Inference Engine extension files. For example:

python extgen.py new mo-caffe-ext mo-op ie-cpu-ext

Generating Model Optimizer Extension Files

To generate Model Optimizer Extension files, run the tool in the interactive mode with necessary parameters or in the silent mode with the configuration file. For example, to generate operation and extractor files for a Caffe model in the <output_directory> in the interactive mode:

python extgen.py new mo-op mo-caffe-ext <output_dir>

The extension stub files are generated in the <output_dir>/user_mo_extensions directory, which has the following structure:

/front
	    | caffe - Folder with Caffe extractors
	    | mxnet - Folder with MXNet extractors
/ops - Folder with operation files

Specific paths to the generated files appear on the screen. For example, for the Caffe Proposal layer, the files are <output_dir>/user_mo_extensions/front/caffe/proposal_ext.py and <output_dir>/user_mo_extensions/ops/proposal.py.

Usually, you can use an extractor file without changes. The exceptions are cases when you want to transform parameters from a Caffe prototxt file before they go into the IR. In this case, add these transformations to the extract method. Do not forget to add the parameter names to the supported_attrs and backend_attrs methods in the operation file.

An operation file can be used without changes if your layer does not change the shape. Otherwise, you should implement the shape calculation in the <op_name>_infer method. Also, you can add default values to the __init__ method. You can find more details in the Extending the Model Optimizer with New Primitives.

Generating Inference Engine Extension Files

To generate stub files for GPU or CPU Inference Engine extensions, run the tool and provide the input information interactively or in the configuration file. For example, to generate Inference Engine CPU extension files in the <output_directory> in the interactive mode:

python extgen.py new ie-cpu-ext <output_dir>

The extension stub files are generated in the <output_dir>/user_ie_extensions directory.

For CPU, several files are generated in the cpu subdirectory. You must change only <output_dir>/user_ie_extensions/cpu/ext_<op_name>.cpp by adding the inference implementation.

For GPU, <op_name>.cl and <op_name>.xml are generated in the gpu subdirectory. You must update both:

  • In <op_name>.cl, implement an OpenCL™ kernel to infer the model.
  • In <op_name>.xml, fill information about input/output buffers and worksize for your kernel.

For more details about implementing Inference Engine extensions, see Inference Engine Kernels Extensibility.

Example of Creating a Custom Layer Extension Using extgen

This section provides step-by-step examples of extension generation for converting Caffe and TensorFlow models. The Caffe example describes the Inference Engine extension creation. The TensorFlow example uses an existing Inference Engine operation. If you need an Inference Engine extension to infer a TensorFlow-based model, look at steps 6 and 7 in the Caffe example, because Inference Engine extension generation does not depend on the framework the model is based on.

Caffe* Example

This section provides a sample for generating and implementing Model Optimizer and Inference Engine custom layer extensions for the Proposal layer of a Caffe example model. The model (.prototxt and .caffemodel) used in the example is described in the Extending the Model Optimizer with New Primitives chapter.

  1. Go to the folder with the extgen tool:
    cd <INSTALL_DIR>\deployment_tools\extension_generator\
  2. Run the extgen.py file with the following parameters to generate extension stub files:
    python extgen.py new mo-caffe-ext mo-op ie-cpu-ext
  3. The tool asks you to provide input information to generate accurate stub files for extensions. Questions and sample answers are the following:
    • For generating stub files for Caffe extractor file:
      Is your layer Pythonic (y/n)?   y
      Please enter module name:   rpn.proposal_layer
      Please enter layer name:   ProposalLayer	
    • For generating a Model Optimizer operation file:
      Please enter operation name:    Proposal
      Does your operation change shape? (y/n)    y
    • For generating an Inference Engine CPU extension:
      Please enter operation name:    Proposal
      Please enter all parameters in format
      <param1> <type>
      <param2> <type>
      etc
      Supported cpu types: int, bool, listint, float, listfloat, string
      When you finish please enter 'q'
      feat_stride int
      post_nms_topn int
      q
    Find the generated files in the user_mo_extensions and user_ie_extensions directories:
    • /user_mo_extensions
      • __init__.py
      • /front
        • /caffe
          • __init__.py
          • proposallayer_ext.py
        • /mxnet
          • __init__.py
      • /ops
        • __init__.py
        • proposal.py
    • /user_ie_extensions
      • /cpu
        • CMakeLists.txt
        • ext_base.cpp
        • ext_base.hpp
        • ext_lists.cpp
        • ext_lists.hpp
        • ext_proposal.cpp
      • /gpu
  4. Implement extension functions in the generated files:
    1. Extractor proposallayer_ext.py can be used without changes.
    2. Add the shape calculation logic to the operation file proposal.py. According to the IR catalog, the Proposal layer shape dynamically depends on the post_nms_topn parameter.
      Add this parameter with the default value to __init__ method:
      def __init__(self, graph, attrs):
          mandatory_props = dict(
              type=__class__.op,
              op=__class__.op,
              post_nms_topn=300,
              infer=ProposalPythonOp.infer
          )
          super().__init__(graph, mandatory_props, attrs)
    3. Add supported attributes to the method supported_attrs:
      def supported_attrs(self):
              # =====================================
              # List all attributes of the layer
              # all other attributes that are not in
              # the list are ignored
              # =====================================
              return [
                  'feat_stride',
                  'post_nms_topn'
              ]
      
    4. Add the shape calculation to the infer function and implement the output calculation formula in Python*:
      @staticmethod
      def infer(node):
          input_shape = node.in_node(0).shape
          out_shape = np.array([0, 0], dtype=np.int64)
          # rois blob: holds R regions of interest, each is a 5-tuple
          # (n, x1, y1, x2, y2) specifying an image batch index n and a
          # rectangle (x1, y1, x2, y2)
          out_shape[0] = input_shape[0] * node.post_nms_topn
          out_shape[1] = 5
          node.out_node(0).shape = out_shape
  5. Once these steps are completed, the Model Optimizer extension is ready to use. To run the Model Optimizer with this extension, use the following command line:
    python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt --extensions ./user_mo_extensions/
    

    For details on the sample model .caffemodel and .prototxt files, refer to the Extending the Model Optimizer with New Primitives section.

  6. To complete the CPU Inference Engine extension creation, add the implementation of the Proposal layer inference to the execute method in the ext_proposal.cpp file. You can find sample code for this extension in the <INSTALL_DIR>/deployment_tools/inference_engine/samples/extension/ext_proposal.cpp file. For more information about implementation of Inference Engine extensions, refer to Inference Engine Kernels Extensibility.
  7. Build a library with CPU extension to use it with the Inference Engine:
    1. Create a new build directory:
      mkdir build
      
    2. Go to the created build directory:
      cd ./build
      
    3. Set the environment variables:
      • on Linux* OS:
        source <INSTALL_DIR>/bin/setupvars.sh
      • on Windows* OS:
        <INSTALL_DIR>/bin/setupvars.bat
    4. Run CMake to generate the Make files:
      • on Linux OS:
        cmake ..
      • on Windows OS:
         cmake -G "<VisualStudio_version>" ..
    5. Build the library:
      • on Linux OS:
        make
      • on Windows OS: use the generated Microsoft Visual Studio* project

TensorFlow* Example

This section provides an example of generating and implementing a Model Optimizer extension for a TensorFlow* example model.

If you already have a model with an unrecognized operation, you can skip Model Preparation and go to the Extension Generation chapter.

In the example, the Pooling layer is used to illustrate extension generation. The Model Optimizer already supports this layer, but you will remove its implementation to illustrate how it can be re-created with the extgen tool. This process is described in the Model Preparation chapter.

For information on how to generate the operation and Inference Engine extensions, refer to the Caffe Example, as generation does not depend on the framework. This chapter explains only how to generate a TensorFlow extractor.

Model Preparation
  1. Download the pre-trained model ResNet-50 from TensorFlow* Model Zoo. Follow the instructions in Using the Model Optimizer to Convert TensorFlow* Models to prepare the model for converting.
  2. If you try to convert the ResNet-50 model, it will be converted successfully. To demonstrate extension generation, remove the existing implementation of Pooling layer from the Model Optimizer:
    cd <INSTALL_DIR>/deployment_tools/model_optimizer
    mv extensions/front/tf/pooling_ext.py extensions/front/tf/pooling_ext.py_del
  3. Run the Model Optimizer to be sure that MaxPool has become an unrecognized operation:
    python mo.py --input_model resnet50.pb --input_shape [1,3,224,224]

    You will see an error:

    [ ERROR ]  List of operations that cannot be converted to IE IR:
    [ ERROR ]      MaxPool (4)
    [ ERROR ]          resnet50/pool1/MaxPool
    [ ERROR ]          resnet50/block1/unit_3/bottleneck_v1/shortcut/MaxPool
    [ ERROR ]          resnet50/block2/unit_4/bottleneck_v1/shortcut/MaxPool
    [ ERROR ]          resnet50/block3/unit_6/bottleneck_v1/shortcut/MaxPool
    [ ERROR ]  Part of the nodes was not translated to IE. Stopped.
    For more information please refer to Model Optimizer FAQ (<INSTALL_DIR>/deployment_tools/documentation/docs/MO_FAQ.html), question #24.
    	

Now the sample model is ready for extension generation.

Extension Generation
  1. Go to extension generator directory:
    cd <INSTALL_DIR>/deployment_tools/extension_generator
  2. Run the extgen.py file with the following parameters to generate extension stub files:
    python extgen.py new mo-tf-ext

    The tool asks you to provide input information to generate accurate stub files for extensions. Questions and sample answers are the following:

    Please enter layer name:       Pooling
    Do you want automatically parse all parameters from proto file
       (parameters will be parsed as is, without any renaming or omitting) (y/n)       n
    Please enter all parameters in format
       <param1> <new name1> <type1>
       <param2> <new name2> <type2>
       etc
       where type is one of the following types:
       s - String, i - Int, f - Float, b - Bool, type - DataType, shape - TensorShapeProto,
       padding - Padding type, spatial - Get spatial from dataFormat, channel - Get channel from dataFormat,
       batch - Get batch from dataFormat, list.s - List of strings, list.i - List of ints, list.f - List of floats,
       list.b - list of bools, list.type - list of DataType, list.shape - list of TensorShapeProto,
       if your attribute type is not in list or you want implement your own attribute parsing just omit <type>
       When you finish please enter 'q'
    padding auto_pad padding
    ksize window list.i
    data_format spatial_dims spatial
    strides stride list.i
    q
    Please enter operation name to use with this extractor:       MaxPool
    Please enter class with operation to use with this extractor:       Pooling
    Please enter import path to class with operation:       extensions.ops.pooling

    Find the generated files in the user_mo_extensions directory, which has the following structure:

    • /user_mo_extensions
      • __init__.py
      • /front
        • /caffe
          • __init__.py
        • /mxnet
          • __init__.py
        • /tf
          • __init__.py
          • pooling_ext.py
      • /ops
        • __init__.py
  3. Implement extension functions in the generated files.

    The extractor pooling_ext.py requires additional attribute conversion. Several attributes should be initialized with constants; the real values will be calculated during inference. These changes are needed because you use an existing operation that was written for several frameworks.

    @staticmethod
    def extract(node):
        proto_layer = node.pb
        param = proto_layer.attr
        attrs = {
            'auto_pad': convert_tf_padding_to_str(param["padding"]),
            'window': param["ksize"].list.i,
            'spatial_dims': tf_data_format_spatial(param["data_format"]),
            'stride': param["strides"].list.i,
            'op': __class__.op
        }
        attrs['window'] = np.array(attrs['window'])
        attrs['pad'] = None
        attrs['stride'] = np.array(attrs['stride'])
        attrs['pad_spatial_shape'] = None
        attrs['output_spatial_shape'] = None
        attrs['pool_method'] = 'max'
        attrs['type'] = 'Pooling'
        attrs['exclude_pad'] = 'true'

        # update the attributes of the node
        Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)

        return __class__.enabled
  4. Once you complete these steps, the Model Optimizer extension is ready to use. To run the Model Optimizer with this extension, use the command line below:
    cd ../model_optimizer
    python mo.py --input_model resnet50.pb --input_shape [1,3,224,224] --extensions ../extension_generator/user_mo_extensions

Conversion should finish successfully.

Advanced Topics about the Model Optimizer Internals

Cutting Off Parts of a Model

In some cases, some parts of a model must be removed while the Model Optimizer converts the model to the Intermediate Representation. This chapter describes methods of cutting off parts of a model using Model Optimizer command-line options. Model cutting applies mostly to TensorFlow* models, but it is also useful for other frameworks. This chapter uses TensorFlow examples for illustration.

Purpose of Model Cutting

The following examples are the situations when model cutting is useful or even required:

  • Model has pre- or post-processing parts that cannot be translated to existing Inference Engine layers.
  • Model has a training part that is convenient to be kept in the model, but not used during inference.
  • Model is too complex (contains lots of unsupported operations that cannot be easily implemented as custom layers), so the complete model cannot be converted in one shot.
  • Model is one of the supported SSD models. In this case, you need to cut a post-processing part off.
  • Problem with model conversion in the Model Optimizer or inference in the Inference Engine occurred. To localize the issue, limit the scope for conversion by iteratively searching for problematic places in the model.
  • Single custom layer or a combination of custom layers is isolated for debugging purposes.

Command-Line Options

Model Optimizer provides command line options --input and --output to specify new entry and exit nodes, while ignoring the rest of the model:

  • --input option accepts a comma-separated list of layer names of the input model that should be treated as new entry points to the model.
  • --output option accepts a comma-separated list of layer names of the input model that should be treated as new exit points from the model.

The --input option is also required for cases unrelated to model cutting. For example, when the model contains several inputs and the --input_shape or --mean_values options are used, you should use the --input option to specify the order of input nodes for correct mapping between the multiple items provided in --input_shape and --mean_values and the inputs in the model. That use case is out of the scope of this chapter.

Model cutting is illustrated with the Inception V1 model, which is located in the models/research/slim repository. Prepare this model for the Model Optimizer before proceeding with the rest of this chapter.

Default Behavior Without --input and --output

The input model is converted as a whole if neither --input nor --output command line options are used. All Placeholder operations in a TensorFlow* graph are automatically identified as entry points. The Input layer type is generated for each of them. All nodes that have no consumers are automatically identified as exit points.

For Inception_V1, there is one Placeholder: input. If the model is viewed in the TensorBoard*, the input operation is easy to find:

InceptionV1 placeholder

There is only one output operation, which is enclosed in the nested name scope InceptionV1/Logits/Predictions: the Reshape operation with the full name InceptionV1/Logits/Predictions/Reshape_1.

In TensorBoard*, it looks as follows, together with some of its predecessors:

TensorBoard with predecessors

Convert this model:

mo.py --input_model=inception_v1.pb -b 1

The output .xml file with an Intermediate Representation contains the Input layer among other layers in the model:

<layer id="286" name="input" precision="FP32" type="Input">
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>

The input layer is converted from the TensorFlow graph Placeholder operation input and has the same name.

The -b option is used here for conversion to override a possible undefined batch size (coded as -1 in TensorFlow models). If a model was frozen with a defined batch size, you may omit this option in all the examples.

The last layer in the model is InceptionV1/Logits/Predictions/Reshape_1, which matches an output operation in the TensorFlow graph:

<layer id="389" name="InceptionV1/Logits/Predictions/Reshape_1" precision="FP32" type="Reshape">
    <data axis="0" dim="1,1001" num_axes="-1"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>1001</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>1001</dim>
        </port>
    </output>
</layer>

Due to automatic identification of inputs and outputs, you do not need to provide the --input and --output options to convert the whole model. The following commands are equivalent for the Inception V1 model:

mo.py --input_model=inception_v1.pb -b 1

mo.py --input_model=inception_v1.pb -b 1 --input=input --output=InceptionV1/Logits/Predictions/Reshape_1

The Intermediate Representations are identical for both conversions. The same is true if the model has multiple inputs and/or outputs.

Model Cutting

Now consider how to cut some parts of the model off. This chapter uses the first convolution block InceptionV1/InceptionV1/Conv2d_1a_7x7 of the Inception V1 model to illustrate cutting:

Cutting at the End

If you want to cut your model at the end, you have the following options:

  1. The following command cuts off the rest of the model after InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu, making this node the last one in the model:
    mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

    The resulting Intermediate Representation has three layers:

    <?xml version="1.0" ?>
    <net batch="1" name="model" version="2">
        <layers>
            <layer id="3" name="input" precision="FP32" type="Input">
                <output>
                    <port id="0">
                        <dim>1</dim>
                        <dim>3</dim>
                        <dim>224</dim>
                        <dim>224</dim>
                    </port>
                </output>
            </layer>
            <layer id="5" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution" precision="FP32" type="Convolution">
                <data dilation-x="1" dilation-y="1" group="1" kernel-x="7" kernel-y="7" output="64" pad-x="2" pad-y="2" stride="1,1,2,2" stride-x="2" stride-y="2"/>
                <input>
                    <port id="0">
                        <dim>1</dim>
                        <dim>3</dim>
                        <dim>224</dim>
                        <dim>224</dim>
                    </port>
                </input>
                <output>
                    <port id="3">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
                <blobs>
                    <weights offset="0" size="37632"/>
                    <biases offset="37632" size="256"/>
                </blobs>
            </layer>
            <layer id="6" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
                <input>
                    <port id="0">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </input>
                <output>
                    <port id="1">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
            </layer>
        </layers>
        <edges>
            <edge from-layer="3" from-port="0" to-layer="5" to-port="0"/>
            <edge from-layer="5" from-port="3" to-layer="6" to-port="0"/>
        </edges>
    </net>

    As you can see in the TensorBoard picture, the original model has more nodes than the Intermediate Representation. The Model Optimizer has fused the batch normalization InceptionV1/InceptionV1/Conv2d_1a_7x7/BatchNorm into the convolution InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution, so it is not present in the final Intermediate Representation. This is not an effect of the --output option; it is the usual behavior of the Model Optimizer for batch normalization and convolution. The effect of --output is that the ReLU layer becomes the last one in the converted model.

  2. The following command cuts the edge that comes from 0 output port of the InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu and the rest of the model, making this node the last one in the model:
    mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu:0
    

    The resulting Intermediate Representation has three layers, which are the same as in the previous case:

    <?xml version="1.0" ?>
    <net batch="1" name="model" version="2">
        <layers>
            <layer id="3" name="input" precision="FP32" type="Input">
                <output>
                    <port id="0">
                        <dim>1</dim>
                        <dim>3</dim>
                        <dim>224</dim>
                        <dim>224</dim>
                    </port>
                </output>
            </layer>
            <layer id="5" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution" precision="FP32" type="Convolution">
                <data dilation-x="1" dilation-y="1" group="1" kernel-x="7" kernel-y="7" output="64" pad-x="2" pad-y="2" stride="1,1,2,2" stride-x="2" stride-y="2"/>
                <input>
                    <port id="0">
                        <dim>1</dim>
                        <dim>3</dim>
                        <dim>224</dim>
                        <dim>224</dim>
                    </port>
                </input>
                <output>
                    <port id="3">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
                <blobs>
                    <weights offset="0" size="37632"/>
                    <biases offset="37632" size="256"/>
                </blobs>
            </layer>
            <layer id="6" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
                <input>
                    <port id="0">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </input>
                <output>
                    <port id="1">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
            </layer>
        </layers>
        <edges>
            <edge from-layer="3" from-port="0" to-layer="5" to-port="0"/>
            <edge from-layer="5" from-port="3" to-layer="6" to-port="0"/>
        </edges>
    </net>

    This type of cutting is useful to cut a specific edge when a node has multiple output edges.

  3. The following command cuts the edge that comes to 0 input port of the InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu and the rest of the model including InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu, deleting this node and making the previous node InceptionV1/InceptionV1/Conv2d_1a_7x7/Conv2D the last in the model:
    mo.py --input_model=inception_v1.pb -b 1 --output=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu
    

    The resulting Intermediate Representation has two layers, which are the same as the first two layers in the previous case:

    <?xml version="1.0" ?>
    <net batch="1" name="inception_v1" version="2">
    <layers>
        <layer id="0" name="input" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>224</dim>
                    <dim>224</dim>
                </port>
            </output>
        </layer>
        <layer id="1" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Conv2D" precision="FP32" type="Convolution">
            <data auto_pad="same_upper" dilation-x="1" dilation-y="1" group="1" kernel-x="7" kernel-y="7" output="64" pad-b="3" pad-r="3" pad-x="2" pad-y="2" stride="1,1,2,2" stride-x="2" stride-y="2"/>
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>224</dim>
                    <dim>224</dim>
                </port>
            </input>
            <output>
                <port id="3">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
            <blobs>
                <weights offset="0" size="37632"/>
                <biases offset="37632" size="256"/>
            </blobs>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="1" to-port="0"/>
    </edges>
    </net>
    

Cutting from the Beginning

If you want to go further and cut the beginning of the model leaving only the ReLU layer, you have the following options:

  1. You can use the following command line, where --input and --output specify the same node in the graph:
    mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

    The resulting Intermediate Representation looks as follows:

    <?xml version="1.0" ?>
    <net batch="1" name="model" version="2">
        <layers>
            <layer id="0" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu/placeholder_port_0" precision="FP32" type="Input">
                <output>
                    <port id="0">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
            </layer>
            <layer id="2" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
                <input>
                    <port id="0">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </input>
                <output>
                    <port id="1">
                        <dim>1</dim>
                        <dim>64</dim>
                        <dim>112</dim>
                        <dim>112</dim>
                    </port>
                </output>
            </layer>
        </layers>
        <edges>
            <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
        </edges>
    </net>

    An Input layer is automatically created to feed the layer that is converted from the node specified in --input, which is InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu in this case. The Model Optimizer does not replace the ReLU node with the Input layer; it produces such an Intermediate Representation to make the node the first executable node in the final Intermediate Representation. The Model Optimizer creates enough Input layers to feed all input ports of the node that is passed in --input.

    Even though --input_shape is not specified in the command line, the shapes for layers are inferred from the beginning of the original TensorFlow* model to the point at which the new input is defined. It has the same shape [1,64,112,112] as the model converted as a whole or without cutting off the beginning.

  2. You can use the following command line to cut edge incoming to layer by port number:
    mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu
    

    The resulting Intermediate Representation looks as follows:

    <?xml version="1.0" ?>
    <net batch="1" name="model" version="2">
    <layers>
        <layer id="0" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu/placeholder_port_0" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
        </layer>
        <layer id="2" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </input>
            <output>
                <port id="1">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
    </edges>
    </net>
    

    As in the previous case, an Input layer is automatically created to feed the ReLU node specified in --input, and the shapes are inferred from the beginning of the original model up to the new input, giving the same [1,64,112,112] shape.

Shape Override for New Inputs

The input shape can be overridden with --input_shape. In this case, the shape is applied to the node referenced in --input, not to the original Placeholder in the model. For example, this command line

mo.py --input_model=inception_v1.pb --input_shape=[1,5,10,20] --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

gives the following shapes in the Input and ReLU layers:

<layer id="0" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu/placeholder_port_0" precision="FP32" type="Input">
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </output>
</layer>
<layer id="3" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </output>
</layer>

The input shape [1,20,5,10] in the final Intermediate Representation differs from the shape [1,5,10,20] specified in the command line, because the original TensorFlow* model uses the NHWC layout, while the Intermediate Representation uses the NCHW layout. So the usual NHWC to NCHW layout conversion occurred, as illustrated below.
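
A minimal sketch of that reordering; the NHWC axes are mapped to NCHW with the permutation [0, 3, 1, 2], and the shape values are taken from the example above:

nhwc_shape = [1, 5, 10, 20]                         # shape passed in --input_shape (NHWC)
nchw_shape = [nhwc_shape[i] for i in (0, 3, 1, 2)]  # reorder the axes to NCHW
print(nchw_shape)                                   # prints [1, 20, 5, 10]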

When --input_shape is specified, shape inference inside the Model Optimizer is not performed for the nodes at the beginning of the model that are not included in the translated region. This differs from the case when --input_shape is not specified, as noted in the previous section, where shape inference is still performed for such nodes to deduce the shapes of the layers that fall into the final Intermediate Representation. Therefore, --input_shape should be used for a model with a complex graph with loops, which are not supported by the Model Optimizer, to exclude such parts from the shape inference process completely.

Inputs with Multiple Input Ports

There are operations that contain more than one input port. In the example considered here, the convolution InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution is such an operation. When --input_shape is not provided, a new Input layer is created for each dynamic input port of the node. If a port is evaluated to a constant blob, this constant remains in the model and a corresponding Input layer is not created. The TensorFlow convolution used in this model contains two ports:

  • port 0: input tensor for convolution (dynamic)
  • port 1: convolution weights (constant)

Following this behavior, the Model Optimizer creates an Input layer for port 0 only, leaving port 1 as a constant. So the result of:

mo.py --input_model=inception_v1.pb -b 1 --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution

is identical to the result of conversion of the model as a whole, because this convolution is the first executable operation in Inception V1.

Different behavior occurs when --input_shape is also used as an attempt to override the input shape:

mo.py --input_model=inception_v1.pb --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]

An error occurs (for more information, see FAQ #30):

[ ERROR ]  Node InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution has more than 1 input and input shapes were provided.
Try not to provide input shapes or specify input port with PORT:NODE notation, where PORT is an integer.
For more information, see FAQ #30

In the case when --input_shape is specified and the node contains multiple input ports, you need to specify an input port index together with an input node name. The input port index is specified in front of the node name with ':' as a separator (PORT:NODE). In the considered case, the port index 0 of the node InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution should be specified as 0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution.

The correct command line is:

mo.py --input_model=inception_v1.pb --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]

Model Optimization Techniques

Optimization offers methods to accelerate inference with convolutional neural networks (CNNs) that do not require model retraining.

Linear Operation Fusing

Many convolutional neural networks include BatchNormalization and ScaleShift layers (for example, Resnet*, Inception*) that can be fused into the preceding Convolution or FullyConnected layers.

Usage

In the Model Optimizer, this optimization is turned on by default. To disable it, pass the --disable_fusing parameter to the Model Optimizer.

Optimization Description

This optimization method consists of three stages:

  1. BatchNormalization and ScaleShift decomposition: at this stage, the BatchNormalization layer is decomposed into a Mul → Add → Mul → Add sequence, and the ScaleShift layer is decomposed into a Mul → Add sequence.
  2. Linear operations merge: at this stage, sequences of Mul and Add operations are merged into a single Mul → Add instance.
    For example, if the topology contains a BatchNormalization → ScaleShift sequence, it is replaced with Mul → Add at the first stage. At the next stage, the latter is replaced with a ScaleShift layer if there is no available Convolution or FullyConnected layer to fuse into.
  3. Linear operations fusion: at this stage, the tool fuses Mul and Add operations into Convolution or FullyConnected layers. Note that it searches for Convolution and FullyConnected layers both backward and forward in the graph (except for the Add operation, which cannot be fused into a Convolution layer in the forward direction). A sketch of the weight folding behind this fusion is shown after this list.
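
The following is a minimal illustrative sketch, not Model Optimizer code, of how BatchNormalization parameters can be folded into convolution weights and biases; the [O, I, H, W] weight layout and the function name are assumptions made for this example:

import numpy as np

def fuse_batchnorm_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    # Folds y = gamma * (conv(x, W) + b - mean) / sqrt(var + eps) + beta
    # into new convolution weights and biases; gamma, beta, mean, and var
    # are per-output-channel vectors, and W is laid out as [O, I, H, W].
    scale = gamma / np.sqrt(var + eps)
    W_fused = W * scale.reshape(-1, 1, 1, 1)
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused

After this folding, the BatchNormalization layer can be dropped, because the convolution with the fused weights and biases produces the same output.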

Usage Examples

The first picture below shows the part of the Caffe* Resnet269 topology in which the BatchNorm and ScaleShift layers will be fused into the Convolution layers, as shown in the second picture.

Pic.1 Caffe Resnet269 block (from Netscope)

[[{"fid":"639504","view_mode":"default","fields":{"format":"default","field_file_image_alt_text[und][0][value]":"Part of Caffe Resnet269 topology","field_file_image_title_text[und][0][value]":false,"field_style":"","media[field_big_photo]":"","field_big_photo[fid]":"0"},"link_text":null,"type":"media","field_deltas":{"1":{"format":"default","field_file_image_alt_text[und][0][value]":"Part of Caffe Resnet269 topology","field_file_image_title_text[und][0][value]":false,"field_style":"","media[field_big_photo]":"","field_big_photo[fid]":"0"}},"attributes":{"alt":"Part of Caffe Resnet269 topology","height":541,"width":314,"class":"media-element file-default"}}]]

Pic.2 Fused Caffe Resnet269 block (from Netscope)

BatchNorm and ScaleShift layers fused to Convolution layers


Grouped Convolution Fusing

Grouped convolution fusing is a specific optimization that applies to TensorFlow* topologies. The main idea of this optimization is to combine the results of the convolutions applied to the Split outputs and then recombine them using a Concat operation in the same order as they came out of the Split (Pic. 3). A sketch of the weight merging is shown after the figure.

Pic.3 Split→Convolutions→Concat block from TensorBoard*
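
A minimal illustrative sketch, not Model Optimizer code, of how the per-branch convolution weights can be merged into a single grouped convolution; the per-branch [O, I, H, W] weight layout and the function name are assumptions made for this example:

import numpy as np

def fuse_split_convs_to_grouped(weights_list, biases_list):
    # Each branch convolves one slice of the Split output, so stacking the
    # per-branch [O_i, I_i, H, W] weight blobs along the output-channel axis
    # and setting group to the number of branches reproduces the
    # Split -> Convolutions -> Concat block with one grouped Convolution.
    W_grouped = np.concatenate(weights_list, axis=0)
    b_grouped = np.concatenate(biases_list, axis=0)
    group = len(weights_list)
    return W_grouped, b_grouped, group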


Intermediate Representation Notation Reference Catalog 

Convolution Layer

Name: Convolution

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters should be specified in the convolution_data node, which is a child of the layer node.

  • Parameter name: stride (stride-x, stride-y)
    • Description:stride (stride-x, stride-y) is a distance (in pixels) to slide the filter on the feature map over the (x, y) axis. For example, stride equal 1 (1, 1) means sliding the filter 1 pixel at a time over the (x, y) axis
    • Range of values: integer values starting from 0
  • Parameter name: pad (pad-x, pad-y)
    • Description:pad (pad-x, pad-y) is a number of pixels to add to the left (top) of the input. For example, pad (pad-x, pad-y) equal 1 (1, 1) means adding 1 pixel to the left of the input. Right and bottom padding should be calculated from the expected output width (height)
    • Range of values: integer values starting from 0
  • Parameter name: kernel (kernel-x, kernel-y)
    • Description: kernel (kernel-x, kernel-y) is a width (height) of each filter. For example, kernel (kernel-x, kernel-y) equal 3 (3, 3) means that each filter has width (height) equals 3
    • Range of values: integer values starting from 0
  • Parameter name: output
    • Description:output is a number of output feature maps per whole output (when group > 1, output still matches the number of output features regardless of group value). For example, output equals 1 means that there is 1 output feature map in a layer
    • Range of values: integer values starting from 0
  • Parameter name:group
    • Description: group denotes the number of groups to which output and input should be split. For example, group equal 1 means that all the filters are applied to full input (usual convolution), group equals 2 means that both input and output channels are separated into 2 groups and i-th output group is connected to i-th input group channels. group equals number of output feature maps denotes depth-wise separable convolution (Reference)
    • Range of values: integer values starting from 0
  • Parameter name: dilation (dilation-x, dilation-y)
    • Description: dilation (dilation-x, dilation-y) denotes the distance in width (height) between elements (weights) in the filter. For example, dilation-x and dilation-y equal 1 means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. dilation-x and dilation-y equal 2 means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1
    • Range of values: integer values starting from 0

Weights Layout: Weights layout is GOIYX, which means that X is changing the fastest, then Y, then Input, Output, then Group.

Mathematical Formulation

  • For the convolutional layer, the number of output features in each dimension is calculated using the formula:
    \[ S_{o}=\frac{S_{i} + 2 \cdot pad - S_{f}}{stride} + 1 \]
    where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter (a worked example follows this list)
  • The receptive field in each layer is calculated using the formulas:
    • Jump in the output feature map:
      \[ j_{out} = j_{in} * s \]
    • Size of the receptive field of output feature:
      \[ r_{out} = r_{in} + \left ( k - 1 \right ) * j_{in} \]
    • Center position of the receptive field of the first output feature:
      \[ start_{out} = start_{in} + \left ( \frac{k - 1}{2} - p \right ) * j_{in} \]
    • Output is calculated using the following formula:
      \[ out = \sum_{i = 0}^{n}w_{i}x_{i} + b \]
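
For instance, assuming an illustrative 227x227 input with an 11x11 kernel, stride 4, no padding, and dilation 1, the output-size formula above gives:

\[ S_{o}=\frac{227 + 2 \cdot 0 - 11}{4} + 1 = 55 \]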

Example

<layer ... type="Convolution" ... >
        <convolution_data stride-x="4" stride-y="4" pad-x="0" pad-y="0" kernel-x="11" kernel-y="11" output="96" group="1" dilation-x="2" dilation-y="2"/>
        <input> ... </input>
        <output> ... </output>
        <weights ... />
        <biases ... />
    </layer>

Pooling Layer

Name: Pooling

Short description: Reference

Detailed description: Reference

Parameters: Specify pooling layer parameters in the pooling_data node, which is a child of the layer node.

NOTE: A subset of pooling parameters, in particular, pad-x, pad-y, kernel-x, kernel-y, stride-x, stride-y are described in the Convolution layer.

  • Parameter name:pool-method
    • Description:pool-method is a type of pooling strategy for values
    • Range of values:
      • max - chooses the biggest value in a feature map for each filter position
      • avg - takes the average value in a feature map for each filter position
  • Parameter name:exclude-pad
    • Description:exclude-pad is a type of pooling strategy for values in the padding area. For example, if exclude-pad is "true", zero-values in the padding are not used
    • Range of values: "true" or "false"
  • Parameter name: rounding_type
    • Description: rounding_type is a type of rounding to be applied.
    • Range of values:
      • ceil
      • floor

Mathematical Formulation

  • For max pool-method
    \[ output_{j} = MAX\left \{ x_{0}, ... x_{i} \right \} \]
  • For avg pool-method:
    \[ output_{j} = \frac{\sum_{i = 0}^{n}x_{i}}{n} \]

Example

<layer ... type="Pooling" ... >
        <pooling_data kernel-x="3" kernel-y="3" pad-x="0" pad-y="0" stride-x="2" stride-y="2" pool-method="max" exclude-pad="true"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

ROIPooling Layer

Name: ROIPooling

Short description: It is a pooling layer with max pooling strategy (see max option in the Pooling layer parameters description). It is used over feature maps of non-uniform sizes and outputs another feature map of a fixed size.

Detailed description: Reference

Parameters: Specify ROIPooling layer parameters in the data node, which is a child of the layer node.

  • Parameter name:pooled_h (pooled_w)
    • Description: pooled_h (pooled_w) is a height of the ROI output feature map. For example, pooled_h (pooled_w) equal 6 means that the height (width) of the output of ROIpooling is 6
    • Range of values: integer values starting from 0
  • Parameter name:spatial_scale
    • Description: spatial_scale is a ratio of the input feature map over the input image size
    • Range of values: positive floating point value

Mathematical Formulation

\[ output_{j} = MAX\left \{ x_{0}, ... x_{i} \right \} \]

Example

<layer ... type="ROIPooling" ... >
        <data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

FullyConnected Layer

Name: FullyConnected

Short description: Reference

Detailed description: Reference

Parameters: Specify FullyConnected layer parameters in the fc_data node, which is a child of the layer node.

  • Parameter name: out-size
    • Description: out-size is a length of the output vector. For example, out-size equal 4096 means that the output vector length is 4096
    • Range of values: integer values starting from 0

Mathematical Formulation

  • If previous layer is FullyConnected:
    \[ y_{i} = f\left ( z_{i} \right ) \quad with \quad z_{i} = \sum_{j=1}^{m_{1}^{\left ( l-1 \right )}}w_{i,j}^{\left ( l \right )}y_{i}^{\left ( l -1 \right )} \]
  • Otherwise:
    \[ y_{i} = f\left ( z_{i} \right ) \quad with \quad z_{i}^{\left ( l \right )} = \sum_{j=1}^{m_{1}^{\left ( l-1 \right )}}\sum_{r=1}^{m_{2}^{\left ( l-1 \right )}}\sum_{s=1}^{m_{3}^{\left ( l-1 \right )}}w_{i,j,r,s}^{\left ( l \right )}\left ( Y_{i}^{\left ( l-1 \right )} \right )_{r,s} \]

Example

<layer ... type="FullyConnected" ... >
        <fc_data out-size="4096"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

Weights layout: OI, which means that Input is changing the fastest, then Output.


ReLU Layer

Name: ReLU

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters can optionally be specified in the data node, which is a child of the layer node.

  • Parameter name: negative_slope
    • Description: negative_slope is a multiplier, which is used if the unit is not active (that is negative). For example, negative_slope equal 0.1 means that an inactive unit value would be multiplied by 0.1 and this is the Leaky ReLU. If negative_slope is equal to 0, this is the usual ReLU
    • Range of values: double values starting from 0

Mathematical Formulation

\[ f\left ( x \right ) = \left\{\begin{array}{ll} x \quad if \quad x \geq 0 \\ negative\_slope \cdot x \quad if \quad x < 0 \end{array}\right. \]

Example

<layer ... type="ReLU" ... >
    <data negative_slope="0.100000"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Activation Layer

Name: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters should be specified in the data node, which is a child of the layer node.

  • Parameter name: type
    • Description: type represents particular activation function. For example, type equal sigmoid means that neurons of this layer have a sigmoid activation function
    • Range of values:
      • sigmoid - Sigmoid activation function. For more information, refer to the Detailed description section.
      • tanh - Tanh activation function. For more information, refer to the Detailed description section
      • elu - Elu activation function. For more information, refer to the Detailed description section.
      • relu6 - Relu6 activation function.

Mathematical Formulation

  • Sigmoid function:
    \[ f\left ( x \right ) = \frac{1}{1+e^{-x}} \]
  • Tanh function:
    \[ f\left ( x \right ) = \frac{2}{1+e^{-2x}} - 1 = 2sigmoid(2x) - 1 \]
  • Elu function:
    \[ f\left ( x \right ) = \left\{\begin{array}{ll} x \quad if \quad x \geq 0 \\ \alpha \left ( e^{x} - 1 \right ) \quad if \quad x < 0 \end{array}\right. \]
  • Relu6 function:
    \[ f\left ( x \right ) = min\left ( max\left ( 0, x \right ), 6 \right ) \]

Example

<layer ... type="Activation" ... >
    <data type="sigmoid" />
    <input> ... </input>
    <output> ... </output>
</layer>

SoftMax Layer

Name: SoftMax

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters can optionally be specified in the data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis represents the axis along which the SoftMax is calculated. axis equal 1 is the default value
    • Range of values: positive integer values

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where C is the number of classes
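
A minimal NumPy sketch of this formula along axis 1 (illustrative only, not part of the Model Optimizer or Inference Engine API):

import numpy as np

def softmax(z, axis=1):
    # Subtract the per-slice maximum for numerical stability; the result is unchanged.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = np.random.rand(1, 1000)   # a batch of one with 1000 class scores
probs = softmax(scores, axis=1)    # each row sums to 1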

Example

<layer ... type="SoftMax" ... >
    <data axis="1" />
    <input> ... </input>
    <output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Short description: Deconvolution layer is used to upsample the output to a higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the deconvolution_data node, which is a child of the layer node.

NOTE: The Deconvolution layer parameters are defined in XML in the same way as for the Convolution layer.

Weights layout: Weights layout is the following: GOIYX, which means that X is changing the fastest, then Y, then Input, Output, then Group.

Mathematical formulation:
Deconvolution is also called transposed convolution and performs the operation that is the reverse of convolution.

The number of output features for each dimension is calculated as:
\[ S_{o}=stride\left (S_{i} - 1 \right ) + S_{f} - 2pad \]

where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively

Output is calculated in the same way as for convolution layer:
\[ out = \sum_{i = 0}^{n}w_{i}x_{i} + b \]

Example

<layer ... type="Deconvolution" ... >
    <deconvolution_data stride-x="2" stride-y="2" pad-x="1" pad-y="1" kernel-x="4" kernel-y="4" output="19" group="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Local Response Normalization (LRN) Layer

Name: Norm

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters should be specified in the norm_data node, which is a child of the layer node.

  • Parameter name: alpha
    • Description: alpha represents the scaling parameter for the normalizing sum. For example, alpha equal 0.0001 means that the normalizing sum is multiplied by 0.0001
    • Range of values: floating point positive number
  • Parameter name: beta
    • Description: beta represents the exponent for the normalizing sum. For example, beta equal 0.75 means that the normalizing sum is raised to the power of 0.75
    • Range of values: floating point positive number
  • Parameter name: region
    • Description: region represents strategy of local regions extension. For example, region equal across means that the normalizing sum is performed over adjacent channels
    • Range of values:
      • across - normalizing sum is performed over adjacent channels
      • same - normalizing sum is performed over nearby spatial locations
  • Parameter name: local-size
    • Description: local-size represents the side length of the region to be used for the normalization sum or number of channels depending on the strategy specified in the region parameter. For example, local-size equal 5 for the across strategy means application of sum across 5 adjacent channels
    • Range of values: positive integer bigger than zero

Mathematical Formulation

\[ o_{i} = \left( 1 + \left( \frac{\alpha}{n} \right)\sum_{i}x_{i}^{2} \right)^{\beta} \]

Where n is the size of each local region.

Example

<layer ... type="Norm" ... >
    <norm_data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Concat Layer

Name: Concat

Short description: Reference

Parameters: Concat layer parameters should be specified in the concat_data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the number of axis over which input blobs are concatenated. For example, axis equal 1 means that input blobs are concatenated over the first axis
    • Range of values: positive number greater or equal to 0

Mathematical Formulation

The axis parameter specifies the blob dimension along which values are concatenated. For example, for two input blobs B1xC1xH1xW1 and B2xC2xH2xW2 with axis equal to 1, the output blob is B1x(C1+C2)xH1xW1. This is possible only if B1=B2, H1=H2, W1=W2.

Example

<layer ... type="Concat" ... >
    <concat_data axis="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Split Layer

Name: Split

Short description: Split layer splits the input into several output groups. Group sizes are denoted by the number and the size of output ports.

Detailed description: Reference

Parameters: None

Mathematical Formulation

Splits the input blob among children. For example, if the input blob is Bx(C+C)xHxW and there are two children, then each output blob is BxCxHxW.

Example

<layer ... type="Split" ... >
    <input> ... </input>
    <output> ... </output>
</layer>

Reshape Layer

Name: Reshape

Short description: Reshape layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions.

Detailed description: Reference

Parameters: Reshape layer parameters should be specified in the data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the number of the starting axis for reshape. For example, axis equal 1 means that Reshape replaces dimensions starting from the next after the first dimension
    • Range of values: positive number greater or equal to 0
  • Parameter name: dim
    • Description: dim is a set of numbers separated with comma, which denote the dimensions of output blob. For example, dim equal 88,1,71 means that output blob gets following dimensions: first dimension equals 88, second dimension equals 1, third dimension equals 71. For more information, refer to the Description block. If dim is equal to two numbers, it performs flattening
    • Range of values: set of positive integer numbers separated with comma
  • Parameter name: num_axes
    • Description: num_axes is the number of dimensions to be replaced with a reshaped blob starting from the dimension number specified in axis property. For example, num_axes equal 2 means that 2 dimensions are replaced with reshaped blob
    • Range of values:
      • -1 - all dimensions are taken starting from the dimension number specified in axis property
      • positive number greater than the value in the axis parameter

Mathematical Formulation

If you want to reshape input blob BxCxHxW into Bx1x(C*H)xW, the dim parameters of your layer should be:

 layer {
    name: "reshape"
    type: "Reshape"
    bottom: "input"
    top: "output"
    reshape_param {
      shape {
        dim: 0  # copy the dimension from below
        dim: 1
        dim: -1 # infer it from the other dimensions
        dim: 0
      }
    }
  }

Example

<layer ... type="Reshape" ... >
    <data axis="0" dim="1, 1001" num_axes="-1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Short description: Eltwise layer performs element-wise operation, which is specified in parameters, over given inputs.

Parameters: Eltwise layer parameters should be specified in the elementwise_data node, which is placed as a child of the layer node.

  • Parameter name: operation
    • Description: operation is the simple mathematical operation to be performed over inputs. For example, operation equal mul means that input blobs are multiplied
    • Range of values:
      • sum - summation of given values
      • max - select maximum from given values
      • mul - multiplication of given values

Mathematical Formulation

Eltwise accepts two inputs of any number of dimensions, from 1 to 4. However, both inputs must have exactly the same dimensions. The produced blob has the same dimensions as each of its parents.

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ is the i-th element of the first blob, $b_{i}^{2}$ is the i-th element of the second blob, $o_{i}$ is the i-th element of the output blob, and $f(a,b)$ is a function that performs an operation over its two arguments $a, b$.

  • For sum operation, $f(a,b)$ is defined as
    \[ f(a,b) = a + b \]
  • For mul operation, $f(a,b)$ is defined as
    \[ f(a,b) = a * b \]
  • For max operation, $f(a,b)$ is defined as
    \[ f(a,b) = \left\{\begin{array}{ll} a \quad \mbox{if } a \geq b \\ b \quad \mbox{if } b > a \end{array}\right. \]
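
A minimal NumPy sketch of the three operations on two equally shaped blobs (illustrative only):

import numpy as np

a = np.random.rand(1, 3, 4, 4)
b = np.random.rand(1, 3, 4, 4)

out_sum = a + b               # operation="sum"
out_mul = a * b               # operation="mul"
out_max = np.maximum(a, b)    # operation="max"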

Example

<layer ... type="Eltwise" ... >
    <elementwise_data operation="sum"/>
    <input> ... </input>
    <output> ... </output>
</layer>

ScaleShift Layer

Name: ScaleShift

Short description: ScaleShift layer performs a linear transformation of the input blobs. Weights denote the scaling parameter; biases denote a shift.

Parameters: ScaleShift layer does not have additional parameters.

Mathematical Formulation

\[ o_{i} =\gamma b_{i} + \beta \]

Example

<layer ... type="ScaleShift" ... >
    <input> ... </input>
    <output> ... </output>
</layer>

Crop Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways:

  • First type: Crop layer takes two input blobs, and the shape of the second blob specifies the Crop size. The layer has two attributes: axis and offset.
  • Second type: Crop layer takes one input blob to Crop and has three attributes: axis, offset, and dim.
  • Third type: Crop layer takes one input blob to Crop and has three attributes: axis, crop_begin, and crop_end.

     

First Type

Crop layer takes two input blobs, and the shape of the second blob specifies the Crop size.

The Crop layer of this type supports shape inference.

Attributes

axis

  • Description: axis is a number of a dimension to be used for cropping. For example, axis equal to 1 means that cropping is performed over the first dimension.

  • Range of values: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length.

offset

  • Description: offset denotes the starting point for crop in the input blob. For example, offset equal to 2 means that crop is starting from the second value in the given axis.

  • Range of values: a list of integers of the length equal to the length of axis attribute. In the list, offset[i] is greater than or equal to 0 and less than or equal to input_shape[axis[i]] - crop_size[axis[i]], where crop_size is the shape of the second input.

Inputs

  • 1: Multidimensional input blob (for example, NCHW, NCH, or NC)

  • 2: Shape of this input will be used for crop

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
    <data axis="2,3" offset="0,0"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>21</dim>
            <dim>44</dim>
            <dim>44</dim>
        </port>
        <port id="1">
            <dim>1</dim>
            <dim>21</dim>
            <dim>34</dim>
            <dim>34</dim>
        </port>
    </input>
    <output>
        <port id="2">
            <dim>1</dim>
            <dim>21</dim>
            <dim>34</dim>
            <dim>34</dim>
        </port>
    </output>
</layer>
Second Type

Crop layer takes one input blob to Crop and has axis, offset, and dim attributes.

The Crop layer of this type supports shape inference only when shape propagation is applied to dimensions that are not specified in the axis attribute.

Attributes

axis

  • Description: axis is a number of a dimension to be used for cropping. For example, axis equal to 1 means that cropping is performed over the first dimension.

  • Range of values: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length

offset

  • Description: offset denotes the starting point for crop in the input blob. For example, offset equal to 2 means that cropping starts from the second value in the given axis.

  • Range of values: a list of integers with the length equal to length of axis attribute, where offset[i] is greater than or equal to 0 and less or equal to input_shape[axis[i]] - dim[i]

dim

  • Description: dim is the resulting size of the output blob for the given axis. For example, dim equal to 88 means that the output blob gets the dimension equal to 88 for the given axis.

  • Range of values: a list of integers

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
    <data axis="2,3" offset="0,0" dim="34,34"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>21</dim>
            <dim>44</dim>
            <dim>44</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>21</dim>
            <dim>34</dim>
            <dim>34</dim>
        </port>
    </output>
</layer>
Third Type

Crop layer takes one input blob to Crop and has axis, crop_begin, and crop_end attributes.

The Crop layer of this type supports shape inference.

Attributes

axis

  • Description: axis is the number of the dimension to be used for cropping. For example, axis equal 1 means that cropping is performed over the first dimension.

  • Range of values: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length

crop_begin

  • Description: crop_begin specifies the starting offset for crop in the input blob for given axes.

  • Range of values: a list of integers, where crop_begin[i] is greater than or equal to 0 and less than input_shape[axis[i]] - crop_end[i]

crop_end

  • Description: crop_end specifies the ending offset for crop in the input blob for given axes.

  • Range of values: a list of integers, where crop_end[i] is greater than or equal to 0 and less than input_shape[axis[i]] - crop_begin[i]

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
    <data axis="2,3" crop_begin="4,4" crop_end="6,6"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>21</dim>
            <dim>44</dim>
            <dim>44</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>21</dim>
            <dim>34</dim>
            <dim>34</dim>
        </port>
    </output>
</layer>

Batch Normalization Layer

Name: BatchNormalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters should be specified as the batch_norm_data node, which is a child of the layer node.

  • Parameter name: epsilon
    • Description: epsilon is the number to be added to the variance to avoid division by zero when normalizing the value. For example, epsilon equal 0.001 means that 0.001 is added to the variance
    • Range of values: positive floating point number

Mathematical Formulation

BatchNormalization is the normalization of the output in each hidden layer.

  • Input: Values of x over a mini-batch: $ \beta = \left \{ x_{1...m} \right \} $

  • Parameters to learn:$ \gamma, \beta$

  • Output:
    $ \left \{ o_{i} = BN_{\gamma, \beta} \left ( b_{i} \right ) \right \} $
  • Mini-batch mean:
    $ \mu_{\beta} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i} $
  • Mini-batch variance:
    $ \sigma_{\beta }^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m}\left ( b_{i} - \mu_{\beta} \right )^{2} $
  • Normalize:
    $ \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\beta}}{\sqrt{\sigma_{\beta }^{2} + \epsilon }} $
  • Scale and shift:
    $ o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta }\left ( b_{i} \right ) $
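
A minimal NumPy sketch of the per-channel computation described above for an NCHW blob (the gamma, beta, and epsilon values are illustrative):

import numpy as np

x = np.random.rand(8, 16, 32, 32)                 # NCHW mini-batch
gamma, beta, epsilon = np.ones(16), np.zeros(16), 0.001

mean = x.mean(axis=(0, 2, 3), keepdims=True)      # mini-batch mean
var = x.var(axis=(0, 2, 3), keepdims=True)        # mini-batch variance
x_hat = (x - mean) / np.sqrt(var + epsilon)       # normalize
out = gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)  # scale and shift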

Example

<layer ... type="BatchNormalization" ... >
    <batch_norm_data epsilon="9.99e-06" />
    <input> ... </input>
    <output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Short description: Normalize layer performs l-p normalization of the input blob.

Parameters: Normalize layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: across_spatial
    • Description: across_spatial is a flag that denotes if normalization is performed over CHW or HW. For example, across_spatial equals 0 means that normalization is not shared across channels
    • Range of values:
      • 0
      • 1 - not supported
  • Parameter name: channel_shared
    • Description: channel_shared is a flag that denotes if scale parameters are shared across channels. For example, channel_shared equal 0 means that scale parameters are not shared across channels
    • Range of values:
      • 0 - scale parameters are not shared across channels
      • 1 - not supported
  • Parameter name: eps
    • Description: eps is the epsilon used to avoid division by zero when normalizing the value. For example, eps equals 0.001 means that 0.001 is used if all the values in normalization are equal to zero
    • Range of values: positive floating point number

Mathematical Formulation

\[ o_{i} = \sum_{i}^{H*W}\frac{\left ( n*C*H*W \right )* scale}{\sqrt{\sum_{i=0}^{C*H*W}\left ( n*C*H*W \right )^{2}}} \]

Example

<layer ... type="Normalize" ... >
    <data across_spatial="0" channel_shared="0" eps="0.000000"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Tile Layer

Name: Tile

Short description: Tile layer extends input blob with copies of data along specific axis.

Detailed description: Reference

Parameters: Tile layer parameters should be specified as the tile_data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the index of the axis to tile. For example, axis equals 3 means that fourth axis is used for tiling
    • Range of values: positive integer number
  • Parameter name: tiles
    • Description: tiles is a size of the specified axis in the output blob. For example, tiles equal 88 means that output blob gets 88 copies of data from specified axis
    • Range of values: positive integer number

Mathematical Formulation

Tile extends the input blob and fills the output blob using the following rules:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left ( 0, \quad tiles \right ) \]

Example

<layer ... type="Tile" ... >
    <tile_data axis="3" tiles="88"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Permute Layer

Name: Permute

Short description: Permute layer performs reordering of input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters should be specified as the data node, which is a child of the layer node.

NOTE: Model Optimizer (Beta 2) does not use the data node for retrieving parameters and currently supports only the following order for permutation: 0,2,3,1.

  • Parameter name: order
    • Description: order is the set of dimensions indexes for output blob. For example, order equal 0,2,3,1 means that the output blob has following dimensions: first dimension from the input blob, third dimension from the input blob, fourth dimension from the input blob, second dimension from the input blob
    • Range of values: set of positive integer numbers separated by comma

Mathematical Formulation

Permute layer reorders the input blob dimensions. Source indexes and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in \left ( 0, order[0] \right ) \]

\[ w \in \left ( 0, order[3] \right ) \]
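
For intuition, the reordering specified by the order parameter corresponds to a transpose of the blob axes. A minimal NumPy sketch (illustrative only):

import numpy as np

x = np.random.rand(1, 3, 224, 224)     # NCHW input blob
y = np.transpose(x, (0, 2, 3, 1))      # order="0,2,3,1" produces NHWC
print(y.shape)                         # (1, 224, 224, 3)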

Example

<layer ... type="Permute" ... >
    <data order="0,2,3,1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: min_size (max_size)
    • Description: min_size (max_size) is the minimum (maximum) box size (in pixels). For example, min_size (max_size) equal 15 means that the minimum (maximum) box size is 15
    • Range of values: positive integer number
  • Parameter name: aspect_ratio
    • Description: aspect_ratio is a variance of aspect ratios. Duplicate values are ignored. For example, aspect_ratio equal 2.000000,3.000000 means that for the first box aspect_ratio is equal to 2 and for the second box - 3
    • Range of values: set of positive integer numbers
  • Parameter name: flip
    • Description: flip is a flag that denotes that each aspect_ratio is duplicated and flipped. For example, flip equals 1 and aspect_ratio equals 3 mean that aspect_ratio is equal to 1/3
    • Range of values:
      • 0 - each aspect_ratio is not flipped
      • 1 - each aspect_ratio is duplicated and flipped
  • Parameter name: clip
    • Description: clip is a flag that denotes if each value in the output blob is within [0,1]. For example, clip equal 1 means that each value in the output blob is within [0,1]
    • Range of values:
      • 0 - clipping is not performed
      • 1 - each value in the output blob is within [0,1]
  • Parameter name: step
    • Description: step is a distance between box centers. For example, step equal 85 means that the distance between neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: offset
    • Description: offset is a shift of box respectively to top left corner. For example, offset equal 85 means that the shift of neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: variance
    • Description: variance denotes the variance of adjusting bounding boxes
    • Range of values: floating point positive number
  • Parameter name: scale_all_sizes

    • Description: scale_all_sizes is a flag that denotes the type of inference. For example, scale_all_sizes equal to 0 means that the PriorBox layer is inferred in an MXNet-like manner. In particular, the max_size parameter is ignored.

    • Range of values:
      • 0 - max_size is ignored
      • 1 - default value. max_size is used

Mathematical formulation:
PriorBox computes coordinates of prior boxes by following:

  1. First calculates center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:
      \[ center_x=(w+0.5) \]
      \[ center_y=(h+0.5) \]
    • else:
      \[ center_x=(w+offset)*step \]
      \[ center_y=(h+offset)*step \]
      \[ w \subset \left( 0, W \right ) \]
      \[ h \subset \left( 0, H \right ) \]
  2. Then, for each $ s \subset \left( 0, min_sizes \right )$ calculates coordinates of prior boxes:
    \[ xmin = \frac{center_x - \frac{s}{2}}{W}; \]
    \[ ymin = \frac{center_y - \frac{s}{2}}{H}; \]
    \[ xmax = \frac{center_x + \frac{s}{2}}{W}; \]
    \[ ymax = \frac{center_y + \frac{s}{2}}{H}; \]

Example

<layer ... type="PriorBox" ... >
    <data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
    <input> ... </input>
    <output> ... </output>
</layer>

SimplerNMS Layer

Name: SimplerNMS

Short description: SimplerNMS layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: pre_nms_topn (post_nms_topn)
    • Description: pre_nms_topn (post_nms_topn) is the quantity of bounding boxes before (after) applying the NMS operation. For example, pre_nms_topn (post_nms_topn) equal to 15 means that 15 bounding boxes are taken before (kept after) applying the NMS operation
    • Range of values: positive integer number
  • Parameter name: cls_threshold
    • Description: cls_threshold is the minimum value of the proposal to be taken into consideration. For example, cls_threshold equal 0.5 means that all boxes with prediction probability less than 0.5 are filtered out
    • Range of values: positive floating point number
  • Parameter name: iou_threshold
    • Description: iou_threshold is the minimum ratio of boxes overlapping to be taken into consideration. For example, iou_threshold equal 0.7 means that all boxes with overlapping ratio less than 0.7 are filtered out
    • Range of values: positive floating point number
  • Parameter name: feat_stride
    • Description: feat_stride is the step size to slide over boxes (in pixels). For example, feat_stride equal 16 means that all boxes are analyzed with the slide 16
    • Range of values: positive integer number
  • Parameter name: min_bbox_size
    • Description: min_bbox_size is the minimum size of box to be taken into consideration. For example, min_bbox_size equal 35 means that all boxes with box size less than 35 are filtered out
    • Range of values: positive integer number
  • Parameter name: scale
    • Description: scale is array of scales for anchor boxes generating
    • Range of values: positive integer number

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. The produced blob has two dimensions; the first one equals post_nms_topn.

SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. Left top corner of all boxes is (0, 0). Width and height of boxes are calculated based on scaled (according to the scale parameter) default widths and heights
  2. For each point in the first input blob:
    • pins anchor boxes to picture according to the second input blob, which contains four deltas for each box: for x and y of center, for width, and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with intersection/union > iou_threshold
  7. Takes top post_nms_topn proposals
  8. Returns top proposals (a simplified sketch of steps 4-7 follows this list)
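
A simplified sketch of the score sorting and overlap filtering in steps 4-7 (illustrative only; boxes is an (N, 4) array of (x1, y1, x2, y2) corners and scores is an (N,) array, both hypothetical names):

import numpy as np

def iou(box_a, box_b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold, pre_nms_topn, post_nms_topn):
    order = np.argsort(scores)[::-1][:pre_nms_topn]   # top pre_nms_topn by score
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
        if len(keep) == post_nms_topn:
            break
    return keep                                       # indices of the kept proposals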

Example

<layer ... type="SimplerNMS" ... >
    <data cls_threshold="0.500000" iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
    <input> ... </input>
    <output> ... </output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference

Parameters: DetectionOutput layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: num_classes
    • Description: number of classes to be predicted
    • Range of values: positive integer values
  • Parameter name: background_label_id
    • Description: background label id. If there is no background class, set it to -1
    • Range of values: integer values
  • Parameter name: top_k
    • Description: maximum number of results to be kept on NMS stage
    • Range of values: integer values
  • Parameter name: variance_encoded_in_target
    • Description: if "true", variance is encoded in target; otherwise, we need to adjust the predicted offset accordingly
    • Range of values: logical values
  • Parameter name: keep_top_k
    • Description: number of total bboxes to be kept per image after NMS step. -1 means keeping all bboxes after NMS step
    • Range of values: integer values
  • Parameter name: num_orient_classes
    • Range of values: integer values
  • Parameter name: code_type
    • Description: type of coding method for bounding boxes
    • Range of values: caffe.PriorBoxParameter.CENTER_SIZE and others
  • Parameter name: share_location
    • Description: bounding boxes are shared among different classes
    • Range of values: logical values
  • Parameter name: interpolate_orientation
    • Range of values: integer values
  • Parameter name: nms_threshold
    • Description: threshold to be used in NMS stage
    • Range of values: floating point values
  • Parameter name: confidence_threshold
    • Description: only consider detections whose confidences are larger than a threshold. If not provided, consider all boxes
    • Range of values: floating point values

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of k at a given location, DetectionOutput computes class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for an m × n feature map.

Example

<layer ... type="DetectionOutput" ... >
    <data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" eta="1.000000" output_directory="" output_name_prefix="" output_format="" label_map_file="" name_size_file="" num_test_image="0" prob="1.000000" resize_mode="caffe.ResizeParameter.WARP" height="0" width="0" height_scale="0" width_scale="0" pad_mode="caffe.ResizeParameter.CONSTANT" pad_value="#" interp_mode="#" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000" visualize="0" visualize_threshold="0.000000" save_file=""/>
    <input> ... </input>
    <output> ... </output>
</layer>

Memory / Delay Object Layer

Name: Memory

Short description: Memory layer represents a delay layer in LSTM terminology. To read more about LSTM topologies, refer to this link.

Detailed description: Memory layer saves state between two infer requests. In the topology, it is a single layer; however, in the Intermediate Representation, it is always represented as a pair of Memory layers. One of these layers does not have outputs and another does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: id
    • Description: id is the id of the pair of Memory layers. For example, id equals r_27-28 means that layers with id 27 and 28 are in one pair
    • Range of values: positive integer number
  • Parameter name: index
    • Description: index represents whether the given layer is an input or an output. For example, index equal 0 means that this layer is the output one
    • Range of values:
      • 0 - the current layer is the output one
      • 1 - the current layer is the input one
  • Parameter name: size
    • Description: size represents the size of the group. For example, size equals 2 means this group is a pair
    • Range of values: only 2 is supported

Mathematical Formulation
Memory saves data from the input blob.

Example

<layer ... type="Memory" ... >
    <data id="r_27-28" index="0" size="2" />
    <input> ... </input>
    <output> ... </output>
</layer>

Clamp Layer

Name: Clamp

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: min
    • Description: min is the lower bound of values in the output. Any value in the input that is smaller than this bound is replaced by the min value. For example, min equal to 10 means that any value in the input that is smaller than 10 is replaced by 10
    • Range of values: positive integer number
  • Parameter name: max
    • Description: max is the upper bound of values in the output. Any value in the input that is greater than this bound is replaced by the max value. For example, max equal to 50 means that any value in the input that is greater than 50 is replaced by 50
    • Range of values: positive integer number

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_{i}=\left\{\begin{array}{ll} max\_value \quad if \quad input_{i}>max\_value \\ min\_value \quad if \quad input_{i}<min\_value \\ input_{i} \quad otherwise \end{array}\right. \]
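
A minimal NumPy equivalent of this clipping (illustrative only):

import numpy as np

x = np.random.rand(1, 3, 4, 4) * 100
out = np.clip(x, 10, 50)   # min="10", max="50"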

Example

<layer ... type="Clamp" ... >
    <data min="10" max="50" />
    <input> ... </input>
    <output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Short description: ArgMax layer computes the indices of the K maximum values for each datum across all dimensions CxHxW.

Detailed description: Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to "true", output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximize.

Parameters: ArgMax layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: top_k
    • Description: top_k is the number K of maximum items to output
    • Range of values: positive integer number
  • Parameter name:out_max_val
    • Description: if out_max_val equals 1, output is a vector of pairs (max_ind, max_val), unless axis is set. Then output is max_val along the specified axis
    • Range of values: 0 or 1
  • Parameter name: axis
    • Description: if set, maximizes along the specified axis, else maximizes the flattened trailing dimensions for each index of the first / num dimension
    • Range of values: integer values

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ \arg\max_{x} f\left ( x \right ) = \left \{ x \mid \forall y : f\left ( y \right ) \leq f\left ( x \right ) \right \} \]

Example

<layer ... type="ArgMax" ... >
    <data top_k="10" out_max_val="1" axis="-1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Short description: PSROIPooling layer computes position-sensitive max pooling on regions of interest. It takes as input N position-sensitive score maps and a list of R regions of interest.

Detailed description: Reference

Parameters: PSROIPooling layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: output_dim
    • Description: pooled output channel number
    • Range of values: positive integer number
  • Parameter name: group_size
    • Description: a number of groups to encode position-sensitive score maps
    • Range of values: positive integer number
  • Parameter name: spatial_scale
    • Description: multiplicative spatial scale factor to translate ROI coordinates from their input scale to the scale used when pooling
    • Range of values: positive floating point value

Mathematical Formulation

The output value for the $(i, j)$-th bin is obtained by summation from one score map $x_{i,j}$ corresponding to that bin. In short, the difference from ROIPooling is that a general feature map x is replaced by a specific position-sensitive score map $x_{i,j}$

Example

<layer ... type="PSROIPooling" ... >
    <data output_dim="10" out_max_val="1" spatial_scale="0.1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

GRN Layer

Name: GRN

Short description: GRN is Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: bias
    • Description: bias is added to the variance
    • Range of values: floating point value

Mathematical Formulation

GRN computes L2 norm by channels for input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{i}^{C} input_{i}^{2}}} \]

Example

<layer ... type="GRN" ... >
    <data bias="1.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Short description: PReLU is the Parametric Rectified Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: channel_shared
    • Description: channel_shared shows whether the negative slope is shared across channels
    • Range of values: 0 or 1
  • Parameter name: filler_type
    • Description: filler_type defines initialization type for negative slope
    • Range of values: string
  • Parameter name: filler_value
    • Description: filler_value defines the value in constant filler
    • Range of values: integer
  • Parameter name: min(max)
    • Description: min(max) defines the minimal(maximal) value in uniform filler
    • Range of values: integer
  • Parameter name: mean
    • Description: mean defines the mean value in Gaussian filler
    • Range of values: integer

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as input.

PReLU does the following with the input blob:
\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is from weights blob.
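
A minimal NumPy sketch of this formula with a per-channel negative slope (the weight values are illustrative):

import numpy as np

x = np.random.randn(1, 3, 4, 4)                  # NCHW input
w = np.array([0.1, 0.2, 0.3])                    # per-channel slopes from the weights blob
slope = w.reshape(1, -1, 1, 1)
out = np.maximum(0, x) + slope * np.minimum(0, x)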

Example

<layer ... type="PReLU" ... >
    <data bias="1.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

RegionYolo Layer

Name: RegionYolo

Short description: RegionYolo computes coordinates of regions with probability for each class.

Detailed description: Reference

Parameters: RegionYolo layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: coords
    • Description: coords is the number of coordinates for each region
    • Range of values: integer value
  • Parameter name: classes
    • Description: classes is the number of classes for each region
    • Range of values: integer value
  • Parameter name: num
    • Description: num is the number of regions
    • Range of values: integer value
  • Parameter name: do_softmax
    • Description: do_softmax is a flag that specifies the method of inference
    • Range of values:
      • 0 - softmax is not performed
      • 1 - softmax is performed
  • Parameter name: anchors
    • Description: anchors are the coordinates of regions
    • Range of values: floating point values
  • Parameter name: mask
    • Description: mask specifies which anchors to use
    • Range of values: integer values
  • Parameter name: axis
    • Description: axis is the number of the dimension from which flattening is performed. For example, axis equal 1 means that flattening starts from the 1st dimension
    • Range of values: positive number greater or equal to 0
  • Parameter name: end_axis
    • Description: end_axis is the number of the dimension at which flattening ends. For example, end_axis equal -1 means that flattening is performed up to the last dimension
    • Range of values: positive number greater or equal to 0

Mathematical formulation

RegionYolo calculates coordinates of regions, where:

  • i is the number of regions
  • w and h are the dimensions of the image
  • coords and classes are attributes of this layer
  • b is the batch

For each region, RegionYolo also calculates the probability for each class.

Example

<layer ... type="RegionYolo" ... >
    <data bias="1.0"/>
    <input> ... </input>
    <output> ... </output>
    <weights .../>
</layer>

ReorgYolo Layer

Name: ReorgYolo

Short description: ReorgYolo reorganizes the input blob taking into account strides.

Detailed description: Reference

Parameters: ReorgYolo layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: stride
    • Description: stride is the distance of cut throws in output blobs
    • Range of values: integer values

Mathematical formulation

ReorgYolo reorganizes the blob: the destination index of each data element is calculated from its source index according to the stride.

Example

<layer ... type="ReorgYolo" ... >
    <data stride="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Short description: PriorBoxClustered layer generates prior boxes of specified sizes.

Parameters: PriorBoxClustered layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: width (height)
    • Description: width (height) is a parameter that specifies desired boxes widths (heights) in pixels
    • Range of values: floating point positive number
  • Parameter name: clip
    • Description: clip is a flag that denotes if each value in the output blob is within [0,1]. For example, clip equal 1 means that each value in the output blob is within [0,1]
    • Range of values:
    • 0 - clipping is not performed
    • 1 - each value in the output blob is within [0,1]
  • Parameter name: flip
    • Description: flip is a flag that denotes whether the list of boxes is augmented with the flipped ones
    • Range of values:
      • 0 - list of boxes is not augmented with the flipped ones
      • 1 - list of boxes is augmented with the flipped ones
  • Parameter name: step (step_w, step_h)
    • Description: step (step_w, step_h) is a distance between box centers. For example, step equal 85 means that the distance between neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: offset
    • Description: offset is a shift of box respectively to top left corner. For example, offset equal 85 means that the shift of neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: variance
    • Description: variance denotes the variance of adjusting bounding boxes
    • Range of values: floating point positive number
  • Parameter name: img_h (img_w)
    • Description: img_h (img_w) specifies height (width) of input image. These parameters are calculated unless provided explicitly
    • Range of values: floating point positive number

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes by following:

  1. Calculates the center_x and center_y of prior box:
    \[ W \equiv Width \quad Of \quad Image \]
    \[ H \equiv Height \quad Of \quad Image \]
    \[ center_x=(w+offset)*step \]
    \[ center_y=(h+offset)*step \]
    \[ w \subset \left( 0, W \right ) \]
    \[ h \subset \left( 0, H \right ) \]
  2. For each $ s \subset ( 0, W )$ calculates the prior boxes coordinates:
    $ xmin = \frac{center_x - \frac{width_s}{2}}{W}$
    $ ymin = \frac{center_y - \frac{height_s}{2}}{H} $
    $ xmax = \frac{center_x + \frac{width_s}{2}}{W} $
    $ ymax = \frac{center_y + \frac{height_s}{2}}{H} $

If clip is defined, the coordinates of prior boxes are recalculated with the formula:
$coordinate = \min(\max(coordinate,0), 1)$

Example

<layer ... type="PriorBoxClustered">
    <data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
     width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
    <input>
        ...
    </input>
    <output>
        ...
    </output>
</layer>

MVN Layer

Name: MVN

Short description: Reference

Parameters: MVN layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: across_channels
    • Description: across_channels is a flag that denotes if mean values are shared across channels. For example, across_channels equal 0 means that mean values are not shared across channels
    • Range of values:
      • 0 - mean values are not shared across channels
      • 1 - mean values are shared across channels
  • Parameter name: normalize_variance
    • Description: normalize_variance is a flag that denotes whether to perform variance normalization
    • Range of values:
      • 0 - variance normalization is not performed
      • 1 - variance normalization is performed
  • Parameter name: eps
    • Description: eps is the number to be added to the variance to avoid division by zero when normalizing the value. For example, epsilon equal 0.001 means that 0.001 is added to the variance
    • Range of values: positive floating point number

Mathematical Formulation

MVN subtracts mean from the input blob:

$ o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W}$

If normalize_variance is set to 1, the output blob is divided by variance:

$ o_{i}=\frac{o_{i}}{\sum \sqrt {o_{k}^2}+\epsilon}$
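
A minimal NumPy sketch of the two steps above for an NCHW blob with across_channels="1" and normalize_variance="1", reading the second step as division by the standard deviation over C*H*W (illustrative only):

import numpy as np

x = np.random.rand(1, 3, 32, 32)
eps = 1e-9

o = x - x.mean(axis=(1, 2, 3), keepdims=True)                              # subtract the mean
o = o / (np.sqrt((o ** 2).mean(axis=(1, 2, 3), keepdims=True)) + eps)      # normalize the variance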

Example

<layer ... type="MVN">
    <data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
    <input>
        ...
    </input>
    <output>
        ...
    </output>
</layer>

CTCGreadyDecoder Layer

Name: CTCGreadyDecoder

Short description: CTCGreadyDecoder performs greedy decoding on the logits given in input (best path).

Detailed description: Reference

Parameters: CTCGreadyDecoder layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: ctc_merge_repeated
    • Description: ctc_merge_repeated is flag for collapsing the repeated labels during the ctc calculation
    • Range of values: 0 or 1

Mathematical formulation

Given an input sequence X of length T, CTCGreadyDecoder assumes the probability of a length T character sequence C is given by,

$p(C|X) = \prod_{t=1}^{T} p(c_{t}|X)$

Example

<layer ... type="CTCGreadyDecoder" ... >
    <data stride="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Proposal Layer

Name: Proposal

Short description: Proposal layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: Proposal layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: pre_nms_topn (post_nms_topn)
    • Description: pre_nms_topn (post_nms_topn) is the quantity of bounding boxes before (after) applying the NMS operation. For example, pre_nms_topn (post_nms_topn) equal to 15 means that 15 bounding boxes are taken before (kept after) applying the NMS operation
    • Range of values: positive integer number
  • Parameter name: nms_thresh
    • Description: nms_thresh is the overlap threshold used in the NMS stage. For example, nms_thresh equal 0.6 means that all boxes whose intersection/union ratio with a higher-scored box is greater than 0.6 are filtered out
    • Range of values: positive floating point number
  • Parameter name: feat_stride
    • Description: feat_stride is the step size to slide over boxes (in pixels). For example, feat_stride equal 16 means that all boxes are analyzed with the slide 16
    • Range of values: positive integer number
  • Parameter name: min_size
    • Description: min_size is the minimum size of box to be taken into consideration. For example, min_size equal 35 means that all boxes with box size less than 35 are filtered out
    • Range of values: positive integer number
  • Parameter name: base_size
    • Description: base_size is the base size for anchor generation
    • Range of values: positive integer number
  • Parameter name: ratio
    • Description: ratio is the ratios for anchor generation
    • Range of values: array of float numbers
  • Parameter name: scale
    • Description: scale specifies the scales used for anchor generation
    • Range of values: array of float numbers

Mathematical formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions; the first one equals batch_size * post_nms_topn.

Proposal does the following with the input blob (a simplified sketch in Python follows the list):

  1. Generates initial anchor boxes. The top-left corner of all boxes is at (0, 0). Width and height of the boxes are calculated from base_size with the scale and ratio parameters
  2. For each point in the first input blob:
    • pins the anchor boxes to the image according to the second input blob, which contains four deltas for each box: for x and y of the center, for width, and for height
    • reads the box score from the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes the top pre_nms_topn proposals
  6. Calculates intersections between boxes and filters out all boxes with $intersection/union > nms_thresh$
  7. Takes the top post_nms_topn proposals
  8. Returns the top proposals
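
The following NumPy sketch illustrates steps 3 to 7 for a single image. It is an illustration only, not the plugin implementation; the helper name generate_proposals, the (x1, y1, x2, y2) box layout, and the greedy NMS loop are assumptions:

import numpy as np

def generate_proposals(boxes, scores, min_size, pre_nms_topn, post_nms_topn, nms_thresh):
    # boxes: [N, 4] in (x1, y1, x2, y2) format, already shifted by the predicted deltas
    # scores: [N] objectness scores taken from the first input blob
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    keep = (w >= min_size) & (h >= min_size)          # step 3: filter out small boxes
    boxes, scores = boxes[keep], scores[keep]

    order = np.argsort(-scores)[:pre_nms_topn]        # steps 4-5: sort by score, keep top pre_nms_topn
    boxes, scores = boxes[order], scores[order]

    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    selected = []
    for i in range(len(boxes)):                       # step 6: greedy non-maximum suppression
        suppressed = False
        for j in selected:
            ix1 = max(boxes[i, 0], boxes[j, 0])
            iy1 = max(boxes[i, 1], boxes[j, 1])
            ix2 = min(boxes[i, 2], boxes[j, 2])
            iy2 = min(boxes[i, 3], boxes[j, 3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = areas[i] + areas[j] - inter
            if union > 0 and inter / union > nms_thresh:
                suppressed = True
                break
        if not suppressed:
            selected.append(i)
        if len(selected) == post_nms_topn:            # step 7: keep top post_nms_topn proposals
            break
    return boxes[selected], scores[selected]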

Example

<layer ... type="Proposal" ... >
    <data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000"
     ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Resample Layer

Name: Resample

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: type
    • Description: type parameter specifies type of blob interpolation
    • Range of values:
      • LINEAR - linear blob interpolation
      • CUBIC - cubic blob interpolation
      • NEAREST - nearest-neighbor blob interpolation
  • Parameter name: antialias
    • Description: antialias is a flag that denotes whether to perform anti-aliasing
    • Range of values:
      • 0 - anti-aliasing is not performed
      • 1 - anti-aliasing is performed

Mathematical formulation

Resample layer scales the input blob. Depending on the type parameter, Resample applies different blob interpolation algorithms and performs anti-aliasing if the antialias parameter is specified.
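
As an illustration of the NEAREST mode only (the helper name resample_nearest, the NCHW layout, and the simple floor-based sampling grid are assumptions; the actual plugin may align samples differently):

import numpy as np

def resample_nearest(blob, out_h, out_w):
    # blob: [N, C, H, W]; scale spatial dimensions with nearest-neighbor sampling, no anti-aliasing
    n, c, in_h, in_w = blob.shape
    rows = np.arange(out_h) * in_h // out_h     # source row index for every output row
    cols = np.arange(out_w) * in_w // out_w     # source column index for every output column
    return blob[:, :, rows[:, None], cols[None, :]]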

Example

<layer type="Resample">
    <data antialias="0" factor="1.0" height="227" type="caffe.ResampleParameter.LINEAR" width="227"/>
    <input>
        ...
    </input>
    <output>
        ...
    </output>
</layer>

Frequently Asked Questions

If your question is not covered by the topics below, use the Intel® Distribution of OpenVINO™ Support page, where you can participate in a free forum.

1. What does the message "[ ERROR ]: Current caffe.proto does not contain field" mean?

Internally, the Model Optimizer uses a protobuf library to parse and load Caffe* models. This library requires a grammar file and a generated parser. For a Caffe fallback, the Model Optimizer uses a Caffe-generated parser for a Caffe-specific .proto file (which is usually located in the src/caffe/proto directory). So, if you have Caffe installed on your machine with the Python* interface available, make sure that this is exactly the version of Caffe that was used to create the model.

If you just want to experiment with the Model Optimizer and test a Python extension for working with your custom layers without building Caffe, add the layer description to the caffe.proto file and generate a parser for it.

For example, to add the description of the CustomReshape layer, which is an artificial layer not present in any caffe.proto files:

  1. Add the following lines to the caffe.proto file:
    package mo_caffe; // to avoid conflict with system Caffe* it is highly recommended to specify different package name
    ...
    message LayerParameter {
      // other layers parameters description
      ...
      optional CustomReshapeParameter custom_reshape_param = 546; // 546 - ID is any number not present in caffe.proto
    }
    // these lines to end of the file - describing contents of this parameter
    message CustomReshapeParameter {
      optional BlobShape shape = 1; // we just use the same parameter type as some other Caffe layers
    }
  2. Generate a new parser:
    cd <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto
    python3 generate_caffe_pb2.py --input_proto <PATH_TO_CUSTOM_CAFFE>/src/caffe/proto/caffe.proto

    where PATH_TO_CUSTOM_CAFFE is the path to the root directory of custom Caffe*.

  3. Now, the Model Optimizer is able to load the model into memory and start working with your extensions if there are any.

However, because your model has custom layers, you must register your custom layers as custom. To learn more about it, refer to the section Custom Layers in Model Optimizer.

2. How do I create a bare caffemodel, if I have only prototxt?

You need the Caffe* Python* interface. In this case, do the following:

python3
import caffe
net = caffe.Net('<PATH_TO_PROTOTXT>/my_net.prototxt', caffe.TEST)
net.save('<PATH_TO_PROTOTXT>/my_net.caffemodel')

3. What does the message "[ ERROR ]: Unable to create ports for node with id" mean?

Most likely, the Model Optimizer does not know how to infer output shapes of some layers in the given topology. To narrow the scope, compile the list of layers that are custom for the Model Optimizer: layers that are present in the topology but absent from the list of supported layers for the target framework: Caffe*, TensorFlow*, MXNet*. Then refer to available options in the corresponding section: Caffe* Models with Custom Layers, TensorFlow* Models with Custom Layers, MXNet* Models with Custom Layers.

4. What does the message "Input image of shape is larger than mean image from file" mean?

Your model input shapes must be smaller than or equal to the shapes of the mean image file you provide. The idea behind the mean file is to subtract its values from the input image in an element-wise manner. When the mean file is smaller than the input image, there are not enough values to perform element-wise subtraction. Also, make sure that you use the mean file that was used during the network training phase. Note that the mean file is dataset dependent.
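
A minimal NumPy sketch of the idea (the CHW layout, top-left cropping, and the helper name subtract_mean are assumptions for illustration only):

import numpy as np

def subtract_mean(image, mean_image):
    # image, mean_image: [C, H, W]; the mean image must be at least as large as the input
    channels, h, w = image.shape
    if mean_image.shape[1] < h or mean_image.shape[2] < w:
        raise ValueError('Input image is larger than the mean image')
    return image - mean_image[:, :h, :w]        # crop the mean image and subtract element-wise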

5. What does the message "Mean file is empty" mean?

Most likely, the mean file that you specified with the --mean_file flag while launching the Model Optimizer is empty. Make sure that this is exactly the required mean file and try to regenerate it from the given dataset if possible.

6. What does the message "Probably mean file has incorrect format" mean?

The mean file that you provide for the Model Optimizer must be in a .binaryproto format. You can try to check the content using recommendations from the BVLC Caffe* (#290).
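
For example, if the Caffe* Python* interface is available, you can inspect the mean file content like this (the path below is a placeholder):

import caffe
from caffe.proto import caffe_pb2

blob = caffe_pb2.BlobProto()
with open('<PATH_TO_MEAN_FILE>/mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)   # NumPy array; an empty or malformed file fails here
print(mean.shape)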

7. What does the message "Invalid proto file: there is neither 'layer' nor 'layers' top-level messages" mean?

The structure of any Caffe* topology is described in the caffe.proto file of any Caffe version. For example, in the Model Optimizer, you can find the following proto file, used by default: <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto. There you can find the structure:

message NetParameter {
  // ... some other parameters
  // The layers that make up the net.  Each of their configurations, including
  // connectivity and behavior, is specified as a LayerParameter.
  repeated LayerParameter layer = 100;  // ID 100 so layers are printed last.
  // DEPRECATED: use 'layer' instead.
  repeated V1LayerParameter layers = 2;
}

This means that any topology should contain layers as top-level structures in prototxt. For example, see the LeNet topology.

8. What does the message "Old-style inputs (via 'input_dims') are not supported. Please specify inputs via 'input_shape'" mean?

The structure of any Caffe* topology is described in the caffe.proto file for any Caffe version. For example, in the Model Optimizer, you can find the following .proto file, used by default: <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto. There you can find the structure:

message NetParameter {

 optional string name = 1; // consider giving the network a name
  // DEPRECATED. See InputParameter. The input blobs to the network.
  repeated string input = 3;
  // DEPRECATED. See InputParameter. The shape of the input blobs.
  repeated BlobShape input_shape = 8;
  // 4D input dimensions -- deprecated.  Use "input_shape" instead.
  // If specified, for each input blob there should be four
  // values specifying the num, channels, height and width of the input blob.
  // Thus, there should be a total of (4 * #input) numbers.
  repeated int32 input_dim = 4;
  // ... other parameters
}

So, the input layer of the provided model must be specified in one of the following styles:

  • input: "data"
    input_shape
    {
        dim: 1
        dim: 3
        dim: 227
        dim: 227
    }
  • input: "data"
    input_shape
    {
        dim: 1
        dim: 3
        dim: 600
        dim: 1000
    }
    input: "im_info"
    input_shape
    {
         dim: 1
         dim: 3
    }
  • layer
    {
        name: "data"
        type: "Input"
        top: "data"
        input_param {shape: {dim: 1 dim: 3 dim: 600 dim: 1000}}
    }
    layer
    {
        name: "im_info"
        type: "Input"
        top: "im_info"
        input_param {shape: {dim: 1 dim: 3}}
    }
  • input: "data"
    input_dim: 1
    input_dim: 3
    input_dim: 500

However, if your model contains more than one input, the Model Optimizer is able to convert the model with inputs specified in the first, the second, and the third forms from the list above. The last form is not supported for multi-input topologies.

9. What does the message "Mean file for topologies with multiple inputs is not supported" mean?

Model Optimizer does not support mean file processing for topologies with more than one input. In this case, you need to perform preprocessing of the inputs for a generated Intermediate Representation in the Inference Engine to perform subtraction for every input of your multi-input model.

10. What does the message "Cannot load or process mean file: value error" mean?

There are multiple reasons why the Model Optimizer does not accept the mean file. See FAQ #4, #5, and #6.

11. What does the message "Invalid prototxt file: value error" mean?

There are multiple reasons why the Model Optimizer does not accept a Caffe* topology. See FAQs #7 and #20.

12. What does the message "Error happened while constructing caffe.Net in the Caffe fallback function" mean?

Model Optimizer tried to infer a specified layer via the Caffe* framework, however it cannot construct a net using the Caffe Python* interface. Make sure that your .caffemodel and .prototxt files are correct. To make sure that the problem is not in the .prototxt file, see FAQ #2.

13. What does the message "Cannot infer shapes due to exception in Caffe" mean?

Model Optimizer tried to infer a custom layer via the Caffe* framework, but an error occurred, meaning that the model could not be inferred using Caffe. It might happen if you try to convert the model with some noise weights and biases, resulting in problems with layers with dynamic shapes. You should write your own extension for every custom layer your topology might have. For more details, refer to Extending the Model Optimizer with New Primitives.

14. What does the message "Cannot infer shape for node {} because there is no Caffe available. Please register python infer function for op or use Caffe for shape inference" mean?

Your model contains a custom layer and you have correctly registered it with the CustomLayersMapping.xml file. These steps are required to offload shape inference of the custom layer with the help of the system Caffe*. However, the Model Optimizer could not import a Caffe package. Make sure that you have built Caffe with a pycaffe target and added it to the PYTHONPATH environment variable. For more information, refer to Configuring the Model Optimizer. At the same time, it is highly recommended to avoid dependency on Caffe and write your own Model Optimizer extension for your custom layer. For more information, refer to FAQ #44.

15. What does the message "Framework name can not be deduced from the given options. Use --framework to choose one of Caffe, TensorFlow, MXNet" mean?

You have run the Model Optimizer without a flag --framework caffe|tf|mxnet. Model Optimizer tries to deduce the framework by the input model file extension (.pb for TensorFlow*, .caffemodel for Caffe*, .params for MXNet*). Your input model might have a different extension and you need to explicitly set the source framework. For example, use --framework caffe.

16. What does the message "Input shape is required to convert MXNet model. Please provide it with --input_shape" mean?

Input shape was not provided. That is mandatory for converting an MXNet* model to the Intermediate Representation, because MXNet models do not contain information about input shapes. Please, use the --input_shape flag to specify it. For more information about using the --input_shape, refer to the FAQ #56.

17. What does the message "Both --mean_file and mean_values are specified. Specify either mean file or mean values" mean?

--mean_file and --mean_values are two ways of specifying preprocessing for the input. However, they cannot be used together, as it would mean double subtraction and lead to ambiguity. Choose one of these options and pass it using the corresponding CLI option.

18. What does the message "Negative value specified for --mean_file_offsets option. Please specify positive integer values in format '(x,y)'" mean?

You might have specified negative values with --mean_file_offsets. Only positive integer values in format '(x,y)' must be used.

19. What does the message "Both --scale and --scale_values are defined. Specify either scale factor or scale values per input channels" mean?

--scale sets a scaling factor for all channels. --scale_values sets a scaling factor per each channel. Using both of them simultaneously produces ambiguity, so you must use only one of them. For more information, refer to the Using Framework-Agnostic Conversion Parameters section.

20. What does the message "Cannot find prototxt file: for Caffe please specify --input_proto - a protobuf file that stores topology and --input_model that stores pretrained weights" mean?

Model Optimizer cannot find a .prototxt file for a specified model. By default, it must be located in the same directory as the input model with the same name (except extension). If any of these conditions is not satisfied, use --input_proto to specify the path to the .prototxt file.

21. What does the message "Failed to create directory .. . Permission denied!" mean?

Model Optimizer cannot create a directory specified via --output_dir. Make sure that you have enough permissions to create the specified directory.

22. What does the message "Discovered data node without inputs and value" mean?

One of the layers in the specified topology might not have inputs or values. Please make sure that the provided .caffemodel and .prototxt files are correct.

23. What does the message "Part of the nodes was not translated to IE. Stopped" mean?

Some of the layers are not supported by the Model Optimizer and cannot be translated to an Intermediate Representation. You can extend the Model Optimizer by adding new primitives. For more information, refer to Extending the Model Optimizer with New Primitives page.

24. What does the message "While creating an edge from .. to .. : node name is undefined in the graph. Check correctness of the input model" mean?

Model Optimizer cannot build a graph based on a specified model. Most likely, it is incorrect.

25. What does the message "Node does not exist in the graph" mean?

You might have specified an output node via the --output flag that does not exist in a provided model. Make sure that the specified output is correct and this node exists in the current model.

26. What does the message "--input parameter was provided. Other inputs are needed for output computation. Provide more inputs or choose another place to cut the net" mean?

Most likely, the Model Optimizer tried to cut the model by a specified input. However, other inputs are needed.

27. What does the message "Placeholder node does not have an input port, but input port was provided" mean?

You might have specified a placeholder node with an input node, while the placeholder node does not have one in the model.

28. What does the message "Port index is out of number of available input ports for node" mean?

This error occurs when an incorrect input port is specified with the --input command line argument. When using --input, you can optionally specify an input port in the form: X:node_name, where X is an integer index of the input port starting from 0 and node_name is the name of a node in the model. This error occurs when the specified input port X is not in the range [0,(n-1)], where n is the number of input ports for the node. Please, specify a correct port index or do not use it if it is not needed.

29. What does the message "Node has more than 1 input and input shapes were provided. Try not to provide input shapes or specify input port with PORT:NODE notation, where PORT is an integer" mean?

This error occurs when an incorrect combination of the --input and --input_shape command line options is used. Using both --input and --input_shape is valid only if --input points to the Placeholder node, a node with one input port or --input has the form PORT:NODE, where PORT is an integer port index of input for node NODE. Otherwise, the combination of --input and --input_shape is incorrect.

30. Input port > 0 in --input is not supported if --input_shape is not provided. Node: NAME_OF_THE_NODE. Omitted port index and all input ports will be replaced by placeholders. Or provide --input_shape

When using the PORT:NODE notation for the --input command line argument and PORT > 0, you should specify --input_shape for this input. This is a limitation of the current Model Optimizer implementation.

31. What does the message "No or multiple placeholders in the model, but only one shape is provided, cannot set it" mean?

Looks like you have provided only one shape for the placeholder, however there are no or multiple inputs in the model. Please, make sure that you have provided correct data for placeholder nodes.

32. What does the message "The amount of input nodes for port is not equal to 1" mean?

This error occurs when the SubgraphMatch.single_input_node function is used for an input port that supplies more than one node in a sub-graph. The single_input_node function can be used only for ports that have a single consumer inside the matching sub-graph. When multiple nodes are connected to the port, use the input_nodes or node_by_pattern function instead of single_input_node. Please, refer to Sub-Graph Replacement in the Model Optimizer for more details.

33. What does the message "Output node for port has already been specified" mean?

This error occurs when the SubgraphMatch._add_output_node function is called manually from user's extension code. This is an internal function, and you should not call it directly.

34. What does the message "Unsupported match kind.... Match kinds "points" or "scope" are supported only" mean?

While using configuration file to implement a TensorFlow* front replacement extension, an incorrect match kind was used. Only points or scope match kinds are supported. Please, refer to Sub-Graph Replacement in the Model Optimizer for more details.

35. What does the message "Cannot write an event file for the TensorBoard to directory" mean?

Model Optimizer tried to write an event file in the specified directory but failed to do that. That could happen because the specified directory does not exist or you do not have enough permissions to write in it.

36. What does the message "There is no registered 'infer' function for node with op = .. . Please implement this function in the extensions" mean?

Most likely, you tried to extend Model Optimizer with a new primitive, but did not specify an infer function. For more information on extensions, see Extending the Model Optimizer with New Primitives.

37. What does the message "Stopped shape/value propagation at node" mean?

Model Optimizer cannot infer shapes or values for the specified node. It can happen because of a bug in the custom shape infer function, because the node inputs have incorrect values/shapes, or because the input shapes are incorrect.

38. What does the message "The input with shape .. does not have the batch dimension" mean?

Batch dimension is the first dimension in the shape and it should be equal to 1 or undefined. In your case, it is not equal to either 1 or undefined, which is why the -b shortcut produces undefined and unspecified behavior. To resolve the issue, specify full shapes for each input with the --input_shape option. Run Model Optimizer with the --help option to learn more about the notation for input shapes.

39. What does the message "Not all output shapes were inferred or fully defined for node" mean?

Most likely, the shape is not defined (partially or fully) for the specified node. You can use --input_shape with positive integers to override model input shapes.

40. What does the message "Shape for tensor is not defined. Can not proceed" mean?

This error occurs when the --input command line option is used to cut a model and --input_shape is not used to override shapes for a node and a shape for the node cannot be inferred by Model Optimizer. You need to help Model Optimizer and specify shapes with --input_shape for each node that is specified with the --input command line option.

41. What does the message "Module TensorFlow was not found. Please install TensorFlow 1.2 or higher" mean?

To convert TensorFlow* models with Model Optimizer, TensorFlow* 1.2 or newer must be installed. For more information on prerequisites, see Configuring the Model Optimizer.

42. What does the message "Cannot read the model file: it is incorrect TensorFlow model file or missing" mean?

The model file should contain a frozen TensorFlow* graph in the text or binary format. Make sure that --input_model_is_text is provided for a model in the text format. By default, a model is interpreted as a binary file.

43. What does the message "Cannot pre-process TensorFlow graph after reading from model file. File is corrupt or has unsupported format" mean?

Most likely, there is a problem with the specified model file. The file exists, but has bad formatting or is corrupted.

44. What does the message "Found custom layer. Model Optimizer does not support this layer. Please, register it in CustomLayersMapping.xml or implement extension" mean?

This means that the layer {layer_name} is not supported in the Model Optimizer. You can find a list of all unsupported layers in the corresponding section. You should add this layer to CustomLayersMapping.xml (Legacy Mode for Caffe* Custom Layers) or implement the extensions for this layer (Extending the Model Optimizer with New Primitives).

45. What does the message "Custom replacement configuration file does not exist" mean?

Path to the custom replacement configuration file was provided with the --tensorflow_use_custom_operations_config flag, but the file could not be found. Please, make sure that the specified path is correct and the file exists.

46. What does the message "Extractors collection have case insensitive duplicates" mean?

When extending the Model Optimizer with new primitives, keep in mind that their names are case-insensitive. Most likely, another operation with the same name is already defined. For more information, see Extending the Model Optimizer with New Primitives.

47. What does the message "Input model name is not in an expected format, cannot extract iteration number" mean?

Model Optimizer cannot load an MXNet* model in the specified file format. Please, use the .json or .params format.

48. What does the message "Cannot convert type of placeholder because not all of its outputs are 'Cast' to float operations" mean?

There are models where Placeholder has the UINT8 type and the first operation after it is 'Cast', which casts the input to FP32. Model Optimizer detected that the Placeholder has the UINT8 type, but the next operation is not 'Cast' to float. Model Optimizer does not support such a case. Please, change the model to have placeholder FP32 data type.
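
For example, in the original TensorFlow* 1.x model the fix usually means declaring the input as FP32 instead of UINT8 (the shape and tensor name below are only illustrative):

import tensorflow as tf

# Problematic pattern: a UINT8 placeholder whose consumers are not all Cast-to-float operations
# inputs = tf.placeholder(tf.uint8, shape=[1, 224, 224, 3], name='input')

# Supported: declare the placeholder with the FP32 data type right away
inputs = tf.placeholder(tf.float32, shape=[1, 224, 224, 3], name='input')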

49. What does the message "Data type is unsupported" mean?

Model Optimizer cannot convert the model to the specified data type. Currently, FP16 and FP32 are supported. Please, specify the data type with the --data_type flag. The available values are: FP16, FP32, half, float.

50. What does the message "No node with name ..." mean?

Model Optimizer tried to access a node that does not exist. This could happen if you have incorrectly specified placeholder, input or output node name.

51. What does the message "Module mxnet was not found. Please install MXNet 1.0.0" mean?

To convert MXNet* models with Model Optimizer, MXNet 1.0.0 must be installed. For more information about prerequisites, see Configuring the Model Optimizer.

52. What does the message "The following error happened while loading MXNet model .." mean?

Most likely, there is a problem with loading of the MXNet* model. Please, make sure that the specified path is correct, the model exists, it is not corrupted, and you have sufficient permissions to work with it.

53. What does the message "The following error happened while processing input shapes: .." mean?

Please, make sure that inputs are defined and have correct shapes. You can use --input_shape with positive integers to override model input shapes.

54. What does the message "Attempt to register of custom name for the second time as class. Note that custom names are case-insensitive" mean?

When extending the Model Optimizer with new primitives, keep in mind that their names are case-insensitive. Most likely, another operation with the same name is already defined. For more information, see Extending the Model Optimizer with New Primitives.

55. What does the message "Both --input_shape and --batch were provided. Please, provide only one of them" mean?

You cannot specify the batch and the input shape at the same time. You should specify a desired batch as the first value of the input shape.

56. What does the message "Input shape .. cannot be parsed" mean?

The specified input shape cannot be parsed. Please, define it in one of the following ways:

  • python3 mo.py --input_model <INPUT_MODEL>.caffemodel --input_shape (1,3,227,227)
  • python3 mo.py --input_model <INPUT_MODEL>.caffemodel --input_shape [1,3,227,227]
  • In case of multi input topology you should also specify inputs:
    python3 mo.py --input_model /path-to/your-model.caffemodel --input data,rois --input_shape (1,3,227,227),(1,6,1,1)

Keep in mind that there must be no spaces between or inside the brackets for input shapes.

57. What does the message "Please provide input layer names for input layer shapes" mean?

When specifying input shapes for several layers, you must provide names for inputs, whose shapes will be overwritten. For usage examples, see Converting a Caffe* Model. Additional information for --input_shape is in FAQ #56.

58. What does the message "Values cannot be parsed" mean?

Mean values for the given parameter cannot be parsed. It should be a string with a list of mean values. For example, in '(1,2,3)', 1 stands for the RED channel, 2 for the GREEN channel, 3 for the BLUE channel.

59. What does the message ".. channels are expected for given values" mean?

The number of channels and the number of given values for mean values do not match. The shape should be defined as '(R,G,B)' or '[R,G,B]'. The shape should not contain undefined dimensions (? or -1). The order of values is as follows: (value for a RED channel, value for a GREEN channel, value for a BLUE channel).

60. What does the message "You should specify input for each mean value" mean?

Most likely, you specified mean values with --mean_values without specifying the corresponding inputs. Please, specify inputs with the --input flag. For usage examples, please, refer to FAQ #62.

61. What does the message "You should specify input for each scale value" mean?

Most likely, you specified scale values with --scale_values without specifying the corresponding inputs. Please, specify inputs with the --input flag. For usage examples, please, refer to FAQ #63.

62. What does the message "Number of inputs and mean values does not match" mean?

The number of specified mean values and the number of inputs must be equal. Please, refer to Converting a Caffe* Model for a usage example.

63. What does the message "Number of inputs and scale values does not match" mean?

The number of specified scale values and the number of inputs must be equal. Please, refer to Converting a Caffe* Model for a usage example.

64. What does the message "No class registered for match kind ... Supported match kinds are .. " mean?

A replacement defined in the configuration file for sub-graph replacement using node names patterns or start/end nodes has the match_kind attribute. The attribute may have only one of the values: scope or points. If a different value is provided, this error is displayed.

65. What does the message "No instance(s) is(are) defined for the custom replacement" mean?

A replacement defined in the configuration file for sub-graph replacement using node name patterns or start/end nodes has the instances attribute. This attribute is mandatory; the error appears when it is missing. Refer to documentation with a description of the sub-graph replacement feature.

66. What does the message "The instance must be a single dictionary for the custom replacement with id .." mean?

A replacement defined in the configuration file for sub-graph replacement using start/end nodes has the instances attribute. For this type of replacement, the instance must be defined with a dictionary with two keys start_points and end_points. Values for these keys are lists with the start and end node names, respectively. Refer to documentation with a description of the sub-graph replacement feature.

67. What does the message "No instances are defined for replacement with id .. " mean?

A replacement for the specified id is not defined in the configuration file. Please, refer to FAQ #65 for more information.

68. What does the message "Custom replacements configuration file .. does not exist" mean?

Path to a custom replacement configuration file was provided with the --tensorflow_use_custom_operations_config flag, but it cannot be found. Please, make sure that the specified path is correct and the file exists.

69. What does the message "Failed to parse custom replacements configuration file .." mean?

The file for custom replacement configuration provided with the --tensorflow_use_custom_operations_config flag cannot be parsed. In particular, it should have a valid JSON structure. For more details, refer to JSON Schema Reference.

70. One of the custom replacements in the configuration file .. does not contain attribute id

Every custom replacement should declare a set of mandatory attributes and their values. For more details, refer to FAQ #72.

71. What does the message "File .. validation failed" mean?

The file for custom replacement configuration provided with the --tensorflow_use_custom_operations_config flag cannot pass validation. Make sure that you have specified id, instances and match_kind for all the patterns.

72. What does the message "Cannot update the file .. because it is broken" mean?

The custom replacement configuration file provided with the --tensorflow_custom_operations_config_update cannot be parsed. Please, make sure that the file is correct and refer to FAQs #68, #69, #70, and #71.

73. What does the message "End node .. is not reachable from start nodes: .." mean?

This error occurs when you try to make a sub-graph match. It was detected that some of the nodes specified as outputs of the sub-graph to find are not reachable from the specified input (start) nodes. Make sure that the sub-graph you want to match actually contains all the specified output nodes.

74. What does the message "Sub-graph contains network input node .." mean?

Start or end node for the sub-graph replacement using start/end nodes is specified incorrectly. Model Optimizer finds internal nodes of the sub-graph strictly "between" the start and end nodes. Then it adds all input nodes of these "internal" nodes (and inputs of their inputs, and so on) to the sub-graph. The error reports that the Model Optimizer reached a network input node during this phase. This means that the start/end points are specified incorrectly in the configuration file. Refer to documentation with a description of the sub-graph replacement feature.

75. What does the message "... elements of ... were clipped to infinity while converting a blob for node [...] to ..." mean?

This message may appear when the --data_type=FP16 command line option is used. This option implies conversion of all the blobs in the node to FP16. If a value in a blob is out of the range of valid FP16 values, the value is converted to positive or negative infinity. Depending on the model, it may lead to incorrect inference results or may not be a problem at all. The number of such elements and the total number of elements in the blob are printed out together with the name of the node where this blob is used.

76. What does the message "... elements of ... were clipped to zero while converting a blob for node [...] to ..." mean?

This message may appear when the --data_type=FP16 command line option is used. This option implies conversion of all blobs in the node to FP16. If a value in the blob is so close to zero that it cannot be represented as a valid FP16 value, it is converted to a true zero FP16 value. Depending on the model, it may lead to incorrect inference results or may not be a problem at all. The number of such elements and the total number of elements in the blob are printed out together with the name of the node where this blob is used.
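
A quick NumPy illustration of both effects mentioned in FAQ #75 and #76 when casting FP32 values to FP16:

import numpy as np

values_fp32 = np.array([1e5, 6e4, 1e-7, 1e-10], dtype=np.float32)
values_fp16 = values_fp32.astype(np.float16)
print(values_fp16)   # approximately [inf, 60000., 1.2e-07, 0.]: 1e5 is clipped to infinity, 1e-10 to zero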

77. What does the message "The amount of nodes matched pattern ... is not equal to 1" mean?

This error occurs when the SubgraphMatch.node_by_pattern function is used with a pattern that does not uniquely identify a single node in a sub-graph. Try to extend the pattern string to make an unambiguous match to a single sub-graph node. For more details, refer to Sub-graph Replacement in the Model Optimizer.

78. What does the message "The topology contains no "input" layers" mean?

Your Caffe* topology .prototxt file is intended for training. Model Optimizer expects a deployment-ready .prototxt file. To fix the problem, prepare a deployment-ready .prototxt file. Usually, preparation of a deploy-ready topology results in removing data layer(s), adding input layer(s), and removing loss layer(s).

79. What does the message "Warning: please expect that Model Optimizer conversion might be slow" mean?

You are using an unsupported Python* version. Use only versions 3.4 - 3.6 for the C++ protobuf implementation that is supplied with the Intel Distribution of OpenVINO toolkit. You can still boost conversion speed by building protobuf library from sources. For complete instructions about building protobuf from sources, see Building the protobuf Library on Windows* OS.

80. What does the message "Arguments --nd_prefix_name, --pretrained_model_name and --input_symbol should be provided. Please provide all or do not use any." mean?

This error occurs if you do not provide --nd_prefix_name, --pretrained_model_name, and --input_symbol parameters. Model Optimizer requires both .params and .nd model files to merge into the result file (.params). Topology description (.json file) should be prepared (merged) in advance and provided with --input_symbol parameter.

If you add additional layers and weights that are in .nd files to your model, the Model Optimizer can build a model from one .params file and two additional .nd files (*_args.nd, *_auxs.nd). To do that, provide all three CLI options, or do not pass any of them if you want to convert an MXNet model without additional weights. For more information, refer to Converting an MXNet* Model.

81. What does the message "You should specify input for mean/scale values" mean?

When the model has multiple inputs and you want to provide mean/scale values, you need to pass those values for each input. More specifically, the number of passed values must be the same as the number of inputs of the model. For more information, refer to Using Framework-Agnostic Conversion Parameters.

82. What does the message "Input with name ... not found!" mean?

When you passed the mean/scale values and specified names of input layers of the model, you might have used the name that does not correspond to any input layer. Make sure that by passing values with --input option, you list only names of the input layers of your model. For more information, refer to the Using Framework-Agnostic Conversion Parameters.

83. What does the message "Specified input json ... does not exist" mean?

Most likely, the .json file does not exist or has a name that does not match the MXNet notation. Make sure that the file exists and has a correct name. For more information, refer to Using the Model Optimizer to Convert MXNet* Models.

84. What does the message "Unsupported Input model file type ... Model Optimizer support only .params and .nd files format" mean?

Model Optimizer for MXNet supports only the .params and .nd file formats. Most likely, you specified an unsupported file format in --input_model. For more information, refer to Using the Model Optimizer to Convert MXNet* Models.

85. What does the message "Operation ... not supported. Please register it as custom op" mean?

Model Optimizer tried to load a model that contains some unsupported operations. If you want to convert a model that contains unsupported operations, you need to prepare an extension for each of them. For more information, refer to Extending the Model Optimizer with New Primitives.

86. What does the message "Can not register Op ... Please, call function 'register_caffe_python_extractor' with parameter 'name'" mean?

This error appears if the implementation class of an op for a Python Caffe layer cannot be used by the Model Optimizer. Python layers should be handled differently compared to ordinary Caffe layers.

In particular, you need to call the function register_caffe_python_extractor and pass name as the second argument of the function. The name should be the module name and the layer name joined by a dot.

For example, your topology contains this layer with type Python:

layer {
  name: 'proposal'
  type: 'Python'
  ...
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

First, implement an extension for this layer in the Model Optimizer as a subclass of the Op class.

class ProposalPythonExampleOp(Op):
       op = 'Proposal'

       def __init__(self, graph: nx.MultiDiGraph, attrs: dict):
           ...

It is mandatory to call two functions right after the implementation of that class:

class ProposalPythonExampleOp(Op):
      ...

register_caffe_python_extractor(ProposalPythonExampleOp, 'rpn.proposal_layer.ProposalLayer')
Op.excluded_classes.append(ProposalPythonExampleOp)

Note that the first call register_caffe_python_extractor(ProposalPythonExampleOp, 'rpn.proposal_layer.ProposalLayer') registers extension of the layer in the Model Optimizer that will be found by the specific name (mandatory to join module name and layer name): rpn.proposal_layer.ProposalLayer.

The second call prevents the Model Optimizer from using this extension as if it were an extension for a layer with type Proposal. Otherwise, this extension could be chosen as the implementation for such layers, which can lead to potential issues. For more information, refer to Extending the Model Optimizer with New Primitives.

87. What does the message "Model Optimizer is unable to calculate output shape of Memory node .." mean?

Model Optimizer supports only Memory layers, in which input_memory goes before ScaleShift or FullyConnected layer.
This error message means that in your model the layer after input memory is not of type ScaleShift or FullyConnected. This is a known limitation.

88. What do the messages "File ... does not appear to be a Kaldi file (magic number does not match)", "Kaldi model should start with <Nnet> tag" mean?

These error messages mean that the Model Optimizer does not support your Kaldi* model, because the checksum of the model is not 16896 (the model should start with this number) or the model file does not start with the <Nnet> tag. Double check that you provide a path to a true Kaldi model and try again.
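
As an informal sanity check, you can look at the first two bytes of the model file. Interpreting them as a little-endian integer and comparing with 16896 (the value from the message above) is an assumption of this sketch, and the file name is a placeholder:

import struct

with open('<PATH_TO_KALDI_MODEL>/model.nnet', 'rb') as f:
    magic = struct.unpack('<H', f.read(2))[0]   # first two bytes as a little-endian unsigned short
print(magic, magic == 16896)                    # 16896 is expected per the message above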

89. What do the messages "Expect counts file to be one-line file." or "Expect counts file to contain list of integers" mean?

These messages mean that the counts file you passed does not have the expected format. The counts file should be a single line that starts with [ and ends with ], with integer values separated by spaces between those brackets.
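
For example, a valid counts file is a single line such as [ 10 20 30 ]. A minimal parsing sketch under that assumption (the helper name read_counts is illustrative only):

def read_counts(path):
    # The counts file is expected to be a single line: [ int int int ... ]
    with open(path) as f:
        line = f.read().strip()
    if not (line.startswith('[') and line.endswith(']')):
        raise ValueError('Counts file must start with [ and end with ]')
    return [int(token) for token in line[1:-1].split()]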

90. What does the message "Model Optimizer is not able to read Kaldi model .." mean?

There are multiple reasons why the Model Optimizer does not accept a Kaldi topology: file is not available or does not exist. Refer to FAQ #88.

91. What does the message "Model Optimizer is not able to read counts file .." mean?

There are multiple reasons why the Model Optimizer does not accept a counts file: file is not available or does not exist. Refer to FAQ #89.

92. What does the message "For legacy MXNet models Model Optimizer does not support conversion of old MXNet models (trained with 1.0.0 version of MXNet and lower) with custom layers." mean?

This message means that the Model Optimizer does not support topologies with custom layers whose .json file was generated with an MXNet version lower than 1.0.0. If you want to convert such a model, you have to rebuild MXNet with the unsupported layers or generate a new .json file with MXNet version 1.0.0 or higher. You also need to implement an Inference Engine extension for the custom layers that are used. For more information, refer to the Extending the Model Optimizer with New Primitives section.

93. What does the message "Graph contains a cycle. Can not proceed .." mean?

Model Optimizer supports only straightforward models without cycles.

There are multiple ways to avoid cycles:

For Tensorflow:

For all frameworks:

  1. Replace the sub-graph that contains the cycle in the Model Optimizer
  2. Extend the Model Optimizer with new primitives for the sub-graph from the first step

or

  • Edit the network in the original framework to exclude the cycle.

94. What does the message "Can not transpose attribute '..' with value .. for node '..' .." mean?

This message means that the model is not supported. It may be caused by using shapes with more than four dimensions. There are two ways to avoid such a message:

  1. Cut off the part of the model containing such layers in the Model Optimizer
  2. Edit the network in the original framework to exclude such layers.

95. What does the message "Expected token </ParallelComponent>, has ..." mean?

This error message means that the Model Optimizer does not support your Kaldi model, because the net contains a ParallelComponent that does not end with the </ParallelComponent> tag. Double check that you provide a path to a true Kaldi model and try again.

96. What does the message "Interp layer shape inference function may be wrong, please, try to update layer shape inference function in the file (extensions/ops/interp.op at the line ...)." mean?

There are many flavors of the Caffe framework, and most layers are implemented identically across them, but there are exceptions. For example, the output value of the Interp layer is calculated differently in Deeplab-Caffe and in classic Caffe. So if your model contains the Interp layer and the conversion of your model failed, please modify the interp_infer function in the file extensions/ops/interp.op according to the comments in the file.

97. What does the message "Mean/scale values should ..." mean?

It means that your mean/scale values have a wrong format. Specify mean/scale values in the form layer_name(val1,val2,val3). You need to specify values for each input of the model. For more information, refer to Using Framework-Agnostic Conversion Parameters.

Known Issues

Multiple OpenMP Loadings

If the application uses the Inference Engine with third-party components that depend on OpenMP*, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This may happen, for example, if the application uses Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel MKL after loading an Inference Engine plugin. The error log looks as follows:

	OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Possible workarounds:

  • Preload the OpenMP runtime using the LD_PRELOAD variable:
    LD_PRELOAD=<path_to_libiomp5.so> <path_to your_executable>

    This eliminates multiple loadings of libiomp, and makes all the components use this specific version of OpenMP.

  • Set KMP_DUPLICATE_LIB_OK=TRUE. However, performance degradation or results incorrectness may occur in this case.

Old proto compiler breaks protobuf library

With the Python protobuf library version 3.5.1, the following incompatibility can happen. The known case is CentOS 7.4.

The error log looks as follows:

	File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: expected bytes, str found

A possible workaround is to upgrade the default protobuf compiler (libprotoc 2.5.0) to a newer version, for example, libprotoc 2.6.1.

 

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Pentium, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used with permission by Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.

 

For more complete information about compiler optimizations, see our Optimization Notice.