Release Notes for Intel® Distribution of OpenVINO™ Toolkit 2021.4 LTS

By Andrey Zaytsev

Published: 06/28/2021


The Intel® Distribution of OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that solve a variety of tasks including emulation of human vision, automatic speech recognition, natural language processing, recommendation systems, and many others. Based on the latest generations of artificial neural networks, including Convolutional Neural Networks (CNNs), recurrent and attention-based networks, the toolkit extends computer vision and non-vision workloads across Intel® hardware, maximizing performance. It accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.

The Intel® Distribution of OpenVINO™ toolkit:

  • Enables deep learning inference from the edge to cloud.
  • Supports heterogeneous execution across Intel accelerators, using a common API for the Intel® CPU, Intel® GPU, Intel® Gaussian & Neural Accelerator, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
  • Speeds time-to-market through an easy-to-use library of CV functions and pre-optimized kernels.
  • Includes optimized calls for CV standards, including OpenCV* and OpenCL™.

New and Changed in Release 2021.4 LTS

Major Features and Improvements

  • This new 2021.4 Long-Term Support (LTS) release provides bug fixes, longer-term maintenance, and support with a focus on stability and compatibility, enabling developers to deploy applications powered by the Intel® Distribution of OpenVINO™ toolkit with confidence. A new LTS version is released every year and supported for two years. For developers who prefer the very latest features and leading performance, standard releases will continue to be made available 3-4 times a year. To read more about long-term support and maintenance, see the Long Term Support Policy.
  • New Jupyter Notebooks, demos and support for additional public models to make development easier:
    • Ready-to-run Jupyter Notebooks with tutorials for converting TensorFlow and PyTorch models, image classification, segmentation, depth estimation, post-training quantization and more.
    • Audio Noise Suppression & Time Series Forecasting demos
    • Public Models: RCAN and IseeBetter (image super-resolution), Attention OCR (image text prediction), Tacotron 2 (text-to-speech) and ModNet (portrait/image matting)
  • Time-to-first-inference latency performance enhancements: Initialization has been optimized on CPU and integrated GPU (iGPU), significantly improving performance at inferencing startup. Setting up inferencing always involves additional initialization time as the network is loaded and configured on the device, especially on GPUs due to their architecture. This setup time has been reduced significantly for many networks by doing more initialization work in parallel among other optimizations.

  • Preview of OpenVINO™ integration with TensorFlow: Although not a part of the 2021.4 LTS release, a new open source component called the OpenVINO™ integration with TensorFlow is available as a public preview. This component is designed for TensorFlow developers newly exploring OpenVINO™ toolkit to try it with minimal code changes, maximizing TensorFlow API compatibility. For highest performance, lowest memory footprint and complete hardware control, adopting native OpenVINO APIs continues to be the recommended approach.

Support Change and Deprecation Notices

  • The following deprecated Inference Engine APIs will be removed in 2022.1:

    • ExecutableNetwork::QueryState - use InferRequest::QueryState instead
    • IVariableState / IMemoryState interface - use VariableState wrapper instead
    • VariableState::GetLastState - use VariableState::GetState instead
    • Helper functions working with UNICODE symbols: fileNameToString, stringToFileName
    • InferenceEngine::Parameter creation from ngraph::Variant, casting to ngraph::Variant shared pointer, InferenceEngine::Parameter::asVariant method.
    • ngraph::Node::get_output_tensor_name(), ngraph::Node::get_input_tensor_name(), ngraph::description::Tensor::get_name(), ngraph::description::Tensor::set_name() - Use ngraph::description::Tensor::get_names() instead.
    • ngraph::runtime::Tensor::get_name(), ngraph::runtime::Tensor::get_scale(), ngraph::runtime::Tensor::set_scale(), ngraph::runtime::Tensor::wait_for_read_ready(), ngraph::runtime::Tensor::wait_for_write_ready() - these methods will be removed without new analogues.
    • The following deprecated Inference Engine Python APIs will be removed:
      • IENetwork:
        • constructor for reading networks from a file/buffer.
        • property inputs - use input_info instead
      • InferRequest:
        • property inputs - use input_blobs instead
        • property outputs - use output_blobs instead
      • ExecutableNetwork:
        • property inputs - use input_info instead
  • FPGA deprecation notice:
    • Deprecation begins: June 29, 2021
    • Removal date: October 2021
  • Intel® is transitioning to the next-generation programmable deep learning solution, which will be called Intel® FPGA AI Suite and will support OpenVINO™ toolkit when productized.
  • As part of this transition, 2020.3.2 LTS was the final release to include support for Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. Support for FPGA offered in 2020.3.2 LTS is now coming to an end.
  • Direct any customer inquiries regarding Intel® FPGA AI Suite to your Intel Programmable Solutions Group account manager, or subscribe to get notified of the latest updates.

  • 2020.3.2 LTS (Final Release to Include FPGA Support):
    • Intel® Vision Accelerator Design with an Intel® Arria® 10
    • The Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA
  • 2021.4 LTS Release (June 29, 2021): OpenVINO™ toolkit will not support FPGAs independent of a future Intel® FPGA AI Suite product
  • October 2021: OpenVINO™ toolkit will not support FPGAs independent of a future Intel® FPGA AI Suite product

Model Optimizer

  • Common changes:
    • Aligned requirements files with other OpenVINO tools to eliminate any conflicts
    • Fixed tensor names propagation to output nodes in TensorFlow*, Kaldi*, Caffe*, and MXNet*, which means that the CNNNetwork::getOVNameForTensor() API now works properly for all frameworks
    • Added support of FP16 models with shape subgraphs in FP32 precision. This change enables reshape-ability for models with shape sizes greater than 65504 (the maximum finite FP16 value)
    • Added support for operation Gelu-7 with additional attribute “approximation_mode”
    • Added the --transform key to allow additional transformation execution inside Model Optimizer. Currently, only the LowLatency2 transformation is available.
  • ONNX*:
    • Added support for the following operations:
      • Size – 1, 11
      • QuantizeLinear-13
      • DequantizeLinear-13
  • TensorFlow*:
    • Added support for the following operations:
      • Roll
      • GatherV2 operation with nonzero ‘batch_dims’ attribute
      • Einsum-7 with equations without ellipsis and repeated labels (for diagonal extraction)
      • FFT
      • iFFT
    • Extended support for TensorFlow* 1 while loops to support a body in which the current iteration variable is used by the general computation, in addition to slicing and concatenation
  • MXNet*:
    • Added support for the following operations:
      • Roll
      • FFT
      • iFFT
      • Einsum-7 with equations without ellipsis and repeated labels (for diagonal extraction)

  • Kaldi*:
    • Added support for TimeHeightConvolutionComponent
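
The FP16 limit mentioned under "Common changes" can be checked with a short sketch that uses only the Python standard library. It makes no claims about Model Optimizer internals; the helper names `to_fp16` and `fits_fp16` are hypothetical and serve only to show why shape values are safer in FP32.

```python
import struct


def to_fp16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    # The struct "e" format is binary16. Packing raises OverflowError
    # when the value is too large for a finite FP16 number.
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_fp16(value: float) -> bool:
    """Return True if the value can be stored as a finite FP16 number."""
    try:
        to_fp16(value)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    print(fits_fp16(65504.0))   # largest finite FP16 value
    print(fits_fp16(100000.0))  # a large dimension size, outside the FP16 range
    print(to_fp16(4097.0))      # in range, but not exactly representable
```

Even values that fit in the FP16 range can lose precision (4097 is stored as 4096), which is a second reason to keep shape arithmetic in FP32.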

Inference Engine

  • Model cache feature:
    • For devices supporting the Import/Export API (GNA, MYRIAD), an automatic model cache feature is implemented. When enabled, loading a model does the following:
      • On the very first application run, a full network compilation is performed, and the compiled network is exported to disk.
      • On the second and subsequent runs, loading skips network compilation and takes the compiled model from the model cache.
      • GPU: Enabling the model caching feature enables caching of GPU/CLDNN kernels, which also significantly speeds up network loading on GPU.
    • An API for loading a network directly from a model file is introduced to simplify application code. When the model cache feature is enabled, this is faster than explicitly reading the model from IR and then loading it from the cache.
  • Heads Up: Transition to Intel® oneAPI oneTBB is on the way.
  • Common changes:
    • Added new metrics to QueryAPI:
      • DEVICE_TYPE 
    • Added I4/U4 Precisions
  • Deprecated API:

    • IInferRequest interface is deprecated, use InferRequest wrapper:
      • Constructor for InferRequest from IInferRequest::Ptr is deprecated
      • Cast operator for InferRequest to IInferRequest shared pointer is deprecated
    • ICNNNetwork interface is deprecated by means of deprecation of all its methods, use CNNNetwork wrapper
    • CNNNetwork methods working with ICNNNetwork are deprecated:
      • Cast to ICNNNetwork shared pointer
      • Cast to reference to ICNNNetwork interface
      • Constructor from ICNNNetwork shared pointer
    • IExecutableNetwork is deprecated, use ExecutableNetwork wrappers:
      • Constructor of ExecutableNetwork from IExecutableNetwork shared pointer is deprecated
    • The following ExecutableNetwork methods are deprecated:
      • ExecutableNetwork::reset
      • Cast operator to IExecutableNetwork shared pointer
      • ExecutableNetwork::CreateInferRequestPtr - use ExecutableNetwork::CreateInferRequest instead
    • Version::ApiVersion structure is deprecated, Inference Engine does not have API version anymore
    • LowLatency - use LowLatency2 instead
    • CONFIG_KEY(DUMP_EXEC_GRAPH_AS_DOT) - use InferenceEngine::ExecutableNetwork::GetExecGraphInfo::serialize() instead
    • Core::ImportNetwork with no device - pass device name explicitly.
    • details::InferenceEngineException - use InferenceEngine::Exception and its derivatives instead.
    • InferenceEngine::make_so_pointer which is used to create Extensions library is replaced by std::make_shared<Extension>(..)
    • InferenceEngine::IExtension::Release is deprecated with no replacement
    • Use the IE_DEFINE_EXTENSION_CREATE_FUNCTION helper macro instead of an explicit declaration of the CreateExtension function, which creates an extension.
    • GPU plugin configuration options:
      • KEY_CLDNN_NV12_TWO_INPUTS GPU plugin option. Use KEY_GPU_NV12_TWO_INPUTS instead
      • KEY_CLDNN_MEM_POOL GPU plugin option
      • KEY_CLDNN_GRAPH_DUMPS_DIR GPU plugin option
      • KEY_CLDNN_SOURCES_DUMPS_DIR GPU plugin option
      • KEY_DUMP_KERNELS GPU plugin option
      • KEY_TUNING_MODE GPU plugin option
      • KEY_TUNING_FILE GPU plugin option
  • Inference Engine Python API:
    • IECore
      • def load_network(network: str, str device_name, config=None, int num_requests=1) - reads a network from a model file and loads it to a device with one line of code. When the model cache feature is enabled, this is faster than a separate read and load because it skips unnecessary reading of the network
    • Added VariableState API:
      • InferRequest:
        • def query_state() - gets the state control interface for the given infer request. Returns a vector of VariableState objects.
      • VariableState:
        • def reset() - resets the internal variable state for the relevant infer request to the default value specified for the corresponding ReadValue node.
        • property state - get/set the value of the variable state.
        • property name - a string representing a state name.
  • CPU plugin:
    • The plugin was migrated to ngraph::function as the input graph representation instead of the legacy one. This removed legacy API usage inside the plugin and opened up new opportunities for further improvements of first inference latency and memory consumption characteristics
    • Introduced automatic fallback to ngraph reference implementations (evaluate() method) for operations that are not directly supported by the plugin. This feature may simplify the OV extensibility mechanism for new operations
    • Added support for new operations:
      • DFT-7
      • iDFT-7
      • Roll-7
      • Gather-7
      • GELU-7
    • Implemented several improvements for int8 inference pipeline including:
      • Int8 ConvolutionBackpropData (Deconvolution) layer support
      • Performance optimizations for mixed precision (FP32 + INT8) models by decreasing the number of layout permutations
    • Significantly improved performance of BF16 inference pipeline for LSTM based models
    • Added performance optimizations for operations: ShuffleChannels, DepthToSpace/SpaceToDepth, ExtractImagePatches, Mish, Concat, Transpose
    • Provided performance improvements for first inference latency on client platforms
    • Decreased memory consumption in latency scenario on client platforms
    • Please see a note on the Intel® oneAPI oneTBB in the Inference Engine section.
  • GPU plugin:
    • Most of the GPU plugin options were moved from cldnn/cldnn_config.hpp header to gpu/gpu_config.hpp
    • Added new GPU specific metrics:
    • Added new GPU plugin config options:
    • Added new possible value for OPTIMIZATION_CAPABILITIES metric: HW_MATMUL
    • Enabled parallel compilation of OCL kernels which improves first ever inference latency. Number of used threads can be controlled by KEY_GPU_MAX_NUM_THREADS plugin option
    • Added new fusion for eltwise pattern from yolo-v5 (non-linear eltwises chain in the middle of fused sequence)
    • Added support for the following new operations:
      • TensorIterator-0 (must be explicitly enabled by setting KEY_GPU_ENABLE_LOOP_UNROLLING plugin option to NO)
      • ScatterNDUpdate-3
      • GatherND-5
      • Gather-7
    • Performance improvements for:
      • permute primitive
      • reorder between some blocked ↔ planar layouts
  • MYRIAD plugin:
    • Performance fixes
      • Fixed most of the sporadic performance drops down to 10% by prioritizing runtime threads
    • Accuracy fixes
      • Fixed accuracy and operability of the Yolo-v3 model loaded through the ONNX importer
      • Fixed Yolo-v5 accuracy
    • Operations
      • Fixed the Average Pooling operation when padding is present
      • Added support for the Interpolate operation with batch
    • MyriadX plugin
      • Enabled cache support by adding IMPORT_EXPORT_SUPPORT to the list of supported metrics
      • Added option to disable preprocessing check inside the model
      • Fixed an issue when trying to run several MyriadX plugins in two or more threads
      • Fixed an issue with importing a precompiled blob with batch
      • Removed the DDR memory limit in the graph compiler; the limit is now checked directly on the device
      • Fixed an issue with loading a model when the MyriadX plugin is used with the HETERO plugin
      • Fixed a memory leak in case there is a failure while booting a device
      • Fixed a memory leak in case a device fails to boot on the first attempt but boots successfully later
      • Fixed batch detection/removal logic in MYX plugin
    • Firmware
      • Fixed an issue when cache eviction would corrupt the read transfer payload
      • Fixed an issue with allocating memory for hardware operations when a device has more than 512 MB of DDR memory
      • Fixed a hang when transferring an input tensor with single size
    • The myriad_perfcheck tool is deprecated and will be removed in 2022.1. Please use the benchmark_app tool instead
    • The myriad_compile tool is deprecated and will be removed in 2022.1. Please use compile_tool instead
  • HDDL plugin:
    • The same performance, accuracy, operations, and firmware fixes as in MyriadX plugin
    • The hddl_perfcheck tool is deprecated and will be removed in 2022.1. Please use the benchmark_app tool instead
    • The myriad_compile tool is deprecated and will be removed in 2022.1. Please use compile_tool instead
    • Added SCALAR layout handling 
    • Aligned functionality with IE API
  • GNA plugin:
    • Introduced support for Fake Quantize layers, which enables GNA support for models processed by the Post-Training Optimization Tool
    • Extended support for Convolution and Eltwise operations which require splitting into several operations to be supported by GNA (to satisfy buffers size requirements)
    • Added support for MatMul operations which are mapped to GNA operations with the batch size greater than 8
    • Fixed bugs with LSTM cell unrolling
    • Fixed export/import of networks which require saving of inputs and outputs shape/layout information
    • Fixed concatenation of the layer with itself
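
The model cache behavior described at the start of this section (compile and export on the first run, import on later runs) can be sketched in plain Python. This is not the Inference Engine implementation: `load_with_cache`, the `compile_fn` callback, the `.blob` file naming, and the SHA-256 cache key are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Callable, Tuple


def load_with_cache(
    model_bytes: bytes,
    device: str,
    cache_dir: str,
    compile_fn: Callable[[bytes, str], bytes],
) -> Tuple[bytes, bool]:
    """Return (compiled_blob, cache_hit) for a model on a device."""
    # Hypothetical cache key: device name plus model content.
    key = hashlib.sha256(device.encode() + b"\0" + model_bytes).hexdigest()
    blob_path = os.path.join(cache_dir, key + ".blob")

    # Second and later runs: import the compiled blob, skip compilation.
    if os.path.exists(blob_path):
        with open(blob_path, "rb") as blob_file:
            return blob_file.read(), True

    # First run: compile, then export the result to the cache directory.
    blob = compile_fn(model_bytes, device)
    os.makedirs(cache_dir, exist_ok=True)
    with open(blob_path, "wb") as blob_file:
        blob_file.write(blob)
    return blob, False


if __name__ == "__main__":
    # Stand-in for an expensive device-specific compilation step.
    def fake_compile(model: bytes, device: str) -> bytes:
        return device.encode() + b":" + model

    with tempfile.TemporaryDirectory() as cache:
        print(load_with_cache(b"model", "GPU", cache, fake_compile)[1])  # first run
        print(load_with_cache(b"model", "GPU", cache, fake_compile)[1])  # cached run
```

Keying the cache on both the device and the model content means that changing either one triggers a fresh compilation, which is the property a real cache also needs.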


nGraph

  • Introduced opset7. The latest opset contains the new operations listed below. Not all OpenVINO™ toolkit plugins support the operations.
    • DFT-7
    • Einsum-7
    • Gather-7
    • Gelu-7
    • IDFT-7
    • Roll-7
  • Added support for i4/u4 element types.
  • Implemented public nGraph transformations:

    • LowLatency2
      LowLatency2 is a new version of the LowLatency transformation that completely replaces the previous one. The transformation inserts Assign/ReadValue operations and connects them to TensorIterator/Loop operations, making step-by-step inference possible. The new version adds support for TensorIterator/Loop with multiple iterations and resolves serialization issues after applying the transformation. Available for CPU and GNA plugins.

  • Public nGraph API changes
    • Several new constructors have been added for ngraph::function class:
      • Constructors that take a list of Results/Sinks and automatically detect existing Parameters and Variables in the function.
      • Constructors that additionally take a list of Variables.
    • add_variables, remove_variable, get_variables, get_variable_by_id methods have been added.
    • The class VariableContext has been introduced to work with the memory mechanism in nGraph. It supports functionality to initialize, update, and reset State. 'evaluate' methods have been updated with an EvaluationContext argument.
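
As an illustration of the Gelu-7 operation and its “approximation_mode” attribute, the sketch below compares the two formulas commonly used for GELU. It assumes that the attribute selects between the exact erf-based form and the tanh-based approximation. The string values "erf" and "tanh" and the function name `gelu` are illustrative and not taken from the nGraph API.

```python
import math


def gelu(x: float, approximation_mode: str = "erf") -> float:
    """Scalar GELU in either the exact (erf) or approximate (tanh) form."""
    if approximation_mode == "erf":
        # Exact definition: x * Phi(x), with Phi the standard normal CDF.
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    if approximation_mode == "tanh":
        # Widely used tanh-based approximation of the same curve.
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    raise ValueError("unknown approximation_mode: " + approximation_mode)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        exact = gelu(value, "erf")
        approx = gelu(value, "tanh")
        print(value, exact, approx, abs(exact - approx))
```

The two modes agree closely over typical activation ranges, so the choice is mostly about matching the framework that produced the original model.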

Post-Training Optimization Tool (POT)

  • Introduced the quantization support for GNA via POT SW API. Added the Python speech sample which demonstrates the quantization of models from the Kaldi framework for GNA accelerator. 
  • Telemetry support in POT. If the install process records an "accept" to send the telemetry data, the following information is sent by POT: user interface (CLI or API), target device, compression method, drop type and maximal drop used for the AccuracyAware algorithm, accuracy drop and the number of reverted layers after the quantization, the subset size used for collecting statistics, engine type, value of tune hyperparameters and model type parameters.
  • Added INT8 support for the following operations:
    • Deconvolution/ConvolutionBackpropData
    • ConvertLike
  • Improved the description of INT8 quantization path in OpenVINO documentation
  • Extended models coverage: +40 INT8 models enabled

Neural Networks Compression Framework (NNCF)

  • NNCF for PyTorch v1.7.0 and v1.7.1 released:
    • Integration with OTE for Instance Segmentation, Custom Object Detection and Horizontal Text Detection cases.
    • Added 7-bit quantization for weights to avoid the saturation issue on non-VNNI CPU
    • Support for pruning of models with FCOS detection heads and instance normalization operations
    • Added a mean percentile initializer for the quantization algorithm
    • Added Adjust Padding feature to support accurate execution of INT4 for VPU
    • Removed the pattern-based quantizer setup mode for the quantization algorithm 
    • Support for PyTorch 1.8.1
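
The motivation for 7-bit weights can be illustrated with integer arithmetic. The sketch assumes that the saturation issue comes from summing two unsigned 8-bit by signed 8-bit products into a saturating signed 16-bit intermediate, as multiply-add instructions on CPUs without VNNI do. The release notes do not spell this out, and the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def pair_dot_int16(activations, weights):
    """Return (exact, saturated) for a two-element u8 x s8 dot product."""
    exact = activations[0] * weights[0] + activations[1] * weights[1]
    return exact, saturate_int16(exact)


if __name__ == "__main__":
    worst_case_activations = (255, 255)  # largest unsigned 8-bit values

    # 8-bit weights (range -128..127): the intermediate sum overflows int16.
    print(pair_dot_int16(worst_case_activations, (127, 127)))

    # 7-bit weights (range -64..63): the same worst case stays in range.
    print(pair_dot_int16(worst_case_activations, (63, 63)))
```

Under this assumption, halving the weight range guarantees that the pairwise sum always fits in 16 bits, at the cost of one bit of weight resolution.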

Deep Learning Workbench

  • Support for TGL iGPU - full support including INT8 Calibration for TGL GPU
  • Streamlined support for non-annotated datasets
    • Support for import of user's own images (replacing autogeneration of dataset with random noise) - individually or by directory with images
    • Augmentation options (horizontal/vertical flip, random pixel clipping, noise injection, contrast/brightness modification) allow increasing the size of a non-annotated dataset based on the user's images
    • Users can run INT8 (Default) quantization with a non-annotated dataset and thus measure the roofline increase in performance even without a regular dataset
    • User can visualize inference results over images from non-annotated dataset
  • OpenVINO JupyterLab extension. Users can record actions in the DL WB UI and get the corresponding CLI commands to use in a JupyterLab notebook (or, for example, directly in the CLI). This is available for all major steps (MO conversion, Model Downloading, Profiling, Accuracy Measurement, INT8 calibration).
  • Decreased DL Workbench image size from 3 GB to 1.1 GB. However, some additional target framework installation steps now run on the first import of a model from a supported framework.
  • Simplified downloading and starting procedure 
  • Extended MO conversion options: pipeline configs, support of grayscale images
  • Extended Yolo support (yolo3/4 family)
  • UI/UX improvements


OpenCV*

  • Updated version to 4.5.3.
  • Added support of dynamically loaded UI backends (prebuilt GTK plugin included).
  • Added support of clDNN OpenCL kernels cache parameter.


Samples

  • Added the new Python sample (speech_sample), which demonstrates how to do synchronous inference of an acoustic model based on Kaldi* neural networks and speech feature vectors. The sample works with Kaldi ARK or NumPy* uncompressed NPZ files.
  • Added support for NumPy* uncompressed NPZ files to the C++ speech sample.
  • Improved the C++/C/Python samples README with information on API and feature coverage.

Open Model Zoo

Extended the Open Model Zoo with additional CNN-pretrained models and pre-generated Intermediate Representations (.xml + .bin). 

Replacing 2021.2 models:

  • machine-translation-nar-en-de-0002
  • machine-translation-nar-de-en-0002
  • text-recognition-0014
  • text-spotting-0005-detector
  • text-spotting-0005-recognizer-decoder
  • text-spotting-0005-recognizer-encoder


  • machine-translation-nar-en-de-0001
  • machine-translation-nar-de-en-0001
  • text-recognition-0013
  • text-spotting-0004-detector
  • text-spotting-0004-recognizer-decoder
  • text-spotting-0004-recognizer-encoder


  • common-sign-language-0002
  • face-reidentification-0095
  • noise-suppression-poconetlike-0001
  • text-recognition-0015-encoder
  • text-recognition-0015-decoder
  • text-to-speech-en-multi-0001-duration-prediction
  • text-to-speech-en-multi-0001-generation
  • text-to-speech-en-multi-0001-regression
  • time-series-forecasting-electricity-0001

The list of public models was extended with support for the following models:

| Model Name | Task | Framework |
| bert-base-ner | named entity recognition | PyTorch |
| f3net | salient object detection | PyTorch |
| face-recognition-resnet100-arcface-onnx | face reidentification | MXNet |
| hrnet-w32-human-pose-estimation | human pose estimation | PyTorch |
| nfnet-f0 | image classification | PyTorch |
| pspnet-pytorch | semantic segmentation | PyTorch |
| quartznet-15x5-en | speech recognition | PyTorch |
| repvgg-a0 | image classification | PyTorch |
| repvgg-b1 | image classification | PyTorch |
| repvgg-b3 | image classification | PyTorch |
| retinaface-resnet50-pytorch | object detection | PyTorch |
| text-recognition-resnet-fc | text recognition | PyTorch |
| ultra-lightweight-face-detection-rfb-320 | object detection | PyTorch |
| ultra-lightweight-face-detection-slim-320 | object detection | PyTorch |
| yolo-v4-tiny-tf | object detection | TensorFlow |


The following models were removed from the list of public models:

| Model Name | Task | Framework |
| face-recognition-mobilefacenet-arcface | face reidentification | MXNet |
| face-recognition-resnet100-arcface | face reidentification | MXNet |
| face-recognition-resnet34-arcface | face reidentification | MXNet |
| face-recognition-resnet50-arcface | face reidentification | MXNet |
| retinaface-anti-cov | object detection | MXNet |
| retinaface-resnet50 | object detection | MXNet |


Added new demo applications:

  • Python face_recognition_demo (restored with updated model license)
  • C++ G-API gaze_estimation_demo
  • C++ image_processing_demo (combines deblurring and image super-resolution cases)
  • Python noise_suppression_demo (removes background noise from the speech)
  • C++ social_distance_demo
  • Python speech_recognition_deepspeech_demo (renamed from speech_recognition_demo and extended with on-line processing mode, to reduce latency)
  • Python speech_recognition_quartznet_demo
  • Python time_series_forecasting_demo

object_detection_demo was extended with support for new models, including Yolo-V4-Tiny.

Open Model Zoo tools:

OMZ tools now send telemetry statistics when this option is accepted during installation.
Model Downloader, Converter, and Accuracy Checker are now available in the openvino-dev distribution on PyPI.

Support for new datasets and tasks was added into Accuracy Checker

Deep Learning Streamer

  • GStreamer version updated to 1.18 bringing bug fixes and new features in GStreamer media and IO plugins
  • Added support for human pose estimation models. This feature will enable 3D Athlete Tracking (3DAT) SDK built on top of DL Streamer. Added new sample `gst_launch/human_pose_estimation` with pipeline example
  • Fixed an issue with multi-model pipelines (for example, gvadetect and gvaclassify) on Intel® NCS2 (Neural Compute Stick 2)
  • [Preview] Added support for action recognition models. Added sample `gst_launch/action_recognition` with pipeline example
  • [Preview] New property ‘device’ in gvatrack (object tracking) and gvawatermark (overlay) elements allows selecting CPU or GPU as the target device
  • Operating system deprecation notice: DL Streamer will drop support for CentOS and will introduce support for Red Hat Enterprise Linux (RHEL) 8 starting with release 2022.1 (October 2021).

OpenVINO™ Model Server

  • Binary input data - ability to send inference requests using data in a compressed format like jpeg or png – significantly reducing communication bandwidth. There is a noticeable performance improvement, especially with the REST API prediction calls and image data. For more details, see the documentation.
  • Dynamic batch size without model reloading – it is now possible to run inference with arbitrary batch sizes using input demultiplexing and splitting execution into parallel streams. This feature enables inference execution with OpenVINO Inference Engine without the side effect of changing the batch size for sequential requests and reloading models at runtime. For more details, see the documentation.
  • Practical examples of custom nodes – new or updated custom nodes: model zoo object detection, Optical Character Recognition and image transformation. These custom nodes can be used in a range of applications like vehicle object detection combined with recognition or OCR pipelines. Learn more about DAG Scheduler and custom nodes in the documentation.
  • Change model input and output layouts at runtime – it is now possible to change the model layout at runtime to NHWC. Source images are typically in HWC layout and such layout is used in image transformation libraries. Using the same layout in the model simplifies linking custom nodes with image transformations and avoids data transposing. It also reduces the load on clients and the overall latency for inference requests. Learn more.
  • OpenVINO Toolkit Operator for OpenShift

    The OpenVINO Toolkit Operator for OpenShift 0.2.0 is included in the 2021.4 release. It has been renamed and has the following enhancements compared to previous OpenVINO Model Server Operator 0.1.0 released with 2021.3:

    • The Custom Resource for managing the instances of OpenVINO Model Server is renamed from Ovms to ModelServer.

    • ModelServer resources can now manage additional parameters: annotations, batch_size, shape, model_version_policy, file_system_poll_wait_seconds, stateful, node_selector, and layout. For a list of all parameters, see the documentation.

    • The new Operator integrates OpenVINO Toolkit with OpenShift Data Science, a managed service for data scientists and AI developers offered by Red Hat. The Operator automatically builds a Notebook image in OpenShift which integrates OpenVINO developer tools and tutorials with the JupyterHub spawner.

    • Operator 0.2.0 is currently available for OpenShift only. Updates to the Kubernetes Operator will be included in a future release.
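
The idea behind "dynamic batch size without model reloading" (demultiplex a batch into single-item requests, run them on parallel streams, then gather the results) can be sketched with the Python standard library. This is not how OpenVINO Model Server implements the feature. `infer_batch`, the `infer_one` callback, and the `streams` parameter are hypothetical, and the model is replaced by a stub.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def infer_batch(
    batch: Sequence[T],
    infer_one: Callable[[T], R],
    streams: int = 4,
) -> List[R]:
    """Run a batch of any size through a model fixed at batch size 1."""
    if not batch:
        return []
    # Demultiplex: each item becomes its own request. The pool plays the
    # role of parallel execution streams; map() gathers results in order.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(infer_one, batch))


if __name__ == "__main__":
    # Stub standing in for a single-item inference call.
    def fake_model(item: int) -> int:
        return item * item

    print(infer_batch([1, 2, 3], fake_model))
    print(infer_batch(list(range(10)), fake_model, streams=2))
```

Because the model itself always sees batch size 1, requests with different batch sizes can follow one another without reloading or reshaping the model.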

New Distributions

  • Containers:
    • A RHEL 8 runtime Docker image is now available in the Red Hat* Ecosystem Catalog container registry with CPU and GPU plugin support.
      • Includes Inference Engine and OpenCV.
      • Supports CPU and GPU devices.
    • Added special images with the `_tgl` tag to natively support inference on 11th Generation Intel® Core™ Processor Family for Internet of Things (IoT) Applications (formerly codenamed Tiger Lake) from an OpenVINO Docker container.

Preview Features / Support Terminology

A preview feature is a functionality that is being introduced to gain early feedback from developers. You are encouraged to submit your comments, questions, and suggestions related to preview features to the forum.

Known Issues

| # | Jira ID | Description | Component | Workaround |
| | #1 | A number of issues were not addressed yet, see the Known Issues section in the Release Notes for Intel® Distribution of OpenVINO™ toolkit v.2020 | All | N/A |
| 1 | 21670 | FC layers with bimodal weights distribution are not quantized accurately by the Intel® GNA Plugin when 8-bit quantization is specified. Weights with values near to zero are set to zero. | IE GNA plugin | For now, use 16-bit weights in these cases. |
| 2 | 25358 | Some performance degradations are possible in the GPU plugin on GT3e/GT4e/ICL NUC platforms. | IE GPU Plugin | N/A |
| 3 | 24709 | Retrained TensorFlow Object Detection API RFCN model has significant accuracy degradation. Only the pretrained model produces correct inference results. | All | Use Faster-RCNN models instead of an RFCN model if retraining of a model is required. |
| 4 | 24101 | Performance and memory consumption may be bad if layers are not 64-bytes aligned. | IE GNA plugin | Try to avoid the layers which are not 64-bytes aligned to make a model GNA-friendly. |
| 5 | 35367 | [IE][TF2] Several models failed on the last tensor check with FP32. | IE MKL-DNN Plugin | |
| 6 | 34087 | [clDNN] Performance degradation on several models due to upgrade of the OpenCL driver | | |
| 7 | 33132 | [IE CLDNN] Accuracy and last-tensor checks regressions for FP32 models on ICLU GPU | IE clDNN Plugin | |
| 8 | 25358 | [clDNN] Performance degradation on NUC and ICE_LAKE targets on R4 | IE clDNN Plugin | N/A |
| 9 | 39136 | Calling LoadNetwork after a failed reshape throws an exception | IE NG integration | |
| 10 | 42203 | Customers from China may experience some issues with downloading content from the new storage due to the China firewall | OMZ | Please use a branch with links to old storage |
| 11 | 24757 | The heterogeneous mode does not work for GNA | IE GNA Plugin | Split the model to run unsupported layers on CPU |
| 12 | 57261 | For the models created by the Post-Training Optimization Tool in the "performance" mode, the accuracy may not be satisfactory on GNA | IE GNA Plugin | Use the "accuracy" mode instead |

Included in This Release

The Intel® Distribution of OpenVINO™ toolkit is available in these versions:

  • OpenVINO™ toolkit for Windows*
  • OpenVINO™ toolkit for Linux*
  • OpenVINO™ toolkit for macOS*
Component License Location Windows Linux macOS

Deep Learning Model Optimizer

Model optimization tool for your trained models

Apache 2.0 <install_root>/deployment_tools/model_optimizer/* YES YES YES

Deep Learning Inference Engine

Unified API to integrate the inference with application logic

Inference Engine Headers

Intel(R) OpenVINO(TM) Distribution License



Apache 2.0






OpenCV* library

OpenCV Community version compiled for Intel® hardware

Apache 2.0 <install_root>/opencv/* YES YES YES

Intel® Media SDK libraries (open source version)

Eases the integration between the OpenVINO™ toolkit and the Intel® Media SDK.

MIT <install_root>/../mediasdk/* NO YES NO

OpenVINO™ toolkit documentation

Developer guides and other documentation

Apache 2.0 Available from the OpenVINO™ toolkit product site, not part of the installer packages. NO NO NO

Open Model Zoo

Documentation for models from the Intel® Open Model Zoo. Use the Model Downloader to download models in a binary format.

Apache 2.0 <install_root>/deployment_tools/open_model_zoo/* YES YES YES

Inference Engine Samples

Samples that illustrate Inference Engine API usage and demos that demonstrate how you can use features of Intel® Distribution of OpenVINO™ toolkit in your application

Apache 2.0 <install_root>/deployment_tools/inference_engine/samples/* YES YES YES

Deep Learning Workbench

Enables you to run deep learning models through the OpenVINO™ Model Optimizer, convert models into INT8, fine-tune them, run inference, and measure accuracy.

Intel(R) OpenVINO(TM) Distribution License

Starting with the Intel® Distribution of OpenVINO™ toolkit 2021.3 release, DL Workbench is available only as a prebuilt Docker image. A reference to DL Workbench is kept in the OpenVINO installation, but it now pulls the prebuilt image from Docker Hub instead of building it from the package.


Post-Training Optimization Toolkit

Designed to convert a model into a more hardware-friendly representation by applying specific methods that do not require retraining, for example, post-training quantization.

Intel(R) OpenVINO(TM) Distribution License <install_root>/deployment_tools/tools/post_training_optimization_toolkit/* YES YES YES

Speech Libraries and End-to-End Speech Demos


GNA Software License Agreement <install_root>/data_processing/audio/speech_recognition/* YES YES NO

DL Streamer

End User License Agreement for the Intel(R) Software Development Products <install_root>/data_processing/dl_streamer/* NO YES NO
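
The locations in the component table can be checked against an existing installation. The following is a minimal sketch, assuming the relative paths listed above for components marked YES on all three platforms. The function name `missing_components` is hypothetical, and the caller must supply the actual `<install_root>`, which is not specified here.

```python
from pathlib import Path

# Relative locations copied from the component table
# (only entries marked YES on Windows, Linux, and macOS).
COMPONENT_DIRS = {
    "Model Optimizer": "deployment_tools/model_optimizer",
    "OpenCV": "opencv",
    "Open Model Zoo": "deployment_tools/open_model_zoo",
    "Inference Engine Samples": "deployment_tools/inference_engine/samples",
    "Post-Training Optimization Toolkit":
        "deployment_tools/tools/post_training_optimization_toolkit",
}


def missing_components(install_root) -> list:
    """Return the names of components whose directory is absent under install_root."""
    root = Path(install_root)
    return sorted(
        name for name, rel in COMPONENT_DIRS.items() if not (root / rel).is_dir()
    )
```

An empty result means every listed directory exists. It does not confirm that the components are complete or correctly configured.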


Where to Download This Release

System Requirements

Disclaimer: Certain hardware (including but not limited to GPU and GNA) requires installation of specific drivers to work correctly. Drivers might require updates to your operating system, including the Linux kernel; refer to their documentation for details. Operating system updates should be handled by the user and are not part of the OpenVINO installation.

Intel® CPU Processors


  • Intel® Atom* processor with Intel® SSE4.2 support
  • Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics
  • 6th - 11th generation Intel® Core™ processors
  • Intel® Xeon® Scalable Processors (formerly Skylake)
  • 2nd Generation Intel® Xeon® Scalable Processors (formerly Skylake and Cascade Lake)
  • 3rd Generation Intel® Xeon® Scalable Processors (formerly Cooper Lake and Ice Lake)
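
On Linux, the SSE4.2 requirement listed for Intel Atom processors can be checked by looking for the `sse4_2` flag in `/proc/cpuinfo`. The sketch below is a minimal illustration. The function name `has_sse42` is hypothetical, and the check applies only to systems that expose `/proc/cpuinfo`.

```python
def has_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            print("SSE4.2 supported:", has_sse42(f.read()))
    except OSError:
        # Windows and macOS do not provide /proc/cpuinfo.
        print("/proc/cpuinfo is not available on this system")
```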

Operating Systems:

  • Ubuntu* 18.04 long-term support (LTS), 64-bit
  • Ubuntu* 20.04 long-term support (LTS), 64-bit - preview support
  • Windows* 10, 64-bit
  • macOS* 10.15, 64-bit
  • CentOS* 7, 64-bit
  • For deployment scenarios on Red Hat* Enterprise Linux* 8.2 (64 bit), you can use the Intel® Distribution of OpenVINO™ toolkit run-time package that includes the Inference Engine core libraries, nGraph, OpenCV, Python bindings, CPU and GPU plugins. The package is available as:

Intel® Processor Graphics


  • Intel® HD Graphics
  • Intel® UHD Graphics
  • Intel® Iris® Xe Graphics
  • Intel® Iris® Xe Max Graphics 
  • Intel® Iris® Pro Graphics

Operating Systems:

  • Ubuntu* 18.04 long-term support (LTS), 64-bit
  • Windows* 10, 64-bit
  • Yocto* 3.0, 64-bit
  • For deployment scenarios on Red Hat* Enterprise Linux* 8.2 (64 bit), you can use the Intel® Distribution of OpenVINO™ toolkit run-time package that includes the Inference Engine core libraries, nGraph, OpenCV, Python bindings, CPU and GPU plugins. The package is available as:

NOTE: This installation requires drivers that are not included in the Intel Distribution of OpenVINO toolkit package.

NOTE: A chipset that supports processor graphics is required for Intel® Xeon® processors. Processor graphics are not included in all processors. See Product Specifications for information about your processor.

Intel® Gaussian & Neural Accelerator (Intel® GNA)

Operating Systems:

  • Ubuntu* 18.04 long-term support (LTS), 64-bit
  • Windows* 10, 64-bit

Intel® VPU Processors

Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPU)

Operating Systems:

  • Ubuntu* 18.04 long-term support (LTS), 64-bit (Linux Kernel 5.2 and below)
  • Windows* 10, 64-bit
  • CentOS* 7.6, 64-bit
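
The Ubuntu entry above limits support to Linux kernel 5.2 and below. A quick way to compare a kernel release string against that limit is sketched here. The function name `kernel_at_most` is hypothetical, and only the major and minor numbers are compared, which is an assumption about how the limit should be read.

```python
import platform
import re


def kernel_at_most(release: str, max_version=(5, 2)) -> bool:
    """Return True if a kernel release string is at or below max_version (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) <= tuple(max_version)


if __name__ == "__main__":
    # platform.release() returns the kernel release only on Linux.
    if platform.system() == "Linux":
        release = platform.release()
        print(release, "within limit:", kernel_at_most(release))
```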

Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2

Operating Systems:

  • Ubuntu* 18.04 long-term support (LTS), 64-bit
  • CentOS* 7.6, 64-bit
  • Windows* 10, 64-bit
  • Raspbian* (target only)

AI Edge Computing Board with Intel® Movidius™ Myriad™ X C0 VPU, MYDX x 1

Operating Systems:

  • Windows* 10, 64-bit

Components Used in Validation

Operating systems used in validation:

DL frameworks used for validation:

  • TensorFlow 1.15.2, 2.2.0 (limited support according to product features)
  • MXNet 1.5.1

NOTE: The version of CMake specified above is required to build OpenVINO from source. Building samples and demos from the Intel® Distribution of OpenVINO™ toolkit package requires CMake* 3.10 or higher (except on Windows, where CMake 3.14 is required because it is the first version that supports Visual Studio 2019).
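
The CMake requirement for building samples and demos in the note above can be expressed as a small check on the output of `cmake --version`. This is a sketch only: the function names are hypothetical, and the `system` argument is assumed to be a value such as the one returned by Python's `platform.system()` ("Windows", "Linux", "Darwin").

```python
import re


def min_cmake_version(system: str) -> tuple:
    """Minimum CMake (major, minor) for building samples and demos, per the note."""
    return (3, 14) if system == "Windows" else (3, 10)


def cmake_is_sufficient(version_output: str, system: str) -> bool:
    """Check `cmake --version` output against the minimum for the given system."""
    match = re.search(r"cmake version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a CMake version in the given text")
    found = (int(match.group(1)), int(match.group(2)))
    return found >= min_cmake_version(system)
```

The version text can be obtained with `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout` and passed in together with `platform.system()`.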

Legal Information

Performance varies by use, configuration and other factors. Learn more at

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

OpenVINO™ Logo

To build equity around the project, the OpenVINO logo was created for both Intel and community usage. The logo may only be used to represent the OpenVINO toolkit and offerings built using the OpenVINO toolkit.

Logo Usage Guidelines

The OpenVINO logo must be used in connection with truthful, non-misleading references to the OpenVINO toolkit, and for no other purpose. Modification of the logo or use of any separate element(s) of the logo alone is not allowed.


