Pretrained Models

The Intel® Distribution of OpenVINO™ toolkit includes two sets of optimized models that can expedite development and improve image processing pipelines for Intel® processors. Use these models for development and production deployment without having to find or train your own.

Public Model Set

Download and incorporate some of the most popular models created by the open developer community using the included Model Downloader. Add these models directly to your environment and accelerate your development.

Find the downloader in this toolkit folder: \deployment_tools\tools\model_downloader.
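
As a rough illustration of how the downloader might be invoked from a script, the sketch below only builds the command line and does not run it. The script name downloader.py, the --name and --output_dir options, the install root, and the model name are assumptions not stated on this page; check the downloader's own help output before relying on them.

```python
from pathlib import PureWindowsPath

# Folder named in the text above, relative to the toolkit install root.
DOWNLOADER_DIR = PureWindowsPath("deployment_tools", "tools", "model_downloader")


def build_downloader_command(install_root, model_name, output_dir):
    """Return the argument list for one downloader run (nothing is executed).

    Assumed, not taken from this page: the script is called downloader.py
    and accepts --name and --output_dir.
    """
    script = PureWindowsPath(install_root) / DOWNLOADER_DIR / "downloader.py"
    return ["python", str(script), "--name", model_name, "--output_dir", output_dir]


# Hypothetical install root, model name, and output folder.
command = build_downloader_command(r"C:\toolkit", "example-model", r"C:\models")
print(" ".join(command))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be passed directly to subprocess.run once the options are confirmed.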


Free Model Set

Discover the capabilities of Intel® software and silicon with a fully functioning set of pretrained models. These models cover common vision use cases and reduce development time and cost. Documentation for each model includes links to public data.

For more details on the complete list of pretrained models included in the package, see Documentation.

Age & Gender Recognition

This neural network-based model provides age and gender estimates with enough accuracy to help you focus your marketing efforts.

Text Detection

This model is based on the PixelNet* architecture with MobileNetV2* as a backbone. It detects text in indoor and outdoor scenes.

Single-Image Super Resolution

Attention-Based Approach

Enhance the input image resolution by a factor of three or four with single-image super-resolution networks that are built on this approach. The two models are faster than the SRResNet-based networks and use less memory.
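
To make the scale factors concrete, here is a small sketch of the output size you would expect for each factor. The function name and the sample input sizes are illustrative only, and the sketch assumes the factor applies to both width and height.

```python
def upscaled_size(width, height, factor):
    """Return (width, height) after super resolution by an integer factor."""
    if factor not in (3, 4):
        raise ValueError("the text describes factors of three and four only")
    return width * factor, height * factor


print(upscaled_size(480, 270, 4))  # (1920, 1080)
print(upscaled_size(640, 360, 3))  # (1920, 1080)
```

Note that the pixel count grows with the square of the factor, so a factor of four produces sixteen times as many pixels as the input.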

Face Detection

Standard Model

Identify faces for a variety of uses, such as observing if passengers are in a vehicle or counting indoor pedestrian traffic. Combine it with a person detector to identify who is coming and going.

Enhanced Model

While similar to the standard model, this model performs better in a wider range of lighting conditions. The detector backbone is SqueezeNet light (half-channels) with a single-shot detector (SSD) for indoor and outdoor scenes shot by a front-facing camera.

Retail Environment Model

The model simultaneously detects heads and entire people and marks each with a different-colored bounding box. Its backbone is similar to MobileNetV2 and uses depth-wise convolutions to reduce the computation of a 3 x 3 convolution block.
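
The saving from depth-wise convolutions can be estimated with a little arithmetic. The sketch below compares multiply-accumulate counts for a standard 3 x 3 convolution and for a depth-wise 3 x 3 convolution followed by a 1 x 1 pointwise convolution, which is the usual MobileNet-style pairing. The pairing, the stride of 1, and the layer sizes are assumptions for illustration, not details of this model.

```python
def standard_conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a standard k x k convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a depth-wise k x k plus a pointwise 1 x 1 convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 56 x 56 feature map, 64 input and 128 output channels.
std = standard_conv_macs(56, 56, 64, 128)
sep = depthwise_separable_macs(56, 56, 64, 128)
print(f"cost ratio: {sep / std:.3f}")  # cost ratio: 0.119
```

The ratio simplifies to 1/c_out + 1/k², so for 3 x 3 kernels the separable form costs a little more than one ninth of the standard convolution.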

Head Position

This model shows the position of the head and provides guidance on what caught the subject's attention.

Note: This model does not capture the direction of the subject's gaze.

Facial Landmarks Detection

This is a custom architecture based on a convolutional neural network. It detects 35 facial landmarks that cover the eyes, nose, mouth, eyebrows, and facial contour.

Lightweight Facial Landmarks Detection

This lightweight regressor model identifies five facial landmarks: two eyes, a nose, and two lip corners. The model is best suited for smart classroom use cases.
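
A regressor of this kind typically returns landmark coordinates as a flat list of numbers. The sketch below shows one way to turn such a list into named pixel positions. The output layout (ten values as x, y pairs, normalized to the range 0 to 1) and the landmark order are assumptions; consult the model documentation for the actual format.

```python
# Landmark names from the text; the order is assumed for illustration.
LANDMARK_NAMES = ["left_eye", "right_eye", "nose", "left_lip_corner", "right_lip_corner"]


def landmarks_to_pixels(flat, width, height):
    """Map a flat [x0, y0, x1, y1, ...] list of normalized values to pixel points."""
    if len(flat) != 2 * len(LANDMARK_NAMES):
        raise ValueError("expected one (x, y) pair per landmark")
    points = {}
    for i, name in enumerate(LANDMARK_NAMES):
        x, y = flat[2 * i], flat[2 * i + 1]
        points[name] = (round(x * width), round(y * height))
    return points


# Hypothetical output for a 100 x 200 pixel face crop.
sample = [0.25, 0.25, 0.75, 0.25, 0.5, 0.5, 0.25, 0.75, 0.75, 0.75]
print(landmarks_to_pixels(sample, 100, 200))
```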

Face Reidentification

Use this lightweight network for face reidentification in smart classroom scenarios. For best results, use a frontally oriented and aligned input face.

Human Detection

Eye-Level Detection

View the number of people in a frame at any given time. This model is based on the hyper feature (R-FCN) backbone and performs best when the camera is approximately at eye level.

High-Angle Detection

Use this model for cameras mounted at higher vantage points to count the people in a frame.

Detect People, Vehicles, & Bikes

Distinguish between people, people riding bikes, bikes alone, and vehicles. The model handles a variety of lighting conditions, which improves accuracy in daylight, darkness, and changing weather.

Pedestrian Detection

Distinguish between people and objects in public using a network that is based on an SSD framework with a tuned MobileNetV1* as a feature extractor.

Pedestrian & Vehicle Detection

Identify people and vehicles by using a network that is based on an SSD framework with a tuned MobileNetV1 as a feature extractor.

Pedestrian Attributes

Identify key attributes of a person crossing the road: gender, hat, long sleeves, long pants, long hair, coat, and jacket.
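
Attribute models like this one usually report a score per attribute, and the caller decides which attributes count as present. The sketch below applies a fixed threshold to the seven attributes listed above. The score format, the attribute order, the 0.5 threshold, and the meaning of a high gender score are assumptions for illustration.

```python
# Attribute names from the text; the order is assumed for illustration.
ATTRIBUTES = ["gender", "hat", "long sleeves", "long pants", "long hair", "coat", "jacket"]


def decode_attributes(scores, threshold=0.5):
    """Return {attribute: bool}, marking each attribute whose score meets the threshold."""
    if len(scores) != len(ATTRIBUTES):
        raise ValueError("expected one score per attribute")
    return {name: score >= threshold for name, score in zip(ATTRIBUTES, scores)}


# Hypothetical scores for one detected person.
flags = decode_attributes([0.9, 0.1, 0.8, 0.7, 0.2, 0.4, 0.6])
print([name for name, present in flags.items() if present])
```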

Action Detection for a Smart Classroom

This model recognizes poses that include sitting, standing, and raising a hand. The action detector is based on the RMNet backbone with depthwise convolutions and is intended for smart classroom scenarios.

Human Pose Estimation

This multiperson, 2D pose estimation network is based on the OpenPose approach and uses a tuned MobileNetV1 to extract features. It detects a skeleton (which consists of keypoints and connections between them) to identify human poses for every person inside the image. The pose may contain up to 18 keypoints: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles.
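
The keypoint list above can be written out as a small data structure, which also confirms that it adds up to 18. In the sketch below, the naming scheme, the keypoint order, and the use of None for an undetected keypoint are assumptions, not the model's documented output format.

```python
# Keypoints named in the text: two unpaired, eight left/right pairs.
UNPAIRED = ["nose", "neck"]
PAIRED = ["ear", "eye", "shoulder", "elbow", "wrist", "hip", "knee", "ankle"]
KEYPOINTS = UNPAIRED + [f"{side}_{part}" for part in PAIRED for side in ("left", "right")]


def visible_keypoints(pose):
    """Return {name: (x, y)} for the keypoints that were detected.

    `pose` holds one entry per keypoint; None marks a keypoint that was not
    found, since a pose may contain fewer than 18 keypoints.
    """
    if len(pose) != len(KEYPOINTS):
        raise ValueError("expected one entry per keypoint")
    return {name: point for name, point in zip(KEYPOINTS, pose) if point is not None}


print(len(KEYPOINTS))  # 18

# Hypothetical pose in which only the nose and neck were detected.
partial_pose = [(120, 40), (120, 80)] + [None] * 16
print(visible_keypoints(partial_pose))
```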

Vehicle Feature Recognition

Vehicle Detection

Identify vehicles by applying an SSD framework that uses a tuned MobileNetV1 to extract features.

License Plate Detection: Front-Facing Camera

This MobileNetV2 and SSD-based vehicle and license plate detector recognizes Chinese license plates from a front-facing camera. This model is useful for security barriers that require front license plate detection.

Note: This model replaces the previous version and runs faster while maintaining the same accuracy.

Vehicle Metadata

Conduct an initial analysis and return key metadata for faster sorting and searching later. The model's average color accuracy is over 82 percent for red, white, black, green, yellow, gray, and blue. Its average vehicle-type accuracy is over 87 percent for cars, vans, trucks, and buses.
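
To show how color and type metadata might be read from a classifier, the sketch below picks the highest-scoring label from two score lists. The label names come from the text above, but the output format (one score per class) and the class order are assumptions; the real model may order its classes differently.

```python
# Class names from the text; the order is assumed for illustration.
COLORS = ["red", "white", "black", "green", "yellow", "gray", "blue"]
TYPES = ["car", "van", "truck", "bus"]


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_vehicle(color_scores, type_scores):
    """Return (color, vehicle_type) for the highest-scoring class in each list."""
    if len(color_scores) != len(COLORS) or len(type_scores) != len(TYPES):
        raise ValueError("expected one score per class")
    return COLORS[argmax(color_scores)], TYPES[argmax(type_scores)]


# Hypothetical scores for one detected vehicle.
metadata = decode_vehicle([0.1, 0.6, 0.05, 0.05, 0.1, 0.05, 0.05], [0.2, 0.1, 0.6, 0.1])
print(metadata)  # ('white', 'truck')
```

Storing the decoded labels alongside each detection is what makes later sorting and searching cheap, because queries can filter on the labels without rerunning the model.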

Advanced Roadside Identification

Classify objects as roads, sidewalks, buildings, walls, fences, poles, traffic lights, traffic signs, vegetation, terrain, sky, people, passengers, cars, trucks, buses, trains, motorcycles, bicycles, or electric vehicles.

For target hardware and samples that support the pretrained models, see Supported Samples.