Tutorial: Using Inference to Accelerate Computer Vision Applications

Introduction

This tutorial walks you through the basics of using the Deep Learning Deployment Toolkit's Inference Engine (included in the Intel® Computer Vision SDK Beta R3). Here, inference is the process of using a trained neural network to infer meaning from data (e.g., images). In the code sample that follows, a video is fed frame by frame to the Inference Engine, which runs our trained neural network on each frame and outputs a result (the objects detected and classified in that frame). Inference can be done using various neural network architectures (AlexNet*, GoogleNet*, etc.). This example uses a Single Shot MultiBox Detector (SSD) built on a GoogleNet model. For an example of how SSD is used, see this article on the Intel® Developer Zone.

The Inference Engine requires that the model be converted to Intermediate Representation (IR) files. This tutorial walks you through taking an existing model (GoogleNet) and converting it to IR files using the Model Optimizer.

By the end of this tutorial, you will see inference in action on a video, detecting multiple objects such as people or cars. Here's an example of what you might see on this sample image:

So What's Different About Running a Neural Network on the Inference Engine?

  • The Inference Engine optimizes inference, allowing a user to run deep learning deployments significantly faster on Intel® architecture. For more information on performance on Intel® Processor Graphics, see this article.
  • Inference can run on hardware other than the CPU, such as the built-in Intel® GPU or an Intel® FPGA accelerator card.

How Does the Inference Engine Work?

The Inference Engine takes a representation of a neural network model and optimizes it to take advantage of advanced Intel® instruction sets in the CPU. It also makes the model compatible with the other hardware accelerators (GPU and FPGA). To do this, the model files (.caffemodel, .prototxt) are given to the Model Optimizer, which processes them and outputs two new files: a .bin file and an .xml file. These two files are used instead of the original model files when you run your application. In this example, the .bin and .xml files are provided.

In the above diagram, IR stands for Intermediate Representation, which is just a name for the .xml and .bin files that are inputs to the Inference Engine.
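Because the application loads the IR pair rather than the original Caffe files, it can help to confirm that both files are present before starting inference. The following is a minimal sketch, not part of the SDK: the helper names `ir_weights_path` and `ir_pair_exists` are hypothetical, and the code assumes the .bin file sits next to the .xml file with the same base name. It uses only the C++17 standard library.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: given the path to an IR .xml file, return the path
// where the matching .bin file is ASSUMED to live (same folder, same base name).
inline fs::path ir_weights_path(const fs::path& xml_path) {
    fs::path bin_path = xml_path;
    bin_path.replace_extension(".bin");
    return bin_path;
}

// Hypothetical helper: true only if xml_path ends in ".xml" and both the
// .xml file and its assumed .bin companion exist as regular files.
inline bool ir_pair_exists(const fs::path& xml_path) {
    if (xml_path.extension() != ".xml") {
        return false;  // e.g. a .caffemodel or .prototxt was passed by mistake
    }
    return fs::is_regular_file(xml_path) &&
           fs::is_regular_file(ir_weights_path(xml_path));
}
```

An application could call `ir_pair_exists` on the model path it receives and print a clear error if the check fails, instead of failing later while loading the network.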

What You’ll Learn

  • How to install the OpenCL™ Runtime Package
  • How to install the Intel® Computer Vision SDK Beta R3
  • How to generate the .bin and .xml (IR files) needed for the Inference Engine from a Caffe model
  • How to run the Inference Engine using the generated IR files in a C++ application
  • How to compare the performance of CPU vs. GPU

Gather Your Materials

  • A 5th generation or later Intel® Core™ processor. You can find the product name in Linux* by running the ‘lscpu’ command; the ‘Model name:’ field contains the information about the processor.

Note: The generation number is embedded in the product name, right after the ‘i3’, ‘i5’, or ‘i7’. For example, the Intel® Core™ i5-5200U processor and the Intel® Core™ i5-5675R processor are both 5th generation, and the Intel® Core™ i5-6600K processor and the Intel® Core™ i5-6360U processor are both 6th generation.

  • Ubuntu* 16.04.3 LTS
  • In order to run inference on the integrated GPU:
    • A processor with Intel® Iris® Pro graphics or HD Graphics
    • No discrete graphics card installed (required by the OpenCL™ platform). If you have one, make sure to disable it in BIOS before going through this installation process.
    • No drivers for other GPUs installed, or libraries built with support for other GPUs
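
The generation rule from the note on product names can be sketched in code. This is an illustrative helper only: the function name `core_generation` is hypothetical, and it assumes the four-digit model numbers used in the examples (such as i5-5200U), where the first digit after ‘i3’, ‘i5’, or ‘i7’ is the generation. Names that do not match that pattern return -1.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper: returns the generation encoded in a model name such as
// "Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz", or -1 if no match is found.
// ASSUMPTION: four-digit model numbers, as in the examples in the note.
inline int core_generation(const std::string& model_name) {
    for (const char* family : {"i3", "i5", "i7"}) {
        std::size_t pos = model_name.find(family);
        if (pos == std::string::npos) {
            continue;  // this family marker is not in the name
        }
        pos += 2;  // move past "i3" / "i5" / "i7"

        // Skip the separator between the family and the model number.
        while (pos < model_name.size() &&
               (model_name[pos] == '-' || model_name[pos] == ' ')) {
            ++pos;
        }

        // Measure the run of digits that follows.
        std::size_t end = pos;
        while (end < model_name.size() &&
               std::isdigit(static_cast<unsigned char>(model_name[end]))) {
            ++end;
        }

        // Only four-digit model numbers are handled by this sketch.
        if (end - pos == 4) {
            return model_name[pos] - '0';
        }
    }
    return -1;
}
```

Passing the ‘Model name:’ value reported by ‘lscpu’ to this function and checking that the result is 5 or greater mirrors the processor requirement listed above.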

Continue on GitHub.
