The Intel® Integrated Native Developer Experience (Intel® INDE) is a cross-architecture productivity suite that provides developers with tools, support, and IDE integration to create high-performance C++/Java* applications for Windows* on Intel® architecture and Android* on ARM* and Intel® architecture.
The new OpenCV beta, a feature of Intel INDE, is compatible with the new open source OpenCV 3.0 beta (Open Source Computer Vision Library: http://opencv.org). It provides free binaries for developing and deploying computer vision applications, for uses such as enhanced photography, augmented reality, video summarization, and more.
Key features of Intel® INDE OpenCV are:
- Compatibility with OpenCV 3.0.
- Pre-built and validated binaries, cleared of IP-protected building blocks.
- Easy to use and to maintain with IDE integration for both Windows and Android development.
- Optimized for Intel® platforms with heterogeneous computing.
This document focuses on performance. Refer to Getting Started with Intel® INDE OpenCV for the full list of Intel INDE OpenCV features.
While the OpenCV 3.0 Transparent API (described in the “OpenCV 3.0 Architecture Guide for Intel INDE OpenCV” document) creates an opportunity for GPU computing, the free subset of Intel IPP provides a powerful implementation of OpenCV functions for Intel CPUs. Very few libraries offer GPU acceleration paired with an efficient CPU fallback in a way that is transparent to the user. This document describes both the original (open source) OpenCV improvements and the performance features that are unique to the Intel INDE OpenCV version.
Introducing OpenCV 3.0
OpenCV 3.0 is a new iteration of the de facto standard library for vision and image processing. Since its Alpha version, it has introduced important changes to the OpenCV architecture. Directly from the changelog:
- The new technology is nicknamed "Transparent API" and, in brief, is an extension of classical OpenCV functions, such as cv::resize(), to use OpenCL underneath. For more details, see the T-API description.
- Performance of OpenCL-accelerated code on Intel Iris Graphics and Intel Iris Pro Graphics has been improved by 10%-230%
- On x86 and x64 platforms OpenCV binaries include and use a subset of Intel® Integrated Performance Primitives (Intel® IPP) by default. OpenCV 3.0 beta includes a subset of Intel® IPP 8.2.1 with additional optimization for AVX2.
Intel INDE OpenCV is based exactly on the OpenCV 3.0 Beta community sources and contains a preview of additional Intel-specific optimizations and features (detailed below) that are not yet part of the public “stock” OpenCV version.
Note that the official “Beta” status of the current OpenCV release implies that performance may still change before the final OpenCV 3.0 (“Gold”) release.
Intel INDE OpenCV Performance
This document relies on the results from the OpenCV performance tests hosted on GitHub. These tests measure performance across multiple variables, including the OpenCV function, image size, border handling scheme, and function-specific parameters like filter size. In recent versions of the test suite, this adds up to almost 3,000 distinct tests that cover optimized functions. To make sense of these tests, this document reports each function's speedup as the geometric mean across all tests that cover that function.
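The aggregation described above can be sketched in a few lines of C++. This is a minimal illustration, not the code used by the OpenCV test suite: the function names `speedup` and `geometricMean` are hypothetical, and the per-test speedup is assumed to be the baseline time divided by the optimized time.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Speedup of a single test: baseline time divided by optimized time.
// Values above 1.0 mean the optimized build is faster.
// (Hypothetical helper, assumed definition of "speedup".)
double speedup(double baselineMs, double optimizedMs) {
    if (baselineMs <= 0.0 || optimizedMs <= 0.0)
        throw std::invalid_argument("timings must be positive");
    return baselineMs / optimizedMs;
}

// Geometric mean of the per-test speedups for one function.
// The sum of logarithms is used instead of a running product so that
// thousands of values cannot overflow or underflow a double.
double geometricMean(const std::vector<double>& speedups) {
    if (speedups.empty())
        throw std::invalid_argument("need at least one speedup");
    double logSum = 0.0;
    for (double s : speedups) {
        if (s <= 0.0)
            throw std::invalid_argument("speedups must be positive");
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(speedups.size()));
}
```

A geometric mean is a natural fit for ratios: a test that runs twice as fast and a test that runs twice as slow cancel out to 1.0, whereas an arithmetic mean of 2.0 and 0.5 would misleadingly report 1.25.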
Performance Gains from Direct OpenCV 3.0 Optimizations
Figure 1 shows example performance gains achieved via the OpenCV 3.0 optimizations made by Itseez. The chart shows, for each individual function, the geometric mean across the tests that cover it. The results measure the impact of a specific changelist by comparing it against its immediate predecessor.
Figure 1. Performance gains from OpenCL optimizations for OpenCV 3.0, on Intel HD Graphics OpenCL device.
Each bar is the geometric mean of the tests for an individual OpenCV function. Refer to the IDF’14 presentation “Intel® Processor Graphics: Optimizing Computer Vision and More” for details on the applied optimizations.
These results show substantial performance gains on Intel Processor Graphics. Similarly, the IPP-enabled path was significantly improved in OpenCV 3.0 through the free Intel IPP subset that is available to OpenCV users. As a result, the IPP-enabled path now offers significant acceleration on Intel CPUs compared to the default C code. Because Intel INDE OpenCV is based exactly on the community OpenCV 3.0 sources, all the optimizations discussed so far are also included in Intel INDE OpenCV.
Default OpenCV 3.0 Logic behind Performance Code Paths
The community OpenCV 3.0 is equipped with two major performance paths beyond the “plain” fallback in C/C++:
- OpenCL flavor of OpenCV functions, running on the Intel Processor Graphics OpenCL device.
- Intel IPP-enabled path, running on the CPU.
The community OpenCV makes very coarse decisions about which function to run on which piece of hardware. For example, with the original OpenCV 3.0 Beta, if you use the UMat data type and call an OpenCV function that has an OpenCL implementation, the function runs on the GPU. Otherwise, the function runs on the CPU. However, this approach does not always result in the best performance. Intel INDE OpenCV features a dispatcher that addresses this issue.
Overriding the Default Logic in Intel INDE OpenCV by Dispatcher
CPUs and GPUs have significantly different architectures that make them better suited to different tasks. CPU performance is often superior for complex processing on a single stream or a few streams of data. GPUs, in contrast, perform much better on data-parallel and computationally heavy tasks. Many OpenCV functions still run faster on modern CPUs because of the nature of the algorithm, especially when backed by an optimized, multi-threaded implementation. Just like the community OpenCV, Intel INDE OpenCV provides an Intel IPP-enabled code path for Intel CPUs.
Moreover, Intel INDE OpenCV features a dispatcher API that enables you to specify which code path (for example, OpenCL or IPP) to use in each particular case. Refer to the separate dispatcher article for details.
Performance Analysis of Complex Pipelines
Most of the analysis in this document focuses on the performance of individual OpenCV functions. For information on analyzing complex pipelines that use OpenCV on Intel SoCs, refer to the Intel INDE OpenCV Profiler tutorial.