At Intel, one of our research projects is developing computer vision algorithms based on deep neural networks (DNNs) and streamlining that process. As with any other DNN, data scientists need annotated data for model training. Of course, there is plenty of data available on the Internet, but there are some roadblocks to using it. On one hand, data scientists are asked to apply AI to more and more new tasks for which no appropriate annotated data exists. On the other hand, some data requires a license agreement and is therefore not suitable for use in commercial product development. That’s why Intel’s work isn’t only to develop and train algorithms, but also to annotate data. This is a long and time-consuming process that shouldn’t fall to the algorithm developers. For example, members of our data annotation team spent about 3,100 hours annotating more than 769,000 objects for just one of our algorithms.
There are two options to solve the data annotation dilemma:
First – delegate data annotation to other companies with the appropriate specialization. It should be noted that in this case the process of data validation and re-annotation is quite complicated and, of course, involves more paperwork.
Second (more convenient for our team) – create and support an internal data annotation team. We can quickly assign them new tasks and manage the work process. It is also easy to balance the price and quality of the work. In addition, it’s possible to implement our own automation algorithms and to improve the quality of annotation.
There are many ways to annotate data, but specialized tools can speed up the process. To accelerate annotation for computer vision, Intel developed a program called Computer Vision Annotation Tool (CVAT).
In this article, we will cover general information on CVAT, as well as more information on the architecture and future development directions. Let’s start by looking at an overview of CVAT before diving into the details.
Computer Vision Annotation Tool (CVAT) is an open source tool for annotating digital images and videos. The main function of the application is to provide users with convenient annotation instruments. For that purpose, we designed CVAT as a versatile service that has many powerful features.
CVAT is a browser-based application for both individuals and teams that supports different work scenarios. The main tasks of supervised machine learning can be divided into three groups:
CVAT allows you to annotate data for each of these cases. There are some advantages and disadvantages of the tool.
As mentioned above, CVAT supports a number of additional optional components:
Additionally, there are many features for use in typical annotation tasks: automation instruments (copy and propagate objects, interpolation, and automatic annotation using the TensorFlow OD API), visual settings, shortcuts, filters, and others. These can be changed in the Settings menu.
The Help menu also contains a number of shortcuts and other hints.
The annotation process is detailed in the examples below.
Interpolation mode: CVAT can interpolate bounding boxes and attributes between key frames. The frames between the key frames are then annotated automatically.
Attribute annotation mode: This mode was developed for image classification. It speeds up attribute annotation by focusing the annotator’s attention on a single attribute at a time, and annotation in this mode is carried out with hot keys.
Segmentation mode: Annotation with polygons is used for semantic segmentation and instance segmentation. In this mode, visual settings make the annotation process easier.
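To make the idea behind interpolation mode concrete, here is a minimal sketch that fills in bounding boxes between two key frames. It assumes simple linear interpolation of the box corners; the article does not say which interpolation scheme CVAT actually uses, and the `Box` class and `interpolate_boxes` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its top-left and bottom-right corners."""
    xtl: float
    ytl: float
    xbr: float
    ybr: float


def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Return {frame: Box} for every frame between two key frames, inclusive.

    Assumption: corners move linearly between the key frames.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first key frame, 1.0 at the last
        boxes[frame] = Box(
            xtl=start_box.xtl + t * (end_box.xtl - start_box.xtl),
            ytl=start_box.ytl + t * (end_box.ytl - start_box.ytl),
            xbr=start_box.xbr + t * (end_box.xbr - start_box.xbr),
            ybr=start_box.ybr + t * (end_box.ybr - start_box.ybr),
        )
    return boxes


# The annotator draws boxes on frames 0 and 4; frames 1 to 3 are filled in.
boxes = interpolate_boxes(0, Box(0, 0, 10, 10), 4, Box(40, 20, 50, 30))
```

In this example the annotator only has to draw two boxes, and the three frames in between receive boxes automatically.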
Users have to choose Dump Annotation to download the latest annotations. They are written to an .xml file that contains some metadata and all annotations. If a user connected a Git repository when creating the task, the file can also be saved there.
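Once a dump has been downloaded, it can be processed with any XML library. The sketch below counts annotated boxes per label using Python's standard `xml.etree.ElementTree`. The article only says that the file holds metadata and annotations, so the element and attribute names in `SAMPLE_DUMP` (`annotations`, `meta`, `image`, `box`, `label`, and the corner coordinates) are assumptions made for illustration and should be checked against a real dump before use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; the real layout of a CVAT dump may differ.
SAMPLE_DUMP = """
<annotations>
  <meta>
    <task><name>hypothetical task</name></task>
  </meta>
  <image id="0" name="frame_000000.jpg">
    <box label="car" xtl="10.0" ytl="20.0" xbr="110.0" ybr="80.0"/>
    <box label="person" xtl="5.0" ytl="5.0" xbr="25.0" ybr="60.0"/>
  </image>
  <image id="1" name="frame_000001.jpg">
    <box label="car" xtl="12.0" ytl="21.0" xbr="112.0" ybr="81.0"/>
  </image>
</annotations>
"""


def count_boxes_per_label(xml_text):
    """Count <box> elements per label across all <image> elements."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for image in root.iter("image"):
        for box in image.iter("box"):
            counts[box.get("label")] += 1
    return counts


label_counts = count_boxes_per_label(SAMPLE_DUMP)
```

A summary like this is a quick sanity check that a dump contains the labels and object counts an annotator expects.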
Docker* containers are used to simplify the installation and deployment of CVAT. The system includes several containers. The CVAT container runs the supervisord process, which spawns several Python* processes in a Django* environment. One example is the WSGI server process, which handles clients’ requests. Other processes, the rq workers, handle “long-running” tasks from two Redis queues: default and low. Long-running tasks are those that can’t be completed within a single user request (creating a task, preparing an annotation file, annotating with the TF OD API, and others). The number of workers can be changed in the supervisord configuration file.
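The worker pattern described above can be illustrated with a small stand-alone simulation. This is not CVAT's code and does not use rq or Redis: it uses only Python's standard `queue` and `threading` modules to show two named queues, each drained by its own workers, so that a request handler can enqueue a job and return immediately. The function names and the choice of which job goes to which queue are assumptions made for the example.

```python
import queue
import threading

QUEUE_NAMES = ("default", "low")


def worker(task_queue, results):
    """Run jobs from one queue until a None sentinel arrives."""
    while True:
        job = task_queue.get()
        if job is None:
            break
        func, args = job
        results.append(func(*args))


def run_jobs(jobs_by_queue, workers_per_queue=1):
    """Start workers for each queue, enqueue the jobs, and collect the results."""
    queues = {name: queue.Queue() for name in QUEUE_NAMES}
    results = {name: [] for name in QUEUE_NAMES}
    threads = []
    for name in QUEUE_NAMES:
        for _ in range(workers_per_queue):
            thread = threading.Thread(target=worker, args=(queues[name], results[name]))
            thread.start()
            threads.append(thread)
    for name, jobs in jobs_by_queue.items():
        for job in jobs:
            queues[name].put(job)  # the caller does not wait for the job to finish
    for name in QUEUE_NAMES:
        for _ in range(workers_per_queue):
            queues[name].put(None)  # one stop sentinel per worker
    for thread in threads:
        thread.join()
    return results


# Hypothetical long-running jobs; the queue assignment here is arbitrary.
def create_task(name):
    return f"created {name}"


def dump_annotation(task_id):
    return f"dumped {task_id}"


results = run_jobs({
    "default": [(create_task, ("demo",))],
    "low": [(dump_annotation, (7,))],
})
```

Raising `workers_per_queue` in this sketch plays the same role as changing the number of workers in the supervisord configuration file: more jobs from the same queue can run at once.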
The Django environment works with two database servers. The Redis* server stores the status of the task queues, and the CVAT database contains all information about tasks, users, annotations, etc. PostgreSQL* (or SQLite* 3 during development) is used as the database management system for CVAT. All data is stored in the cvat db volume. Volumes are used to avoid data loss when a container is updated. The following volumes are mounted to the CVAT container:
The analytics system consists of Elasticsearch, Logstash, and Kibana wrapped in Docker containers. When the work is saved, all data, including logs, is transferred to the server. The server passes the logs to Logstash for filtering. It is also possible to send email notifications automatically if any errors occur. The logs are then transferred to Elasticsearch, which stores them in the cvat events volume. After that, the user can view statistics and logs in Kibana, which works closely with Elasticsearch.
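The filtering step in this pipeline can be pictured as a function that normalizes incoming log events, decides which ones to store, and decides which ones should trigger a notification. The sketch below is plain Python, not a Logstash configuration, and the event fields (`level`, `message`) and the rule that only error events raise an alert are assumptions made for illustration.

```python
def filter_events(events):
    """Split raw events into (records to store, records that should raise an alert)."""
    stored = []
    alerts = []
    for event in events:
        message = event.get("message")
        if not message:
            continue  # nothing useful to index, so the event is dropped
        record = {
            "level": str(event.get("level", "info")).lower(),
            "message": message.strip(),
        }
        stored.append(record)  # would be forwarded to the search index
        if record["level"] == "error":
            alerts.append(record)  # would be forwarded to the email notifier
    return stored, alerts


# Hypothetical events sent by the client when work is saved.
raw_events = [
    {"level": "INFO", "message": " task saved "},
    {"level": "Error", "message": "save failed"},
    {"level": "info"},
]
stored, alerts = filter_events(raw_events)
```

Keeping the filter separate from storage and visualization mirrors the division of work between Logstash, Elasticsearch, and Kibana described above.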
CVAT’s source code contains a list of Django applications:
Strong feedback and user demand are helping Intel determine the future direction of CVAT’s development.
There are many feature requests in addition to what is detailed above. Unfortunately, there are always more requests than opportunities to implement them. That’s why Intel encourages the community to take an active part in the open source development.
There is also a guideline that explains how to set up a development environment, how to create a pull request, and so on. As mentioned above, there is no documentation for developers yet, but users can always ask for help in the Gitter chat.
Now get involved and create, using the Computer Vision Annotation Tool.