Artificial Intelligence (AI) can be used to solve a wide range of problems, including those related to computer vision, such as image recognition, object detection, and medical imaging. In the present paper we show how to integrate OpenCV* (Open Source Computer Vision Library) with a neural network backend. In order to achieve this aim, we first explain how the video stream is manipulated using a Python* programming interface and we also provide guidelines on how to use it. Finally, we discuss a working example of an OpenCV application. OpenCV is one of the packages that ship with Intel® Distribution for Python* 2018.
Today, the possibilities of artificial intelligence (AI) are accessible to almost everyone. Many AI applications rely on computer vision techniques. One of the most widely used libraries for feature detection and matching, motion estimation, and tracking is OpenCV [1]. OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and is now maintained by Itseez. The library is cross-platform and free for use under the open-source BSD license.
Usually, the OpenCV library is used to detect objects in a video or image, and the result serves as input for an AI application or a deep learning framework such as MXNet* [2], Caffe* [3], Caffe2* [4], Torch* [5], Theano* [6], TensorFlow* [7], and others. Several AI applications use computer vision, for instance:
- Advanced driver assistance systems (ADAS) and autonomous cars
- Image recognition, object detection and tracking, and automatic document analysis
- Real-time detection of unattended baggage in airport, train, and bus stations
- Face detection and recognition, normally used for security purposes
- Medical image processing
- IoT (Internet of Things) applications
In this work, we give an overview of video preprocessing techniques that detect features present in a video stream and generate images containing the detected features. Our example focuses on face detection, which is used as a preprocessing phase for a face recognition system. The face recognition system can be an AI application, a deep learning framework, or a cloud service such as Amazon Rekognition* [8], Microsoft Azure* Cognitive Services [9], Google Cloud Vision [10], and others.
Figure 1. Video preprocessing.
Figure 1 shows the flow diagram of the face detection process from a video stream:
- Input video: Can be from a surveillance camera, webcam, notebook camera, and so on.
- Backend stream video: Sometimes OpenCV cannot open the video from the camera directly. In this case, we can use a tool to record, convert, and stream the video in a format and encoding that OpenCV can read. The main tool for that is the FFmpeg* library [11]. FFmpeg is a free software project that produces libraries and programs for handling multimedia data. It is a leading multimedia framework, capable of decoding, encoding, transcoding, muxing, demuxing, streaming, filtering, and playing nearly any format available, from obscure legacy formats to cutting-edge ones. FFmpeg also supports a wide range of input sources, such as a screen, a camera, or a file.
- OpenCV video reader: Open source software used to read the video stream and process it to perform face or object detection.
- Frames (fps): Frames per second processed by OpenCV.
- Image file (.jpg): Output file; this is the OpenCV image recognition result.
The environment used for this work is composed of one surveillance camera and one computer running CentOS* 7 Linux* with the Intel® Distribution for Python 2018. Intel® Distribution for Python complies with the SciPy* Stack specification, and includes the package OpenCV and a deep learning framework such as Caffe, Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) [12], Theano, or TensorFlow. Python packages have been accelerated with Intel® Performance Libraries, including Intel® Math Kernel Library (Intel® MKL), Intel® Threading Building Blocks, and Intel® Data Analytics Acceleration Library. All the aforementioned software packages have been optimized to take advantage of parallelism through the use of threading, multiple nodes, and vectorization.
Intel® Distribution for Python 2018 has improved OpenCV performance compared to the OpenCV package available in the CentOS Linux distribution. The performance was measured by comparing the time, in seconds, that OpenCV takes to capture and process full-HD frames. The gain was around 92 percent. The machine used in our test has two Intel® Xeon® E5-2630 CPUs with 8 GB of RAM.
To capture a video stream, it is necessary to create a VideoCapture* object. The input argument to create such an object can be either the device index or the name of a video file. The device index is simply the number that identifies which camera will provide images for the system. Once the VideoCapture object is created, the image provided by the specified camera or video file is captured frame by frame. When the built-in laptop webcam or an external camera is used, it is possible to open the video directly in OpenCV, using the sequence of commands shown in Figure 2.
import cv2
cap = cv2.VideoCapture(0)
Figure 2. Capture video from camera.
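Once the capture object exists, frames are usually pulled in a loop. The following is a minimal sketch of such a loop, not part of the original scripts: `read_frames` and its `max_frames` parameter are hypothetical names, and the helper only assumes that the object passed in behaves like cv2.VideoCapture, whose read() method returns a success flag together with the frame and whose release() method frees the device.

```python
def read_frames(cap, max_frames=None):
    """Yield frames from an opened capture object until the stream ends.

    `cap` is assumed to behave like cv2.VideoCapture: read() returns
    (ok, frame) and release() frees the camera or file.
    """
    count = 0
    try:
        while max_frames is None or count < max_frames:
            ok, frame = cap.read()
            if not ok:
                # Camera disconnected or end of the video file
                break
            count += 1
            yield frame
    finally:
        # Always free the device, even if the consumer raises an error
        cap.release()
```

With OpenCV installed, a typical call would be `for frame in read_frames(cv2.VideoCapture(0), max_frames=100): ...`. Passing the capture object in, rather than creating it inside the helper, keeps the loop independent of where the video comes from.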
However, OpenCV cannot handle our surveillance camera directly. For this reason, it is necessary to use a backend video stream to convert the input from the camera to a format that OpenCV can understand. We use the FFmpeg multimedia framework to convert the video stream to Motion JPEG (MJPEG) format.
Figure 3 shows the flow diagram for the backend video stream. The components used for this solution are described below.
- Camera: Device camera
- ffmpeg: A tool used to copy the video stream to the file cam.ffm.
- ffserver: A tool that converts the video stream from the camera, saved to the file cam.ffm, to an MJPEG video stream (cam.mjpeg) that will be used by OpenCV.
- OpenCV: Reads the video stream from ffserver (cam.mjpeg) and processes it frame by frame. Filters can be applied to help the face detection process.
- File cam.ffm: Used as a buffer from the ffmpeg tool to ffserver.
- File cam.mjpeg: Used as a buffer from ffserver to OpenCV.
Figure 3. Backend video stream.
The ffmpeg software gets input from a video camera and writes to the file named “cam.ffm”. An IP address is assigned to the video camera and an authentication system is used (“user:password”) to grant the user access to the video stream. ffmpeg uses the Real-Time Streaming Protocol (RTSP) over TCP; see Figure 4. In the present case, the Uniform Resource Identifier (URI) uses channel 1, which corresponds to the original video camera.
ffmpeg -rtsp_transport tcp -i rtsp://user:password@CAMERA_IP:554/Streaming/Channels/1 http://localhost:8090/cam.ffm
Figure 4. ffmpeg tool.
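When the ffmpeg process is launched from Python, it helps to assemble the command as an argument list rather than a single string. The sketch below is only an illustration: `build_ffmpeg_command` is a hypothetical helper, the address 192.0.2.10 is a documentation placeholder rather than the real camera address, and the password is percent-encoded on the assumption that it may contain characters such as "@" that would otherwise break the URI.

```python
from urllib.parse import quote


def build_ffmpeg_command(user, password, host, channel=1, port=554,
                         feed_url="http://localhost:8090/cam.ffm"):
    """Return the ffmpeg argument list that feeds the camera into ffserver."""
    rtsp_url = "rtsp://{}:{}@{}:{}/Streaming/Channels/{}".format(
        quote(user, safe=""),      # percent-encode reserved characters
        quote(password, safe=""),
        host, port, channel)
    # Same options as the command line in Figure 4: RTSP over TCP,
    # camera as input, ffserver feed as output
    return ["ffmpeg", "-rtsp_transport", "tcp", "-i", rtsp_url, feed_url]


cmd = build_ffmpeg_command("user", "p@ss", "192.0.2.10")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.Popen(cmd)` to keep ffmpeg running in the background while the detection script reads the stream.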
Once the “cam.ffm” file is created, it is read by ffserver, which decodes it and encodes it to MJPEG format, making it available as the stream named “cam.mjpeg”. ffserver needs a configuration file, named “ffserver.config”. Figure 5 shows a basic configuration file. ffserver has a number of options that can be set up, but in this application we need only the following basic configuration options:
- Enable access to http port 8090.
- Up to 10 clients are allowed.
- Read the “cam.ffm” feed and allow access to it only from localhost.
- Generate a video stream in MJPEG format (cam.mjpeg) with the settings below:
  - 20 frames per second
  - Full HD resolution (1920x1080)
- Allow access to the status page (stat.html) from localhost and the 192.168.1.0/24 network.
The “ffserver.config” file can be stored in the default directory (/etc); alternatively, a custom location for the configuration file can be provided. If this file is stored in the user’s local directory, ffserver can be called by means of the command line shown in Figure 6. On the other hand, if the “ffserver.config” file is stored in the default directory, use the command shown in Figure 7 to call ffserver.
Finally, OpenCV reads the “cam.mjpeg” stream using the cv2.VideoCapture function and processes the video frame by frame. The scripts in Figures 8 and 10 instead read the stream with urllib and decode each JPEG frame with cv2.imdecode.
HTTPPort 8090
HTTPBindAddress 0.0.0.0
MaxClients 10
MaxBandWidth 50000
CustomLog -
#NoDaemon

<Feed cam.ffm>
File /tmp/cam.ffm
FileMaxSize 1G
ACL allow 127.0.0.1
ACL allow localhost
</Feed>

<Stream cam.mjpeg>
Feed cam.ffm
Format mpjpeg
VideoFrameRate 20
VideoBitRate 10240
VideoBufferSize 20480
VideoSize 1920x1080
VideoQMin 3
VideoQMax 31
NoAudio
Strict -1
</Stream>

<Stream stat.html>
Format status
# Only allow local people to get the status
ACL allow localhost
ACL allow 192.168.1.0 192.168.1.255
</Stream>

<Redirect index.html>
URL http://www.ffmpeg.org/
</Redirect>
Figure 5. ffserver.config file.
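The ACL line for the status page gives an address range (192.168.1.0 to 192.168.1.255), which is the same set of addresses as the 192.168.1.0/24 network. The following sketch checks that equivalence with the standard ipaddress module. It only illustrates the range arithmetic and is not how ffserver itself evaluates ACLs; `acl_allows` is a hypothetical helper name.

```python
import ipaddress


def acl_allows(client_ip, first_ip, last_ip):
    """Return True if client_ip lies in the inclusive range [first_ip, last_ip]."""
    ip = ipaddress.ip_address(client_ip)
    return ipaddress.ip_address(first_ip) <= ip <= ipaddress.ip_address(last_ip)


# Collapse the range used in the stat.html ACL into CIDR notation
networks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("192.168.1.0"),
    ipaddress.ip_address("192.168.1.255")))
print(networks)  # [IPv4Network('192.168.1.0/24')]

print(acl_allows("192.168.1.42", "192.168.1.0", "192.168.1.255"))  # True
print(acl_allows("192.168.2.1", "192.168.1.0", "192.168.1.255"))   # False
```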
ffserver -d -f ./ffserver.config
Figure 6. ffserver tool—local directory.
ffserver -d -f /etc/ffserver.config
Figure 7. ffserver tool—default.
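The choice between the two command lines can also be made automatically. The sketch below is an assumption about how one might do it and not something the original setup requires: the hypothetical helper `ffserver_command` prefers a configuration file in the local directory and falls back to the default location otherwise.

```python
import os


def ffserver_command(local_config="./ffserver.config",
                     default_config="/etc/ffserver.config"):
    """Return the ffserver argument list, preferring a local config file."""
    if os.path.isfile(local_config):
        config = local_config
    else:
        config = default_config
    # -d and -f are the options used in Figures 6 and 7
    return ["ffserver", "-d", "-f", config]
```

The returned list can be handed to `subprocess.Popen` in the same way as an ffmpeg command.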
When OpenCV is correctly configured by means of the procedure described above, it reads and processes all frames from the video stream. OpenCV has several built-in pretrained classifiers for face, eyes, and smile detection, among others. We use the frontal face Haar-Cascade classifier for the detection process. The details of this classifier are given in the file named haarcascade_frontalface_default.xml.
Figure 8 shows the Python script to detect faces. Below, we describe how the Python script works.
- The script has a function called “detect”, which is used for face detection.
- The script opens the video stream and runs in an infinite loop, locating the start and end markers of each JPEG frame.
- Only every 20th frame is processed; the others are skipped.
- Then, the frame is converted to grayscale to serve as input to the detect function.
- If any face is identified within the frame, the script saves a JPEG file, with the naming convention YYYYMMDD_HH_MM_SS_frame_cam.jpg, where:
  - YYYYMMDD: stands for year, month, and day
  - HH_MM_SS: stands for hour, minutes, and seconds
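As a small worked example of the naming convention, the sketch below builds the file name for a fixed timestamp. The helper name `frame_filename` is hypothetical; the format string is the one used with strftime in the scripts.

```python
from datetime import datetime


def frame_filename(when, suffix="_frame_cam.jpg"):
    """Build a file name of the form YYYYMMDD_HH_MM_SS + suffix."""
    return when.strftime("%Y%m%d_%H_%M_%S") + suffix


print(frame_filename(datetime(2018, 1, 5, 9, 3, 7)))
# 20180105_09_03_07_frame_cam.jpg
```

Because the timestamp has a resolution of one second, two frames saved within the same second would share a name and the later one would overwrite the earlier one.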
import cv2
import platform
import numpy as np
import os
from time import strftime

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

def detect(img, cascade, scale, neigh, size):
    # Return the detected faces as (x1, y1, x2, y2) corners, or an empty list
    rects = cascade.detectMultiScale(img, scaleFactor=scale, minNeighbors=neigh, minSize=(size, size))
    if len(rects) == 0:
        return []
    # detectMultiScale returns (x, y, width, height); convert to corners
    rects[:,2:] += rects[:,:2]
    return rects

face_cascade = cv2.CascadeClassifier('/opt/intel/intelpython2/pkgs/opencv-3.1.0-np113py27_intel_6/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml')
cam = "http://localhost:8090/cam.mjpeg"
stream = urlopen(cam)
buf = b''
nframe = 0
nfaces = 0
scale = 1.3
neigh = 3
size = 50
margin = 40

while True:
    # Read the MJPEG stream and look for the JPEG start (FFD8)
    # and end (FFD9) markers
    buf += stream.read(8192)
    a = buf.find(b'\xff\xd8')
    b = buf.find(b'\xff\xd9')
    if a != -1 and b != -1:
        nframe = nframe + 1
        jpg = buf[a:b+2]
        buf = buf[b+2:]
        # Process only every 20th frame
        if (nframe % 20) != 0:
            continue
        frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
        if frame is None:
            # Skip data that could not be decoded as a JPEG image
            continue
        # Convert the frame to grayscale for the classifier
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Detect faces in the image
        rects = detect(gray, face_cascade, scale, neigh, size)
        # Save the whole frame if at least one face was found
        if len(rects):
            filename = strftime("%Y%m%d_%H_%M_%S") + "_frame_cam.jpg"
            cv2.imwrite(filename, frame)
        # Press 'q' to quit
        #if cv2.waitKey(1) & 0xFF == ord('q'):
        #    break
cv2.destroyAllWindows()
Figure 8. Face detection script—save the frame.
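The frame-splitting step of the script can be isolated and tested without a camera. The following sketch is a slightly more defensive variant of the same idea, not the author's code: the hypothetical function `split_jpeg_frames` looks for the end marker only after a start marker, returns every complete frame in the buffer, and keeps the unfinished tail for the next read.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(buf):
    """Split a byte buffer into complete JPEG frames.

    Returns (frames, remainder); remainder must be prepended to the
    next chunk read from the stream.
    """
    frames = []
    while True:
        start = buf.find(SOI)
        if start == -1:
            # No frame has started; keep the last byte in case a
            # marker is split across two reads
            return frames, buf[-1:]
        end = buf.find(EOI, start + 2)
        if end == -1:
            # Frame started but is not complete yet
            return frames, buf[start:]
        frames.append(buf[start:end + 2])
        buf = buf[end + 2:]


chunk = b"--boundary\r\n" + SOI + b"AAA" + EOI + b"\r\n--boundary\r\n" + SOI + b"BB"
frames, rest = split_jpeg_frames(chunk)
print(len(frames), rest)
```

In a reading loop, each call would look like `frames, buf = split_jpeg_frames(buf + stream.read(8192))`, with every element of `frames` passed to cv2.imdecode.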
Figure 10 shows another Python script to detect faces. The idea is the same, but this script does not save the whole frame; it saves only the detected faces. The script identifies the faces and adds a margin around each one to give the recognition software more information. After adding the margin, the script crops the frame and saves the result as a smaller image. Figure 9 shows the face detection rectangle (internal, green) and the face detection rectangle with margin (external, blue).
Figure 9. Face detection and margin.
import cv2
import platform
import numpy as np
from time import strftime

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

def detect(img, cascade, scale, neigh, size):
    # Return the detected faces as (x1, y1, x2, y2) corners, or an empty list
    rects = cascade.detectMultiScale(img, scaleFactor=scale, minNeighbors=neigh, minSize=(size, size))
    if len(rects) == 0:
        return []
    # detectMultiScale returns (x, y, width, height); convert to corners
    rects[:,2:] += rects[:,:2]
    return rects

face_cascade = cv2.CascadeClassifier(
    '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml')
cam = "http://localhost:8090/cam.mjpeg"
stream = urlopen(cam)
buf = b''
nframe = 0
nfaces = 0
scale = 1.3
neigh = 3
size = 50
margin = 40
xfhd = 1920
yfhd = 1080

while True:
    # Read the MJPEG stream and look for the JPEG start (FFD8)
    # and end (FFD9) markers
    buf += stream.read(8192)
    a = buf.find(b'\xff\xd8')
    b = buf.find(b'\xff\xd9')
    if a != -1 and b != -1:
        nframe = nframe + 1
        jpg = buf[a:b+2]
        buf = buf[b+2:]
        # Process only every 20th frame
        if (nframe % 20) != 0:
            continue
        frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
        if frame is None:
            # Skip data that could not be decoded as a JPEG image
            continue
        # Convert the frame to grayscale for the classifier
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Detect faces in the image
        rects = detect(gray, face_cascade, scale, neigh, size)
        # Crop each face with a margin and save it to its own file
        if len(rects):
            iface = 0
            for x1, y1, x2, y2 in rects:
                iface = iface + 1
                nfaces = nfaces + 1
                sface = "_%02d" % iface
                filename = strftime("%Y%m%d_%H_%M_%S") + sface + ".jpg"
                # Add the margin and keep the box inside the full-HD frame
                yf1 = y1 - margin
                if yf1 < 0:
                    yf1 = 0
                yf2 = y2 + margin
                if yf2 >= yfhd:
                    yf2 = yfhd - 1
                xf1 = x1 - margin
                if xf1 < 0:
                    xf1 = 0
                xf2 = x2 + margin
                if xf2 >= xfhd:
                    xf2 = xfhd - 1
                crop_img = frame[yf1:yf2, xf1:xf2]
                cv2.imwrite(filename, crop_img)
        # Press 'q' to quit
        #if cv2.waitKey(1) & 0xFF == ord('q'):
        #    break
cv2.destroyAllWindows()
Figure 10. Face detection script—save only the faces.
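The two geometric steps of this script, converting the detector output to corner coordinates and adding a clamped margin, can be checked on their own with plain numbers. The sketch below uses hypothetical helper names (`to_corners`, `crop_box`) and made-up rectangles; the margin of 40 pixels and the 1920x1080 frame size are the values used in the script.

```python
def to_corners(rects):
    """Convert (x, y, width, height) rectangles to (x1, y1, x2, y2) corners."""
    return [(x, y, x + w, y + h) for (x, y, w, h) in rects]


def crop_box(x1, y1, x2, y2, margin, width, height):
    """Grow a box by `margin` pixels on every side, clamped to the frame."""
    xf1 = max(x1 - margin, 0)
    yf1 = max(y1 - margin, 0)
    xf2 = min(x2 + margin, width - 1)
    yf2 = min(y2 + margin, height - 1)
    return xf1, yf1, xf2, yf2


# Two illustrative detections: one in the middle, one near the corner
faces = to_corners([(100, 50, 80, 80), (1850, 1000, 60, 60)])
boxes = [crop_box(x1, y1, x2, y2, 40, 1920, 1080) for (x1, y1, x2, y2) in faces]
print(boxes)  # [(60, 10, 220, 170), (1810, 960, 1919, 1079)]
```

The second box shows the clamping: without it, the margin would push the crop beyond the right and bottom edges of the frame.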
This work shows how the OpenCV library can be used to provide adequate input to face recognition software. Intel Distribution for Python 2018 improves OpenCV performance (by around 92 percent in our test). Any deep learning framework included in Intel Distribution for Python can be used to build the recognition software. Depending on the image, some OpenCV filters can be used to improve image quality; for example, histogram equalization.
If the face recognition software runs in the cloud, the second script (Figure 10) is more appropriate because it saves only the face, not the entire frame; the smaller file takes less storage space and uploads faster. In the examples discussed above, the frame size was around 400 KB and the face size was around 35 KB. Figure 11 shows a code fragment for uploading a file to S3* on Amazon Web Services* (AWS). Once the upload is completed, the file is removed. This code fragment can be included in any of the previously shown scripts.
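To make the idea of histogram equalization concrete, the following pure-Python sketch applies the textbook formula to a flat list of gray values. It is an illustration only and not OpenCV's implementation; `equalize` is a hypothetical name, and in practice one would call cv2.equalizeHist on the 8-bit grayscale frame instead.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels - 1]."""
    if not pixels:
        return []
    # Histogram of gray values
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # All pixels share one value; nothing to spread out
        return list(pixels)
    # Map each value so that the output uses the full gray range
    return [round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min))
            for p in pixels]


print(equalize([10, 20, 30, 40]))  # [0, 85, 170, 255]
```

The four dark input values, which occupy only a narrow band of the gray scale, are spread over the whole range from 0 to 255, which is the contrast improvement the filter is used for.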
import boto3
import os
# ...
# S3 client (an assumption: AWS credentials are already configured)
s3 = boto3.client('s3')
# ...
# Upload frame to AWS and remove the file
s3.upload_file(filename, 'YOUR_BUCKET', 'YOUR_FOLDER/'+filename)
# remove file after upload
os.remove(filename)
Figure 11. Code fragment for upload to AWS* Cloud.
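The fragment above can be wrapped in a small function so that the local file is removed only when the upload did not raise an error. This is a sketch under assumptions: `upload_and_remove` is a hypothetical helper, and the client passed in is expected to offer upload_file(filename, bucket, key), as the boto3 S3 client does.

```python
import os


def upload_and_remove(s3_client, filename, bucket, folder):
    """Upload a local file to s3://bucket/folder/ and delete the local copy."""
    key = "{}/{}".format(folder.strip("/"), os.path.basename(filename))
    # If the upload raises an exception, the local file is left in place
    s3_client.upload_file(filename, bucket, key)
    os.remove(filename)
    return key
```

In either script it could be called right after cv2.imwrite, for example `upload_and_remove(boto3.client("s3"), filename, "YOUR_BUCKET", "YOUR_FOLDER")`. Taking the client as a parameter means it is created once and reused for every upload.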
1. OpenCV: http://opencv.org/
2. MXNet: https://mxnet.incubator.apache.org/
3. Caffe: http://caffe.berkeleyvision.org/
4. Caffe2: https://caffe2.ai/
5. Torch: http://torch.ch/
6. Theano: http://deeplearning.net/software/theano/
7. TensorFlow: https://www.tensorflow.org/
8. Amazon Web Services (Rekognition): https://aws.amazon.com/rekognition/
9. Microsoft Azure (Cognitive Services): https://azure.microsoft.com/en-us/services/cognitive-services/
10. Google Cloud Platform* (Cloud Vision): https://cloud.google.com/vision/
11. FFmpeg multimedia framework: https://www.ffmpeg.org/
12. Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN): https://01.org/mkl-dnn