Face Detection with Intel® Distribution for Python*

Abstract

Artificial Intelligence (AI) can be used to solve a wide range of problems, including those related to computer vision, such as image recognition, object detection, and medical imaging. In the present paper we show how to integrate OpenCV* (Open Source Computer Vision Library) with a neural network backend. In order to achieve this aim, we first explain how the video stream is manipulated using a Python* programming interface and we also provide guidelines on how to use it. Finally, we discuss a working example of an OpenCV application. OpenCV is one of the packages that ship with Intel® Distribution for Python* 2018.

Introduction

Today, the possibilities of artificial intelligence (AI) are accessible to almost everyone. Many AI applications rely on computer vision techniques. One of the most widely used libraries for feature detection and matching, motion estimation, and tracking is OpenCV [1], a library of programming functions aimed mainly at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and is now maintained by Itseez. The library is cross-platform and free for use under the open-source BSD license.

Typically, OpenCV is used to detect features in a video or image, and the result serves as input to an AI application or a deep learning framework such as MXNet* [2], Caffe* [3], Caffe2* [4], Torch* [5], Theano* [6], TensorFlow* [7], and others. Several kinds of AI applications use computer vision, for instance:

  • Advanced driver assistance systems (ADAS) and autonomous cars
  • Image recognition, object detection and tracking, and automatic document analysis
    • Real-time detection of unattended baggage in airport, train, and bus stations
  • Face detection and recognition, normally used for security issues
  • Medical image processing
  • IoT (Internet of Things) applications

In this work, we give an overview of the video preprocessing techniques used to detect a feature that may be present in a video stream and to generate images containing the detected features. Our example focuses on face detection, used as a preprocessing phase for a face recognition system. The face recognition system can be an AI application, a deep learning framework, or a cloud service such as Amazon Rekognition* [8], Microsoft Azure* Cognitive Services [9], Google Cloud Vision [10], and others.


Figure 1. Video preprocessing.

Figure 1 shows the flow diagram of the face detection process from a video stream:

  • Input video: Can be from a surveillance camera, webcam, notebook camera, and so on.
  • Backend stream video: Sometimes OpenCV cannot open the video from the camera directly. In this case, we can use a tool to record, convert, and stream the video in a format and encoding that OpenCV understands. The main tool for this is FFmpeg* [11], a free software project that produces libraries and programs for handling multimedia data. FFmpeg is a leading multimedia framework, capable of decoding, encoding, transcoding, muxing, demuxing, streaming, filtering, and playing nearly any format available, from obscure legacy formats to cutting-edge ones. It also accepts a wide range of input sources, such as a screen, a camera, or a file.
  • OpenCV video reader: OpenCV reads the video stream and processes it to detect faces or objects.
  • Frames (fps): Frames per second processed by OpenCV.
  • Image file (.jpg): Output file; this is the OpenCV detection result.

Environment

The environment used for this work is composed of one surveillance camera and one computer running CentOS* 7 Linux* with the Intel® Distribution for Python 2018. Intel® Distribution for Python complies with the SciPy* Stack specification, and includes the OpenCV package and a deep learning framework such as Caffe, Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) [12], Theano, or TensorFlow. Python packages have been accelerated with Intel® Performance Libraries, including Intel® Math Kernel Library (Intel® MKL), Intel® Threading Building Blocks, and Intel® Data Analytics Acceleration Library. All the aforementioned software packages have been optimized to take advantage of parallelism through the use of threading, multiple nodes, and vectorization.

Intel® Distribution for Python 2018 improves OpenCV performance compared with the OpenCV package available in the CentOS Linux distribution. Performance was measured by comparing the time, in seconds, needed to capture and process full-HD frames with OpenCV. The gain was around 92 percent. The test machine has two Intel® Xeon® E5-2630 CPUs and 8 GB of RAM.

To capture a video stream, it is necessary to create a VideoCapture* object. The input argument can be either a device index or the name of a video file. The device index is simply the number that identifies which camera will provide images to the system. Once the VideoCapture object is created, the image provided by the specified camera or video file is captured frame by frame. When the built-in laptop webcam or an external camera is used, the video can be opened directly in OpenCV with the commands shown in Figure 2.

import cv2

cap = cv2.VideoCapture(0)

Figure 2. Capture video from camera.
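
Once the VideoCapture object exists, frames are usually pulled with its read() method, which returns a success flag together with the frame. The sketch below is illustrative and not part of the original scripts: read_frames and its max_frames parameter are hypothetical names, and the helper accepts any object that exposes read() and release(), which is the interface cv2.VideoCapture provides.

```python
def read_frames(cap, max_frames=10):
    """Collect up to max_frames frames from a capture-like object.

    `cap` is expected to behave like cv2.VideoCapture: read() returns
    (ok, frame) and release() frees the device.
    """
    frames = []
    try:
        while len(frames) < max_frames:
            ok, frame = cap.read()
            if not ok:
                # No frame available: camera unplugged or end of file
                break
            frames.append(frame)
    finally:
        # Always free the camera or file, even if reading fails
        cap.release()
    return frames


def grab_from_default_camera(max_frames=10):
    """Usage with a real camera (device index 0, as in Figure 2).

    cv2 is imported here so that read_frames itself has no hard
    dependency on OpenCV.
    """
    import cv2
    return read_frames(cv2.VideoCapture(0), max_frames)
```

Checking the success flag on every read matters in practice, because a camera that cannot be opened does not raise an error; it simply never delivers a frame.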

However, OpenCV cannot handle our surveillance camera directly. For this reason, it is necessary to use a backend video stream to convert the input from the camera to a format that OpenCV can understand. We use the FFmpeg multimedia framework to convert the video stream to MJPEG format.

Figure 3 shows the flow diagram for the backend video stream. The components used for this solution are described below.

  • Camera: Device camera
  • ffmpeg: A tool used to copy the video stream to the file cam.ffm.
  • ffserver: A tool that converts the video stream from the camera, saved to the file cam.ffm, to an MJPEG video stream (cam.mjpeg) that will be used by OpenCV.
  • OpenCV: Reads the video stream from ffserver (cam.mjpeg) and processes it frame by frame. Filters can be applied to help the face detection process.
  • File cam.ffm: Used as a buffer between the ffmpeg tool and ffserver.
  • Stream cam.mjpeg: Used as a buffer between ffserver and OpenCV.


Figure 3. Backend video stream.

The ffmpeg software gets input from a video camera and writes to the file named “cam.ffm”. An IP address is assigned to the video camera and an authentication system is used (“user:password”) to grant the user access to the video stream. ffmpeg uses the Real-Time Streaming Protocol (RTSP) over TCP; see Figure 4. In the present case, the Uniform Resource Identifier (URI) uses channel 1, which corresponds to the original video camera.

ffmpeg -rtsp_transport tcp -i \
  rtsp://user:password@192.168.1.100:554/Streaming/Channels/1 \
  http://localhost:8090/cam.ffm

Figure 4. ffmpeg tool.
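
When the capture has to be started from a script, the command in Figure 4 can be assembled as an argument list. The following is a minimal sketch: build_ffmpeg_command and its parameters are hypothetical names, the default values simply repeat the example port, channel path, and feed URL from Figure 4, and only the options shown there are used.

```python
def build_ffmpeg_command(user, password, host, port=554,
                         channel_path="/Streaming/Channels/1",
                         feed_url="http://localhost:8090/cam.ffm"):
    """Return the Figure 4 ffmpeg invocation as a list of arguments."""
    # RTSP source URI: credentials, camera address, and channel
    source = "rtsp://{0}:{1}@{2}:{3}{4}".format(
        user, password, host, port, channel_path)
    return ["ffmpeg", "-rtsp_transport", "tcp", "-i", source, feed_url]


cmd = build_ffmpeg_command("user", "password", "192.168.1.100")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.Popen, which avoids shell quoting problems. A password containing characters that are special in a URL would need to be percent-encoded first; the sketch does not do that.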

Once the “cam.ffm” file is created, it is read by ffserver, which decodes it, re-encodes it to MJPEG format, and makes the result available as “cam.mjpeg”. ffserver needs a configuration file, named “ffserver.config”. Figure 5 shows a basic configuration file. ffserver has many options, but in this application we need only the following basic configuration:

  • Enable access to HTTP port 8090.
  • Allow up to 10 clients.
  • Read the “cam.ffm” feed and allow access to it only from localhost.
  • Generate a video stream in MJPEG format (cam.mjpeg) with the settings below:
    • 20 frames per second
    • Full HD resolution
  • Allow access to the status page from localhost and the 192.168.1.0/24 network.

The “ffserver.config” file can be stored in the default directory (/etc); alternatively, a custom location for the configuration file can be provided. If this file is stored in the user’s local directory, ffserver can be called by means of the command line shown in Figure 6. On the other hand, if the “ffserver.config” file is stored in the default directory, use the command shown in Figure 7 to call ffserver.

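Finally, OpenCV reads the “cam.mjpeg” stream, for example with the cv2.VideoCapture function, and processes the video frame by frame. The scripts in Figures 8 and 10 instead fetch the stream over HTTP with urllib and decode each JPEG frame with cv2.imdecode.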

HTTPPort 8090
HTTPBindAddress 0.0.0.0
MaxClients 10
MaxBandWidth 50000
CustomLog -
#NoDaemon

<Feed cam.ffm>
   File /tmp/cam.ffm
   FileMaxSize 1G
   ACL allow 127.0.0.1
   ACL allow localhost
</Feed>
<Stream cam.mjpeg>
   Feed cam.ffm
   Format mpjpeg
   VideoFrameRate 20
   VideoBitRate 10240
   VideoBufferSize 20480
   VideoSize 1920x1080
   VideoQMin 3
   VideoQMax 31
   NoAudio
   Strict -1
</Stream>
<Stream stat.html>
   Format status
   # Only allow local people to get the status
   ACL allow localhost
   ACL allow 192.168.1.0 192.168.1.255
</Stream>
<Redirect index.html>
   URL http://www.ffmpeg.org/
</Redirect>

Figure 5. ffserver.config file.

ffserver -d -f ./ffserver.config

Figure 6. ffserver tool—local directory.

ffserver -d -f /etc/ffserver.config

Figure 7. ffserver tool—default.

Face Detection

When OpenCV is correctly configured by means of the procedure described above, it reads and processes all frames from the video stream. OpenCV has several built-in pretrained classifiers for face, eyes, and smile detection, among others. We use the frontal face Haar-Cascade classifier for the detection process. The details of this classifier are given in the file named haarcascade_frontalface_default.xml.

Figure 8 shows the Python script to detect faces. Below, we describe how the Python script works.

  • The script has a function called “detect”, which is used for face detection.
  • The script opens the video stream and runs in an infinite loop, identifying the beginning and end of each frame. Only one frame in every 20 is decoded and analyzed.
  • Then, the frame is converted to grayscale to serve as input to the detect function.
  • If any face is identified within the frame, the script saves a JPEG file, with the naming convention YYYYMMDD_HH_MM_SS_frame_cam.jpg, where:
    • YYYYMMDD: stands for year, month, and day
    • HH_MM_SS: stands for hour, minutes, and seconds

import cv2
import numpy as np
from time import strftime

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

def detect(img, cascade, scale, neigh, size):
    # detectMultiScale returns one (x, y, w, h) row per detected face
    rects = cascade.detectMultiScale(img, scaleFactor=scale,
            minNeighbors=neigh, minSize=(size, size))
    if len(rects) == 0:
        return []
    # Convert (x, y, w, h) to corner form (x1, y1, x2, y2)
    rects[:,2:] += rects[:,:2]
    return rects

face_cascade = cv2.CascadeClassifier('/opt/intel/intelpython2/pkgs/opencv-3.1.0-np113py27_intel_6/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml')

cam = "http://localhost:8090/cam.mjpeg"

stream = urlopen(cam)
buf = b''
nframe = 0

scale = 1.3
neigh = 3
size = 50

while True:
    # Read the MJPEG stream in chunks and look for one complete JPEG,
    # delimited by the SOI (FF D8) and EOI (FF D9) markers
    chunk = stream.read(8192)
    if not chunk:
        break  # the stream was closed
    buf += chunk
    a = buf.find(b'\xff\xd8')
    b = buf.find(b'\xff\xd9')
    if a != -1 and b != -1:
        nframe = nframe + 1
        jpg = buf[a:b+2]
        buf = buf[b+2:]

        # Process only one frame out of every 20
        if (nframe % 20) != 0:
            continue
        # cv2.IMREAD_COLOR replaces cv2.CV_LOAD_IMAGE_COLOR, which is
        # not available in the cv2 module of OpenCV 3.x
        frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8),
                             cv2.IMREAD_COLOR)

        # Convert the decoded frame to grayscale for the classifier
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces in the image
        rects = detect(gray, face_cascade, scale, neigh, size)

        # Save the whole frame if at least one face was found
        if len(rects):
            filename = strftime("%Y%m%d_%H_%M_%S") + "_frame_cam.jpg"
            cv2.imwrite(filename, frame)

    # Press 'q' to quit
    #if cv2.waitKey(1) & 0xFF == ord('q'):
    #    break

cv2.destroyAllWindows()

Figure 8. Face detection script—save the frame.
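
The marker search in Figure 8 can be isolated into a small function, which makes it easier to test without a camera. The sketch below is an illustrative variant, not the original script: extract_jpeg_frames is a hypothetical name, and unlike Figure 8 it looks for the end marker only after the start marker, so a stray end marker at the head of the buffer cannot produce an empty frame. Like the original, it assumes that the FF D9 byte pair does not occur inside a frame before its real end (for example, in an embedded thumbnail).

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpeg_frames(buf):
    """Split complete JPEG frames off the front of a byte buffer.

    Returns (frames, remainder): the list of complete frames found and
    the unconsumed tail, which should be prepended to the next chunk.
    """
    frames = []
    while True:
        start = buf.find(SOI)
        if start == -1:
            # No frame start yet; keep the last byte in case a marker
            # is split across two chunks
            return frames, buf[-1:]
        end = buf.find(EOI, start + 2)
        if end == -1:
            # A frame has started but is not complete; keep it for later
            return frames, buf[start:]
        frames.append(buf[start:end + 2])
        buf = buf[end + 2:]
```

Inside the read loop, this would be used as frames, buf = extract_jpeg_frames(buf + stream.read(8192)), after which each element of frames can be decoded with cv2.imdecode.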

Figure 10 shows another Python script to detect faces. The idea is the same, but instead of saving the whole frame, this script saves only the detected faces. It identifies each face and adds a margin around it to give the recognition software more context. After adding the margin, the script crops the frame and saves the result as a small image. Figure 9 shows the face detection rectangle (inner, green) and the rectangle with the added margin (outer, blue).


Figure 9. Face detection and margin.

import cv2
import numpy as np
from time import strftime

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

def detect(img, cascade, scale, neigh, size):
    # detectMultiScale returns one (x, y, w, h) row per detected face
    rects = cascade.detectMultiScale(img, scaleFactor=scale,
            minNeighbors=neigh, minSize=(size, size))
    if len(rects) == 0:
        return []
    # Convert (x, y, w, h) to corner form (x1, y1, x2, y2)
    rects[:,2:] += rects[:,:2]
    return rects

face_cascade = cv2.CascadeClassifier(
 '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml')

cam = "http://localhost:8090/cam.mjpeg"

stream = urlopen(cam)
buf = b''
nframe = 0
nfaces = 0

scale = 1.3
neigh = 3
size = 50
margin = 40
xfhd = 1920
yfhd = 1080

while True:
    # Read the MJPEG stream in chunks and look for one complete JPEG,
    # delimited by the SOI (FF D8) and EOI (FF D9) markers
    chunk = stream.read(8192)
    if not chunk:
        break  # the stream was closed
    buf += chunk
    a = buf.find(b'\xff\xd8')
    b = buf.find(b'\xff\xd9')
    if a != -1 and b != -1:
        nframe = nframe + 1
        jpg = buf[a:b+2]
        buf = buf[b+2:]

        # Process only one frame out of every 20
        if (nframe % 20) != 0:
            continue
        # cv2.IMREAD_COLOR replaces cv2.CV_LOAD_IMAGE_COLOR, which is
        # not available in the cv2 module of OpenCV 3.x
        frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8),
                             cv2.IMREAD_COLOR)

        # Convert the decoded frame to grayscale for the classifier
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces in the image
        rects = detect(gray, face_cascade, scale, neigh, size)

        # Save each detected face, with its margin, as a separate image
        if len(rects):
            iface = 0
            for x1, y1, x2, y2 in rects:
                iface = iface + 1
                nfaces = nfaces + 1
                sface = "_%02d" % iface
                filename = strftime("%Y%m%d_%H_%M_%S") + sface + ".jpg"
                # Add the margin on every side, clamped to the frame
                yf1 = max(y1 - margin, 0)
                yf2 = min(y2 + margin, yfhd - 1)
                xf1 = max(x1 - margin, 0)
                xf2 = min(x2 + margin, xfhd - 1)
                crop_img = frame[yf1:yf2, xf1:xf2]
                cv2.imwrite(filename, crop_img)

    # Press 'q' to quit
    #if cv2.waitKey(1) & 0xFF == ord('q'):
    #    break

cv2.destroyAllWindows()

Figure 10. Face detection script—save only the faces.
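
The margin and clamping logic in Figure 10 can also be written as one helper. This is a sketch under the same assumptions as the script (corner-form rectangles and a known frame size); expand_rect is a hypothetical name. It differs from Figure 10 in one detail: the upper bounds are clamped to the frame width and height rather than to width - 1 and height - 1, because the end index of a slice is exclusive.

```python
def expand_rect(x1, y1, x2, y2, margin, width, height):
    """Grow a corner-form rectangle by `margin` pixels on every side,
    clamped so that it stays inside a width x height frame."""
    return (max(x1 - margin, 0),
            max(y1 - margin, 0),
            min(x2 + margin, width),
            min(y2 + margin, height))


# A face near the top-left corner of a full-HD frame
print(expand_rect(10, 20, 110, 120, 40, 1920, 1080))
# prints (0, 0, 150, 160)

# A face near the bottom-right corner
print(expand_rect(1800, 1000, 1900, 1070, 40, 1920, 1080))
# prints (1760, 960, 1920, 1080)
```

The returned values can be used directly for cropping: xf1, yf1, xf2, yf2 = expand_rect(x1, y1, x2, y2, margin, xfhd, yfhd), followed by frame[yf1:yf2, xf1:xf2].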

Conclusions

This work shows how the OpenCV library can be used to provide adequate input to face recognition software. Intel Distribution for Python 2018 greatly improves OpenCV performance. Any deep learning framework included in Intel Distribution for Python can be used to build the recognition software. Depending on the image, some OpenCV filters can be used to improve image quality; for example, histogram equalization.

If the face recognition software runs in the cloud, the second script, shown in Figure 10, is more appropriate because it saves only the face, not the entire frame. The file is smaller, so it takes less storage space and uploads faster. In the examples discussed above, the frame size was around 400 KB and the face size was around 35 KB. Figure 11 shows a code fragment for uploading a file to S3* on Amazon Web Services* (AWS). Once the upload is completed, the file is removed. This code fragment can be included in any of the previously shown scripts.

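import boto3
import os

# Create the S3 client (credentials come from the standard AWS configuration)
s3 = boto3.client('s3')

# ...

# Upload frame to AWS and remove the file
s3.upload_file(filename, 'YOUR_BUCKET', 'YOUR_FOLDER/'+filename)
# remove file after upload
os.remove(filename)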

Figure 11. Code fragment for upload to AWS* Cloud.
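
One way to make the fragment in Figure 11 safer is to delete the local file only after the upload call returns without raising. The sketch below assumes this behavior is wanted; upload_and_remove and its parameters are hypothetical names, and the client argument is expected to be a boto3 S3 client, as returned by boto3.client('s3'), or any object with a compatible upload_file(filename, bucket, key) method.

```python
import os


def upload_and_remove(client, filename, bucket, folder):
    """Upload `filename` to `bucket` under `folder`, then delete it locally.

    If upload_file raises, the exception propagates and the local file
    is left in place so that the upload can be retried.
    """
    # Object key: folder prefix plus the bare file name
    key = "{0}/{1}".format(folder.strip("/"), os.path.basename(filename))
    client.upload_file(filename, bucket, key)
    os.remove(filename)
    return key
```

In the face detection scripts, the call would follow cv2.imwrite, for example upload_and_remove(s3, filename, 'YOUR_BUCKET', 'YOUR_FOLDER').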

References

  1. OpenCV: http://opencv.org/
  2. MXNet: https://mxnet.incubator.apache.org/
  3. Caffe: http://caffe.berkeleyvision.org/
  4. Caffe2: https://caffe2.ai/
  5. Torch: http://torch.ch/
  6. Theano: http://deeplearning.net/software/theano/
  7. TensorFlow: https://www.tensorflow.org/
  8. Amazon Web Services—Rekognition: https://aws.amazon.com/rekognition/
  9. Microsoft Azure—Cognitive Services: https://azure.microsoft.com/en-us/services/cognitive-services/
  10. Google Cloud Platform*—Cloud Vision: https://cloud.google.com/vision/
  11. FFmpeg multimedia framework: https://www.ffmpeg.org/
  12. Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN): https://01.org/mkl-dnn