Face It is a mobile application that detects a person's facial structure, combines it with information about the person’s lifestyle and current trends, and uses that data to recommend a hair and beard style.
For this Early Innovation Project, our goal is to have the user scan his face to determine a face shape and then use this face shape along with other personal information such as information about the person’s hair and lifestyle to come up with personalized hair and beard style recommendations.
During the third and fourth weeks of this Early Innovation Project, we focused on building our actual product and putting together a simple demonstration. Now that a decent amount of back-end work has been taken care of, we turned our attention to some front-end parts of our application.
Our user interface has been fully created using Android Studio. It consists of three screens: the camera screen, the preferences screen, and the final recommendations screen.
We have also started training and testing our convolutional neural network (CNN) to recognize face shape. We chose a CNN because this architecture is well suited to image recognition tasks. CNN architectures are inspired by biological processes and include variations of multilayer perceptrons that require minimal preprocessing. In a CNN, there are multiple layers that each have distinct functions to help us recognize an image. These layers include a convolutional layer, pooling layer, rectified linear unit (ReLU) layer, fully connected layer and loss layer.
Image credit: https://www.mathworks.com/help/nnet/convolutional-neural-networks.html
- The Convolutional layer acts as the core of any CNN. Each of its learned filters produces a 2-dimensional activation map that shows where in the input a specific type of feature was detected.
- The Pooling layer acts as a form of downsampling. Max pooling is the most common implementation of pooling, and we are choosing it because we expect it to work well with smaller datasets like ours.
- The ReLU layer is a layer of neurons which applies an activation function to increase the nonlinear properties of the decision function and of the overall network without affecting the receptive fields of the convolutional layer itself.
- The Fully Connected layer, which occurs after several convolutional and max pooling layers, does the high-level reasoning in the neural network. Neurons in this layer have connections to all the activations in the previous layer. The activations for the Fully Connected layer are then computed by a matrix multiplication followed by a bias offset.
- The Loss layer specifies how the network training penalizes the deviation between the predicted and true labels. We believe that Softmax Loss is the best fit for our project, as it is designed for predicting a single class out of a set of mutually exclusive classes.
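To make the roles of these layers concrete, here is a minimal pure-Python sketch of each one applied to a tiny 5x5 "image". It is only an illustration of the general technique, not the network we trained: the image, the edge-detecting kernel, and the fully connected weights are arbitrary, untrained values chosen for the example, and the function names are our own.

```python
import math

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build an activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Downsample by keeping the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def fully_connected(inputs, weights, biases):
    """Matrix multiplication plus bias offset; weights[k] is the row for output k."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(logits, true_index):
    """Cross-entropy penalty for the true class."""
    return -math.log(softmax(logits)[true_index])

# A 5x5 image with a bright vertical stripe (illustrative values).
image = [[0.0, 0.0, 1.0, 1.0, 0.0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

pooled = max_pool(relu(conv2d(image, kernel)))
print(pooled)  # [[2.0, 0.0], [2.0, 0.0]]

# Flatten and feed into an untrained fully connected layer with six outputs,
# one per face shape. These weights are arbitrary placeholders.
flat = [v for row in pooled for v in row]
weights = [[0.1 * (k + 1)] * len(flat) for k in range(6)]
biases = [0.0] * 6
logits = fully_connected(flat, weights, biases)
loss = softmax_loss(logits, true_index=4)
print(softmax(logits), loss)
```

The pooled map keeps the strong response on the left edge of the stripe and discards the rest, which is the kind of downsampling described in the list above.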
For our dataset we have divided the face shapes into six different shapes: diamond, oblong, oval, round, square and triangle. We have designated a folder to each shape.
Each of these folders contains various images of faces that match the shape named by the folder. These images were gathered from select articles and from Google Images. For our dataset, we tried to capture the various angles and positions a person’s face could be in, so that when it comes time to test with a new face, the results will be as accurate as possible.
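A folder-per-class layout like this is easy to sanity check before training. The sketch below counts the images in each shape folder; the folder names, the `dataset` root path, and the accepted file extensions are assumptions made for the example, not details of our actual setup.

```python
from pathlib import Path

# Assumed folder names, one per face shape.
FACE_SHAPES = ["diamond", "oblong", "oval", "round", "square", "triangle"]
# Assumed image extensions; adjust to match the files actually collected.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(dataset_root):
    """Return a dict mapping each face shape to the number of image files in its folder."""
    root = Path(dataset_root)
    counts = {}
    for shape in FACE_SHAPES:
        folder = root / shape
        if folder.is_dir():
            counts[shape] = sum(
                1 for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[shape] = 0  # a missing folder counts as an empty class
    return counts

if __name__ == "__main__":
    # "dataset" is a hypothetical root directory.
    for shape, n in count_images("dataset").items():
        print(f"{shape}: {n} images")
```

A check like this makes it obvious when one shape has far fewer examples than the others.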
With these images we trained our CNN model using transfer learning and Google’s* Inception v3 CNN model with TensorFlow*. We then created a very basic demonstration using Android* Studio where a user can scan a person’s face with a phone and determine the person’s face shape. This demo loads the stored weights from our trained CNN and outputs, for each face shape we trained our model with, a percentage indicating how closely the scanned face matches that shape.
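As a rough illustration of that last step, the sketch below turns a vector of per-class probabilities into a ranked list of percentages. It assumes the model's final layer already outputs one probability per shape in a fixed label order; the label order, the function name, and the example numbers are made up for the illustration and are not output from our model.

```python
# Assumed label order; it must match the order used when the model was trained.
FACE_SHAPES = ["diamond", "oblong", "oval", "round", "square", "triangle"]

def rank_face_shapes(probabilities, labels=FACE_SHAPES):
    """Pair each label with its probability as a percentage, highest first."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per face shape")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(label, round(100 * p, 1)) for label, p in ranked]

if __name__ == "__main__":
    # Hypothetical probabilities for one scanned face.
    example = [0.05, 0.05, 0.10, 0.06, 0.69, 0.05]
    for label, percent in rank_face_shapes(example):
        print(f"{label}: {percent}%")
```

Showing the full ranking rather than only the top shape makes it easier to spot cases where two shapes score almost the same.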
As you can see in the above image, our model gave the user a 69% likelihood of having a square face shape. When performing these tests on other users, we noticed a lot of variation in this percentage. For example, some users received a top score of about 56% while other users received about 80%.
This is a problem because of course we want our results to be as accurate as possible. To solve this issue, we have identified two areas that we can improve upon: enlarging our dataset and using clearer and simpler images.
We trained the model on approximately 50 images per face shape. This may be too few and could be the reason why the results aren’t always very accurate. We plan on at least doubling our dataset to approximately 100 images per face shape. Hopefully this will provide enough data for our results to be more accurate.
The data itself may have contributed to the inaccurate results as well. Our images may not have been clear enough for the algorithm to work out which part of the image to focus on. The people in our images had various hairstyles and stood in front of various backgrounds, which may have drawn the algorithm's attention away from what we want it to focus on: the person’s face. To solve this issue, we plan on using clearer data, such as images of people’s faces with no additional hair or unnecessary backgrounds. Below is an example of an image we would use.
Image credit: http://www.masterfile.com/image/en/400-04150913/man-shaving-his-head-isolated-on-a-white-background
We hope that these changes will give us much more accurate results. Once we start getting the results that we are looking for we will then move on to our next stages of integrating our CNN model with our user interface. We are very excited for the next few stages of our project!