Doctor Hazel: A Real-Time AI Device for Skin Cancer Detection

Why We Built Doctor Hazel

According to the Skin Cancer Foundation, half of the population in the United States is diagnosed with some form of skin cancer by age 65. The survival rate with early detection is almost 98 percent, but it falls to 62 percent once the cancer reaches the lymph nodes and to 18 percent once it metastasizes to distant organs. With Doctor Hazel, we want to use the power of artificial intelligence (AI) to make early detection as widely available as possible.

We originally built Doctor Hazel at the TechCrunch Disrupt Hackathon in September 2017. Afterwards, we were covered by TechCrunch, the Wall Street Journal, Intel IQ, and many other outlets. So far, we have given live demos at the Strata Data Conference in New York, the Conference and Workshop on Neural Information Processing Systems (NIPS), and the Intel® AI Dev Jam. For more information, visit the Doctor Hazel website.

Doctor Hazel Strata Conference
Figure 1. Giving a demo at the Strata Conference.

After the hackathon and demos we received thousands of emails, which motivated us to take this project further.

What AI Is and How to Use It

Deep learning is a major trend in machine learning, and its recent success has paved the way for projects like this one. In this sample we focus specifically on computer vision and image classification: we will build nevus, melanoma, and seborrheic keratosis image classifiers using a deep learning algorithm, the convolutional neural network (CNN), through the Caffe* framework.

In this article we focus on supervised learning, which requires training on the server as well as deploying on the edge. Our goal is to build a machine learning algorithm that can detect skin cancer images in real time, so that you can build your own AI-based skin cancer classification device.

Our application includes two parts. The first part is training, in which we use labeled sets of skin cancer images to train a machine learning model. The second part is deploying on the edge: running the same trained model on an edge device, in this case the Intel® Movidius™ Neural Compute Stick.

Training on the server diagram
Figure 2. Training on the server and deploying on the edge.

Traditional Machine Learning Versus Deep Learning

This is probably the most frequently asked question in AI, and the distinction is fairly simple once you understand how machine learning image classification works.

Machine learning (ML) requires feature extraction and model training. We first have to use domain knowledge to extract features that our ML model can use; common examples include the scale-invariant feature transform (SIFT) and histograms of oriented gradients (HOG). We can then train a machine learning model on a dataset containing all the image features and labels.
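
To make this concrete, below is a minimal sketch of such a traditional pipeline, extracting HOG features with OpenCV and training a linear support vector machine (SVM) on them. This is not part of the Doctor Hazel code; the file names and labels are hypothetical placeholders.

import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def extract_features(path):
    # Hand-engineered features: resize to the HOG window and compute HOG
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))
    return hog.compute(img).flatten()

# Hypothetical two-image training set with labels 0 and 1
features = np.array([extract_features('mole.jpg'),
                     extract_features('melanoma.jpg')], dtype=np.float32)
labels = np.array([0, 1], dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)  # model training step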

The major difference between traditional ML and deep learning lies in the feature engineering. Traditional ML uses manually engineered features, whereas deep learning learns them automatically. Feature engineering is relatively difficult, since it requires domain expertise and is very time consuming. Deep learning requires no manual feature engineering and can be more accurate.

Traditional machine learning vs deep learning
Figure 3. Traditional machine learning versus deep learning.

Artificial Neural Networks

According to Techopedia, “An artificial neuron network (ANN) is a computational model based on the structure and functions of biological neural networks.” An ANN essentially emulates how a biological neuron works: it has a finite number of inputs, a weight associated with each input, and an activation function.

The activation function of a node defines the output of that node given an input or set of inputs. It is non-linear, so the network can encode complex patterns in the data. When input arrives, the activation function is applied to the weighted sum of the inputs to generate the output. The artificial neurons are connected to one another to form a network, hence the name artificial neural network.

Biological neuron versus artificial neuron
Figure 4. Biological neuron versus artificial neuron (source: Wikipedia).
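
To make this concrete, here is a minimal sketch of a single artificial neuron in Python; the inputs, weights, bias, and choice of a sigmoid activation are arbitrary illustrative values, not part of Doctor Hazel.

import numpy as np

def sigmoid(x):
    # A common non-linear activation function
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights, bias):
    # Apply the activation function to the weighted sum of the inputs
    return sigmoid(np.dot(inputs, weights) + bias)

print(neuron(np.array([0.5, 0.3]), np.array([0.8, -0.2]), 0.1))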

A feedforward neural network is an artificial neural network in which connections between the nodes do not form a cycle; it is the simplest form of ANN. It has three kinds of layers: input, hidden, and output. Data comes in through the input layer, flows through the hidden layer, and arrives at the output nodes, as in the figure below. We can have multiple hidden layers; the complexity of the model is correlated with the number and size of the hidden layers.

Feedforward neural network
Figure 5. Feedforward neural network (source: Wikipedia).
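
Continuing the neuron sketch above, a forward pass through a feedforward network is just this weighted-sum-plus-activation step applied layer by layer; the layer sizes and random weights below are purely illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
x = np.random.rand(4)             # input layer: 4 features
W1 = np.random.randn(4, 5)        # weights from input to hidden (5 nodes)
W2 = np.random.randn(5, 3)        # weights from hidden to output (3 nodes)

hidden = sigmoid(x.dot(W1))       # data flows through the hidden layer...
output = sigmoid(hidden.dot(W2))  # ...and onto the output nodes
print(output)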

Training data and a loss function are the two elements needed to train a neural network. The training data is composed of images and their corresponding labels; the loss function measures the inaccuracy of the classifications during training. Once we have those two elements, we use the backpropagation algorithm and gradient descent to train the ANN.
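
As a toy illustration of that training loop, the sketch below fits a single weight with gradient descent on a squared-error loss; the training example and learning rate are made up for the demonstration.

# Toy gradient descent: learn w so that w * x approximates y
x, y = 2.0, 8.0       # one (made-up) training example
w = 0.0               # initial weight
lr = 0.1              # learning rate
for step in range(50):
    pred = w * x
    loss = (pred - y) ** 2        # squared-error loss
    grad = 2 * (pred - y) * x     # gradient of the loss with respect to w
    w -= lr * grad                # gradient descent update
print(w)  # converges toward 4.0, since 4.0 * 2.0 == 8.0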

Convolutional Neural Networks

A convolutional neural network (CNN) is a class of deep, feedforward artificial neural networks most commonly applied to analyzing visual imagery, because it is designed to emulate the behavior of an animal visual cortex. It consists of convolutional layers and pooling layers, so the network can encode image properties.

Typical CNN network
Figure 6. Typical CNN network (source: Wikipedia).

The convolutional layer's parameters consist of a set of learnable filters (or kernels) that have a small receptive field. Each filter is convolved across the width and height of the input, computing dot products between the entries of the filter and the input and producing a two-dimensional activation map for that filter. This way, the network learns filters that activate when they detect specific features at some spatial position in the input image.

Neurons of a convolutional layer blue, red
Figure 7. Neurons of a convolutional layer (blue) connected to their receptive field (red) (source: Wikipedia).
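
The sliding dot product can be written directly in NumPy; below is a minimal sketch of a 'valid' convolution with a single 3x3 filter, where the input and filter values are arbitrary.

import numpy as np

def convolve2d(image, kernel):
    # Slide the filter across the image, computing a dot product at each
    # position to build the two-dimensional activation map ('valid' mode)
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])  # a simple vertical-edge filter
print(convolve2d(image, kernel).shape)  # (4, 4) activation map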

The pooling layer is a form of non-linear downsampling. It partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum. The idea is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and also helps control overfitting.

Max pooling is the most common type of non-linear pooling. According to Wikipedia, pooling is often applied “with filters of size 2x2 applied with a stride of 2 downsamples at every depth slice...” A pooling layer of size 2x2 with a stride of 2 shrinks the input image to a quarter of its original size.

Max pooling
Figure 8. Max pooling (source: Wikipedia).
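
Here is a minimal NumPy sketch of 2x2 max pooling with a stride of 2; note that the output holds a quarter of the input's elements, as described above. The input values are arbitrary.

import numpy as np

def max_pool_2x2(x):
    # Partition into non-overlapping 2x2 blocks and keep each block's maximum
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))  # 2x2 output: a quarter of the original size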

Doctor Hazel Components

The equipment needed for this project is very simple: you can either use your computer with a USB Intel Movidius Neural Compute Stick, or build an offline device from Internet of Things (IoT) hardware like the following:

  • UP Squared* board
  • Endoscope camera
  • Intel® Movidius™ Neural Compute Stick (USB) or Intel Movidius PCIe* add-on product
  • A screen or monitor

Doctor Hazel components
Figure 9. Doctor Hazel components.

Step 1: Gathering the image dataset for skin cancers

We first need skin cancer datasets. Although there are many places to get them, the International Skin Imaging Collaboration (ISIC) archive is the easiest. We need only about 500 images each of nevus, melanoma, and seborrheic keratosis, as well as 500 random images of anything else. For greater accuracy we would have to use more data; this is just enough to get the training started. The easiest way to get the data is through the ISIC Archive, shown in the figure below.

ISIC archive
Figure 10. ISIC archive.

Afterwards, we save the images into one folder and name them nevus-00.jpg, melanoma-00.jpg, and so on, so that we can easily form our Lightning Memory-Mapped Database (LMDB); a sketch of this renaming step follows the figure below.

Getting the data from ISIC
Figure 11. Getting the data from ISIC.
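
Here is a hedged sketch of that renaming step; it assumes you downloaded the images into one folder per class, and the folder names and layout are assumptions you should adjust to your own download.

import glob
import shutil

# Assumed layout: one downloaded folder per class, plus an empty ./train folder
classes = ['none', 'nevus', 'melanoma', 'seborrheic_keratosis']
for cls in classes:
    for i, path in enumerate(sorted(glob.glob('./%s/*.jpg' % cls))):
        # Produces nevus-00.jpg, nevus-01.jpg, melanoma-00.jpg, and so on
        shutil.copy(path, './train/%s-%02d.jpg' % (cls, i))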

Step 2: Setting up the server for training

Machine learning training uses a lot of processing power and, hence, generally costs a lot. In this article we focus on the Intel® AI DevCloud, which is free for Intel® AI Academy members. It uses Intel® Xeon® Scalable processors and supports Intel® Optimization for Caffe*, as well as other ML frameworks. You can access it from the Intel AI DevCloud site.

To build Doctor Hazel we are using the Caffe framework, mainly because of the Intel Movidius Neural Compute Stick's edge support. Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC), and there are four steps to using it to train our model:

  • Data prep: Clean the images and store them to the LMDB.
  • Model definition prototxt file: Define the parameters and choose CNN architecture.
  • Solver definition prototxt file: Define the solver parameters for model optimization.
  • Model training: Execute Caffe command to get our .caffemodel algorithm file.

On Intel AI DevCloud, we can check that these tools are available simply by going to:

cd /glob/deep-learning/py-faster-rcnn/caffe-fast-rcnn/build/tools

Step 3: Preparing LMDB for training

Once we have Intel AI DevCloud set up, we can build the directory structure:

mkdir doctorhazel
cd doctorhazel
mkdir input
cd input
mkdir train

From there we can upload all the data we previously set up into that folder (run this from the machine that holds the images):

scp ./* colfax:/home/[youruser_name]/doctorhazel/input/train/

Then we can build our LMDB with the script below (create_lmdb.py), which does the following:

  • So that we can calculate the accuracy of the model, five-sixths of the dataset is used for training and one-sixth is used for validation.
  • Resize all images to 227 x 227 to follow the same standard as BVLC.
  • Apply histogram equalization to all the training images to adjust the contrast.
  • Store the images in train_lmdb and validation_lmdb.
  • Use make_datum to label all the image datasets inside LMDB.
import os
import glob
import random
import numpy as np
import cv2
import caffe
from caffe.proto import caffe_pb2
import lmdb
#We use 227x227 from BVLC
IMAGE_WIDTH = 227
IMAGE_HEIGHT = 227
def transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT):
   img[:, :, 0] = cv2.equalizeHist(img[:, :, 0])
   img[:, :, 1] = cv2.equalizeHist(img[:, :, 1])
   img[:, :, 2] = cv2.equalizeHist(img[:, :, 2])
   img = cv2.resize(img, (img_width, img_height), interpolation = cv2.INTER_CUBIC)
   return img
def make_datum(img, label):
   return caffe_pb2.Datum(
       channels=3,
       width=IMAGE_WIDTH,
       height=IMAGE_HEIGHT,
       label=label,
       data=np.rollaxis(img, 2).tostring())
train_lmdb = '/home/[your_username]/doctorhazel/input/train_lmdb'
validation_lmdb = '/home/[your_username]/doctorhazel/input/validation_lmdb'
os.system('rm -rf  ' + train_lmdb)
os.system('rm -rf  ' + validation_lmdb)
train_data = [img for img in glob.glob("./input/train/*jpg")]
random.shuffle(train_data)
print 'Creating train_lmdb'
in_db = lmdb.open(train_lmdb, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
   for in_idx, img_path in enumerate(train_data):
       if in_idx %  6 == 0:
           continue
       img = cv2.imread(img_path, cv2.IMREAD_COLOR)
       img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)
       if 'none' in img_path:
           label = 0
       elif 'nevus' in img_path:
           label = 1
       elif 'melanoma' in img_path:
           label = 2
       else:
           label = 3
       datum = make_datum(img, label)
       in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
       print '{:0>5d}'.format(in_idx) + ':' + img_path
in_db.close()
print '\nCreating validation_lmdb'
in_db = lmdb.open(validation_lmdb, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
   for in_idx, img_path in enumerate(train_data):
       if in_idx % 6 != 0:
           continue
       img = cv2.imread(img_path, cv2.IMREAD_COLOR)
       img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)
       if 'none' in img_path:
           label = 0
       elif 'nevus' in img_path:
           label = 1
       elif 'melanoma' in img_path:
           label = 2
       else:
           label = 3
       datum = make_datum(img, label)
       in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
       print '{:0>5d}'.format(in_idx) + ':' + img_path
in_db.close()
print '\nFinished processing all images'

Afterwards, we run the script to generate the LMDBs:

python2 create_lmdb.py

When that is done, we need to compute the mean image of the training data. Caffe includes a tool for this:

cd /glob/deep-learning/py-faster-rcnn/caffe-fast-rcnn/build/tools
compute_image_mean -backend=lmdb /home/[your_user]/doctorhazel/input/train_lmdb /home/[your_user]/doctorhazel/input/mean.binaryproto

The command above generates the mean image of the training data. The mean image is then subtracted from each input image so that every feature pixel has zero mean; this is a common preprocessing step in supervised machine learning.
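
As a minimal sketch of what that preprocessing looks like in Python, the snippet below loads mean.binaryproto with the Caffe bindings and subtracts it from an image; the random image here is a stand-in for a real input.

import numpy as np
import caffe
from caffe.proto import caffe_pb2

# Load the mean image generated by compute_image_mean
blob = caffe_pb2.BlobProto()
with open('/home/[your_username]/doctorhazel/input/mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = np.array(caffe.io.blobproto_to_array(blob))[0]  # shape (3, 227, 227)

# Subtract the mean so that every feature pixel has zero mean
img = np.random.rand(3, 227, 227) * 255  # stand-in for a real input image
img_zero_mean = img - mean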

Step 4: Setting up the model definition and solver definition

We now need to set up the model definition and solver definition. In this article we will be using bvlc_reference_net, which can be seen at the BVLC / GitHub* site.

The GitHub site has the full code of the prototxt.

Below is the modified version of train.prototxt. We’ve only changed the input data on the following lines:

mean_file: "/home/[your_username]/doctorhazel/input/mean.binaryproto"
source: "/home/[your_username]/doctorhazel/input/train_lmdb"
mean_file: "/home/[your_username]/doctorhazel/input/mean.binaryproto"

And we changed the final layer's num_output to match our four classes:

num_output: 4

If you want to train with additional labels, you can add more classes and data, but this is enough to get the project trained.

Visual representation of the Caffe* model
Figure 12. Visual representation of the Caffe* model.

At the same time, we can create deploy.prototxt, which is built from train.prototxt; this can be seen in the GitHub repo. We will also create the label.txt file, with the classes in the same order we used when creating the LMDB:

classes
None
Nevus
Melanoma 
Seborrheic Keratosis

After that, we need the solver definition, solver.prototxt, which is used to optimize the training model. Because we are relying on a CPU, we make some modifications to the solver definition, shown below:

net: "/home/[your_username]/doctorhazel/model/train.prototxt"
test_iter: 50
test_interval: 50
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 2500
display: 50
max_iter: 5000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "/home/[your_username]/doctorhazel/model"
solver_mode: CPU

Because we are dealing with a small amount of data, we can shorten the test intervals and get our model as quickly as possible. In short, the solver computes the accuracy of the model on the validation set every 50 iterations. Since we do not have a lot of data, the solver optimization process takes a snapshot every 1,000 iterations and runs for a maximum of 5,000 iterations. The configuration of lr_policy: "step", stepsize: 2500, base_lr: 0.001, and gamma: 0.1 is fairly standard, and you can experiment with others as described in the BVLC solver documentation.

Step 5: Training the model

Since we are using the free Intel AI DevCloud and have everything set up, we can use the Intel® Optimization for Caffe* that is installed on the cluster. Because this is a cluster, we simply start training by submitting a job with the commands below:

cd /glob/deep-learning/py-faster-rcnn/caffe-fast-rcnn/build/tools
echo caffe train --solver ~/doctorhazel/model/solver.prototxt | qsub -o ~/doctorhazel/model/output.txt -e ~/doctorhazel/model/train.log

The trained models will be saved as model_iter_1000.caffemodel, model_iter_2000.caffemodel, and so on. With the data from ISIC you should obtain around 70 to 80 percent accuracy. You can plot your own learning curve with the following commands:

cd ~/doctorhazel
python2 plot_learning_curve.py ./model/train.log ./model/train.png

Training curve
Figure 13. Training curve.
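
If plot_learning_curve.py is not at hand, a minimal alternative is to pull the accuracy lines out of train.log yourself; the sketch below assumes the standard Caffe test-output log format.

import re
import matplotlib
matplotlib.use('Agg')  # render to a file; no display needed on the cluster
import matplotlib.pyplot as plt

iters, accs = [], []
cur_iter = 0
with open('./model/train.log') as f:
    for line in f:
        m = re.search(r'Iteration (\d+), Testing net', line)
        if m:
            cur_iter = int(m.group(1))
        m = re.search(r'Test net output #\d+: accuracy = ([\d.]+)', line)
        if m:
            iters.append(cur_iter)
            accs.append(float(m.group(1)))

plt.plot(iters, accs)
plt.xlabel('Iteration')
plt.ylabel('Validation accuracy')
plt.savefig('./model/train.png')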

Step 6: Deploying and running on the edge (Intel® Movidius™ Neural Compute Stick)

For this article we are using an UP Squared board so that we have an offline device to carry around. The UP Squared board comes with Ubuntu* 16.04 already installed, which makes things a bit easier. We first create a folder on the device and copy everything we trained on the server:

mkdir -p ~/workspace
cd ~/workspace
mkdir doctorhazel
cd doctorhazel
mkdir CancerNet
scp colfax:~/doctorhazel/model/* ./CancerNet/

Next, we need to install the Intel Movidius Neural Compute Stick software development kit (SDK), which ensures that we can run our programs on the edge. The commands are:

cd ~/workspace
git clone https://github.com/movidius/ncsdk.git
cd ~/workspace/ncsdk
make install

Then we need to download the sample apps in ncappzoo, which is also published by the Intel® Movidius™ team. The specific app we need is stream_infer, which we copy from the examples:

cd ~/workspace
git clone https://github.com/movidius/ncappzoo.git
cd doctorhazel
cp ~/workspace/ncappzoo/apps/stream_infer/* ./

Then change the network settings in stream_infer.py to point to CancerNet:

NETWORK_IMAGE_WIDTH = 227                     # the width of images the network requires
NETWORK_IMAGE_HEIGHT = 227                    # the height of images the network requires
NETWORK_IMAGE_FORMAT = "BGR"                  # the format of the images the network requires
NETWORK_DIRECTORY = "./CancerNet/" # directory of the network
NETWORK_STAT_TXT = "./squeezenet_stat.txt"    # stat.txt for network
NETWORK_CATEGORIES_TXT = "./CancerNet/label.txt" # categories.txt for network

For the final part, we need to compile a graph file for the Intel Movidius Neural Compute Stick SDK, using our last training snapshot as the weights, and then run the app:

cd ~/workspace/doctorhazel/CancerNet/
mvNCCompile deploy.prototxt -w model_iter_5000.caffemodel
cd ..
python3 stream_infer.py
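
stream_infer.py handles the camera loop for you; as a hedged sketch of what it does under the hood, here is single-image inference with the NCSDK v1 Python API. The graph file name assumes mvNCCompile's default output, and mole.jpg is a placeholder.

import numpy as np
import cv2
from mvnc import mvncapi as mvnc

# Open the first attached Neural Compute Stick
devices = mvnc.EnumerateDevices()
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load the compiled graph file onto the stick
with open('./CancerNet/graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# Preprocess one frame the way the network expects (227x227 BGR)
img = cv2.imread('mole.jpg')
img = cv2.resize(img, (227, 227)).astype(np.float16)

graph.LoadTensor(img, 'user object')
output, _ = graph.GetResult()
print('Predicted class:', output.argmax())

graph.DeallocateGraph()
device.CloseDevice()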

From here, you have a Doctor Hazel of your own.

Regular mole
Figure 14. Image of a regular mole.

melanoma
Figure 15. Image of a melanoma.

We used Intel AI DevCloud with Intel Optimization for Caffe in the cloud, and the Intel Movidius Neural Compute Stick on the edge.

For more complete information about compiler optimizations, see our Optimization Notice.