Detecting Acute Lymphoblastic Leukemia Using Caffe*, OpenVINO™ and Intel® Neural Compute Stick 2: Part 1

Introduction to convolutional neural networks in Caffe*

Image Placeholder
Credit: Anh Vo

 

As part of my R&D for the Acute Myeloid/Lymphoblastic Leukemia (AML/ALL) AI Research Project, I am reviewing a selection of papers related to using Convolutional Neural Networks (CNN) for detecting AML/ALL. These papers share various ways of creating CNNs, and include useful information about the structure of the layers and the methods used which will help to reproduce the work outlined in the papers.

This is the first part of a series of articles that will take you through my experience building a custom classifier with Caffe* that should be able to detect Acute Lymphoblastic Leukemia (ALL). I chose Caffe because I enjoyed working with it in a previous project and I like how intuitive it is to define layers in prototxt files. My R&D will, however, also include replicating both the augmentation script and the classifier in different languages and frameworks to compare results.

Previously I followed the Leukemia Blood Cell Image Classification Using Convolutional Neural Network paper by T. T. P. Thanh, Caleb Vununu, Sukhrob Atoev, Suk-Hwan Lee, and Ki-Ryong Kwon to create a simple data augmentation program that matches the methods carried out in the paper. This was my first time translating a research paper into code, and although the resulting code is fairly basic in this case (mostly a wrapper around OpenCV* functions), it was a cool experience.

Within the AML/ALL AI Research Project there is a GitHub* repository dedicated to open source classifiers. In it, the team and the GitHub developer community we hope to attract will share tutorials that use various languages, frameworks and technologies to create convolutional neural networks.

In this technical article I will explain my experience of creating a custom convolutional neural network in Caffe using an architecture based on the Acute Myeloid Leukemia Classification Using Convolution Neural Network In Clinical Decision Support System paper by Thanh.TTP, Giao N. Pham, Jin-Hyeok Park, Kwang-Seok Moon, Suk-Hwan Lee, and Ki-Ryong Kwon, and the Acute Lymphoblastic Leukemia Image Database for Image Processing dataset by Fabio Scotti from the University of Milan.

In the augmentation paper, the authors mention that they were unable to reach a good accuracy using the augmented dataset. I will try to reproduce this, and if I am unable to get good results I will work on recreating the proposed architecture from the beginning of the augmentation paper.

“Our experiments were conducted on Matlab with 1188 images, 70% (831 images) of them for training and the remaining 30% (357 images) for testing our model. The slightly narrow architecture used dramatically failed to reach an appropriate accuracy when applied to this augmented dataset. Therefore, we have presented here a deeper CNN architecture and changed the size of the input volume in order to improve the accuracy rate of the recognition of leukemia (our proposed CNN model achieved 96.6%)”

Hardware

Operating System

  • Ubuntu 16.04 
  • Ubuntu 18.04

Programming Language

  • Python 3 or above

Software

Caffe Installation

Ubuntu 16.04

In my case I installed Caffe on an UP2, but as stated above this is not a requirement. During installation I ran into issues whilst following the Caffe Ubuntu 16.04 installation guide, which led me to find the following tutorial. 

Follow this tutorial to install Caffe & PyCaffe on Ubuntu 16.04. If the tutorial does not work for you, you will need to work out how to install Caffe and PyCaffe on your development machine and then come back to this tutorial. Installing and debugging Caffe is out of the scope of this tutorial.

If you are installing on an UP2 or similar this may take some time.

Image Placeholder

Ubuntu 18.04

If you are installing Caffe on Ubuntu 18.04 it is a lot easier to get up and running. To install Caffe and PyCaffe on Ubuntu 18.04 you can simply run one of the following commands:

CPU Installation

sudo apt install caffe-cpu

GPU Installation

sudo apt install caffe-cuda

Let's Continue!

Now that we have Caffe installed, I will explain a little bit about it. Caffe is another framework that we can use for building deep learning networks, including convolutional neural networks. I have used Caffe before with the Intel Neural Compute Stick (NCS) and YOLO for object detection, but have never really gone too deep into the framework.

 

Image Placeholder

Figure 1. Proposed Architecture (source)


The CNN paper explains the methods the authors used to define their convolutional neural network’s architecture. Through the prototxt files used by Caffe, we can easily, and fairly visually, set up our layers based on the information found in the paper. For more information about convolutions you can check out Caffe’s convolutions page, or for a more in-depth explanation you can check out CS231n: Convolutional Neural Networks for Visual Recognition. The remainder of this part of the article will focus on Caffe and the layers used in the paper; in the future I will cover convolutions in more detail.

As mentioned above, the authors share information about their architecture in the paper: a 50 x 50 x 3 input layer (an image), two convolutional layers, a max pooling layer, a fully connected layer and a softmax layer as the output. The convolutional layers and the max pooling layer are used for feature detection, while the fully connected and softmax layers are used for feature classification.

Input Layer

The input layer is what feeds data into the network. In our case we were using images that are 257px x 257px with 3 channels, so our input size would need to be 257 x 257 x 3 (height, width, depth). For this project a new augmented dataset will be created using the dimensions specified in the paper.

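We can create a simple input layer using the following in a prototxt file, allCNN.prototxt. The first dim, dim: 1, is the batch size, meaning we will only send one image through the network per iteration. The remaining values, dim: 3, dim: 50 and dim: 50, are the channels, height and width of the image. Note that Caffe orders these as channels, height, width, whereas print(image.shape) in OpenCV (CV2) reports height, width, channels, that is (50, 50, 3).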

layer {
  name: "data"
  type: "Input"
  input_param { shape: { dim: 1 dim: 3 dim: 50 dim: 50 }}
  top: "data"
}
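
To make the difference in dimension order concrete, here is a minimal sketch in plain Python that reorders an image from OpenCV's (height, width, channels) layout to the (batch, channels, height, width) layout that the input layer declares. The function names hwc_to_nchw and shape_of are hypothetical helpers written for this illustration, and nested lists stand in for a real image array; in practice you would more likely use NumPy's transpose on the array returned by OpenCV.

```python
def hwc_to_nchw(image):
    """Reorder a nested-list image from (height, width, channels)
    to (batch, channels, height, width) with a batch size of one."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # wrap in a batch dimension of size 1


def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# A dummy 50 x 50 image with 3 channels, laid out as (H, W, C).
image = [[[0, 0, 0] for _ in range(50)] for _ in range(50)]
blob = hwc_to_nchw(image)
print(shape_of(image))  # (50, 50, 3)
print(shape_of(blob))   # (1, 3, 50, 50)
```

The second printed shape matches the dim values in the input layer: 1, 3, 50, 50.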

Feature Detection Layers

Convolutional Layers

Image Placeholder

Figure 2. Convolutional Layer (source)


As mentioned in the paper, two convolution layers were used in the proposed architecture. The convolutional layers produce a feature map of a filter’s output activations. During convolution a filter is moved across the image, and at each position it produces one new pixel in the output feature map.

We can define the layers as shown below. You will notice the bottom and top settings: bottom names the blob this layer reads from, here the data (input) layer, and top names the blob it writes to, here conv1 itself. num_output is the number of filters, kernel_size is the size of the filters, and stride is how many pixels the kernel moves by at each step. pad is the number of pixels of zero padding added around each edge of the input; with a 5 x 5 kernel and a stride of 1, a pad of 2 keeps the output at the same 50 x 50 size as the input. engine (not set in the example below) optionally specifies which engine the layer will use (CAFFE/CUDNN). weight_filler initializes the weights; we use the xavier algorithm, which allows us to keep a stable signal. Finally, bias_filler initializes the bias to 0. In the future I will cover more information about these parameters.

layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 30
    kernel_size: 5
    stride: 1
    pad: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "data"
  top: "conv1"
}
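
As a quick check of how kernel_size, stride and pad interact, the sketch below computes the spatial output size of a convolution with the usual formula, floor((input + 2 * pad - kernel) / stride) + 1, and counts the learnable parameters of each convolutional layer. The helper names conv_output_size and conv_param_count are hypothetical and exist only for this illustration; the input values are the ones from the layer definition above.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels, num_output, kernel_size):
    """Number of weights plus one bias per filter."""
    return num_output * (in_channels * kernel_size * kernel_size) + num_output


# With pad 2 the 50 x 50 input keeps its size; without padding it shrinks.
print(conv_output_size(50, kernel_size=5, stride=1, pad=2))  # 50
print(conv_output_size(50, kernel_size=5, stride=1, pad=0))  # 46

# conv1 reads 3 channels, conv2 reads the 30 feature maps produced by conv1.
print(conv_param_count(3, 30, 5))   # 2280
print(conv_param_count(30, 30, 5))  # 22530
```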

Pooling Layer

Image Placeholder

Figure 3. Max Pooling Layer (source)


The authors propose a pooling layer as the final layer in the feature extraction layers. Pooling layers help to reduce overfitting by reducing the size of the representation and the amount of activations/computation used by the network.

The authors state that they use a 25 x 25 layer with a filter size of two and a stride of two. We can define the pooling layer in the allCNN.prototxt file with the following:

layer {
 name: "pool1"
 type: "Pooling"
 pooling_param {
   pool: MAX
   kernel_size: 2
   stride: 2
 }
 bottom: "conv2"
 top: "pool1"
}
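
To show what max pooling with a filter size of two and a stride of two does to a feature map, here is a minimal plain-Python sketch. The function max_pool_2d is a hypothetical helper for illustration only, it is not how Caffe implements pooling, and it assumes the windows tile the input exactly with no padding.

```python
def max_pool_2d(feature_map, kernel_size=2, stride=2):
    """Max pooling over one 2D feature map given as a list of rows.

    Assumes the windows tile the input exactly (no padding).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for top in range(0, height - kernel_size + 1, stride):
        row = []
        for left in range(0, width - kernel_size + 1, stride):
            window = [
                feature_map[y][x]
                for y in range(top, top + kernel_size)
                for x in range(left, left + kernel_size)
            ]
            row.append(max(window))  # keep only the strongest activation
        pooled.append(row)
    return pooled


example = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [6, 7, 3, 2],
]
print(max_pool_2d(example))  # [[4, 5], [7, 9]]

# A 50 x 50 map pooled with kernel 2 and stride 2 becomes 25 x 25.
big = [[0] * 50 for _ in range(50)]
pooled_big = max_pool_2d(big)
print(len(pooled_big), len(pooled_big[0]))  # 25 25
```

Each 2 x 2 window is reduced to its largest value, which is why the 50 x 50 output of conv2 becomes 25 x 25 after pool1.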

Feature Classification Layers

Fully Connected Layer

Figure 4. Fully Connected Layers (source)

 

The proposed architecture for feature classification includes a two-neuron fully connected, or inner product, layer. The name fully connected means that every neuron in the layer is connected to all of the activations of the layer it follows. Fully connected layers are used together with a softmax output layer to classify the input image into the trained classes. For more information about fully connected layers visit this link.

The authors specify a fully connected layer with two neurons. We can recreate this layer using the following in the allCNN.prototxt file.

layer {
 name: "fc"
 type: "InnerProduct"
 inner_product_param {
   num_output: 2
   weight_filler {
     type: "xavier"
   }
   bias_filler {
     type: "constant"
     value: 0
   }
 }
 bottom: "pool1"
 top: "fc"
}
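
The computation an inner product layer performs can be sketched in a few lines of plain Python: each output neuron is a weighted sum of all of its inputs plus a bias. The function inner_product and the toy weights below are made up for illustration; in the real layer the weights come from the xavier filler and are then learned during training.

```python
def inner_product(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


# Toy example: 4 flattened inputs feeding 2 neurons (values are made up).
inputs = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0.5, 0.0, -0.5, 1.0],  # neuron 0
    [1.0, 1.0, 1.0, 1.0],   # neuron 1
]
biases = [0.0, 0.0]
print(inner_product(inputs, weights, biases))  # [3.0, 10.0]

# pool1 outputs 30 maps of 25 x 25, so each of the two neurons sees
# 18750 inputs once the maps are flattened.
flattened_inputs = 30 * 25 * 25
fc_params = 2 * flattened_inputs + 2  # weights plus one bias per neuron
print(flattened_inputs, fc_params)  # 18750 37502
```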

Softmax Layer

Figure 5. Softmax Layer (source)

 

The softmax layer proposed in the paper outputs a probability distribution over the trained classes: one probability per class for the input image, with the probabilities adding up to 1.0. For more information about softmax you can visit this link.

We can recreate the proposed softmax layer in allCNN.prototxt using the following:

layer {
 name: "prob"
 type: "Softmax"
 bottom: "fc"
 top: "prob"
}
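
For reference, the softmax function itself is small enough to sketch in plain Python. The scores below are made-up stand-ins for the two outputs of the fc layer, and subtracting the largest score before exponentiating is a common numerical stability trick rather than something taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.0."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.0])  # made-up scores for the two classes
print(probs)       # the larger score gets the larger probability
print(sum(probs))  # approximately 1.0
```

Equal scores give equal probabilities, so softmax([1.0, 1.0]) returns 0.5 for each class.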

Conclusion

In allCNN.prototxt we should now have the architecture proposed in the Acute Myeloid Leukemia Classification Using Convolution Neural Network In Clinical Decision Support System paper. It is not quite ready for training yet, but we can use it to check that the network matches the one proposed in the paper, and to visualize the network.

layer {
  name: "data"
  type: "Input"
  input_param { shape: { dim: 1 dim: 3 dim: 50 dim: 50 }}
  top: "data"
}
layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 30
    kernel_size: 5
    stride: 1
    pad: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "data"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  convolution_param {
    num_output: 30
    kernel_size: 5
    stride: 1
    pad: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "conv1"
  top: "conv2"
}
layer {
 name: "pool1"
 type: "Pooling"
 pooling_param {
   pool: MAX
   kernel_size: 2
   stride: 2
 }
 bottom: "conv2"
 top: "pool1"
}
layer {
 name: "fc"
 type: "InnerProduct"
 inner_product_param {
   num_output: 2
   weight_filler {
     type: "xavier"
   }
   bias_filler {
     type: "constant"
     value: 0
   }
 }
 bottom: "pool1"
 top: "fc"
}
layer {
 name: "prob"
 type: "Softmax"
 bottom: "fc"
 top: "prob"
}

Using the Info script in the AML / ALL Classifier repository, we can check that our networks match. First you need to clone the repository using the following command:

git clone https://github.com/AMLResearchProject/AML-ALL-Classifiers.git

Then navigate to the allCNN directory:

cd AML-ALL-Classifiers/Python/_Caffe/allCNN

Now you need to install any requirements:

sed -i 's/\r//' Setup.sh 
sh Setup.sh

And finally we can check our network:

python3 Info.py NetworkInfo

The output of the script will include the following, showing that our network was created correctly:

I0309 16:11:30.786394 13920 layer_factory.hpp:77] Creating layer data
I0309 16:11:30.786425 13920 net.cpp:86] Creating Layer data
I0309 16:11:30.786440 13920 net.cpp:382] data -> data
I0309 16:11:30.786473 13920 net.cpp:124] Setting up data
I0309 16:11:30.786490 13920 net.cpp:131] Top shape: 1 3 50 50 (7500)
I0309 16:11:30.786499 13920 net.cpp:139] Memory required for data: 30000
I0309 16:11:30.786507 13920 layer_factory.hpp:77] Creating layer conv1
I0309 16:11:30.786526 13920 net.cpp:86] Creating Layer conv1
I0309 16:11:30.786538 13920 net.cpp:408] conv1 <- data
I0309 16:11:30.786551 13920 net.cpp:382] conv1 -> conv1
I0309 16:11:30.786855 13920 net.cpp:124] Setting up conv1
I0309 16:11:30.786872 13920 net.cpp:131] Top shape: 1 30 50 50 (75000)
I0309 16:11:30.786880 13920 net.cpp:139] Memory required for data: 330000
I0309 16:11:30.786901 13920 layer_factory.hpp:77] Creating layer conv2
I0309 16:11:30.786921 13920 net.cpp:86] Creating Layer conv2
I0309 16:11:30.786931 13920 net.cpp:408] conv2 <- conv1
I0309 16:11:30.786943 13920 net.cpp:382] conv2 -> conv2
I0309 16:11:30.787359 13920 net.cpp:124] Setting up conv2
I0309 16:11:30.787375 13920 net.cpp:131] Top shape: 1 30 50 50 (75000)
I0309 16:11:30.787384 13920 net.cpp:139] Memory required for data: 630000
I0309 16:11:30.787398 13920 layer_factory.hpp:77] Creating layer pool1
I0309 16:11:30.787411 13920 net.cpp:86] Creating Layer pool1
I0309 16:11:30.787420 13920 net.cpp:408] pool1 <- conv2
I0309 16:11:30.787432 13920 net.cpp:382] pool1 -> pool1
I0309 16:11:30.787456 13920 net.cpp:124] Setting up pool1
I0309 16:11:30.787470 13920 net.cpp:131] Top shape: 1 30 25 25 (18750)
I0309 16:11:30.787478 13920 net.cpp:139] Memory required for data: 705000
I0309 16:11:30.787487 13920 layer_factory.hpp:77] Creating layer fc
I0309 16:11:30.787501 13920 net.cpp:86] Creating Layer fc
I0309 16:11:30.787510 13920 net.cpp:408] fc <- pool1
I0309 16:11:30.787523 13920 net.cpp:382] fc -> fc
I0309 16:11:30.788055 13920 net.cpp:124] Setting up fc
I0309 16:11:30.788071 13920 net.cpp:131] Top shape: 1 2 (2)
I0309 16:11:30.788079 13920 net.cpp:139] Memory required for data: 705008
I0309 16:11:30.788110 13920 layer_factory.hpp:77] Creating layer prob
I0309 16:11:30.788125 13920 net.cpp:86] Creating Layer prob
I0309 16:11:30.788133 13920 net.cpp:408] prob <- fc
I0309 16:11:30.788144 13920 net.cpp:382] prob -> prob
I0309 16:11:30.788161 13920 net.cpp:124] Setting up prob
I0309 16:11:30.788175 13920 net.cpp:131] Top shape: 1 2 (2)
I0309 16:11:30.788183 13920 net.cpp:139] Memory required for data: 705016
I0309 16:11:30.788197 13920 net.cpp:202] prob does not need backward computation.
I0309 16:11:30.788205 13920 net.cpp:202] fc does not need backward computation.
I0309 16:11:30.788214 13920 net.cpp:202] pool1 does not need backward computation.
I0309 16:11:30.788223 13920 net.cpp:202] conv2 does not need backward computation.
I0309 16:11:30.788231 13920 net.cpp:202] conv1 does not need backward computation.
I0309 16:11:30.788240 13920 net.cpp:202] data does not need backward computation.
I0309 16:11:30.788249 13920 net.cpp:244] This network produces output prob
I0309 16:11:30.788262 13920 net.cpp:257] Network initialization done.
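
The Top shape and Memory required for data lines in this log can be cross-checked by hand. The sketch below takes the top shapes copied from the log, multiplies out each shape to get the element count shown in brackets, and keeps a running total of bytes. It assumes 4 bytes per value (32-bit floats), which is an assumption on my part, although the totals it prints do agree with the log.

```python
# Top shapes copied from the log above, in layer order.
top_shapes = [
    ("data", (1, 3, 50, 50)),
    ("conv1", (1, 30, 50, 50)),
    ("conv2", (1, 30, 50, 50)),
    ("pool1", (1, 30, 25, 25)),
    ("fc", (1, 2)),
    ("prob", (1, 2)),
]

BYTES_PER_VALUE = 4  # assumption: 32-bit floats


def element_count(shape):
    """Multiply the dimensions of a shape together."""
    count = 1
    for dim in shape:
        count *= dim
    return count


running_total = 0
memory_by_layer = {}
for name, shape in top_shapes:
    count = element_count(shape)
    running_total += count * BYTES_PER_VALUE
    memory_by_layer[name] = running_total
    print(f"{name}: count={count}, memory so far={running_total}")
```

The final running total is 705016, the same value reported on the last Memory required for data line.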

Visualization

Finally we can visualize how our network looks. The way you installed Caffe determines which path to draw_net.py you need to use. Ensure you are in the AML-ALL-Classifiers/Python/_Caffe/allCNN directory and execute the relevant command from the following:

Built From Source

Replace YourCaffePythonLocation with the location of the python directory in your Caffe installation directory.

python3 /YourCaffePythonLocation/draw_net.py Model/allCNN.prototxt Model/allCNN.png

Installed Caffe CPU Using Apt

python3 /usr/share/doc/python3-caffe-cpu/examples/draw_net.py Model/allCNN.prototxt Model/allCNN.png

Installed Caffe Cuda Using Apt

python3 /usr/share/doc/python3-caffe-cuda/examples/draw_net.py Model/allCNN.prototxt Model/allCNN.png

The above command will produce the following image which will be found in the AML-ALL-Classifiers/Python/_Caffe/allCNN/Model directory:

allCNN Architecture

 

To save our network we can use the following command:

python3 Info.py Save

This will save the network to the location Model/allCNN.caffemodel. In the next part of this series of articles I will cover preparing the Acute Lymphoblastic Leukemia Image Database for Image Processing dataset for training with our network.

Thanks to AML/ALL AI Research Project team members Amita Kapoor (Associate Professor at Delhi University, New Delhi, India) and Ho Leung Ng (Kansas State University, Dept. Biochemistry & Molecular Biophysics) for their assistance with the article.

Author

Adam is a BigFinite IoT Network Engineer, part of the team that works on the core IoT software. In his spare time he is an Intel Software Innovator in the fields of Internet of Things, Artificial Intelligence and Virtual Reality.
