Acute Myeloid/Lymphoblastic Leukemia Data Augmentation

Published: 03/12/2019   Last Updated: 03/12/2019

Acute Myeloid/Lymphoblastic Leukemia Classifier Data Augmentation

The AML/ALL Classifier Data Augmentation program applies filters to datasets and increases the amount of training / test data available to use. The program is part of the computer vision research and development for the Peter Moss Acute Myeloid/Lymphoblastic (AML/ALL) Leukemia AI Research Project.

Before you start the tutorial below, you should complete the steps in the Augmentation README.

Research papers followed

The papers that this part of the project is based on were provided by project team member, Ho Leung, Associate Professor of Biochemistry & Molecular Biophysics at Kansas State University.

Leukemia Blood Cell Image Classification Using Convolutional Neural Network

T. T. P. Thanh, Caleb Vununu, Sukhrob Atoev, Suk-Hwan Lee, and Ki-Ryong Kwon 

Dataset

The Acute Lymphoblastic Leukemia Image Database for Image Processing dataset is used for this project. The dataset was created by Fabio Scotti, Associate Professor, Dipartimento di Informatica, Università degli Studi di Milano. Big thanks to Fabio for the research and time he put into creating the dataset and documentation; it is one of his personal projects. You will need to follow the steps outlined here to gain access to the dataset.

Data augmentation

I decided to use some of the augmentation proposals outlined in Leukemia Blood Cell Image Classification Using Convolutional Neural Network by T. T. P. Thanh, Caleb Vununu, Sukhrob Atoev, Suk-Hwan Lee, and Ki-Ryong Kwon. The augmentations I chose to start with were grayscaling, histogram equalization, horizontal and vertical reflection, Gaussian blur and translation. Using these techniques, I have so far been able to increase a dataset from 39 positive and 39 negative images to 1053 positive and 1053 negative, with more augmentations to experiment with.
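The jump from 39 to 1053 images per class is a factor of 27. One accounting that is consistent with these figures is sketched below. It assumes every source image yields one file from each single-output step described later in this article (resize, grayscale, histogram equalization, Gaussian blur, translation), two files from reflection, and 20 files from rotation. The per-step counts are inferred from the functions shown in this article and are not figures stated by the project.

```python
# Files written per source image for each augmentation step.
# These counts are assumptions inferred from the functions in this article.
files_per_step = {
    "resize": 1,
    "grayscale": 1,
    "histogram_equalization": 1,
    "reflection": 2,   # one copy per flip axis
    "gaussian_blur": 1,
    "translation": 1,
    "rotation": 20,    # the rotation loop runs 20 times
}


def augmented_total(source_images, per_step):
    """Total files produced when every step runs once per source image."""
    return source_images * sum(per_step.values())


per_image = sum(files_per_step.values())
total = augmented_total(39, files_per_step)
print(per_image, total)
```

Under these assumptions each source image produces 27 files, which matches the reported growth from 39 to 1053 images per class.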

The full Python* class that holds the functions mentioned below can be found in Classes/Data.py. The Data class is a wrapper class around related functions provided in popular computer vision libraries, including OpenCV* and SciPy.

Resizing

The first step is to resize the image, which is done with the following function:

def resize(self, filePath, savePath, show = False):
    
    ###############################################################
    #
    # Writes a resized copy of the image at filePath to savePath. 
    #
    ###############################################################

    image = cv2.resize(cv2.imread(filePath), self.fixed)
    self.writeImage(savePath, image)
    self.filesMade += 1
    print("Resized image written to: " + savePath)
    
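    if show is True:
        # OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        plt.show()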
        
    return image

Grayscaling

Grayscale images have a single channel, so they are generally less complex than color images and result in a less complex model. In the paper, the authors describe using grayscaling to create more data easily. To create a grayscale copy of each image, I wrapped the built-in OpenCV function, cv2.cvtColor(). The created images will be saved to the relevant directories in the default configuration.

def grayScale(self, image, grayPath, show = False):
    
    ###############################################################
    #
    # Writes a grayscale copy of the image to the filepath provided. 
    #
    ###############################################################
    
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    self.writeImage(grayPath, gray)
    self.filesMade += 1
    print("Grayscaled image written to: " + grayPath)
    
    if show is True:
        # Use a gray colormap; matplotlib otherwise applies its default colormap
        plt.imshow(gray, cmap="gray")
        plt.show()
        
    return image, gray
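To make the conversion concrete, the following is a minimal NumPy sketch of the weighted channel sum that a BGR to grayscale conversion computes (gray = 0.299 R + 0.587 G + 0.114 B). The function name bgr_to_gray and the sample pixels are illustrative and not part of the Data class, and the rounding here may differ slightly from OpenCV's own implementation.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Weighted channel sum for an H x W x 3 uint8 array in BGR order."""
    weights = np.array([0.114, 0.587, 0.299])  # B, G, R
    gray = image_bgr.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# Hypothetical 2 x 2 sample: one white pixel, one pure red pixel, two black
sample = np.zeros((2, 2, 3), dtype=np.uint8)
sample[0, 0] = (255, 255, 255)  # white
sample[0, 1] = (0, 0, 255)      # pure red in BGR order
gray = bgr_to_gray(sample)
print(gray)
```

The three channels collapse into one, which is why the grayscale copies are cheaper to store and process than the color originals.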

Histogram Equalization

Histogram equalization spreads the pixel intensities of an image across the full available range, which stretches the histogram toward both ends and increases contrast. The paper describes histogram equalization as a way to enhance contrast.

In the case of this dataset, it makes both the white and red blood cells more distinguishable. The created images will be saved to the relevant directories in the default configuration.

def equalizeHist(self, gray, histPath, show = False):
    
    ###############################################################
    #
    # Writes histogram equalized copy of the image to the filepath 
    # provided. 
    #
    ###############################################################
    
    hist = cv2.equalizeHist(gray)
    # Reuse the equalized image rather than computing it a second time
    self.writeImage(histPath, hist)
    self.filesMade += 1
    print("Histogram equalized image written to: " + histPath)
    
    if show is True:
        plt.imshow(hist, cmap="gray")
        plt.show()
        
    return hist
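As a rough illustration of what equalization does, here is a textbook version based on the cumulative histogram, written with NumPy only. It is a sketch of the general technique, not the code OpenCV runs, so the output may differ from cv2.equalizeHist in rounding. The function name equalize_hist and the sample array are hypothetical.

```python
import numpy as np


def equalize_hist(gray):
    """Equalize an H x W uint8 image using its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # CDF value at the darkest level present
    total = gray.size
    if total == cdf_min:        # flat image: nothing to stretch
        return gray.copy()
    # Map each intensity level so the darkest becomes 0 and the brightest 255
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# Hypothetical low-contrast patch: all values sit between 100 and 120
low_contrast = np.array([[100, 100, 110], [110, 120, 120]], dtype=np.uint8)
stretched = equalize_hist(low_contrast)
print(stretched)
```

The input only spans 20 intensity levels, while the output spans the full 0 to 255 range, which is the contrast gain described above.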

Reflection

Reflection is a way of increasing your dataset by creating a copy that is flipped around its X axis and a copy that is flipped around its Y axis. The reflection function below uses the built-in OpenCV function, cv2.flip, with flip code 0 (around the X axis, top to bottom) and flip code 1 (around the Y axis, left to right). The created images will be saved to the relevant directories in the default configuration.

def reflection(self, image, horPath, verPath, show = False):
    
    ###############################################################
    #
    # Writes reflected copies of the image to the filepaths 
    # provided. 
    #
    ###############################################################
    
    # Flip code 0 flips around the x-axis (top to bottom)
    horImg = cv2.flip(image, 0)
    self.writeImage(horPath, horImg)
    self.filesMade += 1
    print("Horizontally reflected image written to: " + horPath)
    
    if show is True:
        plt.imshow(horImg)
        plt.show()
        
    # Flip code 1 flips around the y-axis (left to right)
    verImg = cv2.flip(image, 1)
    self.writeImage(verPath, verImg)
    self.filesMade += 1
    print("Vertically reflected image written to: " + verPath)
    
    if show is True:
        plt.imshow(verImg)
        plt.show()
        
    return horImg, verImg
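The two flip codes can be pictured as reversing the rows or the columns of the image array. The NumPy sketch below shows the equivalent slicing on a tiny array; the helper names flip_x and flip_y are illustrative only and are not part of the Data class.

```python
import numpy as np


def flip_x(image):
    """Reverse the rows: the same result as flip code 0 (around the x-axis)."""
    return image[::-1, ...]


def flip_y(image):
    """Reverse the columns: the same result as flip code 1 (around the y-axis)."""
    return image[:, ::-1, ...]


image = np.arange(6).reshape(2, 3)   # rows: [0 1 2] and [3 4 5]
flipped_x = flip_x(image)
flipped_y = flip_y(image)
print(flipped_x)
print(flipped_y)
```

Applying either flip twice returns the original array, so each reflection adds exactly one new variant per axis.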

Gaussian Blur

Gaussian blur smooths an image by averaging each pixel with its neighbors using Gaussian weights, and it is widely used in computer vision. The function below uses the SciPy ndimage.gaussian_filter function with a sigma of 5.11. The created images will be saved to the relevant directories in the default configuration.

def gaussian(self, filePath, gaussianPath, show = False):
    
    ###############################################################
    #
    # Writes gaussian blurred copy of the image to the filepath 
    # provided. 
    #
    ###############################################################
    
    gaussianBlur = ndimage.gaussian_filter(plt.imread(filePath), sigma=5.11)
    self.writeImage(gaussianPath, gaussianBlur)
    self.filesMade += 1
    print("Gaussian image written to: " + gaussianPath)

    if show is True:
        plt.imshow(gaussianBlur)
        plt.show()
        
    return gaussianBlur
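One detail worth knowing: when ndimage.gaussian_filter receives a single sigma value, it applies that sigma along every axis of the array, which for a height x width x channels color image includes the channel axis. The sketch below is a NumPy-only illustration of blurring along the two spatial axes only, so the color channels stay separate. The helper names, the edge padding, and the test image are assumptions made for this example and do not reproduce SciPy's boundary handling. With SciPy itself, a comparable effect should be obtainable by passing a per-axis sigma sequence such as (5.11, 5.11, 0).

```python
import numpy as np


def gaussian_kernel(sigma, truncate=4.0):
    """1-D Gaussian weights, normalized to sum to 1."""
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def blur_spatial(image, sigma):
    """Blur an H x W x C array along rows and columns only."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2

    def smooth(line):
        # Repeat the edge values so the output keeps its length
        padded = np.pad(line, radius, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    out = np.apply_along_axis(smooth, 0, image.astype(np.float64))
    return np.apply_along_axis(smooth, 1, out)


# Hypothetical test image: only the first channel carries a signal
image = np.zeros((32, 32, 3))
image[12:20, 12:20, 0] = 1.0
blurred = blur_spatial(image, sigma=2.0)
print(blurred[..., 0].max(), blurred[..., 1].max())
```

In this sketch the empty channels stay empty after blurring, whereas a blur that also runs along the channel axis would spread the signal into them.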

Translation

Translation is a type of affine transformation that shifts the image within its frame. The function below uses the cv2.warpAffine function with a fixed offset of 84 pixels along X and 56 pixels along Y. The created images will be saved to the relevant directories in the default configuration.

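def translate(self, image, translatedPath, show = False):

    ###############################################################
    #
    # Writes a translated copy of the image to the filepath 
    # provided. 
    #
    ###############################################################

    # Slice the shape so single-channel images also work
    y, x = image.shape[:2]
    # Shift 84 pixels right and 56 pixels down, keeping the frame size
    translated = cv2.warpAffine(image, np.float32([[1, 0, 84], [0, 1, 56]]), (x, y))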
    self.writeImage(translatedPath, translated)
    self.filesMade += 1
    print("Translated image written to: " + translatedPath)

    if show is True:
        plt.imshow(translated)
        plt.show()

    return translated
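The 2 x 3 matrix passed to cv2.warpAffine encodes the shift: multiplying it by a point written as (x, y, 1) gives (x + 84, y + 56). The NumPy sketch below checks that mapping for one point and shows a plain array version of the same kind of shift on a tiny image, with vacated pixels filled with zeros. The helper name translate and the sample values are illustrative only.

```python
import numpy as np


def translate(image, tx, ty):
    """Shift an H x W (x C) array by tx columns and ty rows, zero-filling."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    if abs(tx) >= w or abs(ty) >= h:
        return out  # everything was shifted out of the frame
    # Part of the source that is still inside the frame after the shift
    src = image[max(0, -ty):min(h, h - ty), max(0, -tx):min(w, w - tx)]
    row, col = max(0, ty), max(0, tx)
    out[row:row + src.shape[0], col:col + src.shape[1]] = src
    return out


# The same matrix used in the function above, applied to the point (10, 20)
M = np.float32([[1, 0, 84], [0, 1, 56]])
moved = M @ np.array([10, 20, 1])

small = np.arange(9).reshape(3, 3)
shifted = translate(small, 1, 1)
print(moved)
print(shifted)
```

Because the output frame keeps the original size, anything pushed past the right or bottom edge is lost, so a fixed offset suits images where the cell of interest is not close to those edges.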

Rotation

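Rotation creates additional copies of the original image, each turned by a random angle. The function below uses the PIL Image.rotate function to write 20 copies per image, each rotated by a random whole number of degrees between -180 and 180 and then resized to the configured image dimensions. The created images will be saved to the relevant directories in the default configuration.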

def rotation(self, path, filePath, filename, show = False): 
    
    ###############################################################
    #
    # Writes rotated copies of the image to the filepath 
    # provided. 
    #
    ###############################################################
    
    img = Image.open(filePath)
    dims = (self.confs["Settings"]["ImgDims"], self.confs["Settings"]["ImgDims"])

    for i in range(0, 20):
        randDeg = random.randint(-180, 180)
        fullPath = os.path.join(path, str(randDeg) + '-' + str(i) + '-' + filename)

        try:
            # save() returns None, so keep a reference to the rotated image
            rotated = img.rotate(randDeg, expand=True).resize(dims)
            rotated.save(fullPath)
            self.filesMade += 1
            print("Rotated image written to: " + fullPath)

            if show is True:
                rotated.show()
        except Exception:
            print("File was not written! " + filename)
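With expand=True the canvas grows so that the whole rotated image fits, and the following resize then squeezes that larger canvas back to the configured dimensions. The sketch below estimates the expanded canvas size from the rotation angle using the bounding box of a rotated rectangle. It is a geometric approximation; the exact pixel dimensions produced by PIL involve rounding, and the function name expanded_size is hypothetical.

```python
import math


def expanded_size(width, height, degrees):
    """Bounding box (width, height) of a rectangle rotated by `degrees`."""
    theta = math.radians(degrees)
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    return (width * cos_t + height * sin_t, width * sin_t + height * cos_t)


for angle in (0, 45, 90):
    w, h = expanded_size(100, 100, angle)
    print(angle, round(w, 2), round(h, 2))
```

For a square image the canvas is largest at 45 degrees, about 1.41 times the original side, so after resizing the cell appears smaller in copies rotated by oblique angles than in copies rotated by multiples of 90 degrees.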

 

Clone the code from the GitHub* repo

You will need to clone the project from our GitHub* repository to whichever device you are going to run it on.

$ git clone https://github.com/AMLResearchProject/AML-ALL-Classifiers.git

Dataset Access

The Acute Lymphoblastic Leukemia Image Database for Image Processing by Fabio Scotti, Associate Professor, Dipartimento di Informatica, Università degli Studi di Milano, is used with this project. You can request access by following the instructions on the Download and Term of use page. You can also view Reporting the results on ALL-IDB for information on how to organize and submit your findings.

Once you have access to the dataset, add your images to the 0 and 1 directories in the Model/Data directory. If you configure the project the same way, you do not need to change any settings.
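Before running the augmentation, it can help to confirm that both class directories contain the images you expect. The following is a small sketch that counts image files in the 0 and 1 directories. The helper name count_images and the list of file extensions are assumptions made for this example; adjust them to match the files you received.

```python
from pathlib import Path


def count_images(data_dir, classes=("0", "1"),
                 extensions=(".jpg", ".jpeg", ".png", ".tif", ".bmp")):
    """Count image files in each class directory under data_dir."""
    data_dir = Path(data_dir)
    counts = {}
    for name in classes:
        class_dir = data_dir / name
        if not class_dir.is_dir():
            counts[name] = 0   # missing directory: report zero files
            continue
        counts[name] = sum(
            1 for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in extensions
        )
    return counts


# Run from the directory that contains Model/Data
print(count_images("Model/Data"))
```

Equal counts in both directories before augmentation lead to equal counts afterwards, since every source image goes through the same set of steps.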

The Data Augmentation Notebook

In AML-ALL-Classifiers/Python/Augmentation, follow the steps outlined in the augmentation installation guide named Augmentation.ipynb on your Jupyter server. The notebook and related README provide a full walkthrough of setting up and using the data augmentation program.

Contributing

We welcome contributions to the project. Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Versioning

We use SemVer for versioning. For the versions available, see Releases.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Bugs/Issues

We use the repo issues to track bugs and general requests related to using this project.

About The Author

Adam is a BigFinite IoT Network Engineer, part of the team that works on the core IoT software for our platform. In his spare time he is an Intel Software Innovator in the fields of Internet of Things, Artificial Intelligence and Virtual Reality.

Adam Milton-Barker

 
