Deep Learning for Cancer Diagnosis: A Bright Future

Cancer is a leading cause of death and affects millions of lives every year. Detecting it early could improve survival for many patients1 and save billions of dollars.2 Most healthcare data come from ‘omics’ (such as genomics, transcriptomics, proteomics, or metabolomics), clinical trials, and research and pharmacological studies. Such data are highly complex, variable, and multidimensional, and they sometimes come from incompatible sources. Unfortunately, the bulk of these data remains underutilized, even though it could be used for biomarker identification and drug discovery.

Deep learning (DL) is a member of the larger machine learning (ML) and artificial intelligence (AI) family. It has been applied in many fields, including computer vision, speech recognition, natural language processing, object detection, and audio recognition.3 Deep learning architectures, including deep neural networks (DNNs) and recurrent neural networks (RNNs), have been steadily improving the state of the art in drug discovery and disease diagnosis.4 Deep learning has the potential to achieve good accuracy in diagnosing various types of cancer, such as breast, colon, cervical, and lung cancer. A deep learning model learns from data by passing it through multiple processing layers of neurons5 (see Figure 1). However, the output quality (for example, the accuracy) of any deep learning model depends on multiple factors, including the data type (numeric, text, image, sound, video), the data size, the model architecture, and the data ETL (extract, transform, load) process.

Figure 1
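To make the idea of multiple processing layers of neurons concrete, here is a minimal sketch of a forward pass through a small fully connected network. The layer sizes, the random weights, and the four input values are illustrative assumptions, not taken from any study cited in this article, and the network is untrained.

```python
import math
import random


def relu(values):
    """Rectified linear activation applied to every neuron in a layer."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a single value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of all inputs plus a bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer with n_in inputs and n_out neurons."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Pass the features through every hidden layer, then a single sigmoid output."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activations, weights, biases)[0])


rng = random.Random(0)
layer_sizes = [4, 8, 8, 1]  # hypothetical: 4 inputs, two hidden layers, 1 output
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

sample = [0.2, 0.7, 0.1, 0.9]  # made-up, already normalized measurements
score = forward(sample, layers)
print(f"untrained network output: {score:.3f}")
```

Training would adjust the weights and biases so that the output score tracks a label such as malignant versus benign; the sketch only shows how data flows through the layers.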

In this article we explore how deep learning has been successfully applied in several areas of oncology (the study of cancer diagnosis and treatment). The article offers an introduction to deep learning for medical and paramedical professionals, educators, and students. It also highlights areas of healthcare where data science professionals, such as scientists, data engineers, and developers, can take the lead in building products and services that use Intel® technologies.

Identification of Biomarkers Useful for Cancer Diagnosis Using Deep Learning

Figure 2

The human genome is a complex sequence of nucleic acids. It is encoded as DNA within 23 chromosomes.6 It is well known that gene expression changes with circumstances and that such changes regulate many biological functions. Interestingly, the expression of certain genes changes only under specific pathological conditions (such as cancer) or with treatment. These genes are called biomarkers for a specific tumor. Recently, a group of scientists from Oregon State University used a deep learning approach to identify genes critical for the diagnosis of breast cancer. As shown in Figure 2, they used a stacked denoising autoencoder (SDAE) for feature extraction and then applied supervised classification models to verify the new features in cancer detection.7 Another group of scientists from China applied a deep learning model to extract high-level features linking combinatorial somatic point mutations (SPM) and cancer types.8
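The stacked denoising autoencoder idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the cited study: the data are synthetic stand-ins for expression profiles, the masking-noise rate, layer sizes, and learning rate are arbitrary, and the helper names (train_dae, stacked_features) are hypothetical. Each layer is trained to reconstruct its clean input from a corrupted copy, and the hidden codes of one layer become the input of the next.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def clean_error(X, W1, b1, W2, b2):
    """Mean squared reconstruction error on uncorrupted inputs."""
    X_hat = sigmoid(X @ W1 + b1) @ W2 + b2
    return float(np.mean((X_hat - X) ** 2))


def train_dae(X, n_hidden, noise_prob=0.2, lr=0.05, epochs=500, seed=0):
    """Train one denoising autoencoder layer with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    before = clean_error(X, W1, b1, W2, b2)
    for _ in range(epochs):
        # Masking noise: randomly zero a fraction of the input values.
        keep = rng.random(X.shape) >= noise_prob
        X_noisy = X * keep
        H = sigmoid(X_noisy @ W1 + b1)   # encoder
        X_hat = H @ W2 + b2              # linear decoder
        # Gradient of the squared error against the CLEAN input,
        # averaged over samples.
        grad_out = 2.0 * (X_hat - X) / n_samples
        grad_hidden = (grad_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ grad_hidden)
        b1 -= lr * grad_hidden.sum(axis=0)
    after = clean_error(X, W1, b1, W2, b2)
    return (W1, b1), before, after


def stacked_features(X, hidden_sizes):
    """Greedy layer-wise stacking: each layer is trained on the previous layer's codes."""
    features, history = X, []
    for n_hidden in hidden_sizes:
        (W1, b1), before, after = train_dae(features, n_hidden)
        history.append((before, after))
        features = sigmoid(features @ W1 + b1)
    return features, history


# Synthetic data: 200 samples, 20 features driven by 3 hidden factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) / np.sqrt(3)

features, history = stacked_features(X, hidden_sizes=[8, 4])
for layer, (before, after) in enumerate(history, start=1):
    print(f"layer {layer}: reconstruction error {before:.3f} -> {after:.3f}")
print("extracted feature matrix:", features.shape)
```

The compact feature matrix would then be passed to a supervised classifier together with the sample labels, which is the verification step the paragraph above describes.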

Discovering Drug Molecules and Biomarkers Using Deep Learning

Several endogenous molecules (chemical compounds or proteins) circulate in body fluids (blood, urine, cerebrospinal fluid). Some of these molecules are considered to be tumor-specific biomarkers. The discovery of such a molecule or its synthetic analog gives new hope for understanding the mechanisms of disease and for creating therapeutic benefits.

The design of a new molecule is based on historical datasets of existing molecules and targets. In quantitative structure-activity relationship (QSAR) analysis, scientists look for known and novel patterns relating molecular structure to activity. At the Merck Research Laboratory, Ma et al. used a dataset of thousands of compounds (~5000) and built a model based on the architecture of DNNs.9 In another QSAR study, Dahl et al. built neural network models on 19 datasets of 2000‒14000 compounds to predict the activity of new compounds.10 Aliper and colleagues built a deep neural network–support vector machine (DNN-SVM) model that was trained on a large transcriptional response dataset and classified various drugs into therapeutic categories.11

AtomNet was introduced by its authors as the first structure-based deep convolutional neural network for bioactivity prediction. It incorporates structural information about the target to predict the bioactivity of small molecules. This application worked successfully to predict new active molecules for targets with no previously known modulators.12 Furthermore, Altae-Tran et al. introduced a new deep learning architecture called iterative refinement long short-term memory (LSTM), which significantly increases predictive power for specific drug discovery problems even with limited data.13

Feature Detection in Histopathological Images Using Deep Learning

Figure 3

Histopathological images are primarily obtained from thin sections of a tumor. These sections are generally stained with specific colored chemicals or antibodies to distinguish cancerous cells. Recently Kaggle* organized the Intel and MobileODT Cervical Cancer Screening competition to improve the precision and accuracy of cervical cancer screening using deep learning.14 The participants used different deep learning models such as the Faster R-CNN detection framework with VGG16,15 supervised semantic-preserving deep hashing (SSDH), and U-Net convolutional networks.16 As shown in Figure 3, Dr. Silva achieved 81 percent accuracy on the validation test using the Intel® Deep Learning SDK with GoogLeNet* on Caffe*.

Turkki et al. also applied convolutional neural networks (CNNs) and a support vector analysis approach to quantify immune cells in breast cancer slides. They achieved up to 90 percent agreement, similar to the level that pathologists achieve.17 Another application of deep learning is predicting cancer prognosis (that is, estimating the likely course and outcome of the disease). Hyung et al. reported 83.5 percent average accuracy in predicting the survival of patients with gastric cancer.18

Feature Detection in MRI and Ultrasound Images Using Deep Learning

Figure 4

Medical imaging technologies such as computed tomography, magnetic resonance imaging (MRI), and ultrasound are a rich source of tumor images that can be captured noninvasively. Deep learning models can be used to measure tumor growth over time in cancer patients on medication. As shown in Figure 4, Jaeger et al. applied a CNN architecture to diffusion-weighted MRI. Based on an estimation of the properties of the tumor tissue, this architecture reduced false-positive findings and thereby decreased the number of unnecessary invasive biopsies. The researchers noticed that deep learning reduced the motion and vision error and thus provided more stable results in comparison to manual segmentation.19 A study conducted in China showed that deep learning helped to achieve 93 percent accuracy in distinguishing malignant from benign tumors on shear-wave elastography ultrasound images (elastograms) from 200 patients.2,20
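Figures such as accuracy and the false-positive rate come from comparing model predictions with confirmed diagnoses. The sketch below shows how those numbers are computed; the ten labels are made-up values for illustration only (1 = malignant, 0 = benign) and do not come from the studies cited above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = malignant)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division that returns 0.0 when a class is absent."""
    return numerator / denominator if denominator else 0.0


def screening_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),          # malignant cases caught
        "specificity": ratio(tn, tn + fp),          # benign cases cleared
        "false_positive_rate": ratio(fp, fp + tn),  # benign cases flagged
    }


# Hypothetical example: 4 malignant and 6 benign lesions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = screening_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A lower false-positive rate is what translates into fewer unnecessary biopsies, which is why it is worth reporting alongside overall accuracy.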

Identification of Cancer Cell Type Based on Morphological Features of Cells Using Deep Learning

Figure 5

Several participants in a Kaggle competition successfully applied DNNs to the breast cancer dataset obtained from the University of Wisconsin. Based on the features of each cell nucleus (radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, and fractal dimension), a DNN classifier was built to predict breast cancer type (malignant or benign) (Kaggle: Breast Cancer Diagnosis Wisconsin).21 Similarly, Xu et al. investigated a dataset of over 7,000 images of single red blood cells (RBCs) from eight patients with sickle cell disease and applied a deep CNN classifier to distinguish the different RBC types (sickle, elongated, granular, oval, discocytes, and so on) (see Figure 5). The trained deep CNN distinguished the subtle differences in texture alteration inside the oxygenated and deoxygenated RBCs.22
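A classifier of this kind can be sketched with scikit-learn. The example assumes that scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) data is an acceptable stand-in for the Kaggle file, and it uses a small multilayer perceptron rather than any particular competitor's network; the layer sizes, split, and random seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 569 samples, 30 nucleus features (mean, standard error, and worst value
# of radius, texture, perimeter, area, and so on).
# In this copy of the data, target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a small fully connected network.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Scaling matters here because the raw features span very different ranges (area versus smoothness, for example), and a neural network trains poorly on unscaled inputs.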

Challenges of Implementing Deep Learning in Healthcare

Data Scarcity

Healthcare generates a huge and steadily growing amount of data. However, these data are not widely available to scientists and developers in startups, in other areas of industry, or in academia. Although combining AI with healthcare has high profit potential, only a small fraction of startups and institutes are exploring AI tools in healthcare. Investment of time, money, and human resources in this domain is restricted, probably because of ethical concerns and regulatory requirements. However, the data scarcity trend is changing rapidly, and there is a lot of room to grow.

Data Sharing

In order to build a deep learning model in the medical field, we need a significant amount of high-quality data. In some cases, the specific clinical or research data are not available (or are very limited) from a particular institute. It is essential to collaborate with other institutes to get sufficient data. But setting up such collaborations with a mutual agreement is sometimes hard to accomplish and very time-consuming.

Computational Skills

People working in the medical, paramedical, and research fields are well trained for their medical and research work. However, they may not have enough training in computer science, in programming languages such as C++, Python*, and Java*, or in hardware and software. A team of AI experts, such as deep learning data scientists, developers, and solution architects, can help fill the gaps in computational skills and run healthcare projects.

Deep Learning Skills

It is essential that a data scientist, developer, or data engineer have knowledge of the healthcare domain, a good understanding of DNNs, and experience in advanced statistical modeling. They should be aware of the latest deep learning frameworks, libraries, APIs, UIs, interactive web notebooks, labs, Intel® AI DevCloud, multinode systems, big data, and so on. Using older libraries or frameworks in a project may delay the process or produce poor results. A team of deep learning experts may help you to design optimal sampling procedures and get effective results. They may also help you to identify the external and internal factors causing variation in your data analysis and to build strategies that reduce optimization difficulties due to poor conditioning or local minima.23

Data Science and Medical Science: A Combined Approach

Deep learning has great potential to help medical and paramedical practitioners24 by

  • Reducing the human error rate and the workload,
  • Helping in diagnosis and the prognosis of disease, and
  • Analyzing complex data and building a report.

The histopathological examination of thousands of images is complex, time-consuming, and labor intensive. How can AI help?

A team from Harvard Medical School’s Beth Israel Deaconess Medical Center reported a 2.9 percent error rate with the AI model and a 3.5 percent error rate with pathologists for breast cancer diagnosis. Interestingly, the pairing of “deep learning with pathologist” showed a 0.5 percent error rate, which is an 85 percent drop relative to the pathologists alone.24 Litjens et al. suggest that deep learning holds great promise to improve the efficacy of prostate cancer diagnosis and breast cancer staging.25,26
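The size of that drop can be checked with a short calculation. The three error rates are the ones quoted above; the function name is just an illustrative helper. The result against pathologists alone is about 85.7 percent, consistent with the reported figure of roughly 85 percent.

```python
def relative_reduction(baseline, improved):
    """Fractional drop in error rate when moving from baseline to improved."""
    return (baseline - improved) / baseline


pathologist_error = 3.5  # percent, pathologists alone
ai_error = 2.9           # percent, AI model alone
combined_error = 0.5     # percent, pathologist working with deep learning

vs_pathologist = relative_reduction(pathologist_error, combined_error)
vs_ai = relative_reduction(ai_error, combined_error)

print(f"drop relative to pathologists alone: {vs_pathologist:.1%}")
print(f"drop relative to the AI model alone: {vs_ai:.1%}")
```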

Learn More About Initiatives in AI from Intel

Intel commends the AI developers who contribute their time and talent to help improve diagnosis and treatment for this life-threatening disease. Committed to helping scale AI solutions through the developer community, Intel makes available free AI training and tools through the Intel® AI Developer Program.

Intel recently published a series of hands-on AI tutorials. In them you will learn:

  • How to start your project by defining goals, data sources, and the strategy of building your AI team (ideation and planning)
  • How to select a deep learning framework optimized by Intel, an AI computing infrastructure from Intel, data resources, and so on (technology and infrastructure)
  • How to build an AI model (data and modeling)
  • How to build and deploy an app (app development and deployment)

The same concepts could also be useful in healthcare to solve a similar set of problems. In the next series of articles, we will explore some examples of healthcare datasets where you will learn how to apply deep learning. If you want to test deep learning on your own dataset, please contact the support community at the Intel® AI Developer Program. Our Intel team will help you achieve your project goals.



  1. Howard, J. TED Talk: The wonderful and terrifying implications of computers that can learn. (2014).
  2. Ali, A.-R. Deep Learning in Oncology – Applications in Fighting Cancer. September 14 (2017).
  3. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
  4. Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics 13, 1445–1454 (2016).
  5. Insilico Medicine, Inc. Deep learning applied to drug discovery and repurposing. (2016).
  6. Wikipedia. Human genome.
  7. Danaee, P., Ghaeini, R. & Hendrix, D. A. A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput. 22, 219–229 (2017).
  8. Yuan, Y. et al. Deepgene: An advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinformatics 17, (2016).
  9. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
  10. Dahl, G. E., Jaitly, N. & Salakhutdinov, R. Multi-task Neural Networks for QSAR Predictions. (University of Toronto, Canada. 2014).
  11. Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm. 13, 2524–2530 (2016).
  12. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. 1–11 (2015). arXiv:1510.02855
  13. Altae-Tran, H., Ramsundar, B., Pappu, A. S. & Pande, V. Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci. 3, 283–293 (2017).
  14. Kaggle. Intel & MobileODT Cervical Cancer Screening: Which cancer treatment will be most effective? (2017).
  15. Intel and MobileODT* Competition on Kaggle*. Faster Convolutional Neural Network Models Improve the Screening of Cervical Cancer. December 22 (2017).
  16. Intel and MobileODT* Competition on Kaggle*. Deep Learning Improves Cervical Cancer Accuracy by 81%, using Intel Technology. December 22 (2017).
  17. Turkki, R., Linder, N., Kovanen, P., Pellinen, T. & Lundin, J. Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples. J. Pathol. Inform. 7, 38 (2016).
  18. Hyung, W. J. et al. Superior prognosis prediction performance of deep learning for gastric cancer compared to Yonsei prognosis prediction model using Cox regression. J Clin Oncol 35, abstract 164 (2017).
  19. Jäger, P. F. et al. Revealing hidden potentials of the q-space signal in breast cancer. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 10433 LNCS, 664–671 (2017).
  20. Zhang, Q. et al. Sonoelastomics for Breast Tumor Classification: A Radiomics Approach with Clustering-Based Feature Selection on Sonoelastography. Ultrasound Med. Biol. 43, 1058–1069 (2017).
  21. Kaggle: Breast Cancer Diagnosis Wisconsin. Breast Cancer Wisconsin (Diagnostic) Data Set: Predict whether the cancer is benign or malignant.
  22. Xu, M. et al. A deep convolutional neural network for classification of red blood cells in sickle cell anemia. PLoS Comput. Biol. 13, 1–27 (2017).
  23. Bengio, Y. Deep learning of representations: Looking forward. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 7978 LNAI, 1–37 (2013).
  24. Kontzer, T. Deep Learning Drops Error Rate for Breast Cancer Diagnoses by 85%. September 19 (2016). 
  25. Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, (2016).
  26. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).