The same Intel® architecture-based hardware systems used in the lab for everyday computational tasks can be harnessed to perform deep learning research and neural network training to automate drug discovery.
“An enormous figure looms over scientists searching for new drugs: the estimated US$2.6 billion price tag of developing a treatment. A lot of that effectively goes down the drain, because it includes money spent on the nine out of ten candidate therapies that fail somewhere between Phase 1 trials and regulatory approval. Few people in the field doubt the need to do things differently.” 1
— Nic Fleming,
Nature: International Journal of Science
Inefficiencies in traditional techniques for identifying promising drug treatments have slowed the process of discovery and added substantially to research costs. New approaches are needed to accelerate discovery and reduce costs.
An innovative method for training the multiscale convolutional neural network (CNN) topology on a distributed CPU architecture gives researchers a valuable tool for discovering promising drugs. Within this domain, data generation and capture are highly automated, making it possible to implement scalable analytic solutions efficiently on the same computing hardware used in the lab for other computational tasks.
Kyle Ambert, a senior deep learning data scientist at Intel, has been on a quest for much of his career to discover and refine more effective solutions for performing life science analytics. While working on his PhD at Oregon Health & Science University, Kyle focused on developing machine learning systems for helping researchers in the neurosciences. One keen area of interest for him was natural language processing, which led him to address the challenge of building machines that can analyze and extract useful patterns from scientific literature.
“When I joined Intel, I was naturally drawn to the work that we were doing to solve computational problems in the life sciences. Two years ago, I joined our deep learning group and a main focus was on understanding how image classification systems can be optimized to run on Intel® architecture-based hardware platforms. One of my colleagues introduced me to our collaborator’s computational research team, who challenged my team to take a deep learning topology they already use and optimize it for running on their Intel® Xeon® processor-based cluster. The goal was to make it possible to process more images per day than they were currently able to do. At the time, I believe it was taking 11 hours for them to train their model. All told, our work led to a drastic improvement—our eight-machine [Intel] Xeon processor-based cluster trains in 31 minutes.”
“A small collection of commonly-available datasets guides understanding in the artificial intelligence community around optimal image classification with deep learning, and images in these tend to be relatively small with respect to number of pixels,” Kyle said, “and, in terms of content, simple. One of the more frequently-used collections, for instance, contains 256 x 256 images belonging to one of thousands of possible categories. One image, for example, depicts an airplane, the next a dog, the next a car, and so on.”
Kyle noted that while image collections such as this facilitate training systems for carrying out many important tasks, the information obtained from these types of images doesn’t often translate well to pharmaceutical research, which primarily relies on image data acquired with microscopes.
Image capture devices in use in much of the pharmaceutical industry generally produce large images, often at a resolution of 1024 x 1280 or above, depicting complex results that are usually best understood by human annotators. “Rather than depicting a single object of interest,” Kyle said, “high-content images in this domain generally depict multiple cells of potentially differing phenotypes. Rather than simply identifying the presence or absence of a particular cell type, a given task may require identifying a certain number of cells or an interaction between two cells of different phenotypes. In my experience, these are the types of images common to the life sciences. A CT scan depicts a complex snapshot of the human body. An MRI might show enlarged ventricles along with a brain tumor. Teaching a machine to understand biological images potentially requires re-evaluating what we understand about using deep learning methods for image classification.”
Figure 1. Biological images obtained from microscopy can be analyzed using deep learning techniques.
A recent Intel collaboration with a major pharmaceutical firm began in April 2017, focusing on the application of deep learning techniques to analyze high-content images. Optimization enhancements to the analytical process began in fall of the same year with plans to release the findings to the community in November 2018.
“Intel technology is everywhere and, because of that, it can sometimes open some doors for collaboration that would be otherwise difficult to move.”
— Kyle Ambert, senior deep learning data scientist, Intel
Intel® Xeon® Scalable processor technology proved extremely important to the collaborative research being conducted. Working on hundreds or thousands of microscopy images, which often contain millions of pixels each, within a deep convolutional neural network model can require tremendous amounts of time. Using deep neural network acceleration techniques, the research team was able to process images in less time while simultaneously gaining improved insights into the image characteristics relevant to the learning process.
The team employed an eight-machine cluster of two-socket servers with Intel® Xeon® Gold 6148 processors (a total of 40 cores per machine) running at 2.4 GHz, with 192 GB of memory per machine available for image processing (see 8 Node Cluster Configuration Details at the end of this document for more information).
This system enabled the team to handle over 120 3.9-megapixel images each second, using images from the Broad Bioimage Benchmark Collection* 021 (BBBC-021) for training. The result was an improvement of more than 20 times in processing a dataset of 10,000 images.
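The headline figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article (about 11 hours before optimization, 31 minutes afterward, and more than 120 images of 3.9 megapixels each second); the variable names are illustrative, and the throughput is a lower bound because the article says "over 120".

```python
# Figures quoted in the article; variable names are illustrative only.
baseline_minutes = 11 * 60      # original training time: about 11 hours
optimized_minutes = 31          # training time on the eight-machine cluster
images_per_second = 120         # "over 120" images per second (lower bound)
megapixels_per_image = 3.9      # image size quoted in the article

# Ratio of old to new training time.
speedup = baseline_minutes / optimized_minutes

# Pixel volume moved through the cluster each second, in megapixels.
pixel_rate = images_per_second * megapixels_per_image

print(f"speedup: {speedup:.1f}x")
print(f"pixel throughput: at least {pixel_rate:.0f} megapixels/s")
```

The ratio of roughly 21 is consistent with the "more than 20 times" improvement reported above.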
Figure 2. Example of an image from the Broad Bioimage Benchmark Collection (640 x 640).
“The large memory capabilities of Intel Xeon Scalable processors enable us to train deep learning workloads with a memory footprint beyond what other technologies would be able to accommodate,” Kyle said. The system configuration developed by the team also featured a high-speed fabric interconnect, the Intel® Omni-Path Host Fabric Interface (Intel® OP HFI), and Intel® Solid State Drives. On the software side, Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and the TensorFlow* optimizations were all important to the results.
The performance optimization team on this project released a white paper in October 2018 titled Best Practices for Scaling Deep Learning Training and Inference with TensorFlow* On Intel® Xeon® Processor Based HPC Infrastructures. This paper describes the best-known methods for tackling challenges in this space, enabling developers to build on the discoveries made during this research project.
“Next,” Kyle said, “I’m really interested in examining deep learning-based methods for unsupervised classification workloads. I don’t think the current approach of using supervised machine learning is scalable to the diversity of problems and dynamics in real-time data.”
Intel engagements with leading organizations in the medical community generate insights into advances and help develop AI techniques that can be applied to a broad spectrum of applications. Kyle noted, “We directly engage with our target industry for this very reason. The workload we studied for this project is common to the drug discovery process used by every company, so we imagine others will be interested in our results as well.”
“Besides addressing the industry problem in question,” Kyle continued, “we also contributed to the field’s understanding for how to scale out training on clusters of CPUs with large data.”
To validate the methodology in use, Kyle thinks that it is very important to be aware of the assumptions that go into using a statistical model or a particular machine learning library and to continually question why something is done a certain way. This process of maintaining awareness and re-evaluating the methods being employed during discovery can reveal hidden biases or flaws in the logic behind the operations.
A seminal paper on image classification, A Multi-Scale Convolutional Neural Network for Phenotyping High-Content Cellular Images, by William Godinez and colleagues, traces the history and background of classifying images in drug discovery research and describes the topology used in the Intel research.
For those interested in furthering their knowledge of the latest artificial intelligence advances and successes, ai.intel.com provides news of research breakthroughs, development guidelines, educational content, and programming libraries.
Figure 3. Artificial intelligence is reshaping the way we investigate human health issues and medicine.
“Unsupervised deep learning methods—that may be applied to unlabeled microscopy images—hold the promise of revealing novel insights for cellular biology and ultimately drug discovery. This will be the focus of continuing efforts in the future.” 2
— Intel Newsroom
TensorFlow, an open-source library for numerical computation, includes specific features for implementing large-scale machine learning processes. Originally released by Google in November 2015, TensorFlow initially performed slowly on CPU platforms. Following Intel optimizations for running TensorFlow on Intel® Xeon® processor-based platforms, substantial performance improvements have been realized. TensorFlow is well-suited to a range of AI applications, including image recognition, language recognition, and object detection and localization.
Python* is the primary interface for TensorFlow, with support for NumPy. It gives developers a means to create dataflow graphs, which detail how data moves through a collection of nodes. Nodes correspond to individual mathematical operations, and each connection between nodes represents a multidimensional data array, called a tensor. Python makes it possible to easily couple together the high-level abstractions being expressed. Tensors are composed as Python objects within TensorFlow, and each TensorFlow application is essentially a Python application. Working with abstractions in TensorFlow makes building machine learning implementations much easier, allowing developers to focus on the logical constructs of a program without having to deal with lower-level algorithms or implementation details.
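To make the dataflow idea concrete, here is a minimal sketch in plain Python. It is not the TensorFlow API: the `Node` class and the `constant`, `add`, and `scale` helpers are hypothetical names invented for this illustration, and Python lists stand in for tensors. It shows only the general pattern of building a graph of operations first and pushing data through it afterward.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """One operation in the graph; `inputs` are the edges feeding it."""
    name: str
    op: Callable[..., list]
    inputs: Tuple["Node", ...] = ()

    def run(self) -> list:
        # Evaluate upstream nodes first, then apply this node's operation
        # to the arrays (stand-ins for tensors) arriving on its input edges.
        return self.op(*(node.run() for node in self.inputs))


def constant(name: str, values: list) -> Node:
    # Source node with no inputs: it simply emits a copy of its data.
    return Node(name, lambda: list(values))


def add(a: Node, b: Node) -> Node:
    # Element-wise addition of the two incoming arrays.
    return Node("add", lambda x, y: [p + q for p, q in zip(x, y)], (a, b))


def scale(a: Node, factor: float) -> Node:
    # Multiply every element of the incoming array by a fixed factor.
    return Node("scale", lambda x: [factor * p for p in x], (a,))


x = constant("x", [1.0, 2.0, 3.0])
y = constant("y", [10.0, 20.0, 30.0])
graph = scale(add(x, y), 0.5)   # building the graph computes nothing yet
result = graph.run()            # data flows through the nodes only here
print(result)
```

Separating graph construction from execution is what lets a framework analyze and optimize the whole computation before any data moves, although the toy version above performs no such optimization.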
The TensorFlow machine learning framework simplifies the acquisition of data, training of models, and predictive operations. The structures used in TensorFlow are well-suited to CNN models. Intel offers guidance on setting threading models for CNN implementations, as well as performance guidance for using TensorFlow with Intel® MKL. The optimizations that Intel has created for TensorFlow give developers a performance boost in processor-intensive machine learning operations and can significantly reduce the time required for training and inference.
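The thread counts used in this project follow from the cluster layout listed in the configuration details at the end of this document: 32 MPI processes spread across 8 nodes of 40 cores each. The sketch below derives the per-process thread count and applies it, together with the blocktime and affinity values from the benchmark command, as Intel OpenMP runtime environment variables. The helper function is hypothetical, and setting the variables from Python (rather than through the benchmark script's own flags) is an assumption made for illustration.

```python
import os


def threads_per_process(nodes, cores_per_node, total_processes):
    """Split each node's physical cores evenly among its MPI processes."""
    processes_per_node, remainder = divmod(total_processes, nodes)
    if remainder or processes_per_node == 0:
        raise ValueError("processes do not divide evenly across nodes")
    threads, leftover = divmod(cores_per_node, processes_per_node)
    if leftover:
        raise ValueError("cores do not divide evenly among processes")
    return processes_per_node, threads


# Cluster figures taken from the configuration details in this document.
per_node, intra_threads = threads_per_process(
    nodes=8, cores_per_node=40, total_processes=32
)

# Values mirror the benchmark command; applying them as environment
# variables is an assumption about how one might reuse them elsewhere.
settings = {
    "OMP_NUM_THREADS": str(intra_threads),
    "KMP_BLOCKTIME": "1",
    "KMP_AFFINITY": "granularity=fine,compact,1,0",
}

# These generally need to be in place before the OpenMP runtime starts,
# which in practice means before the deep learning framework is imported.
os.environ.update(settings)

print(f"{per_node} processes per node, {intra_threads} threads per process")
```

With four processes per node, each process receives 10 cores, which matches the `OMP_NUM_THREADS=10` and `--num_intra_threads=10` values in the benchmark command.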
Figure 4. Image recognition within a TensorFlow* structure.
Through the design of specialized chips, enhancements to existing architectures, research, educational outreach, and industry partnerships, Intel is accelerating the progress of AI to solve difficult challenges in medicine, manufacturing, agriculture, scientific research, robotics, and other industry sectors. Intel works closely with policymakers, educational institutions, and enterprises of all kinds to uncover and advance solutions that address major challenges in the sciences.
Detecting patterns that exist in large volumes of data is one of the key strengths of deep learning methodologies, and this capability is drawing many startups into research efforts that focus on using AI to accelerate drug discovery. One example is Berg, a biotechnology company outside of Boston, Massachusetts, that pioneered a technique for identifying cancer mechanisms, using an AI platform to generate and analyze massive volumes of patient data, narrowing in on the relevant characteristics that apply to diseased cells. The research team modeled diseased human cells, monitoring lipid, metabolite, enzyme, and protein profiles, while changing sugar and oxygen levels at the cellular level. Tests on over 1,000 human cell samples, some healthy and others cancerous, have opened pathways for identifying treatment methods based on the biological origins of disease.
Berg’s co-founder and chief executive, Niven Narain, said, “We are turning the drug-discovery paradigm upside down by using patient-driven biology and data to derive more-predictive hypotheses, rather than the traditional trial-and-error approach.” 3
“From autonomous cars that will save thousands of lives, to data analytics programs that may finally discover a cure for cancer, to machines that give voice to those who can’t speak, AI will be known as one of the most revolutionary innovations of mankind.”4
— Naveen Rao, corporate vice president and general manager,
Artificial Intelligence Products Group, Intel
The Intel® AI portfolio includes:
Framework Optimization: Achieve faster training of deep neural networks on a robust scalable infrastructure.
Intel® Xeon® Scalable processors: Tackle AI challenges with a compute architecture optimized for a broad range of AI workloads, including deep learning.
Intel® Movidius™ Neural Compute Stick: Provides deep learning prototyping at the network edge with always-on vision processing, making it ideal for use in smart security cameras, gesture-controlled drones, industrial machine vision equipment, and more.
Intel® FPGA: Create specialized, custom functionality for a wide variety of electronic equipment, including AI-based solutions and monitoring devices, medical equipment, aircraft navigation devices, system accelerators, and more.
Reinforcement Learning Coach: Provides an open source research framework for training and evaluating RL agents by harnessing the power of multicore CPU processing to achieve state-of-the-art results.
Intel® Distribution of OpenVINO™ toolkit: Make your vision a reality on Intel® platforms—from smart cameras and video surveillance to robotics, transportation, and more.
Intel® Distribution for Python*: Supercharge applications and speed up core computational packages with this performance-oriented distribution.
Intel® Data Analytics Acceleration Library (Intel® DAAL): Boost machine learning and data analytics performance with this easy-to-use library.
Intel® Math Kernel Library (Intel® MKL): Accelerate math processing routines, increase application performance, and reduce development time.
For more information, visit the portfolio page.
8 Node Cluster Configuration Details
|Compute Nodes||2 sockets Intel® Xeon® Gold 6148 CPU with 20 cores each @ 2.4GHz for a total of 40 cores per node; 2 threads per core; L3 cache: 27.3MB; 192GB of DDR4; 100Gbps Intel® Omni-Path Host Fabric Interface (Intel® OP HFI), dual-rail; 480GB Intel® SSD OS drive, 1.6TB Intel® SSD data drive; CentOS* Linux 7.3; Software: OpenMPI library 3.0.0|
|Top of the rack Switch||48-port Intel® Omni-Path Edge Switch (Intel® OP Edge Switch) 100 series|
|TensorFlow*||Intel® Optimization for TensorFlow* version 1.7.0 https://github.com/tensorflow/tensorflow/tree/v1.7.0|
|Model||As defined by Godinez et al., A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics, 2017|
|Performance measured with||OMP_NUM_THREADS=10 mpirun -np 32 -cpus-per-proc 10 --map-by socket -hostfile HOSTFILE --report-bindings --oversubscribe -x LD_LIBRARY_PATH -x PATH -x OMP_NUM_THREADS -x HOROVOD_FUSION_THRESHOLD numactl -l python tf_cnn_benchmarks.py --model=mcnn --batch_size=8 --data_format=NCHW --data_dir=INPUT_DATA_DIR --data_name=mcnn --num_intra_threads=10 --num_inter_threads=2 --num_batches=2000 --num_warmup_batches=70 --display_every=5 --momentum=0.9 --weight_decay=0.00005 --optimizer=momentum --resize_method=bilinear --distortions=False --sync_on_finish=True --device=cpu --mkl=True --kmp_affinity="granularity=fine,compact,1,0" --variable_update=horovod --local_parameter_device=cpu --kmp_blocktime=1 --horovod_device=cpu --piecewise_learning_rate_schedule='0.008;2;0.032;5;0.029;10;0.026;15;0.001;20;0.0001' --train_dir=TRAIN_DATAWRITE_DIR --save_summaries_steps=1 --summary_verbosity=1|
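The piecewise learning rate schedule in the benchmark command packs rates and epoch boundaries into a single string. The sketch below decodes it under the assumption that the string alternates learning rates and epoch boundaries (LR0;E1;LR1;...;En;LRn), with each rate applying from its preceding boundary up to, but not including, the next one. That reading of the format, and the function names, are assumptions for illustration rather than something this document states.

```python
from bisect import bisect_right


def parse_schedule(spec):
    """Split 'LR0;E1;LR1;...;En;LRn' into epoch boundaries and rates."""
    parts = spec.split(";")
    if len(parts) % 2 == 0:
        raise ValueError("expected an odd number of ';'-separated values")
    rates = [float(p) for p in parts[0::2]]        # LR0, LR1, ..., LRn
    boundaries = [int(p) for p in parts[1::2]]     # E1, E2, ..., En
    if boundaries != sorted(boundaries):
        raise ValueError("epoch boundaries must be in ascending order")
    return boundaries, rates


def learning_rate(spec, epoch):
    """Return the rate in effect at `epoch` (assumed indexed from 0)."""
    boundaries, rates = parse_schedule(spec)
    # Count how many boundaries the epoch has reached or passed; that
    # count is the index of the rate currently in effect.
    return rates[bisect_right(boundaries, epoch)]


# Schedule string copied from the benchmark command in this document.
SPEC = "0.008;2;0.032;5;0.029;10;0.026;15;0.001;20;0.0001"

for epoch in (0, 2, 7, 12, 19, 25):
    print(epoch, learning_rate(SPEC, epoch))
```

Read this way, the schedule starts with a short low-rate warm-up, jumps to its peak rate at epoch 2, and then steps down in stages through epoch 20.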
Intel® AI DevCloud – free cloud compute for Intel AI Academy members
Advancing Data-Driven Healthcare Solutions – Intel Press Kit