Youdong (Jack) Mao
Instructor, Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Department of Microbiology and Immunobiology, Harvard Medical School
Dr. Mao received his Ph.D. in biophysics from Peking University in 2005 and completed his postdoctoral training at Dana-Farber Cancer Institute (DFCI) and Harvard Medical School (HMS). He joined the faculty at DFCI and HMS in 2012. Dr. Mao is a veteran of software engineering and high-performance computing (HPC). He has published tens of peer-reviewed papers in prestigious journals across the disciplines of physics, chemistry, nanotechnology and biomedicine. He leads a team developing a next-generation HPC platform for structural biology.
The research at the Intel® Parallel Computing Center (IPCC) at Dana-Farber Cancer Institute (DFCI) is dedicated to developing a cutting-edge solution for the next-generation HPC platform for structural biology, based on the Intel® Many Integrated Core (MIC) Architecture. The center focuses on a grand computational challenge in modern life sciences: visualizing biological molecules in action at atomic resolution by single-molecule electron microscopy and related nanotechnology. The research team in the IPCC at DFCI draws on a wide spectrum of expertise, from software engineering, HPC and biophysics to molecular biology and immunology, a comprehensive blend of computer engineering and life sciences. We seek to capitalize on the tremendous potential of Intel’s coprocessor architecture, as well as heterogeneous parallel computing, to advance our capability to process a rapidly increasing volume of electron microscopy data that “encrypts” the fundamental structural “codes” of life.
Understanding the structure-function relationship of biological macromolecules is a central focus of much molecular biomedicine research in contemporary life sciences. Among the technologies available for biological structure analysis, cryo-electron microscopy (cryo-EM) is emerging as a promising tool to visualize the three-dimensional (3D) structures of single biomolecules in their native functional states. However, because biomolecules are highly sensitive to radiation damage by the electron beam, the molecular images have to be taken at a low dose, which results in extremely noisy images. This leads to one of the most critical challenges facing computational approaches to cryo-EM reconstruction of biomolecules: the extraction of signal from heavy noise. Doing so requires analyzing a large number of very noisy images so that averaging and statistical techniques can reconstruct the entire structure of the molecule, up to atomic resolution. Such a procedure is highly data-intensive and computationally demanding; the computing cost rises dramatically as the target resolution or structural diversity increases, or as the signal-to-noise ratio (SNR) decreases.
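The role of averaging in signal extraction can be illustrated with a minimal sketch. It is not the center's method and not a cryo-EM pipeline: it assumes a one-dimensional stand-in for an image, independent Gaussian noise, and copies that are already perfectly aligned (real particle images must first be aligned and classified). The function names, the test signal and the noise level are all hypothetical choices made for illustration. Under these assumptions, averaging N copies reduces the noise power by roughly a factor of N.

```python
import math
import random


def average_noisy_copies(signal, n_copies, noise_sigma, rng):
    """Average n_copies of `signal`, each corrupted by independent Gaussian noise.

    Assumption: the copies are already aligned, so they can be summed directly.
    """
    total = [0.0] * len(signal)
    for _ in range(n_copies):
        for i, s in enumerate(signal):
            total[i] += s + rng.gauss(0.0, noise_sigma)
    return [t / n_copies for t in total]


def power_snr(signal, estimate):
    """Ratio of signal variance to the mean squared error of the estimate."""
    mean_s = sum(signal) / len(signal)
    signal_power = sum((s - mean_s) ** 2 for s in signal) / len(signal)
    noise_power = sum((e - s) ** 2 for e, s in zip(estimate, signal)) / len(signal)
    return signal_power / noise_power


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "molecule": a smooth bump sampled at 512 points.
signal = [math.exp(-((i - 256) / 40.0) ** 2) for i in range(512)]

noise_sigma = 5.0  # noise much stronger than the signal, as in low-dose imaging
n_copies = 400

single = average_noisy_copies(signal, 1, noise_sigma, rng)
averaged = average_noisy_copies(signal, n_copies, noise_sigma, rng)

# Power SNR is expected to improve by about n_copies
# (amplitude SNR by about sqrt(n_copies)).
gain = power_snr(signal, averaged) / power_snr(signal, single)
print(f"SNR of one copy:       {power_snr(signal, single):.5f}")
print(f"SNR after averaging:   {power_snr(signal, averaged):.5f}")
print(f"Improvement factor:    {gain:.1f} (expected on the order of {n_copies})")
```

The cost of this toy grows linearly with the number of copies and the number of samples per copy, which hints at why lower SNR, and hence more images, drives up the computing cost.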
Instead of simply migrating the existing cryo-EM software codes, the research at the IPCC at DFCI aims to build a coherent software-hardware system that implements novel machine-learning methods for massively parallel cryo-EM data processing. These efforts focus on weak signal extraction, 3D reconstruction and verification at the single-molecule level, taking full advantage of the Intel MIC Architecture in a supercomputing environment. So far, no software package in structural biology has been optimized for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, which is a significant disadvantage for the field. We anticipate that the system will be the first of its kind among computational tools for structural biology, able to process huge amounts of highly noisy image data for structure determination in a heretofore unachievable manner. The software package developed in IPCC@DFCI will be released to the scientific and industrial community as free open-source software under the GPL. Further development along this avenue will yield a new generation of supercomputing platform, built on Intel Xeon Phi coprocessors, for ultra-high-resolution reconstruction of single biomolecules in their native states. Such a platform may emerge as a general resource for future parallel computing applications in structural biology and molecular medicine.