Ben Langmead is an Assistant Professor in the Department of Computer Science, Whiting School of Engineering, Johns Hopkins University. He earned his Ph.D. in Computer Science from the University of Maryland in 2012. His group seeks to make high-throughput biological datasets easy for biomedical researchers to use by applying ideas from sequence alignment, text indexing, statistics and parallel programming. He has released several high-impact software tools (e.g. Bowtie, Bowtie 2), and his paper describing Bowtie won the Genome Biology award for outstanding paper in 2009. He is the recipient of an NSF CAREER award and a Sloan Research Fellowship.
Biology and medicine are increasingly fueled by data, especially data from DNA sequencers. Since the end of the Human Genome Project, the speed and cost of DNA sequencing have improved very rapidly, far faster than Moore’s Law. Sequencers are now common tools in biology labs and, increasingly, in hospitals and other medical settings. Data from DNA sequencers can help scientists to address crucial questions in both basic science (e.g. how do genes collaborate to drive biological processes?) and clinical practice (e.g. how should we treat this particular patient’s cancer?). It will not be long before every individual’s genome is sequenced and used to personalize their medical care. That said, analyzing sequencing data requires vast computational effort. Raw sequencing data is fragmentary, like pieces of a puzzle, and the important process of assembling the pieces into larger pictures (genes and chromosomes) requires sophisticated algorithms that make the best possible use of computer processors and clusters.
In the past, our Center has built and released some of the most widely used software tools for analyzing DNA sequencing data. These tools have been downloaded hundreds of thousands of times and used in thousands of projects across the globe. As an Intel® Parallel Computing Center (Intel® PCC), we will be working on several fronts to make these tools work as efficiently as possible on both current-day and future Intel® architecture, including Intel® Xeon® and Intel® Xeon Phi™. Our goal is to enable the larger research community to make the best possible use of modern, many-core architectures and, ultimately, to make large DNA sequencing datasets as usable as possible to researchers working at the frontiers of biology and medicine.
- Ben Langmead, Christopher Wilks, Valentin Antonescu, Rone Charles, July 18, 2018, Scaling read aligners to hundreds of threads on general-purpose processors, Bioinformatics, White Paper