Prof. Dr. Knut Reinert heads the Algorithms in Bioinformatics group at the Institute of Bioinformatics, Freie Universität Berlin. He is also a Max Planck Fellow at the Max Planck Institute for Molecular Genetics. He and his team develop novel algorithms and data structures for analyzing large-scale biomedical data. Previously, Knut worked at Celera Genomics on bioinformatics algorithms and software for Celera's human genome sequencing effort, which produced one of the first assemblies of the human genome.
Over the last decade, new technologies have reduced the cost of sequencing by several orders of magnitude. In recent years, next generation sequencing (NGS) data have begun to appear in many clinically relevant applications, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling. The management consulting firm McKinsey recently named NGS one of the most disruptive technologies that will transform life, business, and the global economy. Its potential economic impact is broad, and it may reshape the biomedical field. It promises to transform how doctors diagnose and treat cancer and other diseases, potentially extending lives. With rapid sequencing and advanced computing power, scientists can systematically test how genetic variations give rise to specific traits and diseases instead of relying on trial and error.
Unfortunately, a lack of expertise or programming infrastructure often makes developing bioinformatics solutions that meet the growing demand impossible or very time-consuming. Analyzing sequencing data is demanding because of the enormous data volume and the need for fast turnaround, accuracy, reproducibility, and data security. This requires a broad range of expertise: algorithm design, strong implementation skills for analyzing big data on standard hardware and accelerators, statistical knowledge, and domain knowledge specific to each medical problem. Consequently, tool development is often fragmented and driven mainly by academic groups and SMEs (small and medium-sized enterprises) with varying levels of expertise in the required domains.
We aim to address this problem by providing an open-source software development kit (SDK) that enables researchers and software engineers to build efficient, hardware-accelerated, and sustainable tools for analyzing medical NGS data, allowing academic groups and SMEs to bring innovative technical solutions in medical diagnostics to market significantly faster. In this proposal we specifically address tight integration with the Intel Xeon® and Intel Xeon Phi™ processor families, providing fast, well-tested algorithmic components for medical NGS analysis by extending the existing, well-established C++ library SeqAn. The library will enable academic groups and SMEs to develop and maintain their own efficient, hardware-accelerated tools for medical NGS analysis far faster than was previously possible. To achieve this, we plan to fully exploit modern multicore hardware in core data structures and algorithms, such as string indices and pairwise sequence alignment. This will make modern hardware accelerators accessible to non-expert programmers. In addition, we will combine data parallelism with compute parallelism to reduce the computational effort required to process many genomes at once in main memory. This will allow tools to scale up seamlessly to the very large data volumes associated with large numbers of individual genomes, a challenge that is particularly pressing in medical applications.
- Knut Reinert, 11/22/2015, Topic: The SeqAn C++ library for efficient NGS sequence analysis - applications and HPC modernization using generic programming, HPC Dev Conf 2015
- Knut Reinert, 10/1/2015, Topic: Freie Universität Berlin has been selected as a new Intel® Parallel Computing Center (Intel® PCC), Free University Berlin