Optimization of Profrager, a protein structure and function prediction tool


Speakers: Silvio Luiz Stanzani and Rogério Luiz Iope, São Paulo State University - Center for Scientific Computing

The current trend in parallel computing architecture design is to increase the computational power of multi-core servers by adding many-core coprocessors or accelerators. Such hybrid architectures can speed up applications and improve their throughput; the challenge is to use all the processing power of these heterogeneous resources efficiently.

Protein Structure Prediction (PSP) is one of the most important topics in bioinformatics, and several important applications in medicine (such as drug design) and biotechnology (such as the design of novel enzymes) are based on PSP methods. Profrager is a fragment library generation tool developed at the Brazilian National Laboratory for Scientific Computing (LNCC) that aims to improve the performance of PSP: the fragment libraries it generates are used to reduce the PSP search space. Executing Profrager can be computationally intensive.

We report an optimization of Profrager that improves its scalability and throughput on a hybrid parallel computing architecture composed of Intel® Xeon and Xeon Phi™. We show the feasibility of using Intel Advisor to estimate the speedup of loops parallelized with OpenMP, and we present a simple parallelization strategy based on the hybrid MPI/OpenMP model to balance the load between Xeon and Xeon Phi™.

We carried out two comparisons between the optimized version and the original serial version of Profrager. The first evaluated performance: we executed a single Profrager experiment, and the optimized version achieved a speedup of 8 to 12 times. The second evaluated throughput: we executed a set of Profrager experiments simultaneously and achieved an increase of nearly 2 times compared with several executions of the corresponding serial version.
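The abstract does not describe how the load is balanced between Xeon and Xeon Phi™. As an illustration only, one common static approach is to split the work items in proportion to each device's measured throughput. The sketch below assumes such a proportional split; the function name `split_work`, the device labels, and the relative rates are hypothetical and are not taken from Profrager.

```python
def split_work(n_items, rates):
    """Split n_items among devices in proportion to their relative rates.

    rates maps a device label to a relative throughput (hypothetical values).
    Returns a dict mapping each label to the number of items it should process.
    """
    total = sum(rates.values())
    # Round each proportional share down to a whole number of items.
    shares = {name: int(n_items * r / total) for name, r in rates.items()}
    # Hand any items lost to rounding to the fastest devices first.
    leftover = n_items - sum(shares.values())
    for name in sorted(rates, key=rates.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Assumed relative throughputs, for illustration only.
    rates = {"xeon": 1.0, "xeon_phi": 1.5}
    print(split_work(1000, rates))
```

In a hybrid MPI/OpenMP setting, each MPI rank could take its share from such a split and process it with OpenMP threads; whether the optimized Profrager uses a static or dynamic scheme is not stated here.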