David Keyes is a founding professor of Applied Mathematics and Computational Science at KAUST, where he focuses on high-performance implementations of implicit methods for PDEs. He received a BSE from Princeton and a PhD from Harvard. He has held faculty positions at Yale, Old Dominion, and Columbia Universities and research positions at NASA and DOE laboratories, and he has led the scalable solvers initiative of the DOE SciDAC program. He is a Fellow of the AMS and SIAM and a recipient of the IEEE Sidney Fernbach Award, the ACM Gordon Bell Prize, and the SIAM Prize for Distinguished Service to the Profession.
Hatem Ltaief is a Senior Research Scientist in the Extreme Computing Research Center at KAUST, where he directs the KBLAS software project for dense and sparse linear algebraic operations on emerging architectures. He received an MS in computational science from the University of Lyon and an MS in applied mathematics and a PhD in computer science from the University of Houston. He has been a Research Scientist at the Innovative Computing Laboratory of the University of Tennessee and a Computational Scientist in the KAUST Supercomputing Laboratory. He is a member of the European Exascale Software Initiative (EESI2).
Rio Yokota is an associate professor in the Global Scientific Information and Computing Center at the Tokyo Institute of Technology and a consultant at KAUST, where he researches fast multipole methods, their implementation on emerging architectures, and their applications in PDEs, BEMs, molecular dynamics, and particle methods. He received his undergraduate and doctoral degrees in Mechanical Engineering from Keio University, and held postdoctoral appointments at the University of Bristol and Boston University and a Research Scientist appointment at KAUST. He is a recipient of the ACM Gordon Bell Prize.
The Intel® Parallel Computing Center (Intel® PCC) at King Abdullah University of Science and Technology (KAUST) aims to provide scalable software kernels, common to many scientific simulation codes, that will adapt well to future architectures, including a scheduled upgrade of KAUST’s Intel-based Cray XC40 system, ranked among the global Top10. In the spirit of co-design, the Intel® PCC at KAUST will also provide feedback that could influence architectural design trade-offs. The Intel® PCC at KAUST is hosted in KAUST’s Extreme Computing Research Center (ECRC), directed by co-PI Keyes, which aims to smooth the architectural transition of KAUST’s simulation-intensive science and engineering code base. Rather than taking a specific application code and optimizing it, the ECRC optimizes algorithmic kernels that are shared among many application codes and provides the results in open source libraries. Chief among such kernels are Poisson solvers and dense symmetric generalized eigensolvers.
We focus on optimizing two types of scalable hierarchical algorithms, fast multipole methods (FMM) and hierarchical matrices, on next-generation Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. These algorithms have the potential to replace workhorse kernels of molecular dynamics codes (drug/material design), sparse matrix preconditioners (structural/fluid dynamics), and covariance matrix calculations (statistics/big data). Co-PI Yokota is the architect of the open source fast multipole library ExaFMM, which attempts to integrate the best solutions offered by FMM algorithms, including the ability to control expansion order and octree decomposition strategy independently, so as to create the fastest inverter that meets a given accuracy requirement for a solver or a preconditioner on manycore and heterogeneous architectures. Co-PI Ltaief is the architect of the KBLAS library, which promotes the directed acyclic graph (DAG)-based dataflow execution model to create NUMA-aware, work-stealing tile algorithms of high concurrency, with innermost SIMD structure well suited to floating-point accelerators. The overall software framework of the Intel® PCC at KAUST, Hierarchical Computations on Manycore Architectures (HiCMA), is built upon these linear solvers and on the philosophy that dense blocks of low rank should often be replaced with hierarchical matrices as they arise. Hierarchical matrices are natural algebraic generalizations of the fast multipole method and can be implemented in data structures similar to those that have made FMM successful on distributed nodes of shared-memory cores.
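To make the "dense blocks of low rank" idea concrete, the following minimal sketch compresses one off-diagonal block of a kernel matrix. It assumes a hypothetical 1D kernel 1/|x - y| evaluated between two well-separated point clusters, and it uses fully pivoted cross approximation, a standard and deliberately simple compression technique chosen here for illustration. It is not necessarily what HiCMA, KBLAS, or ExaFMM use internally, and all function names are hypothetical.

```python
import math


def kernel_block(xs, ys):
    """Dense block K[i][j] = 1 / |x_i - y_j| (hypothetical 1D test kernel)."""
    return [[1.0 / abs(x - y) for y in ys] for x in xs]


def cross_approximation(A, tol=1e-8, max_rank=None):
    """Fully pivoted cross approximation: A ~= sum_k outer(u_k, v_k).

    Each step picks the largest-magnitude entry of the residual R and
    subtracts the rank-1 "cross" (its column times its scaled row).
    Stops once that pivot drops below tol times the first pivot.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]          # residual; A itself is left untouched
    max_rank = max_rank or min(m, n)
    us, vs, first = [], [], None
    for _ in range(max_rank):
        # Full pivoting: search the whole residual for its largest entry.
        i, j = max(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: abs(R[rc[0]][rc[1]]))
        pivot = R[i][j]
        if first is None:
            first = abs(pivot)
        if abs(pivot) <= tol * first:
            break
        u = [R[r][j] for r in range(m)]            # pivot column
        v = [R[i][c] / pivot for c in range(n)]    # pivot row, scaled
        for r in range(m):                         # R <- R - u v^T
            for c in range(n):
                R[r][c] -= u[r] * v[c]
        us.append(u)
        vs.append(v)
    return us, vs


def relative_error(A, us, vs):
    """Relative Frobenius-norm error of the low-rank approximation."""
    num = den = 0.0
    for r in range(len(A)):
        for c in range(len(A[0])):
            approx = sum(u[r] * v[c] for u, v in zip(us, vs))
            num += (A[r][c] - approx) ** 2
            den += A[r][c] ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Two well-separated clusters: sources in [0, 1], targets in [3, 4].
    xs = [i / 63 for i in range(64)]
    ys = [3 + j / 63 for j in range(64)]
    A = kernel_block(xs, ys)
    us, vs = cross_approximation(A, tol=1e-10)
    k = len(us)
    print(f"rank {k}, relative error {relative_error(A, us, vs):.1e}")
    print(f"storage: {k * (64 + 64)} numbers instead of {64 * 64}")
```

Because the two clusters are well separated, the block's singular values decay rapidly, so storing k columns and k rows costs k(m + n) numbers instead of mn. A hierarchical matrix applies this kind of compression recursively to every admissible (well-separated) off-diagonal block.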
FMM and hierarchical matrix algorithms share a rare combination of O(N) arithmetic complexity and high arithmetic intensity (flops per byte). This contrasts with traditional algorithms, which have either low arithmetic complexity with low arithmetic intensity (FFT, sparse linear algebra, and stencil application) or high arithmetic intensity with high arithmetic complexity (dense linear algebra, direct N-body summation). In short, FMM and hierarchical matrices are efficient algorithms that will remain compute-bound on future architectures. Furthermore, these methods have a communication complexity of O(log P) for P processors and permit a high degree of asynchrony in their communication. They are therefore amenable to the asynchronous programming models that are gaining popularity as architectures approach exascale.
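The compute-bound versus memory-bound distinction drawn above can be made concrete with a roofline-style estimate, in which attainable performance is the minimum of the peak flop rate and arithmetic intensity times memory bandwidth. The sketch below uses textbook intensity estimates for two of the contrasted kernels (dense matrix multiplication and a 7-point stencil sweep), assuming ideal cache reuse and 8-byte doubles. The machine figures are hypothetical, and FMM itself is not modeled here.

```python
def gemm_intensity(n, word_bytes=8):
    """Flops per byte for C = A @ B with n x n matrices.

    Assumes ideal cache reuse: A and B are each read once and C is
    written once, i.e. 3 * n^2 words of memory traffic.
    """
    flops = 2 * n ** 3                  # n^3 multiply-add pairs
    bytes_moved = 3 * n * n * word_bytes
    return flops / bytes_moved          # simplifies to n / 12 for doubles


def stencil7_intensity(word_bytes=8):
    """Flops per byte for one sweep of a constant-coefficient 7-point stencil.

    u_new = c0 * u[center] + c1 * (sum of 6 neighbours): 6 adds + 2 multiplies.
    Assumes ideal reuse (u read once, u_new written once per point) and
    ignores write-allocate traffic.
    """
    flops_per_point = 8
    bytes_per_point = 2 * word_bytes
    return flops_per_point / bytes_per_point


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_compute_bound(intensity, peak_gflops, bandwidth_gbs):
    """A kernel is compute-bound if its intensity reaches the ridge point."""
    return intensity >= peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical node: 3000 GFLOP/s peak, 400 GB/s memory bandwidth.
    PEAK, BW = 3000.0, 400.0
    for name, ai in [("7-point stencil", stencil7_intensity()),
                     ("GEMM n=1200", gemm_intensity(1200))]:
        bound = "compute" if is_compute_bound(ai, PEAK, BW) else "memory"
        print(f"{name}: {ai:.2f} flops/byte -> "
              f"{attainable_gflops(ai, PEAK, BW):.0f} GFLOP/s ({bound}-bound)")
```

On this hypothetical node the ridge point is 3000 / 400 = 7.5 flops per byte: the stencil, at 0.5 flops per byte, is limited to 200 GFLOP/s by bandwidth, while large matrix multiplication reaches the compute peak. Algorithms whose intensity stays above the ridge point keep benefiting as flop rates grow faster than bandwidth.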
- May 10, 2018, Exploiting Data Sparsity for Large-Scale Matrix Computations, IPCC Asia Summit 2018
- April 23, 2018, Massively Parallel Polar Decomposition on Distributed-Memory Systems, IXPUG Workshop at KAUST
- April 23, 2018, Cholesky Factorization on Tile Low-Rank Matrices for Distributed-Memory Systems, IXPUG Workshop at KAUST
- April 23, 2018, Asynchronous Task-Based Parallelization of Iterative Algebraic Solvers, IXPUG Workshop at KAUST
- April 24, 2018, Unstructured Computations on Intel Xeon and Xeon Phi Architecture, IXPUG Workshop at KAUST
- April 23, 2018, STARS-H: a High-Performance H-Matrix Market Library for Large-Scale Systems, IXPUG Workshop at KAUST
- April 23, 2018, Application to big ensembles data assimilation and forecasting in the Red Sea Circulation, IXPUG Workshop at KAUST
- April 24, 2018, BEMFMM: An FMM-Accelerated Boundary Element Method-Based Solver for the 3D Helmholtz Equation, IXPUG Workshop at KAUST
- April 24, 2018, ALTANAL: Abstraction Layer for Task bAsed NumericAl Libraries, IXPUG Workshop at KAUST
- April 23, 2018, Large scale rigid body dynamics simulation on HPCs with distributed memory architecture, IXPUG Workshop at KAUST
- April 23, 2018, Epileptic Seizure Prediction using Rotation Forest in a Parallel Environment, IXPUG Workshop at KAUST
- April 23, 2018, Parallel Simulation of Blood Flows in 3D Patient-specific Arteries, IXPUG Workshop at KAUST
- April 23, 2018, Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope, IXPUG Workshop at KAUST
- April 23, 2018, Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures, IXPUG Workshop at KAUST
- April 23, 2018, HPC and Big Data Convergence, IXPUG Workshop at KAUST
- April 24, 2018, Screen-Space Normal Distribution Function Caching for Consistent Multi-Resolution Rendering of Large Particle Data, IXPUG Workshop at KAUST
- April 24, 2018, Optimization of finite-difference kernels on multi-core architectures for seismic applications, IXPUG Workshop at KAUST
- Ali Charara, David Keyes, Hatem Ltaief, June 5, 2017, A framework for dense triangular matrix kernels on various manycore architectures, Wiley Online Library
- Kadir Akbudak, Hatem Ltaief, Aleksandr Mikhalev, and David Keyes, June 8, 2017, Tile Low Rank Cholesky Factorization for Climate-Weather Modeling Applications on Manycore Architectures, Springer and ISC 2017
- Mustafa Abduljabbar, George S. Markomanolis, Huda Ibeid, Rio Yokota, David Keyes, May 12, 2017, Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions, Springer and ISC 2017
- Dalal Sukkari, Hatem Ltaief, Mathieu Faverge, David Keyes, September 11, 2017, Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures, IEEE TPDS and HGPU
- Maxwell Hutchinson, Alexander Heinecke, Hans Pabst, Greg Henry, Matteo Parsani, David Keyes, June 15, 2016, Efficiency of High Order Spectral Element Methods on Petascale Architectures, Springer
- Amani Alonazi, George S. Markomanolis, David Keyes, June 26, 2017, Asynchronous Task-Based Parallelization of Algebraic Multigrid, ACM
- Tareq M. Malas, July 21, 2016, Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization, IEEE
- Gustavo Chavez, George Turkiyyah, David E. Keyes, March 18, 2017, A Direct Elliptic Solver Based on Hierarchically Low-Rank Schur Complements, Springer