Intel® Parallel Computing Center at the Innovative Computing Laboratory, The University of Tennessee

Principal Investigator:

Jack Dongarra
University Distinguished Professor

Jack Dongarra received his Ph.D. in Applied Mathematics from the University of New Mexico in 1980. He is a University Distinguished Professor at UTK, Distinguished Research Staff at ORNL, Turing Fellow at Manchester University, Adjunct Professor at Rice University, and director of the ICL at UTK. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. His research includes the development, testing and documentation of high-quality mathematical software. He has contributed to the following: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI.

Description:

The objective of the IPCC at UTK is the development and optimization of numerical linear algebra libraries and technologies for applications, while tackling current challenges in heterogeneous Intel® Xeon Phi™ coprocessor-based High Performance Computing. The developments will be disseminated through the MAGMA MIC library, designed as a replacement for the popular LAPACK on heterogeneous systems with Intel Xeon Phi coprocessors.

Over the first year we developed the main dense linear algebra routines to solve dense linear systems and eigenvalue problems on heterogeneous Intel Xeon Phi coprocessor-based platforms. The developments were disseminated through two major software releases. Further, we developed benchmarks and two APIs, and evaluated programming models for the Intel Xeon Phi coprocessor architectures. We also taught a graduate class, Scientific Computing for Engineers, using Intel Xeon Phi coprocessors, organized tutorials, and gave presentations at HPC conferences such as SC13, IPDPS14, ISC14, and VECPAR14.

Solving linear systems of equations and eigenvalue problems is fundamental to scientific computing. Our developments are thus likely to enable Intel Xeon Phi coprocessor architectures for high-performance computing by providing an effortless migration path to coprocessor-accelerated architectures for existing scientific and engineering codes that rely on LAPACK. The developments will further help the scientific computing community explore the full potential of the Intel Xeon Phi coprocessor architecture and related programming models.

Our plan for the next year includes the development of new algorithms and software tools in four main research and software development thrusts:

  • Dense linear algebra
    Algorithmic improvements and new methods will be developed. For example, in the area of eigensolvers and SVD, we will develop two-stage reductions to tridiagonal and bidiagonal forms. These algorithms remove the memory-bound limitations of the LAPACK algorithms and, depending on the hardware, can be an order of magnitude faster. Another direction will be the development of batched linear algebra operations to provide support for various applications. Batched LU, QR, and Cholesky will be developed for the simultaneous factorization of many very small dense matrices. This will include the development of batched BLAS as needed in the solvers and basic applications.
  • Sparse linear algebra (SLA)
    While extremely important for applications, SLA is notorious for running at only a fraction of the peak performance of modern architectures. We will first develop a highly optimized MAGMA MIC Sparse package, including the standard CG, BiCGSTAB, and GMRES solvers and their preconditioned versions. Second, we will develop communication-avoiding algorithms that significantly outperform the standard memory- and latency-bound algorithms. This will include new s-step methods like CA-GMRES, and blocked eigensolvers like LOBPCG.
  • Mixed-precision methods
    We will develop numerical algorithms that recognize and exploit the presence of mixed-precision mathematics. This will include mixed-precision iterative refinement solvers for dense problems and mixed-precision orthogonalization schemes with applications to sparse iterative linear system and eigenproblem solvers.
  • Benchmarks
    We will develop a set of benchmarks, including the newly proposed HPCG, and optimize them for Intel Xeon Phi coprocessor architectures. The benchmarks will show essential communication and computation patterns in various applications, with the goal of focusing both hardware and software developers on architecture features and application needs.
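
To make the mixed-precision iterative refinement idea from the list above concrete, here is a minimal sketch in plain Python. It is not MAGMA MIC code and does not reflect that library's API: the function names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `refine_solve`) are hypothetical, and single precision is emulated by rounding every arithmetic result to binary32 with the standard `struct` module. The assumption is the usual scheme: factor once in the fast low precision, then compute residuals and accumulate corrections in double precision.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting, emulated in single precision.

    Returns the packed factors (unit-lower L below the diagonal, U on and
    above it) and the row permutation that was applied.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular to working precision")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve with the packed factors, again rounding to single precision."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                      # forward substitution, L y = P b
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):            # back substitution, U x = y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def refine_solve(a, b, max_iter=20, tol=1e-14):
    """Solve a x = b: single-precision LU, double-precision refinement."""
    n = len(a)
    lu, perm = lu_factor_f32(a)             # the expensive O(n^3) step
    x = lu_solve_f32(lu, perm, b)           # initial low-precision solution
    for iters in range(max_iter):
        # Residual r = b - a x, computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            return x, iters
        d = lu_solve_f32(lu, perm, r)       # cheap correction, reuses factors
        x = [x[i] + d[i] for i in range(n)]
    return x, max_iter
```

For a well-conditioned system, calling `refine_solve(a, b)` with `a` as a list of row lists and `b` as a list of floats should return a solution accurate to roughly double precision after a few correction steps, even though the only cubic-cost step ran in single precision. The scheme is not expected to converge when the matrix is too ill-conditioned for the single-precision factors to be useful.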

Related websites:

http://icl.cs.utk.edu/magma

Publications:

Porting the PLASMA Numerical Library to the OpenMP Standard

LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi

Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs
