Overview of the Intel Optimized HPCG
The Intel® Optimized High Performance Conjugate Gradient Benchmark (Intel® Optimized HPCG) provides an implementation of the HPCG benchmark (http://hpcgbenchmark.org) optimized for Intel® Xeon® processors and Intel® Xeon Phi™ processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), or Intel® Advanced Vector Extensions 512 (Intel® AVX-512). The HPCG benchmark is intended to complement the High Performance LINPACK benchmark used in the TOP500 (http://www.top500.org) system ranking by providing a metric that better aligns with a broader set of important cluster applications.
The HPCG benchmark implementation is based on a 3-dimensional (3D) regular 27-point discretization of an elliptic partial differential equation. The implementation distributes a 3D domain across a 3D virtual process grid formed from all the available MPI ranks. HPCG uses the preconditioned conjugate gradient (CG) method to solve the intermediate systems of equations and incorporates a local, symmetric Gauss-Seidel preconditioning step that requires a triangular forward solve and a backward solve. A synthetic multigrid V-cycle is used on each preconditioning step to make the benchmark better fit real-world applications. HPCG implements matrix-vector multiplication locally, with an initial halo exchange between neighboring processes. The benchmark exhibits irregular accesses to memory and fine-grain recursive computations that dominate many scientific workloads (for details, see http://www.sandia.gov/~maherou/docs/HPCGBenchmark.pdf).
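The numerical core described above can be illustrated with a minimal, single-process sketch. It is not the HPCG reference code or the Intel-optimized implementation: the function names are hypothetical, the matrix values (26 on the diagonal, -1 for each neighbor) are an assumption chosen to give a symmetric positive definite 27-point stencil, and the multigrid V-cycle and MPI halo exchange are omitted.

```python
# Illustrative sketch only: hypothetical helper names, no MPI, no multigrid.

def build_27pt(nx, ny, nz):
    """Build a 27-point stencil matrix as a list of rows {column: value}."""
    rows = []
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                row = {}
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            jx, jy, jz = ix + dx, iy + dy, iz + dz
                            if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                row[(jz * ny + jy) * nx + jx] = -1.0
                row[(iz * ny + iy) * nx + ix] = 26.0  # assumed diagonal value
                rows.append(row)
    return rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spmv(A, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row.items()) for row in A]

def symgs(A, r):
    """One symmetric Gauss-Seidel sweep (forward, then backward) on A z = r,
    starting from z = 0."""
    n = len(r)
    z = [0.0] * n
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            row = A[i]
            s = r[i] - sum(v * z[j] for j, v in row.items() if j != i)
            z[i] = s / row[i]
    return z

def pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned CG with symmetric Gauss-Seidel as the preconditioner."""
    x = [0.0] * len(b)
    r = list(b)
    z = symgs(A, r)
    p = list(z)
    rz = dot(r, z)
    norm_b = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = spmv(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * norm_b:
            return x, it
        z = symgs(A, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Usage: manufacture a right-hand side from a known solution of all ones.
A = build_27pt(4, 4, 4)
b = spmv(A, [1.0] * len(A))
x, iterations = pcg(A, b)
```

The forward and backward loops in `symgs` are the two triangular solves mentioned above. Each update depends on values written earlier in the same sweep, which is the kind of fine-grain recursive dependency that makes this step hard to parallelize.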
The Intel® Optimized HPCG contains source code of the HPCG v3.0 reference implementation with necessary modifications to include:
 Intel® architecture optimizations
 Prebuilt benchmark executables that link to Intel® oneAPI Math Kernel Library
 Inspector-executor Sparse BLAS kernels for sparse matrix-vector multiplication (SpMV)
 Sparse triangular solve (TRSV)
 Symmetric Gauss-Seidel smoother (SYMGS)
The Inspector-executor Sparse BLAS kernels SpMV, TRSV, and SYMGS are implemented using an inspector-executor model. The inspection step chooses the best algorithm for the input matrix and converts the matrix to a special internal representation to achieve high performance at the execution step.
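The inspector-executor idea can be sketched as follows. This is a hypothetical illustration of the pattern, not the Intel® oneAPI Math Kernel Library API: the class name, the method names, the padded-row layout, and the 25% padding threshold are all assumptions made for the example. The inspection step runs once, examines the row lengths of a CSR matrix, and either keeps CSR or converts to a fixed-width padded layout. The execution step then reuses that choice for every multiplication.

```python
# Hypothetical sketch of the inspector-executor pattern (not the oneMKL API).

class InspectedMatrix:
    def __init__(self, row_ptr, col_idx, values):
        # CSR input; col_idx and values are expected to be Python lists.
        self.n = len(row_ptr) - 1
        self.csr = (row_ptr, col_idx, values)
        self.layout = None  # chosen by inspect()
        self.ell = None     # padded representation, built only if chosen

    def inspect(self, max_padding=0.25):
        """One-time analysis: pick a layout based on how uniform the rows are."""
        row_ptr, col_idx, values = self.csr
        lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(self.n)]
        width = max(lengths, default=0)
        nnz = len(values)
        padded = width * self.n
        if nnz > 0 and (padded - nnz) / padded <= max_padding:
            # Rows are nearly uniform: convert to fixed-width padded rows.
            cols, vals = [], []
            for i in range(self.n):
                lo, hi = row_ptr[i], row_ptr[i + 1]
                pad = width - (hi - lo)
                cols.append(col_idx[lo:hi] + [0] * pad)   # padding points at column 0
                vals.append(values[lo:hi] + [0.0] * pad)  # and contributes 0.0
            self.ell = (cols, vals)
            self.layout = "ell"
        else:
            # Row lengths vary too much: padding would waste work, keep CSR.
            self.layout = "csr"
        return self.layout

    def spmv(self, x):
        """Execution step: y = A x using the layout chosen by inspect()."""
        if self.layout is None:
            raise RuntimeError("call inspect() before spmv()")
        if self.layout == "ell":
            cols, vals = self.ell
            return [sum(v * x[j] for j, v in zip(cols[i], vals[i]))
                    for i in range(self.n)]
        row_ptr, col_idx, values = self.csr
        return [sum(values[k] * x[col_idx[k]]
                    for k in range(row_ptr[i], row_ptr[i + 1]))
                for i in range(self.n)]

# Usage: a 3x3 tridiagonal matrix with rows (2 -1 0), (-1 2 -1), (0 -1 2).
m = InspectedMatrix([0, 2, 5, 7],
                    [0, 1, 0, 1, 2, 1, 2],
                    [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
m.inspect()                      # inspection cost is paid once
y = m.spmv([1.0, 2.0, 3.0])      # y == [0.0, 0.0, 4.0]
```

The split pays off when the same matrix is used many times, as in an iterative solver such as CG, because the one-time inspection cost is spread over every subsequent execution call.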
Intel® oneAPI Math Kernel Library
Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
