GROMACS for Intel® Xeon Phi™ Coprocessor


Purpose

This code recipe describes how to get, build, and use the GROMACS* code with support for the Intel® Xeon Phi™ coprocessor with Intel® Many-Integrated Core (MIC) architecture.

Introduction

GROMACS is a versatile package for performing molecular dynamics, i.e., simulating the Newtonian equations of motion for systems with hundreds to millions of particles. GROMACS is primarily designed for biochemical molecules, such as proteins, lipids, and nucleic acids, which have a multitude of complicated bonded interactions. But since GROMACS is extremely fast at calculating the non-bonded interactions that typically dominate simulations, many researchers also use it for research on non-biological systems, such as polymers.

GROMACS supports all the usual algorithms expected from a modern molecular dynamics implementation.

The GROMACS code is maintained by developers around the world. The code is available under the GNU General Public License from www.gromacs.org.

Code Support for the Intel® Xeon Phi™ Coprocessor

GROMACS 5.0-RC1 has been released with Intel Xeon Phi coprocessor native/symmetric support. The code is currently available at http://www.gromacs.org/Downloads, or via ftp at ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.0-rc1.tar.gz.

Only source code is available; configure and build it using the instructions provided below.

In GROMACS 5.0, the code was restructured around a platform-independent SIMD layer, which simplifies moving to a new instruction set architecture by redefining macros in localized header files. Version 5.0 with Intel Xeon Phi coprocessor support provides a 16-way neighbor list that enables the 512-bit vector registers (KNC/KNL), support for FMA (fused multiply-add) instructions and mask registers, and heavy optimizations for Intel Xeon Phi coprocessor native computation. These optimizations include the following:

  • 16-wide SIMD non-bonded computations optimized with MIC intrinsics
  • 4-wide SIMD intrinsics in PME (Particle Mesh Ewald method)
  • Improvements to force reduction over OpenMP* threads
  • Fixes that enable the Intel® Math Kernel Library for FFT, BLAS, and LAPACK

Version 5.0 code performance scales well on the host node's Intel® Xeon® processor. However, coprocessor native/symmetric support has known scaling challenges beyond two nodes, and Intel is diligently working to resolve them. Additionally, an offload version is in development, which will asynchronously move Particle-Particle computations to the Intel Xeon Phi coprocessor.

Code Access

This version of GROMACS code supports both message passing and threading programming models of the Intel Xeon processor (referred to as 'host' in this document) with the Intel Xeon Phi coprocessor (referred to as 'coprocessor' in this document) in both a single node and a cluster environment.

To get access to the code and test workloads:

  1. Go to the downloads page: http://www.gromacs.org/Downloads
  2. Download the gromacs-5.0-rc1.tar.gz code.

Build Directions

Configure cmake by adding these essential parameters:

-DGMX_FFT_LIBRARY=mkl        # enable Intel MKL for FFT
-DGMX_MPI=ON                 # enable MPI
-DGMX_OPENMP=ON              # enable OpenMP
-DCMAKE_EXE_LINKER_FLAGS="-L$ZLIB_DIR/lib64"      # path to the MIC build of zlib
-DCMAKE_C_FLAGS="-O3 -mmic -I$ZLIB_DIR/include"   # C compiler flags must include -mmic and the zlib include dir
-DCMAKE_CXX_FLAGS="-O3 -mmic -I$ZLIB_DIR/include" # the same for the C++ compiler
-DGMX_SKIP_DEFAULT_CFLAGS=ON                      # omit default compiler flags
make -j 12

Full cmake Configuration for the Intel Xeon Phi Coprocessor

cmake .. \
-DBUILD_SHARED_LIBS=OFF \
-DGMX_PREFER_STATIC_LIBS=ON \
-DGMX_BUILD_MDRUN_ONLY=ON \
-DGMX_FFT_LIBRARY=mkl \
-DCMAKE_INSTALL_PREFIX=$GROMACS_INSTALL_DIR \
-DGMX_MPI=ON -DGMX_OPENMP=ON \
-DGMX_GPU=OFF \
-DGMX_XML=OFF \
-DGMX_SOFTWARE_INVSQRT=OFF \
-DGMX_SKIP_DEFAULT_CFLAGS=ON \
-DCMAKE_EXE_LINKER_FLAGS="-L$ZLIB_DIR/lib64 -mkl=sequential" \
-DCMAKE_C_COMPILER=mpiicc \
-DCMAKE_C_FLAGS="-std=gnu99 -O3 -mmic -vec-report1 -fno-alias -ip -funroll-all-loops -fimf-domain-exclusion=15 -g -DNDEBUG -I$ZLIB_DIR/include" \
-DCMAKE_CXX_COMPILER=mpiicpc \
-DCMAKE_CXX_FLAGS="-std=c++0x -O3 -mmic -vec-report1 -fno-alias -ip -funroll-all-loops -fimf-domain-exclusion=15 -g -DNDEBUG -I$ZLIB_DIR/include"
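
Putting it together, a typical out-of-source build session might look like the following sketch; the installation path and archive name are assumptions for your environment, and the cmake command is the one shown above.

# Assumes the Intel compiler environment is already sourced and ZLIB_DIR
# points to a static zlib built with -mmic (see the comments section below).
export GROMACS_INSTALL_DIR=$HOME/gromacs-5.0-rc1-mic   # hypothetical install path
tar xzf gromacs-5.0-rc1.tar.gz
cd gromacs-5.0-rc1
mkdir build-mic && cd build-mic
# ...run the full cmake command shown above from this directory...
make -j 12
make install      # installs the mdrun_mpi binary into $GROMACS_INSTALL_DIR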

Running Workloads on the Intel Xeon Phi Coprocessor Only

To run the workload on the Intel Xeon Phi coprocessor only, do the following:

  1. Source the Intel® compiler environment so that its libraries can be found.

    export LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
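
    For example, sourcing a hypothetical Composer XE installation (adjust the path to your system; do this before the export above, since compilervars.sh sets MIC_LD_LIBRARY_PATH):

    # Hypothetical installation path; compilervars.sh sets up the compiler
    # runtime, including MIC_LD_LIBRARY_PATH for coprocessor libraries.
    source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh intel64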

  2. Set up extra environment variables for run time.

    export I_MPI_MIC=1
    export I_MPI_PIN_MODE=pm
    export I_MPI_PIN_DOMAIN=omp
    export KMP_AFFINITY=verbose,compact,0
    export OMP_NUM_THREADS=$NTHREADS

  3. Create an appropriate machinefile list for MPI, such as:

    <nodename>-mic<device_id>:<MIC_PPN>

    Where
    nodename = name of the host; check with `uname -n`
    device_id = 0, 1, etc., depending on the MIC device used
    MIC_PPN = number of MPI processes executed on the MIC card
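
    For example, a minimal MIC-only launch on a single host; the hostname, rank count, and input file are placeholders:

    # 30 MPI ranks on the first coprocessor of host node01
    echo "node01-mic0:30" > machinefile.mic
    export OMP_NUM_THREADS=8
    mpiexec.hydra -machinefile machinefile.mic \
        $GROMACS_INSTALL_DIR/bin/mdrun_mpi -s topol.tpr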

Running Workloads on the Host Processor and Coprocessor

To run workloads on both the host's Intel Xeon processor and the Intel Xeon Phi coprocessor (symmetric mode), do the following:

  1. Source the Intel® compiler environment so that its libraries can be found.

    export LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH:$LD_LIBRARY_PATH

  2. Set up extra environment variables for run time.

    export I_MPI_MIC=1
    export I_MPI_PIN_MODE=pm
    export MIC_OMP_NUM_THREADS=$NTHREADS       # number of OpenMP threads on the MIC
    export IVB_OMP_NUM_THREADS=1               # number of OpenMP threads on the host
    export MIC_KMP_AFFINITY=verbose,compact,0  # KMP_AFFINITY for MIC threads
    export IVB_KMP_AFFINITY=verbose,compact,1  # KMP_AFFINITY for host threads

  3. Create the appropriate machinefile list for MPI, such as:

    <nodename>:<HOST_PPN>
    <nodename>-mic<device_id>:<MIC_PPN>

    Where
    nodename = name of the host; check with `uname -n`
    HOST_PPN = number of MPI processes executed on the host
    device_id = 0, 1, etc., depending on the MIC device used
    MIC_PPN = number of MPI processes executed on the Intel Xeon Phi coprocessor

  4. Use mpiexec to execute the wrapper script mdrun.sh, which runs an Intel Xeon processor binary on the host and a MIC binary on the Intel Xeon Phi coprocessor (see the example command after the script):

    #!/bin/bash
    # mdrun.sh: choose the binary and OpenMP settings based on whether this
    # rank runs on the coprocessor (hostname contains "-mic") or on the host.
    if [[ `uname -n | grep "\-mic"` ]] ; then
        export OMP_NUM_THREADS=${MIC_OMP_NUM_THREADS}
        export KMP_AFFINITY=${MIC_KMP_AFFINITY}
        $BIN_DIR/mdrun_mpi.MIC "$@"
    else
        export OMP_NUM_THREADS=${IVB_OMP_NUM_THREADS}
        export KMP_AFFINITY=${IVB_KMP_AFFINITY}
        $BIN_DIR/mdrun_mpi.IVB "$@"
    fi
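
    For example, a symmetric launch over the ranks defined in the machinefile; the file names and BIN_DIR value are placeholders:

    export BIN_DIR=$GROMACS_INSTALL_DIR/bin   # directory holding both mdrun_mpi.IVB and mdrun_mpi.MIC
    mpiexec.hydra -machinefile machinefile.sym ./mdrun.sh -s topol.tpr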

Optimal performance settings differ for each workload. The KMP_AFFINITY variable lets you easily adjust the affinity of MPI ranks and the number of threads per rank on the Intel Xeon Phi coprocessor. Experimenting with the values of KMP_AFFINITY allows you to get the best performance for a given workload.
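
As a starting point, the common affinity types can be swept with a simple script; the machinefile and input file are placeholders, and -genv propagates the variable to all ranks:

# Try each affinity type and compare the reported ns/day
for aff in compact scatter balanced; do
    mpiexec.hydra -genv MIC_KMP_AFFINITY "verbose,${aff}" \
        -machinefile machinefile.sym ./mdrun.sh -s topol.tpr
done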

Performance Testing Results [1,2]

The following graph shows the results achieved from the GROMACS code using the hardware and software configurations shown below. Up to a 1.8x performance speedup can be achieved on the RF workload in symmetric mode, using both processors and coprocessors together. To achieve these results, the engineers used these affinity settings (see the sketch after this list):

  • 2 CPU : 24 MPI x 1 OMP
  • 2 CPU + 1 Coprocessor : 24 MPI x 1 OMP + 30 MPI x 8 OMP
  • 2 CPU + 2 Coprocessor : 24 MPI x 1 OMP + 30 MPI x 8 OMP + 30 MPI x 8 OMP
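
For reference, the 2 CPU + 1 Coprocessor configuration above corresponds to settings like these; the hostname and file names are hypothetical:

# 24 MPI x 1 OMP on the host, plus 30 MPI x 8 OMP on one coprocessor
cat > machinefile.sym <<EOF
node01:24
node01-mic0:30
EOF
export IVB_OMP_NUM_THREADS=1
export MIC_OMP_NUM_THREADS=8
mpiexec.hydra -machinefile machinefile.sym ./mdrun.sh -s topol.tpr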

Testing Platform Configurations

The following hardware was used for the above recipe and performance testing. [3]

Server Configuration:

  • 2 sockets / 24 cores total
  • Processor: Intel® Xeon® processor E5-2697 V2 @ 2.70GHz (12 cores per socket) with Intel® Hyper-Threading Technology
  • Operating System: Red Hat Enterprise Linux* 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
  • Memory: 64GB
  • Coprocessor: 2X Intel® Xeon® Phi™ Coprocessor (Board SKU "C0-7120P/7120X/7120"): 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading Technology [4], Memory: 15872 MB
  • Intel® Many-core Platform Software Stack Version 2.1.6720-21
  • Intel® C++ Compiler Version 14.0.1.106

GROMACS

  • FFT: Intel® Math Kernel Library
  • Configuration parameters were modified to achieve optimal performance

DISCLAIMERS:

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

2. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.

Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

3. For more information go to http://www.intel.com/performance

4. Available on select Intel® processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.

Intel, the Intel logo, Xeon and Xeon Phi are trademarks of Intel Corporation in the US and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2014 Intel Corporation. All rights reserved.

For more complete information about compiler optimizations, see our Optimization Notice.

Comments

Attila F.:

Aman - I got the same error. Do you have any idea what the reason for this message is?

Aman S.:

Sir,

I have successfully compiled GROMACS, but when I try to run it, it always gives an error:

-bash: /home/aman/GROMACS/MIC/gromacs-5.1.2/install-mic/bin/gmx_mpi: cannot execute binary file

Please help me out.

Ferenc B.:

Haneol C. said on Thu, 02/12/2015 - 06:30

Furthermore, I think that for the host CPU and the MIC to work together, executables for the CPU architecture are also needed. Is that right? I guess this procedure doesn't generate that kind of file. Then, do I need another compile without the -mmic flag?

That is right. The solution above is based on native use of the MIC, which is not the best one for GMX. I shall try to build the offload version.

Best regards, Ferenc BARTHA

Mikhail P.:

The separate CPU binary (called mdrun_mpi.IVB in the wrapper) should be built in the usual way, with the corresponding compiler flags for the CPU. For example, the following cmake command can be used:

cmake .. \
-DBUILD_SHARED_LIBS=OFF \
-DGMX_PREFER_STATIC_LIBS=ON \
-DGMX_BUILD_MDRUN_ONLY=ON \
-DGMX_FFT_LIBRARY=mkl \
-DCMAKE_INSTALL_PREFIX=$GROMACS_INSTALL_DIR \
-DGMX_MPI=ON -DGMX_OPENMP=ON \
-DGMX_GPU=OFF \
-DGMX_XML=OFF \
-DGMX_SOFTWARE_INVSQRT=OFF \
-DCMAKE_C_COMPILER=mpiicc \
-DCMAKE_C_FLAGS="-O3 -xCORE-AVX-I -fno-alias -g -DNDEBUG" \
-DCMAKE_CXX_COMPILER=mpiicpc \
-DCMAKE_CXX_FLAGS="-O3 -xCORE-AVX-I -fno-alias -g -DNDEBUG"


The command line for executing via MPI will look like:

mpirun -np $NPE mdrun.sh <mdrun options>

where <mdrun options> are all the options you want to pass to the mdrun_mpi binary.

Regards, Mikhail

Haneol C.:

I've succeeded in compiling it the way this posting describes. Unfortunately, I don't understand the exact command line for execution.

Furthermore, I think that for the host CPU and the MIC to work together, executables for the CPU architecture are also needed. Is that right? I guess this procedure doesn't generate that kind of file. Then, do I need another compile without the -mmic flag?

... It's too tough.

Please give me any comments.. :(

Haneol C.:

@Vincent L. You need zlib.a built for the MIC architecture (with -mmic), and you may need zlib.a for the main CPU architecture too.
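
For example, a minimal sketch of cross-compiling a static zlib for the coprocessor; the zlib version and install prefix are placeholders:

# zlib's configure honors the CC environment variable
cd zlib-1.2.8
CC="icc -mmic" ./configure --static --prefix=$ZLIB_DIR
make && make install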

prem s.:

Hi, I've been trying to install GROMACS on the Xeon Phi coprocessor, and make produces a catastrophic error: it cannot open the source file mkl_dfti.h. But the file exists in the MKL directory. How can I overcome this error?

Thanks in advance

Vincent L.:

Hi, how can I get the MIC version of zlib? When I try to compile it, it shows the error "x86_64-k1om-linux-ld: i386:x86-64 architecture of input file `/usr/local/lib/libz.a(compress.o)' is incompatible with k1om output"

Thanks

Vincent

Belinda Liviero (Intel):

Benchmark information for GROMACS on this platform, and other platforms, is available here: http://www.intel.com/content/www/xr/en/benchmarks/server/xeon-phi/xeon-phi-life-sciences.html
