GROMACS Recipe for Symmetric Intel® MPI Using PME Workloads

By Heinrich Bockhorst, Published: 05/27/2015, Last Updated: 05/27/2015


This package (scripts with instructions) delivers a build and run environment for symmetric Intel® MPI runs; this file is the README of the package. "Symmetric" means that an Intel® Xeon® executable and an Intel® Xeon Phi™ executable run together, exchanging MPI messages and collective data via Intel MPI.

There is already a GROMACS recipe for symmetric Intel MPI runs on the Intel® Xeon Phi™ coprocessor, but that recipe addresses the so-called RF data sets and does not take advantage of the special Particle Mesh Ewald (PME) configuration option.

The symmetric run configurations of this recipe use the PME mode of GROMACS. In this mode the so-called particle-mesh part of GROMACS, which handles the long-range forces, can run in parallel with the direct force calculation. The idea for an efficient use of both architectures is to run the direct forces on Intel Xeon Phi, where highly vectorized kernels exist, while the PME calculation, which makes heavy use of FFTs and requires very intensive MPI_Alltoall communication, runs in the Intel Xeon executable.
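As an illustration of this split, GROMACS can dedicate MPI ranks to PME via mdrun's -npme option, and Intel MPI's MPMD syntax starts both executables together. The launch line below is a hypothetical sketch, not the package's actual command — the hostnames, paths, and rank counts are made up, and the package's scripts assemble the real command line for you:

```shell
# Hypothetical symmetric launch: 12 PP ranks on the card compute the direct
# forces; 4 PME ranks on the host handle the FFT/MPI_Alltoall-heavy mesh part.
# With -ddorder pp_pme the last ranks become the PME ranks, i.e. the host group.
mpirun \
  -host node0-mic0 -n 12 ./gromacs-mic/bin/mdrun_mpi -npme 4 -ddorder pp_pme -s topol500k.tpr : \
  -host node0      -n 4  ./gromacs/bin/mdrun_mpi     -npme 4 -ddorder pp_pme -s topol500k.tpr
```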

This package contains run scripts for running GROMACS on clusters equipped with Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. It is also possible to run GROMACS on Intel Xeon or on Intel Xeon Phi alone. The scripts assist interactive running but can also be integrated into batch scripts. The full package is attached to this recipe.


The following software and files are necessary for the installation. You may use these packages or download newer versions where they exist.

  1. Download the package GROMACS-SYM-VERSION.tgz provided at the bottom of this article.
  2. GROMACS package:    
    GROMACS Downloads
  3. cmake package: a recent version of this software is needed. Some OS distributions still ship cmake versions that do not build GROMACS correctly; a version newer than 2.8.8 is mandatory for this GROMACS package. cmake can be obtained from CMake Get the Software; please read the included Readme.txt for help with the installation.
  4. *.tpr input file: you need a GROMACS *.tpr input file. This package contains an artificial input, topol500k.tpr.
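Since an insufficient cmake version is a common pitfall, a quick shell check can verify the installed version before configuring. This is a hedged sketch (the `sort -V` version sort requires GNU coreutils):

```shell
# Compare dot-separated version strings: succeeds when $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n1)" = "$1" ]
}

# Parse the "cmake version X.Y.Z" banner if cmake is on the PATH.
if command -v cmake >/dev/null 2>&1; then
  ver=$(cmake --version | head -n1 | awk '{print $3}')
  if version_ge "$ver" 2.8.9; then
    echo "cmake $ver is new enough"
  else
    echo "cmake $ver is too old (> 2.8.8 required)"
  fi
fi
```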

1. Installation of GROMACS


  1. Untar the package GROMACS-SYM-version.tgz
    (e.g. GROMACS-SYM-0.9.4.tgz)

    $ tar -xvzf GROMACS-SYM-version.tgz


  2. Enter directory

    $ cd GROMACS-SYM-version/

    This directory's absolute path is the base directory from now on: $ BASE_DIR=$PWD
    Update the version string for GROMACS versions other than 5.0.5:

    $ cat VERSION

    If this shows 5.0.5 and you intend to use that version, you are done. Update the version number for a different GROMACS version.

  3. Enter the Package directory and copy the original distribution.
    $ cd $BASE_DIR
    $ cd package

    Copy original GROMACS package to this directory and unpack.
    $ cp /<path to package>/gromacs-5.0.5.tar.gz .
    $ tar -xvzf gromacs-5.0.5.tar.gz
  4. Setup environment:
    $ vi $BASE_DIR/

    Check the environment settings for compiler and MPI by sourcing the environment script. Leave it empty when the environment is taken from the shell.
    $ source ./
    $ which icc
    $ which mpiicc


Tested Software Versions

icc        : Version 15.0.3, 15.0.2
Intel® MPI : Version 5.0.3, 5.0.2
gcc        : Version 4.4.7 20120313 (Intel Xeon)
             Version 4.8.1
MPSS       : Version 3.2.1, 3.4.2, 3.5

The gcc version is crucial for the stdc++ library and the C++ flags!

Install Intel® Xeon® executable

It makes sense to compile on the same architecture as the target Intel Xeon architecture, because the GROMACS cmake configuration script will detect the best options.

  1. Go to the Intel Xeon build directory
    $ cd $BASE_DIR/build-xeon

    This directory contains 3 scripts: one configures the build directory using cmake, one builds and installs the software (make), and one removes all configured files if you intend to change parameters and re-install.
  2. Configure GROMACS: the configure script contains the GROMACS configuration that cmake transforms into a makefile. In case of failure you may inspect conf.log and conf.err; these files contain the log information and error output. The GROMACS installation will go to $BASE_DIR/gromacs.
    $ ./

    The script contains a cmake call with some proven options (compare the original GROMACS installation information). The C++ flags must be different for gcc versions >= 4.7; in case of error, follow the instructions given inside the script.
  3. Build GROMACS for Intel Xeon: the build script simply executes the makefile generated in the previous step and installs the executable.
    $ ./

    In case of success there will be an executable at $BASE_DIR/gromacs/bin/mdrun_mpi.
    In case of error, check whether the gcc version fits the settings. Building the executable should take less than 5 minutes on recent Intel Xeon servers.
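For reference, the configure step boils down to a cmake call like the following. This is an illustrative sketch using documented GROMACS 5.0 cmake options; the package's configure script is authoritative and its exact flags may differ:

```shell
# Host (Intel 64) build with the Intel compilers and MKL FFTs:
#   GMX_MPI=ON          -> MPI-parallel mdrun (mdrun_mpi)
#   GMX_OPENMP=ON       -> hybrid MPI+OpenMP parallelization
#   GMX_FFT_LIBRARY=mkl -> Intel MKL for the PME FFTs
cd $BASE_DIR/package/gromacs-5.0.5
mkdir -p build && cd build
CC=mpiicc CXX=mpiicpc cmake .. \
  -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_FFT_LIBRARY=mkl \
  -DCMAKE_INSTALL_PREFIX=$BASE_DIR/gromacs
make -j 8 && make install
```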

Install MIC executable

The MPSS software stack must be present.

  1. Enter the build-mic directory:
    $ cd $BASE_DIR/build-mic

    The following steps are completely analogous to steps 1–3 above; the only differences are the additional -mmic flag and potentially a different C++ flag. You do not have to make any changes.
  2. $ ./
  3. $ ./
    This should generate an Intel Xeon Phi executable in $BASE_DIR/gromacs-mic/bin.


Run environment

Start with interactive tests. Reserve an interactive node containing one or more MIC cards for direct testing. If interactive usage is not allowed, follow the instructions for running under a batch system (see below).

You will need to provide a hosts file with the name of your host as the minimal entry.

The script tries to update your hosts file when it does not find the hostname inside /etc/hosts. Please also check for the Intel Xeon Phi names (<hostname>-mic0, …) inside /etc/hosts. Enter the run directory:

$ cd $BASE_DIR/run

Scripts and files:

  • starts a run by defining all environment settings and sources the following scripts:
  • defines some auxiliary bash functions used by the start script
  • sources the setup scripts for compiler and MPI -- update for your system. For clusters using modules this script may be empty. The environment settings are taken from the shell if no such script is present
  • contains the MPI- and OpenMP-specific environment
  • contains application-specific settings like program name and program path. This package ships one version for IMB testing and one for running GROMACS
  • executes the MPI command line
  • wrapper script for the executable(s); distinguishes the Intel Xeon and Intel Xeon Phi environments
  • generates the MPI machinefile from the hosts file. The default name for the machinefile is mach.txt. Please check that the hostnames are correct.
  • records the settings in settings.prot
  • records the environment settings in env.prot
  • runs seven different configurations for systems with 2 Intel Xeon Phi cards. For single-card systems only 3 configurations will be tested.
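As a minimal starting point, the hosts file can simply contain the current node's name. This is a sketch; the machinefile generator derives the card names such as <hostname>-mic0 from this entry:

```shell
# Create a one-line hosts file containing this node's hostname.
hostname > hosts
cat hosts
```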

IMB Tests (Optional, Test Run Scripts Independent of GROMACS)

To make sure that the run system is working, use it with the Intel® MPI Benchmarks (IMB) as a test for different scenarios. The IMB benchmarks are already built for the Intel® 64 and MIC architectures; they can be found in $I_MPI_ROOT/intel64/bin and $I_MPI_ROOT/mic/bin.

Set the application to IMB:

$ rm

Make a soft link to the IMB settings script:

$ ln -s

The linked script contains all necessary IMB definitions for running the different scenarios.

Run the test script

$ ./

This will generate an output directory output_Sendrecv_TEST containing 7 sub-directories. Each sub-directory name encodes its configuration, e.g.:
N-1_H4T6_2xMIC12T15: 1 node with 4 host processes of 6 threads each and 2 MIC cards with 12 processes of 15 threads each.
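The naming convention can be decoded mechanically. The helper below is hypothetical — it is not part of the package — and only illustrates how the directory names are put together:

```shell
# Split a name like N-1_H4T6_2xMIC12T15 into its components:
# N-<nodes>_H<host ranks>T<host threads>_<cards>xMIC<mic ranks>T<mic threads>
decode_config() {
  local name=$1
  if [[ $name =~ ^N-([0-9]+)_H([0-9]+)T([0-9]+)_([0-9]+)xMIC([0-9]+)T([0-9]+)$ ]]; then
    echo "${BASH_REMATCH[1]} node(s): ${BASH_REMATCH[2]} host ranks x ${BASH_REMATCH[3]} threads, ${BASH_REMATCH[4]} card(s) with ${BASH_REMATCH[5]} ranks x ${BASH_REMATCH[6]} threads"
  else
    echo "unrecognized: $name"
  fi
}
decode_config N-1_H4T6_2xMIC12T15
```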
Each directory contains all used scripts from the run directory and the output files:

settings.prot, env.prot : configuration logs
command.txt             : command line
OUT.txt                 : stdout
OUT.err                 : stderr

The stdout files in each directory contain an IMB Sendrecv benchmark showing potential bottlenecks in MPI message passing.

All 7 tests are only possible on compute nodes with at least 2 Intel Xeon Phi cards!


Set the application to GROMACS:

$ ln -s

The settings are defined for the artificial test case topol500k.tpr. Please adapt the settings to your input set.

Run the test cases

$ ./

This generates an output directory output_topol500k_TEST, which contains 7 sub-directories. In case of success, each directory contains a GROMACS md.log that prints a performance statement at the end.

Please see IMB Tests (Optional, Test Run Scripts Independent of GROMACS) for an explanation of the directory/file names.

Define New Runs

The current machinefile generation script supports only a subset of possible configurations.

New configurations can be created by changing three variables inside the start script.
Open the script:

$ vi

# HOST_PE: Ranks on host,           (=0 : host not used)
# NUM_MIC: number of used MIC cards (=0 : no mic card used)
# PP_MIC : number of Ranks on each MIC card

export HOST_PE=${HOST_PE:-2}
export NUM_MIC=${NUM_MIC:-2}
export PP_MIC=${PP_MIC:-12}

These variables determine the number of MPI ranks on the host, the number of MIC cards to use, and the number of ranks on each card.
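For example (the values are illustrative), a node with 4 host ranks and two cards with 12 ranks each yields 28 MPI ranks in total:

```shell
export HOST_PE=4    # MPI ranks on the Xeon host (0 = host not used)
export NUM_MIC=2    # number of coprocessor cards (0 = no cards)
export PP_MIC=12    # MPI ranks per card

# Total ranks per node = host ranks + cards * ranks-per-card
TOTAL=$(( HOST_PE + NUM_MIC * PP_MIC ))
echo "total MPI ranks per node: $TOTAL"
```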

The numbers of threads on the host and on the MIC are determined by:

# automatic setting of thread number
# this overwrites explicit thread number
# compare output in file settings.prot

export NUM_CORES=12
export MIC_NUM_CORES=57


Here we define the number of cores. The default choices are minimal and should be adapted; the correct values can be determined from the output of micinfo and cpuinfo. Please adapt them to your Intel Xeon Phi and Intel Xeon hardware.
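One plausible derivation of the automatic thread setting is shown below. This is a hedged sketch — the package's settings script is authoritative and may differ in details, for example by reserving a coprocessor core for the OS:

```shell
NUM_CORES=12   # physical cores on the host (see cpuinfo)
HOST_PE=4      # MPI ranks on the host

# Spread the physical cores evenly over the host ranks.
OMP_NUM_THREADS=$(( NUM_CORES / HOST_PE ))
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```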

After changing the parameters it makes sense to do a dry run with


A dry run will generate all settings but will not execute the program. This mode shows, e.g., whether the machine file is correct:

$ cat mach.txt
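For illustration, with HOST_PE=2, NUM_MIC=2, and PP_MIC=12 on a node named node0 (a made-up hostname), the machinefile would use Intel MPI's host:ranks syntax roughly as follows. The real file is produced by the package from your hosts file, so the names below are examples only:

```shell
# Write a hypothetical machinefile; the entries must match your /etc/hosts names.
cat > mach.txt <<'EOF'
node0:2
node0-mic0:12
node0-mic1:12
EOF
cat mach.txt
```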

Batch Usage

The start script also contains


This branch works as in interactive mode but sends the job to a batch queue using command-line options of the batch system; compare the settings under run/TEMPLATES for templates. This methodology works for LSF, PBS, and SLURM but might need some additional knowledge of the job manager.

It may be easier to write a batch script as proposed by the cluster documentation. The script can look like this:

 #QSUB <your settings>
 #QSUB ...

# generate the hosts file, e.g. from $PBS_NODEFILE

 # define configuration

export HOST_PE=<num of host pe>
export NUM_MIC=<number of mics>
export PP_MIC=<number of ranks per mic>

 ./ [<number of nodes>]  
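Filled in for PBS, such a script might look like this. It is a sketch: the directives, values, and the `<start script>` placeholder are illustrative and must be replaced with your site's settings and the package's actual script name:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=12        # illustrative resource request; adapt to your site
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# Generate the hosts file from the nodes assigned by the batch system.
sort -u $PBS_NODEFILE > hosts

# Define the configuration.
export HOST_PE=4
export NUM_MIC=2
export PP_MIC=12

./<start script> 1            # run on 1 node
```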


  • Check settings.prot and env.prot for the recorded settings.
  • Check the machine file mach.txt.
  • Use IMB as the application and check whether the system works with IMB.
  • The timing output is spoiled for symmetric GROMACS runs: the PME part is scaled by a wrong factor (just for information).
  • Before configuring, check that no LDFLAGS and CFLAGS are defined inside the shell; they would confuse the cmake configuration.
  • Please check whether your gcc version is >= 4.7. This might require an additional flag in the CXX_FLAGS; please read the note inside the configure script.
  • Check that the general naming rule for MIC hosts is valid:
    the name for mic0 is <hostname>-mic0.
    If this is not the case, please adapt the function host2mic inside the helper script.

Attachment Size
gromacs-sym-0-9-4.tgz 2.7 MB

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804