Recipe: Build and Run NAMD on Intel® Xeon® Processors on a Single Node

Published: 02/04/2020, Last Updated: 02/04/2020

For cluster runs, refer to the recipe: Build and Run NAMD on Intel® Xeon® Processors on a Cluster

Purpose

This recipe describes, step by step, how to get, build, and run the NAMD (Scalable Molecular Dynamics) code on Intel® Xeon® processors for better performance.

Introduction

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. The sections below describe how to build it on Intel® Xeon® processors.

Building and Running NAMD on Intel® Xeon® Processor E5-2697 v4 (BDW), Intel® Xeon® Scalable Gold 6148 Processor (SKX), or Intel® Xeon® Platinum 8260L Processor (CLX)

Download the Codes

  1. Download the latest “Source Code” of NAMD
  2. Download Charm++ version 6.8.2

       a. You can get Charm++ from the NAMD “Source Code” of the “Version Nightly Build”
       b. Or download it separately from Charmplusplus

  3. Download fftw3 version

       a. Version 3.3.7 is used in this run

  4. Download Tcl for NAMD version 2.13 or later
  5. Download the apoa1 and stmv workloads

Build the Binaries

Note: Use -xCORE-AVX512 for SKX or CLX, and -xCORE-AVX2 for BDW. The commands below show -xCORE-AVX512; substitute -xCORE-AVX2 when building for BDW.
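One way to choose between the two flags is to look at the CPU feature flags that Linux reports. The following is a minimal sketch and not part of the original recipe: the function name `pick_arch_flag` is hypothetical, and it assumes that `/proc/cpuinfo` lists `avx512f` on SKX/CLX systems and `avx2` (without `avx512f`) on BDW systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of the recipe): map a string of CPU feature
# flags to the Intel compiler option named in the note above.
# Argument: space-separated feature flags, e.g. the "flags" line of /proc/cpuinfo.
pick_arch_flag() {
    case " $1 " in
        *" avx512f "*) echo "-xCORE-AVX512" ;;        # SKX or CLX
        *" avx2 "*)    echo "-xCORE-AVX2" ;;          # BDW
        *)             echo "unsupported"; return 1 ;; # neither flag found
    esac
}

# Example use on Linux (assumes /proc/cpuinfo is readable):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   ARCH_FLAG=$(pick_arch_flag "$flags")
```

The resulting value could then be substituted wherever the build commands use -xCORE-AVX512.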

  1. Set environment for compilation:
    $ CC=icc; CXX=icpc; F90=ifort; F77=ifort
    $ export CC CXX F90 F77
    $ source /opt/intel/compiler/<version>/compilervars.sh intel64
  2. Build fftw3:
    $ cd <fftw_root_path>
    $ ./configure --prefix=<fftw_install_path> --enable-single --disable-fortran CC=icc
    $ make CFLAGS="-O3 -xCORE-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
    
  3. Build Tcl:
    $ cd $TCL_SRC/unix
    $ ./configure --disable-shared --prefix=<tcl_install_dir>
    $ make clean install
    
  4. Build multicore version of Charm++:
    $ cd <charm_root_path>
    $ base_charm_opts="-O3 -ip -g -xCORE-AVX512" 
    $ unset I_MPI_LINK
    $ unset I_MPI_CC I_MPI_CXX I_MPI_F90 I_MPI_F77
    $ ./build charm++ multicore-linux64 iccstatic --with-production $base_charm_opts -DCMK_OPTIMIZE -DMPICH_IGNORE_CXX_SEEK
    
  5. Build NAMD:
    • Modify arch/Linux-x86_64-icc to look like the following (keep only the FLOATOPTS line that matches your CPU type):
      NAMD_ARCH = Linux-x86_64
      CHARMARCH = multicore-linux64-iccstatic
      
      # For SKX or CLX
      FLOATOPTS = -ip -xCORE-AVX512  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE -qopenmp-simd -qopt-zmm-usage=high
      
      # For BDW
      FLOATOPTS = -ip -xCORE-AVX2  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      
      CXX = icpc -std=c++11 -DNAMD_KNL
      CXXOPTS = -static-intel -O2 $(FLOATOPTS)
      CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
      CXXCOLVAROPTS = -O2 -ip
      CC = icc
      COPTS = -static-intel -O2 $(FLOATOPTS)
      
    • Modify the arch/Linux-x86_64.tcl to look like the following:
      TCLDIR=<tcl_install_dir>
      TCLINCL=-I$(TCLDIR)/include
      TCLLIB=-L$(TCLDIR)/lib -ltcl8.7 -ldl -lpthread -lz
      TCLFLAGS=-DNAMD_TCL
      TCL=$(TCLINCL) $(TCLFLAGS)
    • Apply these patches when using an icc compiler newer than version 2016u4:
      $ sed -i -e '1151 i #pragma omp simd simdlen(16)' <namdSource>/src/ComputeNonbondedBase.h
      $ sed -i -e '1171 i #pragma omp ordered simd monotonic(hu:1)' <namdSource>/src/ComputeNonbondedBase.h
      $ sed -i -e '1522 i #pragma omp simd simdlen(16)' <namdSource>/src/ComputeNonbondedBase.h
      $ sed -i -e '1537 i #pragma omp ordered simd monotonic(plin:1, pli:1)' <namdSource>/src/ComputeNonbondedBase.h
      
      $ sed -i -e 's|simd assert|omp simd|g' <namdSource>/src/ComputeNonbondedBase2.h
      $ sed -i -e 's|simd assert|omp simd|g' <namdSource>/src/ComputeNonbondedBase2KNL.h
      $ sed -i -e 's|simd|omp simd|g' <namdSource>/src/ComputeNonbondedMICKernelBase2_scalar.h
      $ sed -i -e 's|simd|omp simd|g' <namdSource>/src/ComputeNonbondedMICKernelBase.h
      $ sed -i -e 's|simd assert|omp simd|g' <namdSource>/src/Patch.C
      $ sed -i -e 's|simd assert|omp simd|g' <namdSource>/src/Settle.C
    • Compile NAMD:
      $ unset I_MPI_LINK
      $ unset I_MPI_CC I_MPI_CXX I_MPI_F90 I_MPI_F77
      $ ./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_install_path>  --tcl-prefix <tcl_install_dir>  --charm-opts -verbose 
      $ cd Linux-x86_64-icc
      $ make clean
      $ gmake -j
  6. Change the following lines in the *.namd file for both workloads (apoa1 and stmv):

    numsteps 1000
    outputtiming 20
    outputenergies 600
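The three settings from step 6 can be applied by hand in an editor, or scripted. The sketch below is one possible way to script it and is not part of the original recipe: the function name `set_namd_param` is hypothetical, it assumes each setting sits on its own line as `keyword value`, and it relies on GNU sed (the same `sed -i` used for the patches in step 5, plus the GNU-only `I` flag for case-insensitive matching).

```shell
#!/bin/sh
# Hypothetical helper (not part of the recipe): set "key value" in a NAMD
# configuration file. An existing line for the key is replaced (matched
# case-insensitively); if no such line exists, the setting is appended.
set_namd_param() {
    file=$1; key=$2; value=$3
    if grep -qi "^[[:space:]]*${key}[[:space:]]" "$file"; then
        # GNU sed: -i edits in place, trailing I makes the match case-insensitive
        sed -i -e "s|^[[:space:]]*${key}[[:space:]].*|${key} ${value}|I" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example use with the values from step 6:
#   for f in apoa1/apoa1.namd stmv/stmv.namd; do
#       set_namd_param "$f" numsteps 1000
#       set_namd_param "$f" outputtiming 20
#       set_namd_param "$f" outputenergies 600
#   done
```

Keeping a copy of the original *.namd files before editing them in place makes it easy to revert.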

Run NAMD:

On SKX/BDW (ppn = 40 / ppn = 72, respectively):
ppn=<number of processes>
./namd2 +p $ppn apoa1/apoa1.namd +pemap 0-$((ppn-1))

SKX example:

./namd2 +p 80 stmv/stmv.namd +pemap 0-79
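The relationship between the `+p` count and the `+pemap` range in the commands above (CPUs 0 through ppn-1) can be captured in a small helper. This is a sketch under stated assumptions, not part of the original recipe: the function names `pemap_range` and `namd_cmd` are hypothetical, and the wrapper only prints the command line so it can be inspected before running.

```shell
#!/bin/sh
# Hypothetical helper (not part of the recipe): build the "+pemap" CPU range
# for a given process count, i.e. CPUs 0 through ppn-1.
pemap_range() {
    ppn=$1
    echo "0-$((ppn - 1))"
}

# Hypothetical wrapper: print (do not run) the namd2 command line for a given
# process count and NAMD configuration file.
namd_cmd() {
    ppn=$1; config=$2
    echo "./namd2 +p $ppn $config +pemap $(pemap_range "$ppn")"
}

# Example use:
#   namd_cmd 80 stmv/stmv.namd
#   namd_cmd 72 apoa1/apoa1.namd
```

For instance, `namd_cmd 80 stmv/stmv.namd` prints the same command line as the example above.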

Performance results reported in Intel Salesforce repository (ns/day; higher is better):

| Workload | 2S Intel® Xeon® Processor E5-2697 v4, 18c, 2.3 GHz (ns/day) | 2S Intel® Xeon® Scalable Gold 6148 Processor, 20c, 2.4 GHz (ns/day) | Speedup (2S Gold 6148 vs. 2S E5-2697 v4) |
|---|---|---|---|
| stmv | 0.45 | 0.65 | 1.45x |
| apoa1 | 5.5 | 7.95 | 1.44x |
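To get ns/day figures like those above from your own run, the timing that NAMD writes to its log can be converted. The sketch below is not part of the original recipe and rests on an assumption about the log format: that NAMD prints benchmark lines containing a number followed by the token `days/ns`. The function name `ns_per_day` is hypothetical; it inverts each such value, since ns/day = 1 / (days/ns).

```shell
#!/bin/sh
# Hypothetical helper (not part of the recipe): print ns/day for every log
# line that contains "<value> days/ns". Reads the files given as arguments,
# or standard input when none are given.
ns_per_day() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "days/ns" && $(i-1) > 0)
                printf "%.2f\n", 1 / $(i-1)   # ns/day = 1 / (days/ns)
    }' "$@"
}

# Example use (the log file name is an assumption):
#   ./namd2 +p 80 stmv/stmv.namd +pemap 0-79 > stmv.log
#   ns_per_day stmv.log
```

A run usually reports several such lines, so the helper prints one value per line and leaves the choice of which to report (for example, the last one) to the user.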

Systems configuration:

| Processor | Intel® Xeon® Processor E5-2697 v4 | Intel® Xeon® Scalable Gold 6148 Processor |
|---|---|---|
| Stepping | 1 (B0) | 1 (B0) |
| Sockets / TDP | 2S / 290W | 2S / 300W |
| Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 2.4 GHz / 40 / 80 |
| DDR4 | 8x16GB 2400 MHz (128GB) | 12x16GB 2666 MHz (192GB) |
| Cluster/Snoop Mode/Mem Mode | Home | Home |
| Turbo | On | On |
| BIOS | GRRFSDP1.86B0271.R00.1510301446 | |
| Compiler | ICC-2017.0.098 | ICC-2019.4.243 |
| Operating System | Red Hat Enterprise Linux* 7.2 (3.10.0-327.el7.x86_64) | Red Hat Enterprise Linux* 7.3 (3.10.0-862.11.6.el7.x86_64) |

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804