MPI crash in physics code: Fatal error in PMPI_Allgatherv

I am running a large program, the Vienna Ab initio Simulation Package (VASP), built with Intel Parallel Studio and Intel MPI. The program compiled without problems and appears to run correctly on all of the examples, producing correct results under MPI. However, a slightly larger job, which is what I bought the program for, repeatedly crashes with a PMPI_Allgatherv error. As no other users of this fairly widely used program report similar errors, I am concerned that it is either an Intel MPI bug or a linking error on my part. I have therefore enclosed the makefile I used to build VASP for reference. Any help would be greatly appreciated.

Best wishes,

                          Paul Fons  

The crash traceback is below:

mpirun -np 16 vasp_gamma 
running on 16 total cores 
distrk: each k-point on 16 cores, 1 groups 
distr: one band on 8 cores, 2 groups 
using from now: INCAR 
vasp.5.3.3 18Dez12 (build May 13 2013 15:17:23) gamma-only 

POSCAR found : 3 types and 108 ions 
scaLAPACK will be used 
LDA part: xc-table for Pade appr. of Perdew

POSCAR, INCAR and KPOINTS ok, starting setup 
WARNING: small aliasing (wrap around) errors must be expected 
FFT: planning ... 
WAVECAR not read 
WARNING: random wavefunctions but no delay for mixing, default for NELMDL 
prediction of wavefunctions initialized - no I/O 
entering main loop 
N E dE d eps ncg rms rms(c) 
RMM: 1 0.438327604116E+04 0.43833E+04 -0.94880E+04 346 0.506E+02 
RMM: 2 0.127584402674E+04 -0.31074E+04 -0.31882E+04 346 0.152E+02 
RMM: 3 0.299986400145E+03 -0.97586E+03 -0.12137E+04 346 0.925E+01 
RMM: 4 -0.606823381784E+02 -0.36067E+03 -0.41045E+03 346 0.659E+01 
RMM: 5 -0.228060471875E+03 -0.16738E+03 -0.15696E+03 346 0.362E+01 
RMM: 6 -0.308261614897E+03 -0.80201E+02 -0.68222E+02 346 0.255E+01 
RMM: 7 -0.345624028023E+03 -0.37362E+02 -0.34195E+02 346 0.155E+01 
RMM: 8 -0.366489442688E+03 -0.20865E+02 -0.18421E+02 346 0.116E+01 
RMM: 9 -0.391069482656E+03 -0.24580E+02 -0.24063E+02 842 0.755E+00 
RMM: 10 -0.392923752147E+03 -0.18543E+01 -0.32951E+01 884 0.200E+00 
RMM: 11 -0.393354485197E+03 -0.43073E+00 -0.41138E+00 833 0.520E-01 
RMM: 12 -0.393407181438E+03 -0.52696E-01 -0.49653E-01 802 0.148E-01 0.950E+00 
RMM: 13 -0.390937720994E+03 0.24695E+01 -0.45304E+00 697 0.142E+00 0.601E+00 
RMM: 14 -0.390311322970E+03 0.62640E+00 -0.37173E+00 716 0.136E+00 0.276E+00 
RMM: 15 -0.390294966246E+03 0.16357E-01 -0.96176E-01 783 0.688E-01 0.135E+00 
RMM: 16 -0.390280817461E+03 0.14149E-01 -0.18087E-01 700 0.375E-01 0.504E-01 
RMM: 17 -0.390284202366E+03 -0.33849E-02 -0.26515E-02 722 0.159E-01 0.268E-01 
RMM: 18 -0.390287549580E+03 -0.33472E-02 -0.92233E-03 724 0.899E-02 0.158E-01 
RMM: 19 -0.390289808941E+03 -0.22594E-02 -0.63338E-03 696 0.761E-02 0.924E-02 
RMM: 20 -0.390290458142E+03 -0.64920E-03 -0.14378E-03 695 0.422E-02 0.458E-02 
RMM: 21 -0.390290916003E+03 -0.45786E-03 -0.10433E-03 599 0.298E-02 0.274E-02 
RMM: 22 -0.390290971931E+03 -0.55928E-04 -0.23374E-04 438 0.159E-02 
1 T= 600. E= -.38199243E+03 F= -.39029097E+03 E0= -.39022511E+03 EK= 0.82985E+01 SP= 0.00E+00 SK= 0.00E+00 
bond charge predicted 
N E dE d eps ncg rms rms(c) 
RMM: 1 -0.390242713772E+03 0.48202E-01 -0.45969E+00 692 0.226E+00 0.275E-01 
RMM: 2 -0.390240986761E+03 0.17270E-02 -0.10085E-01 751 0.279E-01 0.152E-01 
RMM: 3 -0.390241059805E+03 -0.73044E-04 -0.90892E-03 811 0.726E-02 0.953E-02 
RMM: 4 -0.390240906267E+03 0.15354E-03 -0.10114E-03 675 0.269E-02 0.464E-02 
RMM: 5 -0.390240893714E+03 0.12552E-04 -0.28526E-04 438 0.162E-02 
2 T= 596. E= -.38199224E+03 F= -.39024089E+03 E0= -.39017463E+03 EK= 0.82487E+01 SP= 0.00E+00 SK= 0.00E+00 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x8051d70, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8051d70, rcounts=0x75fc7d0, displs=0x77e78f0, MPI_DOUBLE_PRECISION, comm=0xc4010000) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x8051d70 src=0x8051d70 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80507c0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8038d50, rcounts=0x763d1c0, displs=0x763d210, MPI_DOUBLE_PRECISION, comm=0x84000006) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80507c0 src=0x80507c0 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80875a0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8040650, rcounts=0x75ea8f0, displs=0x77a1770, MPI_DOUBLE_PRECISION, comm=0x84000006) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80875a0 src=0x80875a0 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80b7fa0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x8013160, rcounts=0x72b1060, displs=0x72b10b0, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80b7fa0 src=0x80b7fa0 len=94112 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x806b190, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x803bcb0, rcounts=0x72b0c00, displs=0x72b0c50, MPI_DOUBLE_PRECISION, comm=0x84000006) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x806b190 src=0x806b190 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x8084d60, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x800e930, rcounts=0x72b1060, displs=0x72b10b0, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x8084d60 src=0x8084d60 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x809e0f0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x8010250, rcounts=0x72b10b0, displs=0x72b0b70, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x809e0f0 src=0x809e0f0 len=94112 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80703f0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8011a30, rcounts=0x72b10b0, displs=0x779fc20, MPI_DOUBLE_PRECISION, comm=0x84000006) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80703f0 src=0x80703f0 len=96880 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80ca510, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x800e730, rcounts=0x72b1100, displs=0x72b1150, MPI_DOUBLE_PRECISION, comm=0xc4010000) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80ca510 src=0x80ca510 len=94112 
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack: 
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80b6ac0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x7fe3d40, rcounts=0x72b0c40, displs=0x72b0c90, MPI_DOUBLE_PRECISION, comm=0x84000006) failed 
MPIR_Allgatherv_impl(1002): 
MPIR_Allgatherv(958)......: 
MPIR_Allgatherv_intra(708): 
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80b6ac0 src=0x80b6ac0 len=94112 
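
Every error stack above ends in MPIR_Localcopy with dst and src at the same address, and len equal to scount times 8 bytes (12110 x 8 = 96880). That pattern suggests each rank's send buffer sits inside its own receive buffer, a case for which the MPI standard expects MPI_IN_PLACE. As a rough sketch for tallying such lines in a saved log (the helper name count_aliased and the temporary file are assumptions for illustration, not part of VASP or Intel MPI):

```shell
#!/bin/sh
# count_aliased LOGFILE
# Prints how many "memcpy arguments alias each other" lines report
# identical dst and src addresses.
# count_aliased is a hypothetical helper name, not part of any tool.
count_aliased() {
    awk '/memcpy arguments alias each other/ {
        dst = ""; src = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^dst=/) dst = substr($i, 5)   # text after "dst="
            if ($i ~ /^src=/) src = substr($i, 5)   # text after "src="
        }
        if (dst != "" && dst == src) n++
    }
    END { print n + 0 }' "$1"
}

# Example with lines copied from the traceback above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x8051d70 src=0x8051d70 len=96880
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80b7fa0 src=0x80b7fa0 len=94112
EOF
count_aliased "$sample"   # prints 2
rm -f "$sample"
```

If the count matches the number of failing ranks, the crash is more likely a buffer-aliasing check in the MPI library than a linking problem.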

The Makefile is below: 

.SUFFIXES: .inc .f .f90 .F 
#----------------------------------------------------------------------- 
# Makefile for Intel Fortran compiler for Pentium/Athlon/Opteron 
# based systems 
# we recommend this makefile for both Intel as well as AMD systems 
# for AMD based systems appropriate BLAS (libgoto) and fftw libraries are 
# however mandatory (whereas they are optional for Intel platforms) 
# For Athlon we recommend 
# ) to link against libgoto (and mkl as a backup for missing routines) 
# ) odd enough link in libfftw3xf_intel.a (fftw interface for mkl) 
# feedback is greatly appreciated 

# The makefile was tested only under Linux on Intel and AMD platforms 
# the following compiler versions have been tested: 
# - ifc.7.1 works stable somewhat slow but reliably 
# - ifc.8.1 fails to compile the code properly 
# - ifc.9.1 recommended (both for 32 and 64 bit) 
# - ifc.10.1 partially recommended (both for 32 and 64 bit) 
# tested build 20080312 Package ID: l_fc_p_10.1.015 
# the gamma only mpi version can not be compiles 
# using ifc.10.1 
# - ifc.11.1 partially recommended (some problems with Gamma only and intel fftw) 
# Build 20090630 Package ID: l_cprof_p_11.1.046 
# - ifort.12.1 strongly recommended (we use this to compile vasp) 
# Version 12.1.5.339 Build 20120612 

# it might be required to change some of library path ways, since 
# LINUX installations vary a lot 

# Hence check ***ALL*** options in this makefile very carefully 
#----------------------------------------------------------------------- 

# BLAS must be installed on the machine 
# there are several options: 
# 1) very slow but works: 
# retrieve the lapackage from ftp.netlib.org 
# and compile the blas routines (BLAS/SRC directory) 
# please use g77 or f77 for the compilation. When I tried to 
# use pgf77 or pgf90 for BLAS, VASP hang up when calling 
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0) 
# 2) more desirable: get an optimized BLAS 

# the two most reliable packages around are presently: 
# 2a) Intels own optimised BLAS (PIII, P4, PD, PC2, Itanium) 
# http://developer.intel.com/software/products/mkl/ 
# this is really excellent, if you use Intel CPU's 

# 2b) probably fastest SSE2 (4 GFlops on P4, 2.53 GHz, 16 GFlops PD, 
# around 30 GFlops on Quad core) 
# Kazushige Goto's BLAS 
# http://www.cs.utexas.edu/users/kgoto/signup_first.html 
# http://www.tacc.utexas.edu/resources/software/ 

#----------------------------------------------------------------------- 

# all CPP processed fortran files have the extension .f90 
SUFFIX=.f90 

#----------------------------------------------------------------------- 
# fortran compiler and linker 
#----------------------------------------------------------------------- 
FC=ifort -I$(MKL_ROOT)/include/fftw -I$(MKLROOT)/include/mic/lp64 -I$(MKLROOT)/include -mmic 
# fortran linker 
FCL=$(FC) -static 

#----------------------------------------------------------------------- 
# whereis CPP ?? (I need CPP, can't use gcc with proper options) 
# that's the location of gcc for SUSE 5.3 

# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C 

# that's probably the right line for some Red Hat distribution: 

# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C 

# SUSE X.X, maybe some Red Hat distributions: 

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX) 

# this release should be fpp clean 
# we now recommend fpp as preprocessor 
# if this fails go back to cpp 
#CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX) 

#----------------------------------------------------------------------- 
# possible options for CPP: 
# NGXhalf charge density reduced in X direction 
# wNGXhalf gamma point only reduced in X direction 
# avoidalloc avoid ALLOCATE if possible 
# PGF90 work around some for some PGF90 / IFC bugs 
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD 
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS) 
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS) 
# tbdyn MD package of Tomas Bucko 
#----------------------------------------------------------------------- 

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \ 
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGZhalf \ 
# -DRPROMU_DGEMV -DRACCMU_DGEMV 

#----------------------------------------------------------------------- 
# general fortran flags (there must a trailing blank on this line) 
# byterecl is strictly required for ifc, since otherwise 
# the WAVECAR file becomes huge 
#----------------------------------------------------------------------- 

FFLAGS = -FR -names lowercase -assume byterecl -I$(MKLROOT)/include 

#----------------------------------------------------------------------- 
# optimization 
# we have tested whether higher optimisation improves performance 
# -axK SSE1 optimization, but also generate code executable on all mach. 
# xK improves performance somewhat on XP, and a is required in order 
# to run the code on older Athlons as well 
# -xW SSE2 optimization 
# -axW SSE2 optimization, but also generate code executable on all mach. 
# -tpp6 P3 optimization 
# -tpp7 P4 optimization 
#----------------------------------------------------------------------- 

# ifc.9.1, ifc.10.1 recommended 
#OFLAG=-O2 -ip 
OFLAG= -xHOST -O3 -ip -static 
OFLAG_HIGH = $(OFLAG) 
OBJ_HIGH = 
OBJ_NOOPT = 
DEBUG = -FR -O0 
INLINE = $(OFLAG) 

#----------------------------------------------------------------------- 
# the following lines specify the position of BLAS and LAPACK 
# we recommend to use mkl, that is simple and most likely 
# fastest in Intel based machines 
#----------------------------------------------------------------------- 

# mkl path for ifc 11 compiler 
#MKL_PATH=$(MKLROOT)/lib/em64t 

# mkl path for ifc 12 compiler 
MKL_PATH=$(MKLROOT)/lib/intel64 

MKL_FFTW_PATH=$(MKLROOT)/interfaces/fftw3xf/ 

# BLAS 
# setting -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines usually speeds up program execution 
# BLAS= -Wl,--start-group $(MKL_PATH)/libmkl_intel_lp64.a $(MKL_PATH)/libmkl_intel_thread.a $(MKL_PATH)/libmkl_core.a -Wl,--end-group -lguide 
# faster linking and available from at least version 11 
#BLAS= -lguide -mkl 
#BLAS = /home/paulfons/VASP/src/GotoBlas2/libgoto2_nehalemp-r1.13.a 
BLAS = $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a $(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_cdft_core.a $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_sequential.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm 
# LAPACK, use vasp.5.lib/lapack_double 

#LAPACK= ../vasp.5.lib/lapack_double.o 

# LAPACK from mkl, usually faster and contains scaLAPACK as well 

#LAPACK= $(MKL_PATH)/libmkl_intel_lp64.a 

# here a tricky version, link in libgoto and use mkl as a backup 
# also needs a special line for LAPACK 
# this is the best thing you can do on AMD based systems !!!!!! 

#BLAS = -Wl,--start-group /opt/libs/libgoto/libgoto.so $(MKL_PATH)/libmkl_intel_thread.a $(MKL_PATH)/libmkl_core.a -Wl,--end-group -liomp5 
#LAPACK= /opt/libs/libgoto/libgoto.so $(MKL_PATH)/libmkl_intel_lp64.a 

#----------------------------------------------------------------------- 

LIB = -L../vasp.5.lib -ldmy \ 
../vasp.5.lib/linpack_double.o $(LAPACK) \ 
$(BLAS) 

# options for linking, nothing is required (usually) 
#LINK = -parallel 
LINK = 

#----------------------------------------------------------------------- 
# fft libraries: 
# VASP.5.2 can use fftw.3.1.X (http://www.fftw.org) 
# since this version is faster on P4 machines, we recommend to use it 
#----------------------------------------------------------------------- 

FFT3D = fft3dfurth.o fft3dlib.o 

# alternatively: fftw.3.1.X is slighly faster and should be used if available 
#FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a 

# you may also try to use the fftw wrapper to mkl (but the path might vary a lot) 
# it seems this is best for AMD based systems 
#FFT3D = fftw3d.o fft3dlib.o $(MKL_FFTW_PATH)/libfftw3xf_intel.a 
#INCS = -I$(MKLROOT)/include/fftw 

#=======================================================================
# MPI section, uncomment the following lines until 
# general rules and compile lines 
# presently we recommend OPENMPI, since it seems to offer better 
# performance than lam or mpich 

# !!! Please do not send me any queries on how to install MPI, I will 
# certainly not answer them !!!! 
#=======================================================================
#----------------------------------------------------------------------- 
# fortran linker for mpi 
#----------------------------------------------------------------------- 

#FC=mpif90 
FC=mpiifort 
FCL=$(FC) 

#----------------------------------------------------------------------- 
# additional options for CPP in parallel version (see also above): 
# NGZhalf charge density reduced in Z direction 
# wNGZhalf gamma point only reduced in Z direction 
# scaLAPACK use scaLAPACK (recommended if mkl is available) 
# avoidalloc avoid ALLOCATE if possible 
# PGF90 work around some for some PGF90 / IFC bugs 
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD 
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS) 
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS) 
# tbdyn MD package of Tomas Bucko 
#----------------------------------------------------------------------- 

#----------------------------------------------------------------------- 

#CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \ 
# -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \ 
# -DMPI_BLOCK=262144 -Duse_collective -DscaLAPACK \ 
# -DRPROMU_DGEMV -DRACCMU_DGEMV 
#CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \ 
# -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \ 
# -DMPI_BLOCK=262144 -Duse_collective -DscaLAPACK \ 
# -DRPROMU_DGEMV -DRACCMU_DGEMV 
CPP = $(CPP_) -DMPI -DHOST=\"SiriusMKL_ifort13\" -DIFC \ 
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DwNGZhalf -DNGZhalf \ 
-DMPI_BLOCK=8000 -Duse_collective -DscaLAPACK -Dtbdyn \ 
-DRPROMU_DGEMV -DRACCMU_DGEMV 

# -DMPI_BLOCK=8000 -Duse_collective -DscaLAPACK 
#----------------------------------------------------------------------- 
# location of SCALAPACK 
# if you do not use SCALAPACK simply leave this section commented out 
#----------------------------------------------------------------------- 

# usually simplest link in mkl scaLAPACK 
#BLACS= -lmkl_blacs_openmpi_lp64 
#SCA= $(MKL_PATH)/libmkl_scalapack_lp64.a $(BLACS) 
#SCA= -lmkl_scalapack_lp64 -lmkl_core0 
#----------------------------------------------------------------------- 
# libraries for mpi? 
#----------------------------------------------------------------------- 

LIB = -L../vasp.5.lib -ldmy \ 
../vasp.5.lib/linpack_double.o \ 
$(SCA) $(LAPACK) $(BLAS) -L/opt/intel/composer_xe_2013/mkl/lib/intel64/ 

#----------------------------------------------------------------------- 
# parallel FFT 
#----------------------------------------------------------------------- 

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller 
#FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o 

# alternatively: fftw.3.1.X is slighly faster and should be used if available 
#FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/local/fftw3/lib/libfftw3.a 

# you may also try to use the fftw wrapper to mkl (but the path might vary a lot) 
# it seems this is best for AMD based systems 
FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/intel/composer_xe_2013/mkl/interfaces/fftw3xf/libfftw3xf_intel.a 
#INCS = -I$(MKLROOT)/include/fftw 

#----------------------------------------------------------------------- 
# general rules and compile lines 
#----------------------------------------------------------------------- 
BASIC= symmetry.o symlib.o lattlib.o random.o 

SOURCE= base.o mpi.o smart_allocate.o xml.o \ 
constant.o jacobi.o main_mpi.o scala.o \ 
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \ 
radial.o pseudo.o gridq.o ebs.o \ 
mkpoints.o wave.o wave_mpi.o wave_high.o spinsym.o \ 
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \ 
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \ 
constrmag.o cl_shift.o relativistic.o LDApU.o \ 
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o hyperfine.o paw.o \ 
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \ 
dos.o elf.o tet.o tetweight.o hamil_rot.o \ 
chain.o dyna.o k-proj.o sphpro.o us.o core_rel.o \ 
aedens.o wavpre.o wavpre_noio.o broyden.o \ 
dynbr.o hamil_high.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \ 
brent.o stufak.o fileio.o opergrid.o stepver.o \ 
chgloc.o fast_aug.o fock_multipole.o fock.o mkpoints_change.o sym_grad.o \ 
mymath.o internals.o npt_dynamics.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \ 
nmr.o pead.o subrot.o subrot_scf.o \ 
force.o pwlhf.o gw_model.o optreal.o steep.o davidson.o david_inner.o \ 
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \ 
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \ 
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \ 
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \ 
linear_optics.o \ 
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \ 
mlwf.o ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o \ 
local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o \ 
bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \ 
rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o linear_response.o 

vasp: $(SOURCE) $(FFT3D) $(INC) main.o 
rm -f vasp 
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK) 
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC) 
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB) 
zgemmtest: zgemmtest.o base.o random.o $(INC) 
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB) 
dgemmtest: dgemmtest.o base.o random.o $(INC) 
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB) 
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC) 
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB) 
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC) 
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB) 

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F 

main.o: main$(SUFFIX) 
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX) 
xcgrad.o: xcgrad$(SUFFIX) 
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX) 
xcspin.o: xcspin$(SUFFIX) 
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX) 

makeparam.o: makeparam$(SUFFIX) 
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX) 

makeparam$(SUFFIX): makeparam.F main.F 

# MIND: I do not have a full dependency list for the include 
# and MODULES: here are only the minimal basic dependencies 
# if one strucuture is changed then touch_dep must be called 
# with the corresponding name of the structure 

base.o: base.inc base.F 
mgrid.o: mgrid.inc mgrid.F 
constant.o: constant.inc constant.F 
lattice.o: lattice.inc lattice.F 
setex.o: setexm.inc setex.F 
pseudo.o: pseudo.inc pseudo.F 
mkpoints.o: mkpoints.inc mkpoints.F 
wave.o: wave.F 
nonl.o: nonl.inc nonl.F 
nonlr.o: nonlr.inc nonlr.F 

$(OBJ_HIGH): 
$(CPP) 
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX) 
$(OBJ_NOOPT): 
$(CPP) 
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX) 

fft3dlib_f77.o: fft3dlib_f77.F 
$(CPP) 
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX) 

.F.o: 
$(CPP) 
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX) 
.F$(SUFFIX): 
$(CPP) 
$(SUFFIX).o: 
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX) 

# special rules 
#----------------------------------------------------------------------- 
# these special rules have been tested for ifc.11 and ifc.12 only 

fft3dlib.o : fft3dlib.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 
fft3dfurth.o : fft3dfurth.F 
$(CPP) 
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX) 
fftw3d.o : fftw3d.F 
$(CPP) 
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX) 
fftmpi.o : fftmpi.F 
$(CPP) 
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX) 
fftmpiw.o : fftmpiw.F 
$(CPP) 
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX) 
wave_high.o : wave_high.F 
$(CPP) 
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX) 
# the following rules are probably no longer required (-O3 seems to work) 
wave.o : wave.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 
paw.o : paw.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 
cl_shift.o : cl_shift.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 
us.o : us.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 
LDApU.o : LDApU.F 
$(CPP) 
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX) 

James Tullos (Intel)

Hi Paul,

Which version of the Intel® MPI Library are you using?  If you are not already on 4.1.0.030, please try that version.

Also, for future reference, please attach large files rather than pasting the text directly.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Ariel B.

Any news on this?

James Tullos (Intel)

My earlier recommendation needs a small update, since we have released three updates since then.  Are you encountering the same problem?

I got the same problem with VASP compiled with Intel MPI 4.1.3.045.

Hi,

I also faced the same problem.

You have to export this variable before submitting the job. It is not only VASP; SIESTA gives the same error.

export I_MPI_COMPATIBILITY=4

For me, this resolved the error.

Thanks,

Vijay Amirtharaj A
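
The workaround above can be scoped to a single job instead of the whole shell session. This is only a sketch under assumptions: run_compat is a hypothetical helper name, the value 4 is simply the one reported to work in this thread, and the mpirun line in the comment is the command from the original post. To my understanding the setting asks Intel MPI to behave like the 4.0.x series, so it probably bypasses the aliasing check rather than fixing the calling code.

```shell
#!/bin/sh
# run_compat CMD [ARGS...]
# Runs CMD with I_MPI_COMPATIBILITY=4 set for that command only,
# leaving the calling shell's environment unchanged.
# run_compat is a hypothetical helper name, not part of Intel MPI.
run_compat() {
    env I_MPI_COMPATIBILITY=4 "$@"
}

# Quick check that the variable reaches the child process.
run_compat sh -c 'echo "I_MPI_COMPATIBILITY=$I_MPI_COMPATIBILITY"'

# Intended use for the job from the original post:
#   run_compat mpirun -np 16 vasp_gamma
```

Setting the variable per command makes it easier to compare a run with and without the compatibility mode.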
