ILP64 model: using MPI_IN_PLACE in MPI_REDUCE seems to yield wrong results

Stefan K. wrote:

Hi,

I am using the ifort compiler v. 13.0.1 20121010 together with Intel MPI v. 4.1.0.024 on an x86_64 Linux cluster. Using 64-bit integers as the default (ILP64 model) in my little Fortran program, I obtain wrong results when I use MPI_IN_PLACE in MPI_REDUCE calls (both for integer and real(8)).

My code is as follows:

program test
  include "mpif.h"
! use mpi
  integer :: iraboof
  integer :: mytid, numnod, ierr
  real(8) :: rraboof
  mytid = 0
  ! initialize MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)
  iraboof = 1
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, iraboof, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(iraboof, 0, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', iraboof, numnod
  end if
  rraboof = 1.0d0
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, rraboof, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(rraboof, 0, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', rraboof, numnod
  end if
  call mpi_finalize(ierr)
end program
 

Compilation is done with:

mpiifort -O3 -i8 impi.F90

It compiles and links fine:

ldd ./a.out
linux-vdso.so.1 => (0x00007ffff7893000)
 libdl.so.2 => /lib64/libdl.so.2 (0x0000003357c00000)
 libmpi_ilp64.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi_ilp64.so.4 (0x00002ad1a4a3f000)
 libmpi.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi.so.4 (0x00002ad1a4c69000)
 libmpigf.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpigf.so.4 (0x00002ad1a528e000)
 librt.so.1 => /lib64/librt.so.1 (0x0000003358800000)
 libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003358000000)
 libm.so.6 => /lib64/libm.so.6 (0x0000003357800000)
 libc.so.6 => /lib64/libc.so.6 (0x0000003357400000)
 libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003359c00000)
 /lib64/ld-linux-x86-64.so.2 (0x0000003357000000)

Running the program, however, I obtain:

mpirun -np 4 ./a.out 
 raboof mpi reduce 3 4
 raboof mpi reduce 3.00000000000000 4

whereas it should produce:

mpirun -np 4 ./a.out 
raboof mpi reduce 4 4
raboof mpi reduce 4.00000000000000 4

which is what I also obtain with other MPI libraries.

I would appreciate any comment/help. 

with best regards,

stefan

P.S.: When I use the F90 interface ("use mpi") I obtain the following warnings at compile time:

mpiifort -O3 -i8 impi.F90 
impi.F90(9): warning #6075: The data type of the actual argument does not match the definition. [IERR]
 call mpi_init(ierr)
-----------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [MYTID]
 call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [IERR]
 call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [NUMNOD]
 call mpi_comm_size(mpi_comm_world, numnod,ierr)
--------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [IERR]
 call mpi_comm_size(mpi_comm_world, numnod,ierr)
---------------------------------------------^

and a crash at runtime:

mpirun -np 4 ./a.out 
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE

Tim Prince wrote:

Your ldd result, showing that you linked against the gfortran-compatible library, looks like a problem.  This shouldn't happen if you use mpiifort consistently; the gfortran and ifort libraries can't coexist.  Adding -# to the mpiifort command should show in much more detail what the compile script passes on to ld.

Stefan K. wrote:

Dear Tim,

thanks for your immediate reply. Please find below the output from compiling my program (the one above, in the file impi.F90) with your suggested flag:

mpiifort -i8 -# impi.F90

This compilation yields:

mpiifort -i8 -# impi.F90 
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fpp 
 -D__INTEL_COMPILER=1300 
 -D__unix__ 
 -D__unix 
 -D__linux__ 
 -D__linux 
 -D__gnu_linux__ 
 -Dunix 
 -Dlinux 
 -D__ELF__ 
 -D__x86_64 
 -D__x86_64__ 
 -D_MT 
 -D__INTEL_COMPILER_BUILD_DATE=20121010 
 -D__INTEL_OFFLOAD 
 -D__i686 
 -D__i686__ 
 -D__pentiumpro 
 -D__pentiumpro__ 
 -D__pentium4 
 -D__pentium4__ 
 -D__tune_pentium4__ 
 -D__SSE2__ 
 -D__SSE__ 
 -D__MMX__ 
 -I. 
 -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include 
 -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include 
 -I/global/apps/intel/2013.1/mkl/include 
 -I/global/apps/intel/2013.1/tbb/include 
 -I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 
 -I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include 
 -I/usr/local/include 
 -I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include 
 -I/usr/include 
 -4Ycpp 
 -4Ncvf 
 -f_com=yes 
 impi.F90 
 /tmp/ifortBOT7lB.i90
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fortcom 
 -D__INTEL_COMPILER=1300 
 -D__unix__ 
 -D__unix 
 -D__linux__ 
 -D__linux 
 -D__gnu_linux__ 
 -Dunix 
 -Dlinux 
 -D__ELF__ 
 -D__x86_64 
 -D__x86_64__ 
 -D_MT 
 -D__INTEL_COMPILER_BUILD_DATE=20121010 
 -D__INTEL_OFFLOAD 
 -D__i686 
 -D__i686__ 
 -D__pentiumpro 
 -D__pentiumpro__ 
 -D__pentium4 
 -D__pentium4__ 
 -D__tune_pentium4__ 
 -D__SSE2__ 
 -D__SSE__ 
 -D__MMX__ 
 -mGLOB_pack_sort_init_list 
 -I. 
 -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include 
 -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include 
 -I/global/apps/intel/2013.1/mkl/include 
 -I/global/apps/intel/2013.1/tbb/include 
 -I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 
 -I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include 
 -I/usr/local/include 
 -I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include 
 -I/usr/include 
 "-integer_size 64" 
 -O2 
 -simd 
 -offload_host 
 -mP1OPT_version=13.0-intel64 
 -mGLOB_diag_file=/tmp/ifort7GVk2e.diag 
 -mGLOB_source_language=GLOB_SOURCE_LANGUAGE_F90 
 -mGLOB_tune_for_fort 
 -mGLOB_use_fort_dope_vector 
 -mP2OPT_static_promotion 
 -mP1OPT_print_version=FALSE 
 -mCG_use_gas_got_workaround=F 
 -mP2OPT_align_option_used=TRUE 
 -mGLOB_gcc_version=447 
 "-mGLOB_options_string=-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -ldl -i8 -# -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpi_ilp64 -lmpi -lmpigf -lmpigi -lrt -lpthread" 
 -mGLOB_cxx_limited_range=FALSE 
 -mCG_extend_parms=FALSE 
 -mGLOB_compiler_bin_directory=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64 
 -mGLOB_as_output_backup_file_name=/tmp/ifortK2gIZoas_.s 
 -mIPOPT_activate 
 -mIPOPT_lite 
 -mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 
 -mGLOB_product_id_code=0x22006d91 
 -mCG_bnl_movbe=T 
 -mGLOB_extended_instructions=0x8 
 -mP3OPT_use_mspp_call_convention 
 -mP2OPT_subs_out_of_bound=FALSE 
 -mGLOB_ansi_alias 
 -mPGOPTI_value_profile_use=T 
 -mP2OPT_il0_array_sections=TRUE 
 -mP2OPT_offload_unique_var_string=ifort607026576Zo54LN 
 -mP2OPT_hlo_level=2 
 -mP2OPT_hlo 
 -mP2OPT_hpo_rtt_control=0 
 -mIPOPT_args_in_regs=0 
 -mP2OPT_disam_assume_nonstd_intent_in=FALSE 
 -mGLOB_imf_mapping_library=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/libiml_attr.so 
 -mIPOPT_obj_output_file_name=/tmp/ifort7GVk2e.o 
 -mIPOPT_whole_archive_fixup_file_name=/tmp/ifortwarchNyvxkL 
 "-mGLOB_linker_version=2.20.51.0.2-5.36.el6 20100205" 
 -mGLOB_long_size_64 
 -mGLOB_routine_pointer_size_64 
 -mGLOB_driver_tempfile_name=/tmp/iforttempfilenQtt0t 
 -mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS 
 -mGLOB_async_unwind_tables=TRUE 
 -mGLOB_obj_output_file=/tmp/ifort7GVk2e.o 
 -mGLOB_source_dialect=GLOB_SOURCE_DIALECT_FORTRAN 
 -mP1OPT_source_file_name=impi.F90 
 -mP2OPT_symtab_type_copy=true 
 /tmp/ifortBOT7lB.i90
ld 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crti.o 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtbegin.o 
 --eh-frame-hdr 
 --build-id 
 -dynamic-linker 
 /lib64/ld-linux-x86-64.so.2 
 -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib 
 -o 
 a.out 
 /global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64/for_main.o 
 -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib 
 -L/global/apps/intel/2013.1/mkl/lib/intel64 
 -L/global/apps/intel/2013.1/tbb/lib/intel64 
 -L/global/apps/intel/2013.1/ipp/lib/intel64 
 -L/global/apps/intel/2013.1/composerxe/lib/intel64 
 -L/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/ 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/ 
 -L/lib/../lib64 
 -L/lib/../lib64/ 
 -L/usr/lib/../lib64 
 -L/usr/lib/../lib64/ 
 -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/ 
 -L/global/apps/intel/2013.1/mkl/lib/intel64/ 
 -L/global/apps/intel/2013.1/tbb/lib/intel64/ 
 -L/global/apps/intel/2013.1/ipp/lib/intel64/ 
 -L/global/apps/intel/2013.1/composerxe/lib/intel64/ 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../ 
 -L/lib64 
 -L/lib/ 
 -L/usr/lib64 
 -L/usr/lib 
 -ldl 
 /tmp/ifort7GVk2e.o 
 --enable-new-dtags 
 -rpath 
 /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib 
 -rpath 
 /opt/intel/mpi-rt/4.1 
 -lmpi_ilp64 
 -lmpi 
 -lmpigf 
 -lmpigi 
 -lrt 
 -lpthread 
 -Bstatic 
 -lifport 
 -lifcore 
 -limf 
 -lsvml 
 -Bdynamic 
 -lm 
 -Bstatic 
 -lipgo 
 -lirc 
 -Bdynamic 
 -lpthread 
 -Bstatic 
 -lsvml 
 -Bdynamic 
 -lc 
 -lgcc 
 -lgcc_s 
 -Bstatic 
 -lirc_s 
 -Bdynamic 
 -ldl 
 -lc 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtend.o 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crtn.o
rm /tmp/ifortlibgccyi9h59
rm /tmp/ifortgnudirs06mNow
rm /tmp/ifort7GVk2e.o
rm /tmp/ifortBOT7lB.i90
rm /tmp/ifortakfVFX.c
rm /tmp/ifortdashvdk0IZj
rm /tmp/ifortargC1wikG
rm /tmp/ifortgas65oTE2
rm /tmp/ifortK2gIZoas_.s
rm /tmp/ifortldashv7B4mF7
rm /tmp/iforttempfilenQtt0t
rm /tmp/ifortargvFMClQ
rm /tmp/ifortgnudirsMR2abY
rm /tmp/ifortgnudirsHeROwk
rm /tmp/ifortgnudirsDsnJSG
rm /tmp/ifortldashvJ79Ve3
rm /tmp/ifortgnudirsXiurBp
rm /tmp/ifortgnudirsp3WeYL
rm /tmp/ifortgnudirsmUDkl8
rm /tmp/ifort7GVk2e.o

James Tullos (Intel) wrote:

Hi Stefan,

The problem is not related to gfortran.  The libmpigf.so library is used for both gfortran and the Intel® MPI Library.  I am able to reproduce the same behavior here.  I'll check with the developers, but I expect that MPI_IN_PLACE may not be correctly handled in ILP64.

As a note, the MPI Fortran module ("use mpi") is not supported for ILP64 programming in the Intel® MPI Library.  Please see Section 3.5.6 of the Intel® MPI Library Reference Manual for more information on ILP64 support.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan K. wrote:

Hi James,

thanks for your detailed answer. I am looking forward to hearing the feedback from the developers. A piece of MPI-parallelized code similar to the one above constitutes a central part of a core functionality of a quantum chemistry program package (called "Dirac") to which I am a contributing developer. It would be great to know that one of the next Intel MPI releases could then fully support the ILP64 model.

with best regards,

stefan

James Tullos (Intel) wrote:

Hi Stefan,

Try compiling and running with -ilp64.

mpiifort -ilp64 -O3 test.f90 -o test

mpirun -ilp64 -n 4 ./test

This works for me.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan K. wrote:

Hi James,

indeed, MPI_REDUCE with MPI_IN_PLACE now also works for me with that setup. However, MPI_COMM_SIZE no longer works:

program test
  include "mpif.h"
  integer :: mytid, numnod, ierr
  mytid = 0
  ! initialize MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)
  print *, 'mytid, numnod ', mytid, numnod
  call mpi_finalize(ierr)
end program

Compiling and running the above test program with 

mpiifort -ilp64 -O3 test.F90 
mpirun -ilp64 -np 4 ./a.out 
 mytid, numnod 1 0
 mytid, numnod 0 0
 mytid, numnod 2 0
 mytid, numnod 3 0

yields a "0" for the size of the communicator MPI_COMM_WORLD. 

Any idea what could be wrong?

with best regards,

stefan

James Tullos (Intel) wrote:

Hi Stefan,

So I see.  I am able to get the correct results by compiling and linking with -ilp64, but without -i8, and changing the declaration of numnod to integer*8.  Let me check with the developers and see what we can do about this.
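For reference, a minimal sketch of what I tested (only the declaration of numnod differs from your program; I have not verified this beyond the toy example):

program test
  include "mpif.h"
  integer :: mytid, ierr   ! default 32-bit integers, since -i8 is not used
  integer*8 :: numnod      ! 64-bit, matching the 8-byte value libmpi_ilp64 writes
  mytid = 0
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)
  print *, 'mytid, numnod ', mytid, numnod
  call mpi_finalize(ierr)
end program

compiled with mpiifort -ilp64 -O3 test.F90 and run with mpirun -ilp64 -np 4 ./a.out as before.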

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan K. wrote:

Hi James,

thanks for your feedback; I now get exactly the same result as you described above. What I should perhaps emphasize is that I was aiming at a working compilation with 64-bit integers as the default size (-i8 or -integer-size 64), which implies the ILP64 model as far as I can see.

What exactly does the -ilp64 flag do during compilation? Obviously it does not imply 64-bit default integers in the Fortran code as such. Does it only enable linking against the ILP64 Intel libraries?

with best regards,

stefan 

James Tullos (Intel) wrote:

Hi Stefan,

Using -ilp64 links to libmpi_ilp64 instead of libmpi.  The correct way to utilize this is to compile with -i8, then link and run with -ilp64.  However, this is not giving correct results either.
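Spelled out, that intended recipe would look roughly like this (a sketch of the intended usage; as said, it does not give correct results at the moment):

mpiifort -i8 -c impi.F90          # compile with 64-bit default integers
mpiifort -ilp64 impi.o -o a.out   # link against libmpi_ilp64
mpirun -ilp64 -np 4 ./a.out       # run with the ILP64 option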

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan K. wrote:

Hi James,

thanks for the clarification and your patience. Let's see what the developers can come up with.

with best regards,

stefan

James Tullos (Intel) wrote:

Hi Stefan,

There are two workarounds for this.  The first is to not use MPI_IN_PLACE in a program compiled with -i8.  The second is to modify mpif.h.  Change

       INTEGER MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED

to

       INTEGER*4 MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED

Presumably the library identifies MPI_IN_PLACE by the address of this variable, and compiling with -i8 changes its size (and hence its layout within the common block) away from what the library was built with; declaring it INTEGER*4 restores the expected layout.  This works for your test program.  Try it on your code as well.
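For the first workaround, the root would reduce into a separate variable instead of using MPI_IN_PLACE. A minimal sketch based on your integer example (isum is a helper variable I am introducing here for illustration):

program test
  include "mpif.h"
  integer :: iraboof, isum
  integer :: mytid, numnod, ierr
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)
  iraboof = 1
  ! every rank, the root included, passes iraboof as the send buffer
  call mpi_reduce(iraboof, isum, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  if (mytid == 0) then
    iraboof = isum   ! copy the result back, emulating the in-place update
    print *, 'raboof mpi reduce', iraboof, numnod
  end if
  call mpi_finalize(ierr)
end program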

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
