INTEL-MPI-5.0: Bug in MPI-3 shared-memory allocation (MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY)

INTEL-MPI-5.0: Bug in MPI-3 shared-memory allocation (MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY)

Dear developers of Intel-MPI,

First of all:   Congratulations, that INTEL-MPI now supports also MPI-3 !

However, I found a bug  in INTEL-MPI-5.0 when running the MPI-3 shared memory feature (calling MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY) on a Linux Cluster (NEC Nehalem)  by a  Fortran95 CFD-code.

I isolated the problem into a small Ftn95 example program, which allocates shared an integer*4-array of array dimension N , then uses it by the MPI-processes (on the same node), and then repeats the same for the next shared allocation. So, the number of shared windows do accumulate in the run, because I do not free the shared windows allocated so far. This allocation of shared windows works, but only until the total number of allocated memory exceeds a limit of ~30 millions of Integer*4 numbers (~120 MB).

When that limit is reached, the next call of MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY  to allocated one more shared window do not give an error message, but the 1st attempt to use that allocated shared array results in a bus error (because the shared array has not been allocated correctly).

 

The problem is independent of the number of MPI-processes started by mpirun on the node (I used only 1 node)

   Example:     N=      100 000   à  bus error occurred at iwin=288   (i.e. the allocation of the 288-th shared window had failed)

                         N=   1 000 000   à  bus error occurred at iwin=  30

                         N=   5 000 000   à  bus error occurred at iwin=    6

                         N= 28 000 000   à  bus error occurred at iwin=    2

                         N= 30 000 000   à  bus error occurred at iwin=    1   (i.e. already the 1st allocation failed)

 

The node on the cluster has 8 Nehalem cores, and had a free memory of 10 GB, and I was the only user on it. I used the INTEL-13 and also the INTEL-14 compiler for compiling the example program.

       mpiifort -O0 -debug -traceback -check -fpe0 sharedmemtest.f90

       mpirun -binding -prepend-rank -ordered-output -np 4 ./a.out

If it is helpful for you, I could send you the source code of the program.

It seems to me, that there is an internal storage limitation in the implementation of the MPI-3 shared memory feature in INTEL-MPI-5.0 . I cannot use INTEL-MPI in my real CFD-code with that limitation, because in case of very large grids the total storage allocated simultaneously by the shared windows can exceed 10 GB.

Greetings to you all

 Michael R.

 

 

 

 

 

 

 

 

publicaciones de 2 / 0 nuevos
Último envío
Para obtener más información sobre las optimizaciones del compilador, consulte el aviso sobre la optimización.

This is a known bug at this time, and we are working to correct it.

Deje un comentario

Por favor inicie sesión para agregar un comentario. ¿No es socio? Únase ya