INTEL-MPI-5.0: Bug in MPI-3 shared-memory allocation (MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY)

Dear developers of Intel-MPI,

First of all: congratulations that INTEL-MPI now also supports MPI-3!

However, I found a bug in INTEL-MPI-5.0 when using the MPI-3 shared-memory feature (calling MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY) on a Linux cluster (NEC Nehalem) with a Fortran 95 CFD code.

I isolated the problem into a small Fortran 95 example program, which allocates a shared integer*4 array of dimension N, uses it from the MPI processes (on the same node), and then repeats the same for the next shared allocation. The number of shared windows therefore accumulates during the run, because I do not free the windows allocated so far. This allocation of shared windows works, but only until the total allocated memory exceeds a limit of about 30 million integer*4 values (~120 MB).

When that limit is reached, the next call of MPI_WIN_ALLOCATE_SHARED / MPI_WIN_SHARED_QUERY to allocate one more shared window does not return an error, but the first attempt to use that shared array results in a bus error (because the shared array has not been allocated correctly).
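In essence, the test program does the following (this is only a rough sketch, not my actual sharedmemtest.f90; it assumes the TYPE(C_PTR) overloads from the mpi module, lets rank 0 contribute the whole segment while the other ranks attach to it via MPI_WIN_SHARED_QUERY, and uses placeholder values for the loop bound nwin_max and the array dimension n):

    program sharedmemtest_sketch
      use mpi
      use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
      implicit none
      integer, parameter :: nwin_max = 1000       ! placeholder upper bound on windows
      integer, parameter :: n = 1000000           ! placeholder array dimension per window
      integer             :: win(nwin_max)        ! window handles, deliberately never freed
      integer(kind=MPI_ADDRESS_KIND) :: winsize
      integer             :: disp_unit, ierr, rank, iwin
      type(c_ptr)         :: baseptr
      integer(4), pointer :: a(:)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      do iwin = 1, nwin_max
         ! rank 0 contributes the whole segment, the other ranks a zero-size part
         winsize = 0
         if (rank == 0) winsize = 4_MPI_ADDRESS_KIND * n
         call MPI_Win_allocate_shared(winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, &
                                      baseptr, win(iwin), ierr)
         ! every rank queries the start address of rank 0's segment
         call MPI_Win_shared_query(win(iwin), 0, winsize, disp_unit, baseptr, ierr)
         call c_f_pointer(baseptr, a, [n])
         ! first touch of the new window -- this is where the bus error occurs
         if (rank == 0) a(1:n) = iwin
         call MPI_Barrier(MPI_COMM_WORLD, ierr)
         if (rank == 0) print *, 'shared window', iwin, 'usable, a(n) =', a(n)
         ! the windows are intentionally not freed, so the shared memory accumulates
      end do

      call MPI_Finalize(ierr)
    end program sharedmemtest_sketch

With this structure the bus error appears at the first write into the window whose allocation pushed the accumulated shared memory past the limit described above.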

 

The problem is independent of the number of MPI processes started by mpirun on the node (I used only one node).

   Example:   N =    100 000   →   bus error occurred at iwin = 288   (i.e. the allocation of the 288th shared window had failed)
              N =  1 000 000   →   bus error occurred at iwin =  30
              N =  5 000 000   →   bus error occurred at iwin =   6
              N = 28 000 000   →   bus error occurred at iwin =   2
              N = 30 000 000   →   bus error occurred at iwin =   1   (i.e. already the 1st allocation failed)

 

The node on the cluster has 8 Nehalem cores and had 10 GB of free memory, and I was the only user on it. I used both the Intel 13 and the Intel 14 compiler to compile the example program:

       mpiifort -O0 -debug -traceback -check -fpe0 sharedmemtest.f90

       mpirun -binding -prepend-rank -ordered-output -np 4 ./a.out

If it is helpful for you, I could send you the source code of the program.

It seems to me that there is an internal storage limitation in the implementation of the MPI-3 shared-memory feature in INTEL-MPI-5.0. I cannot use INTEL-MPI in my real CFD code with that limitation, because for very large grids the total storage allocated simultaneously by the shared windows can exceed 10 GB.

Greetings to you all

Michael R.

This is a known bug at this time, and we are working to correct it.
