MPI Internal Error: invalid error code 489e0e (Ring ids do not match)

kellerd@lle.rochester.edu

I have an MPI code that works fine on my Windows machine (VS2010). It has one master process that has called MPI_COMM_ACCEPT to accept a connection from another job that is running two MPI processes. This setup also works when the master process runs on my Intel cluster node, as long as the job it has accepted has only one process. But when I try two, I get this message:

 

Internal Error: invalid error code 489e0e (Ring ids do not match) in MPIR_Barrier_impl:712
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(949).....: MPI_Barrier(comm=0x84000000) failed
MPIR_Barrier_impl(720): Failure during collective
MPIR_Barrier_impl(712):

I note that there have been some complaints about 'Ring ids do not match' for the latest MPICH2 release.

Any help would be appreciated.

 

I am running the Intel 13 level of software. It is a Fortran code.

MPI_INCLUDE=/opt/lic/intel13/impi/4.1.0.024/include64
LIBRARY_PATH=/opt/lic/intel13/impi/4.1.0.024/lib64:/opt/lic/intel13/composer_xe_2013.1.117/tbb/lib/intel64:/opt/lic/intel13/composer_xe_2013.1.117/mkl/lib/intel64:/opt/lic/intel13/composer_xe_2013.1.117/ipp/lib/intel64:/opt/lic/intel13/composer_xe_2013.1.117/compiler/lib/intel64
ifort version 13.0.1

 

I have since also tried MPICH-3.0.2, with the same results.

Any ideas out there?

Dave

 

 

 

James Tullos (Intel)

Hi Dave,

I don't think this will solve the problem, but try running both jobs with I_MPI_ADJUST_BARRIER=1. It is possible that the two jobs are selecting different barrier algorithms.
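As a rough sketch of what that could look like, assuming a bash-like shell on the cluster node: the variable has to be set in the environment of both jobs before they are launched. The executable names `./server` and `./client` and the process counts in the commented launch lines are placeholders for illustration, not taken from the original post.

```shell
# Pin the barrier algorithm so both jobs use the same one.
# This must be set in the environment of BOTH jobs.
export I_MPI_ADJUST_BARRIER=1

# Hypothetical launch lines (placeholder executable names):
#   mpirun -n 1 ./server    # the job that calls MPI_COMM_ACCEPT
#   mpirun -n 2 ./client    # the two-process job that connects

# Confirm the setting is visible to processes started from this shell.
echo "I_MPI_ADJUST_BARRIER=${I_MPI_ADJUST_BARRIER}"
```

If the two jobs are started from different shells or batch scripts, the export needs to appear in each of them.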

Do you have a reproducer you can share?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
