MPI_Allreduce strange result

Posted by FortCpp

Hi users and developers,

I am getting a strange result from the MPI_Allreduce subroutine in Fortran. Here is the situation:

ALLOCATE(mesh%idx%lxyz_inv(nr(1, 1):nr(2, 1), nr(1, 2):nr(2, 2), nr(1, 3):nr(2, 3)))
mesh%idx%lxyz_inv(:,:,:) = 0
! In a subroutine, the array is first allocated and initialized. nr(1, ?):nr(2, ?) are all -36:36 for the test run.
!...
!...
npoints = product(nr(2, 1:3) - nr(1, 1:3) + 1)
call MPI_Allreduce(MPI_IN_PLACE, mesh%idx%lxyz_inv(nr(1, 1), nr(1, 2), nr(1, 3)), npoints, MPI_INTEGER, MPI_BOR, mpi_world%comm, mpi_err)

Something went wrong here at run time (I just ran it by typing a.exe, not as a parallel run). Before the MPI_Allreduce call, the first 48 elements of mesh%idx%lxyz_inv were all zero. After the call, the first 48 elements of mesh%idx%lxyz_inv became:

0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 1140850688 1 0
0 0 0 0 0 32768
64 4194304 64 896 384 0
-2 37010544 0 0 0 0
0 0 0 0 1078984704 0
0 1 0 1 0 3899248

I noticed that the first strange number, 1140850688, is the value of MPI_IN_PLACE defined in mpif.h. Other than that, I have no clue what is going on.

I cannot reduce it to a small test program, because when I test MPI_Allreduce with an array of the same size in a simple code, it works fine.
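A standalone test of the kind described above might look like the following sketch. The program name, the variable names, and the use of MPI_COMM_WORLD are illustrative assumptions and are not taken from the actual project; only the array bounds (-36:36 in each dimension), the zero initialization, and the in-place MPI_BOR reduction follow the description.

```fortran
program allreduce_in_place_test
  ! Sketch only: names and structure are hypothetical, not from the project.
  implicit none
  include 'mpif.h'

  integer, parameter :: lo = -36, hi = 36
  integer, allocatable :: lxyz_inv(:, :, :)
  integer :: npoints, rank, mpi_err

  call MPI_Init(mpi_err)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpi_err)

  ! Same shape and initial contents as in the failing run.
  allocate(lxyz_inv(lo:hi, lo:hi, lo:hi))
  lxyz_inv(:, :, :) = 0
  npoints = size(lxyz_inv)

  ! In-place bitwise OR across all ranks. Every input is zero,
  ! so every element of the result must also be zero.
  call MPI_Allreduce(MPI_IN_PLACE, lxyz_inv, npoints, MPI_INTEGER, &
                     MPI_BOR, MPI_COMM_WORLD, mpi_err)

  if (any(lxyz_inv /= 0)) then
    print *, 'rank', rank, ': unexpected nonzero entries:', count(lxyz_inv /= 0)
  else
    print *, 'rank', rank, ': all entries are zero, as expected'
  end if

  deallocate(lxyz_inv)
  call MPI_Finalize(mpi_err)
end program allreduce_in_place_test
```

Because a bitwise OR of zeros is zero, any nonzero entry after the call would show that something other than the receive buffer was used as input.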

I am using Intel Cluster Studio 13 with everything updated, and Intel MPI. I am trying to build a 64-bit program, so I used the headers and libraries in $(I_MPI_ROOT)em64t. I don't know whether that is the cause of this strange result. Compilation is fine, but linking produces many warnings: "warning LNK4049: locally defined symbol mpifcmb5_ imported", "warning LNK4049: locally defined symbol mpipriv1_ imported", "warning LNK4049: locally defined symbol mpipriv2_ imported" and "warning LNK4049: locally defined symbol mpiprivc_ imported". The OS is Windows 7 64-bit.

Any comment will be appreciated! I can attach my project if it is necessary. Thanks.

Posted by James Tullos (Intel)

Hi,

You should not be seeing the linker errors.  The easiest way to solve this is if you can send the project.  Either attach it to a post, or if you'd prefer, you can send it directly to me.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Posted by FortCpp

James,

Thanks for your response. I sent you a message with my solution attached.

Posted by James Tullos (Intel)

Hi,

I've got the Visual Studio* solution you sent.  What version of Visual Studio* are you using?  I'm using 2010, and thus far I'm up to 2 hours waiting for the solution to load.  If you have the Intel® Trace Analyzer and Collector, try linking with VTmc.lib.  This is the Correctness Checker library, and will verify that the MPI calls are correct.  It will slow the application significantly, so I don't recommend using it outside of development.  I'll try this myself once I can get the solution loaded.  Have you tried this on Linux*?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Posted by FortCpp

Hi James,

I am using the same version, MSVS 2010. There seems to be a small problem somewhere in IVF. I posted about it here: http://software.intel.com/en-us/forums/topic/368495

There is a quick fix: disabling the Database in IVF. After that, everything should be fine. Could you please try again after disabling the Database?

I have never used Trace Analyzer. I'll learn to use it. I'm not sure how far I can get, so I am still hoping that you can help me out.

Thanks,

Yonghui

Posted by James Tullos (Intel)

Hi Yonghui,

Thank you for the information, that does help load it more quickly.  I'll try linking with the Correctness Checker and see if I can find anything.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Posted by FortCpp

Hi James,

Did you find out what the problem in the code is? I tried removing it from the source and testing with a single CPU, and that turned out to be OK.

Other than that, I haven't discovered anything new.

Thanks,

Yonghui

Posted by James Tullos (Intel)

Hi Yonghui,

I have not been able to find anything yet.  I'm going to try to get a smaller reproducer together and have our developers look at it.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Posted by FortCpp

OK. Thanks a lot.

I am looking forward to the result.

Yonghui

Posted by James Tullos (Intel)

Hi Yonghui,

The developers have found and corrected the problem.  The fix will be available in our next release.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Posted by FortCpp

Great! Was it a problem in the Fortran wrapper or in the MPI library? Just curious.

Yonghui

Posted by James Tullos (Intel)

Hi Yonghui,

The problem was due to an incorrectly exported symbol from MPI.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
