MPI_SCATTERV broken on upgrade?

chasmotron

I bought and installed Cluster Studio for Windows back in early 2012, and I am just now compiling our MPI Fortran code in parallel on Windows. I had it compiled and running, and I successfully ran hundreds of check-out problems.

I wanted to see how easy it would be to allow our users to run in parallel on Windows. I downloaded the MPI runtime, installed it on another machine, brought my executable over, and it crashed on a SCATTERV call: "memcpy arguments alias each other".

I thought, "Oh, I need to update my Cluster Studio to match this MPI runtime I just installed." So I updated my Cluster Studio, recompiled the exact same code that had been running, and it now crashes with the same SCATTERV complaint. I tracked it down to a SCATTERV call that uses the same sendbuf and recvbuf, and altered it to use MPI_IN_PLACE on the root processor. (This already makes me nervous for portability, because not all MPI libraries support this yet.)
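For readers hitting the same error, a minimal sketch of this workaround looks like the following. The variable names (buf, counts, displs, nloc, root, myrank, ierr) are illustrative placeholders, not the original code:

```fortran
! On the root rank, pass MPI_IN_PLACE as the receive buffer so that the
! send and receive buffers no longer alias each other. The recvcount and
! recvtype arguments are ignored on the root in this case.
if (myrank == root) then
   call MPI_SCATTERV(buf, counts, displs, MPI_DOUBLE_PRECISION, &
                     MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION,     &
                     root, MPI_COMM_WORLD, ierr)
else
   ! On non-root ranks the send arguments are ignored, so passing the
   ! same array is harmless here.
   call MPI_SCATTERV(buf, counts, displs, MPI_DOUBLE_PRECISION, &
                     buf, nloc, MPI_DOUBLE_PRECISION,           &
                     root, MPI_COMM_WORLD, ierr)
end if
```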

Now it crashes on a later SCATTERV, complaining that one of the sendcounts is zero. I have no idea why this is a problem.
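For reference, the MPI standard does permit zero entries in sendcounts; the corresponding rank simply receives no data. A hedged sketch of such a call on 4 ranks (all names illustrative):

```fortran
! counts(i) = 0 is legal: rank 2 receives nothing in this example.
counts = (/ 4, 3, 0, 3 /)
displs = (/ 0, 4, 7, 7 /)
call MPI_SCATTERV(sbuf, counts, displs, MPI_DOUBLE_PRECISION,   &
                  rbuf, counts(myrank+1), MPI_DOUBLE_PRECISION, &
                  0, MPI_COMM_WORLD, ierr)
```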

Is there some compiler/runtime flag that I need to set to make MPI a little less strict?

Why did the update break what worked?

-Charles

James Tullos (Intel)

Hi Charles,

Can you provide a sample code showing this behavior?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

chasmotron

I discovered that I could go back and download the 4.0.3.010 version of the MPI runtime, and if I installed that on my box and the user's box, my MPI code worked again. Installing either 4.1.0.023 or 4.1.0.028 breaks it.

I wrote up a quick test case exercising what I assumed were the features going wrong, but I have not been able to get the test case to run in any environment, and since I can get my primary code to work, I'm going to drop it.

I am including my test case, but I'm not really expecting you to tackle it. It is designed to be compiled with /real_size:64 and run on 4 procs. The first SCATTERV uses the same array as the sendbuf and recvbuf, and the second SCATTERV has a 0 in the sendcounts. Those appeared to be the complaints that the 4.1.* runtimes had against my code, but again, I have not gotten the test case to run at all.

Attachments:

scattest.f90 (1.28 KB)
James Tullos (Intel)

Hi Charles,

I tested the reproducer you provided.  At first, I received an error of Invalid Communicator.  Adding include 'mpif.h' in the subroutine corrected this.  Once this was done, the program appears to run correctly.  Please try this change and see if it runs correctly for you as well.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

chasmotron

James,

I added the mpif.h and confirmed that it worked on my Linux box, using both Portland Group with MPICH and g95 with Open MPI. I then transferred it back to Windows, compiled with IVF, linking to the 4.0.3.009 libraries, ran it with the 4.0.3 mpiexec (not sure if mpiexec is from 009 or 010), and I get the "memcpy arguments alias each other" message. If I comment out the first of the two PARALLEL_SCATTERV calls, it runs.

After further experimentation, the Release version runs correctly with both scatters. Only the debug version dies on the first scatter.

-Charles

James Tullos (Intel)

Hi Charles,

I'm not getting that error with 4.0.3 or with 4.1.0.030. What version of Intel® Visual Fortran are you using (I tested with 13.1.1.171)? What version of SMPD is installed on the system? Are you passing any arguments (other than /real_size:64 and /debug) at compile or link time? Do you have any I_MPI_* environment variables set?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

chasmotron

James,

I was having trouble compiling from the command line, so I have now completely uninstalled the 4.0.3.* MPI run-times and the 4.1.0.023 MPI run-time, and re-installed the 4.1.0.028 MPI library. I rebooted, and I figured out how to copy and paste from the command-line window.

First I compiled and ran test.f90 from the Intel MPI installation. Then I compiled, and failed to run, the scattest.f90 that you already have. SMPD is version 4.1, and I have not knowingly set any environment variables. Copied output below:

Intel(R) Parallel Studio XE 2013
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
Intel(R) Composer XE 2013 Update 3 (package 171)
Setting environment for using Microsoft Visual Studio Shell 2010 x64 tools.

C:\Epic\Epic\shortform_examples>mpif90 test.f90
mpifc.bat for the Intel(R) MPI Library 4.1 for Windows*
Copyright(C) 2007-2012, Intel Corporation. All rights reserved.

Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.1.171 Build 20130313
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
-subsystem:console
"/LIBPATH:C:\Program Files (x86)\Intel\MPI\4.1.0.028\\em64t\lib"
impi.lib
test.obj
C:\Epic\Epic\shortform_examples>mpiexec -n 4 ./test.exe
 Hello world: rank            0  of            4  running on
 cgerlach7.div18.swri.edu

 Hello world: rank            1  of            4  running on
 cgerlach7.div18.swri.edu

 Hello world: rank            2  of            4  running on
 cgerlach7.div18.swri.edu

 Hello world: rank            3  of            4  running on
 cgerlach7.div18.swri.edu

C:\Epic\Epic\shortform_examples>mpif90 scattest.f90
mpifc.bat for the Intel(R) MPI Library 4.1 for Windows*
Copyright(C) 2007-2012, Intel Corporation. All rights reserved.

Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.1.171 Build 20130313
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:scattest.exe
-subsystem:console
"/LIBPATH:C:\Program Files (x86)\Intel\MPI\4.1.0.028\\em64t\lib"
impi.lib
scattest.obj

C:\Epic\Epic\shortform_examples>mpiexec -n 4 ./scattest.exe
Fatal error in PMPI_Scatterv: Internal MPI error!, error stack:
PMPI_Scatterv(688)........: MPI_Scatterv(sbuf=000000013F23DF20, scnts=000000013F23DF50, displs=000000013F23DF60, MPI_DOUBLE_PRECISION, rbuf=000000013F23DF20, rcount=1, MPI_DOUBLE_PRECISION, root=0, MPI_COMM_WORLD) failed
MPIR_Scatterv_impl(212)...:
I_MPIR_Scatterv_intra(300):
MPIR_Scatterv(112)........:
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=000000013F23DF20 src=000000013F23DF20 len=8

job aborted:
rank: node: exit code[: error message]
0: cgerlach7.div18.swri.edu: 1: process 0 exited without calling finalize
1: cgerlach7.div18.swri.edu: 123
2: cgerlach7.div18.swri.edu: 123
3: cgerlach7.div18.swri.edu: 123

C:\Epic\Epic\shortform_examples>

James Tullos (Intel)

Hi Charles,

To check the environment variables, use

set I_MPI

This will list all of the I_MPI* environment variables, which are the ones that should matter here.

Now, on to the error.  You mentioned earlier that the program needs to be compiled with /real_size:64 and /debug.  You did not include those when you compiled.  Compiling without /real_size:64 leads to very wrong output with this program.

Also, you had switched the first PARALLEL_SCATTERV call to use MPI_IN_PLACE, which corrects the error you are seeing here.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

chasmotron

James,

You are of course correct that I failed to compile with /real_size:64. I apologize for my sloppiness. It does not make a difference, however: compiling with /real_size:64 from the command line and then running scattest with the 4.1.0.028 MPI run-time still dies, as demonstrated below. The run dies whether or not the /debug flag is added to the command-line compile.

You are also correct to point out that scattest will run with the 4.1.0.028 MPI run-time if I switch to the MPI_IN_PLACE usage. However, scattest is a proxy for my real code, which was written before the MPI 2.0 standard was adopted and MPI_IN_PLACE became a possibility. I do not know whether all of my users on the many platforms we support have MPI libraries that support MPI_IN_PLACE yet.

I have now re-installed the 4.0.3.010 MPI run-time, and at the bottom of this message I demonstrate that scattest runs correctly with it.

Thus, I get back to my original questions:

Why did this get broken with the upgrade to 4.1.*?

Is there any run-time or compile-time flag I can use with the 4.1* libraries and run-times to make it work like the 4.0.3* libraries and run-times?

Thank you for your help.

!!!!!!!!!!!!!!!!!!!******************************Demonstration that the 4.1.0.028 MPI run-time fails

c:\Epic\Epic\shortform_examples>set I_MPI
I_MPI_ROOT=C:\Program Files (x86)\Intel\MPI\4.1.0.028\

c:\Epic\Epic\shortform_examples>mpif90 /real_size:64 scattest.f90
mpifc.bat for the Intel(R) MPI Library 4.1 for Windows*
Copyright(C) 2007-2012, Intel Corporation. All rights reserved.

Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.1.171 Build 20130313
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:scattest.exe
-subsystem:console
"/LIBPATH:C:\Program Files (x86)\Intel\MPI\4.1.0.028\\em64t\lib"
impi.lib
scattest.obj

c:\Epic\Epic\shortform_examples>mpiexec -V
Intel(R) MPI Library for Windows* OS, Version 4.1 Build 12/10/2012 3:54:38 PM
Copyright (C) 2007-2012, Intel Corporation. All rights reserved.

c:\Epic\Epic\shortform_examples>mpiexec -n 4 ./scattest.exe
Fatal error in PMPI_Scatterv: Internal MPI error!, error stack:
PMPI_Scatterv(688)........: MPI_Scatterv(sbuf=000000013F49DF20, scnts=000000013F49DFD0, displs=000000013F49DFE0, MPI_DOUBLE_PRECISION, rbuf=000000013F49DF20, rcount=1, MPI_DOUBLE_PRECISION, root=0, MPI_COMM_WORLD) failed
MPIR_Scatterv_impl(212)...:
I_MPIR_Scatterv_intra(300):
MPIR_Scatterv(112)........:
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=000000013F49DF20 src=000000013F49DF20 len=8

job aborted:
rank: node: exit code[: error message]
0: cgerlach7.div18.swri.edu: 1: process 0 exited without calling finalize
1: cgerlach7.div18.swri.edu: 123
2: cgerlach7.div18.swri.edu: 123
3: cgerlach7.div18.swri.edu: 123

!!!!!!!!!!!!!!!!!!!*******************************Demonstration that the 4.0.3.010 runtime succeeds:

C:\Epic\Epic\shortform_examples>set I_MPI
I_MPI_ROOT=C:\Program Files (x86)\Intel\MPI-RT\4.0.3.010\em64t\bin\..\..

C:\Epic\Epic\shortform_examples>mpiexec -V
Intel(R) MPI Library for Windows* OS, Version 4.0 Update 3 Build 8/24/2011 3:07:12 PM
Copyright (C) 2007-2011, Intel Corporation. All rights reserved.

C:\Epic\Epic\shortform_examples>mpiexec -n 4 scattest.exe
 RANK:            1  SNDBUF    2.00000000000000        3.00000000000000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            1  RCVBUF    1.00000000000000        2.00000000000000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            0  SNDBUF    1.00000000000000        2.00000000000000   3.00000000000000        4.00000000000000        5.00000000000000   6.00000000000000        7.00000000000000        8.00000000000000   9.00000000000000        10.0000000000000
 RANK:            0  RCVBUF   0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            3  SNDBUF    7.00000000000000        8.00000000000000   9.00000000000000        10.0000000000000       0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            3  RCVBUF    6.00000000000000        7.00000000000000   8.00000000000000        9.00000000000000        10.0000000000000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            2  SNDBUF    4.00000000000000        5.00000000000000   6.00000000000000       0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
 RANK:            2  RCVBUF    3.00000000000000        4.00000000000000   5.00000000000000       0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000

James Tullos (Intel)

Hi Charles,

I'm checking with our developers.

James.

James Tullos (Intel)

Hi Charles,

This is actually the expected behavior.  The MPI 2.2 standard prohibits buffer aliasing.  From chapter 2.3:

"Unless specified otherwise, an argument of type OUT or type INOUT cannot be aliased with any other argument passed to an MPI procedure."

If you want to restore buffer aliasing, please set I_MPI_COMPATIBILITY to 3 or 4.
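On Windows this is an environment variable set in the console session (or the system environment) before launching; a sketch of a session matching the transcripts above:

```
rem Restore pre-4.1 behavior (including buffer aliasing) for this session,
rem then launch as before. The value 4 selects 4.0-compatible behavior.
set I_MPI_COMPATIBILITY=4
mpiexec -n 4 scattest.exe
```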

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
