One-sided MPI communication never returns in some cases

rettenbs

Hi,

I tried running the following code on a Linux cluster with Intel MPI (Version 4.0 Update 3 Build 20110824) and SLURM 2.2.7 on 2 nodes with 8 cores each (16 tasks).

Unfortunately, it hangs in the MPI_Win_unlock call during the 11th or 12th iteration. I have tried both the Intel compiler and gcc, with no success.

#include <mpi.h>
#include <iostream>

#define USE_BARRIER 1
#define LOCAL_RANK 10
#define REMOTE_RANK 3

int main(int argc, char** argv)
{
        int rank, error;
        MPI_Win win;
        double* value;
        double local_value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // One double of window memory on every rank
        error = MPI_Alloc_mem(sizeof(double), MPI_INFO_NULL, &value);
        if (error != MPI_SUCCESS)
                MPI_Abort(MPI_COMM_WORLD, error);

        error = MPI_Win_create(value, sizeof(double), sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        if (error != MPI_SUCCESS)
                MPI_Abort(MPI_COMM_WORLD, error);

        // Only LOCAL_RANK repeatedly reads one double from REMOTE_RANK
        // using passive-target synchronization (lock/get/unlock).
        if (rank == LOCAL_RANK)
                for (int i = 0; i < 25; i++) {
                        std::cout << "Iteration " << i << " in rank " << rank << std::endl;

                        error = MPI_Win_lock(MPI_LOCK_SHARED, REMOTE_RANK, 0, win);
                        if (error != MPI_SUCCESS)
                                MPI_Abort(MPI_COMM_WORLD, error);

                        error = MPI_Get(&local_value, 1, MPI_DOUBLE, REMOTE_RANK, 0, 1, MPI_DOUBLE, win);
                        if (error != MPI_SUCCESS)
                                MPI_Abort(MPI_COMM_WORLD, error);

                        error = MPI_Win_unlock(REMOTE_RANK, win);
                        if (error != MPI_SUCCESS)
                                MPI_Abort(MPI_COMM_WORLD, error);
                }

#if USE_BARRIER
        MPI_Barrier(MPI_COMM_WORLD);
#endif

        MPI_Win_free(&win);
        MPI_Free_mem(value);
        MPI_Finalize();

        return 0;
}
Other MPI libraries work as expected, and other "configurations" also work, e.g.:
#define USE_BARRIER 0
#define LOCAL_RANK 10
#define REMOTE_RANK 3

or

#define USE_BARRIER 1
#define LOCAL_RANK 2
#define REMOTE_RANK 3
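
For comparison, below is a minimal sketch of the same single-element read written with active-target synchronization (MPI_Win_fence) instead of passive-target lock/unlock. It is only a sketch and not verified in this setup; since every rank has to participate in the fences, the loop runs on all ranks and only LOCAL_RANK issues the MPI_Get.

#include <mpi.h>
#include <iostream>

#define LOCAL_RANK 10
#define REMOTE_RANK 3

int main(int argc, char** argv)
{
        int rank;
        MPI_Win win;
        double* value;
        double local_value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Alloc_mem(sizeof(double), MPI_INFO_NULL, &value);
        MPI_Win_create(value, sizeof(double), sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        for (int i = 0; i < 25; i++) {
                // Collective: every rank opens the access/exposure epoch
                MPI_Win_fence(0, win);
                if (rank == LOCAL_RANK)
                        MPI_Get(&local_value, 1, MPI_DOUBLE, REMOTE_RANK, 0, 1, MPI_DOUBLE, win);
                // Collective: closes the epoch; the MPI_Get is complete afterwards
                MPI_Win_fence(0, win);
                if (rank == LOCAL_RANK)
                        std::cout << "Iteration " << i << " in rank " << rank << std::endl;
        }

        MPI_Win_free(&win);
        MPI_Free_mem(value);
        MPI_Finalize();
}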
If you need more information, let me know.

Thanks for your help,
Sebastian

James Tullos (Intel)

Hi Sebastian,

Try adding "-env I_MPI_DEBUG 5" to the mpirun command. This will generate additional debug information and might provide some indication of what is causing the hang. I am able to run the original program you provided without any hangs. I will try some other combinations and see if I can cause the hang.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

rettenbs

"srun" does not support -env

This is the output of "I_MPI_DEBUG=5 srun ./test"

[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=607830

[8] MPI startup(): shm and ofa data transfer modes
[9] MPI startup(): shm and ofa data transfer modes
[2] MPI startup(): shm and ofa data transfer modes
[6] MPI startup(): shm and ofa data transfer modes
[4] MPI startup(): shm and ofa data transfer modes
[5] MPI startup(): shm and ofa data transfer modes
[0] MPI startup(): shm and ofa data transfer modes
[1] MPI startup(): shm and ofa data transfer modes
[3] MPI startup(): shm and ofa data transfer modes
[10] MPI startup(): shm and ofa data transfer modes
[7] MPI startup(): shm and ofa data transfer modes
[14] MPI startup(): shm and ofa data transfer modes
[11] MPI startup(): shm and ofa data transfer modes
[12] MPI startup(): shm and ofa data transfer modes
[13] MPI startup(): shm and ofa data transfer modes
[15] MPI startup(): shm and ofa data transfer modes

[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       22239    r1i0n0     +1
[0] MPI startup(): 1       22240    r1i0n0     +1
[0] MPI startup(): 2       22241    r1i0n0     +1
[0] MPI startup(): 3       22242    r1i0n0     +1
[0] MPI startup(): 4       22243    r1i0n0     +1
[0] MPI startup(): 5       22244    r1i0n0     +1
[0] MPI startup(): 6       22245    r1i0n0     +1
[0] MPI startup(): 7       22246    r1i0n0     +1
[0] MPI startup(): 8       14354    r1i1n0     +1
[0] MPI startup(): 9       14355    r1i1n0     +1
[0] MPI startup(): 10      14356    r1i1n0     +1
[0] MPI startup(): 11      14357    r1i1n0     +1
[0] MPI startup(): 12      14358    r1i1n0     +1
[0] MPI startup(): 13      14359    r1i1n0     +1
[0] MPI startup(): 14      14360    r1i1n0     +1
[0] MPI startup(): 15      14361    r1i1n0     +1
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:ofa

Iteration 0 in rank 10
Iteration 1 in rank 10
Iteration 2 in rank 10
Iteration 3 in rank 10
Iteration 4 in rank 10
Iteration 5 in rank 10
Iteration 6 in rank 10
Iteration 7 in rank 10
Iteration 8 in rank 10
Iteration 9 in rank 10
Iteration 10 in rank 10
Iteration 11 in rank 10

James Tullos (Intel)

Hi Sebastian,

Are you able to test outside of SLURM? What distribution are you using? Please try these configurations:

#define LOCAL_RANK 11
#define REMOTE_RANK 3

#define LOCAL_RANK 11
#define REMOTE_RANK 4

#define LOCAL_RANK 3
#define REMOTE_RANK 10

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

rettenbs

#define LOCAL_RANK 11
#define REMOTE_RANK 3
Works.

#define LOCAL_RANK 11
#define REMOTE_RANK 4
Hangs.

#define LOCAL_RANK 3
#define REMOTE_RANK 10
Hangs as well.

The distribution is SUSE Linux Enterprise Server 11.

I wasn't able to run the program outside of SLURM, at least not on this cluster. If you need this information, I can contact the help desk; maybe they know a way to run the program without SLURM.

James Tullos (Intel)

Hi Sebastian,

I'll set up some virtual machines here to replicate your setup. Would you be able to run all of the processes on a single node? (This technically oversubscribes resources, but for this program it shouldn't cause a problem.)

For the two new configurations that hang, do they hang at the same iteration as the original?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

rettenbs

Adding the option "--ntasks-per-core=2" (which means that all 16 tasks run on one node) also solves the problem.

And yes, they all hang in the same iteration.

James Tullos (Intel)

Hi Sebastian,

It definitely appears to be related to having the tasks involved in the communication on different nodes. Are you able to reliably run other MPI programs involving these two nodes? Have you tried using a different fabric for your connection? What is the output from "env | grep I_MPI"?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

rettenbs

"I_MPI_FABRICS=shm:ofa" is the default. However, I'm unable to reproduce the error when farbic is set to "(shm:)dapl" or "(shm:)tcp". (tmi does not work at all)

Output of "env | grep I_MPI"

I_MPI_FABRICS=shm:ofa

I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

I_MPI_JOB_FAST_STARTUP=0

I_MPI_HOSTFILE=/tmp/di56zem/mpd.hosts.11693

I_MPI_ROOT=/lrz/sys/intel/mpi_40_3_00
I haven't tried any other MPI programs, but according to the service provider, the Intel MPI library should work.

James Tullos (Intel)

Hi Sebastian,

I have been able to reproduce the error you are seeing by matching the fabric. I'm going to make some more modifications to your code to see if I can get a more general reproducer, and I'll be submitting a defect report for this.
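
The kind of generalization I have in mind looks roughly like the sketch below. It is only illustrative (the iteration count and the choice to exercise every origin/target pair are placeholders), not the final reproducer that will go into the defect report.

#include <mpi.h>
#include <iostream>

int main(int argc, char** argv)
{
        int rank, size;
        MPI_Win win;
        double* value;
        double local_value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Alloc_mem(sizeof(double), MPI_INFO_NULL, &value);
        MPI_Win_create(value, sizeof(double), sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        // Every rank takes a turn as the origin and reads one double from every
        // other rank with lock/get/unlock, so all origin/target pairs (including
        // cross-node pairs) are exercised.
        for (int origin = 0; origin < size; origin++) {
                if (rank == origin) {
                        for (int target = 0; target < size; target++) {
                                if (target == origin)
                                        continue;
                                for (int i = 0; i < 25; i++) {
                                        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
                                        MPI_Get(&local_value, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
                                        MPI_Win_unlock(target, win);
                                }
                                std::cout << "origin " << origin << " -> target " << target << " done" << std::endl;
                        }
                }
                // Barrier on all ranks, as in the original code, to keep the rounds separated
                MPI_Barrier(MPI_COMM_WORLD);
        }

        MPI_Win_free(&win);
        MPI_Free_mem(value);
        MPI_Finalize();
}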

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

rettenbs

Thanks a lot.

It would be nice if you could post an update here as soon as this is fixed in a release.
