MPI DAPL error in Intel Xeon Phi

I am trying to run MPI across multiple MICs, and I get a DAPL error. The following is the output with I_MPI_DEBUG=5 enabled:

wwu12:lips ~/work/mic/mpitest> mpirun -genv I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0 -env I_MPI_DEBUG=5 -env I_MPI_MIC=enable -hostfile mpi_host -perhost 1 -n 2 /tmp/test.mic
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
Max MV2_DEFAULT_MAX_SG_LIST is 0, set to 1
Max MV2_SRQ_SIZE is 0, set to 512
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] MPI startup(): DAPL provider ofa-v2-scif0
[0] MPI startup(): DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl data transfer mode
[0] MPI startup(): dapl data transfer mode
[0:mic0] unexpected DAPL event 0x4003
Assertion failed in file ../../dapl_init_rc.c at line 1337: 0

There is no error when I run the program on a single MIC, or on the host plus one MIC. Does anyone know where the problem is?

Hi Wei,

Let's check a few basics.  Make certain that each coprocessor has a unique name and IP address on the network.  Ensure that you can connect, via SSH, from one coprocessor to another.  What version of MPSS are you using, and is it the same on every coprocessor?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Thanks very much. I found that I am not able to SSH from one coprocessor to another. I will contact my machine administrator to report this problem.
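
For anyone hitting the same assertion, the coprocessor-to-coprocessor SSH check suggested above can be sketched as a small shell function run from the host. This is only a sketch under assumptions: the hostnames `mic0` and `mic1` are placeholders for whatever names appear in your hostfile, the `MICS` and `SSH` variables and the function name `check_mic_ssh` are made up for illustration, and it assumes the SSH client on both the host and the cards is OpenSSH (so that `BatchMode` and `ConnectTimeout` are accepted).

```shell
#!/bin/sh
# Hypothetical coprocessor hostnames; replace with the names in your hostfile.
MICS="${MICS:-mic0 mic1}"
# SSH command used from the host. BatchMode makes ssh fail instead of
# prompting for a password; overridable so the loop can be dry-run.
SSH="${SSH:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# Log in to every card from the host and, from there, try to reach every
# other card. Prints one OK/FAIL line per ordered pair and returns the
# number of failed pairs as the exit status.
check_mic_ssh() {
    failures=0
    for src in $MICS; do
        for dst in $MICS; do
            [ "$src" = "$dst" ] && continue
            if $SSH "$src" "ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true" >/dev/null 2>&1; then
                echo "OK   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    return "$failures"
}
```

Calling `check_mic_ssh` after sourcing the file prints one line per direction. A `FAIL` line here does not distinguish between a failing host-to-card login and a failing card-to-card login, so it is worth testing the host-to-card step by hand first. Any `FAIL` line points at the kind of connectivity problem that turned out to be the cause in this thread.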
