Symmetric MPI run


Hello,

I'm trying to run a symmetric MPI model on the host and MIC architecture. Everything works fine as long as the total number of processes (MIC + host) is less than 10. But when it is greater than 10, I get the attached error.

This is my mpirun command:

mpirun -v -host p-linux -check-mpi -env I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -np 1 <host executable> : -host mic0 -iface mic0 -env I_MPI_DEBUG 5 -env LD_LIBRARY_PATH=/opt/intel/composer_xe_2013.5.192/compiler/lib/mic:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/mic/coi/device-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/composer_xe_2013.5.192/compiler/lib/mic:/opt/intel/composer_xe_2013.5.192/mkl/lib/mic:/opt/intel/composer_xe_2013.5.192/tbb/lib/mic  -np 11  <mic executable>

Here the total number of processes is 11 + 1 = 12, which is greater than 10, so I get the error. If it is less than 10, the program executes correctly.

I noticed that near the bottom of the attached error message, i.e.:

[proxy:0:1@p-linux-mic0] got pmi command (from 30): put
kvsname=kvs_9321_0 key=P10-businesscard-0 value=description#

there is no $port ... $ifname after the "value=description#" part, unlike for the other processes.
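A quick way to check whether other ranks show the same truncated business card is to filter the verbose output for entries that end right after "description#". The following is only a sketch: the helper name find_empty_cards is hypothetical, and it assumes the log lines look like the two quoted above (key and value on one line, nothing after the "#").

```shell
# Hypothetical helper: list PMI keys whose business card value stops
# right after "description#" (no $port ... $ifname part).
# Usage: find_empty_cards <logfile>
find_empty_cards() {
    # Keep only lines that END with "value=description#",
    # then print just the key name.
    grep -E 'key=[^ ]*businesscard[^ ]* value=description#[[:space:]]*$' "$1" \
        | sed -E 's/.*key=([^ ]+) .*/\1/'
}
```

For example, running `find_empty_cards MPI_error.txt` on the attached log should print one key per affected process, which shows whether only P10 is affected or every rank from a certain number upward.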

Thanks

Attachment: MPI_error.txt (text/plain, 2.73 KB)

As an update, I ran the MPI test case and got exactly the same problem.

mpirun -host p-linux -env I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -np 1 -check_mpi ./a.host : -host mic0 -iface mic0 -env I_MPI_DEBUG 5 -np 10  ./a.mic

ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
[0] MPI startup(): shm and tcp data transfer modes
[6] MPI startup(): shm and tcp data transfer modes
[5] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[4] MPI startup(): shm and tcp data transfer modes
[7] MPI startup(): shm and tcp data transfer modes
[10] MPI startup(): shm and tcp data transfer modes
[2] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): shm and tcp data transfer modes
[8] MPI startup(): shm and tcp data transfer modes
[9] MPI startup(): shm and tcp data transfer modes
[mpiexec@p-linux] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:78): Unrecognized PMI command: k | cleaning up processes
[mpiexec@p-linux] control_cb (./pm/pmiserv/pmiserv_cb.c:868): unable to process PMI command
[mpiexec@p-linux] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@p-linux] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@p-linux] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

But if I run it with fewer processes, I get the correct result:

mpirun -host p-linux -env I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -np 1 -check_mpi ./a.host : -host mic0 -iface mic0 -env I_MPI_DEBUG 5 -np 5  ./a.mic
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
[0] MPI startup(): shm and tcp data transfer modes
[4] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): shm and tcp data transfer modes
[5] MPI startup(): shm and tcp data transfer modes
[2] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Rank    Pid      Node name            Pin cpu
[0] MPI startup(): 0       894      p-linux       {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39}
[0] MPI startup(): 1       5235     p-linux-mic0  {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45}
[0] MPI startup(): 2       5236     p-linux-mic0  {46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90}
[0] MPI startup(): 3       5237     p-linux-mic0  {91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135}
[0] MPI startup(): 4       5238     p-linux-mic0  {136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180}
[0] MPI startup(): 5       5239     p-linux-mic0  {0,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:tcp
[0] MPI startup(): I_MPI_MIC=1
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
 Hello world: rank            0  of            6  running on p-linux
 Hello world: rank            1  of            6  running on p-linux-mic0
 Hello world: rank            2  of            6  running on p-linux-mic0
 Hello world: rank            3  of            6  running on p-linux-mic0
 Hello world: rank            4  of            6  running on p-linux-mic0
 Hello world: rank            5  of            6  running on p-linux-mic0

Hi George,

- Could you run your application on host only (without coprocessor) with more than 10 ranks? Do you see the same error?

- Could you repeat the test on the coprocessor only (without the host) with more than 10 ranks? Do you see the same error?

Hi

Everything works fine when I run on the host only or on the Phi only. The problem is in the symmetric execution. I'm using Intel MPI version 4.1.3.

Is there a MaxSessions specified in /etc/ssh/sshd_config?
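For reference, one way to check this is to look for an uncommented MaxSessions line in the config file. The sketch below uses a hypothetical helper name, show_max_sessions. When the option is not set, OpenSSH falls back to its default of 10. Whether this limit is actually related to the failure is not established here.

```shell
# Hypothetical helper: print uncommented MaxSessions line(s) from an
# sshd_config-style file, or note that the OpenSSH default applies.
# Usage: show_max_sessions [path]   (defaults to /etc/ssh/sshd_config)
show_max_sessions() {
    cfg="${1:-/etc/ssh/sshd_config}"
    # sshd_config keywords are case-insensitive, hence -i.
    if ! grep -iE '^[[:space:]]*MaxSessions[[:space:]=]' "$cfg"; then
        echo "MaxSessions not set in $cfg (OpenSSH default is 10)"
    fi
}
```

Run `show_max_sessions` on the host. To check the coprocessor as well, the same function can be run there after logging in with `ssh mic0`, assuming its sshd configuration lives at the same path.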

Hi George,

The "-iface" option is only needed when you want to specify a particular network interface. In your case, you don't need to specify mic0. Can you retry your command without "-iface mic0" and check whether you still get that error:

# mpirun -host p-linux -env I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -np 1 -check_mpi ./a.host : -host mic0 -env I_MPI_DEBUG 5 -np 10  ./a.mic
