Hi. Is it possible to place MPI ranks on specific cores of the Xeon Phi (1-60) during native-mode computation? As far as I understand, the scheduler assigns ranks in round-robin fashion across all the nodes, starting from the first available core. Is it possible to override this?
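For what it's worth, Intel MPI exposes explicit placement through its pinning controls; a sketch of a native-mode launch pinned to chosen cores, where the core list, host name, and binary name are placeholders rather than details from the question:
export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=1,5,9,13   # one explicit logical core per rank
mpirun -n 4 -host mic0 ./hello.mic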
I have been trying to run IMB over 40GigE RoCE using Intel MPI and Open MPI. I could run it with Open MPI using the following FAQ:
But I have been having issues running IMB with Intel MPI, and I did not find many resources online. I have been trying to run it as follows.
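The exact command is not shown here, but a typical Intel MPI launch of IMB over RoCE selects the fabric and DAPL provider explicitly; the provider name below is only an example and must match an entry in /etc/dat.conf on the system:
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1   # example RoCE provider from /etc/dat.conf
mpirun -n 2 -hosts node1,node2 IMB-MPI1 PingPong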
So, the code above is something I think should have worked, but it is hanging (by hanging I mean that rank 0 is expecting the messages and the other ranks are waiting for their sends to be received, but that never happens).
What I tried to do was something like this:
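The snippet itself is not included in the post. For illustration, here is a minimal working version of the pattern described, where rank 0 receives one message from every other rank; tags and variable names are mine, not from the original code:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size, i, val;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        /* rank 0 receives one value from every other rank */
        for (i = 1; i < size; i++) {
            MPI_Recv(&val, 1, MPI_INT, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("got %d from rank %d\n", val, i);
        }
    } else {
        val = rank;
        /* tag and communicator must match the receive exactly */
        MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}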
I am running a job on a 4,000-node cluster with InfiniBand. At small scale (8 to 64 nodes), the mpirun command works well; at medium scale (256 to 512 nodes), mpiexec.hydra has to be used; but when it goes up to 1024 nodes, I get errors (see attached). My job script is like this:
module load intel-compilers/12.1.0
module load intelmpi/4.0.3.008
#mpirun -np 64 -perhost 1 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
mpiexec.hydra -np 1000 -perhost 1 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
The errors from the 1024-node run are in the attachment.
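The attached error text is not reproduced here. For launches at this scale, one knob worth noting is hydra's hierarchical startup tree; a sketch, where the branch count value is only an illustration, not something from the original report:
export I_MPI_HYDRA_BRANCH_COUNT=32   # launch proxies as a tree rather than all from one node
mpiexec.hydra -np 1000 -perhost 1 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt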
I connect to the coprocessor like this:
# ssh mic0
It didn't need a password before, but after I ran the command below, ssh started asking for a password:
# ./sshconnectivity.exp machines.LINUX
What should I do to recover the passwordless login I had before? All I did was run sshconnectivity.exp.
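A generic way to restore key-based login, assuming the card still accepts password logins and uses standard OpenSSH paths (both assumptions, not details from the post):
cat ~/.ssh/id_rsa.pub | ssh mic0 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'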
A simple MPI hello-world program crashes when using shm:dapl mode with the MLNX OFED 2.1-1.0.0 IB stack; shm:ofa works fine. shm:dapl mode used to work fine with MLNX OFED 1.5.3, but the latest el6.5 kernel requires version 2.1-1.0.0.
I_MPI_FABRICS=shm:dapl srun -pdebug -n2 -N2 ~/mpi/intelhellog
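A first diagnostic step is usually to confirm that a DAPL provider for the new stack actually resolves; the provider name below is an example and must match an entry in /etc/dat.conf:
cat /etc/dat.conf   # list the providers installed by MLNX OFED 2.1-1.0.0
I_MPI_FABRICS=shm:dapl I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1 srun -pdebug -n2 -N2 ~/mpi/intelhellog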
Example on a hypothetical cluster:
"Node 1" -> 4 cores, so ranks [0,4)
"Node 2" -> 4 cores, so ranks [4,8)
So rank 0 would be the first core on Node 1.
In a situation where core #5 wants to send a message to core #6, both of which are on "Node 2", I hope core #5 doesn't send its message to rank #0 on "Node 1", which then sends it back to core #6.
So how does it work?
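A quick way to verify the actual rank-to-node mapping is to print each rank's processor name; a minimal C sketch, not from the original post:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("rank %d runs on %s\n", rank, name);  /* shows which node owns each rank */
    MPI_Finalize();
    return 0;
}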
I've been using MPI_Comm_spawn in my code to dynamically create just one process, but it takes a long time to complete (about 15 s on an Intel Xeon E5620 @ 2.40 GHz). I'm not doing anything else but calling MPI_Comm_spawn. My simple code is:
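The snippet itself is missing from the post. For illustration only, a minimal MPI_Comm_spawn call that creates one child process might look like this; "./worker" is a placeholder for a separate MPI executable:
#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Comm intercomm;
    MPI_Init(&argc, &argv);
    /* spawn one instance of a separate MPI executable;
       the child is expected to call MPI_Init and MPI_Finalize itself */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}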
I'm trying to collect data with MPI_Allgatherv into a large receive buffer whose total size is larger than 2 GB. As I understand from here (http://software.intel.com/en-us/forums/topic/361060), this is not supported by default. Unfortunately, when I try to use the -ilp64 option with mpiifort, I run into several problems:
1) When using include 'mpif.h' to include MPI, after the following command:
mpiifort -warn -O1 -g -traceback -check bounds -i8 -c gather.f
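For reference, the Intel MPI documentation pairs -i8 with -ilp64 at both the compile and link steps; a sketch reusing the file name from the post:
mpiifort -warn -O1 -g -traceback -check bounds -i8 -ilp64 -c gather.f
mpiifort -i8 -ilp64 gather.o -o gather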
I have installed Intel MPI on a cluster with one master and 3 nodes. I ran ./sshconnectivity.exp machines.LINUX using the following machines.LINUX:
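The file contents are not shown; machines.LINUX is a plain list of host names, one per line, so something like the following, where the names are placeholders:
master
node1
node2
node3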
Then I installed MPI on all the nodes using install.sh.
Everything went great... apparently.