Intel® Clusters and HPC Technology

running Intel MPI 4.0 with SLURM

Hi,

I have just installed Intel MPI (v4.0.0.027) on a Nehalem, InfiniBand-based cluster system which uses the SLURM resource manager. All of the compilers and MPI stacks, including Intel MPI, are installed using modules. After I load the intel-mpi module, build the application, and try to run it using a SLURM batch file, the program crashes, as the Intel MPI runtime environment does not obtain all of the SLURM environment variables. I get the message:

mpiexec_rm1867: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2.
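A common workaround is to build the mpd hosts file from the nodes SLURM actually allocated, inside the batch script itself, so the ring is booted on the right hosts. The script below is a minimal sketch, assuming `scontrol` is on the path and `./my_app` is a placeholder for your binary:

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 16

# Build an mpd hosts file from the SLURM allocation
# (SLURM_JOB_NODELIST holds the compressed node list).
HOSTFILE=$(mktemp)
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$HOSTFILE"

# Boot the mpd ring on the allocated nodes, run, and tear down.
mpdboot -n "$SLURM_NNODES" -f "$HOSTFILE" -r ssh
mpiexec -n "$SLURM_NTASKS" ./my_app   # ./my_app is a placeholder
mpdallexit
rm -f "$HOSTFILE"
```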

L3 caches in Nehalem


I would like to know whether any presently available tool can
read L3 cache-miss counts from the performance
counters of the Nehalem-based architecture. Also, is it possible to
identify the exact point in a program's execution at which an L3
cache miss occurs?

Thanks in advance,
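One presently available option is Linux perf, which can read the last-level-cache (L3) events on Nehalem; Intel VTune exposes the same counters. The commands below are a sketch, assuming perf is installed and `./my_app` stands in for the program under test:

```shell
# Count last-level (L3) cache misses over a full run:
perf stat -e LLC-loads,LLC-load-misses ./my_app

# Sample on the miss event to see where in the program the misses occur:
perf record -e LLC-load-misses ./my_app
perf report    # breakdown by function, drill down to instructions
```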


Analyzing mpd Ring Failures


We're using Intel MPI in an SGE cluster (tight integration). For some nodes, the jobs consistently fail with messages similar to these:

check for mpd daemons (2 of 10) got all 24 of 24 nodes
node26-05_46554 (handle_rhs_input 2425): connection with the right neighboring mpd daemon was lost; attempting to re-enter the mpd ring
node26-22_42619: connection error in connect_lhs call: Connection refused
node26-22_42619 (connect_lhs 777): failed to connect to the left neighboring daemon at node26-23 40826
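To narrow down which node pair breaks the ring, the MPD tool set includes `mpdcheck` for pairwise connectivity tests and `mpdtrace` for listing ring members. A rough sketch, using the host names from the log above (the port is whatever the server side prints):

```shell
# On the suspect left-hand node (here node26-23), start a test server:
mpdcheck -s                    # prints a host name and a port

# On its neighbor (here node26-22), try to reach that server:
mpdcheck -c node26-23 <port>   # use the port printed above

# Once a ring is up, list which daemons actually joined it:
mpdtrace -l
```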

one-sided communication in Intel MPI

Does the Intel MPI Library support passive-target one-sided communication? And what is the performance like?
The performance of passive-target one-sided operations implemented through MPI-2
windows seems very poor. The RMA implementation in Intel MPI is not truly
one-sided: when a process calls MPI_Get/MPI_Put, the target process won't respond if it is busy with its own work.
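One way to quantify this is with the Intel MPI Benchmarks: the IMB-EXT binary covers the MPI-2 one-sided cases. A sketch, assuming IMB-EXT has been built and is on the path:

```shell
# One-sided get/put latency and bandwidth between two ranks:
mpiexec -n 2 IMB-EXT Unidir_Get Unidir_Put

# Accumulate and window-creation overhead:
mpiexec -n 2 IMB-EXT Accumulate Window
```

Comparing these numbers against IMB-MPI1's PingPong gives a rough sense of how much the one-sided path lags two-sided messaging.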

IntelMPI first execution crashes, mpd process on remote host does not exit

I found during my testing that if the first execution crashes, an mpd process on the remote host does not exit automatically.

Here are the steps; assume you have two hosts, host1 and host2.

Start an mpd ring on the two hosts as a normal user:

export MPD_CON_EXT=1234

mpdboot -n 2 -f $hfile

where hfile contains the two hosts host1 and host2.

On host1, kill -9 the mpd process and all Intel MPI processes in one shot.

On host2 (the remote host), you see a leftover mpd process.
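When this happens, the stranded daemon can be removed with `mpdcleanup`, which ships alongside mpdboot. A sketch reusing the same hosts file (and assuming, as in the MPD distribution, that the daemon runs as mpd.py):

```shell
# Remove leftover mpd daemons on every host listed in $hfile:
mpdcleanup -f $hfile -r ssh

# Or target just the one remote host:
ssh host2 'pkill -f mpd.py'
```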

MPI shared memory


I have a program which needs about 50 GB of RAM. Is Intel MPI able to automatically manage the allocation of memory for this program over the network? In other words, if I had 25 machines with 2 GB of RAM each, could MPI use the RAM from these machines? If yes, is it enough to just add -env I_MPI_DEVICE ssm? What about performance if I use 100 Mb Ethernet? Perhaps just swapping to the hard drive would work quicker?

Thank you for your answer, and best wishes,
