Intel® Clusters and HPC Technology

mpdboot only running on 4 nodes

Hi,

When I run mpdboot (revision 1.4.9) with a number of hosts greater than 4, only 4 mpd instances are started on the hosts. For example, mpdboot -n 8 -v only launches mpd on 4 hosts. My mpd.hosts file has 19 hosts in it. However, if I run mpdboot -n 19 --maxbranch 18, mpd starts on all hosts as expected. Unfortunately I am not very familiar with HPC, and I see lots of examples of mpdboot with the number of hosts greater than 4 and without the maxbranch option. Is this a problem with my configuration, or can I ignore it and carry on using --maxbranch?
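To make the commands concrete, this is the pattern I am describing (the host names in mpd.hosts are placeholders, not my real node names):

[plain]
# mpd.hosts (19 entries; host names are placeholders)
node01
node02
...
node19

# starts mpd on only 4 of the hosts:
mpdboot -n 8 -v

# workaround - starts mpd on all hosts:
mpdboot -n 19 --maxbranch 18
[/plain]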

Any help appreciated.

Mark

Link error with MPICH2

I am getting a linking error with the MPICH2 implementation and Visual Fortran 11.1.

The following code compiles and links with no problems:

[fxfortran]      CALL MPI_FILE_READ(handle_csc, int_buffer, 5, MPI_INTEGER,
     *                   status, mpi_err)
      CALL MPI_FILE_READ(handle_csc, ZFLAG, 5, MPI_CHARACTER,
     *                   status, mpi_err)[/fxfortran]

But if I comment out the first call, the linker fails to resolve MPI_FILE_READ.
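For completeness, a minimal fixed-form reproducer of what I am building looks roughly like the sketch below; the declarations and the MPI_FILE_OPEN call are only my reconstruction of the surrounding context, and just the two MPI_FILE_READ calls come from the real code. With the MPI_INTEGER call commented out, the link step cannot resolve MPI_FILE_READ.

[fxfortran]      PROGRAM READ_TEST
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER handle_csc, mpi_err
      INTEGER status(MPI_STATUS_SIZE)
      INTEGER int_buffer(5)
      CHARACTER ZFLAG(5)

      CALL MPI_INIT(mpi_err)
      CALL MPI_FILE_OPEN(MPI_COMM_WORLD, 'test.dat', MPI_MODE_RDONLY,
     *                   MPI_INFO_NULL, handle_csc, mpi_err)

C     Commenting out the MPI_INTEGER call below is what triggers the
C     unresolved-symbol error on the remaining MPI_CHARACTER call.
      CALL MPI_FILE_READ(handle_csc, int_buffer, 5, MPI_INTEGER,
     *                   status, mpi_err)
      CALL MPI_FILE_READ(handle_csc, ZFLAG, 5, MPI_CHARACTER,
     *                   status, mpi_err)

      CALL MPI_FILE_CLOSE(handle_csc, mpi_err)
      CALL MPI_FINALIZE(mpi_err)
      END[/fxfortran]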

Generating Windows Core Dump

I am launching an MPI application (using mpiexec) on a Windows Server 2008 R1 platform. Occasionally I get process crashes, and I need to debug them after the fact by using the core-dump file with the WinDbg debugger.

Could someone let me know if it's possible to generate a core dump automatically on any MPI application crash, and if so, how do I do it?
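For example, is something along the lines of the Windows Error Reporting "LocalDumps" registry settings (which, as far as I understand, are available on Server 2008) the right mechanism here? The dump folder below is just a placeholder:

[plain]
Key:   HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps
Value: DumpFolder (REG_EXPAND_SZ) = C:\Dumps    <- placeholder path
Value: DumpType   (REG_DWORD)     = 2           <- 2 = full dump
[/plain]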

Thanks in advance.

Sashi.B

running Intel MPI 4.0 with SLURM

Hi,

I have just installed Intel MPI (v4.0.0.027) on a Nehalem/InfiniBand-based cluster system which uses the SLURM resource manager. All of the compilers and MPI stacks are installed using modules, including the Intel MPI. After I load the intel-mpi module, build the application, and try to run it using a SLURM batch file, the program crashes, as the Intel MPI runtime environment does not obtain all of the SLURM environment variables. I get the message:

mpiexec_rm1867: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2.
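To make the setup concrete, a minimal SLURM batch file of the kind I am describing would look something like this (the module name, node counts, and binary are placeholders, not my actual script):

[plain]
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8

# placeholder module and binary names
module load intel-mpi
mpiexec -n 16 ./my_app
[/plain]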

L3 caches in Nehalem

Hello,

I would like to know whether any presently available tool can be used to read the L3 cache miss counts from the performance counters of the Nehalem-based architecture. Also, is it possible to identify the exact point in the execution of a program at which an L3 cache miss occurs?

Thanks in advance,

Aastha.

Analyzing mpd Ring Failures

Hello,

We're using Intel MPI in an SGE cluster (tight integration). For some nodes, the jobs consistently fail with messages similar to these:

startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: got all 24 of 24 nodes
node26-05_46554 (handle_rhs_input 2425): connection with the right neighboring mpd daemon was lost; attempting to re-enter the mpd ring
...
node26-22_42619: connection error in connect_lhs call: Connection refused
node26-22_42619 (connect_lhs 777): failed to connect to the left neighboring daemon at node26-23 40826

one-sided communication in Intel MPI

Does the Intel MPI Library support passive-target one-sided communication? And how about the performance? It seems the performance of passive-target one-sided operations implemented through MPI-2 windows is very poor. The implementation of RMA in Intel MPI is not truly one-sided: when a process calls MPI_Get/MPI_Put, the target process won't respond if it is busy with its own work.
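To be concrete, the access pattern I mean is plain passive-target RMA, along the lines of the sketch below (the array size, ranks, and the 4-byte displacement unit are only assumptions for illustration):

[fxfortran]      PROGRAM RMA_TEST
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER, PARAMETER :: N = 1024
      INTEGER :: win, ierr, rank
      INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize, disp
      INTEGER :: base(N), buf(N)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

C     Every rank exposes an integer array through an MPI-2 window
C     (assuming 4-byte default INTEGER for the size/displacement unit).
      base = rank
      winsize = 4 * N
      CALL MPI_WIN_CREATE(base, winsize, 4, MPI_INFO_NULL,
     *                    MPI_COMM_WORLD, win, ierr)

C     Passive-target access: rank 1 reads from rank 0 without any
C     matching call on the target side.
      IF (rank .EQ. 1) THEN
         disp = 0
         CALL MPI_WIN_LOCK(MPI_LOCK_SHARED, 0, 0, win, ierr)
         CALL MPI_GET(buf, N, MPI_INTEGER, 0, disp, N, MPI_INTEGER,
     *                win, ierr)
         CALL MPI_WIN_UNLOCK(0, win, ierr)
      END IF

      CALL MPI_WIN_FREE(win, ierr)
      CALL MPI_FINALIZE(ierr)
      END[/fxfortran]

In this pattern the target (rank 0) makes no MPI calls during the access epoch, which is why I would expect the MPI_Get on rank 1 to complete even while rank 0 is busy with its own computation.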
