Intel® Clusters and HPC Technology

Problems installing Cluster Toolkit under Windows Server 2008

Hi all,

During the installation of the Cluster Toolkit for Windows HPC Server 2008 I had some "noncritical errors":

An error occured in Merge Module SUBST.
Installation will be continued but NOT all of the following files will be configured appropriately:

C:\Program Files (x86)\Intel\ICTCE\3.2.1.015\Compiler\bin\ifortvars.bat

etc.

After installing the Cluster Toolkit I checked the build environment, with the following output:

.nfsXXXX files generated at execution with MPI

Hello

When I use my software with Intel MPI 2.0 (sorry, I can't update MPI for the moment), I see numerous temporary files named .nfsXXXXXXXX, which worries my customers. The process running with MPI needs 4 files as input and generates 4 files as output; all of these files are read and written on a remote disk over the network, and I use 4 CPUs.
Can you tell me more about these files? Is there a relationship between the number of .nfsXXXX files and the input/output files?

Intel Fortran debugger and MPICH

Hi,
I want to debug a parallel Fortran program that I'm trying to run on a Linux-type cluster using the Intel Fortran compiler (v9.1.045), the Intel debugger (v9.1-28), and MPICH2 1.2. Building the executable with the -g option is straightforward, but when I try to invoke the debugger it fails with this message:
$ idb -parallel mpiexec -machinefile machines -n 4 ./stagyympi
Intel Debugger for applications running on IA-32, Version 9.1-28, Build 20070305
execve failed: No such file or directory
Error: could not start debuggee

how to distribute data to different computing nodes, using MPI

Dear all, I have started using MPI for a simple data decomposition of a 2-D domain. Assuming I am using 2 compute nodes, each with 8 processors, I want message passing to occur only between the two nodes, while inside each node all processors access their shared memory.
After calling MPI_Comm_rank and receiving ranks 0-15, how can I know to which node a processor belongs? Do processors with ranks 0 to 7 belong to compute node 1 and ranks 8 to 15 to compute node 2?
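A minimal sketch of how to check the actual mapping at run time (assuming MPI-2 here; MPI_Get_processor_name typically returns the host name, so ranks reporting the same name share a node):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_Get_processor_name typically returns the host name, so
       two ranks reporting the same name run on the same node. */
    MPI_Get_processor_name(name, &namelen);
    printf("rank %d of %d runs on node %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

Whether ranks 0 to 7 actually land on the first node depends on the process manager and its placement policy (block vs. round-robin), so it seems safer to check than to assume.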

How does Intel MPI handle network failures

Hi all,

I am new to the forum and have a question regarding network failures and MPI applications (specifically with the Intel MPI library).

What happens if I have a number of processes running on a cluster and someone unplugs a network cable? As far as I have read, the MPI processes get terminated immediately. How can I circumvent this, say with some sort of WAIT or TIMEOUT mechanism when a network fault is detected, so that the processes can try to recover after a set number of seconds?
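From what I have read so far, the MPI standard at least allows the default MPI_ERRORS_ARE_FATAL handler to be replaced with MPI_ERRORS_RETURN, so that a failed call returns an error code instead of aborting everything. A minimal sketch of that pattern (assuming MPI-2 error-handler semantics; whether Intel MPI can really continue after a hard network fault is another question):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, err;
    char buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Replace the default MPI_ERRORS_ARE_FATAL handler so that a
       communication failure returns an error code instead of
       killing every process. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        err = MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        if (err != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(err, msg, &len);
            fprintf(stderr, "send failed: %s\n", msg);
            /* application-level wait/retry logic would go here */
        }
    } else if (rank == 1) {
        MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Even with MPI_ERRORS_RETURN, the standard leaves the state of the library after a failure undefined, so any retry logic is ultimately at the mercy of the implementation.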

Any help would be very much appreciated!

mpd error

Hi!

I have a problem with Altair PBS Pro + Intel MPI. I can launch a task with the mpiexec command on several nodes, but when I try to launch the task on several nodes under PBS I get an error.

What I am doing:
1) Starting mpd on nodes:
qwer@mgr:/mnt/share/piex> cat mpd.hosts
ib-mgr:10
ib-cn01:16
ib-cn02:16
ib-cn03:16
ib-cn04:16
ib-cn05:16
qwer@mgr:/mnt/share/piex> mpdboot -n 6 -f mpd.hosts -r ssh

memory distribution

Hello everyone.
I want to run two MPI (MPICH2) jobs on my cluster.
I send the first job distributed round-robin and all is OK.
The problem appears when I send the second job: it ends up using the memory of the CPUs already occupied by the first job instead of the memory of the other, almost free, CPUs.
Is there a flag with which I can tell MPICH2 to use the CPUs with free memory?
Thank you.

Error in Intel MPI 3.2.2 MPI I/O

I've been testing the new Intel MPI 3.2.2 release, which has support for Panasas' PanFS. I've checked out an evaluation copy of the library but am running into what I believe is a bug in the use of layout hints with a shared file. In the N-to-1 case, where more than one process accesses the same file, the open fails: each process tries to perform an ioctl call on the file and gets back:

"ADIOI_PANFS_OPEN: I/O Error doing ioctl on parent directory to create PanFS file using ioctl: File exists."

mpdboot fails to start nodes with different users.

I am trying to figure out why a few nodes in my cluster are acting differently.

We are running Rocks 5.2 with RHEL 5.
We use Torque/Maui as our queuing system.
Users submit jobs that use Intel MPI version 3.2.1.009.

When I start a job as a user with this:
mpdboot --rsh=ssh -d -v -n 16 -f /scr/username/testinput.nodes.mpd
