Intel® Clusters and HPC Technology

Systematic Slowing down of mpi task

I have a reproducible slowdown of calculations whose cause is going to be hard to track down, so I welcome suggestions. I do not know whether this is a memory leak, Intel MPI related, MKL related, or something else.

I first noticed it when one newish E5-2660 node was reproducibly running MPI calculations at about half the speed of the others. After a reboot it went back to running at the same speed as the others; nothing else helped.
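
One way to narrow this down might be to time the same fixed, single-process workload on each node before and after a reboot, so that MPI and MKL are taken out of the picture. The following is only a rough sketch: the function name `time_fixed_workload` and the loop it times are made up for illustration and are not part of the original calculation.

```python
import socket
import time


def time_fixed_workload(n=200_000, repeats=3):
    """Time a fixed floating-point loop; return (best seconds, checksum).

    Hypothetical helper: the workload is arbitrary and only serves as a
    constant amount of work to compare between nodes.
    """
    best = float("inf")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        total = 0.0
        for i in range(1, n + 1):
            total += 1.0 / (i * i)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, total


if __name__ == "__main__":
    seconds, checksum = time_fixed_workload()
    # Print the host name so results from several nodes can be compared.
    print(f"{socket.gethostname()}: {seconds:.4f} s (checksum {checksum:.6f})")
```

If the slow node is also slow on this loop, the problem is probably below the MPI level; if not, the MPI or library layers would be the next suspects.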

Unwanted output

I have a cluster with some E5410 and some E5-2660 nodes, all connected by InfiniBand and running Intel MPI. Everything is working, but the E5410 nodes are producing a lot of unwanted output of the following form (condensed, as there is one entry for every core):

node04.cluster:723a:f24164b0: 1094 us(1094 us): open_hca: device mlx4_0 not found

node04.cluster:723a:f24164b0: 28485 us(28485 us): open_hca: getaddr_netdev ERROR: No such file or directory. Is ib0 configured?
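
Until the cause is found, the repeated lines could at least be condensed to one entry per host and message. The sketch below assumes that every such line has the same layout as the two shown above; the name `summarize_open_hca` and the regular expression are illustrative only, and this does nothing about the underlying device problem.

```python
import re
from collections import Counter

# Pattern derived from the two sample lines above (an assumption about
# the general format): host:hex:hex: N us(N us): open_hca: message
OPEN_HCA = re.compile(
    r"^(?P<host>[^:\s]+):[0-9a-f]+:[0-9a-f]+:\s+\d+ us\(\d+ us\):\s+"
    r"open_hca:\s+(?P<msg>.+)$"
)


def summarize_open_hca(lines):
    """Count open_hca messages per (host, message); pass other lines through."""
    counts = Counter()
    other = []
    for line in lines:
        match = OPEN_HCA.match(line.strip())
        if match:
            counts[(match.group("host"), match.group("msg"))] += 1
        else:
            other.append(line)
    return counts, other
```

Feeding the job's captured output through `summarize_open_hca` gives a count for each distinct message and leaves the program's real output in the second return value.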

mpi password authorization using system account

I'm setting up a Win Server 2008 HPC cluster. I want to be able to have any engineer use the cluster without having to change the account and password every time a different engineer uses the cluster.

I created a system account in Active Directory, gave that account domain admin privileges, and used the wmpiregister tool to register the account and password on the head node and all compute nodes.

At run time I'm getting job failure due to "password authentication failure for MPI during the launch attempt." How can I get MPI to accept a domain system account and password?

MPI Internal Error: invalid error code 489e0e (Ring ids do not match)

I have an MPI code that works fine on my Windows machine (VS2010). It has one master process that has accepted, via MPI_COMM_ACCEPT, a connection to another job that is running two MPI processes. This setup also works when the process runs on my Intel cluster node, as long as the accepted job has only one process. But when I try two, I get the message:

Bug of Intel MPI?

Dear all,

I am trying to run my program on a cluster with 10 nodes; every node runs 64-bit Windows 7 and Intel MPI 4.1.

I run my program with

mpiexec -n 12 test

or

mpiexec -wdir \\n01\mytest\ -hosts 10 n01 12 n02 12 n03 12 n04 12 n05 12 n06 12 n07 12 n08 12 n09 12 n10 12 \\n01\mytest\test

When ONLY ONE Build Environment window is open, both command lines work. However, when two Build Environment windows are open, in one window the first command line still works but the second one fails with the following error message:

Redistributing MPI runtime components

Hello,

We have a program in a software suite that has been built using the mpif90 compiler, and we need to redistribute the necessary runtime environment components with it. There will be separate 32-bit and 64-bit programs, so what to include with each seems a little hairy; we just wanted to include the MPI Library Runtime installer with our installer (which runs InstallShield). Is this OK (read 'legal')? If not, what's the best way to accomplish what we want, given that there seem to be various installations for different architectures?

MPI-IO error when running on lustre with a high number of stripes and processes

Hi,

I'm trying to run pNetCDF on Lustre. The test code and the pNetCDF library are both compiled with Intel MPI Library v4.0.2. Our Lustre file system has 40 OSTs.

When running with stripes = 1 or processes = 32, the test code works well and outputs data correctly.

However, when I set stripes = 40 and run with 64 processes, the test code crashes with:

  rank 19 in job 1 c25b09_39645 caused collective abort of all ranks
      exit status of rank 19: killed by signal 9

The test code is attached. Thank you in advance.

Vast unwanted output from mpirun task

I am getting far too many lines of what seems to be debug output from a relatively simple mpirun with default options, and I cannot seem to get rid of it. This is not a problem with the code, which still runs fine. I assume that some environment variable or similar setting is turned on for some reason. Any suggestions?

Parts of the output which appear to be relevant are:

mpiexec options:
----------------
Base path: /opt/intel/impi/4.1.0.024/intel64/bin/
Launcher: ssh
Debug level: 1
Enable X: -1

...
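
A quick first check might be to list any Intel MPI debug settings present in the environment the job is launched from. The sketch below assumes the output is triggered by a variable with the `I_MPI_` prefix and `DEBUG` in its name (Intel MPI documents `I_MPI_DEBUG` and `I_MPI_HYDRA_DEBUG`, for example); whether one of these is actually responsible here is not confirmed. The function name `find_debug_settings` is made up for illustration.

```python
import os


def find_debug_settings(environ=None):
    """Return I_MPI_* variables whose name contains DEBUG, sorted by name."""
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(env.items())
        if name.startswith("I_MPI_") and "DEBUG" in name
    }


if __name__ == "__main__":
    for name, value in find_debug_settings().items():
        print(f"{name}={value}")
```

If this prints nothing, the setting may come from somewhere else, such as a wrapper script or a configuration file, which this sketch does not inspect.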
