Intel® Clusters and HPC Technology

MPI_Comm_dup may hang with Intel MPI 4.1

Hi,

The attached program simple_repro.c reproduces what I believe is a bug in the Intel MPI implementation version 4.1.

In short, it spawns <num_threads> threads on each of 2 processes, such that thread i on rank 0 communicates with thread i on rank 1 over their own private communicator. The only difference between the two processes is that the threads on rank 0 are coordinated with a semaphore, so they cannot all be active at the same time; the threads on rank 1 run freely.
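
Since the attachment is not reproduced here, the following is a minimal sketch of the pattern described, not the attached simple_repro.c itself: each thread gets its own communicator duplicated from MPI_COMM_WORLD, and thread i on rank 0 exchanges a message with thread i on rank 1 (the semaphore throttling on rank 0 is omitted).

/* Minimal sketch (not the attached simple_repro.c): one private
 * communicator per thread, duplicated from MPI_COMM_WORLD, then used
 * for a rank-0 <-> rank-1 exchange between threads of the same index. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static MPI_Comm comms[NUM_THREADS];
static int rank;

static void *worker(void *arg)
{
    int i = (int)(size_t)arg;
    int peer = (rank == 0) ? 1 : 0;
    int msg = rank, got = -1;

    /* Thread i on rank 0 talks only to thread i on rank 1,
     * over the communicator reserved for that thread pair. */
    MPI_Sendrecv(&msg, 1, MPI_INT, peer, i,
                 &got, 1, MPI_INT, peer, i,
                 comms[i], MPI_STATUS_IGNORE);
    printf("rank %d thread %d received %d\n", rank, i, got);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, size;
    pthread_t tid[NUM_THREADS];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (provided < MPI_THREAD_MULTIPLE || size != 2)
        MPI_Abort(MPI_COMM_WORLD, 1);  /* needs MPI_THREAD_MULTIPLE and exactly 2 ranks */

    /* Duplicate the communicators serially, one per thread pair. */
    for (int i = 0; i < NUM_THREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(size_t)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < NUM_THREADS; i++)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}

The MPI_Comm_dup calls in this sketch are issued serially on the main thread so that the duplications match in the same order on both ranks.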

HPC and Magic of OpenMP thread Affinity Management: Compare performance when it is Not Used and Used...

HPC and Magic of OpenMP thread Affinity Management: Compare performance of matrix multiply when Thread Affinity is Not Used and Used...

Two screenshots are attached for your review.
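
The screenshots themselves are not reproduced here. As a rough illustration of the kind of experiment being compared, the sketch below (an assumption, not the code behind the screenshots) times a plain OpenMP matrix multiply; it can be run once with affinity set through the environment (for example KMP_AFFINITY=compact or OMP_PROC_BIND=close) and once without.

/* Simple OpenMP matrix multiply; run it twice, once with thread
 * affinity set in the environment and once without, and compare
 * the elapsed time. */
#include <omp.h>
#include <stdio.h>

#define N 1024

int main(void)
{
    static double a[N][N], b[N][N], c[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0; b[i][j] = 2.0; c[i][j] = 0.0;
        }

    double t0 = omp_get_wtime();
#pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
    double t1 = omp_get_wtime();

    printf("threads=%d time=%.3f s c[0][0]=%g\n",
           omp_get_max_threads(), t1 - t0, c[0][0]);
    return 0;
}

Pinning the threads mainly pays off by preventing the OS from migrating them between cores, so each thread keeps its slice of the matrices in the same caches for the whole run.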

 

Any good tools/methods to debug MPI-based programs?

Dear all,

I have an MPI-based Fortran code that runs with one or two processes; however, when I launch the program with more processes, for example 4, it crashes with the following message:

forrtl: severe (157): Program Exception - access violation
forrtl: severe (157): Program Exception - access violation

job aborted:
rank: node: exit code[: error message]
0: N01: 123
1: N01: 123
2: n02: 157: process 2 exited without calling finalize
3: n02: 157: process 3 exited without calling finalize
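
One common low-tech technique, sketched below in C (the same idea works from Fortran), is to have each rank print its host name and PID right after MPI_Init and then wait, so that a debugger such as gdb can be attached to the rank that triggers the access violation. The helper name wait_for_debugger is illustrative.

/* Debug helper: each rank reports where it is running and then spins
 * until a debugger attaches and clears "hold" (e.g. in gdb:
 * "set var hold = 0"). Call this right after MPI_Init. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

static void wait_for_debugger(void)
{
    volatile int hold = 1;
    char host[MPI_MAX_PROCESSOR_NAME];
    int rank, len;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    printf("rank %d is PID %d on %s, waiting for debugger...\n",
           rank, (int)getpid(), host);
    fflush(stdout);
    while (hold)
        sleep(1);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    wait_for_debugger();   /* attach gdb to the suspect rank here */
    /* ... original program ... */
    MPI_Finalize();
    return 0;
}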

 

Mpirun is treating -perhost, -ppn, -grr the same: always round-robin

Our cluster has 2 Haswell sockets per node, each with 12 cores (24 cores/node).

Using: intel/15.1.133, impi/5.0.3.048

Irrespective of which of the options mentioned in the subject line is used, the ranks are always placed in round-robin fashion. The commands are run in a batch job that generates a host file containing lines like the ones shown below when the job is submitted with:

qsub -l nodes=2:ppn=1 ...

 

tfe02.% cat hostfile
t0728
t0731
tfe02.%
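
A small probe like the following sketch (the binary name placement_probe and the example mpirun line in the comment are illustrative) prints which host each rank actually lands on, which makes it easy to see whether -perhost, -ppn or -grr changes the placement.

/* Placement probe: print which host each rank lands on, e.g.
 *   mpirun -perhost 12 -n 24 ./placement_probe
 * and compare against -ppn / -grr runs. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}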

Assertion failed in file ../../dapl_conn_rc.c at line 914: vc->pg

Dear all,

We are running OpenFOAM (pisoFoam, up to 130 million cells) with Intel MPI 4.1 on up to 480 processes. In general it works fine, but from time to time we see errors like the following:

Assertion failed in file ../../dapl_conn_rc.c at line 914: vc->pg
internal ABORT - process 398
[0:scc6n001] unexpected disconnect completion event from [398:scc6n173]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 0

Do you have any idea what happened here? I really appreciate any help you can provide.

Source location in Trace Analyzer in applications statically linked with Intel® MPI Library

I want to analyze an application that is compiled and run with the following command lines:

$ mpiicc -static_mpi -trace -g myApp.c -o myApp
$ mpirun -n 4 ./myApp
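
For reference, here is a hypothetical stand-in for myApp.c (not the poster's code) with a few MPI call sites whose source locations the trace can resolve when the application is built with -g as above.

/* Hypothetical stand-in for myApp.c: a handful of MPI calls whose
 * file/line locations should appear in the trace when source
 * information is recorded. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    value = rank;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);                /* call site 1 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); /* call site 2 */

    if (rank == 0)
        printf("broadcast value %d, rank sum %d over %d ranks\n", value, sum, size);
    MPI_Finalize();
    return 0;
}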

Additionally, I record the source location of my function calls by setting the environment variable VT_PCTRACE with the following command

cannot open source file "mpi.h"

 

Dear all

I am trying to run this makefile, but I am getting the following error:

catastrophic error: cannot open source file "mpi.h"
  #include <mpi.h>

I am sure I have a problem with the makefile, but my knowledge of Linux is low. Thanks in advance for your help.

export HDF5_INC_DIR=$/usr/local/hdf5/include

export HDF5_LIB_DIR=$/usr/local/hdf5/lib

export NETCDF_INC_DIR=$/usr/local/netcdf4/include

export NETCDF_LIB_DIR=$/usr/local/netcdf4/lib /usr/local/netcdf4-fortran/lib

export $MPI_INC_DIR=$/opt/intel/impi/5.1.1.109/bin64
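
Independent of the makefile, a one-file check such as the sketch below (the file name hello_mpi.c is illustrative) shows whether the compiler wrapper can find mpi.h on its own; if mpiicc hello_mpi.c -o hello_mpi compiles, the include-path problem lies in the variables above rather than in the Intel MPI installation.

/* Hypothetical hello_mpi.c: if "mpiicc hello_mpi.c -o hello_mpi"
 * compiles and runs, the wrapper can find mpi.h by itself and the
 * makefile's include paths are the place to look. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}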

Redistributable info, missing component

Hi,

I'm looking at including Intel MPI as part of our software package, so that the end user does not have to install the MPI components on their system.
We will of course include this in the third-party EULA of our software.

However:

  - The "redist.txt" file in the Intel MPI folder lists the files that are OK to include in our package, but the file bin/pmi_proxy.exe seems to be missing from the list. It is required to run local computations (-localonly).
