Intel® Clusters and HPC Technology

Any good tools/methods to debug an MPI-based program?

Dear all,

I have an MPI-based Fortran code that runs fine with one or two processes; however, when I launch the program with more processes, for example 4, it crashes with the following message:

forrtl: severe (157): Program Exception - access violation
forrtl: severe (157): Program Exception - access violation

job aborted:
rank: node: exit code[: error message]
0: N01: 123
1: N01: 123
2: n02: 157: process 2 exited without calling finalize
3: n02: 157: process 3 exited without calling finalize
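A common first step is to rebuild with runtime checking and ask Intel MPI for more diagnostics before reaching for a full parallel debugger. A minimal sketch, assuming the code is built with mpiifort and that the Intel Trace Analyzer and Collector is installed so -check_mpi is available (the source file name mycode.f90 is a placeholder):

$ mpiifort -g -O0 -traceback -check all mycode.f90 -o mycode     # bounds/uninitialized checks plus symbolic tracebacks
$ mpirun -check_mpi -genv I_MPI_DEBUG 5 -n 4 ./mycode            # MPI correctness checking and verbose startup output

With -traceback and -check all, the bare "access violation" usually turns into a source line and an array-bounds report, which often points at a decomposition that only breaks once the per-rank array sizes shrink at higher process counts.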

 

Mpirun is treating -perhost, -ppn, -grr the same: always round-robin

Our cluster has 2 Haswell sockets per node, each with 12 cores (24 cores/node).

Using: intel/15.1.133, impi/5.0.3.048

Irrespective of which of the options mentioned in the subject line is used, ranks are always placed in round-robin fashion. The commands are run in a batch job that generates a host file containing lines like the following when submitted with:

qsub -l nodes=2:ppn=1 ...

 

tfe02.% cat hostfile
t0728
t0731
tfe02.%
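One hedged reading of this behaviour: with nodes=2:ppn=1 the scheduler hands Intel MPI a host file with a single slot per node, and Intel MPI 5.x by default follows the scheduler-supplied placement and ignores -perhost/-ppn/-grr. Two things to try, sketched with a placeholder executable name (the environment variable is documented in the Intel MPI 5.x reference):

# Ask the scheduler for the real core count so the generated host file matches the hardware
qsub -l nodes=2:ppn=24 ...

# Or tell Intel MPI not to let the scheduler placement override -ppn/-perhost
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
mpirun -ppn 12 -n 24 ./app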

Assertion failed in file ../../dapl_conn_rc.c at line 914: vc->pg

Dear all,

We are running OpenFOAM (pisoFoam, up to 130 million cells) with Intel MPI 4.1 on up to 480 processes. In general it works fine, but from time to time we see errors like the following:

Assertion failed in file ../../dapl_conn_rc.c at line 914: vc->pg
internal ABORT - process 398
[0:scc6n001] unexpected disconnect completion event from [398:scc6n173]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 0

Do you have any idea what happened here? I really appreciate any help you can provide.
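One way to narrow this down, sketched on the assumption that the failures sit in the DAPL reliable-connection path rather than in pisoFoam itself: turn on startup diagnostics and vary the fabric selection, then see whether the assertion moves or disappears.

export I_MPI_DEBUG=5                # print fabric and connection details at startup
export I_MPI_FABRICS=shm:tcp        # test run over TCP to rule the DAPL path in or out
# export I_MPI_FABRICS=shm:dapl     # or stay on DAPL but switch to the connectionless
# export I_MPI_DAPL_UD=enable       #   UD transport, which avoids per-pair RC connections
mpirun -n 480 pisoFoam -parallel

If the TCP run is stable, the problem is more likely in the DAPL provider or fabric than in the application.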

Source location in Trace Analyzer in applications statically linked with Intel® MPI Library

I want to analyze an application that is compiled and run with the following commands:

$ mpiicc -static_mpi -trace -g myApp.c -o myApp
$ mpirun -n 4 ./myApp

Additionally, I record the source location of my function calls by setting the environment variable VT_PCTRACE with the following command:
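The post is cut off at this point. A typical way to set it, assuming the goal is source-location resolution with a modest call-stack unwinding depth (the value 5 is only an example), would be:

$ export VT_PCTRACE=5     # record call stacks up to 5 levels so Trace Analyzer can show source locations
$ mpirun -n 4 ./myApp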

cannot open source file "mpi.h"

 

Dear all

I am trying to run this makefile, but I get the error "catastrophic error: cannot open source file "mpi.h"" at the line #include <mpi.h>. I am sure the problem is with the makefile (the export lines below), but my Linux knowledge is limited. Thanks in advance for your help.

export HDF5_INC_DIR=$/usr/local/hdf5/include

export HDF5_LIB_DIR=$/usr/local/hdf5/lib

export NETCDF_INC_DIR=$/usr/local/netcdf4/include

export NETCDF_LIB_DIR=$/usr/local/netcdf4/lib /usr/local/netcdf4-fortran/lib

export $MPI_INC_DIR=$/opt/intel/impi/5.1.1.109/bin64
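For reference, a corrected sketch of these exports, assuming the install paths in the post are otherwise right: the stray $ in front of each value (and in front of MPI_INC_DIR itself) makes the shell expand an empty variable, and the Intel MPI headers normally live under intel64/include rather than bin64 (check your installation). Compiling with the mpiicc/mpiifort wrappers sidesteps the MPI include path entirely.

export HDF5_INC_DIR=/usr/local/hdf5/include
export HDF5_LIB_DIR=/usr/local/hdf5/lib
export NETCDF_INC_DIR=/usr/local/netcdf4/include
export NETCDF_LIB_DIR="/usr/local/netcdf4/lib /usr/local/netcdf4-fortran/lib"
export MPI_INC_DIR=/opt/intel/impi/5.1.1.109/intel64/include   # not .../bin64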

Redistributable info, missing component

Hi,

I'm looking at including Intel MPI as part of our software package, so that end users do not have to install the MPI components on their systems.
We will, of course, cover this in the third-party EULA of our software.

However:

  - The "redist.txt" file in the Intel MPI folder lists the files that are OK to include in our package, but bin/pmi_proxy.exe seems to be missing from the list. It is required to run local computations (-localonly).

Unusual requirement?

Dear All,

We are involved in setting up an HPC cluster with about 25 Dell PowerEdge 720 servers, each equipped with 172 GB of RAM and 24 Intel cores running at 2.4 GHz. Every node is connected to a Gigabit Ethernet switch and to a 56 Gbps Mellanox InfiniBand switch that provides storage access.
