Intel® Cluster Ready

Intel MPI Library Troubleshooting Guide

The latest versions of the Intel MPI Library User's Guides include an expanded Troubleshooting section. It provides the following information:

  • General Intel MPI Library troubleshooting practices
  • Typical MPI failures with corresponding output messages and behavior when a failure occurs
  • Recommendations on potential root causes and solutions

Here are direct links to these new sections:

Profiling an MPI application with VTune

Hi, folks

I'd like to profile my MPI application with VTune.

In order to see the inter-node behavior, I need to use the '-gtool' option to aggregate the profiling results into one file.

1) When I run the application without profiling, the following command works perfectly:

  • $ mpiexec.hydra -genvall -n 8 -machinefile /home/my_name/machines ARGS1 ARGS2 ...

2) The following command also does the job (running multiple MPI processes on a single machine), and I can see their aggregated results.

Using InfiniBand network fabrics to allocate globally shared memory for processes on different nodes

Dear Colleagues,

My MPI program implements a globally shared memory for processes on multiple nodes (hosts) using the MPI_Win_allocate_shared and MPI_Comm_split_type function calls. Unfortunately, the allocated address space is not actually shared between processes on different nodes. I'm wondering what would happen if I ran my MPI program on a cluster with an InfiniBand network and changed the network fabrics to I_MPI_FABRICS=shm:dapl or something similar. Could this be a solution to the problem?

Thanks in advance.

Cheers, Arthur.

Intel OpenMPI problem with mpirun

$ mpirun --prefix /opt/openmpi/1.10.0-intel/ -np 2 --host atl4-06,atl4-07 ./a.out
/opt/openmpi/1.10.0-intel/bin/orted: error while loading shared libraries: cannot open shared object file: No such file or directory
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH

Hybrid OpenMP+MPI: How to synchronize access to an MPI shared RMA window by multiple OpenMP threads?

Dear Colleagues,

I've developed parallel code that uses hybrid OpenMP + MPI parallelization and allocates a shared RMA window with the MPI_Win_allocate_shared function to store data shared among multiple OpenMP threads running within several MPI processes.

Is there a way to synchronize access to the shared RMA window among the OpenMP threads running within multiple MPI processes?

Is there an established pattern for hybrid OpenMP + MPI programming?

Thanks in advance for your replies.

Cheers, Arthur.

MPI_FILE_SET_VIEW produces a seg fault in Windows 10

I have a large CFD code that uses a parallel MPI write routine. The code compiles and runs on our Windows 7 machines (Intel Fortran 16 and Intel MPI 5.1.2), but fails under Windows 10. The failure always occurs in the call to MPI_FILE_SET_VIEW. I wrote a short program to demonstrate the problem: it runs on a Windows 7 machine but fails under Windows 10, regardless of which platform it is compiled on.

execvp error on file, The requested operation requires elevation.

This is Tansel. I am not a direct MPI user. Someone on my team created an MPI application and a single batch file that runs his executable through mpiexec.

The problem is that I need to integrate this into an automated system (written in Java) that calls this batch file at some point. When it does, I get the following error (the batch file launches the software on its first line; the lines marked [0+] are errors):
(The program runs fine if I run the batch file by double-clicking it or from any command line.)

My MPI program doesn't work (hangs) when you launch processes on different nodes (hosts)

My MPI program hangs when I launch processes on different nodes (hosts). In my program I use the MPI_Win_allocate_shared function to allocate shared memory through an RMA window. What could be causing the hang? Do I actually need to use intercommunicators for this? Here's the code:
