Intel® Cluster Ready

Intel MPI Library Troubleshooting Guide

The latest versions of the Intel MPI Library User's Guides have added an expanded Troubleshooting section.  It provides the following information:

  • General Intel MPI Library troubleshooting practices
  • Typical MPI failures with corresponding output messages and behavior when a failure occurs
  • Recommendations on potential root causes and solutions

Here are direct links to these new sections:

How to use Intel® Cluster Checker v3 when the cluster head node has an Intel® Xeon Phi coprocessor installed

If your cluster uses a head node with one or more Intel® Xeon Phi™ coprocessors, a couple more steps have to be taken in order to collect all information.

First, execute the clck-collect command using your configuration file and node list.

clck-collect -a -c <your configuration file> -f <your node list>

Then create a second node list containing just the name of the head node, without any additional "role:" tag.
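
As a rough sketch of that step, the snippet below writes a one-line node list from a head node entry. It assumes the entry looks like a host name followed by a trailing "# role: ..." annotation; that annotation syntax, the helper name write_head_nodelist, the host name "headnode" and the file name headnode.list are all illustrative assumptions, so adjust them to match your own node list.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel Cluster Checker):
# write_head_nodelist ENTRY FILE
#   ENTRY: a node list line such as "headnode # role: head" (assumed format)
#   FILE:  the node list file to create, holding only the bare host name
write_head_nodelist() {
    # Drop everything from the first '#' onward, then trim trailing blanks.
    name=$(printf '%s\n' "$1" | sed -e 's/#.*$//' -e 's/[[:space:]]*$//')
    printf '%s\n' "$name" > "$2"
}

# Example: "headnode" and "headnode.list" are placeholder names.
write_head_nodelist "headnode # role: head" headnode.list
cat headnode.list
```

The resulting file can then be passed wherever a node list is expected, in the same way as the -f argument in the command above.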

Now you can collect the additional information by executing

  • No Cost Options for Intel Parallel Studio XE, Support yourself, Royalty-Free

    Intel® Parallel Studio XE is a very popular product from Intel that includes the Intel® Compilers, Intel® Performance Libraries, tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library. Did you know that some of these are available for free?

    Here is a guide to “what is available free” from the Intel Parallel Studio XE suites.

    Intel® Parallel Studio XE 2016: High Performance for HPC Applications and Big Data Analytics

    Intel® Parallel Studio XE 2016, launched on August 25, 2015, is the latest installment in our developer toolkit for high performance computing (HPC) and technical computing applications. This suite of compilers, libraries, debugging facilities, and analysis tools targets Intel® architecture, including support for the latest Intel® Xeon® processors (codenamed Skylake) and Intel® Xeon Phi™ processors (codenamed Knights Landing).

    Need help making sense of NBC performance (MPI3)

    Hello everyone,

    I am fairly new to parallel computing, but am working on a legacy code that uses real-space domain decomposition for electronic structure calculations. I have spent a while modernizing the main computational kernel to hybrid MPI+OpenMP and upgraded the communication pattern to use a nonblocking neighborhood alltoallv for the halo exchange and a nonblocking allreduce for the other communication in the kernel. I have now started to focus on "communication hiding", so that the calculations and communication happen alongside each other.

    Can each thread on Xeon Phi be given private data areas in the offload model

    Hi,

    I want to calculate a Jacobian matrix, which is a sum of 960 (to keep it simple) 3x3 matrices, by distributing the calculations of these 3x3 matrices to a Xeon Phi card. The calculation of the 3x3 matrices uses a third-party library whose subroutines use an integer vector not only for the storage of parameter values but also to write and read intermediate results. It is therefore necessary for each task to have this integer vector protected from other tasks. Can this be obtained on the physical core level or even for each thread (each Xeon Phi has 60x4=240 threads)?

    mpirun with bad hostname hangs with [ssh] <defunct> until Enter is pressed

    We have been experiencing hangs with our MPI-based application and our investigation led us to observing the following behaviour of mpirun:

    mpirun -n 1 -host <good_hostname> hostname works as expected

    mpirun -n 1 -host <bad_hostname> hostname hangs, during which ps shows: 

    Varying Intel MPI results using different topologies

    Hello,

    I am compiling and running a massive electronic structure program on an NSF supercomputer.  I am compiling with the intel/15.0.2 Fortran compiler and impi/5.0.2, the latest-installed Intel MPI library.

    The program has hybrid parallelization (MPI and OpenMP).  When I run the program on a molecule using 4 MPI tasks on a single node (no OpenMP threading anywhere here), I obtain the correct result.

    However, when I spread out the 4 tasks on 2 nodes (still 4 total tasks, just 2 on each node), I get what seem to be numerical-/precision-related errors.

    Debugging Fortran MPI codes in VS2012 and Intel MPI

    Hi,

    Before this I was using VS2008 with ifort 11 and MPICH.

    I followed the 1st method (attaching to a currently running process (one VS window for all selected MPI processes)) from:

    http://wiki.rac.manchester.ac.uk/community/MPI/VisualStudio_mpich2_howto

    It worked but failed for np >= 4. Seems to be an MPICH problem.

    However, using the new setup, I can't get it to work, even with np = 1 or 2. Error is:
