Using the Intel® MPI Library on a Heterogeneous Network

NOTICE!

This is an unsupported method of using the Intel® MPI Library.  There are no guarantees that this will work.  This article is intended to provide hints that could help run an MPI program in a heterogeneous network.

Introduction

MPI programs are classically intended to run on a (mostly) homogeneous cluster.  However, there are times when a heterogeneous network is used instead, whether for testing purposes, because of limited resource availability, or simply because the application is designed that way.  The term network is used here, rather than cluster, because a heterogeneous network can be as simple as two PCs connected to the same LAN that are visible to each other, or as complex as thousands of compute nodes with different OS versions.  The nodes should still all use the same OS; mixing Windows* and Linux* is very unlikely to work.  Using MPI on a heterogeneous network presents several difficulties that must be addressed.

Authentication

In order to run on a remote computer, there must be an authentication step to determine if the user has appropriate permissions.  The exact authentication methods vary by system.  Typically, on a Linux*-based system, MPI programs authenticate using ssh or another user-specified remote shell protocol (rsh is a common alternative).  On Windows*-based systems, MPI uses Remote Desktop Protocol and either a password (entered at runtime or encoded into the registry) or domain authentication.  In a heterogeneous network, the authentication method should be identical across all of the systems.
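
On Linux* systems that authenticate with ssh, it can help to confirm ahead of time that every system accepts a non-interactive login.  The following is a minimal sketch of such a check, not part of the Intel® MPI Library.  It assumes that the OpenSSH client is installed and that key-based login is already configured; the host names are hypothetical.

```python
import subprocess


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if a password is needed
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        host,
        "true",                               # trivial remote command
    ]


def can_login(host):
    """Return True if a non-interactive ssh login to the host succeeds."""
    try:
        result = subprocess.run(ssh_check_command(host), capture_output=True)
    except FileNotFoundError:
        # No ssh client was found on this system.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical host names; replace with the systems in your network.
    for host in ["node-a", "node-b"]:
        print(host, "ok" if can_login(host) else "FAILED")
```

A host that reports FAILED here would most likely also fail when mpiexec tries to start processes on it, so it is worth resolving before launching the job.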

Libraries

The MPI libraries should be identical across the systems.  While this is not an absolute requirement, mismatched MPI libraries may not be able to communicate with each other correctly.  This could lead to program crashes or incorrect results.

Environment Variables

The Intel® MPI Library uses several environment variables when running. The primary variable to consider here is I_MPI_ROOT, which is used to determine the location of MPI executables and libraries. If the Intel® MPI Library installation location is the same on each system, then this is not a factor. However, if it is installed in a different location on each computer (for example, on a 32-bit Windows* computer, the default installation location is "C:\Program Files\Intel\MPI\" but on a 64-bit Windows* computer, the default location is "C:\Program Files (x86)\Intel\MPI\"), then this is a concern.

Normally, when running a program using mpiexec, the environment variables on the host computer will be transferred to each process in the job. However, if I_MPI_ROOT is different on a target system, the host value will override the target value, likely causing an error.  There are several ways around this problem.  The flag -genvnone will prevent any environment variables from being transferred to any of the processes.  If your program does not use environment variables in any way, this should be sufficient.  However, if you do use environment variables, you will need to selectively send environment variables.  There are several options to mpiexec that can be used for this purpose:

-env <ENVVAR> <value>    Set <ENVVAR> to <value>; use this to explicitly set I_MPI_ROOT
-envlist <list of vars>  Only send variables in the list
-envexcl <list of vars>  Do not send variables in the list
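
As an illustration of how these options might be combined, the sketch below builds an mpiexec command line with one argument set per system (separated by ":"), each setting its own I_MPI_ROOT with -env and forwarding only selected variables with -envlist.  The host names, installation paths, application name, and the helper function itself are all hypothetical.  How these options interact can differ between library versions, so check the result against the reference for your installation before relying on it.

```python
import shlex


def build_mpiexec_command(app, hosts, forward_vars=()):
    """Return an mpiexec argument list with one argument set per host.

    hosts:        list of (hostname, i_mpi_root, nprocs) tuples
    forward_vars: names of environment variables to send to every process
    """
    if not hosts:
        raise ValueError("at least one host is required")
    cmd = ["mpiexec"]
    for index, (hostname, i_mpi_root, nprocs) in enumerate(hosts):
        if index > 0:
            cmd.append(":")  # separates argument sets
        # Set I_MPI_ROOT explicitly for the processes on this host.
        cmd += ["-host", hostname, "-env", "I_MPI_ROOT", i_mpi_root]
        if forward_vars:
            # Only send the listed variables (comma-separated).
            cmd += ["-envlist", ",".join(forward_vars)]
        cmd += ["-n", str(nprocs), app]
    return cmd


if __name__ == "__main__":
    # Hypothetical hosts and installation locations.
    hosts = [
        ("node-a", "/opt/intel/mpi", 2),
        ("node-b", "/usr/local/intel/mpi", 4),
    ]
    command = build_mpiexec_command("./my_app", hosts, ["OMP_NUM_THREADS"])
    print(shlex.join(command))
```

Printing the command instead of running it makes it easy to inspect what each system will receive before the job is launched.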

File System

While this is also a concern on homogeneous clusters, it can be more of a concern on a heterogeneous network, so it is mentioned here as a reminder.  If the application's binary file is not in the same location on each system, the path will need to be explicitly specified for MPI.  When launching an application, MPI will attempt to change to the working directory first.  If not explicitly specified, this will be the current working directory on the host computer.  As such, it must exist on the remote system, in the same location.  If this path does not exist, a different directory will need to be specified, using

-wdir <working directory>


as an argument to mpiexec.  Finally, any files that the application will need (shared libraries, input/output files, etc.) will need to be in a location that the application can find and access.
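
These file system requirements can be checked on each system before launching.  The sketch below is one possible pre-flight check, intended to be run locally on every system in the network.  The function name and the command-line layout are hypothetical and are not part of the Intel® MPI Library.

```python
import os
import sys


def find_missing(workdir, binary, data_files=()):
    """Return a list of problems with the paths an MPI job will need."""
    problems = []
    if not os.path.isdir(workdir):
        problems.append(f"working directory not found: {workdir}")
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        problems.append(f"binary missing or not executable: {binary}")
    for path in data_files:
        if not os.access(path, os.R_OK):
            problems.append(f"file not readable: {path}")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical): python check_paths.py <workdir> <binary> [files...]
    if len(sys.argv) < 3:
        sys.exit("usage: check_paths.py <workdir> <binary> [files...]")
    issues = find_missing(sys.argv[1], sys.argv[2], sys.argv[3:])
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

If the working directory is reported missing on some system, either create it there or choose a directory that exists everywhere and pass it with -wdir.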
