Intel® MPI Library supports the majority of commonly used job schedulers in the HPC field.
The following job schedulers are supported on Linux* OS:
Altair* PBS Pro*
Torque*
OpenPBS*
IBM* Platform LSF*
Parallelnavi NQS*
SLURM*
Univa* Grid Engine*
The Hydra process manager detects job schedulers automatically by checking specific environment variables. These variables are used to determine how many nodes were allocated, which nodes, and the number of processes per task.
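The detection idea described above can be sketched in a few lines of shell. This is only an illustration, not Hydra's own logic: the function name `detect_scheduler` is hypothetical, and the variable names (`PBS_ENVIRONMENT`, `LSB_MCPU_HOSTS`, `SLURM_JOBID`/`SLURM_JOB_ID`, `PE_HOSTFILE`) are assumed to be the ones these schedulers conventionally export.

```shell
#!/bin/sh
# Guess the active job scheduler from environment variables.
# ASSUMPTION: the variable names below are the ones the schedulers
# conventionally export; this is not Hydra's actual detection code.
detect_scheduler() {
    if [ -n "${PBS_ENVIRONMENT:-}" ]; then
        echo "pbs"
    elif [ -n "${LSB_MCPU_HOSTS:-}" ]; then
        echo "lsf"
    elif [ -n "${SLURM_JOBID:-}${SLURM_JOB_ID:-}" ]; then
        echo "slurm"
    elif [ -n "${PE_HOSTFILE:-}" ]; then
        echo "sge"
    else
        echo "none"
    fi
}

echo "Detected scheduler: $(detect_scheduler)"
```

Running the snippet inside and outside a batch job is a quick way to check which of these variables your site actually sets.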
Altair* PBS Pro*, TORQUE*, and OpenPBS*
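If you use one of these job schedulers and the job environment set by the scheduler is detected, the node file provided by the scheduler is used as a machine file for mpirun. You do not need to specify the machine file explicitly.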
An example of a batch job script may look as follows:
#PBS -l nodes=4:ppn=4
#PBS -q queue_name
mpirun -n 16 ./myprog
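To avoid hard-coding the process count in a script like the one above, the count can be derived from the node file. The sketch below assumes that the node file path is exported in `PBS_NODEFILE` and that the file lists one host name per slot (the Torque-style layout). `count_slots` is a hypothetical helper name.

```shell
#!/bin/sh
# Count MPI slots from a PBS-style node file.
# ASSUMPTIONS: PBS_NODEFILE holds the node file path, and the file
# contains one host name per slot (a host with 4 slots appears 4 times).
count_slots() {
    # grep -c . counts the non-empty lines of the file
    grep -c . "$1"
}

# Dry run: only print the command, and only inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "mpirun -n $(count_slots "$PBS_NODEFILE") ./myprog"
fi
```

Outside a PBS job the snippet prints nothing, so it is safe to try on a login node.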
IBM* Platform LSF*
The IBM* Platform LSF* job scheduler is detected automatically if the corresponding LSF environment variables are set.
The Hydra process manager uses these variables to determine how many nodes were allocated, which nodes, and the number of processes per task. To run processes on the remote nodes, the Hydra process manager by default uses the remote launch utility provided by IBM* Platform LSF*.
The number of processes, the number of processes per node, and node names may be overridden by the usual Hydra options (-n, -ppn, -hosts).
bsub -n 16 mpirun ./myprog
bsub -n 16 mpirun -n 2 -ppn 1 ./myprog
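As an illustration of how an allocation can be read from the environment, the sketch below sums the slot counts in a string of the form "host1 n1 host2 n2". It assumes that LSF exports the allocation in `LSB_MCPU_HOSTS` in that alternating format. `lsf_total_slots` is a hypothetical helper name and is not part of Intel® MPI Library or LSF.

```shell
#!/bin/sh
# Sum the slot counts in an LSB_MCPU_HOSTS-style string:
#   "host1 n1 host2 n2 ..."
# ASSUMPTION: the string alternates host names and slot counts.
lsf_total_slots() {
    total=0
    # Intentional word splitting: turn the string into positional parameters.
    set -- $1
    while [ "$#" -ge 2 ]; do
        total=$((total + $2))   # $2 is the slot count of the current host
        shift 2
    done
    echo "$total"
}

echo "Allocated slots: $(lsf_total_slots "${LSB_MCPU_HOSTS:-}")"
```

Comparing this total with the value passed to -n shows whether a job overrides the scheduler's allocation, as the second bsub example does.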
Parallelnavi NQS*
If you use the Parallelnavi NQS* job scheduler and the corresponding scheduler variables are set, the scheduler's node file is used as a machine file for mpirun. The scheduler's own remote shell is used by the process manager during startup.
SLURM*
If the job runs under SLURM*, the environment variables set by the scheduler are used to generate a machine file for mpirun. The machine file is temporary and is removed when the job is completed.
For example, to submit a job, run the following commands:
$ srun -N2 --nodelist=host1,host2 -A
$ mpirun -n 2 ./myprog
To enable PMI2, set the I_MPI_PMI_LIBRARY environment variable and specify the --mpi option:
$ I_MPI_PMI_LIBRARY=<path to libpmi2.so>/libpmi2.so srun --mpi=pmi2 <application>
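To make the machine file generation mentioned earlier in this section more concrete, the sketch below computes the total task count from a per-node task string. It assumes the compact format that SLURM uses for `SLURM_TASKS_PER_NODE`, for example "2(x3),1", meaning 2 tasks on each of 3 nodes and 1 task on a fourth node. `slurm_total_tasks` is a hypothetical helper name.

```shell
#!/bin/sh
# Total task count from a SLURM_TASKS_PER_NODE-style string.
# ASSUMPTION: items are comma-separated and "N(xR)" means N tasks
# on each of R consecutive nodes.
slurm_total_tasks() {
    total=0
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the string on commas
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *"(x"*")")
                count=${item%%"("*}   # text before "("
                reps=${item#*"(x"}    # text after "(x"
                reps=${reps%")"}      # drop the trailing ")"
                ;;
            *)
                count=$item
                reps=1
                ;;
        esac
        total=$((total + count * reps))
    done
    echo "$total"
}

echo "Total tasks: $(slurm_total_tasks "${SLURM_TASKS_PER_NODE:-}")"
```

Outside a SLURM job the variable is empty and the snippet reports 0.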
Univa* Grid Engine*
If you use the Univa* Grid Engine* job scheduler and the scheduler's environment variable is set, two temporary files are generated, and the second of them is used as the machine file for mpirun. These files are removed when the job is completed.
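The conversion from a scheduler host file to a machine file can be sketched as follows. The sketch assumes that Grid Engine exports the host file path in `PE_HOSTFILE` and that each line begins with a host name followed by its slot count. It emits `host:slots` lines, a form commonly accepted in Hydra machine files. `pe_to_machinefile` is a hypothetical helper and does not reproduce the library's internal procedure.

```shell
#!/bin/sh
# Convert a Grid Engine PE host file into "host:slots" lines.
# ASSUMPTION: each input line starts with "<host> <slots>"; any
# remaining columns (queue, processor range) are ignored.
pe_to_machinefile() {
    awk 'NF >= 2 { print $1 ":" $2 }' "$1"
}

# Only act inside a Grid Engine parallel environment.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    pe_to_machinefile "$PE_HOSTFILE"
fi
```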
SIGINT, SIGTERM Signals Intercepting
If resources allocated to a job exceed the limit, most job schedulers terminate the job by sending a signal to all processes.
For example, Torque* sends SIGTERM three times to a job, and if the job is still alive, SIGKILL is sent to terminate it.
For Univa* Grid Engine*, the default signal to terminate a job is SIGKILL. Intel® MPI Library is unable to process or catch that signal, so the entire job is killed. You can change the value of the termination signal through the following queue configuration:
List the available queues.
Open the settings of the target queue for editing.
Change the termination signal to SIGTERM.
Save the queue configuration.
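The reason the termination signal matters can be demonstrated with a shell trap: SIGTERM and SIGINT can be intercepted and handled, while SIGKILL cannot. The sketch below is a self-contained demonstration in which the script sends SIGTERM to itself. The names `on_term` and `got_signal` are hypothetical. The Grid Engine commands in the comments (`qconf -sql`, `qconf -mq`) are an assumption about how queues are typically listed and edited, and they are not executed here.

```shell
#!/bin/sh
# Catch SIGTERM/SIGINT so that cleanup can run before the job exits.
# SIGKILL cannot be trapped, which is why the queue's termination
# signal has to be a catchable one.
#
# ASSUMPTION (not run here): on Grid Engine, queues are typically
# listed with `qconf -sql` and edited with `qconf -mq <queue_name>`.
got_signal=""

on_term() {
    got_signal=$1
    echo "Caught SIG$1, cleaning up"
}

trap 'on_term TERM' TERM
trap 'on_term INT' INT

# Demonstration only: deliver SIGTERM to this shell itself.
kill -TERM $$

echo "Handler saw: ${got_signal:-nothing}"
```

In a real job script the handler would stop child processes and remove temporary files instead of only printing a message.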
Controlling Per-Host Process Placement
When using a job scheduler, by default Intel® MPI Library uses per-host process placement provided by the scheduler. This means that the -ppn option has no effect. To change this behavior and control process placement through -ppn (and related options and variables), use the I_MPI_JOB_RESPECT_PROCESS_PLACEMENT environment variable:
$ export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
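A dry-run sketch of a job script fragment that applies this setting is shown below. It only prints the command it would run. `build_cmd` and the values 4 and 2 are hypothetical, and `./myprog` is the placeholder program used elsewhere on this page.

```shell
#!/bin/sh
# Dry run: print the mpirun command for an explicit per-node layout.
# ASSUMPTION: build_cmd and the example values are illustrative only.
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off

build_cmd() {
    nodes=$1   # number of allocated nodes
    ppn=$2     # desired processes per node
    echo "mpirun -n $((nodes * ppn)) -ppn $ppn ./myprog"
}

build_cmd 4 2
```

Replacing the final line with `eval "$(build_cmd 4 2)"` would launch the job. With the variable set to off, the -ppn value is honored instead of the scheduler's placement.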