Job Schedulers Support

Intel® MPI Library supports the majority of commonly used job schedulers in the HPC field.

The following job schedulers are supported on Linux* OS:

  • Altair* PBS Pro*
  • Torque*
  • OpenPBS*
  • IBM* Platform LSF*
  • Parallelnavi* NQS*
  • SLURM*
  • Univa* Grid Engine*

The support is implemented in the mpirun wrapper script. mpirun detects the job scheduler under which it is started by checking specific environment variables and then chooses the appropriate method to start an application.
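
For illustration, the detection boils down to testing for scheduler-specific environment variables. The following shell sketch is not the actual mpirun wrapper logic, only an approximation of the kind of check it performs, using the variables named in the sections below:

# Illustrative sketch only; not the actual mpirun wrapper logic.
if [ -n "$SLURM_JOBID" ]; then
    echo "started under SLURM"
elif [ "$PBS_ENVIRONMENT" = "PBS_BATCH" ] || [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]; then
    echo "started under PBS Pro/Torque/OpenPBS"
elif [ -n "$LSB_MCPU_HOSTS" ]; then
    echo "started under IBM Platform LSF"
elif [ -n "$PE_HOSTFILE" ]; then
    echo "started under Univa Grid Engine"
fi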

Altair* PBS Pro*, Torque*, and OpenPBS*

If you use one of these job schedulers and $PBS_ENVIRONMENT exists with the value PBS_BATCH or PBS_INTERACTIVE, mpirun uses $PBS_NODEFILE as the machine file. You do not need to specify the -machinefile option explicitly.

An example of a batch job script may look as follows:

#PBS -l nodes=4:ppn=4
#PBS -q queue_name
cd $PBS_O_WORKDIR
mpirun -n 16 ./myprog
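
For illustration only, this is roughly equivalent to passing the node file yourself, which is normally unnecessary:

mpirun -machinefile $PBS_NODEFILE -n 16 ./myprog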

IBM* Platform LSF*

If you use the IBM* Platform LSF* job scheduler and $LSB_MCPU_HOSTS is set, it is parsed to get the list of hosts for the parallel job. Because $LSB_MCPU_HOSTS does not store the name of the main process, the local host name is added to the top of the host list. Based on this list, a machine file for mpirun is generated with a unique name: /tmp/lsf_${username}.$$. The machine file is removed when the job is completed.

For example, to submit a job, run the command:

$ bsub -n 16 mpirun -n 16 ./myprog
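
$LSB_MCPU_HOSTS is a flat list of host name/slot count pairs. For the job above it might look like the following (host names and counts are illustrative):

$ echo $LSB_MCPU_HOSTS
host1 8 host2 8

From pairs like these, mpirun builds machine file entries of the form hostname:process_count.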

Parallelnavi NQS*

If you use the Parallelnavi NQS* job scheduler and the $ENVIRONMENT, $QSUB_REQID, and $QSUB_NODEINF environment variables are set, the file specified by $QSUB_NODEINF is used as the machine file for mpirun. In addition, /usr/bin/plesh is used as the remote shell by the process manager during startup.
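
As a rough illustration, a manual invocation with the same inputs might look like the following; mpirun performs these steps for you, and the use of I_MPI_HYDRA_BOOTSTRAP_EXEC here is only one possible way to point the process manager at plesh:

$ export I_MPI_HYDRA_BOOTSTRAP_EXEC=/usr/bin/plesh
$ mpirun -machinefile $QSUB_NODEINF -n 16 ./myprog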

SLURM*

If $SLURM_JOBID is set, the $SLURM_TASKS_PER_NODE and $SLURM_NODELIST environment variables are used to generate a machine file for mpirun. The name of the machine file is /tmp/slurm_${username}.$$. The machine file is removed when the job is completed.

For example, to submit a job, run the command:

$ srun -N2 --nodelist=host1,host2 -A
$ mpirun -n 2 ./myprog
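
Inside a two-node allocation such as the one above, the relevant variables might look like the following (values are illustrative):

$ echo $SLURM_NODELIST
host[1-2]
$ echo $SLURM_TASKS_PER_NODE
1(x2)

mpirun combines them into machine file entries such as host1:1 and host2:1.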

Univa* Grid Engine*

If you use the Univa* Grid Engine* job scheduler and $PE_HOSTFILE is set, two files are generated: /tmp/sge_hostfile_${username}_$$ and /tmp/sge_machifile_${username}_$$. The latter is used as the machine file for mpirun. These files are removed when the job is completed.
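
For reference, a submission under a Grid Engine parallel environment might look like the following; the parallel environment name impi and the script name are hypothetical, and the script itself would call mpirun as usual:

$ qsub -pe impi 16 ./job_script.sh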

Intercepting SIGINT and SIGTERM Signals

If a job exceeds its resource limits, most job schedulers terminate it by sending a signal to all of its processes.

For example, Torque* sends SIGTERM to a job three times; if the job is still alive, it sends SIGKILL to terminate it.

For Univa* Grid Engine*, the default signal for terminating a job is SIGKILL. Intel® MPI Library cannot process or catch that signal, which causes mpirun to kill the entire job. You can change the termination signal through the following queue configuration steps:

  1. Use the following command to see available queues:

    $ qconf -sql
  2. Execute the following command to modify the queue settings:

    $ qconf -mq <queue_name>
  3. Find terminate_method and change the signal to SIGTERM, as shown in the sketch after this list.

  4. Save the queue configuration.
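
After step 3, the relevant line of the queue configuration might look like the following (an illustrative sketch of the terminate_method entry):

terminate_method          SIGTERM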

Controlling Per-Host Process Placement

When using a job scheduler, by default Intel MPI Library uses per-host process placement provided by the scheduler. This means that the -ppn option has no effect. To change this behavior and control process placement through -ppn (and related options and variables), use the I_MPI_JOB_RESPECT_PROCESS_PLACEMENT environment variable:

$ export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
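
For example, with scheduler-provided placement disabled, the following run places four processes per host (an illustrative command using the -ppn option):

$ mpirun -n 16 -ppn 4 ./myprog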