Developer Guide

Setting the Number of OpenMP* Threads

The OpenMP* run-time library responds to the environment variable OMP_NUM_THREADS. Intel® MKL also has other mechanisms to set the number of OpenMP threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control).
Make sure that the relevant environment variables have the same and correct values on all the nodes.
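As a minimal sketch, the three variables named above could be set as follows. The thread counts are illustrative assumptions, not recommended values; the MKL_DOMAIN_NUM_THREADS string asks for one thread in all domains except BLAS.

```shell
# Illustrative values only; choose counts that suit your nodes.
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
# Per-domain setting: one thread everywhere, except BLAS, which may use four.
export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4"
```

Whatever values you choose, the same settings need to reach every node that takes part in the run.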
Intel® MKL does not set the default number of OpenMP threads to one, but depends on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS.
Avoid oversubscribing the node with OpenMP threads, which may occur, for instance, when the number of MPI ranks per node and the number of OpenMP threads per MPI rank are both greater than one. The number of MPI ranks per node multiplied by the number of OpenMP threads per MPI rank should not exceed the number of hardware threads per node.
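The limit above can be turned into a small calculation. The following sketch assumes that each MPI rank starts its own team of OpenMP threads; the function name threads_per_rank and the example figures (16 hardware threads, 4 ranks per node) are hypothetical.

```shell
# Print the largest OpenMP thread count per MPI rank that keeps
# (ranks per node) * (threads per rank) <= hardware threads per node.
threads_per_rank() {
    hw_threads=$1
    ranks_per_node=$2
    n=$((hw_threads / ranks_per_node))
    # Never go below one thread, even if ranks outnumber hardware threads.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example: 16 hardware threads shared by 4 ranks per node.
OMP_NUM_THREADS=$(threads_per_rank 16 4)
export OMP_NUM_THREADS
```

With these example figures each rank gets four threads, so 4 ranks times 4 threads matches the 16 hardware threads without exceeding them.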
If you set an environment variable such as OMP_NUM_THREADS in your login environment, remember that changing the value on the head node and then starting your run, as you would on a shared-memory (SMP) system, does not change the variable on all the nodes, because mpirun starts a fresh default shell on each node. To change the number of OpenMP threads on all the nodes, add a line at the top of .bashrc, as follows:
OMP_NUM_THREADS=1; export OMP_NUM_THREADS
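One way to apply this edit repeatably is a small helper that places the line at the top of a startup file only if it is not already there. This is a sketch under stated assumptions: prepend_export is a hypothetical name, and the file path is passed in as an argument so that the helper can be tried on a copy first.

```shell
# Prepend a line to a startup file unless the exact line is already present.
prepend_export() {
    file=$1
    line=$2
    touch "$file"
    # Nothing to do if the exact line already exists somewhere in the file.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Write the new line first, then the old contents, then copy back in place.
    { printf '%s\n' "$line"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (not executed here):
# prepend_export "$HOME/.bashrc" 'OMP_NUM_THREADS=1; export OMP_NUM_THREADS'
```

Placing the line at the top matters because some .bashrc files return early for non-interactive shells, in which case later lines are never reached.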
You can use multiple CPUs per node with MPICH. To do this, build MPICH to enable multiple CPUs per node. Be aware that certain MPICH applications may not work correctly in a threaded environment (see the Known Limitations section in the Release Notes). If you encounter problems with MPICH when the number of OpenMP threads is greater than one, first set the number of threads to one and check whether the problem persists.
For Cluster Sparse Solver, set the number of OpenMP threads to a number greater than one because the implementation of the solver only supports a multithreaded algorithm.
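A launch script could guard against a single-thread setting before starting a Cluster Sparse Solver run. The check below is only a sketch: cluster_solver_threads_ok is a hypothetical helper, and it inspects OMP_NUM_THREADS alone, ignoring the other threading controls and the library default.

```shell
# Succeed only when the argument is an integer greater than one.
cluster_solver_threads_ok() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 1 ]
}

# Example: warn before launching with one thread. An unset variable is
# reported too, because this sketch does not look up the library default.
if ! cluster_solver_threads_ok "${OMP_NUM_THREADS:-}"; then
    echo "Set OMP_NUM_THREADS to a value greater than one" >&2
fi
```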
