Intel MKL 10.0 introduces new optional threading controls: a set of environment variables and service functions. They behave similarly to their OpenMP equivalents but take precedence over them. By using these controls together with the OpenMP variables, you can thread the library and the part of the application that does not call Intel MKL independently of each other.
The table below lists the Intel MKL environment variables for threading control, their equivalent service functions, and their OpenMP counterparts:
These controls enable you to specify the number of threads for Intel MKL independently of the OpenMP settings. Although Intel MKL may actually use a number of threads that differs from the one suggested, the controls also let you instruct the library to try using the suggested number when the application calling the library exhibits threading behavior that the library cannot detect.
Employing Intel MKL threading controls in your application is optional. If you do not use them, the library behaves largely as Intel MKL 9.1 does with respect to threading, with the possible exception of a different default number of threads.
Note: Intel MKL cannot always choose the number of threads freely; constraints such as available system resources may dictate it.
You can specify the number of threads for Intel MKL to use in several ways.
When choosing how to set the number of threads for Intel MKL functions, take the following rules into account:
A subroutine call takes precedence over any environment variables. The exception is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS.
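The precedence rule above can be modeled with a short sketch. This is an illustrative model only: it does not call Intel MKL or OpenMP, and the function and parameter names are hypothetical. Each argument stands for a thread count set through one mechanism (an Intel MKL service function, an Intel MKL environment variable such as MKL_NUM_THREADS, omp_set_num_threads(), or an OpenMP environment variable), with None meaning "not set".

```python
def suggested_threads(mkl_call=None, mkl_env=None, omp_call=None, omp_env=None):
    """Return the thread-count suggestion that wins, or None if nothing is set.

    Hypothetical model of the precedence rule; it does not query any library.
    """
    # Ordered from highest to lowest precedence:
    #   1. Intel MKL service function call
    #   2. Intel MKL environment variable (beats omp_set_num_threads())
    #   3. omp_set_num_threads() call
    #   4. OpenMP environment variable
    for value in (mkl_call, mkl_env, omp_call, omp_env):
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    # MKL_NUM_THREADS=4 in the environment wins over omp_set_num_threads(8).
    print(suggested_threads(mkl_env=4, omp_call=8))
```

The result is only the suggestion that takes effect; as noted elsewhere in this article, the library may still use a different number of threads.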
Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.
If HT Technology is enabled on the system, set the number of threads equal to the number of physical processors or cores, which is half the number of logical processors.
Note: If the requested number of threads exceeds the number of physical cores (perhaps because of HT Technology), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores.
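As a worked illustration of the recommendation for systems with HT Technology, the sketch below computes the number of threads to request from the number of logical processors. It is a minimal sketch: the function name is hypothetical, and it assumes two logical processors per physical core, which is what "half the number of logical processors" implies.

```python
def recommended_threads(logical_processors, ht_enabled, threads_per_core=2):
    """Return the number of threads to request: one per physical core.

    Assumption: with HT Technology enabled, each physical core exposes
    `threads_per_core` logical processors (2 by default).
    """
    if ht_enabled:
        # Never return fewer than one thread.
        return max(1, logical_processors // threads_per_core)
    return logical_processors


if __name__ == "__main__":
    # 8 logical processors with HT Technology enabled -> 4 physical cores.
    print(recommended_threads(8, ht_enabled=True))
```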
MKL_DYNAMIC being TRUE means that Intel MKL will always try to pick what it considers the best number of threads, up to the maximum specified by the user. MKL_DYNAMIC being FALSE means that Intel MKL will not deviate from the number of threads the user requested, unless there are reasons why it has no choice. The value of MKL_DYNAMIC is by default set to TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE.
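The effect of MKL_DYNAMIC on the requested number of threads can be sketched as follows. This is a simplified model with hypothetical names, not the library's actual selection logic: with MKL_DYNAMIC set to TRUE it returns only the upper bound on the number of threads (the library may pick fewer), and with FALSE it ignores the cases where the library has no choice.

```python
def thread_upper_bound(requested, physical_cores, mkl_dynamic=True):
    """Return the largest number of threads the library would use in this model.

    mkl_dynamic=True:  the request is a maximum, scaled down to the number
                       of physical cores when it exceeds that number.
    mkl_dynamic=False: the request is taken as given (cases where the
                       library has no choice are not modeled).
    """
    if mkl_dynamic:
        return min(requested, physical_cores)
    return requested


if __name__ == "__main__":
    # 8 threads requested on 4 physical cores with the default MKL_DYNAMIC.
    print(thread_upper_bound(8, 4))
```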
In general, you should set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, when you want nested parallelism and the library is already called from a parallel region. Please refer to "MKL_DYNAMIC" in the Intel MKL User's Guide for details.
MKL_DOMAIN_NUM_THREADS lets you suggest the number of threads for a particular function domain. Domain-specific settings take precedence over the overall ones. For example, the value "MKL_BLAS=4" of MKL_DOMAIN_NUM_THREADS suggests trying 4 threads for BLAS, regardless of any later setting of MKL_NUM_THREADS. Please refer to "MKL_DOMAIN_NUM_THREADS" in the Intel MKL User's Guide for details.
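To make the domain rule concrete, the sketch below parses a MKL_DOMAIN_NUM_THREADS value and resolves the suggestion for one domain. It is a simplified model with hypothetical function names. It assumes that multiple settings are separated by commas; only the single-pair form "MKL_BLAS=4" comes from this article, so check the Intel MKL User's Guide for the full syntax.

```python
def parse_domain_threads(value):
    """Parse a string such as "MKL_BLAS=4, MKL_FFT=1" into a dict.

    Assumption: settings are comma-separated NAME=count pairs.
    Raises ValueError if a count is missing or is not an integer.
    """
    settings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition("=")
        settings[name.strip().upper()] = int(count)
    return settings


def threads_for_domain(domain, domain_value="", overall=None):
    """Return the suggestion for `domain`; the domain setting beats `overall`."""
    settings = parse_domain_threads(domain_value)
    if domain in settings:
        return settings[domain]
    # No domain-specific setting: fall back to the overall one
    # (for example, the value of MKL_NUM_THREADS).
    return overall


if __name__ == "__main__":
    # BLAS gets 4 threads even though the overall suggestion is 8.
    print(threads_for_domain("MKL_BLAS", "MKL_BLAS=4", overall=8))
```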
The introduction of additional threading controls made it possible to optimize the commit stage of the FFT implementation and eliminate double data initialization. However, this optimization requires a change in FFT usage. Suppose you create threads in the application yourself after initializing all FFT descriptors. In this case, threading is employed for the parallel FFT computation only, the descriptors are released upon return from the parallel region, and each descriptor is used only within the corresponding thread. Starting with Intel MKL 10.0, you must explicitly instruct the library, before the commit stage, to work on one thread. To do this, set MKL_NUM_THREADS=1 or MKL_DOMAIN_NUM_THREADS="MKL_FFT=1", or call the corresponding service functions. Otherwise, the actual number of threads may be different because the DftiCommitDescriptor function is not in a parallel region. See Example C-27a "Using Parallel Mode with Multiple Descriptors Initialized in One Thread" in the Intel MKL Reference Manual.
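One way to apply the environment-variable form of this setting is to prepare the environment before the application that uses the FFT functions starts, so the value is in place before any descriptor is committed. The sketch below builds such an environment for a child process. The helper name is hypothetical, and the helper replaces any existing MKL_DOMAIN_NUM_THREADS value rather than merging with it.

```python
import os


def single_threaded_fft_env(base_env=None):
    """Return a copy of the environment with the FFT domain set to one thread.

    The caller's mapping is not modified. Any existing value of
    MKL_DOMAIN_NUM_THREADS is overwritten (a simplification).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DOMAIN_NUM_THREADS"] = "MKL_FFT=1"
    return env


if __name__ == "__main__":
    env = single_threaded_fft_env({"PATH": "/usr/bin"})
    print(env["MKL_DOMAIN_NUM_THREADS"])
```

The returned mapping can be passed as the env argument of subprocess.run when launching the application. Setting MKL_NUM_THREADS=1 in the same way would also satisfy the requirement, but it would limit all of Intel MKL rather than the FFT domain only.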
The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors.
While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Intel recommends that you evaluate other library products to determine which best meets your requirements.