Memory Allocation in Intel MKL: Memory appears to be allocated and not released when calling some Intel® MKL routines (e.g., sgetrf).
One of the advantages of Intel MKL is that it is multithreaded using OpenMP*. OpenMP* needs buffers for some operations and allocates memory even on single-processor systems and in single-threaded applications. This allocation happens once, the first time the program reaches OpenMP code, and persists until the application terminates. In addition, on Windows* the operating system allocates a stack equal in size to the main stack for every additional thread created, so the total amount of automatically allocated memory depends on the main stack size, the OpenMP allocations, and the number of threads used. If your program needs to free this memory, call mkl_free_buffers(). If a later call to a library function needs a memory buffer, the Intel MKL memory manager allocates the buffers again, and they again remain allocated until the program ends or the program frees them. Keeping the buffers allocated improves performance, because they are reused instead of being reallocated on every call. However, some tools may report this behavior as a memory leak.
Please refer to the User's Guide for more details.
Using Threading with BLAS and LAPACK
Intel MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF, and many other routines), BLAS, DFTs, and FFTs. For a more comprehensive list of threaded Intel MKL routines, see the chapter "Threaded Function and Problems" in the User's Guide. Intel MKL uses OpenMP* threading software. In some situations, conflicts can make the use of threads in Intel MKL problematic. We list them here with recommendations for dealing with them, but first, a brief discussion of why the problem exists is appropriate.
If you thread your program using OpenMP directives and compile it with the Intel® Compilers, Intel MKL and your program both use the same threading library. Intel MKL tries to determine whether it is called from inside a parallel region of the program, and if it is, it does not spread its operations over multiple threads. However, Intel MKL can detect that it is inside a parallel region only if the threaded program and Intel MKL use the same threading library.
Please refer to the KB article "Recommended settings for calling Intel® MKL routines from multi-threaded application" for our recommendations on how to run your application in a multithreaded environment.
Setting the Number of Threads for OpenMP* (OMP)
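The OpenMP* software responds to the environment variable OMP_NUM_THREADS, which sets the number of threads used for parallel regions; for example, export OMP_NUM_THREADS=4 in a bash shell, or set OMP_NUM_THREADS=4 in a Windows* command prompt.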
Changing the Number of Processors for Threading During Runtime
To improve performance of Intel MKL routines, use the following OpenMP and threading settings:
For more information, see the Knowledge Base article at http://software.intel.com/en-us/articles/performance-tips-of-using-intel-mkl-
Intel MKL 11.3 Beta Update 1 introduced support for Intel® Threading Building Blocks (Intel® TBB). Intel MKL 11.3 can improve the performance of applications threaded with Intel TBB. Applications using Intel TBB can benefit from the following Intel MKL functions:
For more information, see the Knowledge Base article at
The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors.
While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Intel recommends that you evaluate other library products to determine which best meets your requirements.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804