| June 22, 2009 12:00 AM PDT | |
|
Memory Allocation

MKL: Memory appears to be allocated and not released when calling some Intel® MKL routines (e.g. sgetrf).

One of the advantages of using Intel MKL is that it is multithreaded using OpenMP*. OpenMP* requires buffers to perform some operations and allocates memory even on single-processor systems and in single-threaded applications. This allocation occurs once, the first time OpenMP software is encountered in the program, and it persists until the application terminates. In addition, the Windows* operating system allocates a stack equal in size to the main stack for every additional thread created, so the amount of memory that is automatically allocated depends on the size of the main stack, the OpenMP allocations, and the number of threads used.

If your program needs to free memory, call mkl_free_buffers(). If another call is then made to a library function that needs a memory buffer, the memory manager allocates the buffers again, and they remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report it as a memory leak. Please refer to the User's Guide for more details.

Using Threading with BLAS and LAPACK

Intel MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF routines and many others), BLAS, DFTs, and FFTs. For a more comprehensive list of threaded MKL routines, see the chapter "Threaded Functions and Problems" in the User's Guide.

Intel MKL uses OpenMP* threading software. There are situations in which conflicts can make the use of threads in Intel MKL problematic. We list them here with recommendations for dealing with them. First, a brief discussion of why the problem exists is appropriate.

If the user threads the program using OpenMP directives and compiles it with the Intel® Compilers, Intel MKL and the user program will both use the same threading library.
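Whether a call is made from inside an active OpenMP parallel region is something the OpenMP API itself exposes through omp_in_parallel(). The sketch below only illustrates that kind of check; it is not Intel MKL's actual internal logic. The function names choose_thread_count and threads_for_library_call are hypothetical, and the serial fallbacks exist only so that the code also builds when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption for illustration): used only when the
// compiler is not running in OpenMP mode.
inline int omp_in_parallel() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical policy: stay sequential when the caller is already inside
// a parallel region, otherwise use every thread that is available.
inline int choose_thread_count(bool in_parallel, int max_threads) {
    return in_parallel ? 1 : max_threads;
}

// Apply the policy to the state reported by the OpenMP runtime.
inline int threads_for_library_call() {
    return choose_thread_count(omp_in_parallel() != 0, omp_get_max_threads());
}
```

Called from serial code, threads_for_library_call() returns the OpenMP maximum thread count. Called inside an active parallel region managed by the same OpenMP runtime, it returns 1. A check of this kind can only see regions created by the runtime it is linked against.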
Intel MKL tries to determine whether it is in a parallel region of the program, and if it is, it does not spread its operations over multiple threads. But Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. Please refer to the KB article "Recommended settings for calling Intel® MKL routines from multi-threaded application" for our recommendations on running your application in a multithreaded environment.

Setting the Number of Threads for OpenMP* (OMP)

The OpenMP* software responds to the environment variable OMP_NUM_THREADS.
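Because OMP_NUM_THREADS lives in the process environment, an application can also inspect it, for instance to log the requested thread count at startup. The following is a minimal sketch that assumes the variable holds a single positive integer. The helper names parse_thread_count and requested_omp_threads are hypothetical and are not part of OpenMP or Intel MKL, and this is not how an OpenMP runtime itself parses the variable.

```cpp
#include <cstdlib>

// Upper bound used only to reject absurd values in this sketch.
constexpr long kMaxThreads = 1L << 20;

// Parse a thread-count string such as the value of OMP_NUM_THREADS.
// Returns `fallback` when the value is missing, not a plain integer,
// or not positive. Simplification: newer OpenMP versions also accept
// comma-separated lists; those are treated as invalid here.
inline int parse_thread_count(const char* value, int fallback) {
    if (value == nullptr || *value == '\0') return fallback;
    char* end = nullptr;
    const long n = std::strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > kMaxThreads) return fallback;
    return static_cast<int>(n);
}

// Thread count requested through the environment of the current process.
inline int requested_omp_threads(int fallback) {
    return parse_thread_count(std::getenv("OMP_NUM_THREADS"), fallback);
}
```

For example, parse_thread_count("4", 1) yields 4, while an unset or malformed value yields the fallback.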
It is not possible to change the number of threads during runtime using the environment variable OMP_NUM_THREADS. You can call OpenMP API functions from your program to change the number of threads during runtime. The following sample code demonstrates changing the number of threads during runtime using the omp_set_num_threads() routine:

```cpp
#include <cstdio>
#include <vector>
#include "omp.h"
#include "mkl.h"

#define SIZE 1000

// Fill a and b with test values and reset c to zero.
static void init(std::vector<double> &a, std::vector<double> &b,
                 std::vector<double> &c) {
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            a[i*SIZE+j] = (double)(i+j);
            b[i*SIZE+j] = (double)(i*j);
            c[i*SIZE+j] = 0.0;
        }
    }
}

// Compute c = alpha*a*b + beta*c with DGEMM, then print the first
// element of each of the first 10 rows of a and c.
static void multiply_and_print(const std::vector<double> &a,
                               const std::vector<double> &b,
                               std::vector<double> &c) {
    const double alpha = 1, beta = 1;
    const int m = SIZE, n = SIZE, k = SIZE, lda = SIZE, ldb = SIZE, ldc = SIZE;
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                alpha, a.data(), lda, b.data(), ldb, beta, c.data(), ldc);
    std::printf("row\ta\tc\n");
    for (int i = 0; i < 10; i++) {
        std::printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }
}

int main() {
    std::vector<double> a(SIZE*SIZE), b(SIZE*SIZE), c(SIZE*SIZE);

    // First run: default number of threads.
    init(a, b, c);
    multiply_and_print(a, b, c);

    // Second run: restrict OpenMP to one thread.
    omp_set_num_threads(1);
    init(a, b, c);
    multiply_and_print(a, b, c);

    // Third run: allow two threads.
    omp_set_num_threads(2);
    init(a, b, c);
    multiply_and_print(a, b, c);

    return 0;
}
```

Can I use Intel MKL if I thread my application?

The Intel Math Kernel Library is designed and compiled for thread safety so it can be called from programs that are threaded.
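The calling pattern this refers to can be sketched as follows: several application threads invoke the same routine at the same time on shared read-only inputs, and each writes to its own output slot. This is an illustration under stated assumptions, not code from the Intel MKL documentation. The function dot is a hand-written stand-in so that the example builds without Intel MKL (a real program would call a library routine such as cblas_ddot at that point), and call_from_threads is a hypothetical helper name.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a thread-safe library routine (hypothetical; not an MKL call).
inline double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// Start `num_threads` application threads. Each one calls the routine on
// the shared inputs and stores the result in its own slot, so the threads
// never write to the same memory location.
inline std::vector<double> call_from_threads(unsigned num_threads,
                                             const std::vector<double>& x,
                                             const std::vector<double>& y) {
    std::vector<double> results(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&results, &x, &y, t] { results[t] = dot(x, y); });
    }
    for (std::thread& w : workers) w.join();  // wait for every call to finish
    return results;
}
```

Thread safety of the library covers the calls themselves. The application is still responsible for not letting two threads write to the same output array.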
It is fine to call threaded Intel MKL routines from multiple application threads, for example threads created with the Windows* API CreateThread() or with the Pthreads API.

New threading features in MKL 10.x

Please see "Intel® MKL 10.0 threading" for the new threading features introduced in Intel MKL 10.x. |
| Optimization Notice |
|---|
|
The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors.
While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Intel recommends that you evaluate other library products to determine which best meets your requirements. |
This article applies to: Intel® Math Kernel Library Knowledge Base, Software Products General
For more complete information about compiler optimizations, see our Optimization Notice.
Alexander Kobotov (Intel)
| ||
Gennady Fedorov (Intel)
|


