Last Modified On: April 2, 2009 3:25 AM PDT
Memory Allocation MKL: Memory appears to be allocated and not released when calling some Intel® MKL routines (e.g. sgetrf).

One of the advantages of using Intel MKL is that it is multithreaded using OpenMP*. OpenMP* requires buffers to perform some operations and allocates memory even on single-processor systems and in single-threaded applications. This allocation occurs once, the first time OpenMP software is encountered in the program, and it persists until the application terminates. In addition, the Windows* operating system allocates a stack equal in size to the main stack for every additional thread created, so the amount of memory that is automatically allocated depends on the main stack size, the OpenMP allocations, and the number of threads used.

Using Threading with BLAS and LAPACK

Intel MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF routines), BLAS, DFTs, and FFTs. Intel MKL uses OpenMP* threading software. In some situations conflicts can arise that make the use of threads in Intel MKL problematic. We list them here with recommendations for dealing with them. First, a brief discussion of why the problem exists is appropriate.

If the user threads the program using OpenMP directives and compiles it with the Intel® Compilers, Intel MKL and the user program both use the same threading library. Intel MKL tries to determine whether it is in a parallel region of the program, and if it is, it does not spread its operations over multiple threads. But Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL use the same threading library. If the user program is threaded by some other means, Intel MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases and our recommendations:
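To make the parallel-region check concrete, here is a minimal sketch built on the standard OpenMP call omp_in_parallel(). The helper names in_parallel_region and threads_for_library_call are hypothetical, and whether Intel MKL performs its check this way internally is an assumption. The sketch only illustrates why such a check can work when the caller and the library share one OpenMP runtime, and why it cannot see threads created by other means.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns true when the caller is executing inside an active OpenMP
// parallel region. When the program is built without OpenMP support,
// no such region can exist, so the answer is always false.
bool in_parallel_region() {
#ifdef _OPENMP
    return omp_in_parallel() != 0;
#else
    return false;
#endif
}

// Hypothetical helper (not an Intel MKL function): decides how many
// threads a library routine would use. Inside a parallel region it stays
// on the calling thread; otherwise it uses the requested count.
int threads_for_library_call(int requested) {
    assert(requested >= 1);
    return in_parallel_region() ? 1 : requested;
}
```

Called from the main thread outside any parallel region, threads_for_library_call(4) yields 4. Called from a thread created by another threading library, it would also yield 4, because that thread is invisible to the OpenMP runtime. This is the conflict the paragraph above describes.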
Setting the Number of Threads for OpenMP* (OMP)

The OpenMP* software responds to the environment variable OMP_NUM_THREADS, which sets the number of threads to use.
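The variable is typically set in the shell or system environment before the program is launched. As a minimal sketch of what a program can see of that setting, the following reads OMP_NUM_THREADS with the standard library only. The helper names parse_thread_count and requested_thread_count are hypothetical, and taking only the first entry of a comma-separated value (a form accepted by newer OpenMP versions) is a simplifying assumption. The OpenMP runtime does its own parsing, so this is not a description of that implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse the leading integer of an
// OMP_NUM_THREADS-style string. Returns `fallback` when the text is
// missing, malformed, out of range, or smaller than 1.
int parse_thread_count(const char* text, int fallback) {
    assert(fallback >= 1);
    if (text == nullptr || *text == '\0') {
        return fallback;
    }
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    const bool no_digits = (end == text);
    // Accept "4" and the list form "4,2" (first entry only).
    const bool bad_suffix = (*end != '\0' && *end != ',');
    if (no_digits || bad_suffix || errno == ERANGE ||
        value < 1 || value > INT_MAX) {
        return fallback;
    }
    return static_cast<int>(value);
}

// Reads the value the process inherited from its environment.
int requested_thread_count(int fallback) {
    return parse_thread_count(std::getenv("OMP_NUM_THREADS"), fallback);
}
```

Because the value is inherited from the environment when the process starts, a program that wants a different thread count later has to use the OpenMP API instead.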
Changing the Number of Processors for Threading During Runtime

It is not possible to change the number of processors used for threading during runtime through the environment variable OMP_NUM_THREADS. Instead, you can call OpenMP API functions from your program to change the number of threads during runtime. The following sample code demonstrates changing the number of threads during runtime using the omp_set_num_threads() routine:

    #include <stdio.h>
    #include "omp.h"
    #include "mkl.h"

    #define SIZE 1000

    // Fill a and b with test values and reset c to zero.
    static void init_matrices(double *a, double *b, double *c) {
        for (int i = 0; i < SIZE; i++) {
            for (int j = 0; j < SIZE; j++) {
                a[i*SIZE+j] = (double)(i+j);
                b[i*SIZE+j] = (double)(i*j);
                c[i*SIZE+j] = 0.0;
            }
        }
    }

    // Compute c = alpha*a*b + beta*c, then print the first element of
    // the first ten rows of a and c.
    static void multiply_and_print(double *a, double *b, double *c) {
        const double alpha = 1, beta = 1;
        const int m = SIZE, n = SIZE, k = SIZE;
        const int lda = SIZE, ldb = SIZE, ldc = SIZE;

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);

        printf("row\ta\tc\n");
        for (int i = 0; i < 10; i++) {
            printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
        }
    }

    int main() {
        double *a = new double[SIZE*SIZE];
        double *b = new double[SIZE*SIZE];
        double *c = new double[SIZE*SIZE];

        // First run: default number of threads.
        init_matrices(a, b, c);
        multiply_and_print(a, b, c);

        // Second run: one thread.
        omp_set_num_threads(1);
        init_matrices(a, b, c);
        multiply_and_print(a, b, c);

        // Third run: two threads.
        omp_set_num_threads(2);
        init_matrices(a, b, c);
        multiply_and_print(a, b, c);

        delete [] a;
        delete [] b;
        delete [] c;
        return 0;
    }

Can I use Intel MKL if I thread my application?

The Intel Math Kernel Library is designed and compiled for thread safety so it can be called from programs that are threaded.
Calling threaded Intel MKL routines from multiple application threads can lead to conflicts (including incorrect answers or program failures) if the threading library used by the calling application differs from the Intel MKL threading library.

New threading features in MKL 10.x

Please see "Intel® MKL 10.0 threading" for the new threading features introduced in Intel MKL 10.x.
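Returning to the question of calling Intel MKL from a threaded application: one conservative pattern is to limit OpenMP to a single thread on each application thread before it calls into the library. The sketch below shows this under stated assumptions. It uses only the standard OpenMP calls omp_set_num_threads() and omp_get_max_threads(). The helper names are hypothetical, and the callable passed in stands in for whatever library routine the thread would invoke. The limit is applied on the calling thread because a setting made in one thread is not guaranteed to carry over to threads created outside OpenMP.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Restrict OpenMP on the calling thread to a single thread and return
// the limit now in effect. Without OpenMP support there is only the
// calling thread, so the limit is 1.
int restrict_openmp_to_one_thread() {
#ifdef _OPENMP
    omp_set_num_threads(1);
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical wrapper, intended to be called from each application
// thread: applies the single-thread limit, then runs the supplied work
// (a stand-in for a call into a threaded library).
template <typename Work>
void run_library_work_single_threaded(Work work) {
    const int limit = restrict_openmp_to_one_thread();
    assert(limit == 1);
    work();
}
```

Each application thread would wrap its library calls, for example run_library_work_single_threaded([&] { /* call the routine here */ });, so that parallelism comes from the application threads alone.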
