Using Intel® MKL with Threaded Applications


Last Modified On :   April 2, 2009 3:25 AM PDT


Page Contents:


Memory Allocation MKL: Memory appears to be allocated and not released when calling some Intel® MKL routines (e.g. sgetrf).
One of the advantages of using Intel MKL is that it is multithreaded using OpenMP*. OpenMP* requires buffers to perform some operations and allocates memory even on single-processor systems and in single-threaded applications. This allocation occurs once, the first time the OpenMP software is encountered in the program, and it persists until the application terminates. In addition, the Windows* operating system allocates a stack equal in size to the main stack for every additional thread created, so the amount of memory that is automatically allocated depends on the main stack size, the OpenMP allocations, and the number of threads used.
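As a rough illustration of the dependence described above, the following sketch adds up the stacks created for the additional threads and the OpenMP buffers. The function name and its parameters are hypothetical and are not part of Intel MKL or OpenMP; the size of the OpenMP buffers is not documented here and has to be measured, so it is passed in as a placeholder.

```cpp
#include <cstddef>

// Hypothetical helper, not an Intel MKL or OpenMP API.
// main_stack_bytes    : size of the main thread's stack
// openmp_buffer_bytes : one-time OpenMP allocations (assumed value, must be measured)
// num_threads         : total number of threads, including the main thread
std::size_t estimate_auto_allocated_bytes(std::size_t main_stack_bytes,
                                          std::size_t openmp_buffer_bytes,
                                          std::size_t num_threads) {
    // Every thread beyond the main one gets a stack the size of the main stack.
    std::size_t extra_threads = (num_threads > 1) ? num_threads - 1 : 0;
    return openmp_buffer_bytes + extra_threads * main_stack_bytes;
}
```

For example, with a 1 MiB main stack and four threads, the three additional stacks alone account for 3 MiB, before any OpenMP buffers are counted.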


Using Threading with BLAS and LAPACK
Intel MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF routines), BLAS, DFTs, and FFTs. Intel MKL uses OpenMP* threading software. There are situations in which conflicts can exist that make the use of threads in Intel MKL problematic. We list them here with recommendations for dealing with these. First, a brief discussion of why the problem exists is appropriate.

If the user threads the program using OpenMP directives and uses the Intel® Compilers to compile the program, Intel MKL and the user program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads. But Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If the user program is threaded by some other means, Intel MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases and our recommendations:
  • User threads the program using OS threads (pthreads on Linux*, Win32* threads on Windows*). If more than one thread calls Intel MKL and the function being called is threaded, it is important that threading in Intel MKL be turned off. Set OMP_NUM_THREADS=1 in the environment.
  • User threads the program using OpenMP directives and/or pragmas and compiles the program using a compiler other than a compiler from Intel. This is more problematic because setting OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library used by Intel MKL. In this case, the safe approach is to set OMP_NUM_THREADS=1.
  • Multiple programs are running on a multiple-CPU system. In cluster applications, the parallel program can run separate instances of the program on each processor. However, the threading software will see multiple processors on the system even though each processor has a separate process running on it. In this case OMP_NUM_THREADS should be set to 1.
  • If the OMP_NUM_THREADS environment variable is not set, the default number of threads is assumed to be 1.
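The cases above can be summarized as a small lookup. This is only a sketch of the recommendations in this article; the enum and function names are hypothetical, and a return value of 0 is used here as a convention meaning "no override is recommended".

```cpp
// Hypothetical names; this only restates the recommendations listed above.
enum class ThreadingModel {
    OpenMPIntelCompiler,  // OpenMP directives, built with an Intel compiler
    OsThreads,            // pthreads on Linux*, Win32* threads on Windows*
    OpenMPOtherCompiler,  // OpenMP directives, built with a non-Intel compiler
    ProcessPerProcessor   // one process per processor, e.g. cluster applications
};

// Returns the recommended OMP_NUM_THREADS value, or 0 when the article
// recommends no override (Intel MKL detects the parallel region itself).
int recommended_omp_num_threads(ThreadingModel model) {
    switch (model) {
        case ThreadingModel::OpenMPIntelCompiler:
            return 0;
        case ThreadingModel::OsThreads:
        case ThreadingModel::OpenMPOtherCompiler:
        case ThreadingModel::ProcessPerProcessor:
            return 1;
    }
    return 1;  // conservative fallback for any unexpected value
}
```

A launcher script or start-up routine could use such a lookup to decide whether to set OMP_NUM_THREADS=1 before the program makes its first Intel MKL call.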

Setting the Number of Threads for OpenMP* (OMP)
The OpenMP* software responds to the environment variable OMP_NUM_THREADS:
  • Windows*: On Microsoft* Windows NT*, set the variable in the Environment panel of the System Properties box of the Control Panel, or set it in the shell in which the program runs with the command: set OMP_NUM_THREADS=<number of threads to use>
  • Linux*: Set and export the variable with the command: export OMP_NUM_THREADS=<number of threads to use>
Note: Setting the variable when running on Microsoft* Windows* 98 or Windows* Me is meaningless, since multiprocessing is not supported.
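To confirm what a program actually sees in its environment, a diagnostic along the following lines can help. It is a minimal sketch using only the C++ standard library; the function names are hypothetical, the fallback of 1 follows the default stated in this article, and the parsing is deliberately strict, so it may not match how a particular OpenMP runtime interprets the variable.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: interpret the text of OMP_NUM_THREADS as a single
// positive integer. Anything else yields the fallback of 1 (assumed default).
int parse_thread_count(const char* value) {
    const int fallback = 1;
    if (value == nullptr || *value == '\0') return fallback;

    errno = 0;
    char* end = nullptr;
    long n = std::strtol(value, &end, 10);

    // Reject overflow, text with no digits, and trailing characters.
    if (errno != 0 || end == value || *end != '\0') return fallback;
    if (n < 1 || n > INT_MAX) return fallback;
    return static_cast<int>(n);
}

// Reads the variable from the current process environment.
int requested_omp_threads() {
    return parse_thread_count(std::getenv("OMP_NUM_THREADS"));
}
```

Printing the result of requested_omp_threads() at start-up is a quick check that the set or export command took effect in the shell that launched the program.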


Changing the Number of Processors for Threading During Runtime
It is not possible to change the number of threads during runtime using the environment variable OMP_NUM_THREADS. Instead, you can call OpenMP API functions from your program to change the number of threads during runtime. The following sample code runs the same matrix multiplication three times: first with the default number of threads, then after calling omp_set_num_threads(1), and finally after calling omp_set_num_threads(2):

#include "omp.h"
#include "mkl.h"
#include <stdio.h>

#define SIZE 1000

int main(int argc, char *argv[]){

    // Row-major SIZE x SIZE matrices (this sample is C++: it uses new/delete).
    double *a, *b, *c;
    a = new double [SIZE*SIZE];
    b = new double [SIZE*SIZE];
    c = new double [SIZE*SIZE];

    double alpha=1, beta=1;
    int m=SIZE, n=SIZE, k=SIZE, lda=SIZE, ldb=SIZE, ldc=SIZE, i=0, j=0;

    // First run: default number of threads.
    for( i=0; i<SIZE; i++){
        for( j=0; j<SIZE; j++){
            a[i*SIZE+j]= (double)(i+j);
            b[i*SIZE+j]= (double)(i*j);
            c[i*SIZE+j]= (double)0;
        }
    }
    // c = alpha*a*b + beta*c
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);

    // Print the first element of the first ten rows of a and c.
    printf("row\ta\tc\n");
    for ( i=0;i<10;i++){
        printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }

    // Second run: one thread.
    omp_set_num_threads(1);

    for( i=0; i<SIZE; i++){
        for( j=0; j<SIZE; j++){
            a[i*SIZE+j]= (double)(i+j);
            b[i*SIZE+j]= (double)(i*j);
            c[i*SIZE+j]= (double)0;
        }
    }
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);

    printf("row\ta\tc\n");
    for ( i=0;i<10;i++){
        printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }

    // Third run: two threads.
    omp_set_num_threads(2);

    for( i=0; i<SIZE; i++){
        for( j=0; j<SIZE; j++){
            a[i*SIZE+j]= (double)(i+j);
            b[i*SIZE+j]= (double)(i*j);
            c[i*SIZE+j]= (double)0;
        }
    }
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);

    printf("row\ta\tc\n");
    for ( i=0;i<10;i++){
        printf("%d:\t%f\t%f\n", i, a[i*SIZE], c[i*SIZE]);
    }

    delete [] a;
    delete [] b;
    delete [] c;

    return 0;
}


Can I use Intel MKL if I thread my application?
The Intel Math Kernel Library is designed and compiled for thread safety, so it can be called from programs that are threaded. However, calling threaded Intel MKL routines from multiple application threads can lead to conflicts (including incorrect answers or program failures) if the threading library used by the calling application differs from the Intel MKL threading library.

New threading features in MKL 10.x
See the article "Intel® MKL 10.0 threading" for the new threading features introduced in Intel MKL 10.x.





This article applies to: Intel® Math Kernel Library Knowledge Base,   Software Products General