Using Threaded Intel® MKL in Multi-Threaded Applications

Introduction

Intel® Math Kernel Library (Intel® MKL) is extensively parallelized internally with OpenMP* threading, so any program linked with the threaded MKL library runs in parallel automatically on a multi-core system. By default, MKL relies on the OpenMP* runtime to set the number of threads and to manage them. However, when the program is already threaded by other means, for example pthreads on Linux* OS, we usually recommend turning off Intel MKL threading to avoid oversubscribing the cores, either by linking the sequential library or by setting MKL_NUM_THREADS or OMP_NUM_THREADS to 1. As more and more developers parallelize their applications with all kinds of multithreading methods, further questions arise: Is it possible to use MKL threading in a multithreaded application? Can user threads and MKL internal threads run simultaneously? And how can MKL threads be bound to dedicated cores? See, for example, the forum discussions U404745 "How to set affinity while using MKl" and U38468 "Problem with calling MKL gemm with pthread". The quick answer is YES: it is possible to use MKL threads in a multithreaded application, and the user can control MKL thread affinity as needed.

This article addresses these questions and presents several samples that explore ways of binding Intel MKL threads to processors, especially when threaded MKL is called from a user-threaded application.

First Question: Is it possible to use Intel® MKL threads in an already multithreaded application?

Suppose that we have many matrix calculations and want to perform them in a multithreaded application where every matrix is processed by its own dedicated thread. Ref: Forum475357

 

Assume a test system with 2 packages x 4 cores/package x 2 threads/core (HT is on; 8 physical cores and 16 logical cores in total). We want to create 2 threads that perform matrix multiplication simultaneously, with each thread starting 4 MKL threads of its own. Is it possible?
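The arithmetic behind this setup can be checked with a small sketch. The Topology struct and the helper names below are hypothetical and only illustrate the counting; the sketch also assumes that the calling pthread counts as one member of its own MKL team, so 2 pthreads with 4 MKL threads each give 8 threads in total.

```c
#include <assert.h>

/* Hypothetical description of the test machine from the text. */
typedef struct {
    int packages;
    int cores_per_package;
    int threads_per_core;
} Topology;

int physical_cores(Topology t) { return t.packages * t.cores_per_package; }

int logical_cores(Topology t) { return physical_cores(t) * t.threads_per_core; }

/* Total software threads when every user thread runs its own MKL team.
   Assumption: the calling pthread is one member of that team. */
int total_threads(int num_pthreads, int mkl_threads_per_pthread) {
    assert(num_pthreads > 0 && mkl_threads_per_pthread > 0);
    return num_pthreads * mkl_threads_per_pthread;
}

/* Returns 1 when the plan does not exceed the number of physical cores. */
int fits_physical_cores(Topology t, int num_pthreads, int mkl_threads_per_pthread) {
    return total_threads(num_pthreads, mkl_threads_per_pthread) <= physical_cores(t);
}
```

For the system described here, Topology {2, 4, 2} gives 8 physical and 16 logical cores, and the 2 x 4 plan uses exactly the 8 physical cores.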

 

The quick answer is YES, it is possible to use MKL threads in a multithreaded application (see Ref[1]). Here is a small example.

 

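#include <pthread.h>
#include <mkl.h>

#define NUM_PTHREADS 2
#define NUM_OMP_THREADS 4

/* Placeholder for the actual MKL computation (for example a gemm call), defined elsewhere. */
void mklcall(void);

void mklTest(void) {
    /* Set the number of MKL threads for the calling thread only. */
    mkl_set_num_threads_local(NUM_OMP_THREADS);
    mklcall();
}

/* Thread function */
void *threadfunc(void *pArg) {
    mklTest();
    return NULL;
}

int main(void) {
    pthread_t tThreads[NUM_PTHREADS];
    int idThreads[NUM_PTHREADS];

    /* Create 2 pthreads */
    for (int i = 0; i < NUM_PTHREADS; i++) {
        idThreads[i] = i;
        pthread_create(&tThreads[i], NULL, threadfunc, &idThreads[i]);
    }
    /* Wait for both threads to finish */
    for (int i = 0; i < NUM_PTHREADS; i++)
        pthread_join(tThreads[i], NULL);
    return 0;
}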

 

Please see the details in the attached pdf file.

Second Question: How to bind Intel® MKL threads to dedicated processor cores explicitly?

2.1 Set MKL Threads Affinity Globally

Developers often face a more complex situation, where the threads must be bound explicitly to dedicated processor cores. Fortunately, Intel MKL is based on the Intel® Compiler's OpenMP* runtime library, which can bind OpenMP threads to physical processing units, so MKL thread affinity can be controlled in the same way. Generally, there are two ways to do that: a KMP affinity environment variable setting, or an OS affinity function such as sched_setaffinity.

Please see the details in the attached pdf file.
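As an illustration of the environment-variable route, the sketch below builds a value for the Intel OpenMP runtime's KMP_AFFINITY variable using its explicit proclist form. The helper name build_kmp_affinity and the chosen processor list are assumptions for illustration; they are not taken from the attached file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "granularity=fine,proclist=[a,b,...],explicit" into buf.
   Returns the string length, or -1 if buf is too small.
   Hypothetical helper, not part of MKL or the OpenMP runtime. */
int build_kmp_affinity(char *buf, size_t size, const int *procs, int count) {
    assert(buf != NULL && procs != NULL && count > 0);
    int n = snprintf(buf, size, "granularity=fine,proclist=[");
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < count; i++) {
        /* Separate entries with commas, no comma before the first one. */
        n = (i == 0) ? snprintf(buf + used, size - used, "%d", procs[i])
                     : snprintf(buf + used, size - used, ",%d", procs[i]);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "],explicit");
    if (n < 0 || (size_t)n >= size - used) return -1;
    return (int)(used + (size_t)n);
}
```

For the processor list {0, 2, 4, 6} this produces granularity=fine,proclist=[0,2,4,6],explicit. In practice the value would usually be exported in the shell before the program starts, since the OpenMP runtime reads the variable when it initializes.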

 

2.2 Set MKL Threads Affinity in Pthread

The above works, but it relies on global affinity. Developers often need to control thread affinity within each pthread. As noted earlier, MKL threads cannot tell whether they are running inside a parallel region of a pthreads program. So, to control MKL thread affinity, we need to insert a set-affinity call in each POSIX thread we create, at the point in the code where we want it to bind. This makes sure the OpenMP runtime library sees the POSIX thread and binds it according to our affinity settings.

Here is the code  (Please see the details in the attached pdf file)
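The author's code is in the attached file. Purely as a rough sketch of the idea, and not a reproduction of that code, the following shows one way a pthread could bind itself with the Linux call sched_setaffinity before making its first MKL call. The BindArg struct, the helper names, and the contiguous core ranges are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical per-thread argument: pthread index and CPUs reserved per pthread. */
typedef struct {
    int thread_id;
    int runprocs;
} BindArg;

/* Fills set with logical CPUs thread_id*runprocs .. (thread_id+1)*runprocs - 1. */
void build_thread_mask(cpu_set_t *set, int thread_id, int runprocs) {
    assert(set != NULL && thread_id >= 0 && runprocs > 0);
    assert((thread_id + 1) * runprocs <= CPU_SETSIZE);
    CPU_ZERO(set);
    for (int i = 0; i < runprocs; i++)
        CPU_SET(thread_id * runprocs + i, set);
}

/* Binds the calling thread. Returns 0 on success, -1 on failure (errno is set). */
int bind_calling_thread(const BindArg *arg) {
    cpu_set_t set;
    build_thread_mask(&set, arg->thread_id, arg->runprocs);
    /* On Linux, pid 0 refers to the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* pthread entry point: bind first, then do the MKL work. */
void *threadfunc(void *pArg) {
    if (bind_calling_thread((const BindArg *)pArg) != 0)
        return (void *)1;   /* binding failed, e.g. the CPUs do not exist */
    /* The MKL call for this thread would go here. */
    return NULL;
}
```

Whether the OpenMP threads created later from this pthread stay inside the same mask depends on the runtime's own affinity settings, so the result should be verified on the target system.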

 

2.3 Set MKL Threads Affinity with KMP Affinity Function in pthread

To bind MKL threads to specific processors, we can also use the KMP affinity functions.

/* thread_id and runprocs come from the enclosing pthread function (not shown). */
omp_set_num_threads(NUM_OMP_THREADS);

#pragma omp parallel default(shared)
{
    // get the OpenMP thread number
    int ompTid = omp_get_thread_num();

    // create an OpenMP affinity mask
    kmp_affinity_mask_t new_omp_mask;
    kmp_create_affinity_mask(&new_omp_mask);

    // bind OpenMP threads to even-numbered processors
    kmp_set_affinity_mask_proc(ompTid * 2 + thread_id * runprocs, &new_omp_mask);

    // report a failure to apply the mask
    if (kmp_set_affinity(&new_omp_mask) != 0)
        printf("thread_id=%d Error: kmp_set_affinity(%d, &new_omp_mask)\n", thread_id, ompTid);

    // always print the resulting mask for this thread
    printf("thread_id=%d, omp_tid=%d, new_mask=%08X \n", thread_id, ompTid, *(unsigned int *)(&new_omp_mask));
}
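To make the index expression in the snippet concrete, the sketch below evaluates ompTid*2 + thread_id*runprocs for every pair of pthread and OpenMP thread. The value of runprocs is not given in this excerpt; 8 (16 logical processors shared by 2 pthreads) is assumed here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Same index expression as in the snippet above. */
int target_proc(int thread_id, int ompTid, int runprocs) {
    assert(thread_id >= 0 && ompTid >= 0 && runprocs > 0);
    return ompTid * 2 + thread_id * runprocs;
}

/* Prints the processor chosen for every (pthread, OpenMP thread) pair. */
void print_mapping(int num_pthreads, int num_omp_threads, int runprocs) {
    for (int t = 0; t < num_pthreads; t++)
        for (int o = 0; o < num_omp_threads; o++)
            printf("pthread %d, omp thread %d -> proc %d\n",
                   t, o, target_proc(t, o, runprocs));
}
```

Under that assumption, print_mapping(2, 4, 8) assigns processors 0, 2, 4, 6 to the first pthread and 8, 10, 12, 14 to the second. Whether even-numbered processors correspond to distinct physical cores depends on how the OS numbers the logical processors.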

 

2.4 Set Pthread Affinity: How Do MKL Threads Behave?

When we try to bind the MKL threads inside a pthread to dedicated processors, we naturally bind the pthreads first and then the MKL threads. However, as mentioned above, the MKL threads are managed by the Intel OpenMP library, which is not aware of whether it is running inside a pthread parallel region or which pthread is calling it. One typical issue is that the MKL threads do not respect the pthread affinity. For example, we bind pthread 1 to CPUs 0-7 and pthread 2 to CPUs 8-15 using pthread_setaffinity_np().

Please see the details in the attached pdf file.
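For reference, binding the two pthreads as described could look roughly like the sketch below, which uses the GNU extension pthread_setaffinity_np. The helper names are hypothetical, and this is not the code from the attached file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Fills set with the inclusive CPU range [first, last]. */
void build_range_mask(cpu_set_t *set, int first, int last) {
    assert(set != NULL && first >= 0 && first <= last && last < CPU_SETSIZE);
    CPU_ZERO(set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, set);
}

/* Binds an existing pthread to CPUs [first, last].
   Returns 0 on success or an error number, as pthread_setaffinity_np does. */
int bind_pthread_to_range(pthread_t thread, int first, int last) {
    cpu_set_t set;
    build_range_mask(&set, first, last);
    return pthread_setaffinity_np(thread, sizeof(cpu_set_t), &set);
}
```

With the thread handles from the earlier example, the calls would be bind_pthread_to_range(tThreads[0], 0, 7) and bind_pthread_to_range(tThreads[1], 8, 15). The call fails if the requested CPUs do not exist on the machine, so the return value should be checked.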

 

Summary

Thread affinity binds threads to CPU cores. Depending on the topology of the machine, thread affinity can have a dramatic effect on the execution speed of a program. Intel® Math Kernel Library (Intel® MKL) is threaded with OpenMP, which allows the user to control MKL threads through the methods that OpenMP provides. Although using threaded MKL in a program that is already multithreaded at a higher level is not recommended, it can still be used with care and appropriate affinity settings. In this article, we explored the behavior of MKL threads within pthreads using global KMP environment variable settings, the OS affinity function sched_setaffinity applied globally, OS affinity functions and kmp_set_affinity functions called inside each pthread, and pthread_setaffinity_np to set pthread affinity. We drew the following basic conclusions:

1.       Generally, the current OS scheduler performs very well when pthreads and OpenMP threads run simultaneously, so in most cases (not guaranteed), when the work in each pthread is balanced, thread affinity is not needed.

2.       Although OpenMP threads and pthreads are not aware of each other, in most cases (not guaranteed) they do not interfere with each other. So we can set affinity globally with an environment variable or an OS affinity function.

3.       Although OpenMP threads and pthreads are not aware of each other, we can still control an OpenMP thread's affinity by inserting a set-affinity call at the point where we want it to bind. We pass the pthread number to identify which OpenMP threads belong to which pthread.

4.       Because multiple threads run in a nondeterministic order, the out-of-order execution causes some problems. For example, the MKL threads do not follow the pthread's affinity, because the OpenMP runtime cannot determine which pthread is currently running.

Additional References

 

1.       http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications

2.       http://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications

3.       http://software.intel.com/en-us/articles/setting-thread-affinity-on-smt-or-ht-enabled-systems

4.       Intel® Math Kernel Library Reference Manual

5.       How to set affinity while using MKl in sequential mode

6.       Problem with calling MKL gemm with pthread

7.       How to set affinity of threads spawned by MKL?

 

Attachment: mkl-affinity.pdf (214.96 KB)