Why can MKL only use 4 threads?

Hi,

The processor in our workstation is an Intel Xeon X5550 2.66GHz/8MB. It has 4 CPUs and each CPU has 2 cores. In my code, I have set OMP_NUM_THREADS=8 and MKL_NUM_THREADS=8 by calling omp_set_num_threads(8) and mkl_set_num_threads(8). But the MKL part, where DSS and LAPACK are used to factorize some sparse and full matrices, only uses 4 threads, while the other C++ part runs with 8 threads. How can I use 8 threads in the MKL part? Thanks so much!

Best regards,
Shiquan

Hello Shiquan,

Please take a look at the description of the mkl_set_num_threads() function in the Intel MKL Manual. It contains the following:

"This function allows you to request independently of OpenMP* how many threads MKL should use. This is just a hint, and it is not guaranteed that this number of threads will be used. Enter a positive integer."

Best regards,
Artem

Dear Artem,

Thanks for your kind reply. I had noticed that description before. But all 8 threads of my workstation are available, meaning no other program is running on the computer at the same time. However, the MKL part still runs with only 4 threads. How can I make it run with all 8 threads? Should I set something, or does MKL only recognize the 4 CPUs and ignore that there are 8 cores? The other C++ parts of this program parallelize well with 8 threads.

Best regards,

Shiquan

Shiquan,

To see how many threads are used during MKL parallelization with the libiomp5 library, please set the following environment variable:

KMP_AFFINITY=verbose

Sorry, which OS do you use, Linux or Windows?

Thanks,
-- Victor

Dear Victor,

I encounter the same problem on both Linux and Windows systems.

Thanks,
Shiquan

So, on Linux please use

export KMP_AFFINITY=verbose

or

export KMP_AFFINITY=verbose,$KMP_AFFINITY

in case you already use some value.

Then send us the output, which should look like the following for 8 threads on my machine:

OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}

Thanks,
-- Victor

Hi, Victor,

I have set the env variable and the output is:

OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}

It is the same as yours. But the MKL part still runs with only 4 threads, while the other C++ parts of this program parallelize well with 8 threads.

The Intel compiler installed on our computer is: /opt/intel/Compiler/11.1/064

Thanks,

Shiquan

Best Reply

You have
1 packages x 4 cores/pkg x 2 threads/core
but MKL uses just 1 thread per core => 4 in total

See "Intel MKL threading behavior on Hyper-Threading systems" for more details.

Thanks,
-- Victor
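
To make the arithmetic in the reply above concrete, here is a minimal C++ sketch that reads the topology line printed with KMP_AFFINITY=verbose and separates physical cores from logical processors. The names Topology and parse_topology_line are hypothetical helpers written for illustration; they are not part of MKL or libiomp5, and the parser only assumes the "N packages x M cores/pkg x K threads/core" wording shown in this thread.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration; not part of MKL or libiomp5.
struct Topology {
    int packages = 0;
    int cores_per_package = 0;
    int threads_per_core = 0;
    bool valid = false;

    // Physical cores: what the reply above says MKL uses by default.
    int physical_cores() const { return packages * cores_per_package; }
    // Logical processors: what the OS reports when Hyper-Threading is on.
    int logical_processors() const { return physical_cores() * threads_per_core; }
};

// Parse a line such as
// "OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)".
// Returns a Topology with valid == false if the line has a different shape.
Topology parse_topology_line(const std::string& line) {
    Topology t;
    const std::string marker = "KMP_AFFINITY: ";
    const std::size_t pos = line.find(marker);
    if (pos == std::string::npos) {
        return t;
    }
    int packages = 0, cores = 0, threads = 0;
    const char* rest = line.c_str() + pos + marker.size();
    const int matched = std::sscanf(
        rest, "%d packages x %d cores/pkg x %d threads/core",
        &packages, &cores, &threads);
    if (matched == 3) {
        t.packages = packages;
        t.cores_per_package = cores;
        t.threads_per_core = threads;
        t.valid = true;
    }
    return t;
}
```

For the line in Shiquan's output, physical_cores() gives 1 x 4 = 4 and logical_processors() gives 4 x 2 = 8, which matches the 4 MKL threads and the 8 OS procs reported above.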

Dear Victor,

Thanks for your help.

Following your advice, I can now use all 8 threads in my MKL code. But the code keeps running inside the MKL function and never returns. I have encountered a similar problem before on my laptop, which has 2 cores and runs Windows. If I select the Parallel option in MKL, the code keeps running and does not finish, or it takes even more time than the Sequential version and returns wrong results. But the Sequential version finishes quickly.

How can I solve this problem? Thanks.

I have set: MKL_DYNAMIC=FALSE

MKL_NUM_THREADS=8

and

KMP_AFFINITY=granularity=fine,compact,1,0

Best regards,

Shiquan
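
For readers who set these values from code rather than from the environment, the first two settings above can be sketched with the MKL service functions mkl_set_dynamic, mkl_set_num_threads and mkl_get_max_threads. The wrapper request_mkl_threads is a hypothetical name used only for illustration, and the __has_include guard is an assumption added so the sketch still compiles on a machine without the MKL headers. As the manual excerpt earlier in this thread says, the thread count remains a request, so the sketch reads the value back instead of assuming it was granted.

```cpp
// Detect the MKL header if the compiler supports __has_include.
#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

// Hypothetical wrapper for illustration only.
// Returns the thread count MKL reports after the request,
// or -1 when the sketch was built without the MKL headers.
int request_mkl_threads(int wanted) {
#ifdef SKETCH_HAVE_MKL
    mkl_set_dynamic(0);           // corresponds to MKL_DYNAMIC=FALSE
    mkl_set_num_threads(wanted);  // corresponds to MKL_NUM_THREADS=wanted
    return mkl_get_max_threads(); // read back what MKL will actually allow
#else
    (void)wanted;                 // no MKL available: nothing to configure
    return -1;
#endif
}
```

Calling request_mkl_threads(8) before the first DSS or LAPACK call and printing the result is one way to check whether the request took effect. It does not by itself address the hang described in this post.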

I have noticed that:

Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.

So maybe I should not use the other 4 threads in MKL any more.

Thanks to all of you for your kind help!

Best regards,
Shiquan

Yes, Shiquan, you are right. You can find similar interesting discussions of how HT affects MKL performance here.

-- Gennady
