9.1.022 only using one core on AMD?

We are testing our application on a dual-processor, dual-core AMD machine (four cores in total). Despite OMP_NUM_THREADS=4, MKL is executing in serial mode (or at least only using one core).

On a dual-core Intel machine, it is clearly using both cores with OMP_NUM_THREADS=2.

MKL_SERIAL is not set in either case.

I can't think of what is going on here. I did not see any problems like this when we used MKL 9.0.

(Windows XP 64)
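When chasing a report like the one above, it can help to log the threading-related settings the process actually sees before any MKL call is made. The following is only a minimal sketch using the C++ standard library: the helper names `env_or_unset` and `threading_report` are made up for illustration, it does not call MKL at all, and it only shows what the environment and the runtime report, not how many threads MKL ends up using.

```cpp
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical helper: returns the value of an environment variable,
// or the literal text "(not set)" when the variable is absent.
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(not set)");
}

// Hypothetical helper: builds a one-line summary of the settings
// discussed in this thread, as seen by the current process.
std::string threading_report() {
    // hardware_concurrency() counts logical processors and may return 0
    // when the implementation cannot determine the value.
    unsigned cores = std::thread::hardware_concurrency();
    return "OMP_NUM_THREADS=" + env_or_unset("OMP_NUM_THREADS") +
           " MKL_SERIAL=" + env_or_unset("MKL_SERIAL") +
           " logical_cores=" + std::to_string(cores);
}
```

Printing `threading_report()` at startup on both the AMD and the Intel machine would at least rule out a difference in environment before looking at the library itself.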


I have confirmed that this problem exists and that it seems related to calling the VML functions. Once a VML function is called, MKL 9.1 behaves as if MKL_SERIAL=YES on AMD. Something happens when mkl_vml_p3.dll is loaded.
See my next post for the attached program.

Added file as attachment since the forum butchers posted code.

Attachment: test.cpp (text/x-c++src, 0 bytes)

Thank you for posting the specifics. The engineering team is taking a look at this and I or someone on the team will provide an update when there is more information to share.


This is a known issue. It was fixed in MKL 10.0 Update 2. Try the latest version of MKL; it should work fine there. Sorry for the inconvenience.


I can confirm that the latest MKL 10.0 Update 2 fixes the problem. I will continue this discussion in Premier Support, but this is a very upsetting and embarrassing problem for us, as we have our AMD customers yelling at us. Yes, we did QA on an AMD machine, but our primary QA is on Intel, and the QA on AMD was mainly to check for valid results, not for performance issues on 'large' problems.

Moving from 9.1 to MKL 10.0 is somewhat painful: it requires a number of significant build changes and is not a simple drop-in replacement.

Dear all,

We're very sorry for the threading scalability issues you experienced with MKL 10.0 on AMD systems. It is unfortunate that the engineering team couldn't proactively identify the issue in the MKL 10.0 release time frame. We did our best to fix this issue as soon as we knew about it. The fix is already available in the Update 2 release. We do our best to provide high performance and scalability to customers who use AMD hardware.

According to engineering, this is quite an esoteric bug that, through a combination of circumstances, shows up on some AMD systems. Our QA methods didn't account for such a bug. The bug is related to the VML threading logic that predicts the optimal number of threads to use on particular hardware. Because this logic was broken, it could result in serial execution on AMD systems; this was, of course, not intentional.

I'm sorry again about the problems you faced. Please let us know if there are remaining problems. Feel free to provide details on the build issues you experience while migrating from 9.1 to 10.0. We would like to better understand what they are. Using your Premier Support account is fine, of course.

