Multi-thread MKL cblas_sgemm with g++ problem

Here is a sample sgemm program:

#include <mkl.h>
#include <iostream>
#include <cstdlib>
#define ITERATION 1

int main() {
  int ra = 128;  // rows of left and ans (M)
  int lda = 75;  // columns of left, rows of right (K)
  int ldb = 55;  // columns of right and ans (N)
  float* left = (float*)calloc(ra * lda, sizeof(float));
  float* right = (float*)calloc(ldb * lda, sizeof(float));
  float* ans = (float*)calloc(ra * ldb, sizeof(float));

  // Fill left (ra x lda, row-major) with random values in [0, 1] and print it.
  std::cout << "left " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < lda; ++j) {
      left[i * lda + j] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
      std::cout << left[i * lda + j] << " ";
    }
    std::cout << std::endl;
  }

  // Fill right (lda x ldb, row-major) the same way and print it.
  std::cout << "right " << std::endl;
  for (int i = 0; i < lda; ++i) {
    for (int j = 0; j < ldb; ++j) {
      right[i * ldb + j] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
      std::cout << right[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  // ans = 1.0 * left * right + 0.0 * ans
  for (int i = 0; i < ITERATION; ++i) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ra, ldb, lda, 1.0f, left, lda,
      right, ldb, 0.0f, ans, ldb);
  }

  std::cout << "ans " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < ldb; ++j) {
      std::cout << ans[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  free(left);
  free(right);
  free(ans);
  return 0;
}

I compile this program with g++ using the options `-fopenmp -lmkl_rt`, with `OMP_NUM_THREADS` set to 16.

After running the program, I found that the answer is completely wrong compared with the MATLAB result. I would not call it wrong if there were only small accuracy differences. Furthermore, I observed that the program gives correct results under any of these conditions:

  1. Use icc instead of g++.
  2. Remove the `-fopenmp` flag.
  3. Use g++ with ATLAS instead of icc with MKL.

Therefore, I guess the problem may lie with the `-fopenmp` flag. Can you help me figure out what is going wrong? Thank you!

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

icc (ICC) 16.0.3 20160415

Linux core 2.6.32-279.el6.x86_64
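To check the `cblas_sgemm` output without going through MATLAB, one option is a naive triple-loop reference computed in the same program. The sketch below is only an illustration: `reference_sgemm` and `max_abs_diff` are hypothetical helper names, not MKL functions, and the code assumes row-major storage with no transposition, matching the call in the program above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major reference: C (m x n) = A (m x k) * B (k x n).
// Each dot product is accumulated in double to keep rounding error small.
std::vector<float> reference_sgemm(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   int m, int n, int k) {
  assert(a.size() == static_cast<std::size_t>(m) * k);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double sum = 0.0;
      for (int p = 0; p < k; ++p) {
        sum += static_cast<double>(a[i * k + p]) * b[p * n + j];
      }
      c[i * n + j] = static_cast<float>(sum);
    }
  }
  return c;
}

// Largest absolute element-wise difference between two equally sized buffers.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
  assert(x.size() == y.size());
  float worst = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    worst = std::max(worst, std::fabs(x[i] - y[i]));
  }
  return worst;
}
```

In the program above, the raw buffers could be wrapped as `std::vector<float>(left, left + ra * lda)` and so on, with `m = ra`, `n = ldb`, `k = lda`. A result that differs from the reference by more than a small tolerance (something like 1e-4 is an assumption that depends on the inner dimension and value range) would point to a real error rather than rounding.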



Hi Keren,

Thanks for raising the question here. The problem appears to be the following.

`mkl_rt` is the Intel MKL Single Dynamic Library (SDL). As the MKL user guide explains, the SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with the SDL provides:
• Intel LP64 interface on systems based on the Intel® 64 architecture
• Intel interface on systems based on the IA-32 architecture
• Intel threading
To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL, you need to specify your choices using functions or environment variables as explained in the section "Dynamically Selecting the Interface and Threading Layer".

So if you compile it with GNU g++ and GNU OpenMP threading, could you please try to export:

then run your executable and see whether it gives the expected result.
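As one illustration of the environment-variable route described in the guide excerpt, the threading choice could also be applied from inside the program. This is a sketch under assumptions: it assumes the installed MKL version accepts `MKL_THREADING_LAYER` with the value `GNU`, that `mkl_rt` reads the variable at the first MKL call, and a POSIX system providing `setenv`. The helper names are hypothetical, and whether this cures the wrong results in this particular setup is not established.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: ask mkl_rt for the GNU OpenMP threading layer by
// setting MKL_THREADING_LAYER. An existing value exported in the shell is
// kept (overwrite flag 0). Assumption: this runs before the first MKL call.
bool request_gnu_threading_layer() {
  return setenv("MKL_THREADING_LAYER", "GNU", 0) == 0;
}

// Returns the current value of MKL_THREADING_LAYER, or "" if it is unset.
std::string current_threading_layer() {
  const char* value = std::getenv("MKL_THREADING_LAYER");
  return value ? std::string(value) : std::string();
}
```

Calling `request_gnu_threading_layer()` as the first statement of `main` would have the same effect as exporting the variable in the shell before launching the program, without changing the `-lmkl_rt` link line.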

Best Regards,

Other related On-line article about

I tried exporting `MKL_INTERFACE_LAYER=GNU`, but it did not help.

Alternatively, I chose to link `-lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -ldl` instead of the single dynamic library, and that solved the problem.

Hi Keren, 

Glad you tried them out.

I am just adding a comment so that others can understand:

-lmkl_intel_lp64  -lmkl_gnu_thread -lmkl_core -lpthread -ldl

Best Regards,

