Intel® Math Kernel Library

Switching between threaded and non-threaded libraries


I have an application that uses Intel's MKL libraries. In some sections of this application, calls to MKL are made from parallel regions. In these regions I want to use the sequential version of the MKL libraries; elsewhere, when I am not in a parallel region, I would like to use the threaded version.

PARDISO generates different results

Dear all,

I use PARDISO to solve a system with a symmetric sparse matrix of size n = 400,000 with 1,100,000 nonzeros. When I set the number of threads to one, I get the same answer every time. But when I set the number of threads higher than one and run it many times, the answer is different every time (and never the same as the single-thread answer). Are these strange results related to the condition number or to something else?

By the way, I have tried both MKL 11.0.2 and 11.2.3.
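One general effect worth ruling out here is that floating-point addition is not associative, so a reduction split across threads can combine partial sums in a different order than a single-threaded run. The sketch below is plain Python, not PARDISO; `naive_sum` and `chunked_sum` are hypothetical helpers, and the chunks only stand in for per-thread partial sums.

```python
import math

def naive_sum(values):
    """Left-to-right floating-point sum (explicit loop, no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then add the partial sums.

    Each chunk plays the role of one thread's share of a reduction.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)

data = [1.0, 1e16, -1e16, 1.0]
one_thread = chunked_sum(data, 1)   # 1.0
two_threads = chunked_sum(data, 2)  # 0.0
exact = math.fsum(data)             # 2.0, correctly rounded reference
```

The same four numbers give 1.0 or 0.0 depending only on how they are grouped. In a real solver the differences are usually in the last few digits, and they are amplified when the matrix is ill-conditioned.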



Multidimensional DFT and OpenMP

I'm working on a program that performs several sets of three 3D (N1 x N2 x N3) DFTs using the MKL DFT routines. I'm running most of the program in parallel using OpenMP, and I'd like to get as much parallel performance as possible from the DFT section too, since it accounts for a significant portion of the program's runtime. However, when I increase the number of threads, the performance improvement plateaus at 3 threads, i.e., the number of transforms in each call. If I instead break the transforms up into 3 x N1 2D transforms, the parallel performance continues to scale beyond 3 threads.

Blocks of different sizes in ScaLAPACK?

I am performing a Cholesky factorization with Intel MKL, which uses ScaLAPACK. I distributed the matrix based on this example, where the matrix is distributed in blocks of equal size (i.e. Nb x Mb). I tried to give every block its own size, depending on which process it belongs to, so that I can experiment more and maybe get better performance.
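For context, ScaLAPACK's standard array descriptor stores a single row block size (MB) and a single column block size (NB), and blocks are dealt out to processes cyclically. The sketch below shows that mapping along one dimension in plain Python; `owner` and `local_count` are hypothetical names, indices are 0-based, and the first block is assumed to live on process 0.

```python
def owner(global_index, block_size, n_procs):
    """Process coordinate that owns a 0-based global row (or column) index."""
    return (global_index // block_size) % n_procs

def local_count(n, block_size, proc, n_procs):
    """How many of the n rows (or columns) end up stored on `proc`."""
    return sum(1 for g in range(n) if owner(g, block_size, n_procs) == proc)

# 10 rows, blocks of 2 rows, 3 process rows:
# blocks go to processes 0, 1, 2, 0, 1
rows_per_proc = [local_count(10, 2, p, 3) for p in range(3)]  # [4, 4, 2]
```

With one block size per dimension, the load balance is tuned by changing that block size or the process grid shape rather than by giving each process its own block size.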

cblas_dnrm2 much slower than cblas_ddot

Dear all,

I ran benchmarks on a Sandy Bridge Intel processor (E5-4620) using Intel MKL 11.1. I found that cblas_dnrm2 is significantly slower (3.4 s) than the corresponding cblas_ddot call (0.5 s) using one thread. This is very surprising to me, because using cblas_ddot to calculate the 2-norm is faster (0.3 s) than cblas_dnrm2.

I have compiled with gcc-4.8.3 with the following flags:

CXXFLAGS += -O3 -I${MKLROOT}/include
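One possible reason for a gap like this is that the two routines do not compute the same thing numerically. The reference BLAS dnrm2 scales its running sum of squares to avoid overflow and underflow, which costs extra work per element; whether MKL's implementation does the same is an assumption here, not something stated in the post. The sketch below is plain Python with hypothetical helper names, and it only illustrates the difference in robustness.

```python
import math

def norm_via_dot(x):
    """2-norm as sqrt(x . x): cheap, but the squares can overflow."""
    return math.sqrt(sum(v * v for v in x))

def norm_scaled(x):
    """2-norm with scaling by the largest magnitude, in the spirit of reference dnrm2."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    # Every ratio is at most 1 in magnitude, so the squares cannot overflow.
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))

x = [3e200, 4e200]
naive = norm_via_dot(x)   # inf, because 9e400 is not representable
scaled = norm_scaled(x)   # about 5e200
```

If the input is known to be well scaled, the square root of a dot product is a reasonable substitute; otherwise the slower routine is the safer choice.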

Fail to perform DFTI.DftiComputeBackward on 64-bit platform (Windows)


I've built an example application in C# using Visual Studio 2012.
The program performs an FFT on the vector {1,2,3,4,5,6,7,8,9,10}, then the inverse FFT, and checks whether the returned values are the same as the original input vector.
I used the mkl_rt.dll version 11.2 found in "C:\Program Files (x86)\Intel\Composer XE 2015\redist\ia32\mkl" and "C:\Program Files (x86)\Intel\Composer XE 2015\redist\intel64\mkl".
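The round-trip check described above can be sketched independently of MKL with a naive DFT, which gives known-good values to compare against on both platforms. The code below is plain Python, not the C# wrapper from the post; `dft` is a hypothetical helper. It also shows that an unnormalized backward transform returns n times the input, so a 1/n scale is needed before comparing.

```python
import cmath

def dft(x, sign):
    """Unnormalized DFT; sign=-1 is the forward transform, sign=+1 the backward one."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

original = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
spectrum = dft(original, -1)
unscaled = dft(spectrum, +1)                        # n times the input
recovered = [v / len(original) for v in unscaled]   # apply the 1/n scale
max_error = max(abs(r - o) for r, o in zip(recovered, original))
```

Here `spectrum[0]` is the sum of the inputs (55), and `max_error` should be at rounding-error level if the round trip is correct.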
