Intel® Math Kernel Library

DSS/Pardiso crash

I've been trying to use Pardiso in MKL 10 (release 3), and seeing a variety of crashes, such as:

my.exe!_c_amuxy_pardiso() + 0x168e bytes Fortran

my.exe!_do_all_pardiso_fc() + 0x343d bytes Fortran

my.exe!_pardiso_c() + 0x1005 bytes C

Oddly enough, no crashes occur when the debugger is attached, which makes for some interesting debug opportunities.

Is this package not fully functional? The documentation seems downright contradictory about what is and isn't supported, especially in terms of symmetric/non-symmetric problems.

parallel mpi program linked with mkl libraries

Dear all,

I am trying to compile the SIESTA code (a scientific code) on an SGI Altix 3200 cluster with the Intel Fortran compiler, Intel MKL, and the SGI MPI libraries.

The SIESTA code uses BLAS, LAPACK (or scalapack) libraries.
I succeeded in compiling the sequential version of the code, but I would now like to build a parallel version.
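For a parallel build, SIESTA's ScaLAPACK dependency has to be resolved against MKL's ScaLAPACK plus the BLACS layer that matches the MPI in use. A sketch of what the link line might look like for MKL 10 with SGI MPT on Itanium (library names and paths are assumptions; verify them against your installation and the MKL User's Guide):

```shell
# Hypothetical link line for MKL 10.x, LP64 interface, sequential threading,
# with the BLACS layer built for SGI MPT. Adjust MKLROOT and names as needed.
ifort -o siesta *.o \
  -L$MKLROOT/lib/64 \
  -lmkl_scalapack_lp64 \
  -lmkl_blacs_sgimpt_lp64 \
  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
  -lmpi -lpthread
```

The key point is that the BLACS library must match the MPI implementation you link against; mixing, say, an Intel MPI BLACS with SGI MPT typically fails at run time rather than link time.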

Problems linking with Intel MKL to create a FFT dll

I have used Intel MKL to calculate the one-dimensional complex FFT from a set of data with no problems.

The program has a main function and an FFT function, and both work. Now I have linked the FFT function as a DLL to call it from C#, and it hangs, trying to read from a null pointer, right at DftiCreateDescriptor(...).

MKL_Complex16* Dades = new MKL_Complex16[n];
MKL_LONG N2 = n;
long status;

for (int m = 0; m < n; m++)   // the loop bound was lost in posting (the "<" was likely eaten as HTML)
{
    Dades[m].real = DadesReals[m];
    Dades[m].imag = DadesCompl[m];   // was Dades[n].imag: an out-of-bounds write
}
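A self-contained version of the fill loop above, using a plain struct in place of MKL_Complex16 so it compiles without MKL (the struct and function names are illustrative); note that both components must be indexed by m:

```c
/* Stand-in for MKL_Complex16 so this sketch compiles without MKL headers. */
typedef struct { double real, imag; } Complex16;

/* Corrected fill: real and imag are both indexed by m. Writing
   Dades[n].imag, as in the original snippet, is one past the end
   of the array and corrupts memory. */
static void fill_complex(Complex16 *Dades, const double *re,
                         const double *im, int n)
{
    for (int m = 0; m < n; m++) {
        Dades[m].real = re[m];
        Dades[m].imag = im[m];
    }
}
```

An out-of-bounds write like the original one can easily be the real cause of a crash or hang that only surfaces later, e.g. inside DftiCreateDescriptor.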


1D FFT perf


I am trying to replicate the 1D FFT performance numbers listed here: for example, what looks like 17+ Gflops for n = 1024. On my 2.8 GHz Harpertown I am only able to get about 7 Gflops for a single call to DftiComputeForward (single-precision, complex, in-place). If I average the time over 1000 repeated calls, I get ~14 Gflops (I assume that's because the entire data set then fits easily in L2).

Different numerical answers when calling mkl_set_num_threads (1)?

We are using MKL for some scientific computations and have just put a call to mkl_set_num_threads(1) in our product (because we found that running two executables simultaneously on a dual-processor machine was about twice as slow without the call). However, the numbers we are getting are now slightly different from before. So, the question is: does setting the number of threads to 1 change the numerics of the LAPACK and/or BLAS routines?
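It can: a threaded BLAS/LAPACK reduction sums partial results in a different order than a single-threaded one, and floating-point addition is not associative, so the two answers can legitimately differ in the last bits. A minimal illustration (plain C, not MKL code):

```c
/* Left-to-right sum, as a single thread would compute it. */
static double sum_serial(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

/* Mimics a two-thread reduction: each "thread" sums half the data,
   then the partial sums are combined. */
static double sum_two_way(const double *x, int n)
{
    double s0 = 0.0, s1 = 0.0;
    for (int i = 0; i < n / 2; i++) s0 += x[i];
    for (int i = n / 2; i < n; i++) s1 += x[i];
    return s0 + s1;
}
```

With x = {1e16, 1.0, -1e16, 1.0} the serial sum gives 1.0 while the two-way split gives 0.0: in the serial order the large terms cancel before the second 1.0 is added, while in the split order each 1.0 is absorbed into a 1e16-magnitude partial sum and lost. Neither answer is "wrong"; they are different roundings of the same mathematical sum.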



Intel MKL 10.0 Update 3 is now available

Intel MKL 10.0 Update 3 is now available. Update 3 includes the following improvements:

  • Improved the IA-32 version of SGEMM by 1.4 to 1.5 times for the Intel Core microarchitecture.
  • ZGEMM3M was sped up by up to 10 times for Intel Itanium processors and by up to 3 times for Intel Core 2 Quad processors.
  • The performance of the factorization routines *GETRF, *POTRF, and *GEQRF, the SVD (bi-diagonalization routine), and the symmetric banded eigensolver (tridiagonalization routine) has been improved significantly on multi-core processors.

MKL "only" twice as fast on an 8-core machine


I wrote a conjugate gradient solver to solve big sparse systems, and I parallelized it with SSE and OpenMP.

I'm working on an 8-core Mac Pro, and I reached a speedup of 9.0x in some cases (relative to my non-parallel implementation).

I would like to compare my results with MKL's, so I used the dcg_init, dcg_check, dcg, and dcg_get routines to solve my system.

For the crucial part of the conjugate gradient method (the sparse matrix-vector multiplication) I used the mkl_dcsrmv routine.
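For anyone comparing against it, mkl_dcsrmv computes a CSR matrix-vector product of the form y := alpha*A*x + beta*y. A plain-C reference version of the zero-based CSR kernel (illustrative only; MKL's routine additionally takes alpha, beta, and a matdescra descriptor, and supports one-based indexing):

```c
/* Zero-based CSR mat-vec, y = A*x, for an nrows-row sparse matrix:
     val    - nonzero values
     col    - column index of each nonzero
     rowptr - rowptr[i]..rowptr[i+1]-1 are row i's entries in val/col */
static void csr_matvec(int nrows, const double *val, const int *col,
                       const int *rowptr, const double *x, double *y)
{
    for (int i = 0; i < nrows; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}
```

This kernel is memory-bandwidth bound on most machines, which is one reason sparse CG speedups saturate well below the core count even when the rest of the solver scales.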

Subscribe to Intel® Math Kernel Library