Intel® Math Kernel Library

link error: libmkl_core.a depends on Open MPI (via libmkl_blacs_openmpi_lp64.a)

While building with the Intel icpc compiler (version 2015.2.164), I encountered an error when linking against libmkl_core.a:

/opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.a(cpardiso_blacs_lp64.o): In function 'mkl_pds_lp64_cpardiso_mpi_barrier':

__work/lnx32e/_cpardiso/kernel/mpi_wrapper/cpardiso_blacs_lp64_h.f:(.text+0x6): undefined reference to 'MKL_Barrier'

Installation fails -- Java Class Not Found -- on MacBook Pro

I am trying to install the Math Kernel Library under: 

Product Subscription Information
Download Latest Update
Release Posted

Academic Research Performance Libraries from Intel (OS X*)


On my MacBook Pro, when I try to start the install shell, I receive the following message:


In JPanelLicenseOptions

Inside JPanelRegistrationBegin

Calling initComponents

Calling initializePanel

Complex 1-D DFT not respecting DFTI_THREAD_LIMIT


I recently noticed that when using a threaded 1-dimensional DFT, a DFTI_COMPLEX domain DFT does not appear to respect the DFTI_THREAD_LIMIT and instead always uses the threading value set by mkl_set_num_threads().  Furthermore, it appears that while a REAL domain DFT does obey DFTI_THREAD_LIMIT, its behavior has changed between MKL v11.1 and 11.2.

Slow rectangular matrix transposition?


I'm working with MKL on Gentoo. I have an "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz" processor.

I'm trying to speed up my in-place matrix transpositions, and I thought mkl_?imatcopy would be the solution. I get a very good speedup on square matrices, but on rectangular matrices it is much worse than my naive "follow the cycles" implementation.

Here is the call:

mkl_dimatcopy('R', 'T', rows, cols, 1.0, matrix_ptr, rows, cols);

When I profiled the executable, most of the cycles were spent in
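
For readers unfamiliar with the term, a "follow the cycles" in-place transpose can be sketched roughly as below. This is only an illustration, not the poster's actual code: it assumes row-major storage, and the names dest_index and transpose_inplace are hypothetical. Each element at flat index i moves to (i % cols) * rows + i / cols, and every permutation cycle is rotated once, starting from its smallest index.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index that element i of a rows x cols row-major matrix
   occupies after transposition (hypothetical helper). */
static size_t dest_index(size_t i, size_t rows, size_t cols)
{
    return (i % cols) * rows + i / cols;
}

/* Transpose a rows x cols row-major matrix in place without a scratch
   buffer. Afterwards the data is a cols x rows row-major matrix. */
void transpose_inplace(double *a, size_t rows, size_t cols)
{
    size_t n = rows * cols;
    assert(a != NULL || n == 0);

    for (size_t start = 0; start < n; ++start) {
        /* Walk the cycle; handle it only if start is its smallest index. */
        size_t next = dest_index(start, rows, cols);
        while (next > start)
            next = dest_index(next, rows, cols);
        if (next < start)
            continue; /* cycle was already rotated from a smaller index */

        /* Rotate the cycle: carry each value to its destination. */
        double carried = a[start];
        size_t pos = start;
        do {
            size_t d = dest_index(pos, rows, cols);
            double tmp = a[d];
            a[d] = carried;
            carried = tmp;
            pos = d;
        } while (pos != start);
    }
}
```

The leader check trades extra index arithmetic for zero additional memory; a variant with a visited bitmap would avoid re-walking cycles at the cost of one bit per element.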



I am trying to solve a relatively large system (100,000 equations) using either Pardiso or CG.

The system matrix is sparse and symmetric, and has been converted to CSR format. For the matrix-vector multiplication in the RCI requests I use mkl_dcsrsymv. My question is: why is Pardiso so much faster than CG?

Shouldn't it be the other way around?

On a quad-core Intel Xeon, Pardiso takes around 1 second, while CG needs around 30 seconds.
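
To make the comparison concrete, here is a minimal sketch of an unpreconditioned CG loop with a CSR matrix-vector product. It is a plain-C stand-in, not the MKL RCI interface: the names csr_matvec, dot and cg_solve are hypothetical, indices are 0-based, and the full matrix (both triangles) is assumed to be stored, unlike mkl_dcsrsymv, which as far as I know works from one stored triangle. Without a preconditioner, the number of iterations such a loop needs depends strongly on the conditioning of the matrix, which may be part of the gap described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* y = A*x for an n x n matrix in 0-based CSR, all nonzeros stored. */
static void csr_matvec(size_t n, const size_t *row_ptr, const size_t *col_idx,
                       const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

static double dot(size_t n, const double *u, const double *v)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += u[i] * v[i];
    return s;
}

/* Unpreconditioned CG for a symmetric positive definite A.
   x holds the initial guess on entry and the solution on exit.
   Returns the iteration count, or -1 on allocation failure or
   if ||r|| <= tol * ||b|| was not reached within max_iter. */
int cg_solve(size_t n, const size_t *row_ptr, const size_t *col_idx,
             const double *val, const double *b, double *x,
             double tol, int max_iter)
{
    assert(n > 0 && tol > 0.0);
    double *r = malloc(3 * n * sizeof *r);
    if (!r)
        return -1;
    double *p = r + n, *Ap = p + n;

    csr_matvec(n, row_ptr, col_idx, val, x, Ap);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];   /* initial residual */
        p[i] = r[i];           /* first search direction */
    }
    double rr = dot(n, r, r);
    const double stop = tol * tol * dot(n, b, b); /* squared threshold */

    int it = 0;
    while (rr > stop && it < max_iter) {
        csr_matvec(n, row_ptr, col_idx, val, p, Ap);
        double alpha = rr / dot(n, p, Ap);
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    free(r);
    return rr <= stop ? it : -1;
}
```

Each iteration costs one sparse matrix-vector product plus a few vector updates, so the total run time is roughly the iteration count times that cost.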

These are the parameters I used for Pardiso:

Significant Overhead if threaded MKL is called from OpenMP parallel region


My aim is to diagonalize square matrices of different sizes d x d in parallel. To this end I wrote a for loop. In each iteration, aligned memory (sized according to the dimension d) is allocated with mkl_malloc(). The matrix is filled, and then dsyev is called to determine the optimal workspace size. Next I allocate the (aligned) workspace with mkl_malloc(), call dsyev again to diagonalize the matrix, and free the memory used for the workspace and for the matrix with mkl_free().
