Intel® Math Kernel Library

Truncated DFTI

Hello there,

This might be a silly question, so pardon me if it is.

Let us say that I want to perform M 1D DFTs (say, complex-to-complex) on arrays of length N, but I only need the first P output entries instead of all N (e.g., by construction, I know that only the first P entries will be nonzero).
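As far as I can tell, DFTI always computes all N outputs per transform; when P is small, though, the P needed outputs can be computed directly as a small matrix product. A NumPy sketch of that equivalence (not MKL code; the values of N, P, and M are arbitrary):

```python
import numpy as np

N, P, M = 64, 8, 4          # transform length, outputs needed, batch size
rng = np.random.default_rng(0)
x = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# Full batched DFT (what DFTI computes with DFTI_NUMBER_OF_TRANSFORMS = M).
full = np.fft.fft(x, axis=1)

# Pruned DFT: only the first P rows of the N x N DFT matrix.
k = np.arange(P)[:, None]
n = np.arange(N)[None, :]
W = np.exp(-2j * np.pi * k * n / N)     # P x N partial DFT matrix
pruned = x @ W.T                        # M x P, only the outputs we need

assert np.allclose(pruned, full[:, :P])
```

The direct approach costs O(M·P·N) versus O(M·N·log N) for the full FFT, so it only pays off when P is on the order of log N or smaller.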

Where are the pkg-config files?

This article says that you can use pkg-config with MKL:

But I don't see the pkg-config files installed anywhere (OS X, MKL 2018 Update 1).

The article also includes the instruction "1. Go to the <mkl_install_dir>/mkl/bin/pkgconfig directory"

But there's no such folder.

ls /opt/intel/mkl/bin

only shows





Dense-Sparse Matrix Multiplication routine instead of sparse-dense multiplication


I have a problem where I need to multiply a dense matrix by a sparse matrix. The function "mkl_?csrmm" expects the first matrix to be sparse and the second to be dense, but my case is the opposite.

Problem: C = A * B + C , where A is dense and B is sparse. 

I know that I can use "mkl_?csrmm" after taking the transposes of both matrices A and B, but the transpose operations will be costly. Is there a better way, or an existing routine, for dense-sparse matrix multiplication?
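Whether or not a dedicated dense-times-sparse routine exists, the transpose workaround need not materialize any transposes: C = A·B is equivalent to Cᵀ = Bᵀ·Aᵀ, and (assuming CSR storage) the CSR arrays of Bᵀ are exactly the CSC arrays of B, while a row-major dense buffer read as column-major is already its own transpose. A NumPy sketch of the identity itself (dense stand-ins, not MKL code):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 7))          # dense
B = rng.standard_normal((7, 3))          # pretend this one is sparse
C = rng.standard_normal((5, 3))          # accumulator

# Direct form: C = A @ B + C  (dense * sparse, not what mkl_?csrmm offers)
direct = A @ B + C

# Transposed form usable with a sparse-first routine:
# C^T = B^T @ A^T + C^T, then transpose the result back.
via_T = (B.T @ A.T + C.T).T

assert np.allclose(direct, via_T)
```

In MKL terms this suggests feeding Bᵀ (i.e., B's CSC arrays reinterpreted as CSR) as the sparse operand and reading the row-major A buffer as a column-major Aᵀ; whether all copies can really be avoided depends on your storage formats.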

How to avoid redundant factorizations in Pardiso for a transient analysis


For a transient problem, [A]{x}_t = {b}_t, the RHS {b}_t is time-varying and depends on the previous {b}_{t-dt}, while [A] is constant. So during a complete analysis we need to perform the factorization only once, and reuse that factorization to solve the successive RHS vectors.

In this situation the multiple-RHS interface is not suitable (each {b}_t depends on {b}_{t-dt}, so the RHS vectors are not all known up front). I tried calling the complete Pardiso sequence (phase = 11, 22, 33, without executing phase = -1) at the first time step, and then executing only phase = 33 at later time steps. However, this did not work.

Do you have any suggestion?
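For what it's worth, factoring once with phases 11 and 22 and then repeating phase 33 per time step is the intended Pardiso pattern, provided the pt handle, iparm array, and matrix arrays are kept alive and unchanged between calls. As an illustration of the factor-once/solve-many structure only, here is the same pattern with SciPy's sparse LU standing in for Pardiso (the matrix and update rule are made up for the demo):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Constant system matrix [A]; factor it once (Pardiso phases 11 + 22).
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 4.0, 1.0],
                         [0.0, 1.0, 4.0]]))
lu = splu(A)

# Time stepping: each RHS depends on the previous one (phase 33 per step).
b = np.array([1.0, 0.0, 0.0])
for _ in range(5):
    x = lu.solve(b)        # reuse the factorization, no refactoring
    b = b + 0.1 * x        # hypothetical RHS update, just for the demo
```

The key point mirrored from Pardiso: the factorization object (here `lu`, there the pt/iparm state) must survive across all solve calls.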

Benchmarking GEMM on Intel® Architecture Processors


Math libraries, such as the Intel® Math Kernel Library (Intel® MKL) and the BLIS* framework, provide fast implementations of many frequently used math routines. In this article, we show how to measure the performance of SGEMM and DGEMM (single- and double-precision general matrix-matrix multiplication) using the implementations provided by Intel MKL and the BLIS* framework.
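GEMM performance is usually reported in GFLOPS, counting 2·m·n·k floating-point operations per call. A minimal Python timing sketch of that measurement (NumPy dispatches to whatever BLAS it was built against, which may or may not be MKL; sizes are arbitrary):

```python
import time
import numpy as np

m = n = k = 512
A = np.random.rand(m, k).astype(np.float32)
B = np.random.rand(k, n).astype(np.float32)

A @ B                       # warm-up call, excluded from timing
reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    C = A @ B               # SGEMM via NumPy's BLAS backend
elapsed = (time.perf_counter() - t0) / reps

gflops = 2.0 * m * n * k / elapsed / 1e9
print(f"{gflops:.1f} GFLOPS")
```

A serious benchmark would also pin threads, control CPU frequency, and report the best of several runs, as the article's dedicated harnesses do.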


Normalize matrix by sum of columns

I have a tensor, a batch of matrices with dims [10 x 6 x 52]: 10 matrices of 6 x 52, row-major. I can change the batch size as I want. The data type is single-precision float. I need to normalize every matrix in the tensor by its column sums (so the sum is a vector of length 52): compute a column-wise sum and divide every row of the matrix by it. A pretty typical task in different areas. Currently, I am doing something like this:


//[10 x 6 x 52] - [batch x actions x cards_count]

// node.regrets is target and source tensor.  node.regrets_sum - storage for sum.
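The fragment above is only comments; as a sketch of the operation they describe, in plain NumPy rather than MKL (the `regrets` name is taken from the comments, the data is random):

```python
import numpy as np

# [10 x 6 x 52] - [batch x actions x cards_count]; batch of row-major
# single-precision matrices, each to be normalized by its column sums.
regrets = np.random.rand(10, 6, 52).astype(np.float32)

col_sums = regrets.sum(axis=1, keepdims=True)    # shape [10 x 1 x 52]
normalized = regrets / col_sums                  # broadcast divide per row

# every column of every matrix now sums to 1
assert np.allclose(normalized.sum(axis=1), 1.0, atol=1e-5)
```

A real kernel would also guard against zero column sums before dividing.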

MKL linking process is very long in release mode (VS2016)

I have a static lib myLib.lib and an executable myExe.exe that uses myLib. MKL is enabled for myLib, with the "Sequential" model in the VS Intel Performance Library settings (for the exe project I have tried building both with and without MKL enabled; it doesn't seem to matter). In debug, everything builds fast (with or without MKL). In release without MKL the build is fast, but with MKL enabled the linking of myExe.exe takes very, very long, around 15 minutes (without MKL the whole build takes 1 minute). I have tried both the MS C++ compiler and linker and the Intel compiler and linker.
