Intel® Math Kernel Library

Announcing new open source project Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)

Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is now available on GitHub as an open source performance library for Deep Learning (DL) applications, intended to accelerate DL frameworks on Intel® architecture. Intel MKL-DNN includes highly vectorized and threaded building blocks for implementing convolutional neural networks (CNNs), with C and C++ interfaces.

Intel® MKL 11.3.3 patch

Two limitations of Intel® Math Kernel Library (Intel® MKL) 11.3 Update 3, listed below, were discovered recently. The official fix for these issues will be available in the next update, Intel MKL 11.3.4.

If you require an immediate Intel MKL update to address these issues, please submit a ticket at Intel Premier Support for the Intel MKL product.

Known Limitations: 

Deep Neural Network extensions for Intel MKL

    Deep neural network (DNN) applications are growing in importance in various areas, including internet search engines, retail, and medical imaging. Intel recognizes the importance of these workloads and is developing software solutions to accelerate these applications on Intel architecture; these will become available in future versions of Intel® Math Kernel Library (Intel® MKL) and Intel® Data Analytics Acceleration Library (Intel® DAAL).

    While we work on this new functionality, we have published a series of articles demonstrating DNN optimizations with the Caffe framework and the AlexNet topology:

    jit_gemm_convolution bwd data is too slow

    Hi, I encountered a performance issue with jit_gemm_convolution. I have one convolution primitive whose input has stride_w = 2 and jcp.t_pad = 3, so it cannot take the avx512 or avx2 path and falls back to the jit_gemm_convolution path. However, our workload deals with a small batch and large inputs, e.g. 2*3*2240*2240 (batch size = 2) on GoogLeNet v1, running on Xeon Phi (68 cores). In jit_gemm_convolution bwd data execute, the work is split into 2 threads, each thread handling one batch element (3*2240*2240), so it is very slow (sgemm and col2im run on only two cores).

    Visual Studio 2017 support for MKL? Update 3 timeline?


    Is it correct that the latest MKL Update 2 doesn't support MSVS 2017 yet? If so, is there any time estimate for when support might become available? Is it planned for Update 3, or earlier, or later, or is that unknown at this point?

    I think that many MKL users (companies), including myself, are holding off on the upgrade to VS2017 because of MKL :)

    Thanks, Oleg

    MKL LAPACK parallel subroutine

    Hello all

    I am using the LAPACK subroutine 'dgelsd' to compute the linear least squares solution of the system min ||Ax-b||. For that I have used the Intel MKL parallel library. When I run my code, I can see that only 57% of the total CPU is used. Also, setting the number of threads for MKL has no effect. For that I used

    call mkl_set_num_threads( 32 )

    I am working on a workstation whose specs are given below:

    Beginner's guide to compiling MKL in Windows 10 without a full VS2015 installation

    I would like to share my experience compiling MKL in Windows 10 without a full VS2015 installation; hopefully this will save some unnecessary trial and error. Since I was doing it on my Intel laptop, which has very precious and limited SSD space (unless you charge your department for an SSD upgrade), I could not afford the full (>16 GB) VS2015 installation on my precious SSD.

    Here are the steps:

    Layout of structure dnnLayout_t in MKL DNN routines

    Good evening, 

    I am starting to use the deep neural network routines in MKL. The definition of each node (weights, etc.) is stored in a variable of type dnnLayout_t, which is opaque. But how can I save or load a trained network if I cannot extract the weights? So I need the internal layout of that type, or a way to access it.

    Thanks in advance,


    In the include file, the type is defined as follows:
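Rather than decoding the opaque type, one approach is to convert the weights into a plain user-defined layout the application controls, and serialize that buffer. A sketch against the MKL DNN primitives API (`dnnLayoutCreate_F32`, `dnnConversionCreate_F32`, and `dnnConversionExecute_F32` are its layout-conversion entry points; the helper below is hypothetical, error handling omitted, and it requires linking against MKL):

```c
#include "mkl_dnn.h"

/* Hypothetical helper: copy weights held in an opaque internal layout
 * (lt_internal) into a plain buffer described by sizes/strides, so the
 * application can write it to disk in a known format. */
void extract_weights(dnnLayout_t lt_internal, float *internal_buf,
                     float *plain_buf, size_t ndims,
                     const size_t sizes[], const size_t strides[])
{
    dnnLayout_t lt_plain = NULL;
    dnnPrimitive_t cvt = NULL;

    /* a user layout fully described by sizes and strides */
    dnnLayoutCreate_F32(&lt_plain, ndims, sizes, strides);

    /* conversion primitive: internal layout -> plain layout */
    dnnConversionCreate_F32(&cvt, lt_internal, lt_plain);
    dnnConversionExecute_F32(cvt, internal_buf, plain_buf);

    /* plain_buf now holds the weights in a known layout; at load time,
     * run the conversion in the opposite direction */
    dnnDelete_F32(cvt);
    dnnLayoutDelete_F32(lt_plain);
}
```

This sidesteps the need to know the internal representation at all: the internal layout stays opaque, and only the conversion primitives touch it.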
