Intel® Math Kernel Library Release Notes and New Features

This page provides the current Release Notes for Intel® Math Kernel Library. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2018

Update 1

What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018 Update 1

  • BLAS
    • Improved single precision and single precision complex Level 3 BLAS performance for Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support for AVX512_4FMAPS instructions
    • Improved irregular-shaped SGEMM performance on Intel® Xeon Phi™ processor x200
    • Added stack unwind support to internal Intel64 assembly kernels on Windows OS
    • Improved MKL_DIRECT_CALL DGEMM performance on Intel® Advanced Vector Extensions 2 (Intel® AVX2) for Intel and GNU C/C++ compilers
  • Sparse BLAS
    • Improved performance of Inspector-Executor mode of SpMV for CSR format
    • Improved performance of the SpMM routine for CSR format
    • Improved performance of Inspector-Executor mode of SpMV for BSR format in the Intel® TBB threading layer
  • LAPACK
    • Improved performance of ?(OR|UN)GQR, ?GEQR and ?GEMQR routines in Intel® TBB threading layer
    • Introduced the LAPACKE_set_nancheck routine for disabling/enabling NaN checks in LAPACKE functions
  • ScaLAPACK
    • Added optimizations (2-stage band reduction algorithm) for pdsyevr/pzheevr for JOBZ=’N|V’ and RANGE=’A’. The new algorithm is enabled for N>=4000 and for appropriate process grids; otherwise the traditional algorithm is used. The best speed-up is expected for larger matrices.
  • FFT
    • Improved performance of batched real-to-complex 3D transforms on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
    • Improved performance with and without scaling factor across all domains.
  • Sparse Solvers
    • Improved Intel Pardiso performance for small matrices with (iparm(24)=10)
  • Vector Mathematics
    • The default behavior has changed for unmasked exception handling. By default, all floating-point exceptions are now masked before any internal MKL VM computation, whereas until now exceptions unmasked by the user applied to internal computations as well. As a new feature, the user can employ four newly added modes (VML_TRAP_INVALID, VML_TRAP_DIVBYZERO, VML_TRAP_OVERFLOW, and VML_TRAP_UNDERFLOW) to trap on unmasked exceptions raised during internal computation of vector math functions.
  • Data Fitting and Vector Statistics
    • Introduced TBB-threading layer in MKL Data Fitting and Vector Statistics components
  • Library Engineering
    • Added pkg-config files to simplify compilation of applications and libraries with MKL.
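
For readers unfamiliar with the kernel behind the Sparse BLAS items above, the following is a minimal plain-Python sketch of a sparse matrix-vector product (SpMV) over the CSR format. It only illustrates the arithmetic. It is not Intel MKL code and does not use the Intel MKL API; the function name `csr_spmv` and the zero-based three-array layout (`row_ptr`, `col_idx`, `values`) are assumptions made for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in zero-based CSR form.

    row_ptr has one entry per row plus a final end marker; the nonzeros of
    row i are values[row_ptr[i]:row_ptr[i + 1]] with matching column indices
    in col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# 3x3 example matrix:
#   [[4, 0, 1],
#    [0, 3, 0],
#    [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

An inspector-executor interface typically analyzes the sparsity pattern once and reuses that analysis across many multiplications; the loop above performs no such analysis and is meant only as a reference for the result being computed.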

Known limitation:

Data Fitting and Vector Statistics: Oversubscribed mode is not supported in this release. Do not set the number of TBB threads higher than the number of logical cores.
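
One way to stay within this limitation is to cap the requested thread count at the number of logical cores before configuring the threading runtime. The helper below is a hypothetical sketch: the name `cap_thread_count` is not part of Intel MKL or Intel TBB, and it only computes the value. How that value is passed to the runtime depends on the application.

```python
import os


def cap_thread_count(requested, logical_cores=None):
    """Return a thread count that never exceeds the logical core count."""
    if requested < 1:
        raise ValueError("requested thread count must be at least 1")
    if logical_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        logical_cores = os.cpu_count() or 1
    return min(requested, logical_cores)


# Asking for 64 threads on a machine assumed to have 16 logical cores.
threads = cap_thread_count(64, logical_cores=16)
```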

Initial Release

What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018

  • BLAS Features:
    • Introduced compact GEMM and TRSM functions (mkl_{s,d,c,z}gemm_compact and mkl_{s,d,c,z}trsm_compact) to work on groups of matrices in compact format and service functions to support the new format
    • Introduced optimized integer matrix-matrix multiplication routines GEMM_S8U8S32 and GEMM_S16S16S32 to work with quantized matrices for all architectures.
  • BLAS Optimizations: 
    • Optimized SGEMM and SGEMM packed for Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instructions
    • Optimized GEMM_S8U8S32 and GEMM_S16S16S32 for AVX2, AVX512 and Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups
  • Deep Neural Network:
    • Added support for non-square pooling kernels
    • Improved performance of large non-square kernels on Intel® Xeon Phi™ processors
    • Optimized conversions between plain (nchw, nhwc) and internal data layouts
  • LAPACK:
    • Added the following improvements and optimizations for small matrices (N<16):
      • Direct Call feature extended with Cholesky and QR factorizations providing significant performance boost
      • Introduced LU and Inverse routines without pivoting with significantly better performance: mkl_?getrfnp and mkl_?getrinp
      • Introduced Compact routines for much faster solving of multiple matrices packed together: mkl_?getr[f|i]np_compact, mkl_?potrf_compact and mkl_?geqrf_compact
    • Added ?gesvd, ?geqr/?gemqr, ?gelq/?gemlq optimizations for tall-and-skinny/short-and-wide matrices
    • Added optimizations for ?pbtrs routine
    • Added optimizations for ?potrf routine for Intel® Threading Building Blocks layer
    • Added optimizations for CS decomposition routines: ?orcsd and ?orcsd2by1
    • Introduced factorization and solve routines based on Aasen's algorithm: ?sytrf_aa/?hetrf_aa, ?sytrs_aa/?hetrs_aa
    • Introduced new (faster) _rk routines for symmetric indefinite (or Hermitian indefinite) factorization with the bounded Bunch-Kaufman (rook) pivoting algorithm
  • ScaLAPACK:
    • Added optimizations (2-stage band reduction) for p?syevr/p?heevr routines for JOBZ=’N’ (eigenvalues only) case
  • FFT:
    • Introduced Verbose support for FFT domain, which enables users to capture the FFT descriptor information for Intel MKL
    • Improved performance for 2D real-to-complex and complex-to-real for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
    • Improved performance for 3D complex-to-complex for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
  • Intel® Optimized High Performance Conjugate Gradient Benchmark:
    • New version of benchmark with Intel® MKL API
  • Sparse BLAS:
    • Introduced Symmetric Gauss-Seidel preconditioner
    • Introduced Symmetric Gauss-Seidel preconditioner with ddot calculation of the resulting and initial arrays
    • Added Sparse Matvec routine with ddot calculation of the resulting and initial arrays
    • Added Sparse Syrk routine with both OpenMP and Intel® Threading Building Blocks support
    • Improved performance of Sparse MM and MV functionality for Intel® AVX-512 Instruction Set
  • Direct Sparse Solver for Cluster:
    • Added support for the transpose solver
  • Vector Mathematics:
    • Added 24 new functions: v?Fmod, v?Remainder, v?Powr, v?Exp2, v?Exp10, v?Log2, v?Logb, v?Cospi, v?Sinpi, v?Tanpi, v?Acospi, v?Asinpi, v?Atanpi, v?Atan2pi, v?Cosd, v?Sind, v?Tand, v?CopySign, v?NextAfter, v?Fdim, v?Fmax, v?Fmin, v?MaxMag, and v?MinMag, including optimizations for processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Data Fitting:
    • Cubic spline-based interpolation in the ILP64 interface was optimized by up to 8x on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and 2.5x on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
  • Documentation:
    • Starting with this version of Intel® MKL, most of the documentation for Parallel Studio XE is only available online at https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation. You can also download it from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation
  • Intel continually evaluates the markets for our products in order to provide the best possible solutions to our customers’ challenges. As part of this ongoing evaluation process, Intel has decided to not offer Intel® Xeon Phi™ 7200 Coprocessor (codenamed Knights Landing Coprocessor) products to the market.
    • Given the rapid adoption of Intel® Xeon Phi™ 7200 processors, Intel has decided to not deploy the Knights Landing Coprocessor to the general market.
    • Intel® Xeon Phi™ Processors remain a key element of our solution portfolio for providing customers the most compelling and competitive solutions possible.
  • Support for the Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) is removed in this release. The Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) was officially announced end of life in January 2017.  As part of the end of life process, the support for this family will only be available in the Intel® Parallel Studio XE 2017 version.  Intel® Parallel Studio XE 2017 will be supported for a period of 3 years ending in January 2020 for the Intel® Xeon Phi™ x100 product family.  Support will be provided for those customers with active support.
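
The quantized integer matrix multiplication listed under BLAS Features can be pictured with a small reference sketch: 8-bit operands are multiplied and the products are accumulated in 32-bit integers, as the name GEMM_S8U8S32 suggests. This plain-Python version is an illustration only. The function name `gemm_int8_ref` is hypothetical, and the sketch assumes one signed and one unsigned 8-bit operand while leaving out any scaling factors, offsets, and layout options that a real routine may take.

```python
def gemm_int8_ref(a, b):
    """Multiply an m x k signed 8-bit matrix by a k x n unsigned 8-bit matrix.

    Matrices are lists of rows. Each result entry is an exact integer sum
    that is checked to fit in a signed 32-bit accumulator.
    """
    m, k, n = len(a), len(b), len(b[0])
    for row in a:
        assert len(row) == k and all(-128 <= v <= 127 for v in row)
    for row in b:
        assert len(row) == n and all(0 <= v <= 255 for v in row)

    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            if not -2**31 <= acc < 2**31:
                raise OverflowError("sum does not fit in 32 bits")
            c[i][j] = acc
    return c


a = [[-128, 127], [1, -2]]   # signed 8-bit values
b = [[255, 0], [255, 1]]     # unsigned 8-bit values
c = gemm_int8_ref(a, b)
```

A wider accumulator is needed because a single product such as -128 * 255 already lies far outside the 8-bit range.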
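
The symmetric Gauss-Seidel preconditioner listed under Sparse BLAS is built from a forward sweep over the rows followed by a backward sweep. The sketch below shows that idea in plain Python for a zero-based CSR matrix. It is not the Intel MKL routine: the function name `symgs_sweep` and the array names are assumptions, and the optional dot-product output mentioned in the notes is not modeled.

```python
def symgs_sweep(row_ptr, col_idx, values, b, x):
    """Apply one forward and one backward Gauss-Seidel sweep to x in place."""
    n = len(row_ptr) - 1

    def update(i):
        # Solve row i for x[i], using the latest values of the other unknowns.
        diag = None
        acc = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            else:
                acc -= values[k] * x[j]
        if diag is None or diag == 0.0:
            raise ZeroDivisionError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag

    for i in range(n):              # forward sweep
        update(i)
    for i in range(n - 1, -1, -1):  # backward sweep
        update(i)
    return x


# Symmetric positive definite example: [[4, 1], [1, 3]] x = [1, 2]
row_ptr = [0, 2, 4]
col_idx = [0, 1, 0, 1]
values = [4.0, 1.0, 1.0, 3.0]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(50):
    symgs_sweep(row_ptr, col_idx, values, b, x)
```

Repeating the sweep here only demonstrates that it converges toward the solution of the small system; as a preconditioner inside an iterative solver, a single sweep per iteration is the more common usage.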

Product Content

Intel MKL can be installed as a part of the following suite:

Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and an online installer

Known Issues

  • Convolution primitives for the forward pass may return incorrect results or crash when the input spatial dimensions are smaller than the kernel spatial dimensions on Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Intel® MKL FFT – complex-to-complex in-place batched 1D FFT with transposed output returns incorrect output
  • Intel® ScaLAPACK may fail with OpenMPI* 1.6.1 and later releases due to a known OpenMPI* issue: https://github.com/open-mpi/ompi/issues/3937. As a workaround, please avoid using OpenMPI.
  • Intel® VML functions may raise spurious FP exceptions even if the (default) VML_ERRMODE_EXCEPT is not set. Recommendation: do not unmask FP exceptions before calling VML functions.
  • When an application uses Vector Math functions with the single dynamic library (SDL) interface combined with the TBB threading layer, the application may generate the runtime error “Intel MKL FATAL ERROR: Error on loading function mkl_vml_serv_threader_c_1i_2o.”
For more complete information about compiler optimizations, see our Optimization Notice.