Intel MKL 11.2 Release Notes

This document provides a general summary of new features and important notes about the Intel® Math Kernel Library (Intel® MKL) software product.

For the latest information regarding Intel MKL, please see the following online resources and documents:

Links to documentation, help, and code samples can be found on the main Intel MKL product page. For technical support, visit the Intel MKL technical support forum and review the articles in the Intel MKL knowledge base.

Please register your product using your preferred email address. This helps Intel recognize you as a valued customer in the support forum and ensures that you will be notified of product updates. You can read Intel's Online Privacy Notice Summary if you have any questions regarding the use of your email address for software product registration.

What's New in Intel MKL 11.2 Update 2

  • BLAS:
    • Improved ?GEMM performance for Intel® Xeon Phi™ coprocessors for cases where k >> m, k >> n.
    • Improved parallel and serial performance of ?HEMM/?SYMM for Intel® Advanced Vector Extensions 2 (Intel® AVX2) for the 64-bit Intel MKL.
    • Improved parallel and serial performance of ?HERK/?SYRK and ?HER2K/?SYR2K for Intel AVX2.
    • Added MKL_DIRECT_CALL support for CBLAS interfaces and ?GEMM3M routines.
    • Improved CGEMM performance for Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
    • Improved SGEMM and ZGEMM performance for AMD* Opteron* 6000 series processors.
    • Small performance improvement for CGEMM and ZGEMM for Intel AVX2 for the 64-bit Intel MKL.
  • LAPACK:
    • Improved symmetric eigensolver performance by up to 3x for cases where eigenvectors are not needed.
    • Improved ?GESVD performance by 2-3x for cases where singular vectors are required.
    • Improved ?GETRF performance for Intel AVX2 by up to 14x for non-square matrices.
    • Narrowed the ?GETRF performance gap between CNR (Conditional Numerical Reproducibility)-enabled and CNR-disabled cases. The gap is now below 5%.
    • Improved performance of Intel® Optimized LINPACK Benchmark shared memory (SMP) implementation for Intel AVX2 by up to 40%.
  • Parallel Direct Sparse Solver for Clusters:
    • Added the ability to overwrite the right-hand side vector with the solution when using the distributed CSR format.
    • Added the ability to gather the system solution on all compute nodes when using the distributed CSR format.
  • Intel® MKL PARDISO:
    • Significantly improved overall scalability for Intel Xeon Phi coprocessors.
    • Improved the scalability of the solving step for Intel Xeon processors.
    • Reduced memory footprint in the out-of-core mode.
    • Added the ability to free the memory used by the input matrix after the factorization step. This helps reduce memory consumption when iterative refinement is not needed and has been disabled by the user.
  • Extended Eigensolver:
    • Improved performance for Intel Xeon processors.
  • VSL:
    • Summary Statistics:
      • Improved performance of variance-covariance matrices computation and correlation matrices computation routines for cases when the task dimension is approximately equal to the number of observations.
    • RNG:
      • Improved performance of the Sobol and the Niederreiter Quasi-RNGs for Intel Xeon processors.
  • Convolution and correlation:
    • Improved 3D convolution performance.
  • Important note: Intel® MKL cluster support for IA-32 is now deprecated, and support will be removed starting with Intel® MKL 11.3

What's New in Intel MKL 11.2 Update 1

  • Introduced support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) on Intel® Xeon® processors for the Windows* and Linux* versions of Intel MKL. This is in addition to the current support for Intel® AVX-512 instructions for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
  • BLAS:
    • Optimized the following functions on Intel Xeon processors supporting the Intel AVX-512 instruction set:
      • (D/Z)AXPY, (S/D/C/Z)COPY, DTRMM (for cases where the triangular matrix is on the right side and not transposed)
    • Optimized the following BLAS Level 1 functions on Intel® Advanced Vector Extensions 2 (Intel® AVX2), for both the Intel® 64 and IA-32 architectures:
      • (S/D)DOT, (S/D)SCAL, (S/D)ROT, (S/D)ROTM, (S/D/C/Z)SWAP, (S/D/SC/DZ)ASUM
    • Improved ?GEMM performance (serial and multithreaded) on Intel AVX2 (for the IA-32 architecture)
    • Improved ?GEMM performance for beta == 0 on Intel® Advanced Vector Extensions (Intel® AVX) and Intel AVX2 (for the Intel 64 architecture)
    • Improved DGEMM performance (serial and multithreaded) on Intel AVX (for the Intel 64 architecture)
  • LAPACK:
    • Introduced support for LAPACK version 3.5. New features introduced in this version are:
      • Symmetric/Hermitian LDLT factorization routines with rook pivoting algorithm
      • 2-by-1 CSD for tall and skinny matrices with orthonormal columns
    • Improved performance of (C/Z)GE(SVD/SDD) when M>=N and singular vectors are not needed
  • FFT:
    • Introduced Automatic Offload mode for 1D Batch FFT on Intel MIC Architecture
    • Improved performance of Hybrid (OpenMP+MPI) Cluster FFT
    • Improved accuracy of large 1D real-to-complex transforms
  • Parallel Direct Sparse Solver for Clusters:
    • Added support for many factorization steps with the same reordering (maxfct > 1)
  • Intel MKL PARDISO:
    • Added support for Schur complement, including getting explicit Schur complement matrix and solving the system through Schur complement
  • Sparse BLAS:
    • Optimized SpMV on Intel Xeon processor supporting Intel AVX-512 Instruction set
    • Added Sparse Matrix Checker functionality as a standalone API to simplify validation of matrix structure and indices (see Sparse Matrix Checker Routines in the Intel® Math Kernel Library (Intel® MKL) Reference Manual)
    • The Sparse BLAS API for C/C++ now uses the const modifier for constant parameters
  • VML:
    • Introduced a new environment variable, MKL_VML_MODE, to control the accuracy behavior of VML functions (an environment-variable analog of the vmlSetMode() function)
    • Optimized all functions of VML on Intel Xeon Processors supporting Intel AVX-512 instruction set
  • VSL:
    • Optimized all Random Number Generators of VSL on Intel Xeon Processors supporting Intel AVX-512 instruction set
    • Parallelized the Sobol BRNG to improve performance for large dimensions
  • Important note: Intel® MKL cluster support for IA-32 is now deprecated, and support will be removed starting with Intel® MKL 11.3

What's New in Intel MKL 11.2

  • Intel MKL now provides optimizations for all Intel® Atom™ processors that support Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) and Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction sets
  • Introduced support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with limited optimizations in BLAS, DFT and VML
  • Introduced Verbose support for BLAS and LAPACK domains, which enables users to capture the input parameters to Intel MKL function calls
  • Introduced support for Intel® MPI Library 5.0
  • Introduced the Intel Math Kernel Library Cookbook (http://software.intel.com/en-us/mkl_cookbook), a new document that describes how to use Intel MKL routines to solve certain complex problems
  • Introduced the MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ compilation feature that provides ?GEMM small matrix performance improvements for all processors (see the Intel® Math Kernel Library User's Guide for more details)
  • Added the ability to link a Single Dynamic Library (mkl_rt) on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
  • Added a customizable error handler. See the description of mkl_set_exit_handler() in the Intel Math Kernel Library Reference Manual for further details
  • Extended the Intel® Xeon Phi™ coprocessor Automatic Offload feature with a resource sharing mechanism. See the Intel Math Kernel Library Reference Manual description of the mkl_mic_set_resource_limit() function and the MKL_MIC_RESOURCE_LIMIT environment variable for further details
  • Parallel Direct Sparse Solver for Clusters:
    • Introduced Parallel Direct Sparse Solver for Clusters, a distributed memory version of Intel MKL PARDISO direct sparse solver
    • Improved performance of the matrix gather step for distributed matrices
    • Enabled reuse of reordering information on multiple factorization steps
    • Added distributed CSR format, support of distributed matrices, RHS, and distributed solutions
    • Added support of solving of systems with multiple right hand sides
    • Added cluster support of factorization and solving steps
    • Added support for pure MPI mode and support for single OpenMP thread in hybrid configurations
  • BLAS:
    • Improved threaded performance of ?GEMM for all 64-bit architectures supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2)
    • Optimized ?GEMM, ?TRSM, DTRMM for the Intel AVX-512 instruction set
    • Improved performance of ?GEMM for outer product [large m, large n, small k] and tall skinny matrices [large m, medium n, small k] on Intel MIC Architecture
    • Improved performance of ?TRSM and ?SYMM in Automatic Offload mode on Intel MIC Architecture
    • Improved performance of Level 3 BLAS functions for 64-bit processors supporting Intel AVX2
    • Improved ?GEMM performance on small matrices for all processors when MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ is defined during compilation (see the Intel® Math Kernel Library User's Guide for more details)
    • Improved performance of DGER and DGEMM for the beta=1, k=1 case for 64-bit processors supporting Intel SSE4.2, Intel® Advanced Vector Extensions (Intel® AVX), and Intel AVX2 instruction sets
    • Optimized (D/Z)AXPY for the Intel AVX-512 instruction set
    • Optimized ?COPY for the Intel AVX2 and Intel AVX-512 instruction sets
    • Optimized DGEMV for the Intel AVX-512 instruction set
    • Improved performance of SSYR2K for 64-bit processors supporting Intel AVX and Intel AVX2
    • Improved threaded performance of ?AXPBY for all Intel processors
    • Improved DTRMM performance for the side=R, uplo={U,L}, transa=N, diag={N,U} cases for Intel AVX-512
  • LINPACK:
    • Improved performance of matrix generation in the heterogeneous Intel® Optimized MP LINPACK Benchmark for Clusters
    • Intel MIC Architecture offload option of the Intel Optimized MP LINPACK Benchmark for Clusters package now supports Intel AVX2 hosts
    • Improved performance of the Intel Optimized MP LINPACK for Clusters package for 64-bit processors supporting Intel AVX2
  • LAPACK:
    • Improved performance of ?(SY/HE)RDB
    • Improved performance of ?(SY/HE)(EV/EVD) when eigenvectors are needed
    • Improved performance of ?(SY/HE)(EV/EVR/EVD) when eigenvectors are not needed
    • Improved performance of ?GELQF, ?GELS, and ?GELSS for the underdetermined case (M < N)
    • Improved performance of ?GEHRD, ?GEEV, and ?GEES
    • Improved performance of NaN checkers in LAPACKE interfaces
    • Improved performance of ?GELSX, ?GGSVP
    • Improved performance of ?GETRF
    • Improved performance of (S/D)GE(SVD/SDD) when M>=N and singular vectors are not needed
    • Improved performance of ?POTRF UPLO=U in Automatic Offload mode on Intel MIC Architecture
    • Added Automatic Offload for ?SYRDB on Intel MIC Architecture, which speeds up ?SY(EV/EVD/EVR) when eigenvectors are not needed
  • PBLAS and ScaLAPACK:
    • Enabled Automatic Offload in P?GEMM routines for large distribution blocking factors
  • Sparse BLAS:
    • Optimized SpMV kernels for Intel AVX-512 instruction set
    • Added release example for diagonal format use in Sparse BLAS
    • Improved Sparse BLAS level 2 and 3 performance for systems supporting Intel SSE4.2, Intel AVX and Intel AVX2 instruction sets
  • Intel MKL PARDISO:
    • Added the ability to store Intel MKL PARDISO handle to the disk for future use at any solver stage
    • Added pivot control support for unsymmetric matrices and out-of-core mode
    • Added diagonal extraction support for unsymmetric matrices and out-of-core mode
    • Added example demonstrating use of Intel MKL PARDISO as iterative solver for non-linear systems
    • Added capability to free memory taken by original matrix after factorization stage if iterative refinement is disabled
    • Improved memory estimation of out-of-core (OOC) portion size for reordering algorithm leading to improved factorization-solve performance in OOC mode
    • Improved message output from Intel MKL PARDISO
    • Added support of zero pivot during factorization for structurally symmetric cases
  • Poisson library:
    • Added example demonstrating use of the Intel MKL Poisson library as a preconditioner for linear systems solves
  • Extended Eigensolver:
    • Improved message output
    • Improved examples
    • Added input and output iparm parameters in predefined interfaces for solving sparse problems
  • FFT:
    • Optimized FFTs for the Intel AVX-512 instruction set
    • Improved performance for non-power-of-2 length on Intel® MIC Architecture
  • VML: Added v[d|s]Frac function computing fractional part for each vector element
  • VSL RNG:
    • Added support of ntrial=0 in Binomial Random Number Generator
    • Improved performance of MRG32K3A and MT2203 BRNGs on Intel MIC Architecture
    • Improved performance of MT2203 BRNG on CPUs supporting Intel AVX and Intel AVX2 instruction sets
  • VSL Summary Statistics:
    • Added support for group/pooled mean estimates (VSL_SS_GROUP_MEAN/VSL_SS_POOLED_MEAN)
  • Data Fitting: Fixed incorrect behavior of the natural cubic spline construction function when number of breakpoints is 2 or 3
  • Introduced an Intel MKL mode that ignores all settings specified by Intel MKL environment variables
    • Users can enable this mode by calling the mkl_set_env_mode() routine, which directs Intel MKL to ignore all Intel MKL-specific environment variables such as MKL_NUM_THREADS, MKL_DYNAMIC, and MKL_MIC_ENABLE; the needed parameters can instead be set via Intel MKL service routines such as mkl_set_num_threads() and mkl_mic_enable()

Known Limitations:

  • Windows* only: Automatic Offload on Windows with large matrices may cause data corruption or a crash. This is caused by a problem in COI (HSD4868293, critical): COI cannot allocate a buffer with >= 2**32 bytes and 2M pages on Windows
    • Workaround: Set MKL_MIC_MAX_MEMORY=3G. Note: This issue is resolved in Intel® MPSS 3.3
  • The LU factorization routine (?getrf) may produce incorrect results or hang on 64-bit processors supporting Intel AVX2 when both dimensions are larger than 15000 and the number of threads is >= 14. This issue will be resolved in Intel MKL 11.2.1
    • Workaround: Use fewer than 14 threads for ?getrf
  • Scripts for the hybrid offload version of MP LINPACK (runme_offload_intel64.bat) do not work with Intel MPI Library 5.0
    • Workaround: Replace the PMI_RANK variable with PMI_ID inside runme_offload_intel64_prv.bat

Note: API symbols, the order of arguments, and the link line have changed since Intel MKL 11.2 Beta Update 2 (see the Intel® Math Kernel Library User's Guide for more details).

Note: Important deprecations are listed in Intel® Math Kernel Library (Intel® MKL) 11.2 Deprecations

Product Contents

Intel MKL is now delivered as a single package for both the IA-32 and Intel® 64 architectures, and is also available through an online installer.

Technical Support

If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.

For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.

Note: If your distributor provides technical support for this product, please contact them rather than Intel.

For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.

Attributions

As referenced in the End User License Agreement, attribution requires, at a minimum, prominently displaying the full Intel product name (e.g. "Intel® Math Kernel Library") and providing a link/URL to the Intel MKL homepage (http://www.intel.com/software/products/mkl) in both the product documentation and website.

The original versions of the BLAS from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/blas/index.html.

The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the LAPACK95 package at http://www.netlib.org/lapack95/index.html. All interfaces are provided for pure procedures.

The original versions of ScaLAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.

The Intel MKL Extended Eigensolver functionality is based on the FEAST Eigenvalue Solver 2.0 (http://www.ecs.umass.edu/~polizzi/feast/).

PARDISO (PARallel DIrect SOlver)* in Intel MKL was originally developed by the Department of Computer Science at the University of Basel (http://www.unibas.ch). It can be obtained at http://www.pardiso-project.org.

Some FFT functions in this release of Intel MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The Authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.

License Definitions

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright © 2002-2014, Intel Corporation. All rights reserved.

For more complete information about compiler optimizations, see our Optimization Notice.