Intel MKL 11.1 Release Notes

This document provides a general summary of new features and important notes about the Intel® Math Kernel Library (Intel® MKL) software product.

Please see the following links to the online resources and documents for the latest information regarding Intel MKL:

Links to documentation, help, and code samples can be found on the main Intel MKL product page. For technical support visit the Intel MKL technical support forum and review the articles in the Intel MKL knowledge base.

Please register your product using your preferred email address. This helps Intel recognize you as a valued customer in the support forum and ensures that you will be notified of product updates. You can read Intel's Online Privacy Notice Summary if you have any questions regarding the use of your email address for software product registration.

What's New in Intel MKL 11.1 Update 2

  • Intel® MKL now provides optimizations for all Intel® Atom™ processors that support the SSE4.1 and SSE4.2 instruction sets
  • BLAS:
    • Improved performance of ?GEMM for m=1 or n=1 on all Intel architectures
    • Improved MP LINPACK performance for systems using Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Improved performance of ?GEMM for outer product [large m, large n, small k] and tall skinny matrices [large m, medium n, small k] on Intel MIC Architecture
    • Improved performance of ?SYMM on Intel MIC Architecture
    • Improved {S/D}GEMM single thread performance on small matrices for 64-bit processors supporting Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Vector Extensions 2 (Intel® AVX2)
    • Improved DGEMV performance for 64-bit processors supporting Intel AVX2
    • Improved threaded performance of ?GEMV for notrans:n>>m and trans:m>>n on all Intel architectures
    • Improved DSYR2K performance for 64-bit processors supporting Intel AVX and Intel AVX2
    • Improved DTRMM performance on small matrices (matrix A size <= 10) for 64-bit processors supporting Intel AVX and Intel AVX2
    • Reduced stack usage for ZHEMM and ZSYRK
    • Added more detailed error messages for running Offload MP LINPACK scripts with unsupported configurations
  • LAPACK:
    • Improved performance of (S/D)SYRDB and (S/D)SYEV for large dimensions and UPLO=L when eigenvectors are needed
    • Improved performance of ?GELQF, ?GELS, and ?GELSS for the underdetermined case (M less than N)
    • Improved performance of ?GEHRD, ?GEEV, and ?GEES
    • Added Automatic Offload to Intel Xeon Phi Coprocessor for DSYRDB UPLO=L
  • Sparse BLAS:
    • Optimized Sparse Matrix Vector Multiply kernels for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set
    • Improved Sparse BLAS level 2 and 3 performance for systems supporting Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2), Intel AVX and Intel AVX2 instruction sets
  • Intel MKL PARDISO:
    • Improved memory estimation of out-of-core portion size for reordering algorithm leading to improved factorization-solving performance in OOC mode
  • VML:
    • Added v[d|s]Frac function computing fractional part for each vector element
  • VSL RNG:
    • Improved performance of MRG32K3A and MT2203 BRNGs on Intel Xeon Phi processors
    • Improved performance of MT2203 BRNG on CPUs supporting Intel AVX and Intel AVX2 instruction sets
  • VSL Summary Statistics:
    • Added support for computation of group/pooled mean estimates (VSL_SS_GROUP_MEAN/VSL_SS_POOLED_MEAN)

     

  • Known Limitations:
    • Linux* OS only: The Intel MKL single dynamic library libmkl_rt.so does not conform to the gfortran calling convention for functions returning COMPLEX values. An application compiled with gfortran and linked with libmkl_rt.so might crash if it calls the following functions:
      • BLAS: CDOTC, CDOTU, CDOTCI, CDOTUI, ZDOTC, ZDOTU
      • LAPACK: CLADIV, ZLADIV

      Workaround: Use the gfortran options “-ff2c -fno-second-underscore” when building the entire application, or use the static or dynamic linking options instead of the Single Dynamic Library

    • Windows* OS only: Automatic Offload on Windows with large matrices may cause data corruption or a crash. This is caused by a critical COI issue (HSD4868293): COI cannot allocate a buffer of 2**32 bytes or more with 2M pages on Windows
      • Workaround: Set MKL_MIC_MAX_MEMORY=3G until the COI issue is resolved

  • Note: If LAPACKE_ssyevd fails to evaluate a matrix, the workaround is to use LAPACKE_ssyev
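
The MKL_MIC_MAX_MEMORY workaround above is an environment setting, so it has to be in place before the application that uses Automatic Offload starts. A minimal sketch of one way to do that from a Python launcher script is shown below; the helper name `offload_env` is hypothetical, and only the variable name and the value 3G come from these notes.

```python
import os


def offload_env(base_env=None, max_memory="3G"):
    """Return a copy of an environment with MKL_MIC_MAX_MEMORY set.

    base_env defaults to the current process environment. The original
    mapping is not modified. (Hypothetical helper, not part of Intel MKL.)
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_MIC_MAX_MEMORY"] = max_memory
    return env


# Build an environment for a child process from a small example mapping.
example_env = offload_env({"PATH": "C:\\app"})
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when starting the application that links against Intel MKL.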

    • Important Notices:
      • Visual Studio* 2008* is deprecated
        • Support for Visual Studio 2008* has been deprecated and will be removed in a future release
      • Windows XP* is deprecated
        • Support for Windows XP has been deprecated and will be removed in a future release
      • Windows Server 2003* and Windows Vista* not supported
        • Support has been removed for installation and use on Windows Server 2003 and Windows Vista. Intel recommends migrating to a newer version of these operating systems

     

    What's New in Intel MKL 11.1 Update 1

    • Introduced support for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with limited set of optimizations in BLAS, DFT and VML
    • Added support for Microsoft* Visual Studio* 2013
    • BLAS:
      • Improved performance of DSDOT and added support for multiple threads on all 64-bit Intel processors supporting Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Vector Extensions 2 (Intel® AVX2)
      • Improved handling of denormals on the diagonal in *TRSM
      • Improved SGEMM performance for small N and large M and K on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
      • Improved parallel performance of *HEMM on all Intel processors supporting Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) and later
      • Improved parallel performance of 64-bit *SYRK/*HERK on all Intel processors supporting Supplemental Streaming SIMD Extensions 3 (SSSE3) and later
      • Improved serial performance of 64-bit (S/D)SYRK on all Intel processors supporting Intel SSE4.2 and later
      • Improved performance of DTRSM on Intel MIC Architecture
      • Enhanced Intel® Optimized HPL Benchmark runmultiscript capabilities for Intel processors supporting Intel AVX
      • Improved Intel Optimized HPL Benchmark performance on Intel MIC Architecture
    • LAPACK:
      • Decreased memory utilization for parallel LAPACK functions *(OR/UN)MQR, *(OR/UN)MRQ, *(OR/UN)MQL, and *(OR/UN)MLQ
      • Decreased stack memory utilization in LAPACK functions
      • Improved performance of (S/D)SYRDB and (S/D)SYEV for large dimensions when only eigenvalues are needed
    • VML:
      • Improved performance of the vector math complex functions v(c|z)(Exp|Ln|Sqrt)_(HA|LA|EP) on Intel AVX, Intel AVX2, and Intel MIC Architecture
    • VSL:
      • Added Skip-Ahead support in MT19937 and SFMT19937 Basic Random Number Generators
      • Changed the behavior of the UniformBits() random number generator for SFMT19937: a call to the generator with vector length n now returns n 32-bit unsigned integers, whereas earlier versions returned 4n 32-bit unsigned integers. To generate the same sequence of random numbers as in previous versions, replace parameter n with 4n
    • Data Fitting:
      • Added query service functionality df?QueryPtr(), dfiQueryVal(), df?QueryIdxPtr()
      • Added DF_INTERP_USER_CELL computation type to df?interpolate1d()/df?interpolateex1d(); these functions support computations given user-provided cell indices in parameter cell
    • ScaLAPACK: Updated PBLAS header files to support both the default NETLIB and Intel MKL complex datatypes
    • Transposition: Improved performance of mkl_?omatcopy routines on tall and skinny matrices
    • Setting the NUMBER_OF_USER_THREADS parameter when using MKL DFT from parallel regions is now optional; the Intel MKL DFTI interface and thread-safe wrappers are now thread safe by default
    • Known Limitations on Intel MIC Architecture:
      • DSYGVD returns incorrect results when using selected numbers of threads
      • SVD generates a segmentation fault in multithreading mode
      • Symmetric/Hermitian matrix-vector multiplication routines ?SYMV/?HEMV produce incorrect results for some values of MKL_NUM_THREADS
      • Workaround: Use a number of threads which is a multiple of 4
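
The multiple-of-4 workaround can be applied mechanically when the thread count is computed by a script. The sketch below is only an illustration: the helper names are hypothetical, and rounding down (rather than up) is an assumption made here to avoid oversubscribing the coprocessor. Only MKL_NUM_THREADS and the multiple of 4 come from these notes.

```python
def mic_thread_count(requested, multiple=4):
    """Largest multiple of `multiple` that does not exceed `requested`.

    Requests smaller than `multiple` are raised to `multiple`, since that is
    the smallest count that satisfies the workaround.
    """
    if requested < 1:
        raise ValueError("requested must be a positive integer")
    return max(multiple, (requested // multiple) * multiple)


def with_mkl_threads(env, requested):
    """Return a copy of `env` with MKL_NUM_THREADS set to a multiple of 4."""
    out = dict(env)
    out["MKL_NUM_THREADS"] = str(mic_thread_count(requested))
    return out
```

For instance, a request for 61 threads would be reduced to 60, while a request for 240 would be left unchanged.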

    Note: Introduced support for Intel® Manycore Platform Software Stack (Intel® MPSS) 3.1 for Linux and Windows with Intel® Fortran Compiler 14.0, Intel® C++ Compiler 14.0, and Intel MKL 11.1. If you are using an earlier version of Intel Fortran Compiler, Intel C++ Compiler, or Intel MKL and you want to use Intel MPSS 3.1, you might have to migrate to Intel Fortran Compiler 14.0, Intel C++ Compiler 14.0, or Intel MKL 11.1
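
The UniformBits() change for SFMT19937 listed under VSL above comes down to simple arithmetic on the vector length. The sketch below illustrates it with Python's standard `random` module as a stand-in 32-bit word source; it does not call Intel MKL and does not reproduce SFMT19937 output, and both function names are hypothetical.

```python
import random


def _words(seed, count):
    """Stand-in generator: `count` 32-bit unsigned integers from a seeded stream."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]


def uniform_bits_old(seed, n):
    """Earlier convention: vector length n yields 4n 32-bit integers."""
    return _words(seed, 4 * n)


def uniform_bits_new(seed, n):
    """Convention since 11.1 Update 1: vector length n yields n 32-bit integers."""
    return _words(seed, n)


# Replacing n with 4n in the new convention reproduces the old sequence.
same_sequence = uniform_bits_new(7, 4 * 5) == uniform_bits_old(7, 5)
```

Code that called the generator with length n in an earlier release would therefore pass 4n after upgrading if it needs an identical sequence.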

     

    What's New in Intel MKL 11.1

    • Conditional Numerical Reproducibility: Introduced support for Conditional Numerical Reproducibility (CNR) mode on unaligned data
    • Introduced MP LINPACK support for heterogeneous clusters - clusters whose nodes differ from each other, either by processor type or by having varying number of attached Intel® Xeon Phi™ coprocessors
    • Intel MKL now supports the compiler-assisted offload and Automatic Offload programming models on Intel Xeon Phi™ coprocessors based on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture) on Windows* OS
    • Introduced Clang compiler support on OS X*
    • Improved performance of CNR=AUTO mode on recent AMD* systems
    • BLAS:
      • Improved performance of [S/D]GEMV on all Intel processors supporting Intel® SSE4.2 and later
      • Optimized [D/Z]GEMM and double-precision Level 3 BLAS functions on Intel® Advanced Vector Extensions 2 (Intel® AVX2)
      • Optimized [Z/C]AXPY and [Z/C]DOT[U/C] on Intel® Advanced Vector Extensions (Intel® AVX) and Intel AVX2
      • Optimized sequential version of DTRMM on Intel MIC Architecture
      • Tuned DAXPY on Intel AVX2
    • LAPACK:
      • Improved performance of (S/D)SYRDB and (S/D)SYEV for large dimensions when only eigenvalues are needed
      • Improved performance of xGESVD for small sizes like M,N<10
      • Improved performance of xGETRF and Intel SMP Linpack in Automatic Offload to Xeon Phi mode
    • VSL:
      • Added support and examples for mean absolute deviation
      • Improved performance of Weibull Random Number Generator (RNG) for alpha=1
      • Added support of raw and central statistical sums up to the 4th order, matrix of cross-products and median absolute deviation
      • Added a VSL example designed by S. Joe and F. Y. Kuo illustrating usage of the Sobol QRNG with direction numbers, which supports dimensions up to 21,201
      • Improved performance of SFMT19937 Basic Random Number Generator (BRNG) on Intel MIC Architecture
    • DFT:
      • Improved performance of double precision complex-to-complex transforms on Intel MIC Architecture
      • Optimized complex-to-complex DFT on Intel AVX2
      • Optimized complex-to-complex 2D DFT on Intel® Xeon processor E5 v2 series (code named IvyTown)
      • Improved performance for workloads specific to GENE application on Intel Xeon E5-series (Intel AVX) and on Intel AVX2
      • Improved documentation of data layout for DFTI compute functions
      • Introduced scaling in large real-to-complex FFTs
    • Data Fitting:
      • Improved performance of df?Interpolate1D and df?SearchCells1D functions on Intel Xeon processors and Intel MIC Architecture
      • Improved performance of df?construct1d function for linear and Hermite/Bessel/Akima cubic types of splines on Intel MIC Architecture, Intel® Xeon® processor X5570 and Intel® Xeon® processor E5-2690
    • Transposition
      • Improved performance of in-place transposition for square matrices
    • Examples and tests for using Intel MKL are now packaged as an archive to shorten the installation time
    • Link Tool and Link Line advisor: Added support for Intel MIC Architecture on Windows* OS
    • Support for Microsoft Visual Studio 2008* Deprecated
    • Issue displaying documentation with Microsoft Visual Studio 2012* and Windows Server 2012*:

      If on Windows Server 2012* you find that you cannot display help or documentation from within Visual Studio 2012*, correcting a security setting for Microsoft Internet Explorer* usually corrects the problem. From Tools > Internet Options > Security, change the settings for Internet Zone to allow “MIME Sniffing” and “Active Scripting”

    Important Notices:

    • Intel MKL now provides a choice of components to install. Components necessary for PGI compiler, Compaq Visual Fortran Compiler, SP2DP interface and Cluster support (ScaLAPACK and Cluster DFT) are not installed unless explicitly selected during installation
    • Unaligned CNR is not available for MKL Cluster components (ScaLAPACK and Cluster DFT)
    • Examples for using Intel MKL with BOOST/uBLAS and Java have been removed from the product distribution and placed in the following articles:
    • Known Issue:
      • A user application on OS X* linked with the libmkl_rt.so library crashes with a segfault, or with either of the following messages, if the first call to Intel MKL is made in a parallel section:

        “malloc: *** error for object xxxxx: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug”

        or

        “malloc: *** error for object xxxxx: double free !!! *** set a breakpoint in malloc_error_break to debug”

        Workaround: call any Intel MKL function before the parallel section

      • There is no LAPACK Automatic Offload (AO) on Windows* OS for the ilp64 interface. To enable LAPACK AO, use 32-bit integers and the lp64 interface
      • Service functions such as mkl_mic_enable and mkl_mic_set_offload_report are missing in mkl_rt, so settings for AO on Windows* OS can be controlled only via environment variables
      • SGEMM and DGEMM on Intel MIC Architecture crash with a segfault if TRANSB is 'N' and the right border of the matrix B is aligned to a page boundary. The workaround is to allocate extra memory for B
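
The last workaround above says only to allocate extra memory for B. It does not say how much. The sketch below shows one way to over-allocate a buffer using the standard `ctypes` module; the helper name and the default padding of 512 elements (4096 bytes for doubles) are assumptions made for illustration, not values taken from these notes.

```python
import ctypes


def alloc_matrix_with_padding(rows, cols, ctype=ctypes.c_double, pad_elems=512):
    """Allocate rows*cols elements plus `pad_elems` unused trailing elements.

    Returns the zero-initialized buffer and the number of elements that
    belong to the matrix itself. The padding size is an assumed value.
    """
    n = rows * cols
    buf = (ctype * (n + pad_elems))()
    return buf, n


# A 100 x 50 double-precision matrix B with trailing padding.
b_buf, b_len = alloc_matrix_with_padding(100, 50)
```

Only the first `b_len` elements hold matrix data; the trailing elements exist so that the end of B does not coincide with the end of the allocation.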

    Product Contents

    The Intel Math Kernel Library (Intel MKL) consists of two installation packages: one package for both IA-32 and Intel® 64 architectures, and one for the online installer.

    Technical Support

    If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.

    For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.

    Note: If your distributor provides technical support for this product, please contact them rather than Intel.

    For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.

    Attributions

    As referenced in the End User License Agreement, attribution requires, at a minimum, prominently displaying the full Intel product name (e.g. "Intel® Math Kernel Library") and providing a link/URL to the Intel MKL homepage (http://www.intel.com/software/products/mkl) in both the product documentation and website.

    The original versions of the BLAS from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/blas/index.html.

    The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the LAPACK95 package at http://www.netlib.org/lapack95/index.html. All interfaces are provided for pure procedures.

    The original versions of ScaLAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.

    The Intel MKL Extended Eigensolver functionality is based on the FEAST Eigenvalue Solver 2.0: http://www.ecs.umass.edu/~polizzi/feast/

    PARDISO (PARallel DIrect SOlver)* in Intel MKL was originally developed by the Department of Computer Science at the University of Basel http://www.unibas.ch . It can be obtained at http://www.pardiso-project.org.

    Some FFT functions in this release of Intel MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The Authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.

    License Definitions

    INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

    A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

    Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

    The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

    Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

    Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

    BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries.

    *Other names and brands may be claimed as the property of others.

    Optimization Notice

    Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

    Notice revision #20110804

    Copyright © 2002-2013, Intel Corporation. All rights reserved.

Please refer to the Optimization Notice page for more detailed information regarding performance and optimization in Intel software products.