Intel® MKL 11.0 Release Notes

What's New in Intel® MKL 11.0 update 5

  • Introduced Clang compiler support on OS X*
  • Improved SMP LINPACK performance for 3rd and 4th Generation Intel® Core™ microarchitectures
  • Improved matrix generation time for Intel® Optimized MP LINPACK Benchmark for Clusters
  • BLAS:
    • Optimized {Z,D}GEMM and double-precision real/complex Level 3 BLAS functions on Intel® Advanced Vector Extensions 2 (Intel® AVX2)
    • Optimized sequential version of DTRMM on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Optimized *SYR2K and *HER2K on the Intel® MIC Architecture
    • Optimized DAXPY on Intel® AVX2
  • LAPACK:
    • Improved performance of Automatic offload single and double precision LU for one and two Intel® Xeon Phi™ coprocessors
    • Improved performance of ?GESVD for small sizes like M,N
  • DFT:
    • Improved documentation for DFTI compute functions data layout
    • Improved performance of workloads specific for GENE application on Intel Xeon® E5-series (Intel® AVX) and 4th generation Intel Core processors (Intel® AVX2)
    • Added scaling capability to large real-to-complex FFTs
  • Added examples for Reverse Communication Interface (RCI) in Intel Extended Eigensolver
  • Added live links to Intel MKL code examples:

    • The HTML version of the Intel MKL Reference Manual (available from http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/ ) provides hyperlinks from references to specific code examples so that when you click on an example, your Web browser displays the code. See, for example, the links from the documentation on Fourier Transform Functions and Nonlinear Optimization Problem Solvers
  • Known Limitation: MKL CTRMM may not return bitwise-identical results on some architectures

    When running in CNR mode on systems that support the SSE4.2 instruction set, MKL CTRMM may not return bitwise-identical results if the input matrices contain NaN values. To get bitwise-identical results, set the environment variable MKL_CBWR to COMPATIBLE.
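
As a minimal sketch of the workaround above, the Python snippet below builds an environment for a child process with MKL_CBWR set to COMPATIBLE. The helper name cnr_environment is hypothetical; only the variable name and its value come from the note above.

```python
import os


def cnr_environment(base_env=None, mode="COMPATIBLE"):
    """Return a copy of the environment with MKL_CBWR set to `mode`.

    `cnr_environment` is a hypothetical helper, not part of Intel MKL.
    The variable MKL_CBWR and the value COMPATIBLE come from the
    known-limitation note.
    """
    # Copy so the caller's mapping (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_CBWR"] = mode
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when launching an MKL-linked program, so that the variable is already set when the program starts.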

What's New in Intel® MKL 11.0 update 4

What's New in Intel® MKL 11.0 update 3

  • BLAS:
    • Optimized multithreaded [S/D/C/Z]TRSM for native execution on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Improved serial and multithreaded performance of DGEMM on 2nd and 3rd Generation Intel® Core™ microarchitectures
  • LINPACK:
    • Updated the Intel® Optimized MP LINPACK Benchmark for Clusters package to HPL 2.1
    • Tuned the Intel® Optimized MP LINPACK Benchmark for Clusters package with a new offload option for optimizations on systems with zero to eight Intel® Xeon Phi™ coprocessors; added new options and functionality and improved performance
  • Sparse BLAS:
    • Improved performance of DCOOMM on Intel® Advanced Vector Extensions 2 (Intel® AVX2)
  • LAPACK:
    • Parallelized ?LASET, ?LACPY, ?LANGE, ?LANSY
    • Improved performance of [C/Z]POTRF on Intel MIC Architecture
    • Improved performance of LU (?GETRF), Cholesky (?POTRF), and QR (?GEQRF) factorization functions for automatic offload on Intel MIC Architecture
  • Service Functions:
    • Introduced control for number of threads to be used by Automatic Offload in Intel MIC Architecture
  • FFT:
    • Improved Complex-to-complex power-of-2 FFT performance on Intel AVX2
  • VSL:
    • Improved performance of SFMT19937 Basic Random Number Generator (BRNG) on Intel AVX2 and on Intel MIC Architecture
  • Cluster FFT:
    • Improved hybrid mode (MPI + OpenMP*) Cluster FFT performance
  • Data Fitting:
    • Improved performance of df?construct1d function for linear and Hermite/Bessel/Akima cubic types of splines on Intel MIC Architecture, Intel® Xeon® X5570 and Intel® Xeon® E5-2690 CPUs series
    • Improved performance of df?interpolate and df?searchcells1d functions on Intel MIC Architecture
  • Known Issue: A user application on OS X* that is linked with the libmkl_rt.so library and makes its first call to Intel MKL in a parallel section will crash with a segfault or with either of these messages:

    “malloc: *** error for object xxxxx: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug”

    OR

    “malloc: *** error for object xxxxx: double free !!! *** set a breakpoint in malloc_error_break to debug”

    Workaround: call any Intel MKL function before the first parallel section

  • Known Limitation: Documentation Viewing Issue with Microsoft Internet Explorer* 10 and Windows Server* 2012

    If on Windows Server 2012 you find that you cannot display help or documentation from within
    Internet Explorer 10, modifying a security setting for Microsoft Internet Explorer usually corrects
    the problem. From Tools > Internet Options > Security, add “about:internet” to the list of trusted sites.
    Optionally, you can remove “about:internet” from the list of trusted sites after you finish viewing the documentation.

What's New in Intel® MKL 11.0 update 2

  • Introduced Intel MKL Extended Eigensolver:

    Intel MKL Extended Eigensolver is a high-performance package for solving symmetric standard or generalized symmetric-definite eigenvalue problems on matrices in dense, LAPACK banded, and sparse (CSR) formats. It is based on an innovative, fast, and stable numerical algorithm named FEAST (see the Attributions section below).

  • BLAS:
    • Optimized [C/Z]HERK for native execution on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Optimized BLAS Level-3 subroutine, ?SYMM (all precisions) for automatic offload (AO) on Intel MIC Architecture
  • Sparse BLAS:
    • Improved performance of 0-based DCSRMM significantly
  • LAPACK:
    • Improved performance of parallel versions of ?(OR/UN)G(LQ/QL/QR/RQ) functions significantly
    • Optimized LU (?GETRF), Cholesky (?POTRF), and QR (?GEQRF) factorization functions for automatic offload on Intel MIC Architecture
    • Improved LU and SMP LINPACK performance for 60 cores on Intel MIC Architecture
  • ScaLAPACK:
    • Updated version to 2.0.2. New functions introduced include:
      • P?HSEQR: Nonsymmetric Eigenvalue Problem
      • P?SYEVR/P?HEEVR: MRRR (Multiple Relatively Robust Representations) algorithm
  • FFT:
    • Improved performance of complex-to-complex power-of-2 1D and 2D FFTs on Intel MIC Architecture
    • Improved performance of real-to-complex power-of-two and odd size 1D FFTs on Intel MIC Architecture
    • Added example demonstrating use of MKL FFT in Compiler Assisted Offload usage model for Intel MIC Architecture with Intel Fortran compiler
    • Decreased DFTI descriptor commit time on Intel MIC Architecture
    • Added FFTW interface wrapper libraries support for Intel MIC Architecture
  • Cluster FFT:
    • Implemented transposed order in multidimensional Cluster FFT transforms, including FFTW2 wrappers
  • VSL:
    • Added support for the ICDF (inverse cumulative distribution function) method in the VSL Lognormal RNG
    • Added “const” specifier to declarations of Summary Statistics functions
    • Improved performance of Wichmann-Hill BRNG on Intel MIC Architecture
  • Data Fitting:
    • Improved performance of df[d/s]Interpolate1D, df[d/s]InterpolateEx1D, df[d/s]SearchCells1D, df[d/s]SearchCellsEx1D routines for default/quasi-uniform partition, sorted interpolation sites in scalar (number of interpolation sites is 1) and vector cases for Intel® Xeon® processor X5570 and Intel® Xeon® processor E5-2600
    • Added support for the DF_DISABLE_CHECK_FLAG parameter in the dfiEditVal editor, which improves performance for a small number of interpolation sites (fewer than one dozen) by disabling parameter correctness checks in Data Fitting routines
    • Added “const” specifier to declarations of functions
  • Transposition:
    • Parallelized general out-of-place matrix transposition (mkl_?omatcopy, mkl_?omatcopy2), improving its performance significantly
  • Service functions:
    • Added mkl_peak_mem_usage function which provides information about peak memory amount used by Intel MKL Memory Allocator
    • Added mkl_calloc and mkl_realloc functions extending MKL Memory Allocator functionality to standard C library memory allocation API
  • Enhanced SMP LINPACK with residual check:

    SMP LINPACK now returns error code 1 if a failure is detected and reports whether the resulting residuals pass the precision check. Note that residuals may vary slightly from run to run on the same matrix if conditional numerical reproducibility mode is not turned on. The check ensures that results are reliable.
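
The release note does not say which residual formula or threshold SMP LINPACK uses. The Python sketch below assumes an HPL-style scaled residual in the infinity norm and a threshold of 16.0, and it returns 0 or 1 to mirror the error code described above. The function names are hypothetical, and the code is illustrative only, not the benchmark's implementation.

```python
import sys


def scaled_residual(a, x, b):
    """Scaled residual ||Ax - b|| / (eps * (||A|| * ||x|| + ||b||) * n).

    All norms are infinity norms. The formula is an assumption borrowed
    from HPL-style checks; the release note does not specify it.
    """
    n = len(b)
    eps = sys.float_info.epsilon
    # Residual vector r = A*x - b, computed row by row.
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    norm_r = max(abs(v) for v in r)
    norm_a = max(sum(abs(v) for v in row) for row in a)  # max absolute row sum
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return norm_r / (eps * (norm_a * norm_x + norm_b) * n)


def residual_check(a, x, b, threshold=16.0):
    """Return 0 if the scaled residual is below `threshold`, otherwise 1."""
    return 0 if scaled_residual(a, x, b) < threshold else 1
```

Scaling by machine epsilon and the problem size makes the pass criterion independent of the magnitude of the matrix entries, which is why a fixed threshold can be used.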

What's New in Intel® MKL 11.0 update 1

  • BLAS:
    • Optimized [S/D/C/Z]SYMM for native execution on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Improved DGEMM and double-precision Level 3 BLAS performance on AMD Family “Bulldozer” CPUs
  • Sparse BLAS: Greatly improved CSRMV performance for complex conjugate transpose & Hermitian cases on Intel MIC Architecture
  • LAPACK:
    • Optimized ?(SY/HE)TRD, ?(OR/UN)M(LQ/QL/QR/RQ), ?(OR/UN)GQR, ?GE(QR/LQ/RQ/QL)F functions for native performance on Intel MIC Architecture
    • Improved ?GETRF and SMP LINPACK benchmark native performance on Intel MIC Architecture
    • Optimized ?GETRF, ?GEQRF, ?POTRF functions for automatic offload on the Intel MIC Architecture
  • PARDISO: The imaginary parts of the diagonal values of Hermitian matrices are ignored
  • Cluster FFT:
    • Improved hybrid Cluster FFT (MPI + OpenMP) performance by up to 2x
    • Implemented a new Cluster FFT algorithm (Segment of Interest FFT) for 1D FFTs that uses less communication; enable it by setting the environment variable "MKL_CFFT_SOI_ENABLE" to "YES" or "1" (see the MKL documentation for more information)
  • VSL:
    • Added support of VSL_SS_METHOD_FAST_USER_MEAN method for computation of descriptive Summary Statistics estimates given user-provided mean
    • Improved performance of VSL_SS_METHOD_FAST method for computations of descriptive Summary Statistics estimates on Intel® Xeon® processor E5-2690 CPU
    • Improved performance of Summary Statistics algorithms for computation of raw and central moments, and variance-covariance estimates on Intel MIC Architecture
    • Improved performance of MT2203 and WH BRNGs on Intel MIC Architecture
  • Transposition: Improved performance of out-of-place transposition on 2nd generation Intel® Core™ microarchitecture (up to 7x)
  • Service functions: Removed seven service functions with obsolete names (see more details in KB Article on obsolete service functions removed)
  • Introduced support for PGI compiler 12.5
    • Some examples may fail on OS X* in 32-bit mode with PGI compiler version 12.5 due to known issues in the OpenMP runtime of that compiler; the solution is to upgrade the compiler
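
To illustrate why a user-provided mean is useful (the VSL_SS_METHOD_FAST_USER_MEAN item in the VSL list above), the Python sketch below computes a variance in a single pass when the mean is already known. It is plain Python, not the VSL API. The function name and the ddof parameter are hypothetical, and the default divisor of n - 1 is an assumption.

```python
def variance_with_known_mean(samples, mean, ddof=1):
    """Variance of `samples` around a caller-supplied `mean`.

    Hypothetical helper for illustration. `ddof=1` divides by n - 1 (an
    assumption); pass `ddof=0` to divide by n instead.
    """
    n = len(samples)
    if n - ddof <= 0:
        raise ValueError("not enough samples for the requested ddof")
    # One pass over the data: no separate pass is needed to estimate the mean.
    return sum((x - mean) ** 2 for x in samples) / (n - ddof)
```

Without a supplied mean, a straightforward implementation needs one pass to estimate the mean and a second pass to accumulate squared deviations. Supplying the mean removes the first pass.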

What's New in Intel® MKL 11.0

  • Intel MKL now has support for Intel® Xeon Phi™ coprocessor based on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture) on Linux* only. There are three Intel MKL usage models on Intel Xeon Phi coprocessor: automatic offload, compiler assisted offload and native execution. Most of Intel MKL has been ported to run natively on these coprocessors. A smaller number of functions have been optimized to automatically divide their computational work between the host and Intel Xeon Phi coprocessor, a feature called automatic offload (AO). Read the Intel MKL User’s Guide for more information. Most standard Intel MKL functions run on Intel Xeon Phi coprocessor except the Poisson library, Iterative sparse solvers, and Trust region solvers.
  • Conditional Numerical Reproducibility (CNR): New functionality in Intel MKL now allows you to balance performance with reproducible results by allowing greater flexibility in code branch choice and by ensuring algorithms are deterministic. See the Intel MKL User’s Guide and the CNR KB Article for more information.
  • Intel MKL also introduces optimizations using the new Intel® Advanced Vector Extensions 2 (Intel® AVX2) including the new FMA3 instructions. See the KB Article on support for Intel® AVX2
  • BLAS:
    • Optimized [S/D/C/Z]GEMM, [S/D/C/Z]TRMM, [S/D/C/Z]TRSM, [S/D/C/Z]SYRK, [S/D]GEMV, [S/D]AXPY, [S/D]DOT for native execution and ?TRMM, ?TRSM, ?GEMM functions for automatic offload on the Intel MIC Architecture
    • Improved DSYRK/SSYRK performance for 64-bit programs supporting Intel® Advanced Vector Extensions (Intel® AVX)
  • Sparse BLAS:
    • Optimized ?CSRMV, ?CSRMM, and ?CSRSYMV (for unit diagonal case) for native execution on Intel MIC Architecture
  • LAPACK:
    • Optimized [S/D]GETRF, [S/D]POTRF, [S/D]GEQRF, [S/D]GELQF, [S/D]GEQLF, and [S/D]GERQF for native execution on Intel MIC Architecture
    • Introduced support for LAPACK version 3.4.1
  • FFT:
    • Optimized single- and double-precision real-to-complex and complex-to-complex one-, two-, and three-dimensional fast Fourier transforms for native execution on Intel MIC Architecture
    • Added configuration parameter DFTI_THREAD_LIMIT which limits the number of threads per descriptor
    • Added support for 1D real-to-complex transforms with sizes given by 64-bit prime integers
  • VML/VSL:
    • Optimized complex SinCos and CIS functions for native execution on Intel MIC Architecture
    • Optimized MT19937, MT2203, MRG32k3a BRNGs, and discrete Uniform and Geometric RNGs for native execution on Intel MIC Architecture
    • Improved performance of viRngGeometric on Intel® Advanced Vector Extensions (Intel AVX)
    • Implemented threading in Data Fitting Integrate1d function
  • Transposition: Parallelized in-place transposition of square matrices with leading dimensions greater than the matrix size for single and double precisions improving its performance significantly
  • Implemented local threading control function (mkl_set_num_threads_local) which increases flexibility in threading control
  • The mklvars.* script no longer sets $FPATH in environment and no longer exports internal variable MKL_TARGET_ARCH (this change will not impact users as the Intel compiler no longer requires these variables)
  • Link Tool: Added Intel MIC Architecture support
  • Link Line Advisor:
    • Added Help-Me functionality for selecting architecture (IA-32/Intel® 64) and interface layer (LP64/ILP64)
    • Added Intel MIC Architecture support
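
The reproducibility concern behind the CNR item above comes from floating-point addition not being associative: the same values added in a different order, for example because work is split differently among threads or a different code branch is taken, can round differently. The Python sketch below is purely illustrative and unrelated to Intel MKL internals; the function names are hypothetical.

```python
def ordered_sum(values):
    """Add values strictly left to right with plain float additions."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, split):
    """Sum two contiguous parts separately, then combine the partial sums.

    Mimics a reduction whose work is divided between two threads.
    """
    return ordered_sum(values[:split]) + ordered_sum(values[split:])


data = [0.1, 0.2, 0.3]
serial = ordered_sum(data)          # (0.1 + 0.2) + 0.3
partitioned = split_sum(data, 1)    # 0.1 + (0.2 + 0.3)
print(serial == partitioned)        # prints False
```

Both results are correctly rounded sums for their respective orders. They differ only in the last bit, which is the kind of run-to-run difference that a fixed, deterministic order of operations avoids.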

Important Notices:

    Please refer to the Deprecations KB Article for more information on the following notices.

  • Microsoft Windows* System PATH environment variable is no longer set during installation
  • Removed support for Intel® Pentium® III processor. The minimum supported instruction set is SSE2 (Streaming SIMD Extensions 2).
  • Removed Intel MKL GNU Multiple Precision* (GMP) function interfaces
  • The timing function mkl_set_cpu_frequency() is disabled and no longer performs useful work; use mkl_get_max_cpu_frequency(), mkl_get_clocks_frequency(), and mkl_get_cpu_frequency() as described in the Intel MKL Reference Manual
  • Removed the MKL_PARDISO constant; use MKL_DOMAIN_PARDISO to specify the PARDISO domain with the mkl_domain_set_num_threads() function
  • Removed special backward compatibility functions for convolution and correlation functions in Intel MKL 10.2 update 4
  • Removed the OpenMP* static runtime library from the Windows* version of Intel MKL and Intel® compilers
  • Documentation:
    • The Intel MKL Reference Manual in HTML format is no longer available with the product
    • Man pages and Eclipse help integration are no longer provided

Product Contents

The Intel® Math Kernel Library (Intel® MKL) consists of three installation packages: one package for both IA-32 architecture and Intel® 64 architectures, one for IA-32 only, and one for Intel® 64 architecture only.

Technical Support

If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.

For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.

Note: If your distributor provides technical support for this product, please contact them rather than Intel.

For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.

Attributions

As referenced in the End User License Agreement, attribution requires, at a minimum, prominently displaying the full Intel product name (e.g. "Intel® Math Kernel Library") and providing a link/URL to the Intel® MKL homepage (http://www.intel.com/software/products/mkl) in both the product documentation and website.

The original versions of the BLAS from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/blas/index.html.

The original versions of LAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the LAPACK95 package at http://www.netlib.org/lapack95/index.html. All interfaces are provided for pure procedures.

The original versions of ScaLAPACK from which that part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.

PARDISO in Intel® MKL is compliant with the 3.2 release of PARDISO that is freely distributed by the University of Basel. It can be obtained at http://www.pardiso-project.org.

Some FFT functions in this release of Intel® MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The Authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.

License Definitions

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Copyright © 2002-2012, Intel Corporation. All rights reserved.

Please refer to the Optimization Notice page for more detailed information regarding performance and optimization in Intel software products.