Submitted by TODD R. (Intel)

**New in Intel® MKL 10.2 Update 7:**

- LAPACK: Threaded QR factorization with pivoting (DGEQP3) on Itanium® architecture
- PARDISO/DSS: Added true F90 overloaded API (see the Intel MKL reference manual for more information)
- PARDISO: Improved the readability of the statistical reporting
- Sparse BLAS: Improved performance of ?BSRMM functions on Intel® Core™ i7 processors
- FFTs: Support for negative strides
- FFT examples: Added examples for split-complex FFTs in C and Fortran using both the DFTI and FFTW3 interfaces
- Poisson Library: Changed the default behavior of the Poisson library functions from sequential to threaded operation
- Bug fixes
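
The split-complex layout mentioned in the FFT examples item stores real and imaginary parts in two separate arrays instead of interleaving them. The following is a minimal sketch of that layout only, using a naive O(n²) DFT in plain Python; it does not call the DFTI or FFTW3 interfaces, and the function names `interleaved_to_split` and `dft_split` are hypothetical.

```python
import math


def interleaved_to_split(data):
    """Convert [re0, im0, re1, im1, ...] into ([re0, re1, ...], [im0, im1, ...])."""
    return list(data[0::2]), list(data[1::2])


def dft_split(re, im):
    """Naive forward DFT on split-complex input (illustration only).

    re and im are separate lists holding the real and imaginary parts.
    Returns (out_re, out_im) in the same split layout.
    """
    n = len(re)
    out_re = [0.0] * n
    out_im = [0.0] * n
    for k in range(n):
        for j in range(n):
            angle = -2.0 * math.pi * j * k / n
            c, s = math.cos(angle), math.sin(angle)
            # (re[j] + i*im[j]) * (c + i*s)
            out_re[k] += re[j] * c - im[j] * s
            out_im[k] += re[j] * s + im[j] * c
    return out_re, out_im
```

The examples shipped with the library remain the reference for how to request this layout from the actual interfaces.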

**New in Intel® MKL 10.2 Update 6:**

- New Features
  - Integrated Netlib LAPACK 3.2.2, including one new computational routine (?GEQRFP) and two new auxiliary routines (?GEQR2P and ?LARFGP)

- Performance improvements
  - Improved DZGEMM performance on Intel® Xeon® processor 5300 and 5400 series with 64-bit operating systems
  - Improved DSYRK performance on Intel® Xeon® processor 5300 series with 32-bit operating systems, with the most significant improvements for small oblong matrices on 8 or more threads
  - Improved the scalability of (C/Z)GGEV by parallelizing the reduction to generalized Hessenberg form ((C/Z)GGHRD)
  - Improved performance for ?(SY/HE)EV and ?(SP/HP)TRS on very small matrices (< 20)
  - Improved performance of FFTW2 wrappers for those cases where the descriptor remains constant from call to call
  - Improved scalability of threaded applications that use non-threaded FFTs on multi-socket systems
  - Significantly improved performance of cluster FFTs through better load balancing when the input data cannot be evenly distributed between MPI processes
  - Improved scalability of cluster FFTs on systems with a non-power-of-2 number of cores/processors
  - Improved performance of the factorization step in PARDISO out-of-core for huge matrices through a reduction in the number of disk I/O operations
  - Parallelized the solve step in PARDISO

- Usability/Interface improvements
  - Improved support for F77 in FFTW2 and MPI FFTW2 interfaces
  - Implemented rfftwnd_create_plan_specific and its 2d and 3d variants
  - Added 2D Convolution/Correlation examples

- Bug fixes
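
As documented for Netlib LAPACK, the ?GEQRFP routine listed under New Features for this update computes a QR factorization in which the diagonal entries of R are nonnegative. The sketch below is not how the routine is implemented; it only illustrates the property on a small example by flipping signs in an existing factorization, which leaves the product Q·R unchanged. The helper names `make_r_diagonal_nonnegative` and `matmul` are hypothetical.

```python
def make_r_diagonal_nonnegative(q, r):
    """Return (q2, r2) with q2*r2 == q*r and every diagonal entry of r2 >= 0.

    q is m x n and r is n x n upper triangular, both given as lists of rows.
    Negating column j of Q together with row j of R leaves Q*R unchanged.
    """
    m, n = len(q), len(r)
    q2 = [row[:] for row in q]
    r2 = [row[:] for row in r]
    for j in range(n):
        if r2[j][j] < 0:
            for i in range(m):
                q2[i][j] = -q2[i][j]
            r2[j] = [-x for x in r2[j]]
    return q2, r2


def matmul(a, b):
    """Plain triple-loop matrix product for small lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

For instance, a factorization with R diagonal (-2, 4) becomes one with diagonal (2, 4) while reproducing the same matrix.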

**New in Intel® MKL 10.2 Update 5:**

- New Features
  - Incorporated the LAPACK 3.2.1 update, primarily consisting of fixes to LAPACK 3.2. The following fixes listed in the LAPACK Errata but appearing after the 3.2.1 release were also incorporated: bug0014, bug0015, bug0016, bug0017, bug0019, bug0021, bug0023, bug0031, bug0038, bug0041, bug0042, bug0043, bug0044, bug0045, bug0047, bug0048, bug0049, bug0050, bug0052.

- Performance improvements
  - FFTs
    - Improved performance for complex FFTs, 3D and higher on the Intel® 64 architecture
  - VSL
    - Improved performance of the MT19937 and MT2203 basic random number generators (BRNGs) on the 45nm Intel® Core™2 Duo processor and newer processors in 64-bit libraries

- Usability/Interface improvements
  - Added support for Boost version 1.41.0 in the ublas examples
  - Included Fortran 95 interfaces for the diagonally dominant solver functionality (?DTSVB, ?DTTRFB, ?DTTRSB)
  - Extended the Fortran 90 interface for the cluster FFTs to support GNU Fortran on Linux* operating systems
  - Significantly reduced the memory consumption of in-place, multi-dimensional cluster FFTs

- Bug fixes

**New in Intel® MKL 10.2 Update 4:**

- New Features
  - Introduced the single precision complex absolute value function SCABS1
  - Introduced the solver ?DTSVB for diagonally dominant tri-diagonal systems, which is up to 2x faster than the general solver with partial pivoting (?GTSV)
  - Added routines for factorization (?DTTRFB) and the forward/backward substitution (?DTTRSB) of the diagonally dominant tri-diagonal systems

- Performance improvements
  - FFTs
    - Enhanced performance for transforms whose sizes are a multiple of 8 or 13
    - Optimized 1D complex cluster FFTs for non-power-of-2 vector lengths
  - VSL
    - Convolution and Correlation computations that require decimation show significant improvements

- Bug fixes (see fixes list)
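
The speedup quoted for ?DTSVB over ?GTSV comes from skipping partial pivoting, which is safe when the tri-diagonal matrix is diagonally dominant. As a rough illustration of a pivot-free solve, here is a minimal sketch of the classic Thomas algorithm in plain Python. This is a swapped-in textbook technique, not the algorithm these notes attribute to ?DTSVB, and the function name `solve_tridiagonal_no_pivot` is hypothetical.

```python
def solve_tridiagonal_no_pivot(dl, d, du, b):
    """Solve a tri-diagonal system without pivoting (Thomas algorithm).

    dl: sub-diagonal (n-1 entries), d: main diagonal (n entries),
    du: super-diagonal (n-1 entries), b: right-hand side (n entries).
    Assumes the matrix is diagonally dominant so no pivot becomes zero.
    """
    n = len(d)
    d = list(d)  # work on copies so the caller's data is untouched
    b = list(b)
    # Forward elimination: remove the sub-diagonal row by row.
    for i in range(1, n):
        w = dl[i - 1] / d[i - 1]
        d[i] -= w * du[i - 1]
        b[i] -= w * b[i - 1]
    # Back substitution on the resulting upper bidiagonal system.
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - du[i] * x[i + 1]) / d[i]
    return x
```

The elimination and substitution loops correspond loosely to the separate factorization and forward/backward substitution steps that ?DTTRFB and ?DTTRSB expose, although the library routines may organize the work differently.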

**New in Intel® MKL 10.2 Update 3:**

- Performance improvements
  - BLAS: Threaded several additional Level 1 and 2 BLAS functions; improved DGEMM scaling for skinny matrices
  - LAPACK: Improved scalability for LAPACK functions: ?POTRF, ?GEBRD, ?SYTRD, ?HETRD, and ?STEDC
  - FFTs: Extended threading to small-size multi-dimensional transforms and other cases
  - VML: Further optimizations: v(s,d)Asin, v(s,d)Acos, v(s,d)Ln, v(s,d)Log10, vsLog1p, v(s,d)Hypot
  - VSL: Improved performance for viRngPoisson and viRngPoissonV random number generators

- Usability/Interface improvements
  - Improved example programs for uBLAS, Java, FFTW3, LAPACK95, and BLAS95
  - New 64-bit integer (ILP64) fftw_mpi interfaces for cluster FFTs

- Bug fixes (see fixes list)

**New in Intel® MKL 10.2 Update 2:**

- Performance improvements
  - Many improvements in BLAS functions for Intel® Core™ i7 processors, and Intel® Xeon® processor 5300, 5400, and 5500 series
  - Improved scalability of the following LAPACK functions: ?POTRF, ?GEBRD, ?SYTRD, ?HETRD, and ?STEDC divide and conquer eigensolvers
  - PARDISO OOC performance has improved significantly for symmetric positive definite matrices
  - Improved performance for the double precision Sobol generator for dimensions >= 16
  - Improvements in many VML functions for Intel® Xeon® processor 5500 series and others: v(s,d)Pow, v(s,d)Ceil/Trunc/Floor, vsSin/Cos/SinCos, and vdSin/Cos/SinCos
  - Improved scalability of 1D, single precision, complex FFTs and improved performance for small 3D transforms

- Usability/Interface improvements
  - Support for 64-bit integer parameters in FFTW wrappers
  - Intel MKL is now compatible with the representation of logical values in GCC 4.4.0
  - All transpose functions now have a Fortran interface

- Bug fixes (see fixes list)

**New in Intel® MKL 10.2:**

- New features
  - LAPACK 3.2
  - Introduced an implementation of the DZGEMM Extended BLAS function (as described at http://www.netlib.org/blas/blast-forum/). See the description of the ?gemm family of functions in the BLAS section of the reference manual.
  - PARDISO now supports real and complex, single precision data
- Usability/Interface improvements
  - Sparse matrix format conversion routines which convert between the following formats:
    - CSR (3-array variation) <-> CSC (3-array variation)
    - CSR (3-array variation) <-> diagonal format
    - CSR (3-array variation) <-> skyline
  - Fortran 95 BLAS and LAPACK mod files are now included
    - Modules are pre-built with the Intel compiler and located in the include directory (see Intel® MKL User's Guide for full path)
    - Source is still included for use with other compilers
    - Documentation for these interfaces can be found in the Intel® MKL User's Guide
  - The FFTW3 interface is now integrated directly into the main libraries
    - Source code is still included to create wrappers for use with compilers not compatible with the default Intel® Fortran compiler convention for name decoration
    - See Appendix G of the Reference Manual for information
  - DFTI_DESCRIPTOR_HANDLE now represents a true type name and can now be referenced as a type in user programs
  - Added a parameter to the Jacobi matrix calculation routine in the optimization solver domain to allow access to user data (see the description of the djacobix function in the reference manual for more information)
  - Added an interface on 64-bit architectures that maps calls to single precision BLAS functions in Intel® MKL (functions whose names begin with 's' or 'c') to 64-bit floating point precision functions (see 'sp2dp' in the Intel® MKL User's Guide for more information)
  - Compatibility libraries (also known as "dummy" libraries) have been removed from this version of the library

- Performance improvements
  - Further threading in BLAS level 1 and 2 functions for Intel® 64 architecture
    - Level 1 functions (vector-vector): (CS,ZD,S,D)ROT, (C,Z,S,D)COPY, and (C,Z,S,D)SWAP
      - Increase in performance by up to 1.7-4.7 times over version 10.1 Update 1 on 4-core Intel® Core™ i7 processor, depending on data location in cache
      - Increase in performance by up to 14-130 times over version 10.1 Update 1 on 24-core Intel® Xeon® processor 7400 series system, depending on data location in cache
    - Level 2 functions (matrix-vector): (C,Z,S,D)TRMV, (S,D)SYMV, (S,D)SYR, and (S,D)SYR2
      - Increase in performance by up to 1.9-2.9 times over version 10.1 Update 1 on 4-core Intel® Core™ i7 processor, depending on data location in cache
      - Increase in performance by up to 16-40 times over version 10.1 Update 1 on 24-core Intel® Xeon® processor 7400 series system, depending on data location in cache
  - Introduced a recursive algorithm in the 32-bit sequential version of DSYRK for up to 20% performance improvement on Intel® Core™ i7 processors and Intel® Xeon® processors in 5300, 5400, and 7400 series
  - Improved LU factorization (DGETRF) by 25% over Intel MKL 10.1 Update 1 for large sizes on the Intel® Xeon® 7460 Processor; small sizes are also dramatically improved
  - BLAS *TBMV/*TBSV functions now use level 1 BLAS functions to improve performance by up to 3% on Intel® Core™ i7 processors and up to 10% on Intel® Core™2 processor 5300 and 5400 series
  - Improved threading algorithms to increase DGEMM performance
    - Up to 7% improvement on 8 threads and up to 50% on 3, 5, or 7 threads on the Intel® Core™ i7 processor
    - Up to 50% improvement on 3 threads on Intel® Xeon® processor 7400 series
  - Threaded 1D complex-to-complex FFTs for non-prime sizes
  - New algorithms for 3D complex-to-complex transforms deliver better performance for small sizes (up to 64x64x64) on 1 or 2 threads
  - Implemented high-level parallelization of out-of-core (OOC) PARDISO when operating on symmetric positive definite matrices
  - Reduced memory use by PARDISO for both in-core and out-of-core on all matrix types
    - PARDISO OOC now uses less than half the memory previously used in Intel MKL 10.1 for real symmetric, complex Hermitian, or complex symmetric matrices
  - Parallelized the reordering and symbolic factorization stages in PARDISO/DSS
  - Up to 2 times better performance (30% improvement on average) on Intel® Core™ i7 and Intel® Core™2 processors for the following VML functions: v(s,d)Round, v(s,d)Inv, v(s,d)Div, v(s,d)Sqrt, v(s,d)Exp, v(s,d)Ln, v(s,d)Atan, v(s,d)Atan2
  - Optimized versions of the following functions available for Intel® Advanced Vector Extensions (Intel® AVX)
    - BLAS: DGEMM
    - FFTs
    - VML: exp, log, and pow
    - See important information in the Intel® MKL User's Guide regarding the mkl_enable_instructions() function for access to these functions
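
To make the CSR <-> CSC conversion listed under Usability/Interface improvements more concrete, the following is a minimal sketch of a CSR-to-CSC conversion for the 3-array variation (values, column indices, row pointers). It is plain Python for illustration and does not use the library's conversion routines; the function name `csr_to_csc` is hypothetical and zero-based indexing is an assumption made for the sketch.

```python
def csr_to_csc(n_rows, n_cols, values, col_idx, row_ptr):
    """Convert a 3-array CSR matrix to 3-array CSC (zero-based indices).

    values:  nonzero values, stored row by row
    col_idx: column index of each stored value
    row_ptr: n_rows + 1 offsets; row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    Returns (csc_values, row_idx, col_ptr).
    """
    nnz = len(values)
    # Count the nonzeros in each column, shifted by one for the prefix sum.
    col_ptr = [0] * (n_cols + 1)
    for c in col_idx:
        col_ptr[c + 1] += 1
    # A running sum turns the counts into column start offsets.
    for c in range(n_cols):
        col_ptr[c + 1] += col_ptr[c]
    csc_values = [0.0] * nnz
    row_idx = [0] * nnz
    next_slot = col_ptr[:-1]  # next free position in each column (a copy)
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dest = next_slot[c]
            csc_values[dest] = values[k]
            row_idx[dest] = r
            next_slot[c] += 1
    return csc_values, row_idx, col_ptr
```

Because rows are visited in increasing order, the row indices within each column come out sorted. Consult the reference manual for the actual routine names, argument order, and index base of the library's converters.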

| Optimization Notice |
|---|
| The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors. While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Intel recommends that you evaluate other library products to determine which best meets your requirements. |
