Intel® MKL product changes since Intel® MKL 9.1
 Linking model change
 In Version 10.0, we have rearchitected Intel® MKL and physically separated the interface, threading, and computational components of the product. This layered architecture lets you combine libraries at link time, so the base Intel® MKL package supports numerous configurations of interfaces, compilers, and processors in a single package. The new architecture is intended to provide maximum support for our varied customers’ needs while minimizing the effort it takes to obtain and use the performance of Intel® MKL. For more information, refer to the "Using Intel® MKL Parallelism" section of the Intel® MKL User’s Guide
 Cluster enabled capability available in single Intel® MKL product
 In Intel® MKL 9.1, there were two versions of Intel® MKL: Intel® MKL for Windows and Intel® MKL Cluster Edition for Windows. In Intel® MKL 10.0, we have merged them into a single version, Intel® MKL for Windows, which includes ScaLAPACK, distributed-memory FFTs, and all other capabilities of the former "Cluster Edition"
Performance improvements since Intel® MKL 9.1
 DGEMM and SGEMM on Intel® Core™2 Quad processors
 Performance for large square and large outer-product sizes improved by 1.04 times on 1 thread and by 1.1 to 1.15 times on 8 threads
 Other Level 3 real functions improved by 1.02 to 1.04 times on large sizes
 Several linear equation solvers (?spsv/?hpsv/?ppsv, ?pbsv/?gbsv, ?gtsv/?ptsv, ?sysv/?hesv) have dramatically improved in performance. Cases with banded or packed storage formats and multiple right-hand sides are up to 100 times faster
 All symmetric eigensolvers (?syev/?heev, ?syevd/?heevd, ?syevx/?heevx, ?syevr/?heevr) have significantly improved, since the tridiagonalization routines (?sytrd/?hetrd) are up to 4 times faster
 All symmetric eigensolvers in packed storage (?spev/?hpev, ?spevd/?hpevd, ?spevx/?hpevx) have significantly improved, since the tridiagonalization routines in packed storage (?sptrd/?hptrd) perform 3 times better than in the previous version
 A number of routines which apply orthogonal/unitary transformations (?ormqr/?unmqr, ?ormrq/?unmrq, ?ormql/?unmql, ?ormlq/?unmlq) are up to 2 times faster
 Performance of complex 1D FFTs for power-of-two sizes was improved by up to 1.8 times on 1 thread
 On systems with Intel® EM64T running in 64-bit mode
 Complex 2D FFTs were sped up by up to 1.1 times on 1 thread for single and double precision
 Parallel Complex 2D FFTs were sped up for single precision by up to 1.2 times on 8 threads and for double precision by up to 1.3 times
 Parallel Complex 3D FFTs were sped up by up to 1.15 times for single and double precision
 Parallel Complex Backward 2D FFTs were sped up for double precision by up to 1.4 times and for single precision up to 1.3 times
 Single-precision complex backward 1D FFTs of size greater than 2^22 were sped up by up to 2 times on 4 threads and up to 2.4 times on 8 threads on Itanium® processors
 Performance of VSL functions is improved on non-Intel processors by approximately 2 times on average
 Performance of VML vdExp, vdSin, and vdCos functions is improved on non-Intel processors by 1.18 times on average
 Performance of VSL functions is improved on IA-32 and Intel® 64 by 1.07 times on average
Other Improvements
 Change in threading model
 Previously, when OMP_NUM_THREADS was undefined, the number of threads for Intel® MKL defaulted to 1. With Intel® MKL 10.0, when the environment variable OMP_NUM_THREADS is undefined, your compiler's run-time library (e.g. libguide) determines the default number of threads. Intel® MKL may create multiple threads depending on the problem size and the values of MKL_DYNAMIC and other threading environment variables
 To provide additional user control over threading, the following environment variables have been added, along with corresponding library routines: MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS, and MKL_DYNAMIC. See the User’s Guide for details
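As a rough illustration of the new variables, the sketch below builds an environment for a child process that links Intel® MKL. The helper name `mkl_thread_env` is hypothetical, and the value spellings ("4", "TRUE"/"FALSE") are assumptions to be checked against the User’s Guide; only the variable names come from these notes.

```python
import os


def mkl_thread_env(num_threads, dynamic, base_env=None):
    """Return a copy of the environment with MKL threading variables set.

    Hypothetical helper for illustration only. The accepted value
    spellings are an assumption; consult the User's Guide.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Requested number of threads for Intel MKL (variable named in these notes).
    env["MKL_NUM_THREADS"] = str(num_threads)
    # Whether the library may adjust the thread count on its own.
    env["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"
    return env


env = mkl_thread_env(4, dynamic=False)
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when launching an application built against Intel® MKL.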
 The C DFTI interface has changed in the ILP64 variant of the C/C++ interface. The MKL_LONG type is now used instead of the long type, i.e. MKL_LONG Dfti…(…, MKL_LONG, …) instead of long Dfti…(…, long, …). The difference matters, for example, on Windows, where long is 4 bytes but MKL_LONG is 8 bytes in the ILP64 variant. See the User’s Guide for details
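The size difference can be checked with a small sketch. Here `ctypes.c_int64` is only a stand-in for an 8-byte MKL_LONG (the real typedef lives in the Intel® MKL headers), and `fits_signed` is a hypothetical helper.

```python
import ctypes

# Size of the platform's C long: 4 bytes on 64-bit Windows, 8 on most 64-bit Unix systems.
long_bytes = ctypes.sizeof(ctypes.c_long)
# Stand-in for an 8-byte MKL_LONG as described for the ILP64 variant (assumption).
int64_bytes = ctypes.sizeof(ctypes.c_int64)


def fits_signed(value, nbytes):
    """True if value is representable as a signed integer of nbytes bytes."""
    bits = 8 * nbytes
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# A transform length just past the signed 32-bit range.
length = 2**31
print("C long is", long_bytes, "bytes; fits:", fits_signed(length, long_bytes))
print("64-bit type is", int64_bytes, "bytes; fits:", fits_signed(length, int64_bytes))
```

Where long is 4 bytes, such a length cannot be passed through a long parameter, which is the situation the MKL_LONG change addresses.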
 Out-of-core (OOC) PARDISO for all types of matrices
 In version 10.0, we have added out-of-core memory support to PARDISO. While computers have greatly increased in memory capacity, there continue to be many problems whose sizes are too large for in-memory solutions. We encourage customers who encounter problem-size limitations to try the new out-of-core PARDISO solution. Opportunities for further performance optimizations have been identified, and we plan to release an Intel® MKL update in the coming months with significant performance improvements
 ZGEMM3M and CGEMM3M functions
 These complex functions use three block matrix multiplies and five additions, as opposed to four block matrix multiplies and four additions, to reduce the number of operations. These two functions are extensions to the standard BLAS in Intel® MKL and use the same syntax as ZGEMM and CGEMM, respectively
 Using [Z/C]GEMM3M instead of [Z/C]GEMM can improve performance by up to 1.25 times, but the results are not guaranteed to correspond bit-to-bit
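The idea of trading one multiply for extra additions can be sketched on small matrices. The code below is an illustration of the general three-multiply scheme for complex products, not the Intel® MKL implementation; the function names `gemm_3m` and `gemm_4m` and the list-of-lists matrix format are assumptions made for this example.

```python
def matmul(X, Y):
    """Plain real matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def matadd(X, Y, sign=1):
    """Elementwise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def gemm_3m(Ar, Ai, Br, Bi):
    """(Ar + i*Ai) * (Br + i*Bi) with three real products and five additions."""
    T1 = matmul(Ar, Br)                                # product 1
    T2 = matmul(Ai, Bi)                                # product 2
    T3 = matmul(matadd(Ar, Ai), matadd(Br, Bi))        # additions 1-2, product 3
    Cr = matadd(T1, T2, -1)                            # addition 3
    Ci = matadd(matadd(T3, T1, -1), T2, -1)            # additions 4-5
    return Cr, Ci


def gemm_4m(Ar, Ai, Br, Bi):
    """Conventional form with four real products, for comparison."""
    Cr = matadd(matmul(Ar, Br), matmul(Ai, Bi), -1)
    Ci = matadd(matmul(Ar, Bi), matmul(Ai, Br))
    return Cr, Ci
```

With integer inputs both forms agree exactly. With floating-point inputs they round differently, which is why bit-to-bit agreement between the two variants should not be expected.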
 An ILUT preconditioner has been added
 Support for sparse 0-based indexing has been added
 mkl_scsrgemv, a single-precision sparse BLAS matrix-vector multiply function, has been added
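To show what the index base changes in practice, here is a minimal sketch of a matrix-vector product in compressed sparse row (CSR) storage. It is not the mkl_scsrgemv calling sequence; the function `csr_matvec` and its `index_base` parameter are hypothetical and serve only to contrast 0-based and 1-based index arrays.

```python
def csr_matvec(values, columns, row_ptr, x, index_base=0):
    """Compute y = A @ x for a CSR matrix whose index arrays start at index_base."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Convert the stored row bounds to Python's 0-based positions.
        start = row_ptr[i] - index_base
        end = row_ptr[i + 1] - index_base
        for k in range(start, end):
            y[i] += values[k] * x[columns[k] - index_base]
    return y


# The matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored both ways.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y_zero = csr_matvec(values, [0, 2, 1, 0, 2], [0, 2, 3, 5], x, index_base=0)
y_one = csr_matvec(values, [1, 3, 2, 1, 3], [1, 3, 4, 6], x, index_base=1)
```

Both calls describe the same matrix; only the starting value of the column and row-pointer arrays differs, which is the convention that typically separates C-style from Fortran-style callers.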
 The DftiCommitDescriptor function has been optimized by avoiding double data initialization for serial and parallel 1D FFT. This function now runs faster and allocates less memory
 Vector Math Library (VML)
 A new VML EP (enhanced performance) accuracy mode has been introduced. The EP routines are significantly faster than the LA (low accuracy) routines and are accurate to at least 11 and 26 bits for single and double precision, respectively. See the vmlSetMode function description in the Intel® MKL manual for details
 New VML functions added: v{s,d,c,z}Mul, v{c,z}MulByConj, v{c,z}Div, v{s,d,c,z}Add, v{s,d,c,z}Sub, v{c,z}Conj, v{s,d}Expm1, v{s,d}Log1p, v{s,d}Sqr, v{s,d}Pow3o2, v{s,d}Pow2o3, v{s,d,c,z}Abs, v{c,z}CIS
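For readers unfamiliar with some of the new function names, the following sketch gives elementwise reference semantics as we read them, using only the Python standard library. These helpers are hypothetical stand-ins, not the VML API, and they say nothing about VML accuracy modes or performance.

```python
import math


def v_expm1(xs):
    """exp(x) - 1 per element, computed without cancellation for small x."""
    return [math.expm1(x) for x in xs]


def v_log1p(xs):
    """log(1 + x) per element, accurate for small x."""
    return [math.log1p(x) for x in xs]


def v_pow3o2(xs):
    """x to the power 3/2 per element (x must be non-negative)."""
    return [x * math.sqrt(x) for x in xs]


def v_cis(xs):
    """cos(x) + i*sin(x) per element, for real input."""
    return [complex(math.cos(x), math.sin(x)) for x in xs]


def v_mul_by_conj(a, b):
    """a * conj(b) per element, for complex input."""
    return [u * v.conjugate() for u, v in zip(a, b)]


# Why dedicated Expm1/Log1p functions matter: the naive form loses digits.
x = 1e-12
naive = math.exp(x) - 1.0
accurate = v_expm1([x])[0]
print("naive relative error:   ", abs(naive - x) / x)
print("expm1 relative error:   ", abs(accurate - x) / x)
```

For arguments close to zero, subtracting 1 from exp(x) cancels most of the significant digits, whereas a dedicated expm1 keeps them; the same reasoning applies to log1p.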
 Vector Statistical Library (VSL)
 Support for a 64-bit nskip parameter in the vslSkipAheadStream service routine has been introduced in all versions of the VSL (not only ILP64)
 Bugs in the vslCopyStream and vslCopyStreamState service routines, and in the VSL QRNG initialization scheme for the case of user-defined parameters, were fixed
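To illustrate why a 64-bit skip count is useful, the sketch below implements skip-ahead for a simple linear congruential generator in a logarithmic number of steps. The generator constants are arbitrary illustration values and the functions `lcg_next` and `lcg_skip_ahead` are hypothetical; this is not a VSL generator and not how vslSkipAheadStream is implemented.

```python
# Illustrative LCG parameters (not taken from VSL): x -> (A * x + C) mod M
A = 1103515245
C = 12345
M = 2**31


def lcg_next(x):
    """Advance the generator state by one step."""
    return (A * x + C) % M


def lcg_skip_ahead(x, nskip):
    """Advance the state by nskip steps using repeated squaring of the affine map."""
    acc_a, acc_c = 1, 0          # identity map: x -> 1 * x + 0
    a, c = A, C                  # map for 2**k steps, starting with k = 0
    n = nskip
    while n > 0:
        if n & 1:
            # Compose the accumulated map with the current power-of-two map.
            acc_a, acc_c = (acc_a * a) % M, (acc_c * a + c) % M
        # Square the map: applying it twice gives a*a*x + (a*c + c).
        a, c = (a * a) % M, (c * a + c) % M
        n >>= 1
    return (acc_a * x + acc_c) % M


# A skip count beyond the 32-bit range, as a 64-bit nskip allows.
far_state = lcg_skip_ahead(42, 2**40)
```

Skip counts of this size come up when one long stream is split into non-overlapping blocks for many parallel workers, which a 32-bit count cannot always express.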
 Trigonometric Transforms have been extended to support various kinds of DCT/DST transforms. In addition to even size transforms, odd size transforms are supported starting from this release
 New FFTW 3.x wrappers have been developed for real-to-real (DCT/DST) transforms
Operating System:
Red Hat* Linux, Windows Vista*, Windows* XP Starter Edition, SUSE* Linux 
