Intel® Math Kernel Library

Fastest and most used math library for Intel and compatible processors**

  • Vectorized and threaded for highest performance on all Intel and compatible processors
  • De facto standard APIs for simple code integration
  • Compatible with all C, C++ and Fortran compilers
  • Royalty-free, per developer licensing for low cost deployment

From $499
Buy Now

Or Download a Free 30-Day Evaluation Version

Performance: Ready to Use

Intel® Math Kernel Library (Intel® MKL) includes a wealth of routines to accelerate application performance and reduce development time. Today’s processors have increasing core counts, wider vector units and more varied architectures. The easiest way to take advantage of all that processing power is to use a carefully optimized math library designed to harness that potential. Even the best compiler can’t compete with the level of performance possible from a hand-optimized library.

Because Intel has done the engineering on these ready-to-use, royalty-free functions, you’ll not only have more time to develop new features for your application, but in the long run you’ll also save development, debug and maintenance time while knowing that the code you write today will run optimally on future generations of Intel processors.

Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. Through a single C or Fortran API call, these functions automatically scale across previous, current and future processor architectures by selecting the best code path for each.


Intel® MKL delivers industry-leading performance on Monte Carlo and other math-intensive routines


Quotes

“I’m a C++ and Fortran developer and have high praise for the Intel® Math Kernel Library. One nice feature I’d like to stress is the bitwise reproducibility of MKL, which helps me get the assurance I need that I’m getting the same floating point results from run to run.”
Franz Bernasek
CEO and Senior Developer, MSTC Modern Software Technology

“Intel MKL is indispensable for any high-performance computer user on x86 platforms.”
Prof. Jack Dongarra,
Innovative Computing Lab,
University of Tennessee, Knoxville

Comprehensive Math Functionality – Covers Range of Application Needs

Intel® MKL contains a wealth of threaded and vectorized complex math functions to accelerate a wide variety of software applications. Why write these functions yourself when Intel has already done the work for you?

Major functional categories include Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics. Cluster-based versions of LAPACK and FFT are also included to support MPI-based distributed memory computing.

Standard APIs – For Immediate Performance Results

Wherever available, Intel® MKL uses de facto industry standard APIs so that minimal code changes are required to switch from another library. This makes it quick and easy to improve your application performance through simple function substitutions or relinking.

Simply substituting Intel® MKL’s LAPACK (Linear Algebra PACKage), for example, can yield a 500% or greater performance improvement (see the LAPACK benchmark chart).

In addition to the industry-standard BLAS and LAPACK linear algebra APIs, Intel® MKL also supports MIT’s FFTW C interface for Fast Fourier Transforms.

Highest Performance and Scalability across Past, Present & Future Processors – Easily and Automatically

Behind a single C or Fortran API, Intel® MKL includes multiple code paths, each optimized for specific generations of Intel and compatible processors. With no code branching required of application developers, Intel® MKL selects the best code path for maximum performance.

Even before future processors are released, new code paths are added under these same APIs. Developers just link to the newest version of Intel® MKL and their applications are ready to take full advantage of the newest processor architectures.

In the case of the Intel® Many Integrated Core Architecture (Intel® MIC Architecture), in addition to full native optimization support, Intel® MKL can also automatically determine the best load balancing between the host CPU and the Intel® Xeon® Phi™ coprocessor.

Flexibility to Meet Developer Requirements

Developers have many requirements to meet. Sometimes these requirements conflict and need to be balanced. Need consistent floating point results with the best application performance possible? Want faster vector math performance and don’t need maximum accuracy? Intel® MKL gives you control over the necessary tradeoffs.

Intel® MKL is also compatible with your choice of compilers, languages, operating systems, linking and threading models. One library solution across multiple environments means only one library to learn and manage.

Features and Benefits
Conditional Numerical Reproducibility

Overcome the inherent non-associativity of floating-point arithmetic with new support in Intel MKL. New in this release is the ability to achieve reproducibility without memory alignment.

New and improved optimizations for Intel® Core™ processors based on the Haswell microarchitecture, Intel® microarchitecture code name Ivy Bridge, future Broadwell processors and Intel® Xeon® Phi™ coprocessors

Intel MKL is optimized for the latest and upcoming processor architectures to deliver the best performance in the industry. For example, new optimizations for the fused multiply-add (FMA) instructions introduced with Haswell-based Core processors deliver up to a 2x performance improvement for floating-point calculations.

Automatic offload and compute load balancing between Intel Xeon processors and Intel Xeon Phi coprocessors – Now for Windows*

For selected linear algebra functions, Intel MKL can automatically determine the best way to utilize a system containing one or more Intel Xeon Phi coprocessors. The developer simply calls the MKL function, and it will take advantage of the coprocessor if one is present. New functions have been added in this release, along with Windows* OS support.

Extended Eigensolver Routines based on the FEAST algorithm

New sparse matrix Eigensolver routines handle larger problem sizes and use less memory. API compatibility with the open-source FEAST Eigenvalue Solver makes it easy to switch to the highly optimized Intel MKL implementation.

Linear Algebra

Intel® MKL BLAS provides optimized vector-vector (Level 1), matrix-vector (Level 2) and matrix-matrix (Level 3) operations for single and double precision real and complex types. Level 1 BLAS routines operate on individual vectors, e.g., computing scalar products, norms, or vector sums. Level 2 BLAS routines provide matrix-vector products, rank-1 and rank-2 updates of a matrix, and triangular system solvers. Level 3 BLAS routines include matrix-matrix products, rank-k matrix updates, and triangular solvers with multiple right-hand sides.

Intel® MKL LAPACK provides extremely well-tuned LU, Cholesky, and QR factorization and driver routines that can be used to solve linear systems of equations. Eigenvalue and least-squares solvers are also included, as are the latest LAPACK 3.4.1 interfaces and enhancements.

If your application already relies on the BLAS or LAPACK functionality, simply re-link with Intel® MKL to get better performance on Intel and compatible architectures.
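For illustration, a typical Linux re-link might look like the following build fragment. The library names (Intel 64, LP64, OpenMP threading) and paths are assumptions that vary by MKL version and toolchain; Intel's Link Line Advisor generates the exact line for a given configuration:

```shell
# Illustrative re-link of an existing LAPACK-based application against
# Intel MKL on Linux (Intel 64, LP64, OpenMP threading). Library names
# and install paths vary by MKL version -- consult the Link Line Advisor.
source /opt/intel/mkl/bin/mklvars.sh intel64   # sets MKLROOT (path varies)

gcc myapp.o \
    -L"${MKLROOT}/lib/intel64" \
    -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \
    -liomp5 -lpthread -lm
```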

Fast Fourier Transforms

Intel® MKL FFTs include many optimizations and should provide significant performance gains over other libraries for medium and large transform sizes. The library supports a broad variety of FFTs, from single and double precision 1D to multi-dimensional, complex-to-complex, real-to-complex, and real-to-real transforms of arbitrary length. Support for both FFTW* interfaces simplifies the porting of your FFTW-based applications.

Vector Math

Intel® MKL provides optimized vector implementations of computationally intensive core mathematical operations and functions for single and double precision real and complex types. The basic vector arithmetic operations include element-by-element summation, subtraction, multiplication, division, and conjugation as well as rounding operations such as floor, ceil, and round to the nearest integer. Additional functions include power, square root, inverse, logarithm, trigonometric, hyperbolic, (inverse) error and cumulative normal distribution, and pack/unpack. Enhanced capabilities include accuracy, denormalized number handling, and error mode controls, allowing users to customize the behavior to meet their individual needs.

Statistics

Intel® MKL includes random number generators and probability distributions that can deliver significant application performance gains. These functions let the user pair random-number generators such as Mersenne Twister and Niederreiter with a variety of probability distributions, including uniform, Gaussian and exponential.

Intel® MKL also provides computationally intensive building blocks for statistical analysis, both in-core and out-of-core. These enable users to compute basic statistics, estimate dependencies, detect outliers in data, and replace missing values. These features can be used to speed up applications in computational finance, life sciences, engineering/simulations, databases, and other areas.

Data Fitting

Intel® MKL includes a rich set of spline functions for 1-dimensional interpolation. These are useful in a variety of application domains, including data analytics (e.g., histograms), geometric modeling and surface approximation. The splines included are linear, quadratic, cubic, look-up, stepwise constant and user-defined.

What’s New

Conditional Bitwise Reproducible Results

When exactly reproducible calculations are required, Intel® MKL gives developers control over the tradeoffs to maximize performance across a set of target processors while delivering identical floating-point results.

Optimized for Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® microarchitecture code name Ivy Bridge and Intel® Many Integrated Core Architecture (Intel® MIC Architecture) processor architectures

Intel® MKL is optimized for the latest and upcoming processor architectures to deliver the best performance in the industry. Support for the new digital random number generator provides truly random seeding of statistical calculations.

Automatic offload and compute load balancing between Intel® Xeon® processor and Intel® Xeon Phi™ coprocessors

For linear algebra functionality, Intel® MKL can automatically determine the best way to utilize a system containing one or more Intel® Xeon Phi™ coprocessors. The developer simply calls an MKL function and doesn’t have to worry about the details.

Data Fitting functions

A rich set of splines is now included to optimize 1-dimensional interpolation calculations used in a variety of application domains.


Click on images for a larger view of each benchmark graphic.

Linear Algebra Performance Charts

  • DGEMM / Matrix Multiply
  • Intel® Optimized SMP LINPACK
  • HPL LINPACK
  • LU Factorization
  • QR Factorization
  • Cholesky Factorization

FFT Performance Charts

  • 2D and 3D FFTs on Intel® Xeon® and Intel® Core™ processors
  • Batch 1D FFT
  • Cluster FFT Performance
  • Cluster FFT Scalability

Sparse BLAS and Sparse Solver Performance Charts

  • DCSRGEMV and DCSRMM
  • PARDISO Sparse Solver

Data Fitting Performance Charts

  • Natural cubic spline construction and interpolation

Random Number Generator Performance Charts

  • MCG31m1

Vector Math Performance Charts

  • VML exp()

Application Benchmark Performance Charts

  • Monte Carlo option pricing
  • Black-Scholes

Videos to help you get started.

Register for future Webinars


Previously recorded Webinars:

  • Powered by MKL: Accelerating NumPy and SciPy Performance with Intel® MKL (Python)
  • Get Ready for Intel® Math Kernel Library on Intel® Xeon Phi™ Coprocessor
  • Beginning Intel® Xeon Phi™ Coprocessor Workshop: Advanced Offload Topics
  • Accelerating financial services applications using Intel® Parallel Studio XE with the Intel® Xeon Phi™ coprocessor


More Tech Articles

Running The HPL Benchmark Over Intel MPI
By Mohamad Sindi | Posted 10/25/2010
This is a step by step procedure on how to run the High Performance Linpack (HPL) benchmark on a Linux cluster using Intel MPI. This was done on a Linux cluster of 128 nodes running Intel’s Nehalem processors at 2.93 GHz with 12 GB of RAM on each node.
Hybrid applications: Intel MPI Library and OpenMP*
By Gergana Slavova (Intel) | Posted 05/21/2009
Tips and tricks on how to get the optimal performance settings for your mixed Intel MPI/OpenMP applications.
Intel® MPI Library for Linux* Tips and Tricks - FAQ: Part 1 of 2
By Andrey Derbunovich (Intel) | Posted 05/07/2009
An FAQ regarding starting up and tuning the Intel MPI Library
Writing Parallel Programs: a multi-language tutorial introduction
By Andrey Chernyshev (Intel) | Posted 12/02/2008
Introduction Parallel programming was once the sole concern of extreme programmers worried about huge supercomputing problems. With the emergence of multi-core processors for mainstream applications, however, parallel programming is well poised to become a technique every professional software de...


You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.



C# Nonlinear Optimization Problem Solvers
By Michael W.
Hi, I'm using Intel MKL from C#. In General it works. I want to use the Nonlinear Optimization Problem Solvers and I've translated the example, see http://software.intel.com/en-us/node/471540. But the dtrnlsp_solve gives me sometimes a memory exception (attempt to read write protected memory).  I've attached all the dllimports, see below. [DllImport("mkl", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)] internal static extern int dtrnlsp_init( ref IntPtr handle, ref int n, ref int m, IntPtr x, [In] double[] eps, ref int iter1, ref int iter2, ref double rs); [DllImport("mkl", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)] internal static extern int dtrnlsp_check( ref IntPtr handle, ref int n, ref int m, IntPtr fjac, IntPtr fvec, [In] double[] eps, [In, Out] int[] info); [DllImport("mkl", CallingConvention = CallingConvention.Cdecl, ExactSpelling = true, SetLastError = false)] internal static extern i...
More Threads with G than S in mkl_dcsrmv?
By Robert P.
I am performing a sparse matrix vector multiplication using mkl_dcsrmv on a system with ~80,000 degrees of freedom. My matrix is symmetric, so as a first attempt I used the option "SLNCxx" for matdescra and passed in the lower triangular part only. This works fine and gives the correct answer, but on a E5-4650 machine with 32 cores the code maxes out at 8 threads. If I instead call mkl_dcsrmv with "GxxCxx" and pass in the full sparse matrix, the code scales up to 32 threads and completes in roughly half the time as the symmetric version. This code is running with MKL 11.1 packaged with Composer XE 2013 SP1 on Linux. Should I expect the symmetric version of mkl_dcsrmv to execute with fewer threads than the general version? Thank you for your advice.
Distribution with VS2012+MKL
By Raivyn
I am trying to distribute my VS2012/MKL code to a separate computer. When on that computer I get a vcomp.dll not found. I understand that by using vcomp.dll, the application is using the MS OpenMP instead of libiomp. I am trying to get around this dependency on MS OpenMP and searching through forums I made the following changes: - Added the <intel directory>\compiler\lib\intel64 directory to "VC++ Directories"->Reference Directories and Library Directories - Added libiomp5md.lib to Linker->Additional Dependencies - Added vcomp.lib to Linker->Ignore Specific Default Libraries - Added the <intel directory>\compiler\lib\intel64 to my Linker->Additional Library Directories for good measure. Each change still loads vcomp110.dll. Is additional steps I am missing to force VS to use libiomp instead of vcomp? I cannot provide reproduce-able code to attach. Thanks  
VML accuracy environment variable
By nils.smeds@se.ibm.com
Would it be a stupid idea to have an environment variable  MKL_VML_MODE that - if set - would call vmlSetMode()  ? MKL_VML_MODE - Comma separated strings of parameters as defined in mkl_vml_defines.h. The corresponding values as defined in mkl_vml_defines.h are or-ed together and used as a parameter to vmlSetMode(). It is the user's responsibility to state a set of names that are not mutually excluding. (E.g. MKL_VML_MODE=VML_FLOAT_CONSISTENT,VML_DOUBLE_CONSISTENT  is incorrect since these two values are not allowed to be set simultaneously). Any word in the list not recognized by the VML library will result in a warning message being printed and the word ignored for the resulting value sent to vmlSetMode(). And, yes I could write it myself, but maybe it could be useful to others too? /Nils
PARDISO: Large number of subsequent identical calls result in very different runtimes
By Mathias H.
Hi, I am working with Pardiso in C++ on sparse symmetric positive definite matrices. Unexpectedly slow performance led me to do the following experiment: I define a matrix and run phase 11 (symbolic factorization) once. Then I repeatedly run phase 22 followed by phase 33, using default iparm parameters, let's say 1000 times, on the exact same matrix. For most iterations the speed is as expected, but for a (usually small) number of iterations, phase 22 takes much longer to complete. For example, on a run of 1000 iterations, the minimum duration for phase 22 might be 0.002 seconds while the longest might be 10.5 seconds. In my experience, the more iterations are performed, the larger the speed discrepancies get - for 10000 iterations, the longest phase 22 took almost 30 seconds. It doesn't seem to be a memory leak, and it doesn't follow an obvious pattern - the iterations don't get slower and slower. The slow iterations are spread out seemingly randomly amongst the fast ones. By defau...
Error linking with Lapack
By Mads P.
Hi I am trying to run an example that solves a set of linear equations using SGESV, but during linking I get the following error messages:   1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_serv_set_progress referenced in function _sgesv 1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_serv_setxer referenced in function _sgesv 1>mkl_intel_c.lib(_sgesv.obj) : error LNK2019: unresolved external symbol _mkl_lapack_sgesv referenced in function _sgesv 1>mkl_intel_c.lib(_misc_mkl_progress_iface_u.obj) : error LNK2019: unresolved external symbol _mkl_serv_default_progress referenced in function _MKL_PROGRESS 1>mkl_intel_c.lib(_misc_mkl_xerbla_iface_u.obj) : error LNK2019: unresolved external symbol _mkl_serv_default_xerbla referenced in function _XERBLA   What am I doing wrong?
MKL_PARDISO in os x
By ericcp.dias.ie
Hello, I am trying to solve a sparse linear system in fortran 90 using the mkl pardiso solver as in the following code:  do i = 1, 64 iparm(i) = 0 end do error = 0 ! initialize error flag msglvl = 1 ! print statistical information mtype = 13 ! complex unsymmetric !C.. Initiliaze the internal solver memory pointer. This is only !C necessary for the FIRST call of the PARDISO solver. do i = 1, 64 pt( i )%DUMMY = 0 end do maxfct = 1 mnum = 1 nrhs = 1 !C.. Reordering and Symbolic Factorization, This step also allocates !C all memory that is necessary for the factorization phase = 11 ! only reordering and symbolic factorization print*, ' calling sparse solver' CALL PARDISO (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns, & perm, nrhs, iparm, msglvl, b, Ex, error) WRITE(*,...
DSBGVX documentation: ifail dimension is n?
By martenjan
dsbgvx kept crashing on me with an access violation for large enough problems (n=3636). I am compiling for an x64 target. Meticulously studying the manual, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/index.htm#GUID-659297EE-D35B-428C-ABBE-1A7EBE2B0F6E, I noticed that the dimension of ifail is m for dsbgvx, and n for sbgvx. Allocating n integers for ifail, dsbgvx stops crashing.   Another peculiarity is that dsbgvx does not crash on me while compiled for an Win32 target. Questions: I am selecting only m out of n eigenvalues. The information I aim for should fit in w(1:m) and z(1:n,1:m), as discussed in the Fortran 95 interface of e.g. ?stein. So, is w(n) and z(n,n) really required in dsbgvx? Now that ifail(n) works and ifail(m) fails, should the manual read that the dimension of ifail must be n? And if so, why do w(m), z(n,m) and ifail(m) work for Win32 targets?  



Frequently Asked Questions:

  • Can I redistribute the Intel Math Kernel Library with my application?
  • Yes. When you purchase Intel MKL, you receive rights to redistribute computational portions of Intel MKL with your application. The evaluation versions of Intel MKL do not include redistribution rights. The list of files that can be redistributed is provided in redist.txt included in the Intel MKL distribution with product license.

  • Are there royalty fees for using Intel MKL?
  • No. There is no per copy royalty fee. Check the Intel MKL end user license agreement (EULA) for more details.

  • What files am I allowed to redistribute?
  • In general, the redistributable files include the linkable files (.DLL and .LIB files for Windows*, .SO and .A files for Linux*). With your purchase of Intel MKL (and updates through the support service subscription), you receive the redist.txt file which outlines the list of files that can be redistributed. The evaluation versions of Intel MKL do not include redistribution rights. See EULA for all terms.

  • Is there a limit to the number of copies of my application that I can ship which include Intel MKL redistributables?
  • You may redistribute an unlimited number of copies of the files that are found in the directories defined in the Redistributables section of the EULA.

  • How many copies of Intel MKL do I need to secure for my project team or company?
  • The number of Intel MKL copies that you need is determined by the number of developers who are writing code, compiling, and testing using the Intel MKL API. For example, five developers in an organization working on building code with Intel MKL will require five Intel MKL licenses. View the EULA for complete details.

  • Do I need to get a license for each machine being used to develop and test applications using Intel MKL library?
  • The number of licenses for Intel MKL that you need is determined by the number of developers and build machines that may be in simultaneous use in your organization. These can be deployed on any number of machines on which the application is built and/or tested, as long as only the licensed number of copies is in use at any given time. For example, a development team of five developers using ten machines simultaneously for development and test activities with Intel MKL will require ten licenses of Intel MKL. View the EULA for complete details.

  • Do I need to buy an Intel MKL license for each copy of our software that we sell?
  • No, there is no royalty fee for redistributing Intel MKL files with your software. By licensing Intel MKL for your developers, you have rights to distribute the Intel MKL files with your software for an unlimited number of copies. For more information, please refer to the EULA.

  • Where can I view the Intel MKL license agreement before making a decision to purchase the product?
  • The number of copies of Intel MKL that you need is determined by the number of developers who are writing code, compiling, and testing using the Intel MKL API, as well as the number of build machines involved in compiling and linking, which need the full Intel MKL development tools file set. See EULA for all terms.

Intel® Math Kernel Library 11.1

Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Get Help or Advice

Search Support Articles
Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.
Download, Registration and Licensing Help - Specific help for download, registration, and licensing questions.

Resources

Release Notes - View Release Notes online!
Fixes List - View Compiler Fixes List

Documentation:
Reference Manual
Linux* | Windows* | OS X*
Documentation for other software products


**Source: Evans Data Software Developer surveys 2011-2013
