Obtaining Numerically Reproducible Results
Intel® oneAPI Math Kernel Library offers functions and environment variables that help you obtain Conditional Numerical Reproducibility (CNR) of floating-point results when calling the library functions from your application. These new controls enable Intel® oneAPI Math Kernel Library to run in a special mode, when functions return bitwise reproducible floating-point results from run to run under the following conditions:
 Calls to Intel® oneAPI Math Kernel Library occur in a single executable
 The number of computational threads used by the library does not change in the run
For a limited set of routines, you can eliminate the second condition by using Intel® oneAPI Math Kernel Library in strict CNR mode.
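For example, CNR can be enabled without recompiling by setting the MKL_CBWR environment variable before running the application. The branch names below are a sketch; the exact set of accepted values depends on your oneMKL version, so consult the reference for your release:

```shell
# Let oneMKL pick the code path at run time, but keep results
# bitwise reproducible from run to run on the same machine:
export MKL_CBWR=AUTO

# Pin computation to a specific code path (here Intel AVX2) so
# results also match across processors supporting that branch:
export MKL_CBWR=AVX2

# Strict CNR mode (limited set of routines): results additionally
# do not depend on the number of computational threads.
export MKL_CBWR=AVX512,STRICT
```

The same selection can be made programmatically with the `mkl_cbwr_set` function before any other oneMKL call.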
It is well known that for general single and double precision IEEE floating-point numbers, the associative property does not always hold, meaning (a+b)+c may not equal a+(b+c). Let's consider a specific example. In infinite precision arithmetic 2^{-63} + 1 + (-1) = 2^{-63}. If this same computation is done on a computer using double precision floating-point numbers, a rounding error is introduced, and the order of operations becomes important:
(2^{-63} + 1) + (-1) ≃ 1 + (-1) = 0
versus
2^{-63} + (1 + (-1)) ≃ 2^{-63} + 0 = 2^{-63}
This inconsistency in results due to order of operations is precisely what the new functionality addresses.
The application-related factors that affect the order of floating-point operations within a single executable program include selection of a code path based on run-time processor dispatching, alignment of data arrays, variation in the number of threads, threaded algorithms, and internal floating-point control settings. You can control most of these factors by controlling the number of threads and floating-point settings and by taking steps to align memory when it is allocated (see the Getting Reproducible Results with Intel® MKL knowledge base article for details). However, run-time dispatching and certain threaded algorithms do not allow users to make changes that can ensure the same order of operations from run to run.
Intel® oneAPI Math Kernel Library performs run-time processor dispatching: it identifies the processor on which it is running and selects internal code paths optimized for the instruction set extensions that processor supports, such as Intel® Advanced Vector Extensions 2 (Intel® AVX2). The feature-based approach introduces a challenge: if any of the internal floating-point operations are done in a different order or are reassociated, the computed results may differ.
Dispatching optimized code paths based on the capabilities of the processor on which the code is running is central to the optimization approach used by Intel® oneAPI Math Kernel Library. So it is natural that consistent results require some performance trade-offs. If limited to a particular code path, the performance of Intel® oneAPI Math Kernel Library can in some circumstances degrade by more than a half. To understand this, note that matrix-multiply performance nearly doubled with the introduction of new processors supporting Intel AVX2 instructions. Even if the code branch is not restricted, performance can degrade by 10-20% because the new functionality restricts algorithms to maintain the order of operations.
Optimization Notice


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804

This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX512.