Run-to-Run Reproducibility with Intel® MKL and the Intel® Compilers

The Intel® compilers and the Intel® Math Kernel Library (Intel® MKL) provide settings and functions that let users balance performance against their requirements for reproducible floating-point results. The webinar presentation (available as a PDF, or you can watch the recording) discusses the mechanisms that cause variability in floating-point results, the controls available to limit these mechanisms, and the performance trade-offs involved.

More information can be found linked from our Conditional Numerical Reproducibility page.

More information on the controls in the Intel compilers can be found in this white paper: Consistency of Floating-Point Results Using the Intel® Compiler.


Q: Is there a way to have consistency between 32-bit and 64-bit Intel MKL?
A: It's possible you could see consistency in some algorithms today, but we don't currently validate or guarantee reproducibility in this case. One hurdle is the use of math library functions for which no standard mandates exact results, and which can therefore have slightly different implementations from architecture to architecture.

Q: What about reproducibility from OS to OS?
A: The answer to this question is similar to the above.

Q: Are there any future plans to support 128-bit doubles in hardware? Would this help minimize differences for some numerical algorithms?
A: It's possible that this could minimize differences for some algorithms, but there are no current plans to support 128-bit double precision in Intel MKL or in hardware. The Intel compilers support quad precision in software.

Q: Do results differ for single- vs. double-precision functions?
A: Quite possibly they will differ, since twice as many single-precision floating-point numbers fit into a vector register; vector instructions therefore operate on more elements at once, which changes the order in which operations are performed.

Q: Regarding slide 19: the performance is halved between CNR off and compatible mode, but that test used a 40k x 40k matrix on 12 threads. How big would the difference in performance be if the matrix were smaller (say, 5k x 5k) or if the number of threads were lower?
A: While we don't have direct data to answer that question, other benchmarks we've run show a fairly consistent percentage drop in performance. Ultimately, the performance graph is meant to give only a general idea of the impact you might see if you require reproducibility; we encourage users to try it on benchmarks representative of their own workloads.

Q: Why is the divide-by-zero exception burden placed on the coder? Why doesn't the FPU/ALU generate an overflow error, handle it internally, and exit gracefully, rather than having the code crash when a divide by zero does occur? I ask because I have several times been unable to foresee a divide-by-zero condition, which has led to program crashes.
A: By default, floating-point exceptions are masked: a divide by zero produces an infinity, and the application continues. You can unmask exceptions, in which case the application will halt unless you write your own handler.

Q: What about the order of integer operations, for example n*(n+1)/2? Here the order is very important.
A: Integer operations have no rounding error, so a given expression always evaluates to the same result. Note, however, that integer division truncates, so the order still matters for correctness: n*(n+1) is always even, so n*(n+1)/2 is exact, whereas n/2*(n+1) loses the remainder when n is odd.

Q: How reproducible are results from Intel MKL functions when called from GPU libraries such as MAGMA?
A: Intel MKL does not currently support reproducibility on GPUs or co-processors.

For more complete information about compiler optimizations, see our Optimization Notice.