Intel(R) MKL 11.0 introduces new functionality that lets users balance the need for reproducible results against performance. The webinar recording and presentation linked below discuss the mechanisms that cause variability in floating-point results, the new controls that limit this variability, and the performance trade-offs involved.
More information is available from our Conditional Numerical Reproducibility (CNR) page.
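Much of this variability comes down to the fact that floating-point addition is not associative: the same values added in a different order can round differently. A minimal C sketch, assuming IEEE 754 double arithmetic without extended-precision intermediates (as with SSE2 on x86-64); sum_left and sum_right are illustrative names, not Intel MKL functions:

```c
/* Add three doubles grouped as (a + b) + c. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

/* Add the same three doubles grouped as a + (b + c). */
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e16, b = -1e16 and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0: doubles near 1e16 are spaced 2 apart, so -1e16 + 1.0 rounds back to -1e16 before a is added.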
Q&A from the session
If you have more questions, ask them on this thread on the Intel MKL user forum.
Q: Did you say that for LINPACK and BLAS functions the reduction in performance is huge? Can you give a percentage drop?
A: When processors supporting Intel AVX instructions were first released, there were significant performance boosts due to the wider registers, which doubled the theoretical peak performance. Getting processor-to-processor reproducible results between such a system and another system means using the new controls to restrict Intel MKL to a code path that does not use the new instructions and therefore does not take advantage of those wider registers. This can reduce performance by nearly half.
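A simplified model of why register width changes results: a vectorized reduction keeps one partial sum per SIMD lane and combines the lanes at the end, so a 4-lane path (AVX, 256-bit registers holding 4 doubles) adds the same numbers in a different order than a 2-lane path (SSE2, 128-bit registers holding 2 doubles). The sketch below is a hypothetical model in plain C, not how Intel MKL is implemented; lane_sum and its width parameter are illustrative.

```c
#include <math.h>
#include <stddef.h>

#define MAX_LANES 8

/* Hypothetical model of a SIMD sum: one accumulator per lane, element i goes
   to lane i % width, and the lane partial sums are combined left to right.
   width = 1 is a plain scalar loop; 2 mimics SSE2 doubles, 4 mimics AVX doubles.
   Returns NaN if width is outside 1..MAX_LANES. */
double lane_sum(const double *x, size_t n, size_t width) {
    double acc[MAX_LANES] = {0.0};
    if (width < 1 || width > MAX_LANES)
        return NAN;
    for (size_t i = 0; i < n; ++i)
        acc[i % width] += x[i];   /* each lane accumulates its own subsequence */
    double total = 0.0;
    for (size_t k = 0; k < width; ++k)
        total += acc[k];          /* combine the lane partial sums in order */
    return total;
}
```

For x = {1e16, 1.0, -1e16, 1.0}, lane_sum returns 2.0 with width 2 but 1.0 with width 4 (and 1.0 for the scalar case, width 1). Forcing every machine onto the same code path removes this source of difference, at the cost of not using the wider registers.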
Q: The performance chart shows that the difference in performance between CNR off and ‘AUTO’ is small or zero. Is there an example where the difference is large enough that users would not want AUTO on by default?
A: There are some examples where the performance difference is non-trivial. We will be posting performance charts that demonstrate this soon.
Q: If there is no performance degradation, why wouldn’t we always want reproducibility?
A: If there were no difference in performance between turning CNR on and off, it would be worthwhile to keep it on. There are cases where the performance differs, and we should be posting them soon.
Q: In our source code we use Intel MKL, but deploy our application on any PC. Is setting CNR to AUTO our best option?
A: If you want run-to-run reproducibility and the best possible performance, AUTO is what you want. If you need reproducibility from one PC to another ("any" PC, as you put it), then you'll need to use COMPATIBLE.
Q: Does the number of computational cores matter? Besides controlling the code path, do we need to set the number of threads to be the same?
A: Yes, the number of threads must be set to the same value across a set of systems with different numbers of cores. I will check with Todd during the Q&A session to see whether we have time to cover this question.
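Thread count matters for a similar reason: with static scheduling, each thread sums its own contiguous chunk, and the chunk boundaries depend on how many threads there are. The hypothetical C model below runs sequentially but reproduces the addition order of an nthreads-way split; chunked_sum is an illustrative name, not an Intel MKL function. In Intel MKL itself, the thread count can be fixed with mkl_set_num_threads or the MKL_NUM_THREADS environment variable.

```c
#include <stddef.h>

/* Hypothetical model of a threaded sum with static scheduling: the input is
   split into nthreads contiguous chunks, each chunk is summed separately, and
   the partial sums are added in chunk order. Executed sequentially here; only
   the order of the additions matters for the result. */
double chunked_sum(const double *x, size_t n, size_t nthreads) {
    double total = 0.0;
    if (nthreads == 0)
        nthreads = 1;
    for (size_t k = 0; k < nthreads; ++k) {
        size_t begin = k * n / nthreads;       /* first index of chunk k */
        size_t end = (k + 1) * n / nthreads;   /* one past its last index */
        double partial = 0.0;
        for (size_t i = begin; i < end; ++i)
            partial += x[i];
        total += partial;                      /* combine partials in order */
    }
    return total;
}
```

For x = {1e16, 1.0, -1e16, 1.0}, chunked_sum returns 1.0 with 1 or 4 threads but 0.0 with 2 threads, even though the code path is identical.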
Q: Does the Intel MKL memory allocation function align memory differently from the compiler?
A: The allocation function in Intel MKL uses the compiler's functionality but by default aligns memory to 64-byte boundaries (which supports CNR into the future). See the Memory Functions section of the Intel MKL Reference Manual for what we offer.
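Intel MKL's own allocator is mkl_malloc(size, alignment), paired with mkl_free. So that the sketch below compiles without Intel MKL, it substitutes the standard C11 aligned_alloc to request the same 64-byte alignment (note that MSVC does not provide aligned_alloc); alloc_doubles_64 and is_aligned_64 are hypothetical helper names. A plausible reason alignment matters for CNR, assumed here rather than stated in the answer above, is that a vectorized loop may process an unaligned prefix separately, which shifts the order of additions.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_BYTES 64u

/* Allocate room for n doubles on a 64-byte boundary (stand-in for mkl_malloc).
   C11 aligned_alloc requires the size to be a multiple of the alignment,
   so the byte count is rounded up. Returns NULL on failure; release with free(). */
double *alloc_doubles_64(size_t n) {
    if (n > (SIZE_MAX - ALIGN_BYTES) / sizeof(double))
        return NULL;                       /* byte count would overflow */
    size_t bytes = n * sizeof(double);
    bytes = (bytes + ALIGN_BYTES - 1) / ALIGN_BYTES * ALIGN_BYTES;
    if (bytes == 0)
        bytes = ALIGN_BYTES;               /* avoid a zero-size request */
    return aligned_alloc(ALIGN_BYTES, bytes);
}

/* Returns 1 if p lies on a 64-byte boundary, 0 otherwise. */
int is_aligned_64(const void *p) {
    return ((uintptr_t)p % ALIGN_BYTES) == 0;
}
```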
Q: Doesn’t the compiler return aligned memory by default unless you do something to prevent it?
A: Take a look at this article for details: Data Alignment to Assist Vectorization. It suggests that it is up to you to use pragmas to control memory alignment.
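For statically allocated arrays, standard C11 also offers a portable way to request alignment explicitly, as an alternative to compiler-specific mechanisms: the alignas specifier from stdalign.h. A minimal sketch; work_buffer and get_work_buffer are hypothetical names.

```c
#include <stdalign.h>
#include <stdint.h>

/* File-scope buffer explicitly aligned to a 64-byte boundary with C11 alignas. */
static alignas(64) double work_buffer[256];

/* Returns the aligned buffer so callers can pass it to vectorized code. */
double *get_work_buffer(void) {
    return work_buffer;
}
```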
Q: Is CNR also available in Intel® IPP 7.1?
A: No. If this is important for your application, don't hesitate to let the Intel IPP team know through Intel Premier Support or the Intel IPP user forum.
More questions and answers are coming soon, so check back.