# Getting reproducible results with Intel® MKL

Published: 12/20/2011   Last Updated: 12/20/2011

## Introduction

The Intel® Math Kernel Library (Intel® MKL) contains highly optimized, extensively threaded math routines for science, engineering, and financial applications that require maximum performance. While performance is the chief reason Intel MKL exists, users also count on it to employ the best practices available to provide accurate results. In recent years, a growing number of Intel MKL users have noticed that several factors can affect the numerical results of the library. This article discusses the underlying reason for these variations in output, the mechanisms that cause the variations in Intel MKL, and some ways to improve the chances of consistent results in certain cases.

## The sources of indeterminism

Most users of floating point math libraries expect some rounding error when doing multiple calculations. A simple experiment shows that the rounding error can be different if the order of operations changes. If we assume for a moment double precision floating point numbers, then:

$2^{60} + (-2^{60} + 1) = 2^{60} + (-2^{60}) = 0$

since $-2^{60} + 1$ rounds to $-2^{60}$. Yet

$(2^{60} + (-2^{60})) + 1 = 0 + 1 = 1$
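The same effect can be checked in a few lines of Python, whose `float` type is an IEEE 754 double on essentially all platforms. This is only an illustrative sketch; the variable names are made up for the example and nothing here involves Intel MKL itself.

```python
# 2**60 is exactly representable as a double, but 2**60 - 1 is not:
# a double carries only 53 significant bits.
big = 2.0 ** 60

# Inner sum first: -big + 1.0 rounds back to -big, so the 1.0 is lost.
inner_first = big + (-big + 1.0)

# Cancel the large terms first: the 1.0 survives.
cancel_first = (big + -big) + 1.0

print(inner_first)   # 0.0
print(cancel_first)  # 1.0
```

Both expressions are mathematically equal to 1, yet only the second evaluation order returns it.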

With this simple example in mind, we’ll discuss three cases in which methods that are crucial to Intel MKL's performance can produce different results from one run to the next.

Memory alignment: One of the ways Intel MKL gets good performance is through use of new instructions made available with successive generations of Intel® processors. Some of these instructions make computation more efficient by performing the same floating-point operation on multiple floating-point numbers at once. How the operands are grouped for these instructions, however, depends on where the data is situated in memory. If in one run of the program the data happens to be aligned on a 16-byte boundary, then the first two double-precision numbers in the array are grouped together; if in the next run the array is offset from that boundary, then the second and third double-precision numbers are grouped together. This difference in order can cause different results when the same program is run two times consecutively with all settings remaining identical. Update: The latest processors supporting Intel Advanced Vector Extensions (AVX) have larger, 256-bit registers, so the relevant alignment boundary becomes 32 bytes.
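The alignment effect can be imitated with a deliberately simplified model of a two-lane vectorized sum. The sketch below is an assumption-laden illustration, not Intel MKL's actual implementation: the function `paired_sum` and its `offset` parameter are hypothetical, with `offset` standing in for the number of leading elements that must be handled one at a time before the data reaches an aligned boundary.

```python
def paired_sum(values, offset):
    """Sum `values` the way a 2-lane vector loop might (simplified model).

    `offset` (hypothetical) is the number of leading elements processed
    scalar-wise before the first aligned pair.
    """
    head = 0.0
    for v in values[:offset]:          # scalar "peel" loop
        head += v

    body = values[offset:]
    lanes = [0.0, 0.0]                 # one accumulator per vector lane
    n_pairs = len(body) // 2
    for i in range(n_pairs):
        lanes[0] += body[2 * i]
        lanes[1] += body[2 * i + 1]

    tail = 0.0
    for v in body[2 * n_pairs:]:       # scalar remainder loop
        tail += v

    return head + (lanes[0] + lanes[1]) + tail


big = 2.0 ** 60
data = [big, 1.0, -big, 1.0]           # exact sum is 2.0

aligned = paired_sum(data, offset=0)     # pairs: (big, 1.0), (-big, 1.0)
misaligned = paired_sum(data, offset=1)  # pairs: (1.0, -big), then a tail

print(aligned)     # 2.0
print(misaligned)  # 1.0
```

The input array and the arithmetic are identical in both calls; only the grouping changes, which is enough to change the rounded result.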

Internal Intel MKL code paths: To get the best performance wherever a program is run, Intel MKL checks the processor type at runtime and can dispatch processor-specific code accordingly. If a particular instruction set or a cache of a certain size is available, a specialized code path may exploit it. These code paths are different enough that they again change the order of operations and can produce slightly different results on different processors.
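A rough way to picture this is a reduction whose number of accumulators depends on the vector width the processor offers. The sketch below is hypothetical throughout: `lane_sum`, the `CODE_PATHS` table, and the path names are invented for illustration and do not describe how Intel MKL actually dispatches or sums.

```python
def lane_sum(values, width):
    """Sum `values` using `width` independent lane accumulators."""
    acc = [0.0] * width
    for i, v in enumerate(values):
        acc[i % width] += v            # element i goes to lane i mod width
    total = 0.0
    for a in acc:                      # combine the lanes left to right
        total += a
    return total


# Hypothetical dispatch table: doubles per vector register on each path.
CODE_PATHS = {"128-bit path": 2, "256-bit path": 4}

big = 2.0 ** 60
data = [big, 1.0, -big, 1.0]           # exact sum is 2.0

results = {name: lane_sum(data, width) for name, width in CODE_PATHS.items()}
print(results)  # {'128-bit path': 2.0, '256-bit path': 1.0}
```

With two lanes the large values cancel inside one accumulator before the small ones are added; with four lanes each value sits in its own accumulator and a 1.0 is absorbed when the lanes are combined.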