Vector Math Library (VML) Performance and Accuracy Data

# Intel® Math Kernel Library 11.0 Update 2

## Performance and Accuracy Data

The Vector Math Library (VML) is designed to compute elementary functions on vector arguments. VML is an integral part of the Intel® Math Kernel Library (Intel® MKL) and the VML terminology is used here for simplicity in discussing this group of functions.

VML includes a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.) that operate on vectors. VML may improve performance for such applications as nonlinear programming software, computations of integrals, and many others.

Each vector function from VML (for each data format) can work in three modes: High Accuracy (HA), Low Accuracy (LA), and Enhanced Performance (EP). Most VML functions have a separate implementation for each of these three modes. The exceptions are certain functions such as those that return correctly rounded results. For many functions, the LA mode improves performance compared to HA at the cost of a slight reduction in accuracy (1 or 2 least significant bits may be inaccurate). The EP mode improves performance further, at the cost of a significant reduction in accuracy: in both single and double precision, only about half of the significand bits are expected to be correct. Moreover, in EP mode some argument values (for example, large arguments in trigonometric functions) could lead to even less accurate results.

Although the default accuracy mode is HA, LA is more than sufficient in most cases. For applications that do not demand high accuracy (for example, media applications and some Monte Carlo simulations), you may find the EP mode adequate. You can use the `vmlSetMode` function to control the accuracy mode. Refer to the Intel® Math Kernel Library Reference Manual for further details.

Accuracy behavior is processor specific, so results might differ slightly across processor families and even within a single family, for example, between processor models or between the 64-bit and 32-bit libraries. Results might also differ slightly from release to release. Nevertheless, these differences stay within the specified error bounds.

Error and special value behavior is identical for HA and LA functions and does not depend on the processor used to run the software. Correct error and special value behavior is not guaranteed for the EP mode.

Refer to the List of VML Functions for a more detailed description of the performance and accuracy properties of the VML functions.

Note on Performance: Performance numbers in the respective tables are shown for "working" argument intervals. Performance may differ on other intervals; for example, it is quite expensive to compute trigonometric functions accurately for huge arguments. The page for each function lists the working interval over which performance is measured. The same page contains graphs that show how performance depends on the vector length. There are two extreme cases, short and long vectors, and a logarithmic scale is used to show both. For short vectors, functions incur certain overheads, which are amortized as the vector length increases. For vectors longer than a few dozen elements, performance remains quite flat until the vector no longer fits in the L2 cache.

Data prefetching greatly reduces the performance penalty for vectors that do not fit in the cache.
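The amortization of per-call overhead over the vector length can be illustrated with a simple cost model. This is only a sketch: the function `cost_per_element`, the linear model itself, and the numbers used below are hypothetical and are not taken from the VML performance tables. The model also ignores cache effects, so it does not capture the slowdown for vectors that exceed the L2 cache.

```python
def cost_per_element(n: int, call_overhead: float, steady_cost: float) -> float:
    """Average cost per element for one call on a vector of length n.

    Assumes (hypothetically) that a call costs a fixed overhead plus a
    constant amount of work per element, in arbitrary time units.
    """
    if n <= 0:
        raise ValueError("vector length must be positive")
    return call_overhead / n + steady_cost


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    overhead, per_element = 100.0, 2.0
    for n in (1, 10, 50, 1000, 100000):
        print(n, cost_per_element(n, overhead, per_element))
```

Under these assumed parameters the per-element cost drops quickly for the first few dozen elements and then approaches the steady-state cost, which matches the flat region described above.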

See a comprehensive table with performance data for all the VML functions.

Note on Accuracy: The design requirement for the HA functions is to have error less than 1.0 ulp (unit-in-the-last-place), and to have all special values processed correctly. For the LA functions, the error bound is 4.0 ulps. For the EP functions, approximately half of the bits in the significand of the floating-point result need to be correct. For details, see the accuracy table with ulp errors for all the functions. Any deviations from these error bounds are highlighted in the accuracy tables, and should be considered temporary.
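To make the ulp bounds concrete, the following sketch measures an error in ulps and relates it to the number of affected significand bits. It is an illustration under stated assumptions, not the procedure used to produce the accuracy tables: the helper names are hypothetical, the reference value is an ordinary double rather than a higher-precision result, and the conversion between ulps and bits is a rough base-2 logarithm estimate. It requires Python 3.9 or later for `math.ulp`.

```python
import math

# IEEE 754 significand widths, counting the implicit leading bit.
SIGNIFICAND_BITS = {"single": 24, "double": 53}


def ulp_error(computed: float, reference: float) -> float:
    """Error of `computed`, in units of the last place of `reference` (double)."""
    return abs(computed - reference) / math.ulp(reference)


def inaccurate_bits(ulps: float) -> float:
    """Rough number of trailing significand bits an error of `ulps` can disturb."""
    return max(0.0, math.log2(ulps)) if ulps > 0 else 0.0


def ulp_budget(precision: str, correct_bits: int) -> float:
    """Approximate ulp error that still leaves `correct_bits` leading bits correct."""
    return 2.0 ** (SIGNIFICAND_BITS[precision] - correct_bits)


if __name__ == "__main__":
    off_by_one_ulp = 1.0 + 2.0 ** -52      # next double after 1.0
    print(ulp_error(off_by_one_ulp, 1.0))  # error of the neighbor of 1.0
    print(inaccurate_bits(4.0))            # LA bound of 4.0 ulps
    print(ulp_budget("single", 12))        # half of a 24-bit significand
```

Under this estimate, the 4.0 ulp bound for LA corresponds to about two inaccurate trailing bits, while keeping only half of a single-precision significand correct allows errors on the order of thousands of ulps.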

For complex functions, the ulp error is the maximum of the two ulp errors calculated for the real and the imaginary parts of the result.
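The rule for complex results can be sketched as follows. The helper names are hypothetical, and the sketch assumes that each part's error is measured against the ulp of the corresponding part of the reference value, which the text above does not spell out. It requires Python 3.9 or later for `math.ulp`.

```python
import math


def ulp_error(computed: float, reference: float) -> float:
    """Error of `computed`, in units of the last place of `reference` (double)."""
    return abs(computed - reference) / math.ulp(reference)


def complex_ulp_error(computed: complex, reference: complex) -> float:
    """Maximum of the ulp errors of the real and imaginary parts."""
    return max(
        ulp_error(computed.real, reference.real),
        ulp_error(computed.imag, reference.imag),
    )


if __name__ == "__main__":
    reference = complex(1.0, 2.0)
    # Real part off by one ulp, imaginary part off by three ulps.
    computed = complex(1.0 + math.ulp(1.0), 2.0 + 3 * math.ulp(2.0))
    print(complex_ulp_error(computed, reference))
```

Here the reported error is driven by the imaginary part, because it is the less accurate of the two components.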

Special Value Processing: Special values are processed in conformance with the C99 standard (known as C9X while in draft). For the special value behavior of each function, see the Intel® Math Kernel Library Reference Manual.

