Vector Math Library (VML) Performance and Accuracy Data

# Intel® Math Kernel Library 11.0 Update 2

## Performance and Accuracy Data

The Vector Math Library (VML) computes elementary functions on vector arguments. VML is an integral part of the Intel® Math Kernel Library (Intel® MKL); the term VML is used here as shorthand for this group of functions.

VML includes a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.) that operate on vectors. VML may improve performance for applications such as nonlinear programming software, computation of integrals, and many others.

Each VML vector function (for each data format) can work in three modes: High Accuracy (HA), Low Accuracy (LA), and Enhanced Performance (EP). Most VML functions have a separate implementation for each of these modes; the exceptions include functions whose results are correctly rounded. For many functions, the LA mode improves performance compared to HA at the cost of a slight reduction in accuracy (1 or 2 least significant bits may be inaccurate). The EP mode improves performance further, but at the cost of a significant reduction in accuracy: in both single and double precision, only about half of the significand bits are expected to be correct. Moreover, in EP mode some argument values (for example, large arguments of trigonometric functions) may produce even less accurate results.

Although HA is the default accuracy mode, LA is more than sufficient in most cases. For applications that are not very demanding in accuracy (for example, media applications and some Monte Carlo simulations), the EP mode may be adequate. Use the `vmlSetMode` function to control the accuracy mode. Refer to the Intel® Math Kernel Library Reference Manual for further details.
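As an illustration, the following C sketch switches the accuracy mode around a single call and then restores the previous mode. It assumes the `vmlSetMode` and `vdInv` functions and the `VML_LA`, `VML_HA`, and `VML_EP` constants as declared in `mkl.h`. The `USE_MKL` macro and the `inverse_with_mode` helper are hypothetical names introduced here, and the stand-in definitions exist only so that the sketch builds without Intel MKL.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_MKL   /* hypothetical macro: pass -DUSE_MKL when building against Intel MKL */
#include <mkl.h>
#else
/* Stand-ins so the sketch builds without MKL. The real constants and
   functions come from mkl.h; the values below are placeholders. */
enum { VML_LA = 1, VML_HA = 2, VML_EP = 3 };
static unsigned int current_mode = VML_HA;            /* HA is the documented default */
static unsigned int vmlSetMode(unsigned int mode) {   /* returns the previous mode */
    unsigned int old = current_mode;
    current_mode = mode;
    return old;
}
static void vdInv(int n, const double *a, double *r) {
    for (int i = 0; i < n; ++i) r[i] = 1.0 / a[i];
}
#endif

/* Compute r[i] = 1 / a[i] in the requested accuracy mode, then restore
   the mode that was active before the call. */
void inverse_with_mode(unsigned int mode, int n, const double *a, double *r) {
    assert(n >= 0 && a != NULL && r != NULL);
    unsigned int saved = vmlSetMode(mode);   /* switch mode, remember the old one */
    vdInv(n, a, r);
    vmlSetMode(saved);                       /* leave the global mode unchanged */
}
```

Calling `inverse_with_mode(VML_LA, n, a, r)` runs one operation in LA mode without changing the mode seen by the rest of the program. Saving and restoring the mode is a design choice of this sketch, not a requirement stated in this document.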

Accuracy behavior is processor specific, so results might differ slightly across processor families and even within a family (for example, between processor models, or between the 64-bit and 32-bit libraries). Results might also differ slightly from release to release. Nevertheless, these differences stay within the specified error bounds.

Error and special value behavior is identical for HA and LA functions and does not depend on the processor used to run the software. Correct error and special value behavior is not guaranteed for the EP mode.

Refer to the List of VML Functions for a more detailed description of the performance and accuracy properties of the VML functions.

Note on Performance: Performance numbers in the respective tables are shown for "working" argument intervals. Performance behavior may be different for other intervals. For example, it is quite expensive to compute trigonometric functions accurately for huge arguments. Each function lists the working interval over which performance is measured. The same page contains graphs that show how performance depends on the vector length. There are two extreme cases: short and long vectors (a logarithmic scale is used to show both). For short vectors, functions incur fixed overheads, which are amortized as the vector length increases. For vectors longer than a few dozen elements, performance remains quite flat until the vector no longer fits in the L2 cache.

Data prefetching greatly reduces the performance penalty for vectors that do not fit in the cache.
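The amortization effect described above can be illustrated with a simple cost model. This is a sketch under the assumption that a call costs a fixed overhead plus a constant per-element cost. The `cycles_per_element` function and the numbers used with it are hypothetical and are not measured VML data.

```c
#include <assert.h>

/* Hypothetical cost model: one call costs `call_overhead` cycles plus
   `per_element` cycles for each of the n vector elements. Returns the
   average cost per element. Cache effects are deliberately ignored. */
double cycles_per_element(double call_overhead, double per_element, int n) {
    assert(n > 0);
    return call_overhead / n + per_element;
}
```

With an assumed overhead of 100 cycles and 5 cycles per element, the model gives 105 cycles per element for `n = 1`, 15 for `n = 10`, and 6 for `n = 100`. The curve flattens toward the per-element cost, which matches the qualitative shape described for vectors longer than a few dozen elements.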

See a comprehensive table with performance data for all the VML functions.

Note on Accuracy: The design requirement for the HA functions is to have error less than 1.0 ulp (unit-in-the-last-place), and to have all special values processed correctly. For the LA functions, the error bound is 4.0 ulps. For the EP functions, approximately half of the bits in the significand of the floating-point result need to be correct. For details, see the accuracy table with ulp errors for all the functions. Any deviations from these error bounds are highlighted in the accuracy tables, and should be considered temporary.
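To relate these ulp bounds to relative error, the following worked estimate may help. It assumes IEEE 754 binary formats with $p$ significand bits ($p = 24$ for single precision, $p = 53$ for double precision) and a normal, nonzero exact result $y$ with computed value $\hat{y}$. These assumptions are not stated in this document.

$$
\mathrm{ulp}(y) = 2^{\,e-p+1} \quad \text{for } 2^{e} \le |y| < 2^{e+1}
$$

$$
|\hat{y} - y| = k \cdot \mathrm{ulp}(y) \;\Longrightarrow\; \frac{|\hat{y} - y|}{|y|} \le k \cdot 2^{\,1-p}
$$

Under these assumptions, the HA bound ($k < 1$) corresponds to a relative error below $2^{-23} \approx 1.2 \times 10^{-7}$ in single precision and below $2^{-52} \approx 2.2 \times 10^{-16}$ in double precision. The LA bound ($k < 4$) corresponds to $2^{-21} \approx 4.8 \times 10^{-7}$ and $2^{-50} \approx 8.9 \times 10^{-16}$, that is, up to two low-order bits in error. For EP, about $p/2$ correct bits suggests a relative error on the order of $2^{-12} \approx 2.4 \times 10^{-4}$ in single precision and $2^{-26} \approx 1.5 \times 10^{-8}$ in double precision.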

For complex functions, the ulp error is the maximum of the two ulp errors calculated for the real and the imaginary parts of the result.
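A minimal C sketch of this measurement follows. It assumes IEEE 754 double precision and a reference value that is itself a double, whereas an accurate measurement would compare against a higher-precision reference. The `ulp_of`, `ulp_error`, and `complex_ulp_error` names are hypothetical helpers and are not part of VML.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of one unit in the last place at x. Valid for finite normal doubles
   whose ulp is also a normal double (biased exponent greater than 52). */
static double ulp_of(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint64_t biased_exp = (bits >> 52) & 0x7FF;
    assert(biased_exp > 52 && biased_exp < 0x7FF);   /* reject tiny values, inf, NaN */
    bits = (biased_exp - 52) << 52;                  /* 2^(e-52) with a zero significand */
    double u;
    memcpy(&u, &bits, sizeof u);
    return u;
}

/* Error of `computed`, expressed in ulps of the reference value. */
double ulp_error(double computed, double reference) {
    double diff = computed - reference;
    if (diff < 0.0) diff = -diff;
    return diff / ulp_of(reference);
}

/* Complex result: the larger of the real-part and imaginary-part ulp errors. */
double complex_ulp_error(double re, double im, double ref_re, double ref_im) {
    double err_re = ulp_error(re, ref_re);
    double err_im = ulp_error(im, ref_im);
    return err_re > err_im ? err_re : err_im;
}
```

For example, a result whose real part is off by 2 ulps and whose imaginary part is exact has a complex ulp error of 2, which would exceed the 1.0 ulp HA bound but fall within the 4.0 ulp LA bound.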

Special Value Processing: Special values are processed in conformance with the C99 standard (known as C9X during its development). For the special value behavior of each function, see the Intel® Math Kernel Library Reference Manual.

## Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number for details.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information on performance tests and on the performance of Intel products, go to: http://www.intel.com/performance/resources/benchmark_limitations.htm

BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804