Advanced Optimizations for Intel® MIC Architecture, Low Precision Optimizations
The latest Intel compilers (released after the 13.0.039 Beta Update 1 release) do not generate low-precision sequences unless you explicitly add low-precision options to the compiler command line. This article describes methods for improving application performance through the use of low-precision mathematical functions.
The Intel compilers can generate low-precision code sequences for certain operations and intrinsics, such as divide and square root. Why would a user consider low precision? For speed: low-precision operations can be faster than their higher-precision equivalents. The Intel compilers provide a robust set of options to control mathematical precision.
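To illustrate why a lower-precision divide or square root can be cheaper, here is a minimal, hypothetical C model of the general idea: start from a rough estimate and refine it with a few Newton-Raphson steps instead of computing a correctly rounded result. This is not the code the Intel compiler emits. The magic constants are well-known bit-manipulation tricks standing in for a hardware estimate instruction, and all function names are made up for this sketch. It assumes IEEE 754 single-precision `float` and finite, normal, positive inputs.

```c
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL model of low-precision division and square root: a cheap
 * initial estimate refined by Newton-Raphson steps. This is NOT the code
 * the Intel compiler generates. It assumes finite, normal, positive inputs;
 * special cases are simply not handled. */

/* Reinterpret float <-> uint32_t without violating aliasing rules. */
static uint32_t to_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float from_bits(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Approximate 1/b. Subtracting the bit pattern from a magic constant
 * roughly negates the exponent, giving a crude first estimate. */
static float rcp_newton(float b, int iters) {
    float x = from_bits(0x7EF311C3u - to_bits(b));
    for (int i = 0; i < iters; i++)
        x = x * (2.0f - b * x);             /* relative error roughly squares */
    return x;
}

/* Approximate 1/sqrt(b) using the well-known 0x5F3759DF estimate. */
static float rsqrt_newton(float b, int iters) {
    float y = from_bits(0x5F3759DFu - (to_bits(b) >> 1));
    for (int i = 0; i < iters; i++)
        y = y * (1.5f - 0.5f * b * y * y);  /* Newton step for 1/sqrt(b) */
    return y;
}

/* Division as a * (1/b), square root as b * (1/sqrt(b)). */
static float div_approx(float a, float b, int iters) { return a * rcp_newton(b, iters); }
static float sqrt_approx(float b, int iters) { return b * rsqrt_newton(b, iters); }

/* Relative error of an approximation against a reference, computed in double. */
static double rel_error(float approx, float exact) {
    double d = ((double)approx - (double)exact) / (double)exact;
    return d < 0 ? -d : d;
}
```

Each iteration costs a couple of multiplies and a subtraction, so fewer iterations trade accuracy for speed: for example, `div_approx(1.0f, 3.0f, 1)` is off by roughly 0.2%, while three iterations bring the result to within a few rounding errors of `1.0f / 3.0f`.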
Current compilers provide the -fimf* family of options; variations of the base -fimf option are shown below. The term "ulp" stands for "units in the last place". In binary, an error of 4 ulps can affect the last 2 bits of the mantissa (since 4 = 2^2). The general syntax is:
-fimf-domain-exclusion=<n1> -fimf-accuracy-bits=<n2> -fimf-precision=low -fimf-max-error=<n3_ulps>
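To make the ulp arithmetic concrete, the following minimal C sketch measures the distance between two single-precision values in ulps and converts an ulp count into the number of trailing mantissa bits it can disturb. The function names are hypothetical (not part of any Intel library), and the code assumes IEEE 754 binary32 `float` and finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an ordered integer line, so that
 * adjacent floats differ by exactly 1 (+0.0 and -0.0 both map to 0). */
static int64_t float_order(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    int64_t mag = (int64_t)(u & 0x7FFFFFFFu);
    return (u & 0x80000000u) ? -mag : mag;
}

/* Distance between two finite floats in units in the last place (ulps). */
static int64_t ulp_distance(float a, float b) {
    int64_t d = float_order(a) - float_order(b);
    return d < 0 ? -d : d;
}

/* Trailing mantissa bits an error of `ulps` can disturb: floor(log2(ulps)). */
static int ulps_to_bits(uint64_t ulps) {
    int bits = 0;
    while (ulps > 1) {
        ulps >>= 1;
        bits++;
    }
    return bits;
}
```

For example, `ulps_to_bits(4)` returns 2, matching the rule of thumb above, and `ulps_to_bits(2048)` returns 11.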
Some combinations that make sense:
- -fimf-precision=low -fimf-domain-exclusion=15 (gives the lowest-precision sequences available for both single precision (SP) and double precision (DP))
- -fimf-domain-exclusion=15 -fimf-accuracy-bits=22 (low precision compared to default for DP)
- -fimf-domain-exclusion=15 -fimf-accuracy-bits=11 (even lower precision for DP, low precision compared to default for SP)
- -fimf-max-error=2048 -fimf-domain-exclusion=15 (gives lower accuracy than the default maximum error of 4 ulps, but higher accuracy than the -fimf-accuracy-bits combinations above)
- -fp-model fast=2 (the compiler default is -fp-model fast=1; specifying fast=2 is equivalent to adding the option -fimf-domain-exclusion=15 to the default)
- -fp-model precise -no-prec-div -no-prec-sqrt -fast-transcendentals -fimf-precision=high (to get vectorized, high-precision versions of division, square root, and transcendental functions from libsvml)
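As a rough worked check of the ordering in the list above, treat an error of $2^k$ ulps as disturbing about the last $k$ significand bits, and assume IEEE 754 significand widths of 53 bits for DP and 24 bits for SP (counting the implicit bit). These are approximations for intuition, not compiler guarantees:

```latex
\begin{aligned}
\text{default max error } (4 = 2^{2}\ \text{ulps}):&\quad 53 - 2 = 51\ \text{bits (DP)}, \qquad 24 - 2 = 22\ \text{bits (SP)}\\
\texttt{-fimf-max-error=2048}\ (2^{11}\ \text{ulps}):&\quad 53 - 11 = 42\ \text{bits (DP)}, \qquad 24 - 11 = 13\ \text{bits (SP)}
\end{aligned}
```

Both 42 bits (DP) and 13 bits (SP) exceed the 22 and 11 accuracy bits requested by the -fimf-accuracy-bits combinations, which is consistent with -fimf-max-error=2048 sitting between those settings and the default.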
These options affect code generation for vector as well as scalar code.
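One way to observe this on your own system is sketched below in C: a division loop with no dependences between iterations (so the compiler may vectorize it, with any leftover iterations handled by scalar code), plus a helper that reports the worst relative error against a double-precision reference. The function names are hypothetical. The idea is to call both from a small driver of your own, build it once with default options and once with one of the low-precision combinations above, and compare the reported error.

```c
#include <stddef.h>

/* Illustrative builds (same source, different precision options):
 *   icc -O2 <your files>
 *   icc -O2 -fimf-precision=low -fimf-domain-exclusion=15 <your files>
 */

/* Element-wise division; iterations are independent, so this loop is a
 * candidate for vectorization, and the precision options apply to both
 * the vector body and any scalar remainder. */
void divide_arrays(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] / b[i];
}

/* Worst relative error of out[] against a double-precision reference.
 * Assumes every a[i] and b[i] is finite and nonzero. */
double max_rel_error(const float *a, const float *b, const float *out, size_t n) {
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double ref = (double)a[i] / (double)b[i];
        double err = ((double)out[i] - ref) / ref;
        if (err < 0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With correctly rounded division, the reported value should not exceed about 6e-8 (2^-24) for results in the normal range; noticeably larger values indicate that a lower-precision sequence was used.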
For the full list of options and detailed descriptions, please refer to the "Floating-Point Options" section of the Compiler User and Reference Guide (C++ and Fortran editions). The User and Reference Guide is installed on your system along with the compiler and is also available online.
The Intel compilers allow a user to select lower (or higher) precision for mathematical intrinsics. This allows the user to balance the tradeoffs between performance, accuracy, and reproducibility. Options discussed in this section are in the -fimf family of options:
- -fimf-precision defines the accuracy (precision) for math library functions.
- -fimf-accuracy-bits defines the relative error for math library function results, expressed as a number of correct bits.
- -fimf-domain-exclusion sets up a bit mask that excludes classes of special-case arguments (for example, NaNs, infinities, or denormals). Because the math functions do not have to check for these conditions, they are allowed to run faster.
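The numeric value passed to -fimf-domain-exclusion is a sum of per-class bits. The C sketch below decodes such a mask into class names. The bit assignments (extremes = 1, NaNs = 2, infinities = 4, denormals = 8, zeros = 16) are our reading of the Intel documentation and are an assumption here; please check them against the documentation for your compiler version. The names in the code are hypothetical.

```c
#include <string.h>

/* ASSUMED bit assignments for -fimf-domain-exclusion=<n>; verify against
 * the documentation for your compiler version. */
enum domain_class {
    DOMAIN_EXTREMES   = 1,
    DOMAIN_NANS       = 2,
    DOMAIN_INFINITIES = 4,
    DOMAIN_DENORMALS  = 8,
    DOMAIN_ZEROS      = 16
};

static const struct { unsigned bit; const char *name; } domain_names[] = {
    { DOMAIN_EXTREMES,   "extremes"   },
    { DOMAIN_NANS,       "nans"       },
    { DOMAIN_INFINITIES, "infinities" },
    { DOMAIN_DENORMALS,  "denormals"  },
    { DOMAIN_ZEROS,      "zeros"      },
};

/* Write the classes excluded by `mask` into buf as a comma-separated list
 * ("none" if no bits are set). buf must hold at least 1 byte; output is
 * truncated, never overflowed, if len is too small. */
static void describe_exclusions(unsigned mask, char *buf, size_t len) {
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof domain_names / sizeof domain_names[0]; i++) {
        if (mask & domain_names[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, ",", len - strlen(buf) - 1);
            strncat(buf, domain_names[i].name, len - strlen(buf) - 1);
        }
    }
    if (buf[0] == '\0')
        strncat(buf, "none", len - 1);
}
```

Under these assumed bit values, the mask 15 used throughout this article decodes to "extremes,nans,infinities,denormals", that is, every class except zeros.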
These options are part of a much larger discussion of numerics that balances precision against performance and reproducibility. For broader control of accuracy, please read the documentation of the -fp-model compiler option. For an in-depth discussion of this topic, please read the white paper "Consistency of Floating-Point Results using the Intel Compiler".
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ Coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.