Intel Compilers released after 13.0.039 Beta Update 1 do not generate low-precision sequences unless low-precision options are explicitly added to the compiler options. This article describes methods for improving application performance through the use of low-precision mathematical functions.
The Intel Compilers can generate low-precision code sequences for certain operations and intrinsics, such as divide and square root. Why would a user consider low precision? Performance: low-precision operations can be faster than their higher-precision equivalents. The Intel compilers provide a robust set of options to control mathematical precision.
Current compilers provide the -fimf family of options. Variations of the base -fimf option are shown below. The term "ulp" stands for "units in the last place"; in binary, an error of 4 ulp affects the last 2 bits of the mantissa. The general syntax is:
-fimf-domain-exclusion=<n1> -fimf-accuracy-bits=<n2> -fimf-precision=low -fimf-max-error=<n3_ulps>
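The relationship between ulps and mantissa bits described above can be sketched in a few lines of C++. This is only an illustrative sketch: the helper names (`ulps_to_bits`, `ulp_of`, `error_in_ulps`) are hypothetical and not part of any Intel library, and the measurement assumes finite, single-precision results compared against a double-precision reference.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Number of trailing mantissa bits an error of `ulps` can disturb.
// An error of 4 ulp corresponds to log2(4) = 2 bits.
inline double ulps_to_bits(double ulps) {
    assert(ulps >= 1.0);  // sketch only handles errors of at least 1 ulp
    return std::log2(ulps);
}

// Size of one float ulp at magnitude |x|: the gap between |x| and the
// next representable float above it (assumes x is finite).
inline double ulp_of(float x) {
    const float ax = std::fabs(x);
    const float next = std::nextafter(ax, std::numeric_limits<float>::infinity());
    return static_cast<double>(next) - static_cast<double>(ax);
}

// Error of a single-precision result, in float ulps, measured against a
// higher-precision reference value.
inline double error_in_ulps(float approx, double reference) {
    const double diff = std::fabs(static_cast<double>(approx) - reference);
    return diff / ulp_of(static_cast<float>(reference));
}
```

A helper like `error_in_ulps` could be used to compare the output of a low-precision build against a double-precision reference before settling on a value for -fimf-max-error.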
Some combinations that make sense:
These options affect code generation for vector as well as scalar code.
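As a rough illustration of the kind of code these options target, the loop below combines a divide and a square root, the two operations named earlier. The function name, the file name, and the `icc` command line in the comment are assumptions made for this sketch; only the -fimf-precision=low option itself is taken from the syntax above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel: out[i] = 1 / sqrt(x[i]^2 + y[i]^2).
// A possible build line (driver and file name are assumptions):
//   icc -O2 -fimf-precision=low kernel.cpp
// The same source serves scalar and vectorized code generation; the
// precision options apply to both.
void inverse_norm(const float* x, const float* y, float* out, std::size_t n) {
    assert(x != nullptr && y != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 1.0f / std::sqrt(x[i] * x[i] + y[i] * y[i]);
    }
}
```

Whether a lower-precision sequence is actually emitted for this loop depends on the compiler version, target, and the other options in effect, so the generated code and the numerical results should be checked rather than assumed.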
For the full list of options and detailed descriptions, please refer to the "Floating-Point Options" section of the Compiler User and Reference Guide (C++ | Fortran). The User and Reference Guide is installed on your system along with the compiler, and available online.
The Intel compilers allow a user to select lower (or higher) precision for mathematical intrinsics. This allows the user to balance the tradeoffs between performance, accuracy, and reproducibility. Options discussed in this section are in the -fimf family of options:
-fimf-precision defines the accuracy (precision) for math library functions
-fimf-accuracy-bits defines the relative error for math library function results, expressed as the number of correct bits
-fimf-domain-exclusion sets up a bit mask to exclude classes of numeric exceptions. Because they no longer have to check for these exception conditions, the math functions are allowed to run faster.
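The bit mask used by -fimf-domain-exclusion can be composed as in the sketch below. The class values (1 for extremes, 2 for NaNs, 4 for infinities, 8 for denormals, 16 for zeros) are stated here as an assumption based on the compiler reference and should be verified against the documentation for your compiler version. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed class values for the exclusion mask; verify against the
// Floating-Point Options section of your compiler's reference guide.
enum DomainClass : unsigned {
    kExtremes   = 1,
    kNaNs       = 2,
    kInfinities = 4,
    kDenormals  = 8,
    kZeros      = 16
};

// Build the command-line option for a mask formed by OR-ing classes.
std::string domain_exclusion_flag(unsigned mask) {
    assert(mask <= 31u);  // only five class bits are assumed to exist
    return "-fimf-domain-exclusion=" + std::to_string(mask);
}
```

Under these assumed values, `domain_exclusion_flag(kExtremes | kNaNs | kInfinities | kDenormals)` produces `-fimf-domain-exclusion=15`. Excluding a class is only safe if the application never passes such values to the affected math functions.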
These options are part of a much larger discussion of numerics, balancing precision against performance and reproducibility. Please read about the -fp-model compiler option for broader exposure to control of accuracy. For an in-depth discussion of this topic, please read the white paper "Consistency of Floating-Point Results using the Intel Compiler".
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ Coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.