Overview: Intel® IEEE 754-2008 Binary Floating-Point Conformance Library
The IEEE 754-2008 standard defines four types of operations:
- General-computational operations that produce correctly rounded floating-point or integer results. These operations might signal floating-point exceptions.
- Quiet-computational operations that produce floating-point results. These operations do not signal any floating-point exceptions.
- Signaling-computational operations that produce no floating-point results. These operations might signal floating-point exceptions.
- Non-computational operations that produce no floating-point results. These operations do not signal floating-point exceptions.
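Summary by result and exception behavior:
- Produce a result, might signal FP exception: general-computational
- Produce a result, do not signal FP exception: quiet-computational
- Produce no result, might signal FP exception: signaling-computational
- Produce no result, do not signal FP exception: non-computational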
General-computational operations fall into two groups:
- Homogeneous general-computational operations whose floating-point operands and floating-point result are in the same format.
- formatOf general-computational operations whose floating-point operands and floating-point result have different formats.

The IEEE 754-2008 standard requires that all formatOf general-computational operations be computed without any loss of precision before converting to the destination format. This may differ from how these operations are implemented in most hardware and software.

For example, when all operands are in binary64 format and the destination format is binary32, most hardware and software implementations first compute an intermediate result rounded to binary64 and then convert that intermediate result to binary32. Under certain rounding modes, this double rounding procedure can produce a result different from the one defined in the standard. For example:

x = 0x3ff0000010000000 = 1.000000000000000000000001_2 (that is, 1 + 2^(-24))
y = 0x3ca0000000000000 = 1.0_2*2^(-53)
x+y = 1.00000000000000000000000100000000000000000000000000001_2

When the rounding-direction attribute is set to roundTiesToEven, the double rounding procedure first rounds the sum to 1.000000000000000000000001_2 (0x3ff0000010000000) in binary64, which then rounds to 1 (0x3f800000) in binary32. Both steps are exact ties that resolve to the even neighbor. According to the standard, however, the exact sum lies above the midpoint between 1 and the next binary32 value, so the result should round to 1.00000000000000000000001_2 (0x3f800001) in binary32.
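The double rounding effect described above can be reproduced in plain C. This is a minimal sketch under stated assumptions, not the library's implementation: all function names are hypothetical, the rounding mode is assumed to be the default roundTiesToEven, and inputs are assumed finite with no overflow. The single-rounding variant swaps in a known technique: it recovers the rounding error of the binary64 sum with Knuth's TwoSum, rounds the binary64 sum to odd, and only then converts to binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-pattern helpers (hypothetical names). */
double double_from_bits(uint64_t bits) {
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

uint64_t bits_from_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

uint32_t bits_from_float(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Double rounding: round the sum to binary64, then to binary32. */
float add_double_rounded(double x, double y) {
    volatile double sum = x + y;    /* first rounding, to binary64 */
    return (float)sum;              /* second rounding, to binary32 */
}

/* Emulated single rounding: round the binary64 sum to odd, then
 * convert. binary64 has more than two extra significand bits over
 * binary32, so the final conversion matches a direct rounding of
 * the exact sum. */
float add_single_rounded(double x, double y) {
    volatile double s = x + y;      /* nearest binary64 sum */
    /* TwoSum: x + y == s + err exactly */
    volatile double yy = s - x;
    volatile double xx = s - yy;
    double err = (x - xx) + (y - yy);
    uint64_t bits = bits_from_double(s);
    if (err != 0.0 && (bits & 1u) == 0) {
        /* s is inexact with an even significand: step one unit in
         * the last place toward the exact sum, which makes it odd. */
        if ((err > 0.0) == (s > 0.0)) {
            bits += 1;              /* exact sum has larger magnitude */
        } else {
            bits -= 1;              /* exact sum has smaller magnitude */
        }
    }
    return (float)double_from_bits(bits);
}
```

For x = 0x3ff0000010000000 and y = 0x3ca0000000000000, `add_double_rounded` yields 0x3f800000 and `add_single_rounded` yields 0x3f800001, matching the two results discussed above.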
Operand and result formats:
- IEEE 754-2008 binary32 interchange format
- IEEE 754-2008 binary64 interchange format
- Integer operand formats: int, unsigned int, long long int, unsigned long long int
- Signed 32-bit integer
- Unsigned 32-bit integer
- Signed 64-bit integer: long long int
- Unsigned 64-bit integer: unsigned long long int
- Boolean value represented by generic integer type
- Enumerated values of floating-point class
- Enumerated values of floating-point radix
- Type for the destination of the logB operation and the scale exponent operand of the
- Decimal character sequence
- Hexadecimal-significand character sequence
- Set of exceptions as a set of booleans
- Set of status flags
- Rounding direction for binary
- No explicit operand or result