About FPU/SSE determinism


rutenis's picture


I have a question about determinism in FPU and SSE computations. Suppose that I have a mixed sequence of both FPU and SSE instructions, and I run it on different types of Intel microprocessors with the same starting conditions (general registers, FPU registers, memory, etc.). Will the final results be bit-for-bit equal on all processors? Are there any rules for predicting and avoiding nondeterministic results in FPU or SSE computations?

Thank you.

Intel Software Network Support's picture



Greetings from Intel Software Network Support.

We'll run this by our Application Engineering team and let you know how they respond.

Regards,


Lexi S.


Intel Software Network Support


http://www.intel.com/software




Message Edited by intel.software.network.support on 11-30-2005 04:06 PM

Intel Software Network Support's picture


Here is the response we received from our engineering team:

Assuming that you execute exactly the same instructions in exactly the same sequence (hand-written or compiler-generated), the results should be the same or very similar, depending on the instruction mix. See below for more details. If you change the instructions from one run to the next (substituting SSE for x87 or vice versa), some small differences will result. Their size depends on the accuracy settings of the floating-point unit (precision control, rounding mode, flush-to-zero, etc.) and on how many temporary values are held in the x87 floating-point registers (with 80-bit accuracy) rather than in the XMM registers (at most 64-bit accuracy). It is possible to force the floating-point unit to use standard accuracy (32-bit or 64-bit) instead of the default 80-bit accuracy.
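To illustrate why the width of temporary values matters, here is a minimal sketch in Python. It is only an analogy: it uses binary32 versus binary64 as a stand-in for the 64-bit versus 80-bit case described above, and the helper names (round_to_f32, narrow_intermediates, wide_intermediates) are made up for this example.

```python
import struct

def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def narrow_intermediates(a: float, b: float) -> float:
    """Compute (a + b) - a, rounding every intermediate to binary32,
    as if each temporary were stored in the narrow format."""
    a32, b32 = round_to_f32(a), round_to_f32(b)
    t = round_to_f32(a32 + b32)      # temporary rounded to the narrow format
    return round_to_f32(t - a32)

def wide_intermediates(a: float, b: float) -> float:
    """Compute (a + b) - a, keeping the temporary in binary64 and
    rounding only the final result to binary32."""
    a32, b32 = round_to_f32(a), round_to_f32(b)
    return round_to_f32((a32 + b32) - a32)

if __name__ == "__main__":
    # Same inputs, same operations, different temporary width.
    print(narrow_intermediates(1e8, 1.0))
    print(wide_intermediates(1e8, 1.0))
```

With these inputs the narrow version loses the added 1.0 (binary32 values near 1e8 are spaced 8 apart), while the wide version recovers it. The same kind of discrepancy, at a much smaller scale, can appear between x87 code that keeps temporaries in 80-bit registers and SSE code that does not.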



All *instructions* that are IEEE operations (*, +, -, /, sqrt, and compares), whether SSE or x87, will produce the same results across platforms given the same inputs and the same control settings (precision control, rounding mode, flush-to-zero, etc.). This is true for both 32-bit and 64-bit processors.
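One way to check a bit-for-bit claim like this across machines is to log the raw bit patterns of the results rather than their decimal printouts. The following is a rough sketch, assuming the results are available as 64-bit doubles; the function names are hypothetical.

```python
import struct

def bits64(x: float) -> int:
    """Return the IEEE binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def same_bits(a: float, b: float) -> bool:
    """True only if a and b are bit-for-bit identical.
    Unlike ==, this distinguishes 0.0 from -0.0."""
    return bits64(a) == bits64(b)

def fingerprint(values):
    """Render each value as 16 hex digits, suitable for diffing
    between runs on different processors."""
    return [format(bits64(v), "016x") for v in values]

if __name__ == "__main__":
    results = [1.0, 0.1, 2.0 ** 0.5]
    for line in fingerprint(results):
        print(line)
```

Writing the fingerprint to a file on each machine and diffing the files shows exactly which results, if any, differ in their last bits.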



On the x87 side, transcendental instructions such as fsin and fcos could produce slightly different answers across implementations. They are specified with a guaranteed relative error, but not bit-for-bit accuracy. We wanted wiggle room for improvement. [For example, on the Itanium processor family's x87 functions we do a much more accurate argument reduction because we had the fused multiply-add available.]
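Where only an error bound is guaranteed, a test that compares results from two implementations may want a tolerance in units in the last place (ULPs) instead of exact equality. Below is a minimal sketch of one common way to measure that distance for doubles; the function names and the tolerance in the usage example are illustrative assumptions, not values taken from any Intel specification.

```python
import struct

def _ordered(x: float) -> int:
    """Map a double to an integer so that adjacent doubles map to
    adjacent integers and both zeros map to 0."""
    (u,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set; mirror their magnitude.
    return u if u >= 0 else -(u & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if equal)."""
    if a != a or b != b:
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered(a) - _ordered(b))

def close_in_ulps(a: float, b: float, max_ulps: int) -> bool:
    """True if a and b differ by at most max_ulps representable values."""
    return ulp_distance(a, b) <= max_ulps

if __name__ == "__main__":
    import math
    # Hypothetical: compare a library sine against a reference value
    # recorded on another machine, allowing a small ULP tolerance.
    reference = math.sin(1.0)
    print(close_in_ulps(math.sin(1.0), reference, max_ulps=2))
```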



On the SSE side, the reciprocal instructions are also specified with a relative error guarantee, and different implementations could produce different bit patterns for their results. But they will all have the minimum guaranteed accuracy.



Finally, the compiler can also change some of the results, depending on the accuracy mode specified in the compiler flags. For example, we found recently that icc for IA-32 and Intel EM64T, with aggressive optimizations selected, was not using the IEEE sqrt instruction from SSE or x87, but was substituting a faster, less accurate approximation.





Regards,




Lexi S.


Intel Software Network Support


http://www.intel.com/software





Message Edited by intel.software.network.support on 11-30-2005 04:07 PM

Tim Prince's picture

That last bit about the square root approximations would worry me if I read it out of context. When the compiler employs a reciprocal square root approximation, it follows up with an iteration to bring the result to within a bit or two of full accuracy. For more than a year, an Intel compiler option has been available to require use of the IEEE-accurate square root: -Qprec (Windows) or -mp1 (Linux).
This option also prevents the compiler from performing some optimizations on repeated division that could produce results in disagreement with IEEE standard P754 and its successors, and where there could also be a small difference between SSE and x87.
There are additional compiler flags that can be used to minimize numerical differences between SSE and x87 code. I have no idea whether that was part of the original question.
Generally speaking, it seems undesirable to mix x87 and SSE code, and current compilers avoid doing so, except under the Banias/Dothan option -QxB.
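To make the "approximation followed by an iteration" point concrete, here is a rough sketch of a Newton-Raphson refinement step for the reciprocal square root. It does not reproduce what the Intel compiler emits: the hardware approximation is simulated by truncating an accurate result to 11 fraction bits (an assumption chosen only for illustration), and all function names are hypothetical.

```python
import math
import struct

def approx_rsqrt(x: float) -> float:
    """Simulated low-precision 1/sqrt(x): compute an accurate value, then
    clear the low 41 of the 52 fraction bits (relative error < 2**-11)."""
    y = 1.0 / math.sqrt(x)
    (bits,) = struct.unpack("<Q", struct.pack("<d", y))
    bits &= ~((1 << 41) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def refine(x: float, y: float) -> float:
    """One Newton-Raphson step for f(y) = 1/y**2 - x.
    A relative error e in y becomes roughly 1.5 * e**2."""
    return y * (1.5 - 0.5 * x * y * y)

def rel_error(y: float, x: float) -> float:
    """Relative error of y as an estimate of 1/sqrt(x)."""
    return abs(y * math.sqrt(x) - 1.0)

if __name__ == "__main__":
    x = 2.0
    y0 = approx_rsqrt(x)
    y1 = refine(x, y0)
    y2 = refine(x, y1)
    for name, y in (("y0", y0), ("y1", y1), ("y2", y2)):
        print(name, rel_error(y, x))
```

Each step roughly squares the relative error, which is why a single iteration after a 12-bit approximation is enough for single precision to land within a bit or two, though not necessarily on the correctly rounded IEEE result.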

rutenis's picture


Thank you very much! It's very useful advice and we'll follow it.
