floating point on IA32 versus MIPS

Hello,

I have recently switched from running programs on an SGI Origin system with a MIPS R10000 processor and the MIPSpro compiler to running on an Intel IA32 Xeon system using the Intel Fortran compiler. While the results of my simulations are similar, some divergence does occur. I suspect that this divergence is due to the details of the floating point operations. I have tried syncing the two by forcing them to meet the IEEE 754 standard (e.g. with the -mp and -mp1 options), but I have had limited success. I now suspect that the divergence might be due to differences between the architectures that the compiler cannot easily compensate for (e.g. details of the registers). I was wondering if anyone out there has had some experience with this and might be able to point me in the right direction.

best regards,
Charlie Stock

About the closest you can come is to use options such as
-xW -mp1
and perhaps put
!dir$ novector
directives on sum reductions.
-mp might increase the differences, if it causes more use of extra precision x87 code.
It has been nearly 4 years since I've used the MIPSpro compiler, so I don't know the exact option, but it also had an option similar in nature to -mp1.

Thank you for your advice, tcprince, and sorry for the delayed reply.

I have actually found that the greatest improvement in agreement came from specifying -pc64 with the Intel compiler while running the MIPSpro compiler with settings designed to meet or beat IEEE 754. While the MIPSpro compiler does have options which set IEEE 754 as a minimum standard (e.g. -OPT:IEEE_arithmetic=1, -OPT:roundoff=0), it does not have an option which forces the operations to meet IEEE 754 precisely (as -mp or -mp1 do with the Intel compiler). I suspect that the differences between simulations on the two machines are due to precision above IEEE 754 which I cannot remove using the MIPSpro compiler.

I suspect that the closer agreement with -pc64 stems from the fact that the R10000 chips have 64-bit floating point registers. The floating point registers on the Xeon chip I am using are 80-bit, and I believe that -pc64 is knocking them down to 64. Any further thoughts would be appreciated (especially if you think I'm barking up the wrong tree here), and thank you again for your help.

For x87 code, -pc64 rounds intermediate results to double precision when they must be spilled from registers to memory. If you wished to go all the way, you could set the x87 control word to 53-bit precision. CVF had a library function and masks defined for this purpose, but it usually has to be done with in-line asm in a C function. On Windows, the default is 53-bit precision, so this move is sometimes required to obtain identical results on Windows and Linux. There is an example on the SuSE site, in Andreas Jaeger's discussions of how to run SPECfp.
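
To illustrate why the width of the significand in intermediate results matters, here is a minimal Python sketch. It only models the precision difference (53-bit double versus the 64-bit significand of x87 extended) using exact fractions; it ignores exponent range and does not touch the real x87 control word. The helper names round_to_bits and add_chain are made up for this illustration.

```python
from fractions import Fraction


def round_to_bits(x, bits):
    """Round x to `bits` significant binary digits (round-half-to-even).

    Hypothetical helper: models significand width only, not exponent range.
    """
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    mag = abs(x)
    # Find exp such that 2**exp <= mag < 2**(exp + 1).
    exp = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** exp > mag:
        exp -= 1
    # Weight of the last bit that is kept.
    scale = Fraction(2) ** (exp - bits + 1)
    # round() on a Fraction rounds half to even, like the IEEE default mode.
    return sign * round(mag / scale) * scale


def add_chain(terms, bits):
    """Sum left to right, rounding every intermediate result to `bits` bits."""
    total = Fraction(0)
    for t in terms:
        total = round_to_bits(total + t, bits)
    return total


# 1 plus two terms that are each half a unit in the last place of a double.
terms = [Fraction(1), Fraction(1, 2**53), Fraction(1, 2**53)]

# Every intermediate rounded to 53 bits (double precision throughout).
double_only = add_chain(terms, 53)

# Intermediates kept at 64 bits, rounded to 53 bits once at the end.
extended_then_stored = round_to_bits(add_chain(terms, 64), 53)

print(float(double_only), float(extended_then_stored))
```

With these inputs, rounding every step to 53 bits gives 1.0, while carrying 64 bits and rounding once at the end gives 1.0000000000000002, a difference of one unit in the last place. Small discrepancies of this kind can then grow over a long simulation.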

x87 extended precision usually is more accurate, but I can understand your desire to find out whether the differences can be eliminated. This is why I suggested -xW, which eliminates extra precision, except where the compiler chooses x87 code. In my experience, -mp produces more x87 code than -mp1, so it may not accomplish your goal.

Both the Intel and the MIPSpro compilers optimize sum reduction by batching, and they will choose the batches differently. The Intel compiler reports vectorization when it does this. Batching the sums usually, but not always, improves accuracy slightly. x87 extended precision almost certainly improves accuracy of sum reduction, sometimes by 3 digits.
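
As a rough illustration of how the choice of batches alone can change a sum, here is a small Python sketch. It is not what either compiler actually generates; sum_sequential and sum_batched are hypothetical names, the number of lanes is an assumption, and the input is deliberately contrived so that the effect is visible with four numbers.

```python
import math


def sum_sequential(values):
    """Add left to right, as a plain non-vectorized loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_batched(values, lanes=2):
    """Keep `lanes` interleaved partial sums and combine them at the end.

    Hypothetical model of a vectorized sum reduction; real compilers pick
    their own batch layout.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Contrived input: the large terms cancel, the small terms are easily lost.
values = [1e17, 1.0, -1e17, 1.0]

print(sum_sequential(values))        # loses one of the small terms
print(sum_batched(values, lanes=2))  # large terms cancel inside one lane
print(math.fsum(values))             # correctly rounded reference sum
```

Here the sequential loop returns 1.0, while the two-lane version and the reference both return 2.0. With other data the batched order can just as well be the less accurate one, which is why two compilers that batch differently need not agree in the last digits.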
