P-4 precision problem with version 8 Intel compilers on LINUX

I have run into precision problems with the new INTEL compilers for
LINUX, both for FORTRAN (ifort) and C (icc). I am running Redhat 8
on a Dell Pentium IV, and am using the INTEL compilers version 8.

It seems that using the processor-optimized compilation flags
(and thus activating the "vectorizer") affects the outcome
of computations. I have looked around for documentation on this
behavior and couldn't find anything. Any help, or suggestions on where
to look or how to solve this problem, would be appreciated.

Here's an example program (analogous issues arise with C code):

      program main
      double precision a
      integer i,j
      a = 1.d0
      do i=0,100000
         do j=0,100000
            a = a * 1.00000001d0
         end do
      end do
      print *,a
      end program main

(Don't ask why one would run such code.) Here's what happens
with target architecture flags set:

> ifort -tpp7 -xW -O3 tmp.f ; time a.out
tmp.f(6) : (col. 6) remark: LOOP WAS VECTORIZED.
1.845u 0.007s 0:01.85 99.4% 0+0k 0+0io 171pf+0w

Here's the output without the P-IV flags:

> ifort -O3 tmp.f ; time a.out
30.611u 0.060s 0:30.87 99.3% 0+0k 0+0io 172pf+0w

That is, without the flags the test program took much longer to run, and
the two builds produce different answers. The second output appears to be
the 'correct' one, judging by comparison with a gcc-compiled build.

I have experimented with some of the flags, such as -mp for maintaining
floating-point precision, with the same outcome. Apparently only removing the
-tpp7 and -xW flags makes the results consistent. Obviously, it would be nice
to get the speed improvement without losing precision.

Thanks in advance for your help.



Ah, this is fun. When you don't vectorize, the computations are done in the standard x87 registers which are extended precision and range. This means that additional low order bits are carried around in the computation of "a" and this affects the result.

When you compile with -xW, the vectorization changes to use the SSE2 instructions and registers, which are NOT extended precision and range. Thus the computations are rounded to standard double precision and the extra low fraction bits are not carried around.

gcc does not vectorize, so it will use the x87 method. Also, if you ran this on a non-x86 processor (such as a Sun SPARC), you'd see the same result as the Intel compiler gets with SSE2.

The -xW results are more consistent (and faster), but you do lose the extra intermediate precision that can be visible in strange tests such as this one.

Retired 12/31/2016

It was pointed out to me that there may be another problem at work here, unrelated to precision. I'll play with this some more if I get the time.


There seems to be an actual bug at work here. I see Tim Prince responded to your post in comp.lang.fortran about it, and I think he identified the problem, in that the compiler is combining the loops and creating one monster loop whose iteration count exceeds a 32-bit integer. This is an unsafe optimization.

