Optimization settings changing results

I probably should have noticed this long ago, but I've always used the same optimization setting (maximize speed). Just now I tried turning off vectorization (/Qvec-) and saw that my program generates different results. I then ran through a few different optimization settings, focusing on a single number generated by the program. These are the values I got:

          /Od     /O1     /O2     /O3
/Qvec-    2365    727     727     727
/Qvec     -       -       2258    2258

Without vectorization, any optimization produces a change over no optimization. With vectorization, the results for /O2 and /O3 are the same but different from those with no vectorization.

I'm using 11.0.072. Have I just discovered the reason why this version is no longer available, and why Wendy Doerner was helping me obtain 11.0.075 (which I still haven't managed to obtain)? If so, my interest in getting this fixed version has just jumped by an order of magnitude.

You left out one important number: the correct one. Do you know what that is, and is any one of the three results that you listed equal to it? If so, which one?

Results that differ with level of optimization (or choice of compiler) may be caused by errors in the program, rather than bugs in the compiler.

Bugs in the optimizer are known to exist, and there have been a few reported over the past few months, even with releases more recent than 11.0.075. In fact, there are one or two optimizer bugs in the current versions of the compiler, waiting to be analyzed and removed. Often, for a not-yet-fixed optimizer bug, there is a known work-around; it may be worth your while to try such a work-around.

If you can post a pared-down version of your program, it may be possible to narrow down the causes of the error.

The "correct" number in this case is a bit ill-defined, for reasons I'll explain. My purpose in attaching numbers to the different optimization cases was really to show that some runs gave identical results, and that there is variation depending on the optimization flags. The program is a rather complex Monte Carlo simulator, in which a large number of interactions are simulated over many time steps - this makes it difficult to provide a pared-down version. In the course of a simulation there are many millions of decision branches, based on pseudo random numbers, and when a different path is taken at any one of these branches the simulation results are likely to be different in any detail (e.g. the number I reported). The interesting question is what can make the program execution take a different path. There are several points in the code where a vector of probabilities is computed, say p(:), and then a choice is made among the N possibilities corresponding to these probabilities, on the basis of a random variate R. The decision is made in code like this:

real :: R, psum, p(N)
integer :: i
R = uniform_rng()
psum = 0
do i = 1,N
   psum = psum + p(i)
   if (R <= psum) exit
end do
if (i > N) i = N

I am thinking that the comparison (R <= psum) could have different results when R is very close to psum, depending on some memory or register states that influence the floating point bits that are "down in the noise". These bits might be different when different optimization settings are in place. I should say that because this is a Monte Carlo program, the injection of an additional (extremely rare) source of randomness is not of concern. In any case I have to ensure that the results of interest are reliable estimates, unaffected by the random number seed, or else perform multiple runs with different seeds then extract statistics. In other words, if what I've just suggested is a feasible explanation for the differences I'm seeing, then my concern about a possible optimizer bug is probably unfounded.
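As a minimal sketch of the kind of effect I mean (illustrative only, not the actual simulation code): accumulating the same values in single and double precision leaves different low-order bits, so a comparison like (R <= psum) can flip when R lands near the boundary.

```fortran
! Illustrative sketch only: the same accumulation carried out in single
! and double precision disagrees in the bits "down in the noise", so a
! branch on (R <= psum) near the boundary can go either way.
program noise_bits
   implicit none
   real(4) :: psum4
   real(8) :: psum8
   integer :: i
   psum4 = 0.0_4
   psum8 = 0.0_8
   do i = 1, 10
      psum4 = psum4 + 0.1_4
      psum8 = psum8 + 0.1_8
   end do
   ! Typically neither sum is exactly 1, and the two precisions differ
   ! in the low-order bits.
   print *, psum4, psum8
end program noise_bits
```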

Currently I am using 4-byte reals, and I've just thought that it might be interesting to experiment with double precision in this part of the code, to see if it reduces the probability of these variations. It seems to me that it should.

If your various runs were all started with the same RNG seed, and the result being compared is that for a relatively stable statistical average, rather than an instantaneous state variable, I'd expect the results to be unaffected by the optimizations in effect.

There is one other source of differences: the kind of FPU instructions chosen may differ with different options. In particular, if results from x87 calculations are compared with those from, say, SSE2, modest differences can be expected.

It is also worthwhile to compare to results from other compilers, e.g., Gfortran.

The number that I reported has no statistical significance, since it is generated by a very small number of occurrences. I was using it simply as an indicator of whether or not the code execution was identical. I'm now using a more direct indicator - at the end of the run I generate a random number and display it. If two runs display the same RN then it's clear that they took the same execution path. I see that /Od and /O1 are the same, and different from /O2 and /O3, which are the same.

The modest differences between x87 and SSE2, say, would certainly make it possible that (R <= psum) would occasionally generate different results. I was surprised to find that the test after the loop { if (i > N) i = N } was needed, even though the probabilities p(:) sum to 1. My code makes a pretty thorough exploration of the space of the RNG.

I have been doing some testing with gfortran, but haven't done a careful comparison of the results yet. (I got distracted yesterday into comparing the speed of the two compilers. To my surprise I found that on this code gfortran is just as fast as ifort.) I'll report back when I have some results.

Comparing ifort and gfortran for a short test run, looking only at the number of times the RNG is called, or, to be more accurate, at the value of the last RN returned, gives interesting results. Here are the actual RN values (of course the values themselves have no significance, just the occurrence of the same value):

Flag        ifort     gfortran
/Od (-O0)   .430984   .143728
/O1         .430984   .512643
/O2         .512643   .512643
/O3         .512643   .512643

This is very reassuring, since it shows that any optimization in gfortran yields exactly the same path as O2 and O3 in ifort. (BTW these runs were made using real(8) for that code fragment I posted, but it made no difference.) I'm still curious about differences that do exist. They may well arise in the computation of the probabilities p(:), and your suggestion about x87 vs. SSE2 etc. could have the answer.

For a given compiler, optimization setting and RNG seed the results are always the same, of course.

I now see that differences between ifort and gfortran are manifested when the length of the simulation is increased. Nevertheless, the test results above do indicate that the compilers are generating results that differ only extremely rarely. I will now carry out tests looking at the model output that I expect to be statistically meaningful.

I don't know enough about your problem to make any strong conclusions, but I find the remaining discrepancies disquieting enough that, in your place, I'd feel an urge to find explanations.

I feel that urge too. I'm tracking down the source(s) of divergence between the ifort- and gfortran-built versions of the program.

The question as to which, if any, is the "right" result is very much to the point.

Floating-point calculations are approximate; results can vary very slightly, within the expected uncertainty, due to differences in optimization level, compiler version, processor type, and other causes. In very complex calculations, these very tiny variations can be greatly amplified to produce large variations in the final result. See the article at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/.
For the Intel compiler, if you want to compare results between different optimization levels, you should build with /fp:precise. This doesn't necessarily give you the "right" (in the sense of exact) result, if you are experiencing the effects of accumulated rounding errors as above, but it disables certain optimizations that may contribute to these variations, and so makes the result more reproducible. I expect gfortran has a similar option. These options wouldn't necessarily make the different compilers agree, though.
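For example (the command lines below are illustrative only; mysim.f90 is a placeholder for your source):

```shell
rem Illustrative only -- rebuild with value-safe floating-point semantics
ifort /fp:precise /O2 mysim.f90

rem gfortran is value-safe by default, as long as -ffast-math is NOT used
gfortran -O2 mysim.f90 -o mysim
```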

And if you are suffering from such rounding errors, going to double precision may help a lot, and also with consistency between Intel and other compilers.

Yes, program errors (such as uninitialized variables) can sometimes cause variations in results, also.
The observation that the result varies with optimization level for both Intel Fortran and gfortran makes a compiler bug seem very unlikely. Hard to believe that both would have the same bug, and less likely still that one of them would be without optimization.

For those risky optimizations which ifort and gfortran share, gfortran invokes them only with -ffast-math. As Martyn pointed out, ifort /fp:source (or /fp:precise) disables such options.
If you are using 32-bit gfortran with (default) x87 code, the nearest equivalent option is ifort /arch:ia32. The primary relevant effect of these is the promotion of scalar single precision expression evaluation to double. Vectorization (which is disabled in ifort /arch:ia32 and 32-bit gfortran without a -march= setting) removes the implicit promotion to double precision, in both ifort and gfortran.
If your application depends on the implicit promotion of single to double precision, it should be written so that the promotion occurs regardless of compiler flags. This includes the use of constants which require declaration as double precision.
The 64-bit versions of all compilers default to SSE/SSE2, with no implicit promotion to double, a fact which should help reinforce the argument for making required promotions explicit.
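As a sketch of what making the promotion explicit looks like (illustrative only; the variable names are made up):

```fortran
! Illustrative only: accumulate single-precision values in double
! precision explicitly, so the widening no longer depends on whether
! x87, SSE2, or vectorized code is generated.
program explicit_promotion
   implicit none
   integer, parameter :: n = 5
   real(4) :: p(n)
   real(8) :: psum
   integer :: i
   p = 0.2_4                              ! single-precision probabilities
   psum = 0.0_8
   do i = 1, n
      psum = psum + real(p(i), kind=8)    ! explicit, flag-independent widening
   end do
   print *, psum
end program explicit_promotion
```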
Both ifort and gfortran have options to help in discovery of uninitialized variables. ifort /Qdiag-enable:sc is the most aggressive of these, but also produces false indications.

Thanks Martyn and Tim for very helpful comments.

I spent quite some time on tracking down the point of divergence between simulations with ifort- and gfortran-compiled code, for a simple test case. I was not surprised to discover where the difference occurred. As I mentioned, this is a Monte Carlo simulation (actually simulating, among other things, the random-walk motion on a 3-D lattice of a large number - like 10^5 - of agents, which also interact), and there are many millions of decision points where a discrete selection is made on the basis of a set of probabilities and a generated random value. The point is that one different decision will affect the whole subsequent course of the simulation, quantitatively but not qualitatively. I located the divergence at a point where a test was being made (R < p), and in gfortran, say, the random value R exceeded p by a very small amount, like 10^-7, while in the ifort case it fell on the other side of p.

Both programs are using the same RNG, so are identical at the 32-bit integer level, but the integer is translated into a double precision real, and in my subroutine using R I was treating R and p as real (i.e. single precision). The simple expedient of making R and all probabilities real(8) in every instance made the two programs generate identical results for that test case. The same was true at different optimization levels.

Even with double precision there remained the possibility of slightly different R values being generated occasionally by the two programs (without taking further measures to ensure conformity, along the lines you have suggested), and indeed if I run a simulation for long enough I do eventually see a divergence. I'm no longer concerned about any problems with the optimizer on my code. I may complete this process by taking Tim's advice and trying to fix on compiler settings that guarantee precise agreement between gfortran and ifort at all times, as an aid to checking and debugging on multiple platforms.

Glad that worked out. Be aware, though, that there aren't any settings that guarantee identical results between ifort and gfortran. Even without any optimization, different instructions may sometimes be used, and math libraries (for which there is no official standard) may return different results for functions such as log() etc.

I understand. There may be a way to ensure conformity of a stripped-down version of the model, simulating motility only. The probabilities are computed by very simple arithmetic, and it may be possible to make them identical between the two compilers. I'm also wondering about the idea of working with integers in that area of the code, since the RNG does create identical integer values. I'd need a way to compute the probabilities that allows them to be mapped into integer values that are compiler-independent. It seems to me that it should be possible, but I haven't given it any serious thought.
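Something along these lines might work (a rough, untested sketch; the grid size and names are just illustrative):

```fortran
! Rough sketch, not tested against the real model: derive integer
! thresholds from the probabilities once, then branch on the RNG's
! integer directly, so the decision is identical wherever the integer
! stream is identical.
program integer_select
   implicit none
   integer, parameter :: n = 3
   integer, parameter :: igrid = 2**30     ! resolution of the integer grid
   real(8) :: p(n)
   integer :: thresh(n), r, i, cum
   p = (/ 0.25_8, 0.50_8, 0.25_8 /)
   cum = 0
   do i = 1, n
      cum = cum + nint(p(i) * real(igrid, 8))
      thresh(i) = cum
   end do
   thresh(n) = igrid          ! force the last bin to close the range
   r = 700000000              ! stand-in for the raw RNG integer, mapped to [0, igrid]
   do i = 1, n
      if (r <= thresh(i)) exit
   end do
   if (i > n) i = n
   print *, 'selected bin', i
end program integer_select
```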
