Tips for finding strange unexpected behavior . . .

Hello all,

First time trying icc, and I find (at 4 AM no less) that my program produces wildly incorrect results when compiled with icc (10.1.008 -xP -fast -static), as compared to working properly with gcc (v3.4.6 -O3 -mfpmath=sse,387 -march=nocona) and MSVC++, running on Conroe/Clovertown series processors. Perhaps the more experienced folks here can suggest what sort of things I should focus on when I start hunting for this bug.

I understand that there are tons of places where a subtle difference could break a program, and I'm certainly not asking for an exhaustive list or anything extremely detailed - just, off the top of your heads, anything you think will be helpful or that has caused similar troubles in the past.

Thanks in advance,
Oren

I am also facing the same issue; that is the only reason I joined this forum :)

You don't even mention having tried the usual things:
static verifier
-O1
-fp:precise
-check

Then, you could build a set of object files with each compiler, and link combinations to isolate which one is buggy.

Tim,

Thanks for the tips. Sorry if it's rude, but I actually posted here before I did /anything/ to try to diagnose the problem -- my intention was to get a bit of guidance (as I am new at this) before I started off. The two minutes that you spent writing your message probably saved me hours.

Thanks again,

Oren

Quick update: adding -fp-stack-check causes a seg-fault ... investigating some cases where overflow/underflow could cause a divide by 0.

If I understand you correctly your program exhibits numerical instability.

The solution is to check whether intermediate results are calculated with adequate precision. The Intel compiler defaults to certain potentially unstable numeric optimizations, which you can rule out by using the /fp:extended or /fp:strict switches.

I am not sure why you would like to enable stack checking. It shouldn't have anything to do with it.

Furthermore, are you linking to some library compiled with another compiler? That may also cause problems sometimes.

That was off the top of my head.

General Update: The program works with -O0 but not with -O1.

I am going to try making various versions with a particular source-file compiled with -O1 and the rest compiled with -O0 (anyone have a good script that will do this automatically?).

Igor,

The program actually does not require any significant amount of numerical precision - it is a Brownian motion simulator, so any rounding error will likely be significantly smaller than the "thermal noise". Nevertheless, I will try /fp:extended and /fp:strict and reply with an update.

There are no external libraries; I use SFMT*, but it is compiled alongside the program (total compile time is ~2 minutes even with -fast, so I haven't really bothered with precompiling it).

Thanks for your input!

Oren

* http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

PS. Stack checking actually helped me fix a very minor bug in my code in which I calculated some things I didn't really need to.

Actually, I realize that my statement above about floating point precision is incorrect. The vast majority of the code is not precision-sensitive at all (the last 6 digits have no significance), but there are some portions that should not be optimized.

Specifically, I do quaternion rotations on unit vectors, which (theoretically) preserve length, so I never check that they are indeed unit length. This is probably stupid, but I tested it with gcc and, while the length of the 'unit' vectors oscillated, they certainly didn't drift. The question is: should I enforce unit length on the vectors every time they are rotated (a cost of about 10 double-precision ops), or should I compile that part of the code with -O0 and the rest with -O3?
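
For reference, the renormalization I have in mind is roughly this (just a sketch; Vec3 is a stand-in for my own 3-vector class):

#include <cmath>

// Hypothetical 3-vector type standing in for my own class.
struct Vec3 { double x, y, z; };

// Re-normalize a supposedly unit-length vector after rotation.
// Roughly 3 multiplies + 2 adds for the dot product, one sqrt,
// one divide, and 3 multiplies to rescale -- about 10 double ops.
inline void renormalize(Vec3 &v)
{
    double inv = 1.0 / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    v.x *= inv;
    v.y *= inv;
    v.z *= inv;
}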

More testing!

Could you show us the particular piece of code which is numerically unstable? Perhaps someone here could help. It would also be nice if we could compile it (i.e., it should have no external dependencies) so we can check the code that the compiler generates.

>>The question is: should I enforce unit length on the vectors every time they are rotated (a cost of about 10 double-precision ops), or should I compile that part of the code with -O0 and the rest with -O3?

If your rotations are all performed in one subroutine, then for diagnostic purposes you could insert conditionally compiled code that normalizes the supposed unit vector. I would imagine that if you were to look at the computational requirements of your code, one of the principal requirements is that a unit vector is indeed a unit vector.

In any event, running a test with the normalization included is relatively easy to do. If that proves to be the source of the problem, then you can look at reworking the code such that the rotation and normalization are coordinated, with terms consolidated (or, if possible, simply postpone normalization until after rotation).
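
Something along these lines (just a sketch -- the macro name and the vector type are placeholders for whatever you actually use):

#include <cmath>
#include <cstdio>

// Hypothetical 3-vector; substitute your own class.
struct Vec3 { double x, y, z; };

// Conditionally compiled diagnostic: after each rotation, report any
// drift in the supposed unit vector and renormalize it.
// Build with -DCHECK_UNIT_LENGTH to enable; it compiles to nothing otherwise.
inline void checkUnitLength(Vec3 &v)
{
#ifdef CHECK_UNIT_LENGTH
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (std::fabs(len - 1.0) > 1.0e-12)      // tolerance is a guess; tune it
        std::fprintf(stderr, "unit vector drifted: length = %.17g\n", len);
    v.x /= len;  v.y /= len;  v.z /= len;
#else
    (void)v;                                 // no-op in production builds
#endif
}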

Jim Dempsey

If you are depending on implicit double evaluation, you should make it explicit with casts or definitions (if you need long double, set the long-double compile option). You shouldn't get much change in numerics from -O0 to -O2, provided you set -fp:precise (and -fp:double, if you want to continue depending on implicit promotion, like K&R). Only recently have you hinted that you might be using float data types and requiring double evaluation.
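
For example (hypothetical variables, just to illustrate the point):

int main()
{
    float a = 1.0e-3f, b = 3.0e+7f;

    // Standard C++ evaluates a * b as a float multiply; whether it actually
    // happens in higher precision depends on the compiler and options
    // (x87 vs. SSE code generation, -fp:double, and so on).
    double implicit_result = a * b;

    // Explicit casts force the multiplication itself to be done in double,
    // independent of those options.
    double explicit_result = (double)a * (double)b;

    return (implicit_result == explicit_result) ? 0 : 1;
}
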
I'm having difficulty following the thread; it seems several unrelated topics have been introduced.

First off, thanks a lot to everyone for the great advice.

Jim, as it is, I never normalize the unit vectors at all - they start off normalized, and every operation theoretically ought to keep them that way (as I said previously, jitter is OK, drift is not). I am currently testing a version that explicitly renormalizes them every time they are modified -- I will tell you how it goes shortly.

Tangentially related (or maybe not at all), my code spends 80% of its time in a particular function that is completely insensitive to the normalization of the vectors, so long as their values have not drifted too far. It would therefore be quite a waste if I had to adopt a stricter floating-point model just for the sake of the rotation part (Quaternion operator* doesn't even show up in my profiler).

Igor, it would be quite difficult to post code because there are a lot of user-defined classes that would need to go with it.

Tim - everything is explicitly double! There are absolutely no variables of type 'float' in my code. I am currently trying various combinations of -O2, -fp:precise, and explicit normalization of unit quaternions. I will report back shortly.

Thanks again,

Oren

edit: Sorry for not keeping this thread very well organized - a lot of different issues have come up!

Update: Setting '-fp-model precise' OR normalizing explicitly fixes the problem (at least in the preliminary test runs).

The normalize function is not terribly expensive, so I'll probably use '-xP -fast' along with normalization to get the maximum speed out of it.

Oren

PS. Tim might yell at me for another digression but what the heck:

Assume the following:
int count;             // somewhere in the range 100-1000
vector<double> stuff;  // already sized so that stuff.size() == count

Which of the following is faster?
stuff.assign(count, 0.0);
OR
for (size_t i = 0; i < stuff.size(); i++)
    stuff[i] = 0.0;

Further update: '-xP -fast' + normalization works properly, and icc is clearly the fastest compiler for this code. It runs a test simulation in 2:30 where gcc takes 5:32!

I will certainly report back to my boss that icc is a cost-effective move for saving us serious time.

Thanks to everyone that helped!

Oren

edit: Dammit! When running a different (longer and more involved) simulation, I get the same sort of unexpected numerics. -fp-model precise fixes the problem, but at a high cost! More to come...

edit2: Does icc have an IsNan(double)?
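
(In the meantime I'm guarding with a plain self-comparison, roughly like the sketch below -- NaN is the only value that compares unequal to itself. C99's isnan from math.h should also work if the compiler provides it. Note that aggressive floating-point options may optimize the x != x test away, so this check should be compiled carefully.)

#include <cstdio>

// NaN is the only floating-point value that is not equal to itself,
// so this avoids relying on any compiler-specific IsNan().
inline bool isNaN(double x)
{
    return x != x;
}

int main()
{
    double zero = 0.0;
    double good = 1.0;
    double bad  = zero / zero;   // deliberately produce a NaN

    std::printf("good: %d  bad: %d\n", (int)isNaN(good), (int)isNaN(bad));
    return 0;
}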

You can override some of the effects of -fp-model precise:
-fp-model precise -no-prec-div -no-prec-sqrt -ftz
and you should be able to use
-xP -O3 -ipo -static
followed by these options.
In my experience, the most likely points for slow-down with -fp-model precise are non-vectorization of sum reductions and math functions. You would see those differences in your screen echo when compiling.
The most common failures that would be avoided by -fp-model precise are violations of parentheses and of left-to-right evaluation order, or (less often) the range limitations that would be avoided by prec-div, prec-sqrt, and ftz.
If you are spending a lot of time processing underflows, as would be implied if ftz affects your execution time, you may have to look at scaling your algorithms to avoid dependence on gradual underflow.
I've lost some arguments about correct evaluation of expressions without disabling safer optimizations, so my only suggestion there is the one you already mentioned: find out where you can't get satisfactory performance with safe optimizations, and enable the risky optimizations only there.

Dammit! When running a different (longer and more involved) simulation, I get the same sort of unexpected numerics. -fp-model precise fixes the problem but at a high cost!

Haven't I said "numerically unstable optimizations" at the very beginning of this thread? I just love it when I am right! :-)
