Trouble with long equation

Hi,
I have some strange behaviour with a "normal" equation.
Consider this (hard to read, I know) line:

C(7)=(E(8)-(2*E(10)+2*E(46)+E(42)))*PI/E(5)-E(42)

All single values are of type REAL, all of them have neither NULL nor 0 value.
When running through this line, the result I get is frustrating "NaN". I have several more such examples, same behaviour. Now when I split this long equation into

HILF1C7=2*E(10)
HILF2C7=2*E(46)
HILF3C7=HILF1C7+HILF2C7+E(42)
HILF4C7=E(8)-HILF3C7
HILF5C7= HILF4C7*PI
HILF6C7=E(5)-E(42)
C(7)=HILF5C7/HILF6C7

guess what. No more NaN but a usual (desired) value I can continue calculating with.
I have dozens of such equations, so I cannot believe I should need to split all of them into tiny pieces.

This behaviour occurs under Intel Visual Fortran Compiler 8.1 within Visual Studio 2003.
Does anyone here have an idea what to do to avoid a time-spending effort to split these equations?

Thanks again in advance.
Harald


You need to restock your supply of parentheses. Try this:

C(7)=(E(8)-(2*E(10)+2*E(46)+E(42)))*PI/(E(5)-E(42))

Steve - Intel Developer Support

Hi Steve,
thanks for your quick answer. But as I see, your suggestion gives a different mathematical result

C(7)=(E(8)-(2*E(10)+2*E(46)+E(42)))*PI/(E(5)-E(42))

is different from

C(7)=(E(8)-(2*E(10)+2*E(46)+E(42)))*PI/E(5)-E(42)

To shorten them, yours is

C(7)=X*PI/(E(5)-E(42))

while the original is

C(7)=X*PI/E(5)-E(42) which is like C(7)=[X*PI/E(5)]-E(42)

Or even shorter, I have A/B-C which you turn into A/(B-C).

Yes, it is different, but it matches (or at least comes closer to) your split-out assignments, where you do the subtraction first. I don't know what you really want here, but I suggest you fully parenthesize the expression the way you want. If you still have problems, please submit a test case.

Steve - Intel Developer Support

I think that was Steve's point - your single line code was (A/B)-C but your split down one amounted to A/(B-C)
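
To make the difference concrete, here is a minimal sketch in free-form Fortran. The values of a, b and c are hypothetical, chosen only so the two readings give visibly different results; they are not the values from the original program.

```fortran
program precedence_demo
  implicit none
  real :: a, b, c

  ! Hypothetical values, picked only to make the difference visible.
  a = 12.0
  b = 4.0
  c = 1.0

  ! Division binds tighter than subtraction, so a/b-c means (a/b)-c.
  print *, 'a/b-c   =', a/b - c
  print *, '(a/b)-c =', (a/b) - c

  ! This is what the split-out HILF assignments actually compute.
  print *, 'a/(b-c) =', a/(b - c)
end program precedence_demo
```

The first two lines print the same value (2), the third prints 4, so the one-line form and the split form are two different formulas.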

Are you sure your numerator will fit into a real?

We really need to see the value of the numbers.
You have to make sure that you are not setting up a condition
where NaN is the expected result. For example if you are doing
(A/B)-C, you don't want both A and B to be 0.
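
A minimal sketch of that case, assuming IEEE arithmetic with floating-point exceptions not trapped (the usual default). The variable names are illustrative only.

```fortran
program zero_over_zero
  implicit none
  real :: a, b, c, r

  ! Both numerator and denominator are zero at run time.
  a = 0.0
  b = 0.0
  c = 1.0

  ! 0/0 is an invalid operation under IEEE rules and yields NaN;
  ! subtracting c afterwards cannot turn it back into a number.
  r = (a/b) - c
  print *, 'r =', r
end program zero_over_zero
```

With typical settings this prints NaN for r. If the compiler is told to trap invalid operations, the program stops at the division instead.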

Well, this all looks very strange here. When I tried a smaller test program with the values we have trouble with, everything is fine, no NaN in sight:

PROGRAM NaNTest
REAL T2,E,K1
DIMENSION E(10)

T2=1.483530
E(6)=0.4000000
E(10)=0.1000000
K1=T2/E(6)/(T2/E(6)-(E(6)/E(10)/(5+E(6)/E(10))))
PRINT *,K1
STOP
END

But in our code this comes to a NaN problem. We wonder why. The program goes through some do-loop, and when it comes to the 3rd iteration, the subroutine with this equation produces a NaN while in the two former iterations it does not. Of course with different values.

The declaration is done by

implicit real (A-Z)

Anyone having an idea what the reason for this problem could be?

What are your Project settings?
I think we had similar troubles before we set
the following compiler and linker options :
* Fortran->Data->Local Variable Storage = All Variables SAVE
* Fortran->Data->Initialize Local Saved Scalars to Zero = Yes

Good luck
Hans

Hallo Hans,
thanks for this idea, sounds reasonable. Anyway, after the suggested changes we still have the same problem, even after a full rebuild.

You may be addressing an array out of bounds. Does this happen in a debug build? If so, you should be able to step through the code with the debugger and find where the NaN is coming from (perhaps by watching the value of variables as they change.)
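
If stepping through by hand is too slow, one option is a small helper that scans an array for NaN, called after each suspicious statement. This is only a sketch: the module and routine names (nan_check, report_nan) are made up for illustration, and it assumes a compiler that provides the Fortran 2003 ieee_arithmetic module, which an older compiler may not.

```fortran
module nan_check
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
contains
  ! Print the index of every NaN element in values, tagged with label.
  subroutine report_nan(label, values)
    character(len=*), intent(in) :: label
    real, intent(in) :: values(:)
    integer :: i
    do i = 1, size(values)
       if (ieee_is_nan(values(i))) then
          print *, 'NaN in ', label, ' at index ', i
       end if
    end do
  end subroutine report_nan
end module nan_check

program watch_demo
  use nan_check
  implicit none
  real :: e(10), zero

  e = 1.0
  zero = 0.0
  call report_nan('E before', e)   ! reports nothing

  e(6) = zero/zero                 ! plant a NaN on purpose
  call report_nan('E after', e)    ! reports index 6
end program watch_demo
```

Moving the call earlier and earlier in the real routine narrows down the first statement after which the array is already damaged.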

Steve - Intel Developer Support

>sblionel wrote:
You may be addressing an array out of bounds. Does this happen in a debug build?

Not only in debug mode, but also when running without debugging.

>If so, you should be able to step through the code with the debugger and find where the NaN is coming from (perhaps by watching the value of variables as they change.)

This is the problem. The values the debugger shows are the ones I have posted above in a former message.

K1=T2/E(6)/(T2/E(6)-(E(6)/E(10)/(5+E(6)/E(10))))

T2=1.483530
E(6)=0.4000000
E(10)=0.1000000

And this is not the only equation producing NaN. There are some more that look similar. The mysterious part is that splitting the equations into sub-calculations makes the NaN disappear.

I didn't see that you resolved the inconsistency between the long equation and your separate expressions.

Steve - Intel Developer Support

I remember: searching for wrong array bounds is very hard. I have spent much time on it, also with the tool "ftnchek".
Hans

It might be that somewhere in the optimisation an intermediate value is generated which is NaN. Have you tried compiling with no optimisation for the affected routine?
Also, as there are a lot of sequential divisions, you could try rewriting your equation

K2=(T(2)/E(6))*((T(2)/E(6))-(E(6)/E(10))*(5+(E(6)/E(10))))

and see if that helps

Hi Craig,

>It might be that somewhere in the optimisation an intermediate value is generated which is NaN have you tried compiling with no optimisation for the affected routine.

We use default values for optimization which is disabled.

>Also as there are a lot of sequential divisions you could try rewriting your equation
>K2=(T(2)/E(6))*((T(2)/E(6))-(E(6)/E(10))*(5+(E(6)/E(10))))
>and see if that helps.

Unfortunately this is a different equation than ours.

sblionel wrote:
I didn't see that you resolved the inconsistency between the long equation and your separate expressions.

We also tried

K1=(T2/E(6))/((T2/E(6))-((E(6)/E(10))/(5+(E(6)/E(10)))))

but still NaN.
Remember that this equation works twice but in the third loop with the same values it produces a NaN.

A solution for this problem is really important for us as we have dozens or even hundreds of such equations which we do not want to split each into 10 pieces or so.

If the problem persists with a current compiler, please submit a test case to Intel Premier Support and we'll be glad to take a look.

Steve - Intel Developer Support

Hi Steve,
we have the 8.1 compiler as we and the customer once decided to take this one. It was even before 9.x was announced. A change in the current project in progress will not take place.
Sending the project might also be difficult as it contains customer data etc. But I will check this possibility.

Well, you can try the latest 8.1 compiler. But I can't imagine how we'll solve this in the forum.

Steve - Intel Developer Support

It's a rearrangement of your equation avoiding the use of sequential division operators.

remember

a/b/c=ac/b

although I have seen fortran compilers which will treat it as a/bc and even ac/b in some levels of optimisation and a/bc in others. This is one reason why brackets are very important.

It would help a lot if
a) you gave the result of the simple computation for the values you give as being applicable when you get a NaN, and
b) a set of values, including the answer you get, for one of the preceding steps through the loop in the actual complicated program when you do not get a NaN.

That is, we need inputs for a,b,c,d,e etc and output X when you try to compute X=a/b/(c-d/e), or whatever, in both the program that gives the NaN and the simple example that fails to reproduce it. You must be very careful that you copy the equation exactly when running the simplified version.

OK, I can be more concrete now with a much smaller equation.
Consider these lines of code within a SUBROUTINE:

BLABLA1 = ATAN(0.0036992922)
BLABLA2 = ATAN(0.047968611/12.96697)
ZZZZ = 0.047968611
YYYY = 12.96697
BLABLA3 = ATAN(ZZZZ/YYYY)

N=ATAN(R1/X(1))

Now consider that R1 and X(1) have the same values as ZZZZ and YYYY.
The strange behaviour is: In the problematic project, BLABLA3 as well as N are computed as NaN. Of course when I create a new project with

PROGRAM NaNTest
REAL BLABLA1,BLABLA2,BLABLA3,ZZZZ,YYYY
BLABLA1 = ATAN(0.0036992922)
BLABLA2 = ATAN(0.047968611/12.96697)
ZZZZ = 0.047968611
YYYY = 12.96697
BLABLA3 = ATAN(ZZZZ/YYYY)
STOP
END

all results are the same, no NaN in sight.

The master question is: Why does the program give NaN in our context, while the same calculation alone in a testprogram does not? I admit I have no more idea at the moment.

I assume, this program was once ok. What was changed? I have had terrible NaN-problems while we went from VMS IBM-Fortran to PC (IVF).
Hans

Hi Hans,
this is still some code which once worked on an old BS2000 or so. I will take a look whether there is a trial version or s.th. of the IVF 9.x to check the problem on this newer compiler.

craig wrote:
remember: a/b/c=ac/b
I think,

a/b/c = (a/b) / c <> ac/b
IVF9 calculates so, from left to right as the standard says.
Hans
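
The left-to-right rule is easy to check with a few lines. This is a small sketch with hypothetical values (all exactly representable, so rounding plays no part):

```fortran
program chained_division
  implicit none
  real :: a, b, c

  ! Hypothetical values chosen so every result is exact.
  a = 24.0
  b = 4.0
  c = 2.0

  ! Operators of equal precedence are evaluated left to right.
  print *, 'a/b/c   =', a/b/c        ! (24/4)/2 = 3
  print *, '(a/b)/c =', (a/b)/c      ! same thing, 3
  print *, 'a/(b*c) =', a/(b*c)      ! mathematically equal, 3
  print *, 'a*c/b   =', a*c/b        ! a different formula, 12
end program chained_division
```

So a/b/c agrees with a/(b*c), not with a*c/b. Explicit parentheses remove any doubt about which one is meant.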

Message Edited by hansruopp on 09-14-2005 01:58 AM

Hi Hans,
but this does not explain the mystery with ATAN e.g.

No, sorry.
Please report here your experience with 9.0.
But I fear, that the error is caused by different compiler behaviour on Host and PC. As Steve wrote: You may be addressing an array out of bounds.
And IVF is less tolerant.
Good luck
Hans

Message Edited by hansruopp on 09-14-2005 02:26 AM

hansruopp wrote:
I assume, this program was once ok. What was changed? I have had terrible NaN-problems while we went from VMS IBM-Fortran to PC (IVF).
Hans

Possibly, it may have seemed to be OK, but was, nonetheless, broken, e.g., where IVF is giving NaN, the B2000 compiler gave 0.0. I think this may require some serious delving into the behavior of the compiler used on the B2000 system, possibly even talking to the guys who wrote the original program back when Fortran didn't have If..then..else..endif or a character data type (I'm that old...)

I, too, once worked with various flavors of IBM's Fortran compilers (big blue boxes in the bowels of the building). What I remember is that they did not use IEEE arithmetic, did not have NaN, did not maintain guard bits during floating point arithmetic, and (in some incarnations) had their optimizers introduce intriguing and difficult to trace bugs (why would one get a divide check after removing a write statement?).

The moral of this is that different hardware systems and different compilers do not necessarily behave the same.

Hi Hans,
the bad news is: No change with the 9.0.018 compiler. If you want to take a look, see http://img390.imageshack.us/my.php?image=nan3zk.jpg
All explanations, also concerning array index out of bounds, do not consider that a trivial (!) function as
ATAN(X/Y) should never give NaN with the X-Y-values I have provided and are available in the watch window in VS in the screenshot I provided in the link above. And: Why does this work in a DO 200 I=1,3 loop for 2 times, but for I=3 it does not? It might be a compiler issue, but I do not see any logic in this problem. This really puzzles us.

I looked at the image. Several things immediately make me suspicious.
1) there are undefined variables, and at least one variable, E, used as array when not defined as such.
2) Your programming practice involves defining variables such as I, K1, N as REAL, when you also refer to a Do-loop 'Do I=1,3'. I always thought it best to keep variables beginning I->O as INTEGER, just for safety.
3) you do not show HELPR1X1 or BLABLA4 value in your watch window.

I recommend adding IMPLICIT NONE everywhere. I also recommend that you watch the whole of an array whose index is being looped over, not just particular elements; then you can see which ones have been addressed and given values during computations (I believe in DEBUG mode, REALS are initialised to very large values).
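
As a rough sketch of what that looks like, here is the K1 equation from earlier in the thread with IMPLICIT NONE, an INTEGER loop counter and every variable declared. The program name and the loop around the equation are assumptions for illustration; the real routine has not been posted.

```fortran
program implicit_none_demo
  implicit none            ! every name must now be declared
  integer :: i             ! loop counter is INTEGER, not REAL
  real :: e(10), k1, t2    ! E is explicitly an array of 10 elements

  ! Give every element a value before any of them is read.
  e = 0.0
  t2 = 1.483530
  e(6) = 0.4
  e(10) = 0.1

  do i = 1, 3
     ! Same formula as the posted K1 line, with explicit grouping.
     k1 = (t2/e(6)) / ((t2/e(6)) - ((e(6)/e(10)) / (5.0 + (e(6)/e(10)))))
     print *, i, k1
  end do
end program implicit_none_demo
```

With IMPLICIT NONE, a misspelled name or an array used without a declaration becomes a compile-time error instead of a silently created REAL scalar.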

Finally, it will probably save a lot of time if you list the WHOLE of the ACTUAL routine where the computation goes wrong, rather than edited and therefore altered extracts.

I agree with Steve Lionel that the problem almost certainly goes back to overwriting due to array index misaddressing problems, possibly caused by REAL/INTEGER confusion, and also maybe combined with uninitialised variables. I think we have taken your programming problem about as far as it can go now, as I believe that is what it is, not a compiler one.

Hi,
if I cut some of your source into a test program in a DO 200 I=1,9999999 loop, it works ok.
This means that the error is caused by a statement at a totally different place in the source, which overwrites some of the addresses in working storage (sorry for my english).
This occurred for us sometimes if arrays went out of bounds or parameters mismatched.
I found it with the tool ftnchek, but it is very time-consuming.
Hans

Hi,
thanks for all your patience.
Yes, a standalone program does the job perfectly.
Now we have a little change in this program/problem:
I changed the order of some
DIMENSION
INTEGER
REAL
lines in a way that the declarations are now before the DIMENSION instruction. The - surprising - result: No more NaN here on my machine!

BUT: This change does not help a bit for my co-worker. She still has NaN.
We first thought it was the new 9.0 compiler I used. So I uninstalled the 9.0 and installed the same 8.1 as she uses. Same result: On my machine it still works. So she uninstalled the 8.1 and reinstalled it. Still NaN. We compared the project settings within Visual Studio - all the same. Of course she uses the same sources as I do.

And as if that were not confusing enough: I copied the .exe from my machine to the trouble computer. We executed this .exe with the correct parameters and access to the same files. And the mystery continues: Still NaN. So I think the compiler and Visual Studio can not be responsible. But what may influence and cause the different results?

What is different between these two PCs?
- Different CPU: P-III 1GHz vs. Celeron 2GHz
- Different OS : Win 2000 Server vs. Win XP

Any hints welcome.

The different OSs would mean that you are probably using different versions of system DLLs so potentially this may change the results you get. Also check you have exactly the same compiler version (I'd be surprised if it wasn't) and runtime components on each machine.

I wouldn't expect the different hardware to affect it at all - unless one of the machines has an unpatched FPU bug - did this affect PIIIs or Celerons at all? (or of course if you had optimised your code for a very specific target processor and I can't see the default setting doing this)

Another thing I would check between the versions compiled on each machine is the size of the executables - if they're not identical I would suspect you had missed some compiler or linker setting differences.

Another possibility could be some other piece of software with a memory leak accessing the same memory locations as your code - especially if you haven't initialised your arrays - this would be a nightmare to track down though, or of course some other part of your program misbehaving and corrupting your arrays- check COMMON blocks and EQUIVALENCE statements as well as array bounds.
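
To illustrate why EQUIVALENCE (and, in the same way, COMMON) is worth checking, here is a minimal sketch of two names sharing one storage location. The choice of E(10) and T2 is hypothetical; nothing in the thread says these two are actually associated.

```fortran
program storage_demo
  implicit none
  real :: e(10), t2

  ! Hypothetical association: T2 occupies the same storage as E(10).
  equivalence (e(10), t2)

  e = 0.0
  t2 = 1.483530
  print *, 'T2 before:', t2

  ! An assignment that never mentions T2 still changes it.
  e(10) = 0.1
  print *, 'T2 after: ', t2
end program storage_demo
```

The second print shows 0.1 instead of 1.483530. An out-of-bounds array write has the same kind of effect on whatever happens to sit next to the array, which is why run-time bounds checking (for example /check:bounds with Intel Fortran on Windows) is worth enabling while hunting for this.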

Hi Craig,

>The different OSs would mean that you are probably using different versions of system DLLs so potentially this may change the results you get.

In the meantime we checked a 3rd PC with also Win2k on it - again NaN (incorrect) result.

>Also check you have exactly the same compiler version (I'd be surprised if it wasn't) and runtime components on each machine.

All the same.

>I wouldn't expect the different hardware to affect it at all - unless one of the machines has an unpatched FPU bug - did this affect PIIIs or Celerons at all? (or of course if you had optimised your code for a very specific target processor and I can't see the default setting doing this)

There is no optimization enabled. The running exe is on P-III, the incorrect results are on Intel Celeron 2GHz and on AMD Sempron 2.4GHz.

>Another thing I would check between the versions compiled on each machine is the size of the executables - if they're not identical I would suspect you had missed some compiler or linker setting differences.

We compared them in the project properties step by step.

>Another possibility could be some other piece of software with a memory leak accessing the same memory locations as your code - especially if you haven't initialised your arrays - this would be a nightmare to track down though, or of course some other part of your program misbehaving and corrupting your arrays- check COMMON blocks and EQUIVALENCE statements as well as array bounds.

I will look at that. Although it will be very hard to find out.
See my other post, what is frustrating is that even the working .exe alone does not work on the other two machines.

I would add a new point to all others, without going into the details of your problem:

In your example you use such constants as 0.0036992922 and 0.047968611, and as you can see in your own watch window, these are truncated to other values with fewer decimals and completed with E-3 or E-2 etc. This happens because you are trying to use more digits than REAL(4) can hold. My suggestion would then be:

1- Use IMPLICIT NONE in all of your program units (subroutines, functions);

2- Declare all variables;

3- Use double precision real variables (REAL(8));

4- Type "D" with all real constants, e.g. 0.5D0 rather than 0.5 and 0.0036992922D0 rather than without D0.
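
A short sketch of points 3 and 4, using one of the constants from the thread. The kind parameter dp and the variable names are introduced here for illustration; kind(1.0d0) is used as a portable way of asking for double precision, which is what REAL(8) means with this compiler.

```fortran
program precision_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
  real     :: x_single
  real(dp) :: x_mixed, x_double

  x_single = 0.0036992922      ! single precision constant and variable
  x_mixed  = 0.0036992922      ! constant is still single, then widened
  x_double = 0.0036992922d0    ! constant carries double precision

  print *, 'single      :', x_single
  print *, 'mixed       :', x_mixed
  print *, 'double      :', x_double
  print *, 'double-mixed:', x_double - x_mixed
end program precision_demo
```

The "mixed" line shows that declaring the variable as double precision is not enough on its own: without the D0 suffix the constant has already been rounded to single precision before it is stored.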

Sabalan.
