/fpe

Is there something special about combining /optimize:2 and /fpe:0 with full-debug mode? I am using v10.0.025 on Win64 with VS2005.

The program runs successfully with the following combinations:

(1) /optimize:2

(2) /optimize:2 and full debug

(3) /optimize:0, full debug, and /fpe:0

I use /traceback /check:uninit /check:bounds.

When I use /optimize:2, full debug, and /fpe:0, I see one variable getting the value NaN. This variable is not used in any calculation relevant to the problem, and I can see the code that initializes it to zero. In fact, in the problematic configuration, the variable gets NaN exactly when the code is trying to initialize it to zero. The resulting NaN changes the values of variables computed earlier, and the program diverges. I see the NaN only by using breakpoints. The program does not throw an exception.

I believe the above symptoms are those of a memory stomp, but it seems to be extremely difficult to see where. Any suggestions?

Sincerely

Abhi

There is nothing special about that combination other than, in your particular example, it hides/prevents the problem. I don't understand your saying that "the variable gets NaN exactly when the code is trying to initialize it to zero" if you also say you can't see where.

Steve - Intel Developer Support

Hi

Sorry for the confusion. Here is what I can see happening:

Initialization block at the beginning of a subroutine:

call dummy(a, b, c, d)

subroutine dummy(a, b, c, d)
  ! Initialize
  a = 0
  b = 0
  c = 0
  d = 0
  .....
end subroutine dummy

In the optimized debug mode with /fpe:0, I see that a, b, c, and d have random values before initialization. During initialization, when the debugger says that it is executing the c = 0 statement, c is set to NaN. This happens as the debugger shows the next statement to be the initialization of d.

For this particular problem, c never gets used anywhere. When the program returns to the subroutine that calls dummy, I can see that c holds NaN. In addition, at the very first statement after this call, I can see that a couple of variables have "lost" their previously computed values. The program continues but gives wrong results.

I suspect that my floating point stack is getting corrupted. When I add the /Qfpstkchk option, I get an access violation and the program stops immediately after subroutine dummy.

I wonder what is the relation between /fpe and /Qfpstkchk. Is there something obvious that I am missing?

Abhi

There is no relation at all between those two options, but turning on /Qfp-stack-check is certainly advisable. I would also use /gen-interfaces /warn:interfaces if this is not a VS project.

Are the arguments in the subroutine declared with exactly the same types (and dimensions) as in the caller? Is there an explicit interface for "dummy"?
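As a minimal sketch of what an explicit interface with matching declarations looks like, the subroutine can live in a module that the caller uses. The module name, the program name, and the double-precision kind are assumptions for illustration; the original code's types are not shown in this thread.

```fortran
! Hypothetical sketch: dummy_mod, caller, and the dp kind are assumptions,
! not the poster's actual code.
module dummy_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine dummy(a, b, c, d)
    ! Because dummy is a module procedure, its interface is explicit and
    ! the compiler can check every call against these declarations.
    real(dp), intent(out) :: a, b, c, d
    a = 0.0_dp
    b = 0.0_dp
    c = 0.0_dp
    d = 0.0_dp
  end subroutine dummy
end module dummy_mod

program caller
  use dummy_mod, only: dummy, dp
  implicit none
  ! The actual arguments have the same type and kind as the dummy arguments.
  real(dp) :: a, b, c, d
  call dummy(a, b, c, d)
  print *, a, b, c, d
end program caller
```

With this layout, declaring any of the actual arguments with a different type, kind, or rank should be rejected at compile time rather than silently corrupting data at run time.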

Without seeing a test case, it's hard to offer suggestions.

Steve - Intel Developer Support

Pardon me for coming back without giving a test case... I am having a hard time reproducing this behavior. So please consider this just an update on the issue that I am chasing wildly.

I have all interfaces explicit. I have included /warn:interfaces and /gen-interfaces and made appropriate corrections.

I am observing that if I add /fp:precise to my options, I don't get a floating point stack corruption. That is, /optimize:2 /debug:full /fpe:0 /Qfp-stack-check reports an "access violation"; without the stack check I get wrong results; and when I add /fp:precise to the above list I get correct results and obviously no stack corruption. I am battling to figure out whether there is a memory stomp or an uninitialized variable. Can I set the default initial values of all scalars and arrays to something like Infinity?
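One way to approximate this by hand, sketched below, is to poison variables with Infinity or NaN before the real initialization runs and then test for leftovers. This uses the standard Fortran 2003 ieee_arithmetic module; whether the v10 compiler supports it fully is an assumption, and the program and variable names are hypothetical.

```fortran
! Sketch only: poison_init, s, and arr are hypothetical names.
program poison_init
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
                                           ieee_positive_inf, ieee_is_nan, &
                                           ieee_is_finite
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: s
  real(dp) :: arr(5)
  integer :: i

  ! Poison everything before the real initialization code runs.
  s   = ieee_value(0.0_dp, ieee_positive_inf)
  arr = ieee_value(0.0_dp, ieee_quiet_nan)

  ! Stand-in for the application's own initialization, which
  ! deliberately skips arr(4) to show the effect.
  s        = 0.0_dp
  arr(1:3) = 0.0_dp
  arr(5)   = 0.0_dp

  ! Anything still holding a poison value was never assigned.
  if (.not. ieee_is_finite(s)) print *, 's was never assigned'
  do i = 1, size(arr)
    if (ieee_is_nan(arr(i))) print *, 'arr(', i, ') was never assigned'
  end do
end program poison_init
```

A quiet NaN is used here rather than a signaling one so that the poisoning assignment itself does not raise an exception; the leftover values are found by the explicit checks or by inspecting the variables in the debugger.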

Abhi

The access violation with /Qfp-stack-check is a big clue. This is exactly what happens when the FP stack check detects a problem. Find the spot where the access violation occurs, and then back up to the previous function call (most likely). That's where your problem is.

Steve - Intel Developer Support

Steve

I deeply appreciate you entertaining such an open-ended query.

I know exactly where the stack corruption reports back. I do go in that subroutine and find nothing that is corrupted. If I put a break-point at the last line of the called subroutine I find all the variables are properly initialized and all the values are correct. I kept going back like that and couldn't find anything.

What I did next was isolate the subroutine in which the stack corruption is reported. I compiled all the other files with no optimization, keeping all other flags (like /fpe:0). When I compile the file that contains just the one subroutine in question with optimizations on, I see the problem. Turning off optimizations and/or removing the /fpe:0 switch makes the error go away.

I will keep digging and try to trace the subroutines again.

Abhi

It isn't variables - rather, there is a series of instructions which pushes values onto and takes them off a set of registers called the FP stack. It is not related to the memory stack. It is critical that you never pop more than you push, or push something and not pop it later. It could be that there is a compiler bug generating bad code. We'd need to see a compilable and runnable example in order to investigate. If you're willing to do that, submit it to Intel Premier Support.

To trace it yourself requires you to open the Debug > Registers view and enable viewing the FP registers. Then as you step by instruction you'll see things go onto and off of the FP stack registers.

Steve - Intel Developer Support

Steve,

We have reproduced the same behaviour in another code segment, and I suspect there is a bug with /fpe:0. It is not going to be easy to give Intel code you can compile to investigate, as it would involve giving you a large-ish code base and may require an NDA. Would a WebEx session showing you the behaviour be any help at all? We can easily do that.

Thanks,

Tony

Try it with the 11.0 beta and see what happens. Without a test case we can build here, investigating may be very difficult, but we may be able to work something out short of an NDA.

Steve - Intel Developer Support

Hi Steve

[A] Using version 11.0.039 beta, I don't get this problem.

[B] I have yet another case where using v11 with the same set of flags results in success while v10 results in failure. In this case, I don't get an FP stack corruption as in the previous one. The strangest thing is the following:

I am using /optimize:2 /traceback /fpe:0 /check:bounds /check:uninit /Qfp-stack-check /Ge

There is no debug information requested.

Now with version 10..024, (i) Visual Studio -> Debug -> Start Without Debugging gives failure; (ii) Visual Studio -> Debug -> Start Debugging first shows a "No debug information available" window and asks me if I want to continue. I click Yes, and the simulation succeeds.

The labels "failure" and "success" are not the point here. What I mean is that I get two completely different answers. (The first case does not produce what I can call an answer.)

When I use exactly same flags but build with version 11.0.039, I don't get this problem.

Adding /fp:precise with v10 also results in what I call a successful solution.

I am aware that this is an open-ended problem since I cannot give a test case. But as v11 seems to be working differently (which is good for me, at present), I am wondering whether there is any documentation of fixes made in v11 related to these compiler options or a combination of them. That may help me find out if there is something wrong in my code.

Abhi

Sorry, we don't have a ready list of such things. Many changes have been made and the developer may not have noted any connection to /fpe. I will note that you do need to use /fp:precise in order to get full IEEE exception semantics.

Steve - Intel Developer Support

This has been a persistent pain ever since VF's incarnation as DVF. It behaves quite erratically and is liable to change from one release to the next, as noticed. It defies rational explanation. IMO, /fp:precise is safer and preferable, although YMMV.

Gerry

Hi Steve,

I just read your reply. I would like to suggest that it would be REALLY helpful if, with each release/update of the compiler, you created a list of changes and key bugs fixed. It would enable us to make an informed decision about when/if we want to upgrade to the next version and what value it brings us and our customers.

We have recently spent a considerable amount of resources hunting down potential issues in our code (we are using version 10 of the compiler) that may not be issues at all and may be bugs in your /fpe implementation. And now the problem goes away in version 11 and you can't tell us for sure whether /fpe had some bugs fixed or not... so we don't know if there is still a problem lurking in our code or if /fpe in version 10 was broken.

This is a common thread I have experienced with the Intel software we are using (Fortran, C++ and MKL) - the software itself is great, but the documentation really needs some improvement (I have already been in touch directly with Intel on this very topic with regard to MKL).

This is not a personal criticism. I'm basically saying: please, please can Intel spend some resources on improving the documentation of Fortran/C++/MKL :-)

Thanks

Tony

This is a difficult thing. We DO provide such a list with minor updates in a file called README.TXT that is posted alongside the update. But with major versions, we often rewrite sections of the compiler and, without targeting a particular bug, a bug may disappear in the rewrite. It is simply not possible for us to know, unless a bug has been reported to us, whether or not a release fixes a particular problem.

We do also provide a README.TXT for major releases, but it lists only issues reported against earlier versions that are fixed in the major release for the first time (and not in a previous update).

Keep in mind that a problem description is rarely specific enough to help you recognize an issue. As always, if you want to know if a specific issue is fixed, just ask.

Steve - Intel Developer Support

Hi Steve,

Apologies if my reply came across as overly critical; it wasn't meant that way. I quite understand it's very difficult to be specific in release notes sometimes.

Thanks

Tony
