NaN with -O2 and above, expected results with -O1 and below

Izaak Beekman

Hi, I have a non-trivial, object-oriented statistics library that calculates statistical moments (and co-moments) in a numerically robust fashion, implementing the algorithms described in http://infoserve.sandia.gov/sand_doc/2008/086212.pdf. I have taken great care to implement numerically robust algorithms. However, when I compile my code with -O2 and above, something happens that causes NaNs in one of the routines, while -O1 and -O0 give the correct, expected results. Compiling with the -check flag also gives the expected results; however, I think -check disables a lot of the optimization. I tried compiling with -opt-report and didn't notice anything suspicious in the offending routine. Furthermore, adding flags like -standard-semantics (which adds -assume protect_parens, -assume realloc_lhs, etc., to help diagnose potential error sources) does not fix anything. I suspect that the optimizer has an error/bug of some variety: perhaps the newer OO Fortran features invalidate one of the assumptions/checks of the optimizer, or there is some other compiler bug. I am tempted to submit this to premier.intel.com, but I would have to send them the whole project, as I cannot find a way to reproduce this error with a simpler program/code.
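
For context, the one-pass updating formulas described in that report take roughly this form (standard formulation; $\mu$ are the running means, $M$ the second central moments, $C$ the co-moment) for the $n$-th sample pair $(u_n, v_n)$:

$$\delta_u = u_n - \mu_u^{(n-1)}, \qquad \delta_v = v_n - \mu_v^{(n-1)},$$
$$M_{uu}^{(n)} = M_{uu}^{(n-1)} + \tfrac{n-1}{n}\,\delta_u^2, \qquad C_{uv}^{(n)} = C_{uv}^{(n-1)} + \tfrac{n-1}{n}\,\delta_u\,\delta_v, \qquad \mu_u^{(n)} = \mu_u^{(n-1)} + \tfrac{\delta_u}{n},$$

which avoids the catastrophic cancellation of the textbook sum-of-squares formulas.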

Has anyone had similar experiences? Could this be "correct" behaviour of the compiler? (produce NaNs for -O2 and up while -O1 and below run fine?) Should I submit the whole project to premier support?

Thanks

-Zaak
jimdempseyatthecove

The usual first steps are to compile with runtime checks for use of uninitialized variables and array indexing out of bounds. Also perform a clean build and use /gen-interfaces /warn:interfaces. Do not assume that correct results in debug mode mean the code got those results by being correct.
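
On Linux/OS X the rough equivalents of those options (a sketch; the source file name is only a placeholder) would be something like:

  ifort -c -O0 -g -check all -warn all -gen-interfaces -warn interfaces -traceback my_module.f90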

Also, if you have convergence routines and/or unit vector generators, add check code to assure your results are not overly sensitive to boundary conditions.

Jim Dempsey

www.quickthreadprogramming.com
Izaak Beekman

Hi Jim,

As noted in the original post, when I compile with -check (which enables all runtime checks, including array out-of-bounds checks, etc.) I get correct results and no warnings (enabling the runtime checks disables some optimization). I also always compile my code with all compile-time warnings enabled, via the -warn flag, and find nothing suspicious there other than unused dummy arguments (kept for interface-compatibility reasons).

As far as testing goes, the NaNs appear in the unit tests, not in a particular application. The whole point of the algorithms I have used is that they can handle strenuous/ill-conditioned cases; however, the portion of the unit test that fails has quite a pedestrian, well-conditioned input case. So it's definitely not a boundary-condition or test-coverage issue.

As far as my Fortran code goes, it is all in modules, although there is one function that is called from C, because the test driver, main(), is in C. I am not sure whether adding -gen-interfaces will have an effect, but I'll give it a shot.

The problems occur on OS X and Linux, and I tried running the code under Valgrind on Linux (Valgrind doesn't work on OS X, at least not 10.8). Valgrind didn't find any memory leaks in my code or the runtime libraries, although it did raise a number of warnings about the Fortran runtime library.

The only other diagnostic I can think of is static analysis, which is available to me through work, so I'll try firing up Inspector XE next week to see whether static analysis can find any bugs that the runtime and compile-time checks/warnings are incapable of finding. I'll also run the code in a debugger, with debugging symbols and optimizations turned on, to pinpoint where exactly things go south.

I'll give this a shot, and report back. Any other tips are most welcome!

-Zaak
Casey

Zaak,

Please update if you find anything. I have a similar issue with a model I run that produces NaNs when initializing the model domain when compiled with -O2 or greater, but behaves normally under -O1. I played around with many of the flags before narrowing it down to the optimization level. The behavior is identical whether I compile serially, with OpenMP, or with MPI (all supported by the model). I haven't had time to get dirty in the code, and as a typical run takes 18-22 hours anyway, I'm not really in a rush for results. Likewise, if I do find the time and figure anything out on my end, I'll pass it along to you.

mecej4

Quote:

Izaak Beekman wrote: Could this be "correct" behaviour of the compiler? (produce NaNs for -O2 and up while -O1 and below run fine?)

If your code has errors of the type mentioned (uninitialized variables, array limit overruns), any change in the compiler options used, time of day, etc., can produce NaNs (and other erroneous output) at any time. With such errors present, the behavior of the program is undefined; therefore, any behavior is "correct". You should not expect the generation of NaNs to be associated with a specific compiler option.

Izaak Beekman

Hi mecej4,

If your code has errors of the type mentioned (uninitialized variables, array limit overruns), any changes in the compiler options used, time of day, etc., can produce NaNs (and other erroneous output) any time.

"Errors of the type mentioned..." where? By Jim?

As you point out, if there are errors in my code, then this is indeed correct behaviour of the compiler. However, the Intel-provided runtime checks (-check) found NO such errors, as I stated in my original post, and enabling them turns off the compiler optimizations, causing my code to produce correct results. The documentation says that -check specifically diagnoses array bounds violations and uninitialized variables. Therefore, either such errors exist in my code and the runtime checks are not finding them, or there are no such errors in my code.

I agree that this behavior sounds like an array out-of-bounds error or a variable being used before it is initialized, so I'll go through my code again by hand, and with the static analysis tool, to double-check whether the runtime checks are missing errors in my code.

What I meant by my original question about "correct" compiler behavior is whether floating-point optimizations, code reordering, strength reduction, etc. could conceivably be responsible for creating NaNs. I guess in general the answer is yes, but I find it hard to believe that with certain compiler flags the correct variance, ~1, is calculated while with other optimization flags NaNs are produced, given well-conditioned inputs and an algorithm designed to be numerically robust.

Anyway, thanks for the additional insight, mecej4.

-Zaak
jimdempseyatthecove

Zaak,

At this point I believe you will agree that the problem is one of three possibilities (in no particular order):

a) bug in source code
b) bug in compiler
c) sensitivity in the code to minor fluctuations in the data, induced by rounding differences due to changes in the code path.

When situations like yours happen to me, I assume all three (and possibly other unknowns) are candidates for the error. None of the possibilities can be checked off the list until the source of the problem is found.

Have you experimented with changing IPO (InterProcedural Optimizations) using None or Single File?

If IPO is not the issue, then my preferred technique is to manipulate the optimization levels of individual source files of a project in a binary-search-like process. For example, starting with the working configuration (-O1), select the first half of the files for -O2 (this is relatively easy to do in Visual Studio on Windows; I am not sure about the IDE you use on Linux). If this fails, divide the optimized group in half, remove the optimizations from one half, and try again. If the first run succeeds, then mark half of the unoptimized files for optimization. At some point you will (or may) narrow this down to one source file as causing the problem. Once identified, the task of identifying the cause (a, b, c, or something else) becomes easier.

Also note that you then have the option of producing an otherwise optimized executable with that one source file compiled at -O1.
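
For example, one round of the bisection might look like this (the file names are only placeholders):

  ifort -c -O2 stats_a.f90 stats_b.f90    # first half of the files at -O2
  ifort -c -O1 stats_c.f90 stats_d.f90    # second half stays at the known-good -O1
  ifort -o test_suite *.o
  ./test_suite                            # NaNs: split the -O2 group again; clean: promote half of the -O1 group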

Jim Dempsey

www.quickthreadprogramming.com
dkokron

The "-check uninit" is very limited as a colleague discovered.  He developed a procedure to make the various compiler options do what one might expect them to do.

Please see the webinar titled "UnInit: Fix your code! Finding computation with uninitialized data" at http://www.nas.nasa.gov/hecc/support/past_webinars.html
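
For context, -check uninit in compilers of that era reportedly watched only local scalar variables of intrinsic type, so a case like the following (a made-up sketch, not code from the thread) typically goes undetected:

  program uninit_demo
    implicit none
    real :: a(3), s
    a(1) = 1.0
    a(2) = 2.0
    ! a(3) is never assigned; since only scalars are tracked,
    ! this read of an uninitialized array element is usually not flagged
    s = sum(a)
    print *, 's =', s
  end program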

Izaak Beekman

Quote:

jimdempseyatthecove wrote:

Zaak,

At this point I believe you will agree that the problem is one of three possibilities (in no particular order):

[snip]

Have you experimented with changing IPO (InterProcedural Optimizations) using None or Single File?

If IPO is not the issue, then my preferred technique is to manipulate the optimization levels of individual source files of a project in a binary-search-like process. For example, starting with the working configuration (-O1), select the first half of the files for -O2 (this is relatively easy to do in Visual Studio on Windows; I am not sure about the IDE you use on Linux). If this fails, divide the optimized group in half, remove the optimizations from one half, and try again. If the first run succeeds, then mark half of the unoptimized files for optimization. At some point you will (or may) narrow this down to one source file as causing the problem. Once identified, the task of identifying the cause (a, b, c, or something else) becomes easier.

Also note that you then have the option of producing an otherwise optimized executable with that one source file compiled at -O1.

Jim Dempsey

Yes Jim, I agree with you about the three candidate issues, although I find c) highly improbable unless some very aggressive things are happening behind the scenes.

RE: IPO -- Do any of the -Ox flags imply IPO? If not, then I am not using IPO. I agree with you about manipulating the optimization level of individual files to isolate the routine causing the problem; I think I already have it localized, although that could change with different compiler flags. I'm going to take a look at the -check uninit webinar and mess around with initializing everything to zero, and to weird values, using the compiler flags, to see if I can find any uninitialized variables that slipped through the -check cracks.

As an aside to the Intel folks, I think there's an issue with the forums. Where I have a duplicate post above, there is supposed to be a post from someone else. I know because I subscribe to this thread and received an automated email with the contents. When I posted my message, however, it seems to have clobbered his.

-Zaak
Steve Lionel (Intel)

The -Ox flags do not imply -ipo. -fast does, however.

I have not seen the duplicate post issue reported before, but we'll be on the lookout for it in the future.

Steve
Izaak Beekman

Quote:

dkokron wrote:

The "-check uninit" is very limited as a colleague discovered.  He developed a procedure to make the various compiler options do what one might expect them to do.

Please see webinar titled "UnInit: Fix your code! Finding computation with uninitialized data" at http://www.nas.nasa.gov/hecc/support/past_webinars.html

Wow, thanks for sharing this webinar; it was very informative. I had no idea that -check uninit and -ftrapuv were so limited in usefulness. Are the utilities mentioned in the webinar available to the public? I used to have a NAS Pleiades account, but the contract ended, and with it my account.

I have narrowed the effect of compilation options to a single source file, and still can't find any problems in my source code.

I've tried adding -fpe0 and -fpe-all=0, which give me the error message "Exception: Numerical" when combined with the compilation flags that exercise the bug/problem, but they give no useful line or procedure information. Is there a way to extract this? Also, please note that adding symbol tables with -g causes this runtime problem to disappear, making it highly frustrating to debug.

Furthermore, I have run static analysis on the code with Inspector XE. Static analysis does not flag any problems in the two .inspxe files that I can open; however, one of the .inspxe files causes Inspector XE to segfault and crash upon opening. I seem to have a knack for breaking things =/

Adding -zero with the offending compiler options does not change the runtime problems.

I am currently working on recreating this problem with a sane build/reproducer which I can submit here. Hopefully I'll have one finished today or tomorrow.

-Zaak
Izaak Beekman

EDITED 08/17/2013 4:45 PM EDT:

Oops, I seem to be losing my mind. This post was highly erroneous and has been removed.

-Zaak
Izaak Beekman

Oops, wrong email copied in the previous post, in response to Steve. Please ignore or delete. I've finally lost my mind....

-Zaak
jimdempseyatthecove

Now that you have narrowed this to one source file, try inserting ISNAN(x) tests. Several years ago I had a similar problem that required such tests to identify the cause. In most of the cases the cause was unanticipated input values, which required additional code to handle the exceptional conditions. In a few cases an older version of IVF had a bug (since corrected); for those, I left the optimization off. Luckily it did not affect the performance by a significant amount.
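
A minimal sketch of such a guard, using the standard ieee_arithmetic module (the helper name and message are placeholders), might look like:

  subroutine check_nan(x, label)
    ! halt with a message as soon as a monitored value becomes NaN
    use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
    implicit none
    real(kind(1.0d0)), intent(in) :: x
    character(*),      intent(in) :: label
    if (ieee_is_nan(x)) then
      print *, 'NaN detected: ', label
      stop 'NaN guard tripped'
    end if
  end subroutine

Calls such as call check_nan(this%varu, 'varu after update') scattered through the suspect loop narrow down the first statement that produces a NaN.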

Good luck.

Jim Dempsey

www.quickthreadprogramming.com
Martyn Corden (Intel)

Have you tried building with -traceback in addition to -fpe0 (for the main routine, applies to whole program) or -fpe-all=0 (for individual routines)?

This is a lightweight option to give a simple traceback through user code when a floating-point exception occurs, e.g. when an SNaN is generated or consumed. There are some limitations, e.g. it can't trace back from a daughter thread to a parent thread in OpenMP. But it can be an easy way to get started debugging a problem.

The preferred way of disabling optimizations that might cause slight variations in floating-point results is with -fp-model precise. See the article attached at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/.

If you still see a problem at -O2, to narrow things down further, try building with

1)  -fno-inline -no-ip     (to disable inlining and inter-procedural optimizations within a single source file)

2)  -no-vec      (to disable vectorization and other loop optimizations).

Izaak Beekman

Hi Martyn,

Thanks for pointing me to that article about consistency of floating point results, I certainly need to read it.

When I build with -traceback and -fpe0 -fpe-all=0 I do not get a call stack trace. I simply receive the following error message, and program execution halts: `Floating point exception: 8`. My code (actually a unit test driver for a library I am developing) is mixed-language, with main() written in C (the test driver) and the library code written in Fortran. All sources are compiled and linked with -traceback, but no traceback seems to take place on SIGFPE. I can't figure out why this is the case.
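
In the absence of a traceback, one fallback for localizing the exception is to query the standard ieee_exceptions flags around suspect calls when running without -fpe0 (so the run is not aborted). A self-contained sketch of the idea, with a trivial divide-by-zero standing in for the real call:

  program fpe_probe
    ! clear the IEEE flags, run the suspect computation, then inspect the flags
    use, intrinsic :: ieee_exceptions, only: ieee_get_flag, ieee_set_flag, &
                                             ieee_divide_by_zero, ieee_invalid
    implicit none
    logical :: dbz, inv
    real    :: x, y
    y = 0.0
    call ieee_set_flag([ieee_divide_by_zero, ieee_invalid], .false.)
    x = 1.0 / y                     ! stand-in for the real suspect call
    call ieee_get_flag(ieee_divide_by_zero, dbz)
    call ieee_get_flag(ieee_invalid, inv)
    print *, 'x =', x, ' divide-by-zero signaled:', dbz, ' invalid signaled:', inv
  end program

Moving the bracketing progressively deeper into the library should pin down the first call that raises a flag.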

I have also tried your suggestions of adding -fp-model precise as well as -fno-inline -no-ip and -no-vec, none of which seems to make a difference.

I have localized the issue to the following routine by hand, and have surmised that it is the associate construct that is causing a problem with the -O2 and above optimizations. When I replace the associate construct with the unaliased code, all goes according to plan and no NaNs/SIGFPEs are produced.

IS_PURE subroutine AddVectors(this,u,v)
  class(spss_covar_t) ,intent(inout) :: this
  real(WP) ,intent(in) :: u(:) ,v(:)
  real(WP) :: du ,dv
  integer(WI) :: i
# ifdef MY_BACKTRACE
  print*, 'Entering AddVectors in' ,__FILE__ ,' @ ' ,__LINE__ ,'.'
  if(size(u) /= size(v)) stop 'The length of u differs from v in AddVectors in '//__FILE__//'!'
# endif
  associate(n => this%n)
    do i=1,size(u)
      n = n + 1
      du = u(i) - this%umean
      dv = v(i) - this%vmean
      this%varu = this%varu + (n - 1)*du*du/n
      this%varv = this%varv + (n - 1)*dv*dv/n
      this%covar = this%covar + du*dv*(n - 1)/n
      this%umean = this%umean + du/n
      this%vmean = this%vmean + dv/n
    end do
  end associate
# ifdef MY_BACKTRACE
  print*, 'Leaving AddVectors in ' ,__FILE__ ,' @ ' ,__LINE__ ,'.'
# endif
end subroutine

Some notes: if MY_BACKTRACE is undefined, IS_PURE is defined as `pure` and will evaluate to that; otherwise it will evaluate to the empty string. This subroutine is called as a type-bound procedure (TBP) of a covariance object. The dummy argument `this` is the derived-type covariance object to which the procedure is bound. The procedure updates the state information, which includes the means, covariance, second-order statistical moments, etc., by processing an additional set (vectors) of the two input covariate samples.

I suspect that the production of NaNs with -O2 and higher is a compiler bug, because this behavior is associated with a "new" language feature. Now that I have completely localized the issue, I will attempt to create a small reproducer program that I will submit here.

Thanks to everyone for their suggestions, and to the folks at Intel for their dedicated tech support and their willingness to listen to my half-baked theories of what might be going wrong and offer advice and suggestions.

-Zaak
Izaak Beekman

Below is a simple reproducer program and the commands to compile and run it. First the program:

module covariance_m
  implicit none
  private
  public :: covar_t, WP ,WI
  integer ,parameter :: WP = kind(1.0D0) ,WI = kind(1)
  type :: covar_t
    private
    integer(WI) :: n = 0_WI
    real(WP) :: covar = 0.0_WP ,umean = 0.0_WP ,vmean = 0.0_WP ,&
                varu = 0.0_WP ,varv = 0.0_WP
  contains
    procedure :: AddVectors
    procedure :: GetCorrelation
  end type
contains
  subroutine AddVectors(this,u,v)
    class(covar_t) ,intent(inout) :: this
    real(WP) ,intent(in) :: u(:) ,v(:)
    real(WP) :: du ,dv
    integer(WI) :: i
    if ( size(u) /= size(v) ) stop 'Bad call to AddVectors'
    if ( this%n < 0 .or. this%n > nint(0.95_WP*huge(1_WI))) stop 'Set size overflow occurred or pending.'
# ifdef USE_ASSOCIATE
    associate ( n => this%n )
      do i=1,size(u)
        n = n + 1_WI
        du = u(i) - this%umean
        dv = v(i) - this%vmean
        this%varu = this%varu + (n - 1)*du*du/n
        this%varv = this%varv + (n - 1)*dv*dv/n
        this%covar = this%covar + du*dv*(n - 1)/n
        this%umean = this%umean + du/n
        this%vmean = this%vmean + dv/n
      end do
    end associate
# else
    do i=1,size(u)
      this%n = this%n + 1_WI
      du = u(i) - this%umean
      dv = v(i) - this%vmean
      this%varu = this%varu + (this%n - 1)*du*du/this%n
      this%varv = this%varv + (this%n - 1)*dv*dv/this%n
      this%covar = this%covar + du*dv*(this%n - 1)/this%n
      this%umean = this%umean + du/this%n
      this%vmean = this%vmean + dv/this%n
    end do
# endif
  end subroutine
  elemental function GetCorrelation(this) result(res)
    class(covar_t) ,intent(in) :: this
    real(WP) :: res
    res = this%covar/sqrt(this%varu*this%varv)
  end function
end module

program sigfpe
  use covariance_m ,only: covar_t ,WP ,WI
  implicit none
  real(WP) ,dimension(:) ,allocatable :: x ,y
  class(covar_t) ,allocatable :: xy_covar
# ifndef VECSIZE
# define VECSIZE 1024
# endif
  allocate(xy_covar,source=covar_t())
  allocate(x(VECSIZE) ,y(VECSIZE))
  call random_number(x)
  call random_number(y) ! 0 covariance, modulo bad PRNGs
  print*,'About to test AddVectors'
  call xy_covar%AddVectors(x,y) ! may or may not trigger FPE based on associate construct
  print*,'Correlation between two random deviates is:', xy_covar%GetCorrelation()
end program

Now the compilations/tests:

laptop:statistics me$ ifort -warn -traceback -O2 -fpe0 -fpe-all=0 reproducer.F90 # bug free, even at -O2
laptop:statistics me$ ./a.out 
 About to test AddVectors
 Correlation between two random deviates is: 2.341576630216081E-002 # acceptably close to zero for our purposes
laptop:statistics me$ ifort -DUSE_ASSOCIATE -warn -traceback -O2 -fpe0 -fpe-all=0 reproducer.F90 # enable the associate construct
laptop:statistics me$ ./a.out 
 About to test AddVectors
forrtl: error (73): floating divide by zero # not bug free this time, odd
Image PC Routine Line Source
Stack trace terminated abnormally. # never seen that before
Abort trap: 6
laptop:statistics me$ ifort -DUSE_ASSOCIATE -warn -traceback -O1 -fpe0 -fpe-all=0 reproducer.F90 # still using the associate construct, but compiles and runs fine at -O1 and below
laptop:statistics me$ ./a.out 
 About to test AddVectors
 Correlation between two random deviates is: 2.341576630216081E-002 # identical to the result at -O2 without the associate construct

I think there is a problem somewhere in the optimizations performed on code using associate. I hope that I have not misunderstood the standard, or the associate block construct, but I think this reproducer code is sound. If that is not the case, I apologize for my error. This error occurs on all versions of ifort that I have tested on Linux and Mac (13.x, the beta, etc.).
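
For anyone who hits the same thing before a compiler fix is available, one possible workaround (a sketch, not something from the thread) is to drop the associate name for the integer component and work on a local copy instead:

  ! inside AddVectors: local counter instead of associate ( n => this%n )
  integer(WI) :: n
  n = this%n
  do i = 1, size(u)
    n = n + 1_WI
    du = u(i) - this%umean
    dv = v(i) - this%vmean
    this%varu = this%varu + (n - 1)*du*du/n
    this%varv = this%varv + (n - 1)*dv*dv/n
    this%covar = this%covar + du*dv*(n - 1)/n
    this%umean = this%umean + du/n
    this%vmean = this%vmean + dv/n
  end do
  this%n = n   ! write the updated count back to the object

This keeps the loop body as readable as the associate version while avoiding the construct that appears to trip the optimizer.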

-Zaak
jimdempseyatthecove

associate( n => this%n ) is referencing a private member variable of type(covar_t) this.

See what happens when you make n public (although this is not what you want, it is just a test to isolate the error).

Though you'd expect such an error to not be sensitive to optimization level.

Jim Dempsey

www.quickthreadprogramming.com
Steve Lionel (Intel)

The private makes no difference here. I can reproduce the problem and will investigate.

Steve
Izaak Beekman

Thanks Steve. I'm sorry it took me so long to put together a simple reproducer code. Also, I'm glad I'm not (completely) crazy, and that this issue has been successfully reproduced. If it gets escalated, please let me know what the issue/problem report number is so I can follow its resolution.

-Zaak
Steve Lionel (Intel)
Best Reply

Issue ID is DPD200247629. I can reproduce this on both IA-32 and Intel 64 with our latest in-house compiler. I will update this thread with any developments.

Steve
Izaak Beekman

Just curious if the problems with optimization and associate have been fixed, or are planned to be fixed in a forthcoming release.

-Zaak
Steve Lionel (Intel)

Sorry for missing this - it was fixed in Update 2.

Steve
