tracking NaN problem

Dear All,

I know that there are convenient compiler settings for tracking NaNs at run time: -check all -traceback -fpe0

My traceback is:

kvec               0000000000899F43  Unknown               Unknown  Unknown
kvec               0000000000837DD7  Unknown               Unknown  Unknown
kvec               00000000008307B0  Unknown               Unknown  Unknown
kvec               00000000007EF235  Unknown               Unknown  Unknown
kvec               00000000007E7A11  Unknown               Unknown  Unknown
kvec               000000000061635E  eigen_mp_ev3_              90  eig.F90
kvec               000000000052EF76  mps_func_mp_mps_r        1813  mp2.F90
kvec               000000000064C5C7  propagate_                893  kvec.F90
kvec               00000000006419E2  MAIN__                    593  kvec.F90
kvec               000000000040B50C  Unknown               Unknown  Unknown
libc.so.6          00000038B722135D  Unknown               Unknown  Unknown
kvec               000000000040B409  Unknown               Unknown  Unknown

At that point eig.F90 contains just:

call zheevr(jobz, range, uplo, n1, a, lda, vl, vu, il, iu, abstol, m1, DD, U, ldz, isuppz, work, lwork, rwork, lrwork, iwork, liwork, info)

So how is it possible that a NaN is caught there? If I remove -fpe0, zheevr completes and returns with info=0.

The matrix a is just:

 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.888888888888890,0.000000000000000E+000)
 (0.314269680527355,0.000000000000000E+000)
 (0.314269680527355,0.000000000000000E+000)
 (0.444444444444444,0.000000000000000E+000)
 (0.666666666666667,0.000000000000000E+000)
 (0.471404520791032,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.471404520791032,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.111111111111111,0.000000000000000E+000)
 (0.157134840263677,0.000000000000000E+000)
 (0.157134840263677,0.000000000000000E+000)
 (0.722222222222222,0.000000000000000E+000)
 (0.166666666666666,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.500000000000000,0.000000000000000E+000)
 (-0.707106781186547,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (-0.707106781186547,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)
 (-6.661338147750939E-016,0.000000000000000E+000)
 (5.551115123125783E-017,0.000000000000000E+000)
 (-6.106226635438361E-016,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)

-fpe0 turns on the floating-point invalid, divide-by-zero, and overflow exceptions, and underflows are flushed to zero. Without that option, the default is to disable exceptions, and floating-point underflow is gradual. This is why you are aborting on the NaN with -fpe0 but the program runs to completion without it.
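
To make the difference between gradual underflow and flush-to-zero concrete, here is a minimal sketch using the Fortran 2003 intrinsic module ieee_arithmetic. The module name underflow_demo and the routine half_tiny are made up for illustration, and the sketch assumes the processor supports underflow control (the routine reports this through its supported argument). It only illustrates the two underflow modes; it does not show how -fpe0 itself is implemented.

```fortran
module underflow_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Halve the smallest normal double under the requested underflow mode.
  !   gradual = .true.  : IEEE gradual underflow (a subnormal result is expected)
  !   gradual = .false. : abrupt underflow, i.e. flush to zero
  ! supported reports whether the processor allows the mode to be changed.
  subroutine half_tiny(gradual, h, supported)
    logical,  intent(in)  :: gradual
    real(dp), intent(out) :: h
    logical,  intent(out) :: supported
    real(dp), volatile :: t     ! volatile keeps the division at run time
    logical :: saved_mode

    h = 0.0_dp
    supported = ieee_support_underflow_control(h)
    if (.not. supported) return

    call ieee_get_underflow_mode(saved_mode)   ! remember the caller's mode
    call ieee_set_underflow_mode(gradual)
    t = tiny(1.0_dp)
    t = t / 2.0_dp                             ! result lies below the normal range
    h = t
    call ieee_set_underflow_mode(saved_mode)   ! put the caller's mode back
  end subroutine half_tiny
end module underflow_demo
```

Calling half_tiny(.true., h, ok) and then half_tiny(.false., h, ok) should give a tiny positive h in the first case and exactly zero in the second, provided ok comes back true.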

Dear Annalee, I understand what -fpe0 does, but I do not understand why zheevr (I am using Intel's MKL) triggers a NaN trap.
The program is completely deterministic, the output does not contain any NaN, and stat=0, but when I use -fpe0 some NaNs are caught, and they seem to originate from an MKL routine which
- does not contain NaN on input
- does not contain NaN on output
- terminates with stat=0

The NaN may occur within the zheevr calculations without propagating to the final result. Alternatively, flush-to-zero may produce a NaN that does not otherwise occur. If your question is specific to MKL, I would suggest posting on the MKL forum as well.
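
As a small illustration of the first possibility, the sketch below produces an infinity and a NaN internally, recovers from both, and still returns a finite result while the exception flags have been raised. The names transient_fpe and damped_ratio are hypothetical, and the arithmetic is a toy, not what zheevr does. For context, the reference LAPACK documentation for zheevr notes that normal execution of its tridiagonal eigensolver may create NaNs and infinities; whether MKL's implementation behaves the same way is an assumption here.

```fortran
module transient_fpe
  use, intrinsic :: ieee_arithmetic
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Toy computation: for pivot = 0 the intermediates are +Inf and NaN,
  ! yet the returned value is finite.
  ! raised holds the flags in the order of ieee_usual:
  ! overflow, divide-by-zero, invalid.
  subroutine damped_ratio(pivot, res, raised)
    real(dp), intent(in)  :: pivot
    real(dp), intent(out) :: res
    logical,  intent(out) :: raised(3)
    type(ieee_status_type) :: saved
    real(dp), volatile :: p, t, ratio, r   ! volatile: evaluate at run time

    call ieee_get_status(saved)                      ! save flags and modes
    call ieee_set_halting_mode(ieee_usual, .false.)  ! do not abort in here
    call ieee_set_flag(ieee_usual, .false.)          ! start with quiet flags

    p = pivot
    t = 1.0_dp / p                           ! p = 0: +Inf, raises divide-by-zero
    ratio = p / p                            ! p = 0: NaN, raises invalid
    if (ieee_is_nan(ratio)) ratio = 1.0_dp   ! the algorithm recovers
    r = ratio / (1.0_dp + t)                 ! 1/Inf = 0, which is finite

    call ieee_get_flag(ieee_usual, raised)
    call ieee_set_status(saved)                      ! restore the caller's state
    res = r
  end subroutine damped_ratio
end module transient_fpe
```

With pivot = 0 the expectation on an IEEE-conforming processor is res = 0 with the divide-by-zero and invalid flags raised, which is exactly the situation in which a trap would fire even though neither the input nor the output contains a NaN.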

Regards,
Annalee

Ok, thanks. I will do that.

I have recompiled the file containing the call to zheevr without the -fpe0 flag and linked it into my program that way, but this did not really help. It seems that NaN trapping is unable to locate the particular line that throws the NaN (in my case, the call to zheevr), but it is able to locate the enclosing routine that contains the call. So the trigger is 99.9% still zheevr, but it is not the real source of a problem. I assume this is because the NaN monitoring is done by observing some processor flags, which are always triggered.

Is it possible to change the default behaviour of the NaN-catching function, so that it prints a warning but does not stop the program?

There is no way to do that, but you can get more information about where the NaN occurs by compiling with -g as well as -traceback. I would also suggest running it within a debugger.
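
The option itself has no warn-only mode, but for a single call site something similar might be done by hand with the Fortran 2003 intrinsic module ieee_exceptions: switch halting off around the call, inspect the flags afterwards, print one warning, and restore the previous state. The sketch below is only that, a sketch. The names fpe_guard, guarded_call and noisy_kernel are hypothetical, noisy_kernel is a stand-in for a library routine such as zheevr, and it is an assumption that the compiler honours these calls in a program built with -fpe0 and that the processor supports halting control. Some compilers need a strict floating-point model option for the IEEE modules to behave as specified, so the compiler documentation should be checked.

```fortran
module fpe_guard
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for a library routine that raises an exception
  ! internally but returns a clean result (x = 0 gives y = 0).
  subroutine noisy_kernel(x, y)
    real(dp), intent(in)  :: x
    real(dp), intent(out) :: y
    real(dp), volatile :: vx, t   ! volatile: evaluate at run time
    vx = x
    t = 1.0_dp / vx               ! x = 0: +Inf, raises divide-by-zero
    y = 1.0_dp / (1.0_dp + t)
  end subroutine noisy_kernel

  ! Run the kernel with halting disabled and report raised flags once.
  ! n_raised counts how many of overflow, divide-by-zero, invalid were set.
  subroutine guarded_call(x, y, n_raised)
    real(dp), intent(in)  :: x
    real(dp), intent(out) :: y
    integer,  intent(out) :: n_raised
    type(ieee_status_type) :: saved
    logical :: raised(3)

    call ieee_get_status(saved)                      ! save flags and halting modes
    call ieee_set_halting_mode(ieee_usual, .false.)  ! no abort inside the guard
    call ieee_set_flag(ieee_usual, .false.)          ! start with quiet flags

    call noisy_kernel(x, y)

    call ieee_get_flag(ieee_usual, raised)
    call ieee_set_status(saved)                      ! trapping is active again

    n_raised = count(raised)
    if (n_raised > 0) then
      write (*, '(a,3l2)') &
        'warning: overflow/divide-by-zero/invalid raised:', raised
    end if
  end subroutine guarded_call
end module fpe_guard
```

A caller would use fpe_guard and call guarded_call(x, y, n); with x = 0 this should return y = 0 and n = 1 and print a single warning line. Because the flags are checked once per guarded call rather than once per operation, the amount of output stays bounded by the number of calls.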

"Is it possible to change the default behaviour of the NaN-catching function, so that it prints a warning but does not stop the program?"
Although one may agree with that wish in principle, there are reasons why it is not practical to implement such a change.

For example, what if the number of NaNs caught during a single execution runs in the millions? Does the user want the NaN error reports mixed into the program output? What if the standard output has been redirected to a file?

OK, I see that this could be a problem. Thanks for the comments. Then I guess the simplest option would be to watch the variables in the debugger.