1/-Infinity = NaN when using SSE

mkatsev's picture

Code:

program test
integer i
real r(100), t
t = 0
do i = 1, 100
   r(i) = 1/log(t)
enddo
print *, r(1)
end program test



When I compile with the options "/libs:static /threads /c", the result is " 0.0000000E+00" (as expected).

When I add "/QaxK", the result is NaN.

If I modify the loop to prevent vectorization (by adding "print *,'.'", for example), the result is back to " 0.0000000E+00".

Compiler: Intel Fortran 10.0 for Windows

Any ideas why this happens?

Tim Prince's picture

It appears that you want the -Qprec-div option. When the division is performed by the Newton iteration (reciprocal approximation) scheme chosen by default for single-precision SSE, the NaN is produced; prec-div is needed to produce IEEE results.
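For illustration, a sketch of the three builds discussed in this thread (the file name test.f90 is hypothetical, and the original /c is dropped so the example links and runs; the outputs are those reported above and may vary with compiler version):

ifort /libs:static /threads test.f90                    ->  0.0000000E+00
ifort /libs:static /threads /QaxK test.f90              ->  NaN (approximate reciprocal used in the vectorized loop)
ifort /libs:static /threads /QaxK /Qprec-div test.f90   ->  0.0000000E+00 (full-precision IEEE divide retained)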

mkatsev's picture

Yeah, thanks

By the way, shouldn't the result be -0 rather than +0?

Tim Prince's picture

That's an interesting question. By my reading of the textbooks on Fortran
http://www.oup.com/uk/catalogue/?ci=9780198526933
the only way Fortran is required to distinguish the sign of 0. is with the sign() intrinsic, so I tried
print *, r(1), sign(1.,r(1)) < 0
and I see what appears to be an incorrect result. Do you want to file a bug report?

mkatsev's picture

It turned out that there is another option :)

/assume:minus0 - enables negative zero support

In any case, thanks for your help.

It's strange that all these options are off by default and are almost impossible to find if you don't know what to look for.

Tim Prince's picture

Yes, there is a tendency in commercial compilers to enable by default optimizations which break the standard, regardless of how unlikely they are to be useful and how likely they are to cause problems. Some of them are held over from past processors which had a large performance penalty for following the standard (I don't know if that is the case with minus0). It looks like the minus0 option may give better performance for random mixtures of positive and negative operands, as it generates branchless code.
In this case, my own opinion is that minus0 (thanks for pointing it out) should be included in -fp:precise. I guess I'll adopt -O3 -assume:protect_parens,minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models.
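Spelled out as a single (hypothetical) Windows command line, that combination would look something like:

ifort /O3 /assume:protect_parens,minus0,buffered_io /Qprec-div /Qprec-sqrt myprog.f90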

Steve Lionel (Intel)'s picture

/assume:minus0 exists because making it the default would 1) break some existing programs and 2) be slower in many cases. Most applications don't care. The Fortran standard's treatment of -0 has changed over time: in Fortran 2003, the definition of CSQRT changed in a way incompatible with Fortran 95 in this regard.

Steve
Tim Prince's picture

ifort requires /assume:minus0 to make sign() compatible with Fortran 95:
http://developers.sun.com/solaris/articles/sign.html
The IVF doc quotes the f95 standard, but the "see also" link at the foot leads to the documentation stating that minus0 is required to comply with f95. Unfortunately, minus0 currently prevents vectorization of the sign intrinsic; that is expected to be fixed in a future release.
The f95 treatment is the only one available in gfortran.

brianlamm's picture

Tim wrote: "I guess I'll adopt -O3 -assume:protect_parens,minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models."

It is vitally important to me to get IEEE 754 arithmetic, except for -0 support. After reading some of the documentation (which sorely needs an overhaul), I believe /fp:precise automatically implies what /assume:protect_parens does. The IVF doc reads:

When this option [/fp:precise] is specified, the compiler applies the following semantics:

  • Additions are performed in program order, taking into account any parentheses
  • Intermediate expressions use the precision specified in the source code
  • The constant addition may be pre-computed, assuming the default rounding mode

What I contemplate using, though, is

/fp:strict /fp:except- /Qprec-sqrt /Qprec-div /Qprec /fpe:3

(/fpe:3 is the default; I just wanted to show it.) A lot of other settings are not shown, as I want to focus on getting IEEE arithmetic.

Now, here's where I'm going to show some (quite a bit of?) ignorance (but I appeal to the fact that the documentation does not nearly keep up with the compiler capabilities): I contemplate using /fp:strict + /fp:except- so that exceptions are disabled (and I won't get a run-time abend), which lets me investigate the IEEE flags and act accordingly. From what I can gather from the documentation, that combination also "allows modification of the floating point environment", whereas /fp:precise does not. And here's my ignorance: HOW does one modify "the floating point environment"?

Now, on to the REAL can-o-worms question: Exactly what minimal set of options to the compiler will Guarantee IEEE 754 compliance, except for -0 support?

All you IVF experts, please chime in with your assessments!

-Brian

Tim Prince's picture

I'll answer just part of this:
/fp:precise includes /assume:protect_parens /Qprec-div /Qprec-sqrt /Qftz- but not /assume:minus0. Some of what /fp:precise does is not required by IEEE or Fortran standards.

brianlamm's picture

Now, that's just what I need (as a beginning anyway)!

I had NO IDEA /fp:precise included all those other options. I cannot find that in the IVF doc! But I am glad to see /Qftz- is enabled to avoid flush to zero. I should have also added I wish to use /O3 optimization while fully complying with IEEE except for -0 support.

I think I need /Qprec to avoid "performing optimizations that change NaN comparison semantics and causes all values to be truncated to declared precision before they are used in comparisons."

Tim, I know this question is a hotbed, especially for people "on the inside", since it clearly calls into question whether IEEE 754 conformance can be had while maintaining the highest possible optimizations, so thank you very much for your reply.

Keep the assessments coming, please!

And I would still like to know how/why I would want to change the "floating point environment". The IVF doc gives precious little information on its precise nature. One place in the IVF doc reads: "There are several methods available if you want to modify the default floating point environment. For example, you can use inline assembly, compiler built-in functions, and library functions." But I need all the ways IVF is capable of letting me change that environment. That information has got to be spread all over the IVF doc, and I'd bet some of it is undocumented, alas. Anyway, I just want to make sure I have that capability, and I believe using /fp:strict + /fp:except- allows the floating point environment to be changed while still allowing exceptions to be raised without causing an application abend. Correct? Incorrect?

-Brian


brianlamm's picture

Thanks Tim. You seem to be the only insider at Intel with the pineapples necessary to, at least partially, answer my question about which (minimal) set of ifort compiler options are necessary to get full IEEE compliance, except -0 support.


So it's apparent that no one inside Intel is willing to state whether or not it is even possible to get IEEE 754 conformance from IVF, leading to two obvious possible conclusions:

Either no one inside or outside of Intel knows whether it is possible and is unwilling to say so, or it is simply not possible. If no one inside Intel even knows, that's certainly a sad state of affairs (much like the IVF documentation).


-Brian

Steve Lionel (Intel)'s picture

Brian,

I don't think such comments are appropriate.

/fp:strict is the one option that would ensure, to the best of the compiler's ability, all IEEE FP semantics. /assume:minus0 is a Fortran language option not related to IEEE FP behavior. It controls how the SIGN intrinsic and formatted I/O behave in the presence of a minus zero - it does NOT control whether or not such a value can be produced.

Steve
brianlamm's picture
Steve Lionel (Intel) wrote: "Brian, I don't think such comments are appropriate."


I tried deleting it but was not allowed since replies were made. Apologies to all of you.


And thanks much for the replies subsequent to my inappropriate post.


-Brian

g.f.thomas's picture

If anyone knows about IEEE fp compliance then surely it's Intel. It was Intel who commissioned Kahan to come up with the standard in the first instance. I believe DEC had an alternative but it wasn't widely adopted.


As Steve and tim18 pointed out, /fp:precise ensures compliance, although the rounding mode has an impact and is useful in interval arithmetic. Isn't -0 in the same league as 0**0 and the like, and beyond the scope of the IEEE fp standard?


Gerry

brianlamm's picture

My fault: I did not make it clear that even though I was not intent on -0 support, I did want IEEE compliance. Reading my previous posts, I see how they could be interpreted as my thinking -0 support was in IEEE 754. -0 support is in F2003.


As far as I can tell, the difference between [/fp:precise] and [/fp:strict /fp:except-] is that with the latter, contractions are disabled and access to modifying the floating point environment is enabled. What I cannot gather from the doc is whether or not the latter enables full-precision rounding of intermediate results, or whether both have the same effect of rounding intermediates to the precision of the source.


Anyone know the answer to that?


-Brian

Steve Lionel (Intel)'s picture

Just to make it clear - the change to /QxW as the default will happen in a future new version, not a 10.1 update. The release notes suggest what to do if you want to continue using "generic" code.

While you're thinking of futures, also read the release notes for information about the new "compat" OpenMP libraries which will be the default in that same future version.

If performance is important to you, you should be using one of the options to get SSE instructions instead of x87.

Steve
brianlamm's picture

If I use /fp:precise I have no access to modify the floating point environment, which is maybe not a "bad" thing.


I have read the IVF doc regarding rounding: if I use /fp:precise then the rounding mode cannot be modified and is the "default". How am I to determine what that default is? IEEE default rounding is to nearest, and to even in case of a tie. Is that the default for /fp:precise? If not, then I think I must specify /fp:strict /fp:except- so I can use the IEEE intrinsics to set rounding to nearest. The IVF documentation mentions nothing about the rounding mode in case of a tie.


Performance is always a goal, but not at the cost of correctness. For most of my purposes, I need IEEE conformance to get the utmost in accuracy "first", and performance "second".

Steve Lionel (Intel)'s picture

The default for all values of /fp is IEEE default as you describe it, or what F2003 calls IEEE_NEAREST. This is a processor default, not specific to Fortran. If you want to be able to change the rounding mode, then indeed you'll want "strict".

Eventually we will support the F2003 IEEE intrinsic modules so you can get and set this information cleanly.

I assume that you have also performed a computational arithmetic analysis of your code so that you don't run into issues such as cancellation errors, etc.

Steve
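For reference, a minimal sketch of the Fortran 2003 interface Steve refers to (the module and procedure names are from the standard; this IVF release does not support it yet):

program show_rounding
use, intrinsic :: ieee_arithmetic
implicit none
type(ieee_round_type) :: mode
call ieee_get_rounding_mode(mode)
if (mode == ieee_nearest) print *, 'round to nearest (ties to even) is in effect'
call ieee_set_rounding_mode(ieee_down)   ! e.g. switch to round toward -Infinity
end program show_rounding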
brianlamm's picture

Thanks much, Steve, that's good to know, and I look forward to IVF support of the IEEE intrinsics ...


I have done considerable floating point arithmetic analyses on some of my codes where it's particularly important, yet so many factors are involved it's difficult at times.


Do you know what the processor default rounding is in case of a tie? I cannot find that in the IVF doc, and perhaps that is not the appropriate place to find the answer to that question.


-Brian

Tim Prince's picture

In case you are asking about rounding for half-way values:
All CPUs designed since IEEE-754 follow that standard, often called "banker's rounding," or "round to nearest even." Fortran over-rules the hardware rounding mode only for the intrinsic functions NINT(), ANINT(),... and ifort follows the Fortran standard. So, NINT(1.5) produces 2, and NINT(2.5) produces 3, while
(1.5 + 1/EPSILON(1.5)) - 1/EPSILON(1.5)
and
(2.5 + 1/EPSILON(2.5)) - 1/EPSILON(2.5)
would both produce 2.0 (in SSE code, under /assume:protect_parens).
Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.
There have been compilers which supported a fast_nint option which would speed up the NINT() and ANINT() intrinsics by following the IEEE rather than the Fortran standard (prior to f2003 ieee_arithmetic). This would not be as disruptive as the ifort option /Qrcd which changes INT and AINT intrinsics (explicit or implicit) to IEEE banker's rounding, making them practical equivalents of NINT and ANINT.
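A compact sketch of Tim's example (assuming SSE code generation and /assume:protect_parens, so the parenthesized sums are not re-associated):

program halfway
real big
big = 1./epsilon(1.)            ! 2**23 for default REAL
print *, nint(1.5), nint(2.5)   ! 2 3 : Fortran rounds halves away from zero
print *, (1.5 + big) - big      ! 2.0 : IEEE default rounds halves to even
print *, (2.5 + big) - big      ! 2.0
end program halfway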

g.f.thomas's picture

Brian,


The following code snippets, which depend on some IVF extensions, might be useful to you:


***********code*********


subroutine Getfpucw (fpcw)


!Get the current fpu control word

use dflib


implicit none

character(len=48), intent(out) :: fpcw(8) !Array of exceptions, precision, and rounding mode

!Local
integer(2) :: control = 0 !Current cw


call GETCONTROLFPQQ(control)
call fpcontrl(control, fpcw)


return


end subroutine Getfpucw



subroutine fpcontrl(control, fpcw)


!Describe the contents of the floating point control word.
!Interpret each sub field of the word individually.


use dflib

implicit none


integer(2), intent(in) :: control          !Two byte control word from GETCONTROLFPQQ
character(len=48), intent(out) :: fpcw(8)  !Array of exceptions, precision, and rounding mode


!Local
integer(2) tcontrol                        !Two byte temporary control word


!Interpret the exception trap field. Retain only the exception bits
tcontrol = IAND(control, FPCW$MCW_EM)


!Test each exception and print only the ones that are set
if(tcontrol /= 0) then
if(iand(tcontrol, FPCW$INVALID) /= 0) then
fpcw(1) = 'Invalid operation trap disabled'
else
fpcw(1) = 'Invalid operation trap enabled'
endif
if(iand(tcontrol, FPCW$DENORMAL) /= 0) then
fpcw(2) = 'Denormalized operand trap disabled'
else
fpcw(2) = 'Denormalized operand trap enabled'
endif
if(iand(tcontrol, FPCW$ZERODIVIDE) /= 0) then
fpcw(3) = 'Zero divide trap disabled'
else
fpcw(3) = 'Zero divide trap enabled'
endif
if(iand(tcontrol, FPCW$OVERFLOW) /= 0) then
fpcw(4) = 'Overflow trap disabled'
else
fpcw(4) = 'Overflow trap enabled'
endif
if(iand(tcontrol, FPCW$UNDERFLOW) /= 0) then
fpcw(5) = 'Underflow trap disabled'
else
fpcw(5) = 'Underflow trap enabled'
endif
if(iand(tcontrol, FPCW$INEXACT) /= 0) then
fpcw(6) = 'Inexact (precision) trap disabled'
else
fpcw(6) = 'Inexact (precision) trap enabled'
endif
else
fpcw(1) = 'All exception traps enabled'
endif


!Interpret the precision control field. Retain only the precision bits
tcontrol = IAND(control, FPCW$MCW_PC)


!Test precision and print the one that is set
Select Case (tcontrol)
Case (FPCW$24)
fpcw(7) = '24 bit precision enabled'
Case (FPCW$53)
fpcw(7) = '53 bit precision enabled'
Case (FPCW$64)
fpcw(7) = '64 bit precision enabled'
Case Default
fpcw(7) = 'Invalid precision control'
End Select


!Interpret the rounding control field. Retain only the rounding bits
tcontrol = IAND(control, FPCW$MCW_RC)


!Test rounding and print the one that is set
Select Case (tcontrol)
Case (FPCW$NEAR)
fpcw(8) = 'Round to nearest (or even) enabled'
Case (FPCW$DOWN)
fpcw(8) = 'Round down (to -INF) enabled'
Case (FPCW$UP)
fpcw(8) = 'Round up (to +INF) enabled'
Case (FPCW$CHOP)
fpcw(8) = 'Chop (truncate to 0) enabled'
Case Default
fpcw(8) = 'Invalid round control'
End Select


return


end subroutine fpcontrl



subroutine Setfpucw


!Set the fpu control word


use dflib


implicit none


!Local
integer(2) :: control = 0                  !Current cw
integer(4) :: FPE_STS = 0, FPE_MASK = 0    !fpe masks


!An exception is dis/enabled if its flag is set/cleared to 1/0


call GETCONTROLFPQQ(control)
!Clear all cw flags
control = control .AND. #0000
!Set cw to VC++ default: all fpe traps disabled or masked
control = FPCW$NEAR+&
FPCW$53+&
FPCW$INVALID+&
FPCW$ZERODIVIDE+&
FPCW$OVERFLOW+&
FPCW$UNDERFLOW+&
FPCW$DENORMAL+&
FPCW$INEXACT
call SETCONTROLFPQQ(control)
!The foregoing is equivalent to /fpe:3,
!ie, 53-bit precision, round to nearest, disable or mask
!the denormal, underflow, inexact precision, overflow,
!division by zero, and invalid traps.

!Unmask fp overflow, division by zero, and invalid
!FPE_MASK = FPE_M_TRAP_OVF+&
! FPE_M_TRAP_DIV0+&
! FPE_M_TRAP_INV
!To unmask fp underflow, include FPE_M_TRAP_UND or use the Runtime Error Checking 'Underflow' option
!FPE_STS = FOR_SET_FPE(FPE_MASK)
!The foregoing is equivalent to /fpe:0,
!ie, 53-bit precision, round to nearest, disable the denormal,
!underflow, and inexact precision exceptions and enable the
!overflow, division by zero, and invalid traps.


return


end subroutine Setfpucw



subroutine Getfpusw (fpsw)


!Get the current fpu status word


use dflib


implicit none


character(len=48), intent(out) :: fpsw(6) !Array of low exception bits

!Local
integer(2) :: status = 0 !Current sw


call GETSTATUSFPQQ(status)
call fpstatus(status, fpsw)


return


end subroutine Getfpusw



subroutine fpstatus(status, fpsw)


!Describe the contents of the floating point status word.
!Interpret only the low six exception bits.


use dflib


implicit none


integer(2), intent(in) :: status           !Two byte status word from GETSTATUSFPQQ
character(len=48), intent(out) :: fpsw(6) !Character array of exceptions currently set

!Local
integer(2) tstatus ! Two byte temporary status word


!Retain only exception bits
!
tstatus = iand(status,FPSW$MSW_EM)


!Test each exception and print only the one(s) that are set
!
if(tstatus == 0) then
fpsw(1) = 'No exceptions'
endif
if(tstatus /= 0) then
if(iand(tstatus, FPSW$INVALID) /= 0) then
fpsw(1) = 'Invalid operation exception'
endif
if(iand(tstatus, FPSW$DENORMAL) /= 0) then
fpsw(2) = 'Denormalized operand exception'
endif
if(iand(tstatus, FPSW$ZERODIVIDE) /= 0) then
fpsw(3) = 'Zero divide exception'
endif
if(iand(tstatus, FPSW$OVERFLOW) /= 0) then
fpsw(4) = 'Overflow exception'
endif
if(iand(tstatus, FPSW$UNDERFLOW) /= 0) then
fpsw(5) = 'Underflow exception'
endif
if(iand(tstatus, FPSW$INEXACT) /= 0) then
fpsw(6) = 'Inexact (precision) exception'
endif
endif


return


end subroutine fpstatus
*******************end*******************
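A hypothetical driver for Gerry's routines, assuming they are compiled into the same project:

program show_fpu
implicit none
character(len=48) :: cw(8), sw(6)
cw = ' '                 ! pre-blank the arrays (the routines fill only the relevant elements)
sw = ' '
call Getfpucw(cw)        ! control word: trap masks, precision, rounding mode
call Getfpusw(sw)        ! status word: exception flags currently raised
print '(a)', cw
print '(a)', sw
end program show_fpu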


BTW, re -0, the option is described as "Enable IEEE Minus Zero Support" in the Project/Properties/Floating Point page.


Gerry



brianlamm's picture

Tim, thanks much, and yes, by "tie" I meant when the two closest neighbors are equidistant from the target. I'm trying to wrap my head around the "nearest even" you wrote that the IEEE standard specifies: does that mean nearest first, and then even in case of a tie, in order of precedence? Or is some subtle interpretation lurking below that face-value reading of mine?


Gerry, Great Stuff! I will Certainly implement what you have graciously provided until IVF support for IEEE intrinsics becomes available.


-Brian

Steve Lionel (Intel)'s picture

Brian,

Take a look at Dave Eklund's "The Perils of Real Numbers" articles in the Newsletter thread at the top of this forum. It explains the rounding. But it is simple - round to the nearest representable value, and if the value is exactly halfway between two representable values, round to the one with a LSB of zero (round to even).

Steve
brianlamm's picture

Thanks Steve. I know what round to nearest is, and I know what round to even is; I just wondered whether I was missing something in the term round to "nearest even", which seems to imply trying nearest first, then even for ties, just as it reads. I wanted to make sure nothing more subtle was implied. I'll take away that "nearest even" means nearest first, then even in case of a tie.


-Brian

Steve Lionel (Intel)'s picture

Perhaps I'm not understanding you, but in the case of a tie, there are two "nearest". When this happens, you pick the one with a zero LSB. Sometimes this will be up, sometimes down.

Steve
brianlamm's picture

Steve, Tim wrote:

tim18: ...
Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.



"nearest even" was to what I was referring in Tim's post regarding what IEEE default behavior is. That's what started me thinking I might be missing something subtle. I believe Tim meant either IEEE literally reads "nearest even", or it was shorthand for round to nearest and in case of a tie round to even. That's what I'm going to take away from Tim's post.

-Brian
brianlamm's picture
cprince wrote: "/fp: in ifort doesn't have the options on intermediate precision which are supported by icc. So, there isn't full support for anything but source precision."



Unless I'm reading the IVF doc incorrectly, there is (currently, anyway) an option "on" intermediate precision which is not source precision: it is /fltconsistency (or /Op), and the seventh bullet in the IVF doc description reads:

Whenever an expression is spilled, it is spilled as 80 bits (extended precision), not 64 bits (DOUBLE PRECISION). When assignments to type REAL and DOUBLE PRECISION are made, the precision is rounded from 80 bits down to 32 bits (REAL) or 64 bits (DOUBLE PRECISION). When you do not specify /Op, the extra bits of precision are not always rounded away before the variable is reused.

I believe, however, that Intel has deprecated this option, which I assume means it will be deleted in the (near?) future. If so, any idea of when it will be deleted?

Am I interpreting the IVF doc correctly? That is, none of the /fp: options (precise, fast[=1|2], strict, source) will enable 80-bit intermediate precision?

-Brian
Steve Lionel (Intel)'s picture

There is no actual support for 80-bit precision. Sometimes, when you're using x87 code, 80-bit intermediate values get created. You can't control this directly, and if you use /QxW or "higher", x87 code tends not to be used at all.

/fp:source is intended to always use declared precision. You are right that /fltconsistency, /Op, etc. are deprecated and you should not use them, as /fp will produce better and more consistent code. If you're not using /QxW (at least for now), you'll get x87 code and /fp:fast will still allow 80-bit intermediates.

Steve
Tim Prince's picture

At /Od, the 32-bit ifort uses x87 code. /Op brings about the use of some x87 code (not in vectorized code). In order to use all the bits of the x87 format, you would have to over-ride the default setting of 53-bit precision.

The compiler doc seems inconsistent in implying that -pc80 sets 64-bit precision, and further identifying that as a default, even for the Intel64 compiler which has no x87 code option.

In the 64-bit ifort, /Op uses SSE2 for extra precision intermediates, so there is never promotion beyond double precision.

Certain compilers, such as gfortran, do support a real*10 data type. Evidently, there is no vectorization of real*10. It relies on the glibc run-time support for C long double, which is not present on Windows.
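For what it's worth, a sketch of that extended type as it appears under x86 gfortran (not available in ifort):

program extended
integer, parameter :: xp = selected_real_kind(18)   ! kind 10 (x87 80-bit) on x86 gfortran
real(xp) :: x
x = 1.0_xp / 3.0_xp
print *, precision(x), digits(x)                    ! 18 decimal digits, 64-bit significand
end program extended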

Tim Prince's picture
I tested the paranoia program spara.f from http://www.netlib.org/paranoia/ (with my own bug fixes). With 32-bit ifort Windows, it diagnoses 29 bits extra precision (double precision intermediates) with ifort -assume:protect_parens, and 40 bits (x87 64-bit precision intermediates), when -Qpc80 is added to the options. So, it appears the pc80 option does set 64-bit precision mode.
-QxB, or any option which enables vectorization, removes extra precision, but adding -Op restores it (but only for non-vectorized loops).
-QxB or one of the recommended SSE options appears to be needed for accurate formatted read/write with ifort 10.1.021.
Paranoia has to be built with inter-procedural optimization disabled, as well as other precautions, including protect_parens.
