1/-Infinity = NaN when using SSE


program test

integer i
real r(100), t

t = 0
do i = 1, 100
   r(i) = 1/log(t)   ! log(0.) = -Infinity, so this computes 1/(-Infinity)
end do

print *, r(1)

end program test


When I compile with options "/libs:static /threads /c" the result is " 0.0000000E+00" (as expected).

When I add "/QaxK" the result is NaN.

If I modify the loop so as to prevent vectorization (by adding "print *,'.'", for example), the result is back to " 0.0000000E+00".

Compiler: Intel Fortran 10.0 for Windows.

Any ideas why this happens?


It appears that you want the /Qprec-div option. When the inversion is performed by the Newton iteration scheme chosen by default for single-precision SSE, a NaN is produced. /Qprec-div is needed to produce IEEE results.

Yeah, thanks

By the way, shouldn't the result be -0 rather than +0?

That's an interesting question. By my reading of the Fortran textbooks, the only way Fortran is required to distinguish the sign of 0. is with the SIGN() intrinsic, so I try
print *, r(1), sign(1.,r(1)) < 0
and I see what appears to be an incorrect result. Do you want to file a bug report?

Turned out that there is another option :)

/assume:minus0 - enables negative zero

However thanks for your help

Strange that all these options are off by default, and almost impossible to find if you don't know what to look for.

Yes, there is a tendency in commercial compilers to set optimizations which break the standard by default, regardless of how unlikely they are to be useful and how likely they are to cause problems. Some of them are held over from past processors which had a large performance penalty for following the standard (I don't know if that is the case with minus0). It looks like the minus0 option may give better performance for random mixtures of positive and negative operands, as it generates branchless code.
In this case, my own opinion is that minus0 (thanks for pointing it out) should be included in /fp:precise. I guess I'll adopt -O3 -assume:protect_parens,minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models.

/assume:minus0 exists because making it the default would 1) break some existing programs, 2) be slower in many cases. Most applications don't care. The Fortran standard has changed over time with how it treats -0: in Fortran 2003, the definition of CSQRT changed in an incompatible way compared to Fortran 95 in this regard.

Retired 12/31/2016

ifort requires /assume:minus0 to make SIGN() compatible with Fortran 95:
The IVF doc quotes the F95 standard, but the "see also" link leads to documentation that /assume:minus0 is required to comply with F95. Unfortunately, minus0 currently prevents vectorization of the SIGN intrinsic; that is expected to be fixed in a future release.
The F95 treatment is the only one available in gfortran.

Tim wrote: "I guess I'll adopt -O3 -assume:protect_parens,minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models."

It is vitally important to me to get IEEE 754 arithmetic, except for -0 support. After reading some of the documentation (which sorely needs an overhaul), I believe using /fp:precise automatically gives the effect of /assume:protect_parens. The IVF doc reads:

When this option [/fp:precise] is specified, the compiler applies the following:

  • Additions are performed in program order, taking into account any parentheses

  • Intermediate expressions use the precision specified in the source code

  • The constant addition may be pre-computed, assuming the default rounding mode

What I would contemplate using though is

/fp:strict /fp:except- /Qprec-sqrt /Qprec-div /Qprec /fpe:3

(fpe:3 is default, I just wanted to show it). A lot of other settings are not shown as I want to focus on getting IEEE arithmetic.

Now, here's where I'm going to show some (quite a bit?) of ignorance (but I appeal to the fact that the documentation does not nearly keep up with the compiler capabilities): I contemplate using /fp:strict + /fp:except- so exceptions are disabled (so I won't get a run-time abend) and I can investigate IEEE flags and act accordingly. From what I can gather from the documentation, that combination also "allows modification of the floating point environment", whereas /fp:precise does not. And here's my ignorance: HOW does one modify "the floating point environment"?

Now, on to the REAL can-o-worms question: Exactly what minimal set of options to the compiler will Guarantee IEEE 754 compliance, except for -0 support?

Please, all you IVF-use experts, please chime in on your assessment!


I'll answer just part of this:
/fp:precise includes /assume:protect_parens /Qprec-div /Qprec-sqrt /Qftz- but not /assume:minus0. Some of what /fp:precise does is not required by IEEE or Fortran standards.

Now, that's just what I need (as a beginning anyway)!

I had NO IDEA /fp:precise included all those other options. I cannot find that in the IVF doc! But I am glad to see /Qftz- is enabled to avoid flush to zero. I should have also added I wish to use /O3 optimization while fully complying with IEEE except for -0 support.

I think I need /Qprec to avoid "performing optimizations that change NaN comparison semantics" and to cause all values to be truncated to declared precision before they are used in calculations.

Tim, I know this question is a hotbed, especially for people "on the inside", since it clearly calls into question whether IEEE 754 conformance can be had while maintaining the highest possible optimizations, so thank you very much for your reply.

Keep the assessments coming, please!

And I would still like to know how and why I would want to change the "floating point environment". The IVF doc gives precious little information on its precise nature. One place in the IVF doc reads: "There are several methods available if you want to modify the default floating point environment. For example, you can use inline assembly, compiler built-in functions, and library functions." But I need all the examples IVF is capable of giving for changing that environment; it's got to be spread all over the IVF doc, and I'd bet some is undocumented, alas. Anyway, I just want to make sure I have that capability, and I believe using /fp:strict + /fp:except- allows the ability to change the floating point environment while still allowing exceptions to be raised without causing an application abend. Correct? Incorrect?


Thanks Tim. You seem to be the only insider at Intel with the pineapples necessary to, at least partially, answer my question about which (minimal) set of ifort compiler options are necessary to get full IEEE compliance, except -0 support.

So, it's apparent no one inside Intel is willing to put it out there whether or not it is even possible to get IEEE 754 conformance from IVF, leading to two possible obvious conclusions:

No one inside or outside of Intel even knows whether or not it is possible and are not willing to communicate such, or it is not possible. If no one inside Intel even knows, that's certainly a sad state of affairs (much like the IVF documentation).



I don't think such comments are appropriate.

/fp:strict is the one option that would ensure, to the best of the compiler's ability, all IEEE FP semantics. /assume:minus0 is a Fortran language option not related to IEEE FP behavior. It controls how the SIGN intrinsic and formatted I/O behave in the presence of a minus zero - it does NOT control whether or not such a value can be produced.



I don't think such comments are appropriate.

I tried deleting it but was not allowed since replies were made. Apologies to all of you.

And thanks much for the replies subsequent to my inappropriate post.


If anyone knows about IEEE fp compliance then surely it's Intel. It was Intel who commissioned Kahan to come up with the standard in the first instance. I believe DEC had an alternative but it wasn't widely adopted.

As Steve and tim18 pointed out, /fp:precise ensures compliance, although rounding mode has an impact and is useful in interval arithmetic. Isn't -0 in the same league as 0**0 and the like, and thus beyond the scope of the IEEE fp standard?


My fault, I did not make it clear: even though I was not intent on -0 support, I did want IEEE compliance. Reading my previous posts, I see how it could be interpreted that I thought -0 support was in IEEE 754. -0 support is in F2003.

As far as I can tell, the difference between [/fp:precise] and [/fp:strict /fp:except-] is that in the latter, contractions are disabled and access to modifying the floating point environment is enabled. What I cannot gather from the doc is whether the latter enables full-precision rounding of intermediate results, or whether both round intermediates to the precision based on the source.

Anyone know the answer to that?


Just to make it clear - the change to /QxW as the default will happen in a future new version, not a 10.1 update. The release notes suggest what to do if you want to continue using "generic" code.

While you're thinking of futures, also read the release notes for information about the new "compat" OpenMP libraries which will be the default in that same future version.

If performance is important to you, you should be using one of the options to get SSE instructions instead of x87.


If I use /fp:precise I have no access to modify floating point environment, maybe not a "bad" thing.

I have read the IVF doc regarding rounding: if I use /fp:precise then the rounding mode cannot be modified and is "default". How am I to determine what that default is? IEEE default rounding is to nearest, and to even in case of a tie. Is that the default for /fp:precise? If not, then I think I must specify /fp:strict /fp:except- so I can use IEEE intrinsics to set rounding to nearest. The IVF documentation mentions nothing about the rounding mode in case of a tie.

Performance is always a goal, but not at the cost of correctness. For most of my purposes, I need IEEE conformance to get the utmost in accuracy "first", and performance "second".

The default for all values of /fp is IEEE default as you describe it, or what F2003 calls IEEE_NEAREST. This is a processor default, not specific to Fortran. If you want to be able to change the rounding mode, then indeed you'll want "strict".

Eventually we will support the F2003 IEEE intrinsic modules so you can get and set this information cleanly.

I assume that you have also performed a computational arithmetic analysis of your code so that you don't run into issues such as cancellation errors, etc.


Thanks much Steve, that's good to know, and I look forward to IVF support of IEEE intrinsic support ...

I have done considerable floating point arithmetic analyses on some of my codes where it's particularly important, yet so many factors are involved it's difficult at times.

Do you know what the processor default rounding is in case of a tie? I cannot find that in IVF doc, and perhaps that is not the appropriate place to find the answer to that question.


In case you are asking about rounding for half-way values:
All CPUs designed since IEEE-754 follow that standard, often called "banker's rounding" or "round to nearest even." Fortran overrules the hardware rounding mode only for the intrinsic functions NINT(), ANINT(), ..., and ifort follows the Fortran standard. So, NINT(1.5) produces 2, and NINT(2.5) produces 3, while
(1.5 + 1/EPSILON(1.5)) - 1/EPSILON(1.5)
(2.5 + 1/EPSILON(2.5)) - 1/EPSILON(2.5)
would both produce 2.0 (in SSE code, under /assume:protect_parens).
Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.
There have been compilers which supported a fast_nint option which would speed up the NINT() and ANINT() intrinsics by following the IEEE rather than the Fortran standard (prior to f2003 ieee_arithmetic). This would not be as disruptive as the ifort option /Qrcd which changes INT and AINT intrinsics (explicit or implicit) to IEEE banker's rounding, making them practical equivalents of NINT and ANINT.


The following code snippets, which depend on some IVF extensions, might be useful to you:


subroutine Getfpucw (fpcw)

!Get the current fpu control word

use dflib

implicit none

character(len=48), intent(out) :: fpcw(8) !Array of exceptions, precision, and rounding mode

integer(2) :: control = 0 !Current cw

call GETCONTROLFPQQ(control)
call fpcontrl(control, fpcw)


end subroutine Getfpucw

subroutine fpcontrl(control, fpcw)

!Describe the contents of the floating point control word.
!Interpret each sub field of the word individually.

use dflib

implicit none

integer(2), intent(in) :: control !Two byte control word from GETCONTROLFPQQ
character(len=48), intent(out) :: fpcw(8) !Array of exceptions, precision, and rounding mode

integer(2) tcontrol !Two byte temporary control word

!Interpret the exception trap field. Retain only the exception bits
tcontrol = IAND(control, FPCW$MCW_EM)

!Test each exception and print only the ones that are set
if(tcontrol /= 0) then
   if(iand(tcontrol, FPCW$INVALID) /= 0) then
      fpcw(1) = 'Invalid operation trap disabled'
   else
      fpcw(1) = 'Invalid operation trap enabled'
   end if
   if(iand(tcontrol, FPCW$DENORMAL) /= 0) then
      fpcw(2) = 'Denormalized operand trap disabled'
   else
      fpcw(2) = 'Denormalized operand trap enabled'
   end if
   if(iand(tcontrol, FPCW$ZERODIVIDE) /= 0) then
      fpcw(3) = 'Zero divide trap disabled'
   else
      fpcw(3) = 'Zero divide trap enabled'
   end if
   if(iand(tcontrol, FPCW$OVERFLOW) /= 0) then
      fpcw(4) = 'Overflow trap disabled'
   else
      fpcw(4) = 'Overflow trap enabled'
   end if
   if(iand(tcontrol, FPCW$UNDERFLOW) /= 0) then
      fpcw(5) = 'Underflow trap disabled'
   else
      fpcw(5) = 'Underflow trap enabled'
   end if
   if(iand(tcontrol, FPCW$INEXACT) /= 0) then
      fpcw(6) = 'Inexact (precision) trap disabled'
   else
      fpcw(6) = 'Inexact (precision) trap enabled'
   end if
else
   !No mask bits set, so no traps are masked
   fpcw(1) = 'All exception traps enabled'
end if

!Interpret the precision control field. Retain only the precision bits
tcontrol = IAND(control, FPCW$MCW_PC)

!Test precision and print the one that is set
Select Case (tcontrol)
Case (FPCW$24)
fpcw(7) = '24 bit precision enabled'
Case (FPCW$53)
fpcw(7) = '53 bit precision enabled'
Case (FPCW$64)
fpcw(7) = '64 bit precision enabled'
Case Default
fpcw(7) = 'Invalid precision control'
End Select

!Interpret the rounding control field. Retain only the rounding bits
tcontrol = IAND(control, FPCW$MCW_RC)

!Test rounding and print the one that is set
Select Case (tcontrol)
Case (FPCW$NEAR)
fpcw(8) = 'Round to nearest (or even) enabled'
Case (FPCW$DOWN)
fpcw(8) = 'Round down (to -INF) enabled'
Case (FPCW$UP)
fpcw(8) = 'Round up (to +INF) enabled'
Case (FPCW$CHOP)
fpcw(8) = 'Chop (truncate to 0) enabled'
Case Default
fpcw(8) = 'Invalid round control'
End Select


end subroutine fpcontrl

subroutine Setfpucw

!Set the fpu control word

use dflib
use ifcore !for FOR_SET_FPE and the FPE_M_* trap masks

implicit none

integer(2) :: control = 0 !Current cw
integer(4) :: FPE_STS = 0, FPE_MASK = 0 !fpe masks

!An exception is disabled/enabled if its mask flag is set/cleared to 1/0

call GETCONTROLFPQQ(control)
!Clear all cw flags
control = 0
!Set cw to the VC++ default: all fpe traps disabled or masked
control = FPCW$NEAR + FPCW$53 + FPCW$MCW_EM
call SETCONTROLFPQQ(control)
!The foregoing is equivalent to /fpe:3,
!ie, 53-bit precision, round to nearest, disable or mask
!the denormal, underflow, inexact precision, overflow,
!division by zero, and invalid traps.

!Unmask fp overflow, division by zero, and invalid
!To unmask fp underflow as well, include FPE_M_TRAP_UND below or use the
!Runtime Error Checking 'Underflow' option
FPE_MASK = FPE_M_TRAP_OVF + FPE_M_TRAP_DIV + FPE_M_TRAP_INV
FPE_STS = FOR_SET_FPE(FPE_MASK)
!The foregoing is equivalent to /fpe:0,
!ie, 53-bit precision, round to nearest, disable the denormal,
!underflow, and inexact precision exceptions and enable the
!overflow, division by zero, and invalid traps.

end subroutine Setfpucw

subroutine Getfpusw (fpsw)

!Get the current fpu status word

use dflib

implicit none

character(len=48), intent(out) :: fpsw(6) !Array of low exception bits

integer(2) :: status = 0 !Current sw

call GETSTATUSFPQQ(status)
call fpstatus(status, fpsw)


end subroutine Getfpusw

subroutine fpstatus(status, fpsw)

!Describe the contents of the floating point status word.
!Interpret only the low six exception bits.

use dflib

implicit none

integer(2), intent(in) :: status !Two byte status word from GETSTATUSFPQQ
character(len=48), intent(out) :: fpsw(6) !Character array of exceptions currently set

integer(2) tstatus ! Two byte temporary status word

!Retain only exception bits
tstatus = iand(status,FPSW$MSW_EM)

!Test each exception and print only the one(s) that are set
if(tstatus == 0) then
   fpsw(1) = 'No exceptions'
else
   if(iand(tstatus, FPSW$INVALID) /= 0) fpsw(1) = 'Invalid operation exception'
   if(iand(tstatus, FPSW$DENORMAL) /= 0) fpsw(2) = 'Denormalized operand exception'
   if(iand(tstatus, FPSW$ZERODIVIDE) /= 0) fpsw(3) = 'Zero divide exception'
   if(iand(tstatus, FPSW$OVERFLOW) /= 0) fpsw(4) = 'Overflow exception'
   if(iand(tstatus, FPSW$UNDERFLOW) /= 0) fpsw(5) = 'Underflow exception'
   if(iand(tstatus, FPSW$INEXACT) /= 0) fpsw(6) = 'Inexact (precision) exception'
end if


end subroutine fpstatus

BTW, re -0, the option is described as "Enable IEEE Minus Zero Support" in the Project/Properties/Floating Point page.


Tim, Thanks Much, and yes I was writing "tie" meaning when two closest neighbors are equidistant from target. I'm trying to wrap my little head around "nearest even" you wrote the IEEE specifies: does that mean nearest and then in case of tie even, in order of precedence? Or is some subtle interpretation lurking below that face-value interpretation of mine?

Gerry, Great Stuff! I will Certainly implement what you have graciously provided until IVF support for IEEE intrinsics becomes available.



Take a look at Dave Eklund's "The Perils of Real Numbers" articles in the Newsletter thread at the top of this forum. It explains the rounding. But it is simple - round to the nearest representable value, and if the value is exactly halfway between two representable values, round to the one with a LSB of zero (round to even).


Thanks Steve. I know what round to nearest is, and I know what round to even is; I just wondered if there was something I was missing with the term round to "nearest even", which seems to imply: try nearest first, then even for ties, just like it reads. I wanted to make sure I wasn't missing something more subtly implied by "nearest even". I'll just take away that "nearest even" means nearest first, then even for a tie.


Perhaps I'm not understanding you, but in the case of a tie, there are two "nearest". When this happens, you pick the one with a zero LSB. Sometimes this will be up, sometimes down.


Steve, Tim wrote:

Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.

"nearest even" was what I was referring to in Tim's post regarding the IEEE default behavior. That's what started me thinking I might be missing something subtle. I believe Tim meant either that IEEE literally reads "nearest even", or that it was shorthand for round to nearest and, in case of a tie, round to even. That's what I'm going to take away from Tim's post.


cprince: /fp: in ifort doesn't have the options on intermediate precision which are supported by icc. So, there isn't full support for anything but source precision.

Unless I'm reading the IVF doc incorrectly, there is, currently anyway, an option for intermediate precision which is not source precision: it is /fltconsistency (or /Op), and the seventh bullet in the IVF doc description reads:

Whenever an expression is spilled, it is spilled as 80 bits (extended precision), not 64 bits (DOUBLE PRECISION). When assignments to type REAL and DOUBLE PRECISION are made, the precision is rounded from 80 bits down to 32 bits (REAL) or 64 bits (DOUBLE PRECISION). When you do not specify /Op, the extra bits of precision are not always rounded away before the variable is reused.

I believe, however, Intel has deprecated this option, which I assume means it will be deleted in the (near?) future. If so, any idea when it will be deleted?

Am I interpreting IVF doc correctly? That is, none of the /fp:(precise, fast (-1,-2), strict, source) options will enable 80-bit intermediate precision?


There is no actual support for 80-bit precision. Sometimes, when you're using x87 code, 80-bit intermediate values get created. You can't control this directly, and if you use /QxW or "higher", x87 code tends not to be used at all.

/fp:source is intended to always use declared precision. You are right that /fltconsistency, /Op, etc. are deprecated and you should not use them, as /fp will produce better and more consistent code. If you're not using /QxW (at least for now), you'll get x87 code and /fp:fast will still allow 80-bit intermediates.


At /Od, the 32-bit ifort uses x87 code. /Op brings about the use of some x87 code (not in vectorized code). In order to use all the bits of the x87 format, you would have to override the default setting of 53-bit precision.

The compiler doc seems inconsistent: it implies that -pc80 sets 64-bit precision, and further identifies that as a default, even for the Intel64 compiler, which has no x87 code option.

In the 64-bit ifort, /Op uses SSE2 for extra precision intermediates, so there is never promotion beyond double precision.

Certain compilers, such as gfortran, do support a real*10 data type. Evidently, there is no vectorization of real*10. It relies on the glibc run-time support for C long double, which is not present on Windows.

I tested the paranoia program spara.f from http://www.netlib.org/paranoia/ (with my own bug fixes). With 32-bit ifort Windows, it diagnoses 29 bits extra precision (double precision intermediates) with ifort -assume:protect_parens, and 40 bits (x87 64-bit precision intermediates), when -Qpc80 is added to the options. So, it appears the pc80 option does set 64-bit precision mode.
-QxB, or any option which enables vectorization, removes extra precision, but adding -Op restores it (but only for non-vectorized loops).
-QxB or one of the recommended SSE options appears to be needed for accurate formatted read/write with ifort 10.1.021.
Paranoia has to be built with inter-procedural optimization disabled, as well as other precautions, including protect_parens.
