SSE Instruction Generates Floating Point Exception


Hi: I've been trying to improve the robustness of our code. To that end, I enabled (or, more precisely, unmasked) floating point exceptions. However, when optimization is turned on, the compiled code seems to raise spurious exceptions. Here is a simplified example:

program example
  real :: a(3), b(3), c(3)
  integer :: d(3)
  a = (/ 1., 2., 3. /)
  b = (/ 0., 0., 0. /)
  c = (/ 1., 1., 1. /)
  call subr ( a, b, c, d )
end program example
subroutine subr ( a, b, c, d )
  real, intent(in) :: a(3), b(3), c(3)
  integer, intent(out) :: d(3)
  d = int( ( a - b ) / c )
end subroutine subr

If I compile this with "ifort -fpe0 example.f90" I get:

$ ./a.out
forrtl: error (65): floating invalid
... traceback ...

The crux seems to be that the compiler generates "divps" for the division in subr() but loads only two values into the low-order floats of the XMM registers; the high-order floats are zero. The unused lanes therefore divide 0 by 0, which produces NaN and raises the invalid-operation exception. The high-order results are never stored, so when exceptions are masked the calculation still produces the correct answer.
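The 0/0 behaviour described above can be reproduced in isolation with the standard IEEE modules. This is only a minimal sketch, not the code the compiler generates: it assumes a compiler that supports the Fortran 2003 ieee_arithmetic module, the program and variable names are made up for illustration, and halting is switched off so the flag can be inspected instead of aborting the run.

```fortran
program zero_over_zero
  use, intrinsic :: ieee_arithmetic
  implicit none
  ! volatile keeps the compiler from folding the division at compile time
  real, volatile :: num, den
  real :: q
  logical :: invalid_raised

  ! Mask the invalid exception (no halt) and start from a clear flag
  call ieee_set_halting_mode ( ieee_invalid, .false. )
  call ieee_set_flag ( ieee_invalid, .false. )

  ! Same operands as the unused XMM lanes: zero divided by zero
  num = 0.
  den = 0.
  q = num / den

  ! Report whether the result is NaN and whether the flag was raised
  call ieee_get_flag ( ieee_invalid, invalid_raised )
  print *, 'quotient is NaN:     ', ieee_is_nan ( q )
  print *, 'invalid flag raised: ', invalid_raised
end program zero_over_zero
```

With halting enabled for ieee_invalid instead (which is what unmasking the exception amounts to), the same division would be expected to stop the program rather than quietly return NaN.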

This happens with both version 13.1.3 20130607 on Linux and version 14.0.3.202 Build 20140422 on Windows.

Thanks,
Allen

This should be treated as a compiler bug. If the generated code only partially fills a vector, then the operations that follow should not generate a fault (a QNaN result is OK). FWIW, the compiler should have pulled all 3 floats into one register, though it may have determined that your architecture was better suited to scalar operations.

Jim Dempsey

www.quickthreadprogramming.com

Escalated as issue DPD200357743. I note that the compiler figures out that it can compute the third element at compile-time and just moves the value, but it does the subtract and divide for the other two. I will let you know of any progress.

Steve

In the real code, of course, the compiler doesn't know what arguments it will receive, so it generates a complete sequence of operations. It still processes two elements at once, though, and then computes the third element separately. I can supply that exact code, too, if you need it. But I'm with Jim: it seems like it should be doing all three at once (and maybe loading a 1. into the top word of the XMM denominator). If only the universe were four dimensional :-)
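The idea of putting a 1. in the spare denominator slot can also be tried at the source level while waiting for a compiler fix. The following is a hedged sketch only: subr_padded, num, den and q are hypothetical names, and nothing guarantees that the compiler will turn the padded expression into a single four-wide divps. It just makes sure that every element taking part in the division has a nonzero denominator.

```fortran
program example_padded
  implicit none
  real :: a(3), b(3), c(3)
  integer :: d(3)
  a = (/ 1., 2., 3. /)
  b = (/ 0., 0., 0. /)
  c = (/ 1., 1., 1. /)
  call subr_padded ( a, b, c, d )
  print *, d   ! for these inputs d holds 1, 2, 3
end program example_padded

subroutine subr_padded ( a, b, c, d )
  implicit none
  real, intent(in) :: a(3), b(3), c(3)
  integer, intent(out) :: d(3)
  real :: num(4), den(4)   ! padded to four elements, one full XMM register
  integer :: q(4)

  num(1:3) = a - b
  num(4) = 0.              ! padding numerator
  den(1:3) = c
  den(4) = 1.              ! padding denominator, so the spare element is 0/1
  q = int( num / den )
  d = q(1:3)               ! discard the padding element
end subroutine subr_padded
```

The cost is one extra element of work and some copying, which may matter in a hot loop, so this is a stopgap rather than a replacement for the fix in the compiler.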

As ever, thanks for the help.
Allen

This problem has been fixed - I expect the fix to appear in Update 1 to the version 15.0 compiler. This update is scheduled for sometime in October.

Steve

That's great! Thanks!

Allen
