/fp:strict vs /fp:precise or source

I have a set of code that calculates the volume of a polyhedron from its vertices.
With /fp:precise or /fp:source in the V11 compiler I get incorrect values. I believe these were correct in earlier versions of the compiler.
I still get correct results with /fp:strict.

I believe when I went to the 11 compiler I just changed "compilers" -- no change in settings.

Should I report this one?

Linda


I'd report it if it's not due to a documented change, such as the change from default x87 code generation to default SSE2, or the changed meanings of /QxB and /QxK.

Quoting - tim18
I'd report it if it's not due to a documented change, such as the change from default x87 code generation to default SSE2, or the changed meanings of /QxB and /QxK.

I did not intentionally have any of the /Q flags on, though it looks like "speculation" was set to "fast". How would I figure out the cause?

Default settings appeared to be okay.

Linda


My first guess, as was Tim's, would be the change in default for code generation to assume Pentium 4 and SSE2 instructions instead of x87. This will often change floating point results as x87 could give you "extra" precision.

Try setting, under Code Generation > Enhanced Instruction Set, the option "No Enhanced Instruction Set (/arch:IA32)" and see if that changes things.

Steve - Intel Developer Support
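The effect Steve describes -- results changing because intermediates are carried in more precision than the variables are declared with -- can be sketched in Python. This is a hypothetical illustration only: since Python floats are IEEE doubles, single vs. double precision stands in here for double vs. x87 extended.

```python
import struct

def to_single(x):
    # Round a Python float (IEEE double) to IEEE single precision and back.
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = 1.0, 2.0 ** -30, -1.0

# Wide intermediates: the tiny term b survives both additions.
wide = (a + b) + c            # 2**-30, about 9.3e-10

# Narrow intermediates: b is below half an ulp of 1.0 in single
# precision, so it rounds away and the sum collapses to zero.
narrow = to_single(to_single(a + b) + c)   # 0.0

print(wide, narrow)
```

The same arithmetic, evaluated at two different intermediate precisions, gives two different answers; a program whose correct result depends on the wider intermediates will change behavior when the code generator stops using the wider registers.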

Quoting - Steve Lionel (Intel)
My first guess, as was Tim's, would be the change in default for code generation to assume Pentium 4 and SSE2 instructions instead of x87. This will often change floating point results as x87 could give you "extra" precision.

Try setting, under Code Generation > Enhanced Instruction Set, the option "No Enhanced Instruction Set (/arch:IA32)" and see if that changes things.

That setting and /fp:precise worked okay, i.e., gave correct answers.

Previously the "Code Generation" option was "not set".

Linda


Ok - that means that your program depends on computations being carried out in greater than declared precision - not a good thing for long-term reliability, especially as this extra precision is not predictable.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
Ok - that means that your program depends on computations being carried out in greater than declared precision - not a good thing for long-term reliability, especially as this extra precision is not predictable.

How can that be? Do you want to see the source (and test file)? It's only 4 files, and small files at that.

Linda


Yes, I'd like to see the program. You can create a ZIP and attach it to a reply here (see below for attach instructions.)

How can it be? If your program is very sensitive to last-bit differences in FP results and a single-precision computation was computed in double-precision using X87 code, it could get different results than one using SSE instructions and computing in declared precision.

For more than a decade I've seen complaints from customers about inconsistent FP expression results and/or differences from other vendors that did not have the peculiarities of the X87 FP model. The change in the default to use SSE means not only faster programs but more consistent results.

Steve - Intel Developer Support
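A last-bit difference of the kind mentioned here is easy to exhibit. This hypothetical Python sketch prints the 64-bit patterns of two values, much as a Z16 hexadecimal edit descriptor would in Fortran:

```python
import struct

def bits(x):
    # The 64-bit IEEE-754 pattern of a double, as 16 hex digits.
    return struct.pack('>d', x).hex().upper()

a = 0.1 + 0.2
b = 0.3
print(bits(a))   # 3FD3333333333334
print(bits(b))   # 3FD3333333333333
print(a == b)    # False: the values differ only in the last bit
```

Comparing bit patterns this way makes a last-bit discrepancy unmistakable, where a formatted decimal print might round both values to the same digits.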

And so /fp:strict somehow goes back to the x87 way? I'm certainly confused.

My "real" source is now declared all double precision -- real(8) -- but the same thing occurs.

Linda


Thanks, Linda, I'll take a look.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
Thanks, Linda, I'll take a look.

Did you ever get anywhere with this? The only way I can make it work is by turning off the extensions (using /arch:IA32 or whatever that switch is).

(I did find a couple of precision problems in the module, but fixing them didn't help.)

Linda


I reproduced the problem easily enough but had to deal with other issues and could not dive into it deeper. Perhaps I'll get a chance to do so next week.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
I reproduced the problem easily enough but had to deal with other issues and could not dive into it deeper. Perhaps I'll get a chance to do so next week.

I'll attach the latest version of my module -- where I weeded out (I think) any single-precision problems.

Linda


Thanks. One thing I did discover is that any attempt to "instrument" the code, to display intermediate values, makes the problem go away. I'm fairly certain that I'll find the program depends on the X87 register format with extended precision, but perhaps I'll be surprised.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
Thanks. One thing I did discover is that any attempt to "instrument" the code, to display intermediate values, makes the problem go away. I'm fairly certain that I'll find the program depends on the X87 register format with extended precision, but perhaps I'll be surprised.

So far, I've run the little test with /fp:strict and /fp-speculation=fast, and that's okay, and with /fp:strict and /fp-speculation=off, and that's also okay. So I'm certainly not specifying extended precision, unless somehow /fp:strict makes it show up.

I'm not sure I attached the correct module this morning, so I'm also including the "static verify" output, which I might have generated before I added more double precision to the module. The static verify warnings are a little disconcerting.

Linda


Putting a print statement in the middle of a do loop, it seems to run okay (correctly) with /fp:source.

Are you sure this isn't a compiler bug?

I found a couple of instances where single precision was being used; I changed them and attached the result.

Linda


No, I'm not sure - yet.

Steve - Intel Developer Support

Linda, would you please attach the source for your new module DATAPRECISIONGLOBALS? If you've made changes in other sources, such as TestVolume.f90, attach that too. I tried coming up with my own DATAPRECISIONGLOBALS but it caused other errors.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
Linda, would you please attach the source for your new module DATAPRECISIONGLOBALS? If you've made changes in other sources, such as TestVolume.f90, attach that too. I tried coming up with my own DATAPRECISIONGLOBALS but it caused other errors.

Here's the whole thing, including the solution and the test input/output files (you can now call it from the command line).

Linda

Attachments: voltest.zip (20.42 KB)

Linda,

It appears that you were encountering a compiler bug, one that is fixed in the upcoming 11.1 release (May/June). With that compiler, I get the correct results without specifying /arch:ia32, /fp or /fp-speculation.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
Linda,

It appears that you were encountering a compiler bug, one that is fixed in the upcoming 11.1 release (May/June). With that compiler, I get the correct results without specifying /arch:ia32, /fp or /fp-speculation.

Glad it will be fixed and glad it wasn't just my imagination...

Linda


It was certainly never your imagination! I looked to see if it is fixed in the next 11.0 update, but it is not - sorry. You do have a workaround until 11.1 is available.

Steve - Intel Developer Support

I noticed something as I was trying to track down some differences in the results of two models that I expected to be exactly the same, and I was wondering if it was related to the problem reported in this thread.

To find out where the difference in my results was coming from, I started printing values of intermediate results using hexadecimal edit descriptors. The first time I see any difference at all between the two models is after the following code segment.

      AB(k) = 0.
      ZZ(k) = 0.
      do 546 i=1,i1
        AB(k) = AB(k) + AA(i,n)
        ZZ(k) = ZZ(k) + Z(i,n)
  546 enddo
      write(6,'(A,/4(8Z20/))') 'AA: ',(AA(i,n), i=1,i1)
      write(6,'(A,/4(8Z20/))') 'AB: ', AB(k)

The elements of the array AA that I print are all exactly the same, but AB(k) differs in the last bit or two.

Model 1:

AA:
3F87010CC22C0E5B 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9

AB:
3F87010CC22C0EF0

Model 2:

AA:
3F87010CC22C0E5B 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9
3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9 3C739983AEEF67E9

AB:
3F87010CC22C0EEE

When I use the compiler option /arch:IA32 for the file containing the above code fragment, the results of the two models are exactly the same, but different from above.

AB:
3F87010CC22C0EF1

I know the differences above are almost insignificant, but I don't want to spend time chasing my tail if there is some compiler bug. Somehow the same executable is giving me different results when adding the same numbers. It seems to me that the only way that happens is if the terms are added together in a different order. Putting a print statement inside the loop must do something to the optimization, because then the results are identical.

Any idea when 11.1 will be released?
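The order-of-addition sensitivity suspected above is real: floating-point addition is not associative, so regrouping the same terms can change the rounded sum. A hypothetical Python sketch:

```python
vals = [1e16, 1.0, -1e16]

# Left-to-right: 1.0 is only half an ulp of 1e16, so it rounds
# away (round-to-nearest-even) and the sum collapses to zero.
seq = (vals[0] + vals[1]) + vals[2]        # 0.0

# Regrouped: the two large terms cancel first, so 1.0 survives.
regrouped = (vals[0] + vals[2]) + vals[1]  # 1.0

print(seq, regrouped)
```

The values here are exaggerated for clarity; in the models above the same mechanism produces a difference only in the last bit or two.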

I see no evidence of a compiler bug. I do see evidence of a program that depends on extra precision above and beyond what is declared for the datatypes.

11.1 will be out next week.

Steve - Intel Developer Support

Quoting - a.leonard

      AB(k) = 0.
      ZZ(k) = 0.
      do 546 i=1,i1
        AB(k) = AB(k) + AA(i,n)
        ZZ(k) = ZZ(k) + Z(i,n)
  546 enddo

Any of the options /arch:IA32 /fp:strict /fp:source /fp:precise /O1 should suppress sum reduction vectorization in that loop, and should give the same results, if all the operands are double precision. The default option to optimize with batched sums usually would give slightly more accurate results, as well as much greater performance. Unfortunately, the order of additions with this optimization will depend somewhat on alignments, which may vary on 32-bit Windows, depending on how you allocated the arrays.
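The batched-sum optimization described above can be sketched as follows: a vectorized reduction keeps several partial sums (one per SIMD lane) and combines them at the end, which both reorders the additions and often reduces rounding error. A hypothetical Python model -- the lane count of 4 and the sample data are assumptions for illustration:

```python
def sequential_sum(xs):
    # Plain left-to-right accumulation, as a scalar loop would do.
    s = 0.0
    for x in xs:
        s += x
    return s

def strided_sum(xs, lanes=4):
    # Model a vectorized reduction: one partial sum per lane,
    # combined at the end.
    partials = [0.0] * lanes
    for i, x in enumerate(xs):
        partials[i % lanes] += x
    total = 0.0
    for p in partials:
        total += p
    return total

xs = [1e16, 1.0, 1.0, 1.0, -1e16, 0.0, 0.0, 0.0]
print(sequential_sum(xs))  # 0.0  (each 1.0 is absorbed by 1e16)
print(strided_sum(xs))     # 3.0  (the exact answer)
```

Which grouping the compiler actually emits depends on vector width and data alignment, which is why two runs of "the same" sum over differently aligned arrays can differ in the last bit.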

Quoting - tim18

Any of the options /arch:IA32 /fp:strict /fp:source /fp:precise /O1 should suppress sum reduction vectorization in that loop, and should give the same results, if all the operands are double precision. The default option to optimize with batched sums usually would give slightly more accurate results, as well as much greater performance. Unfortunately, the order of additions with this optimization will depend somewhat on alignments, which may vary on 32-bit Windows, depending on how you allocated the arrays.

That's what I needed to know. One of my models allocates some extra memory, so the arrays I'm looking at must end up being aligned differently.
