vectorization of sin/cos results in wrong values

If sin is autovectorized there seems to be a sign-bit issue.
For example, the second quadrant (from PI/2 to PI) just flips the sign.
Here is the code:
__declspec(align(16)) float input[12] = { 0.f, 0.5f, 1.f, 1.5f, 2.f, 2.5f, 3.0f, 3.5f, 4.f, 4.5f, 5.f, 5.5f };
__declspec(align(16)) float output[12];

for( int i = 0; i < 12; i++ )
{
    output[i] = sin( input[i] );
}

for( int i = 0; i < 12; i++ )
{
    char txt[32];
    sprintf( txt, "%i %f %f\n", i, input[i], output[i] );
    OutputDebugString( txt );
}

This outputs:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297
5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.958924
11 5.500000 0.705540

cos seems OK on 32-bit but also has a sign problem on 64-bit.
Compiler XE 12.1.2.278 on Windows with VS2008.
Same error when using double.
If I turn off autovectorization everything is fine.
Since the performance difference between the vectorized and non-vectorized versions is huge, this is a big issue.

Evitan


Hi Evitan,

There's something I'm missing, as I think I'm matching everything you're saying here as closely as I can, and I get the right answers:

Q:\u102930>icl test.cpp
Intel C++ Compiler XE for applications running on IA-32, Version 12.1.0.233 Build 20110811
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

test.cpp

Microsoft Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.

-out:test.exe
test.obj

Q:\u102930>test

0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 0.909297
5 2.500000 0.598472
6 3.000000 0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 -0.958924
11 5.500000 -0.705540

I'm in VS2008 compat mode here. What compiler options are you using, and can you send me a complete test case - this is what I used since you only provided a snippet:

#include <stdio.h>
#include <math.h>

int main() {
    __declspec(align(16)) float input[12] = { 0.f, 0.5f, 1.f, 1.5f, 2.f, 2.5f, 3.0f, 3.5f, 4.f, 4.5f, 5.f, 5.5f };
    __declspec(align(16)) float output[12];

    for( int i = 0; i < 12; i++ )
    {
        output[i] = sin( input[i] );
    }

    for( int i = 0; i < 12; i++ )
    {
        char txt[32];
        printf( "%i %f %f\n", i, input[i], output[i] );
        // OutputDebugString(txt);
    }
    return 0;
}
Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support

I tried a new console project and the error did not occur.
So it seems to be an issue when building a DLL, since the original error was in a multithreaded DLL.
A complete test case is not really easy since the project is big and very specialized; I have no idea how to strip it down.

I had a look at the assembler (although I am not good at it). In the console application there are 3 calls to
0129106F call ___svml_sinf4 (129CE50h)

In my DLL project some generated code gets called instead (also 3 times, of course).
1000BFB6 call 100BEF20

I can send you the assembler of this call if you want. There was no use of any built-in sin function that I could see.
But this different assembler output feels strange, doesn't it?

Maybe my settings will help:
/D "NDEBUG" /D "_WINDOWS" /D "_USE_INTEL_COMPILER" /D "USE_LIBPNG=1" /D "_VC80_UPGRADE=0x0600" /D "_WINDLL" /EHsc /MT /GS- /fp:fast /Fo"Release\" /Fd"Release\" /FR"Release/" /W0 /nologo /Wp64 /Qfp-speculation:fast /Qvec_report1 /O3 /G7 /GA

Evitan

Would you be able to try a one-loop test case with your old project ( the one that produced incorrect values )?

...
for( int i = 0; i < 1; i++ )
{
    output[ 0] = sin( input[ 0] );
    output[ 1] = sin( input[ 1] );
    output[ 2] = sin( input[ 2] );
    output[ 3] = sin( input[ 3] );
    output[ 4] = sin( input[ 4] );
    output[ 5] = sin( input[ 5] );
    output[ 6] = sin( input[ 6] );
    output[ 7] = sin( input[ 7] );
    output[ 8] = sin( input[ 8] );
    output[ 9] = sin( input[ 9] );
    output[10] = sin( input[10] );
    output[11] = sin( input[11] );
}
...

I wonder if it makes any difference?

Best regards,
Sergey

same result:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297
5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.958924
11 5.500000 0.705540

block was vectorized.

The issue only occurs if you set

_MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);

And there is another issue with this rounding mode thing:
_MM_GET_ROUNDING_MODE();
gives me a different number every time I start the app,
but it's always the same before and after I set any rounding mode.

So is the whole rounding mode thing maybe obsolete/broken?
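
For reference, _MM_GET_ROUNDING_MODE() just extracts the rounding-control bits (13-14) of MXCSR, so it can only ever return one of four fixed masks; a minimal sketch to check what your environment reports:

#include <xmmintrin.h>
#include <stdio.h>

int main( void )
{
    /* The only possible values: _MM_ROUND_NEAREST = 0x0000, _MM_ROUND_DOWN = 0x2000,
       _MM_ROUND_UP = 0x4000, _MM_ROUND_TOWARD_ZERO = 0x6000 */
    printf( "rounding mode mask: 0x%04x\n", _MM_GET_ROUNDING_MODE() );
    return 0;
}

If this prints anything other than those four masks, something else in the process is modifying MXCSR.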

Quoting evitan: ...so is the whole rounding mode thing maybe obsolete/broken?

I don't think it is "obsolete", but it really affects calculations! Let's wait for a response from Intel's software engineers...

Hi everybody,

The macro to set the rounding mode changes the floating-point environment. As such, the normal floating-point settings of the compiler (/fp:fast) aren't going to work, since those optimizations assume a default floating-point environment. You'll need to set /fp:strict for this to work properly, or at minimum use the pragma "#pragma STDC FENV_ACCESS ON". For more details, I strongly recommend Martyn Corden's whitepaper on the topic - http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ .
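
If the rest of the process genuinely needs a non-default mode, one defensive pattern (a sketch, not an officially documented workaround) is to restore round-to-nearest around the math-heavy loop and put the caller's mode back afterwards:

#include <xmmintrin.h>
#include <math.h>

void sin_block( const float *in, float *out, int n )
{
    unsigned int saved = _MM_GET_ROUNDING_MODE(); /* remember the caller's mode    */
    _MM_SET_ROUNDING_MODE( _MM_ROUND_NEAREST );   /* the mode the libraries assume */
    for( int i = 0; i < n; i++ )
        out[i] = sinf( in[i] );                   /* may vectorize to __svml_sinf4 */
    _MM_SET_ROUNDING_MODE( saved );               /* restore the caller's mode     */
}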

Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support

Brandon,

A sign-flip error is significantly different from an error in the LSBs of precision.

An airplane crash in South America occurred because of a sign flip (although that was due to a programming oversight rather than a calculation from a sin function).

Jim Dempsey

www.quickthreadprogramming.com

The algorithms used in various math routines may depend critically on the results of certain operations being rounded in a certain way. Round the result in an unexpected fashion and the flow of control in the algorithm may be changed, table indices for looking up constants may be wrong, etc. It's much more complicated than just thinking the result may be off by an lsb or two.

If you change the default floating-point environment, the assumptions upon which many of the algorithms in the run-time libraries are based, especially those which deal with math functions, are violated, and lots and lots of things can (and will) go wrong. As this user has proven.

There are good reasons why switches and pragmas are provided to indicate that the run-time environment has been changed from the default.

Thanks for the answers so far.
I totally understand that changing the rounding mode can result in unexpected library behavior.
But that's exactly the reason why I set it to 'my default', _MM_ROUND_TOWARD_ZERO:
my DLL lives in an environment where other DLLs may change the rounding mode.

And I never had problems with version 11, by the way. So something has changed here.
What first makes me wonder is that when calling _MM_GET_ROUNDING_MODE() first, it says the mode is NEAREST.
I would have expected it to be TOWARD_ZERO.

The other thing I notice is that I am unable to round 0.7f to 1 regardless of which rounding mode I set.
So for me the rounding modes don't seem to do anything, but the vectorization of sin is somehow affected.

Is there something obvious I am not getting about the rounding mode?
Did the default rounding mode change from version 11 to 12, from toward-zero to to-nearest?

Thanks
Evitan

"Is there something obvious i am not getting about the rounding mode ?"

Sometimes what's obvious to some isn't obvious to everyone until it's pointed out as being obvious. ;-)

That you say

"The other thing i notice is that i am unable to round 0.7f to 1 regardless which rounding mode i set"

worries me a bit. There is no way that the floating-point constant 0.7f can ever be rounded to 1.0f in any reasonable floating-point arithmetic. Perhaps you are not understanding what the rounding modes being discussed here actually do. They change how the results from floating-point instructions executed by the processor, such as adding two floating-point numbers, are rounded before the result is delivered. It does not (directly) have anything to do with rounding when printing values.
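
For illustration, a minimal sketch of what the mode controls (SSE intrinsics are used so the compiler can't fold the add at compile time); 1.0f + 1e-8f is inexact, so the mode decides which neighbouring float is returned:

#include <xmmintrin.h>
#include <stdio.h>

int main( void )
{
    __m128 a = _mm_set_ss( 1.0f ), b = _mm_set_ss( 1e-8f );
    _MM_SET_ROUNDING_MODE( _MM_ROUND_NEAREST );
    printf( "nearest: %.8f\n", _mm_cvtss_f32( _mm_add_ss( a, b ) ) ); /* 1.00000000 */
    _MM_SET_ROUNDING_MODE( _MM_ROUND_UP );
    printf( "up:      %.8f\n", _mm_cvtss_f32( _mm_add_ss( a, b ) ) ); /* 1.00000012 */
    _MM_SET_ROUNDING_MODE( _MM_ROUND_NEAREST ); /* restore the default */
    return 0;
}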

The vectorized sin routine expects to execute in an environment where the rounding mode is round to nearest even. If you change the environment and don't tell the run-time environment, you may get incorrect results. That's why there's the compiler switch -fp:strict. It tells the compiler to generate code so that run-time library routines know they may be called in an environment which does not necessarily match that which they expect. (And they must change the environment to that which they expect and then restore it after they finish execution.)

I'm not aware of anything in this area changing between versions 11 and 12 (I assume you mean 11.1 and 12.1). Certainly the default rounding mode for floating-point operations has not changed. It has been round to nearest even for decades.

I find it extraordinary that you would be able to change the rounding mode to something other than round to nearest in a version 11 environment and not see catastrophic results.

Quoting Jeff Arnold (Intel)

"Is there something obvious i am not getting about the rounding mode ?"

Sometimes what's obvious to some isn't obvious to everyone until it's pointed out as being obvious. ;-)

That you say

"The other thing i notice is that i am unable to round 0.7f to 1 regardless which rounding mode i set"

worries me a bit. There is no way that the floating-point constant 0.7f can ever be rounded to 1.0f in any reasonable floating-point arithmetic. Perhaps you are not understanding what the rounding modes being discussed here actually do. They change how the results from floating-point instructions executed by the processor, such as adding two floating-point numbers, are rounded before the result is delivered.

I don't know if you mean rounding within printf(); you didn't specify with an example.
The rintf() (in default rounding mode) and roundf() functions seem to offer this in C99. There appears to be an SVML vector version of round() in the 12.1 compiler, although I've been frustrated trying to get it rather than the slow scalar version.
I compare with "dirty" C89 code like
#include <float.h>
#define rintf(x) ((x) >= 0 ? ((float)(x) + 1/FLT_EPSILON) - 1/FLT_EPSILON : ((float)(x) - 1/FLT_EPSILON) + 1/FLT_EPSILON)
which requires -fp:precise compilation mode, and adds another exception to your "no way." It doesn't work quite right in cases such as x == 1.5f/FLT_EPSILON, but it certainly will round 0.7f to 1.0f.
If you want control of rounding modes in printf() and the like, it might be best to apply your desired rounding explicitly. I don't think you can expect that printf() necessarily uses SSE code which would be affected by your changes in the SSE rounding mode.
Math functions don't necessarily implement rounding modes other than the default. Ideally, if the rounding mode matters, they would set their own rounding mode, then reset to the entry rounding mode. In IA32 mode, supposing the x87 built-ins are used, those are documented as not being affected by the rounding mode (and anyway, you wouldn't be able to use the SSE intrinsics).

"there is no way that the floating-point constant 0.7f can ever been rounded to 1.0f in any reasonable floating-point arithmetic."

OK, I realize that I did not make it clear that I am talking more about truncating/casting than rounding.
Basically what I want is that the truncation of float values works as everyone would expect.
I often do int z = (int)floatvalue;
So I need a consistent (and fast) way of doing this, and I want to rely on 0.9 being truncated to 0 and -0.9 also to 0.
That behaviour is what I would call _MM_ROUND_TOWARD_ZERO, and I thought that the rounding modes would define that behaviour.
But it more and more seems that the rounding mode has nothing to do with float-to-int conversion.

So a final question: can I always and ever rely on converting float to int working the way I think?
And will it be the most efficient?

My usecase is fast interpolated table readouts. So i often have a float index value and use it like this:

const int index = (int)floatindex;               // float to int
const float fractionalpart = floatindex - index; // is there maybe a better way?
float result = table[index] + (table[index + 1] - table[index]) * fractionalpart;

If somebody knows a better way, please tell me.

And thanks for all the clarification so far!!

That looks OK, if you don't muck with rounding mode. It would be slow (extremely slow, on certain older CPUs) if compiled in IA32 mode. You have drifted away from the original topic.
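
The cast itself is safe: C defines float-to-int conversion as truncation toward zero, and on SSE-capable targets the compiler typically emits cvttss2si, whose extra 't' means truncate regardless of the MXCSR mode. A minimal sketch making that explicit with intrinsics (the helper name is just for illustration):

#include <xmmintrin.h>

/* Truncates toward zero independent of the current MXCSR rounding mode;
   equivalent to (int)x on SSE targets. */
static inline int float_to_int_trunc( float x )
{
    return _mm_cvttss_si32( _mm_set_ss( x ) ); /* cvttss2si always truncates */
}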

Quoting Brandon Hewitt (Intel)...
For more details, I strongly recommend Martyn Corden's whitepaper on the topic - http://software.intel.com/en-us/articles/consistency-of
-floating-point-results-using-the-intel-compiler/

[SergeyK] I've read it.
...

Hi everybody,

For the second day we're talking about a problem Evitan has, and I'm not sure that Intel Software Engineers have
tried to investigate it. Instead, there are talks about theory, different "default" and "non-default" FP environments, etc., etc.

I wonder if Mr. Martyn Corden, or Brandon, or Tim, could really investigate what is wrong with the results of the sin and cos functions?

Thank you in advance.

Best regards,
Sergey

>> If you change the default floating-point environment, the assumptions upon which many of the algorithms in the run-time libraries are based, especially those which deal with math functions, are violated and lots and lots of things can (and will) go wrong. As this user has proven.

These assumptions are made by the author of the code, not the user of the code. Further, these assumptions are not "advertised" in the function definition. Something along the lines of:

CAUTION: This function is known to not work if the default floating-point environment is altered (whatever it happens to be for this version of the runtime library).

A more appropriate implementation would be for sin/cos to always approximate correctly, provided the input is not NaN or Infinity, and a new function added, perhaps named "fast_sin" or "flaky_fast_sin".

When an error of this magnitude (actually not magnitude in FP-speak, rather sign) is known to take place using "the wrong floating-point settings", then the sin (or whatever) function should assert (at least in a Debug build) with an appropriate error, or fall back to: save FP state, set the appropriate mode, run the function, restore FP state.

Jim Dempsey

www.quickthreadprogramming.com

Quoting jimdempseyatthecove...
using "the wrong floating point settings", then the sin (or whatever) function should assert (at least in Debug build) with an appropriate error, or fall back to: Save FP state, set to appropriate mode,

[SergeyK] I think default FPU settings are the best choice in that case.

run function, restore FP state.

Jim Dempsey

Best regards,
Sergey

Quoting evitan...If I turn off autovectorization everything is fine...

Please take a look at the Test-Case ( Post #19 ) and I hope that it will help. In general, I would do these tests in a new project:

- Don't change ANY FPU settings at runtime

- Debug & Release Configurations:
- Disable ALL Optimizations
- Run the Test-Case with Floating Point Model 'Precise' ( /fp:precise )
- Run the Test-Case with Floating Point Model 'Strict' ( /fp:strict )
- Run the Test-Case with Floating Point Model 'Fast' ( /fp:fast )

Execute for the 'sin' and 'SinNTS11' functions

- Debug & Release Configurations:
- Enable Autovectorization Optimizations
- Run the Test-Case with Floating Point Model 'Precise' ( /fp:precise )
- Run the Test-Case with Floating Point Model 'Strict' ( /fp:strict )
- Run the Test-Case with Floating Point Model 'Fast' ( /fp:fast )

Execute for the 'sin' and 'SinNTS11' functions

Note: 'SinNTS11' calculates a sine value using a Normalized Taylor Series up to the 11th term.

It is not clear if non-default FPU settings cause the problem with the sign when 'sin' is called. Use the 'SinNTS11'
function for verification.

Here is a Test-Case:

...

float PI_DIVBY_180 = 3.141592653589793f / 180.0f;

float g_fNF3  = 1.0f / 6.0f;
float g_fNF5  = 1.0f / 120.0f;
float g_fNF7  = 1.0f / 5040.0f;
float g_fNF9  = 1.0f / 362880.0f;
float g_fNF11 = 1.0f / 39916800.0f;

template < class T > inline T SinNTS11( T tA, bool bDegree = true );

template < class T > inline T SinNTS11( T tA, bool bDegree )
{
    if( bDegree == true )
        tA = tA * PI_DIVBY_180;
    return ( T )( tA - tA * tA * tA * ( g_fNF3 - tA * tA * ( g_fNF5 - tA * tA *
        ( g_fNF7 - tA * tA * ( g_fNF9 - g_fNF11 * tA * tA ) ) ) ) );
}
...

void main( void )
{
__declspec(align(16)) float fInpVal[12] = { 0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f, 3.0f, 3.5f, 4.0f, 4.5f, 5.0f, 5.5f };
__declspec(align(16)) float fOutVal[12] = { 0.0f };

// A set of values from Windows Calc.exe rounded to 16 digits after the point ( as floats )
__declspec(align(16)) float fChkVal[12] =
{ // Radians
0.0000000000000000f, // 0.0
0.4794255386042030f, // 0.5
0.8414709848078965f, // 1.0
0.9974949866040544f, // 1.5
0.9092974268256817f, // 2.0
0.5984721441039565f, // 2.5
0.1411200080598672f, // 3.0
-0.3507832276896199f, // 3.5
-0.7568024953079283f, // 4.0
-0.9775301176650971f, // 4.5
-0.9589242746631385f, // 5.0
-0.7055403255703919f // 5.5
};

// A set of values with sign errors ( as floats )
__declspec(align(16)) float fInvVal[12] =
{ // Radians
0.0000000000000000f, // 0.0
0.4794260000000000f, // 0.5
0.8414710000000000f, // 1.0
0.9974950000000000f, // 1.5
-0.9092970000000000f, // 2.0 <- here the error starts
-0.5984720000000000f, // 2.5
-0.1411200000000000f, // 3.0
-0.3507830000000000f, // 3.5 <- here it's okay again
-0.7568020000000000f, // 4.0
-0.9775300000000000f, // 4.5
0.9589240000000000f, // 5.0 <- wrong again
0.7055400000000000f // 5.5
};

// A set of values with sign errors ( as strings )
char *szInvVal[12] =
{ // Radians
" 0.0000000000000000", // 0.0
" 0.4794260000000000", // 0.5
" 0.8414710000000000", // 1.0
" 0.9974950000000000", // 1.5
"-0.9092970000000000", // 2.0 <- here the error starts
"-0.5984720000000000", // 2.5
"-0.1411200000000000", // 3.0
"-0.3507830000000000", // 3.5 <- here it's okay again
"-0.7568020000000000", // 4.0
"-0.9775300000000000", // 4.5
" 0.9589240000000000", // 5.0 <- wrong again
" 0.7055400000000000" // 5.5
};

for( int i = 0; i < 12; i++ )
{
fOutVal[i] = sin( fInpVal[i] );
// fOutVal[i] = SinNTS11( fInpVal[i], false );
}

for( int i = 0; i < 12 ; i++ )
{
printf( "%2i % 2f % 2.16f % 2.16f\tAbsError 1: % 2.10f\tAbsError 2: % 2.10f\n",
i, fInpVal[i], fOutVal[i], fChkVal[i],
( fOutVal[i] - fChkVal[i] ),
( fInvVal[i] - fChkVal[i] ) );
}
}

Results for 'sin' - ALL optimizations Disabled - 'Precise' Floating Point Model:

0 0.000000 0.0000000000000000 0.0000000000000000 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
1 0.500000 0.4794255495071411 0.4794255495071411 AbsError 1: 0.0000000000 AbsError 2: 0.0000004470
2 1.000000 0.8414709568023682 0.8414709568023682 AbsError 1: 0.0000000000 AbsError 2: 0.0000000596
3 1.500000 0.9974949955940247 0.9974949955940247 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
4 2.000000 0.9092974066734314 0.9092974066734314 AbsError 1: 0.0000000000 AbsError 2: -1.8185943961
5 2.500000 0.5984721183776856 0.5984721183776856 AbsError 1: 0.0000000000 AbsError 2: -1.1969441175
6 3.000000 0.1411200016736984 0.1411200016736984 AbsError 1: 0.0000000000 AbsError 2: -0.2822400033
7 3.500000 -0.3507832288742065 -0.3507832288742065 AbsError 1: 0.0000000000 AbsError 2: 0.0000002384
8 4.000000 -0.7568024992942810 -0.7568024992942810 AbsError 1: 0.0000000000 AbsError 2: 0.0000004768
9 4.500000 -0.9775301218032837 -0.9775301218032837 AbsError 1: 0.0000000000 AbsError 2: 0.0000001192
10 5.000000 -0.9589242935180664 -0.9589242935180664 AbsError 1: 0.0000000000 AbsError 2: 1.9178482890
11 5.500000 -0.7055402994155884 -0.7055402994155884 AbsError 1: 0.0000000000 AbsError 2: 1.4110803008

Results for 'SinNTS11' - ALL optimizations Disabled - 'Precise' Floating Point Model:

0 0.000000 0.0000000000000000 0.0000000000000000 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
1 0.500000 0.4794255495071411 0.4794255495071411 AbsError 1: 0.0000000000 AbsError 2: 0.0000004470
2 1.000000 0.8414709568023682 0.8414709568023682 AbsError 1: 0.0000000000 AbsError 2: 0.0000000596
3 1.500000 0.9974949359893799 0.9974949955940247 AbsError 1: -0.0000000596 AbsError 2: 0.0000000000
4 2.000000 0.9092960953712463 0.9092974066734314 AbsError 1: -0.0000013113 AbsError 2: -1.8185943961
5 2.500000 0.5984488725662231 0.5984721183776856 AbsError 1: -0.0000232458 AbsError 2: -1.1969441175
6 3.000000 0.1408745646476746 0.1411200016736984 AbsError 1: -0.0002454370 AbsError 2: -0.2822400033
7 3.500000 -0.3525766134262085 -0.3507832288742065 AbsError 1: -0.0017933846 AbsError 2: 0.0000002384
8 4.000000 -0.7668044567108154 -0.7568024992942810 AbsError 1: -0.0100019574 AbsError 2: 0.0000004768
9 4.500000 -1.0228915214538574 -0.9775301218032837 AbsError 1: -0.0453613997 AbsError 2: 0.0000001192
10 5.000000 -1.1336168050765991 -0.9589242935180664 AbsError 1: -0.1746925116 AbsError 2: 1.9178482890
11 5.500000 -1.2947616577148437 -0.7055402994155884 AbsError 1: -0.5892213583 AbsError 2: 1.4110803008

Note: As you can see, the sign is calculated properly for the 'sin' and 'SinNTS11' functions, but accuracy is not
acceptable for the 'SinNTS11' function when the argument is 5.0 or 5.5 radians.

These functions would be implemented with range reduction, based on properties such as remaindering mod 2*PI. It's generally recognized that the remaindering needs to be done in higher precision than the declared precision of the function; e.g. the x87 built-ins use an 80-bit long double value for Pi (which happens to be accurate to 66 bits of precision). That ought to be sufficient to get accurate double results at 5 radians. Needless to say, more precision here is more costly. Plauger's classic "The Standard C Library" shows some typical software techniques. If you want amusement, a long email thread has been going on the gcc-help mailing list, with opinions ventured ranging from obstinate to expert, and some discussion about the need for a practical open source library.
Intel runs a long series of tests on math functions, but the tests and results aren't divulged. There are several classic open source tests, such as the TOMS Fortran ELEFUNT (which has been ported to C, including 128-bit long double) and CELEFUNT.
Recently, a lot of academic work has gone into the possibility of correctness criteria and IEEE standards for elementary math functions. I haven't seen anything remotely resembling a useful practical summary, but I'm no expert.
The questions of software vs. firmware implementation have by no means been laid to rest, e.g. with float elementary vector math functions in firmware on MIC prototypes.

Quoting TimP (Intel)...
Intel runs a long series of tests on math functions,...
...

I absolutely believe in that.

Anyway, it is still an enigma what is going on with the sign of some results of the sin function when autovectorization is used.

Here is an example of a round-off problem when adding a small number. The expected result should be
equal to 1.0, but due to a round-off problem it is not equal to 1.0.

>> 'Single-Precision' Test-Case <<

...
RTfloat fSum = 0.0f;
for( i = 0; i < 1000000; i++ )
fSum += 0.000001f;
...

Results:

Accuracy _RTFPU_PC_24 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795524120330810 %

Accuracy _RTFPU_PC_53 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795488699064450 %

Accuracy _RTFPU_PC_64 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795488699064560 %

Note: It also demonstrates that in the case of a 'Single-Precision' type, the 53-bit and 64-bit precisions are not applicable

>> 'Double-Precision' Test-Case <<

...
RTdouble dSum = 0.0L;
for( i = 0; i < 1000000; i++ )
dSum += 0.000001L;
...

Results:

Accuracy _RTFPU_PC_24 - RTdouble - Result: 1.00903892517089840
Relative error: 0.895795524120330810 %

Accuracy _RTFPU_PC_53 - RTdouble - Result: 1.00000000000791810
Relative error: 0.000000000791811061 %

Accuracy _RTFPU_PC_64 - RTdouble - Result: 1.00000000000791810
Relative error: 0.000000000791811061 %
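
For what it's worth, the usual cure for this accumulation error in single precision is compensated (Kahan) summation rather than raising the FPU precision; a sketch (note that /fp:fast may optimize the compensation away, so it needs /fp:precise or stricter):

#include <stdio.h>

float kahan_sum( int n, float term )
{
    float sum = 0.0f, c = 0.0f;   /* c carries the low-order bits lost by sum */
    for( int i = 0; i < n; i++ )
    {
        float y = term - c;
        float t = sum + y;
        c = ( t - sum ) - y;      /* recover what the add rounded away */
        sum = t;
    }
    return sum;
}

int main( void )
{
    printf( "%.17f\n", kahan_sum( 1000000, 0.000001f ) ); /* very close to 1.0 */
    return 0;
}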

Hi all,

Sorry it took a little bit of time to get to this, but I had a couple different developers on our math runtime team looking at this and wanted to make sure we understood the behavior here before I replied.

Autovectorization of the sin() function call here causes a call to a dedicated SIMD-friendly sin routine which is different from the scalar sin (you'll see this in the generated asm as __svml_sinf4). That SIMD-friendly version, by design, assumes the default round-to-nearest-even rounding mode, and the trigonometric reduction in that function leads to the wrong quadrant for the reduced argument if run in non-standard rounding modes. Older versions of this SIMD sin() used a different reduction algorithm, which is why we don't see this issue with 11.1. This is also true of the scalar sin call: it uses a different reduction algorithm. The reason the algorithm was changed from 11.1 to 12.x was performance.

The compiler defaults to /fp:fast, which among other things implies (along with the lack of a user-inserted #pragma STDC FENV_ACCESS ON) that the floating-point environment has not been manually changed, and therefore the compiler can generate calls to the optimized SIMD versions of these math functions. So in this situation, where the rounding mode is being changed, we recommend (and document in the User's Guide) that /fp:strict be used, which will fall back to the scalar math function, which can handle different rounding modes without affecting results at the precision normally expected.
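
In other words, if the rounding mode is changed anywhere in the binary, the build or the source must say so. A sketch of the two ways described above (the pragma spelling follows C99; check the User's Guide for the exact form your compiler accepts):

/* Option 1: compile the translation unit with /fp:strict, e.g.
     icl /fp:strict test.cpp
   Option 2: declare in source that the FP environment is accessed: */
#include <math.h>
#include <xmmintrin.h>

#pragma STDC FENV_ACCESS ON

void compute( const float *in, float *out, int n )
{
    _MM_SET_ROUNDING_MODE( _MM_ROUND_TOWARD_ZERO ); /* non-default environment */
    for( int i = 0; i < n; i++ )
        out[i] = sinf( in[i] ); /* must now use the scalar, mode-safe path */
}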

Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support

>> the trigonometric reduction in that function leads to the wrong quadrant for the reduced argument if run in non-standard rounding modes

The XE 2011 SP1 documentation for /fp:fast does not state that there is a quadrant issue relating to sin/cos that will result in a flip in the sign of the result of some calculations. With the default being /fp:fast, the default has a defective sin/cos routine that must be addressed by the developers. The documentation states "These optimizations increase speed, but may alter the accuracy of floating-point computations.". An accuracy difference, by the interpretation of any user of your code, would be that some minor loss in precision of the mantissa is to be expected, but NOT of the sign of the result. It is Intel's responsibility to produce reliable code, be it with some loss in precision of the mantissa.

In an earlier post I mentioned a software glitch (programming error) that led to an airplane crash. In that case, the programming error caused the artificial horizon to flip over (sign change), causing the autopilot to flip control inputs, and it subsequently may have engaged overrides to pilot inputs. I do not know how else to stress that this is a serious problem that needs to be fixed.

Jim Dempsey

www.quickthreadprogramming.com

Thanks, Jim!

Quoting jimdempseyatthecove...
An accuracy difference, by the interpretation of any user of your code, would be that some minor loss in precision of the mantissa is to be expected, but NOT of the sign of the result. It is Intel's responsibility to produce reliable code, be it with some loss in precision of the mantissa.
...
[SergeyK] I wonder if Intel's software developers have decided to change trigonometric laws?
...

Brandon,

It doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application:
the sign must be right for a value returned from the 'sin' or 'cos' functions.

It doesn't matter if the 'sin' or 'cos' functions are implemented as scalar, non-scalar, SIMD-friendly, etc.

Guys, you're breaking trigonometric laws!

Best regards,
Sergey

I'm not sure, but possibly Brandon may have meant something along this line:
When you compile with /fp:fast (the default), in order to get automatic invocation of the SVML libraries you must not tinker with rounding modes. If you intend to change rounding modes, you should set /fp:strict, or at least set the rounding mode back to normal before entering the math library function.
You could test yourself whether the time required to save the rounding mode, set round-to-nearest (and later restore your local rounding mode) is large, and whether it solves the associated numeric problems. The original P4 was notoriously slow in saving and restoring rounding modes, particularly in x87; later CPUs had to improve on it, but the compiler retains some choices which seem to have been made for P4.
I can't tell whether all the comments made in this thread are related to the original question. Needless to say, if errors such as those reported here occurred with default optimizations, with rounding modes left at default, it would be extremely important to test whether options such as -fp:source or -fp:strict, and/or -Qimf-arch-consistency:true are required in each application.
I'm also having difficulty guessing whether the comments all refer to the same OS.

Hi all,

Sorry for the delay here in my response - I was having an involved discussion with Martyn Corden on my team and several of our math libraries developers on this topic, and I wanted to wait until that was settled before I responded.

To quickly summarize the result of that discussion: our math libraries team is looking into whether we can change this algorithm to not be as significantly impacted by the rounding mode change as it currently is, without dropping the performance of the algorithm at the same time. I'll keep this thread posted when there's an update on their progress.

Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support

Quoting Brandon Hewitt (Intel)...To quickly summarize the result of that discussion: our math libraries team is looking into whether we can change this algorithm to not be as significantly impacted by the rounding mode change as it currently is, without dropping the performance of the algorithm at the same time. I'll keep this thread posted when there's an update on their progress.

Hi Brandon,

Are there any updates?

Best regards,
Sergey

It doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application,
the sign must be right for a value returned from the 'sin' or 'cos' functions

Completely agree with you, Sergey.
Some loss in the accuracy of the returned value is acceptable, but not a change in the sign.
Such behaviour can be catastrophic.

Even the sign of sin/cos becomes meaningless for large enough arguments. For example, for arguments of magnitude > 2^64 the x87 firmware functions make no effort to get a correct sign or magnitude, while for moderate magnitudes the zero-crossing points are subject to the rounding error implicit in the long double rounded value of Pi.
The Intel compiler -help option tells you explicitly that the accuracy of math functions may depend on imf- option settings, although the application to trig functions is left to the imagination. It also tells you that if you want equally accurate treatment regardless of CPU type, you must set imf-arch-consistency=true.

For example, for arguments of magnitude > 2^64 for the x87 firmware functions there is no effort to get

It is hard to imagine such a large argument being passed to the sin() function in a real-world application.
Also, division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range-reduced argument.

Quoting iliyapolak

It doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application,
the sign must be right for a value returned from the 'sin' or 'cos' functions

Completely agree with you, Sergey. Some loss in the accuracy of the returned value is acceptable,
but not a change in the sign.
Such behaviour can be catastrophic.

In Post #8:

http://software.intel.com/en-us/forums/showpost.php?p=175161

Jim gave a real ( unfortunately, tragic ) example of what happens in such cases.

Jim gave a real ( unfortunately, tragic ) example of what happens in such cases

IIRC, I read about a Fortran-related error, probably caused by the programmer, which was wrongly interpreted by the compiler and led to the catastrophic explosion of a rocket booster involved in a space mission.

http://www.sundoginteractive.com/sunblog/posts/top-ten-most-infamous-software-bugs-of-all-time/

It is hard to imagine such a large argument being passed to the sin() function in a real-world application.

Also, division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range-reduced argument.

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting.

Quoting Jeff Arnold (Intel)

It is hard to imagine such a large argument being passed to the sin() function in a real-world application.

Also, division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range-reduced argument.

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting.

Hi everybody,

Jeff, thank you. The article is good and very interesting. However, I won't be surprised if somebody defends a PhD on
how to reduce 2^1024 radians to some right value. We do programming and we try to apply some knowledge to real-life problems.
I completely agree with Iliya and I can't imagine, for example, satellite tracking hardware that will return look-at-angle values
greater than 180 degrees ( one PI radians ).

Best regards,
Sergey

Hi,
Update to the issue.

This issue is fixed in the next version (13.0) of the Intel C++ compiler. The beta of 13.0 just ended. If you already registered for the beta, you can still download it.

For 12.1, please use one of the two work-arounds:
1. Use the option TimP provided: /Qimf-arch-consistency=true
This is an easy work-around. Now I'm wondering why it's not set as the default.
2. Do not use setRoundingMode() to change the mode.

The fix is in 13.0 only; there are some challenges in porting the fix to 12.1 without affecting performance in general.

Thanks,
Jennifer

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting

I read this document; it is very interesting and insightful.

I completely agree with Iliya and I can't imagine, for example, satellite tracking hardware that will return look-at-angle values
greater than 180 degrees ( one PI radians ).

The radian argument to some software/hardware sine implementation will depend directly on the physical constraints of the measured process or device. For example: a rotational modulator-filter which modulates the emitted IR energy from a fast-moving target (airplane); in such a system a value of a few hundred radians (a fast rotational speed) will be sufficient to spot very early a change in the target position relative to the filter.

Quoting Jeff Arnold (Intel)...You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting.

I've done a couple of simple verifications for a test-case used in the article ( sin( 10^22 * OneRadian ) ) and
here is a screenshot:

Windows Calculator was used: the top result is without a range reduction and the bottom result is with a "simulated" range reduction.

Here are results from Windows Calculator for sin( 10^1024 * OneRadian ):

( top - without range reduction / bottom - with "simulated" range reduction )

Quoting Sergey Kostrov...I've done a couple of simple verifications for a test-case used in the article ( sin( 10^22 * OneRadian ) )...

A short follow-up... Here are results for the Microsoft C++ compiler ( Visual Studio 2005 ):

Expected Value:
Sine of 10^22 radians: -0.85220084976718880059522602157523

Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No
Sine of 10^22 radians: -0.85220084976718891000000000000000 - Range Reduction: Yes

Windows Calculator was used: the top result is without a range reduction and the bottom result is with a "simulated" range reduction

Hi Sergey!
How were you able to disable range reduction in Windows Calculator?
I need to check Calc.exe's imports, but it probably calls one of the MSVCRT.DLL sin() functions, which in turn calls x87 fsin. So disabling range reduction is not possible, because fsin performs range reduction in the hardware. By disabling range reduction I mean: do not map large periodic radian values to the range [-Pi/4, Pi/4], which is suitable for the accurate calculation of the sine function on a digital computer.

Out of curiosity I tested my own sine implementation, "fastsin()", which does not have any range reduction algorithm.
From a pure mathematical point of view the Taylor expansion of the sin function converges everywhere, but when executed on a digital computer without any range reduction algorithm the result returned by the function is a disaster.

Here is the result of fastsin() called with an argument of 10^22 radians.

Range reduction is not implemented, i.e. there is no mapping of large periodic values of the radian argument to the range suitable for calculation, [-Pi/4, Pi/4].

radian_arg 10000000000000000000000.000000000000000000000000
fastsin called with an argument of (10000000000000000000000.000000) radians, the result is -1.#INF00000000000000000000

Here is the result of fastsin() called with an argument of 10^2 radians.

radian_arg 100.000000000000000000000000
fastsin called with an argument of (100.000000) radians, the result is -368038773856910700000000.000000000000000000000000

Here is the result of fastsin() called with an argument of 8.0 radians.

radian_arg 8.000000000000000000000000
fastsin called with an argument of ( 8.000000) radians, the result is 0.987128339487625790000000
<< At 8.0 radians the result starts to diverge from the actual value.
Correct value is 0.98935824662338177780812359824529

Here is the result of fastsin() called with an argument of 9.0 radians.

radian_arg 9.000000000000000000000000
fastsin called with an argument of ( 9.000000) radians, the result is 0.370686603602516480000000
<< At 9.0 radians the error grows.
Correct range-reduced value is 0.41211848524175656975627256635244.
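
To make the comparison concrete, here is a sketch of the same idea (an illustrative helper, not the fastsin() source): an 11-term Taylor polynomial evaluated raw versus after folding the argument into [-Pi, Pi] with C99 remainder(). The raw version diverges just as fastsin() does above; the folded one stays usable:

#include <math.h>

static const double TWO_PI = 6.283185307179586476925286766559;

double taylor_sin( double x )         /* x - x^3/3! + ... - x^11/11!, no reduction */
{
    double x2 = x * x;
    return x * ( 1.0 - x2 / 6.0 * ( 1.0 - x2 / 20.0 * ( 1.0 - x2 / 42.0 *
                ( 1.0 - x2 / 72.0 * ( 1.0 - x2 / 110.0 ) ) ) ) );
}

double taylor_sin_reduced( double x ) /* fold into [-Pi, Pi] first */
{
    return taylor_sin( remainder( x, TWO_PI ) );
}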

Here are results from Windows Calculator for sin( 10^1024 * OneRadian )

How were you able to calculate 'sin' for such a large argument?
When I tried to do this on my PC, Calc.exe hangs.

Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No

Is it possible to disable range reduction of the periodic functions in the VS C++ compiler?

I suppose you can perform range reduction yourself before calling the library function, so as to skip the library range reduction (if you know enough about its workings). Besides reducing the range to |x| < Pi/4, library implementations likely use some kind of table of reductions to much smaller range segments.
I suppose one of the reasons why libraries hide such details of their implementation is to avoid encouraging end users to make applications which depend on the details. You could profile or disassemble to see whether the library function uses fsin or fsincos so as to be able to run without SSE2, or to see whether /arch:SSE2 chooses a different function from the library (if you are concerned with the 32-bit compiler).

Quoting iliyapolak

Here are results from Windows Calculator for sin( 10^1024 * OneRadian )

How were you able to calculate 'sin' for such a large argument?

Hi Iliya, here are details on how I've tested Windows Calculator:

Task : Calculate sin( 10^1024 * OneRadian ). Select the 'Degrees' radio button.

Test 1: Without a range reduction

1 8 0 / pi = MS 1 0 x^y 1 0 2 4 = * MR = sin
Result = -0.98684945250044044483447009247958

Test 2: With a range reduction

1 8 0 / pi = MS 1 0 x^y 1 0 2 4 = * MR = Mod 3 6 0 = sin
Result = -0.98684945250044044483447009273862

Notes:
MS - store the displayed number
MR - recall the stored number

This is an example that demonstrates the limits of Windows Calculator:

Task : Calculate sin( 10^1025 * OneRadian ). Select the 'Degrees' radio button.

Test 1: Without a range reduction

1 8 0 / pi = MS 1 0 x^y 1 0 2 5 = * MR = sin
Result = 1

Test 2: With a range reduction

1 8 0 / pi = MS 1 0 x^y 1 0 2 5 = * MR = Mod 3 6 0 = sin
Result = -0.9986091762783810016791764873118

As you can see, there is a significant drop in accuracy in Test 1 without a range reduction.

Notes:
MS - store the displayed number
MR - recall the stored number

Quoting iliyapolak

Here are results from Windows Calculator for sin( 10^1024 * OneRadian )

How were you able to calculate 'sin' for such a large argument?
When I tried to do this on my PC, Calc.exe hangs.

What Windows OS did you use? Here is a screenshot with the version of Windows Calculator I used:

I've done a couple more tests and here are the short results:

sin( 10^1024 * OneRadian ) - Calculated
sin( 10^1025 * OneRadian ) - Calculated ( with error )
sin( 10^1026 * OneRadian ) - Couldn't Calculate ( hangs )
sin( 10^2048 * OneRadian ) - Couldn't Calculate ( hangs )
sin( 10^4096 * OneRadian ) - Couldn't Calculate ( hangs )
sin( 10^8192 * OneRadian ) - Couldn't Calculate ( hangs )

Here is a screenshot:

I pressed 'Continue' and it hangs; some time later the message is displayed again... I pressed 'Continue' and
it hangs... I repeated that 5 times and then "killed" the application.

Quoting iliyapolak

Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No

Is it possible to disable range reduction of the periodic functions in the VS C++ compiler?

I've spent some time today investigating it and my answer is No. However, I've discovered a very interesting
feature of the CRT math functions from Microsoft:

- There is a CRT function '_set_SSE2_enable' and it allows you to enable or disable SSE2 support in math functions

- By default SSE2 support is in the enabled state and, for example, the 'sin' function uses SSE2 instructions to do all calculations;
I'd like to stress that it doesn't use the Intel instruction 'fsin' in that case

- If SSE2 support is in the disabled state then, for example, the 'sin' function doesn't use SSE2 instructions to do all calculations;
I'd like to stress that it uses the Intel instruction 'fsin' in that case

- In both cases there is range reduction for the input argument

Notes:
- Call _set_SSE2_enable( 1 ) to enable SSE2 support
- Call _set_SSE2_enable( 0 ) to disable SSE2 support
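
A minimal sketch of flipping that switch (Microsoft CRT, 32-bit builds; _set_SSE2_enable is declared in <math.h>):

#include <math.h>
#include <stdio.h>

int main( void )
{
    _set_SSE2_enable( 0 );                 /* force the x87 path ('fsin')     */
    printf( "x87:  %.17f\n", sin( 1.0 ) );
    _set_SSE2_enable( 1 );                 /* back to the SSE2 implementation */
    printf( "SSE2: %.17f\n", sin( 1.0 ) );
    return 0;
}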
