# vectorization of sin/cos results in wrong values

For more complete information about compiler optimizations, see our Optimization Notice.

For more complete information about compiler optimizations, see our Optimization Notice.

Quoting iliyapolak

Hi Sergey!

How were you able to disable a range reduction in Windows Calculator?

I need to check Calc.exe imports...Please check it andlet us knowif you find something interesting. To be honest, I'm very impressed with capabilities

of Windows Calculator andI finally understood why it is so accurate. Please take a look at its help:

WinCalc Help->Index->precisionand this is what it displays:

Understanding Extended Precision[SergeyK] Is that a128-bit precision?Extended Precision, a feature of Calculator, means that all operations are accurate to

at least 32 digits. Calculator also stores rational numbers as fractions to retain accuracy.

For example, 1/3 is stored as 1/3, rather than .333. However, errors accumulate during

repeated operations on irrational numbers. For example, Calculator will truncate pi to 32 digits,

so repeated operations on pi will lose accuracy as the number of operations increases.

QuotingTimP (Intel)...I suppose you can perform range reduction yourself before calling the library function...There is a small problem here. It is impossible to use '

%' on any FP-data types.A call to 'fmod' CRT functioncreates additional overhead. An example of acompilation error with '

%' operator is below:...

float fValue1 = 765.0f;

float fValue2 = fValue1 % ( int )360;

...

Compilation error:

...

..\Test\DspSetTest.cpp(824) : error C2296: '%' : illegal, left operand has type 'float'

...

QuotingTimP (Intel)...You could profile or disassemble to see whether the library function usesfsinorfsincos...

It uses

SSE2by default (SSE2support is enabled ),orIntelinstruction'fsin' whenSSE2support is disabled.Please take a look at http://software.intel.com/en-us/forums/showpost.php?p=190884

Quoting iliyapolak

...sin() functions which in turn calls x87fsin. So disabling a range reduction is not possible becausefsinperforms in range reduction in the hardware...I hope that you've already looked at:

http://software.intel.com/en-us/forums/showpost.php?p=190884

and

http://software.intel.com/en-us/forums/showpost.php?p=190885

Microsoft software developers implemented CRT'

sin'functionin a different way. I'll provide technicaldetails later.Best regards,

Sergey

Hi Sergey!

I have also investigated MSVCRT.DLL dynamic library which is responsible for runtime math function.

I disassembled this library and counted at least three sin function.One of them uses x87 fsin to perform calculation.Later I will post code snippets.

My search engine skills aren't up to the task of researching this, but surely after all the discussion about how fsin does range reduction you don't expect a single standard library function or x87 intrinsic (fprem) to do the job. If it did, we wouldn't have to settle for these things being buried in proprietary code, or obscured beyond belief in open source code.

I didn't think you'd take my suggestion of performing your own range reduction and then calling your compiler's library function as a way to increase speed. If I wasn't clear, I meant it as a step to comparing the accuracy of your method against the one in the library function.

Did you look at any posted code examples such as Moshier's cephes library on netlib?

You could look in glibc or newlib, but those seem to set a priority on obscurity.

Plauger's "The Standard C Library" gives the shortest useful explanation I've seen of a typical compromise between speed and accuracy.

For example, my quick and dirty method for 128-bit long double involves these steps:

calculate how many multiples of Pi/2 you must subtract to produce |x| < Pi/4.

Take a value of Pi/2 correctly rounded to double precision (best double approximation to Pi/2); working in long double, subtract the multiple of the double value, then subtract the multiple of the difference between Pi/2 and your double rounded value of Pi/2. You must use compiler options which follow the standards on associativity with parentheses (not icc default). In effect, you are using a value of Pi/2 which is correct to the best of your ability to double plus long double precision. Long double and a half, in the best case.

There have been published implemented algorithms for numerically exact range reduction; but I haven't got my hands on them.

I don't think Bill Plauger got rich trying to explain this, so I'm not expecting much.

I have Windows 7.

I tried three times to calculate sine of 10^1024 radians ,but unfortunately Calc.exe always has been in "hang" state.

Here is the disassembled code from one of the CRT library sin() functions.You can clearly see the x87 'fsin' call.

The investigated CRT sine implementation is called

_CIsin which is compiler optimized version of many sin() implementations.Please read this post:

http://software.intel.com/en-us/forums/showthread.php?t=105474post

#201

MSVCRT.DLL has three sine functions:

1.__libm_sse2_sin >>

This function uses SSE instructions which are part of the Horner scheme approximation of the sine function.2._CIsin>>

which is compiler optimized version of many sin() implementations please lookhere:http://software.intel.com/en-us/forums/showthread.php?t=105474post #2013.sin function>>

asfar as I was able to understand assembly code this function also calls 'fsin'.You can also look at FDLIBM 5 free math library which implements range-reduction for trigonometric functions.

My method is slightly different.

I start with a large radian argument and divide it by 2*Pi radians.Next I have a value which represents a number of full 2*Pi cycles and the decimal part of this number I isolate and extract.

Next this decimal part is multiplied by 2*Pi in order to obtain the proper quadrant from this value I extract decimal part which is multiplied by Pi/2 and the result is subtracted from Pi/2.By knowing the exact part of the full sine cycle which is obtained from the first decimal value I use if- statement which returns a proper quadrant sign.

It is hard to tell without seeing the source code or doing the full disassembling.

It seems to me that probably when the "Extended Precision" mode is enabled Calc.exe could possibly switch to some kind of "poor's man" arbitrary precision arithmetics.

Afaik some of the arbitrary precision implementations storerational numbers as a fractions.

Here is link to Microsoft page describing '

_set_SSE2_enable' :http://msdn.microsoft.com/en-us/library/bthd138d.aspxInteresting that there is no support for trigonometric functions beside the

'atan'.From my own investigation I know that run-time library implements SSE - coded sin() function.

Quoting TimP (Intel)

My search engine skills aren't up to the task of researching this, but surely after all the discussion about howfsindoes range reduction you don't expect a single standard library function or x87 intrinsic (fprem) to do the job...[

SergeyK] IfI will decide to implementmy ownSinebased on Intelfsininstruction I won't do a rangereduction at the beginning. There is a high probability ( let's say 95% )that an input argument in radians

will be in a range from

-2^63to+2^63. So, simply try to usefsinand check for anerror. If there issome error than do a range reduction and call

fsinagain. If there is no error afterthe 1st call tofsinthen return a result.For example, my quick and dirty method for 128-bit long double...[

SergeyK] Unfortunately,I don't use a'long double' FP-type. I've done a couple of tests yesterdayfor a '

double' FP-type ( 53-bit precision ) and my range reduction codeworks well up to10^12*Radians.At

10^14*Radiansaccuracy drops ( just 2 or 3 digits are correct depending on a value of an angle ),and after

10^16*Radiansall resultsare unacceptable.I suppose that 'fsin' is doing range reduction already on the values larger than [-Pi/4,Pi/4].

You can also choose the Java approach anddevelop your own range reduction method and call it if the range is larger than [-Pi/4,Pi/4] after method returns the properly reduced value you can pass it to the fsin.

Quoting iliyapolak

...Interesting that there is no support for trigonometric functions beside the'atan'...I just checked a description for '

_set_SSE2_enable' function in online version ofMSDNandin my local installationMSDNandI see that both are not up-to-date. The list of CRT functions with enabled support for

SSE2instructions is not complete.Sergey please read this post

#71there you can clearly see that Microsoft has various implementation of elementary functions.Take a look at this function '

__libm_sse2_sin'this is a library of SSE2 coded elementary functions.You can use dumpbin.exe and study the MSVCRT.DLL imports.

As Jennifer mentioned previously was coming, the Intel(R) C++ Composer XE 2013 product is now available and resolves the reported issue with vectorization of sin/cos. Let us know if you run into any problems with that version.

Technical Consulting Engineer

For 1:1 technical support: http://premier.intel.com

Software Product Support info: http://www.intel.com/software/support

## Pages

## Leave a Comment

Please sign in to add a comment. Not a member? Join today