I'm posting an update related to my VecLib project. While testing a slightly optimized version of the sine function, in which the series is summed with SSE inline assembly, I ran into a problem. I eliminated one instruction that performed the explicit multiplication of the argument by x^2, which brought the instruction count per term down to three, but the accuracy fell sharply, to only 2-3 decimal places. Double-precision primitives were used, so the precision of the data type cannot be blamed for the inaccurate result.
I suspect the loss of accuracy comes from combining two operations in the same xmm0 register, which served as the accumulator: the multiplication of the argument by a pre-calculated coefficient and the exponentiation of the argument.
I rewrote the inline asm block and took that extra work off the xmm0 register by adding another instruction that multiplies the argument by x^2, and the problem disappeared.
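To illustrate how a shared accumulator could produce an error of this size, here is a minimal sketch in plain C instead of SSE inline assembly. It is only a guess at the mechanism and is not the actual VecLib code: the function names, the COEF table, and the number of terms are all hypothetical. In the second function the coefficient stays folded into the same variable that carries the running power of x, so every later term is scaled by the earlier coefficients as well.

```c
#include <math.h>

/* Hypothetical coefficients: alternating 1/n! for the x^3 .. x^11 terms. */
static const double COEF[] = {
    -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0, 1.0 / 362880.0, -1.0 / 39916800.0
};
enum { NTERMS = sizeof COEF / sizeof COEF[0] };

/* The running power of x lives in its own variable ("register"):
   power *= x^2, then term = power * coefficient. */
double sin_separate_power(double x)
{
    double x2 = x * x;
    double power = x;   /* holds x, x^3, x^5, ... */
    double sum = x;
    for (int i = 0; i < NTERMS; ++i) {
        power *= x2;              /* explicit multiplication by x^2 */
        sum += power * COEF[i];   /* the coefficient never touches 'power' */
    }
    return sum;
}

/* One accumulator does both jobs: after the first term it no longer
   holds a clean power of x, because the coefficient was multiplied in. */
double sin_shared_accumulator(double x)
{
    double x2 = x * x;
    double acc = x;
    double sum = x;
    for (int i = 0; i < NTERMS; ++i) {
        acc *= x2;        /* meant to advance the power of x */
        acc *= COEF[i];   /* but the coefficient stays in acc */
        sum += acc;
    }
    return sum;
}
```

Under these assumptions, at x = 0.5 the first function agrees with the library sin() to better than 1e-10, while the second is off by roughly 3e-4, which is about three correct decimal places. That is the same order of error I saw, but it only shows one possible cause.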
Please look at the inline assembly code blocks in the Vec_Sin_f() and Vec_Cos_d() functions.
Here is the part of the optimized code that is responsible for the loss of accuracy. This code calculates the third term of the Taylor expansion by means of the Horner scheme.
mulpd xmm0,xmm1
This part of the code is responsible for the inaccurate result.
And here is the corresponding part of the corrected code. I added another instruction that performs the multiplication of the argument by x^2 in a different register.
Thanks in advance.