I have never seen a compiler (GNU or Intel) generate Newton-Raphson (NR) constructs for faster double precision (DP) divides or square roots. I know that there are no DP equivalents of RCPSS, RCPPS, RSQRTSS and RSQRTPS. Three questions:
- Why are there no DP equivalents of RCPSS, RCPPS, RSQRTSS and RSQRTPS?
- Is it possible, with compiler flags, to generate NR constructs for DP using the existing fast single precision RCP and RSQRT instructions (with a higher number of NR iterations, probably 4 or 5 instead of 2, something like that)?
- If it is not possible, why not? Is it inefficient? Is there no demand for, or interest in, faster DP divides or square roots with precision close to full DP?
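
To make the second question concrete, here is a minimal sketch in C of the kind of construct I have in mind for a DP reciprocal. The function name recip_nr and its steps parameter are my own, purely for illustration. It takes the seed from RCPSS through the _mm_rcp_ss intrinsic (with a plain float divide as a fallback where SSE is not available) and refines it in double precision with y <- y * (2 - x * y). Since each NR step roughly doubles the number of correct bits, a seed of about 12 bits may need only about 3 steps rather than 4 or 5, but the result is not guaranteed to be correctly rounded.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_RCPSS 1
#endif

/*
 * Approximate 1/x in double precision, starting from a single-precision seed.
 * Hypothetical helper for illustration only: it assumes x is finite, nonzero
 * and within the range of float; zeros, infinities, NaNs and values that
 * overflow or underflow when converted to float are not handled.
 */
static double recip_nr(double x, int steps)
{
    assert(steps >= 0);

#ifdef HAVE_RCPSS
    /* RCPSS: fast approximate reciprocal, roughly 12 bits of precision. */
    float seed = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss((float)x)));
#else
    /* Fallback (an assumption of this sketch): ordinary float divide. */
    float seed = 1.0f / (float)x;
#endif

    double y = (double)seed;

    /* Newton-Raphson for f(y) = 1/y - x: the relative error is squared
       at every step until it reaches the rounding noise of double. */
    for (int i = 0; i < steps; ++i)
        y = y * (2.0 - x * y);

    return y;
}
```

For example, recip_nr(3.0, 3) should agree with 1.0 / 3.0 to within a few ulps. Whether this is actually faster than a hardware DP divide depends on the CPU, which is part of what I am asking.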
Thank you in advance