I am trying to write a cube root routine that is many times faster than the usual C expression "pow(a,1.0/3.0)". Recently I have received outstanding help from the Intel Fortran Compiler Forum, and one of the routines suggested in that forum was:
The idea is to write a low-level C routine, based on the rational approximation, that is nearly as efficient as the square root routine implemented in hardware. Square root takes 38 clock cycles on a Pentium 4 CPU, and the goal is a software cube root routine that needs something like 70-80 clock cycles. In the Fortran forum it was recommended that I ask in the C forum, since there are many low-level C experts here.
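For comparison, here is a minimal sketch of a portable cube root that does not use the rational approximation from the linked routine. It assumes range reduction with the standard `frexp`/`ldexp` functions followed by a fixed number of Newton iterations; the name `cbrt_sketch`, the crude initial guess and the iteration count are illustrative choices, and no clock-cycle figure is claimed for it.

```c
#include <math.h>

/* Hypothetical sketch (not the routine from the linked thread):
   write |a| = m * 2^(3k) with 0.5 <= m < 4, take the cube root of m
   with Newton's method, then scale the result by 2^k. */
double cbrt_sketch(double a)
{
    if (a == 0.0 || isnan(a) || isinf(a))
        return a;                          /* 0, NaN and +/-inf map to themselves */

    int negative = a < 0.0;
    double x = negative ? -a : a;

    int e;
    double m = frexp(x, &e);               /* x = m * 2^e with 0.5 <= m < 1 */

    int r = ((e % 3) + 3) % 3;             /* 0, 1 or 2, also for negative e */
    m = ldexp(m, r);                       /* now 0.5 <= m < 4 */
    e -= r;                                /* e is now a multiple of 3 */

    /* Crude first guess: first-order Taylor expansion of m^(1/3) around 1.
       It is always >= the true root, at most about 26% too high. */
    double y = 1.0 + (m - 1.0) / 3.0;

    /* Newton step for y^3 = m: y <- (2y + m / y^2) / 3.
       Each step roughly squares the relative error, so six steps are
       more than enough from the guess above. */
    for (int i = 0; i < 6; ++i)
        y = (2.0 * y + m / (y * y)) / 3.0;

    y = ldexp(y, e / 3);                   /* exact division since e % 3 == 0 */
    return negative ? -y : y;
}
```

A call such as `cbrt_sketch(27.0)` should return 3 up to rounding, and comparing against the C99 `cbrt` from `<math.h>` is an easy sanity check. A better first guess, for example from a rational approximation, would mainly reduce the number of Newton steps needed.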
In the routine in the link above I have made the following observations.
1) As both the numerator and the denominator of the rational approximation can be written as (((a*fr + b)*fr + c)*fr + d)*fr + e
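The Horner form in observation 1) can be illustrated with a small hypothetical helper, `rational_horner`, that evaluates the numerator and denominator quartics in the same loop and divides once at the end. The coefficient arrays are placeholders for the actual coefficients a, b, c, d, e of the linked routine, which are not reproduced here.

```c
/* Hypothetical helper: evaluates R(fr) = P(fr) / Q(fr), where P and Q are
   quartics whose coefficients are listed from the highest power down,
   i.e. p[0] = a, p[1] = b, ..., p[4] = e, so that
   P(fr) = (((a*fr + b)*fr + c)*fr + d)*fr + e. */
double rational_horner(const double p[5], const double q[5], double fr)
{
    double num = p[0];
    double den = q[0];

    /* The two Horner chains do not depend on each other, so the
       compiler and CPU are free to overlap their multiply-adds. */
    for (int i = 1; i < 5; ++i) {
        num = num * fr + p[i];
        den = den * fr + q[i];
    }
    return num / den;                      /* single division at the end */
}
```

The two chains only meet at the final division, which is one reason this form is attractive for instruction-level parallelism; whether it pays off in practice depends on the compiler and the CPU.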