For my personal use I have implemented routines to compute natural logarithms on vectors in single precision. They are faster than the MKL "EP" version and essentially as accurate (ulp<0.95 vs. ulp<0.88) as the MKL "LA" version.
The basic algorithm is an 11th order optimal polynomial for 0.75
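To illustrate the general idea (range reduction followed by an 11th order polynomial), here is a minimal scalar sketch. It is not the actual routine: the name log_sketch is hypothetical, the reduced interval [0.75, 1.5) is an assumption, and it uses plain Taylor coefficients of log(1+t) instead of optimal ones, so it is only good to roughly 2e-5 absolute error rather than sub-ulp accuracy.

```cpp
#include <cmath>

// Hypothetical scalar sketch, NOT the SSE routine from this post.
// Assumes x is positive and finite.
float log_sketch(float x) {
    int e;
    float m = std::frexp(x, &e);      // x = m * 2^e with m in [0.5, 1)
    if (m < 0.75f) {                  // shift mantissa into the assumed
        m *= 2.0f;                    // interval [0.75, 1.5)
        e -= 1;
    }
    float t = m - 1.0f;               // t in [-0.25, 0.5)

    // Taylor coefficients of log(1+t), highest order first:
    // (-1)^(k+1)/k for k = 11 down to 1. An optimal (minimax) fit
    // would use different values and be far more accurate.
    static const float c[11] = {
         1.0f / 11, -1.0f / 10,  1.0f / 9, -1.0f / 8,
         1.0f / 7,  -1.0f / 6,   1.0f / 5, -1.0f / 4,
         1.0f / 3,  -1.0f / 2,   1.0f
    };

    // Horner's scheme: one multiply and one add per term.
    float p = c[0];
    for (int k = 1; k < 11; ++k) {
        p = p * t + c[k];
    }
    p *= t;                           // the series has no constant term

    // log(x) = e * ln(2) + log(m)
    return e * 0.69314718f + p;
}
```

For example, log_sketch(10.0f) reduces to m = 1.25, e = 3 and returns a value close to 2.302585.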
The functions are:
//for large vectors
void log( float* y, float* x, long int n );
//for small vectors
__v4sf log( __v4sf x );
which you can get from here:
(compiles on 64-bit linux with g++-4.2.1 )
It is obvious that a similar speed-up can also be achieved for 'sin', 'cos', 'exp', etc. I also think it is possible to significantly speed up the double precision versions: every extra term in the series adds only 0.25 cycles/element, so going from an 11th to a 21st order polynomial (10 extra terms) adds only 2.5 cycles/element, and I'd expect a double precision version to complete in about 12 cycles.
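The per-term cost is easiest to see in a packed Horner loop: each extra coefficient costs one packed multiply and one packed add, shared by four single precision elements. The sketch below is only an illustration using the standard SSE intrinsics from <xmmintrin.h>; the helper name horner4 is hypothetical, and how the two instructions translate into cycles per element depends on the CPU.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: evaluates
//   c[0]*t^(n-1) + c[1]*t^(n-2) + ... + c[n-1]
// for four values of t at once.
__m128 horner4(const float* c, int n, __m128 t) {
    __m128 p = _mm_set1_ps(c[0]);
    for (int k = 1; k < n; ++k) {
        // One packed multiply and one packed add per extra term,
        // amortized over the four lanes.
        p = _mm_add_ps(_mm_mul_ps(p, t), _mm_set1_ps(c[k]));
    }
    return p;
}
```

Raising the polynomial order only lengthens this loop, which is why the extra cost grows linearly with the number of terms.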
Personally I don't need these functions or the higher precision, but if anyone is interested... I'm available for part-time hire. And because I really, really enjoy doing stuff like this, I'm a bargain.