I have been running benchmarks comparing the Intel VML math functions (from the MKL library) with the current OpenCL implementation. There was no measurable difference between the AMD driver and the Intel driver. However, the math functions (sin, cos, exp, log, ...) are not auto-vectorized and reach only about 10% of the speed of Intel VML. I hope you will find some way to improve on that. It really puts an OpenCL-based solution at a huge disadvantage compared to, say, an algorithm written in C++.
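
For anyone who wants to try a similar comparison, here is a minimal sketch of what the C++ side of such a timing could look like. It is not the benchmark code behind the numbers above: `scalar_sin` and `seconds_for` are hypothetical names chosen for illustration, and it uses plain `std::sin` as a scalar baseline rather than VML or an OpenCL kernel.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scalar baseline that applies std::sin to each element.
std::vector<float> scalar_sin(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::sin(in[i]);
    }
    return out;
}

// Hypothetical helper: runs f(in) once, stores the result in out,
// and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_for(F&& f, const std::vector<float>& in, std::vector<float>& out) {
    const auto start = std::chrono::steady_clock::now();
    out = f(in);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `seconds_for(scalar_sin, input, output)` on a large array, and then doing the same for whichever vectorized or OpenCL path is being tested, gives a throughput ratio like the one quoted above. A single run is noisy, so in practice it is worth repeating the measurement several times and taking the best or median time.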