When comparing OpenCL* kernel performance with native code (for example, C or Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics), make sure that both versions are as similar as possible:
For example, the OpenCL* rsqrt(x) built-in is inherently more accurate than the _mm_rsqrt_ps SSE intrinsic. To use the same accuracy in native code and OpenCL* code, do one of the following:
Use _mm_rsqrt_ps in your native code with a couple of additional Newton-Raphson iterations to match the precision of the OpenCL* rsqrt.
Use native_rsqrt in your OpenCL* kernel, which maps exactly to the rsqrtps instruction in the final assembly code.
As with rsqrt, there are relaxed versions of sqrt, etc. Refer to “Working with the -cl-fast-relaxed-math Flag” in the User’s Guide (see Related Documents) for the full list.
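The first option above, refining the SSE reciprocal square root estimate with Newton-Raphson iterations, can be sketched in native code roughly as follows. This is a minimal illustration, not the guide's own implementation: the function names rsqrt_refined and worst_residual are hypothetical, the number of iterations is left as a parameter, and the inputs are assumed to be positive and finite.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Approximate 1/sqrt(x) for four positive, finite floats.
 * Starts from the low-precision rsqrtps estimate and applies `iters`
 * Newton-Raphson steps: y <- y * (1.5 - 0.5 * x * y * y).
 * (Hypothetical helper, named here for illustration only.) */
static __m128 rsqrt_refined(__m128 x, int iters)
{
    assert(iters >= 0);
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y = _mm_rsqrt_ps(x);  /* initial hardware estimate */
    for (int i = 0; i < iters; ++i) {
        __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
        y = _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
    }
    return y;
}

/* Hypothetical accuracy check: worst |1 - x*y*y| over the four lanes,
 * evaluated in double precision. For small errors this residual is about
 * twice the relative error of y. */
static double worst_residual(const float in[4], int iters)
{
    float out[4];
    double worst = 0.0;
    _mm_storeu_ps(out, rsqrt_refined(_mm_loadu_ps(in), iters));
    for (int i = 0; i < 4; ++i) {
        double r = 1.0 - (double)in[i] * out[i] * out[i];
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

Calling worst_residual on the same inputs with iters set to 0, 1, and 2 shows how much each step tightens the result. How many steps are needed to match the rsqrt of a particular OpenCL* implementation is something to measure rather than assume.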