• 10/30/2018

Comparing OpenCL™ and Native Code Performance

When comparing the performance of an OpenCL™ kernel on a CPU device with the performance of native code, make sure that both versions of the code are as similar as possible. Consider the following guidelines:
  • Time exactly the same set of operations in both versions.
  • Do not include program build time in the kernel execution time. You can amortize this step by precompiling the program and loading it with the clCreateProgramWithBinary call.
  • Track data transfer costs separately.
  • Use data mapping to make data transfers similar to the way data is passed in native code (through pointers). Refer to the Mapping Memory Objects (USE_HOST_PTR) section.
  • Ensure the working set is identical for native and OpenCL code.
  • Make the memory access patterns equal (row-wise compared to column-wise).
  • Demand the same accuracy. Consider this example for the CPU device: rsqrt(x) is inherently more accurate than the _mm_rsqrt_ps SSE intrinsic. To get the same accuracy in native code and OpenCL code, do one of the following:
    • Follow _mm_rsqrt_ps in your native code with a couple of additional Newton-Raphson iterations to match the precision of OpenCL™ rsqrt.
    • Use native_rsqrt in your OpenCL™ kernel, which maps exactly to the rsqrtps instruction in the final assembly code.
    • Use the relaxed-math compilation flag (-cl-fast-relaxed-math) to enable similar accuracy for the whole program. As with rsqrt, you can use the relaxed versions of rcp, sqrt, and so on. Refer to the Developer Guide for Intel® SDK for OpenCL™ Applications for the full list of supported functions.
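
The Newton-Raphson option for native code can be sketched as follows. This is a minimal illustration, assuming an x86 target with SSE enabled; the helper names rsqrt_nr_step, rsqrt_refined_ps, rsqrt_refined_array, and max_rel_error are hypothetical and not part of any SDK. How many iterations are needed to match a particular OpenCL rsqrt implementation is not established here, so measure the error on your own data before comparing timings.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE enabled */

/* One Newton-Raphson step for y ~ 1/sqrt(x): y_next = y * (1.5 - 0.5 * x * y * y). */
static inline __m128 rsqrt_nr_step(__m128 x, __m128 y)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

/* Hypothetical helper: hardware estimate followed by `iterations` refinement steps. */
static inline __m128 rsqrt_refined_ps(__m128 x, int iterations)
{
    __m128 y = _mm_rsqrt_ps(x);  /* fast, low-precision estimate */
    for (int i = 0; i < iterations; ++i)
        y = rsqrt_nr_step(x, y);
    return y;
}

/* Hypothetical helper: writes a refined 1/sqrt(in[i]) to out[i] for i in [0, n). */
void rsqrt_refined_array(const float *in, float *out, size_t n, int iterations)
{
    assert(in != NULL && out != NULL && iterations >= 0);
    size_t i = 0;
    /* Main loop: four floats per step, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, rsqrt_refined_ps(_mm_loadu_ps(in + i), iterations));
    /* Tail: broadcast the remaining element and keep only the lowest lane. */
    for (; i < n; ++i)
        out[i] = _mm_cvtss_f32(rsqrt_refined_ps(_mm_set1_ps(in[i]), iterations));
}

/* Estimates the largest relative error of out[i] as 1/sqrt(in[i]).
   For an exact result x * y * y == 1, and the residual is about twice
   the relative error of y. Inputs are assumed positive and finite. */
double max_rel_error(const float *in, const float *out, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = (double)in[i] * out[i] * out[i] - 1.0;
        if (r < 0.0)
            r = -r;
        if (0.5 * r > worst)
            worst = 0.5 * r;
    }
    return worst;
}
```

Calling rsqrt_refined_array with iterations set to 0 gives the raw intrinsic result, so the same harness can report the error of the unrefined and refined versions through max_rel_error before either one is compared against the OpenCL kernel.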
