• 10/30/2018

Getting Credible Performance Numbers

Performance measurements should be based on a large number of invocations of the same routine. Because the first iteration is almost always significantly slower than the subsequent ones, the minimum (or the average, geometric mean, and so on) of the execution times is usually used for final projections.
An alternative to calling the kernel several times is to use a single “warm-up” run.
The warm-up run can be helpful for kernels with a small amount of computation, because it helps to amortize the following potential one-time costs:
  • Bringing data to the cache
  • Lazy object creation
  • Delayed initializations
  • Other costs, incurred by the OpenCL™ runtime
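The measurement pattern described above can be sketched on the host side as follows. This is a minimal illustration, not the SDK sample's code: `time_runs` and `make_fake_kernel` are hypothetical names, and the stand-in callable only imitates a kernel whose first call pays a one-time cost. In a real host program the callable would have to enqueue the kernel and wait for it to complete (for example with `clFinish`), so that the timer covers execution rather than just submission.

```python
import time


def time_runs(run_kernel, iterations=10, warmup=1):
    """Return per-call wall-clock times, after `warmup` untimed calls."""
    for _ in range(warmup):
        run_kernel()  # untimed: absorbs one-time costs such as lazy initialization
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_kernel()
        samples.append(time.perf_counter() - start)
    return samples


def make_fake_kernel(first_call_cost=0.01):
    """Hypothetical stand-in for a kernel launch whose first call is slow."""
    state = {"initialized": False}

    def run():
        if not state["initialized"]:
            time.sleep(first_call_cost)  # imitates delayed initialization
            state["initialized"] = True
        sum(i * i for i in range(10_000))  # imitates the steady-state work

    return run


if __name__ == "__main__":
    samples = time_runs(make_fake_kernel(), iterations=20, warmup=1)
    print(f"min  {min(samples) * 1e3:.3f} ms")
    print(f"mean {sum(samples) / len(samples) * 1e3:.3f} ms")
```

Setting `warmup=0` in this sketch lets you see how much the first call distorts the average, which is one way to check whether a warm-up run matters for a given kernel.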
NOTE: Base your performance conclusions on reproducible data. If a warm-up run does not help, or the execution time still varies, try running a large number of iterations and then averaging the results. For time values that vary widely, consider using the geometric mean (geomean).
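To illustrate why the geometric mean can be preferable for widely varying times, the sketch below compares it with the arithmetic mean. The function names and the sample timings are made up for illustration; they are not measurements from any device.

```python
import math


def arithmetic_mean(samples):
    """Plain average of the samples."""
    return sum(samples) / len(samples)


def geometric_mean(samples):
    """exp(mean(log(x))); every sample must be strictly positive."""
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


if __name__ == "__main__":
    # Made-up timings in milliseconds with a single large outlier.
    samples_ms = [1.0, 1.0, 1.0, 100.0]
    print(f"arithmetic mean: {arithmetic_mean(samples_ms):.2f} ms")  # 25.75
    print(f"geometric mean:  {geometric_mean(samples_ms):.2f} ms")   # 3.16
```

In this made-up data set, the single outlier pulls the arithmetic mean far from the typical run, while the geometric mean stays much closer to it.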
Consider the following:
  • For bandwidth-limited kernels that operate on data that does not fit in the last-level cache, the warm-up run does not significantly improve the stability of the measurement.
  • For a kernel with a small number of instructions executed over a small data set, make sure there is a sufficient number of iterations, so that the kernel run time is at least 20 milliseconds on a CPU device.
  • Kernels with shorter run times might provide unreliable data, so artificially increasing the amount of computation can give you important insights into the hotspots. For example, you can add a loop to the kernel or replicate some of its pieces.
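One way to pick a sufficient number of iterations is to grow the count until a timed batch reaches the 20 millisecond guideline. The sketch below assumes that "iterations" means repeated host-side invocations timed as one batch, which is one reading of the advice above; `calibrate_iterations` and `tiny_kernel` are hypothetical names, and the stand-in workload is not an OpenCL kernel.

```python
import time

MIN_TOTAL_SECONDS = 0.020  # the 20 ms guideline for a CPU device


def calibrate_iterations(run_kernel, min_total=MIN_TOTAL_SECONDS,
                         max_iterations=1 << 20):
    """Double the iteration count until one timed batch lasts `min_total`.

    Returns (iterations, elapsed_seconds) for the last batch. The search
    stops at `max_iterations` even if the target time was not reached.
    """
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            run_kernel()
        elapsed = time.perf_counter() - start
        if elapsed >= min_total or iterations >= max_iterations:
            return iterations, elapsed
        iterations *= 2


def tiny_kernel():
    """Hypothetical stand-in for a very short kernel."""
    sum(range(1000))


if __name__ == "__main__":
    n, elapsed = calibrate_iterations(tiny_kernel)
    print(f"{n} iterations took {elapsed * 1e3:.2f} ms "
          f"({elapsed / n * 1e6:.2f} us per call)")
```

Doubling keeps the calibration itself cheap: the total time spent searching is at most about twice the duration of the final batch.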
Refer to the “OpenCL™ Optimizations Tutorial” SDK sample for code examples of performing a warm-up run before starting performance measurements.
