• 10/30/2018

Getting Credible Performance Numbers

Performance measurements are done on a large number of invocations of the same routine. Since the first iteration is almost always significantly slower than the subsequent ones, the minimum (or average, geometric mean, and so on) value for the execution time is usually used for final projections.
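As a minimal sketch of the approach described above, the following host-side Python harness times repeated invocations and reduces them to the minimum, arithmetic mean, and geometric mean. The names `time_invocations`, `summarize`, and `run_once` are hypothetical; `run_once` stands in for whatever launches the routine and blocks until it completes (in an OpenCL host program, for example, enqueueing the kernel and then calling `clFinish`).

```python
import math
import statistics
import time


def time_invocations(run_once, iterations=50):
    """Time each of `iterations` calls to `run_once`; return seconds per call."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()  # assumed to block until the work has finished
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce per-call timings to statistics commonly used for projections."""
    logs = [math.log(s) for s in samples]  # requires strictly positive samples
    return {
        "min": min(samples),
        "mean": statistics.fmean(samples),
        "geomean": math.exp(statistics.fmean(logs)),
    }


if __name__ == "__main__":
    # Placeholder CPU workload; replace with the routine under test.
    timings = time_invocations(lambda: sum(range(100_000)), iterations=20)
    print(summarize(timings))
```

Because the first sample usually includes one-time costs, the minimum and the geometric mean tend to be less sensitive to it than the arithmetic mean.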
An alternative to calling the kernel several times is to use a single “warm-up” run.
The warm-up run can help with kernels that perform a small amount of computation, because it takes the following potential one-time costs out of the timed measurement:
  • Bringing data to the cache
  • Lazy object creation
  • Delayed initializations
  • Other costs incurred by the OpenCL™ runtime
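The warm-up pattern can be sketched as follows. This is an illustration under stated assumptions, not the SDK sample's code: `time_with_warmup` and `LazyKernel` are hypothetical names, and `LazyKernel` only imitates a routine whose first call pays for a delayed initialization.

```python
import time


def time_with_warmup(run_once, warmup_runs=1, timed_runs=10):
    """Run untimed warm-up calls, then return the mean seconds per timed call."""
    for _ in range(warmup_runs):
        run_once()  # absorbs one-time costs; deliberately not measured

    start = time.perf_counter()
    for _ in range(timed_runs):
        run_once()
    return (time.perf_counter() - start) / timed_runs


class LazyKernel:
    """Hypothetical stand-in whose first call pays a one-time setup cost."""

    def __init__(self):
        self.table = None
        self.calls = 0

    def __call__(self):
        if self.table is None:
            # Delayed initialization, paid only on the first call.
            self.table = [i * i for i in range(100_000)]
        self.calls += 1
        return sum(self.table[:1000])


if __name__ == "__main__":
    kernel = LazyKernel()
    print(f"mean time per call: {time_with_warmup(kernel):.6f} s")
```

Here the table construction happens during the untimed warm-up call, so the timed loop sees only the steady-state cost.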
Note: You need to base your performance conclusions on reproducible data. If a warm-up run does not help or the execution time still varies, try running a large number of iterations and then averaging the results. For time values that range too much, consider using
Consider the following:
  • For bandwidth-limited kernels that operate on data that does not fit in the last-level cache, the warm-up run does not significantly improve the stability of the measurement.
  • For a kernel that executes a small number of instructions over a small data set, make sure the number of iterations is sufficient for the kernel run time to reach at least 20 milliseconds on a CPU device.
  • Kernels with shorter run times might produce unreliable data, so artificially increasing the amount of computation can give you important insights into the hotspots. For example, you can add a loop in the kernel or replicate some pieces of it.
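One way to reach the 20-millisecond guideline is to grow the iteration count until a timed batch lasts long enough. The sketch below assumes a doubling strategy, which the source does not prescribe; `calibrate_iterations` and `run_once` are hypothetical names, and `run_once` is again assumed to block until the work completes.

```python
import time


def calibrate_iterations(run_once, min_seconds=0.020, max_iterations=1 << 20):
    """Double the iteration count until one timed batch lasts min_seconds.

    Returns (iterations, elapsed_seconds) for the final batch. The cap on
    max_iterations keeps the loop from running away on very fast routines.
    """
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            run_once()
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds or iterations >= max_iterations:
            return iterations, elapsed
        iterations *= 2


if __name__ == "__main__":
    # Placeholder CPU workload; replace with the routine under test.
    n, total = calibrate_iterations(lambda: sum(range(500)))
    print(f"{n} iterations took {total * 1e3:.2f} ms "
          f"({total / n * 1e6:.2f} us per iteration)")
```

Dividing the batch time by the iteration count gives a per-iteration estimate based on a measurement long enough to rise above timer resolution and scheduling noise.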
Refer to the “OpenCL™ Optimizations Tutorial” SDK sample for code examples of performing a warm-up run before starting performance measurements.
