• 2019 Update 4
  • 03/20/2019
  • Public Content

Getting Credible Performance Numbers

Performance measurements are done on a large number of invocations of the same routine. Since the first iteration is almost always significantly slower than the subsequent ones, the minimum value for the execution time is usually used for final projections. Projections could also be made using other measures, such as the average or the geometric mean of the execution time.
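The measures named above can be computed from repeated timings as in the following minimal sketch. It is an illustration only, not code from the SDK: `kernel` is a hypothetical callable that is assumed to block until the work is finished (in an OpenCL host program this would mean enqueueing the kernel and waiting for completion, for instance with clFinish), and the function names are made up for this example.

```python
import statistics
import time


def time_invocations(kernel, iterations=100):
    """Return the wall-clock time of each invocation, in seconds.

    `kernel` is a hypothetical stand-in for a blocking kernel launch.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        kernel()  # assumed to return only after the work has completed
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Compute the minimum, arithmetic mean, and geometric mean."""
    return {
        "min": min(samples),
        "mean": statistics.fmean(samples),
        # geometric_mean requires strictly positive samples
        "geomean": statistics.geometric_mean(samples),
    }


if __name__ == "__main__":
    timings = time_invocations(lambda: sum(range(10_000)), iterations=50)
    for name, value in summarize(timings).items():
        print(f"{name}: {value * 1e6:.1f} us")
```

The minimum is the least sensitive to a slow first iteration, while the geometric mean damps the influence of occasional outliers more than the arithmetic mean does.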
An alternative to calling the kernel many times is to use a single “warm-up” run.
The warm-up run might be helpful for small or "lightweight" kernels, for example, kernels with an execution time of less than 10 milliseconds. Specifically, it helps to amortize the following potential (one-time) costs:
  • Bringing data to the cache
  • “Lazy” object creation
  • Delayed initializations
  • Other costs incurred by the OpenCL™ runtime
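A warm-up run can be sketched as one or more untimed invocations before the timed ones. The example below is a rough illustration under stated assumptions: `LazyKernel` is a hypothetical stand-in that imitates delayed initialization on its first call, and `measure_after_warmup` is a made-up helper name. Neither is part of the OpenCL API or the SDK sample.

```python
import time


class LazyKernel:
    """Hypothetical stand-in that pays a one-time setup cost on first call."""

    def __init__(self):
        self.table = None
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.table is None:
            # Imitates "lazy" object creation / delayed initialization.
            self.table = [i * i for i in range(200_000)]
        return sum(self.table[:1000])


def measure_after_warmup(kernel, warmup_runs=1, timed_runs=10):
    """Run `kernel` untimed `warmup_runs` times, then time `timed_runs` calls."""
    for _ in range(warmup_runs):
        kernel()  # result and time discarded: absorbs the one-time costs
    samples = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        kernel()
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    samples = measure_after_warmup(LazyKernel(), warmup_runs=1, timed_runs=5)
    print([f"{s * 1e6:.1f} us" for s in samples])
```

Because the setup work happens during the discarded call, the timed samples reflect only the steady-state cost of the stand-in kernel.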
You need to build your performance conclusions on reproducible data. If the warm-up run does not help or execution time still varies, you can try running a large number of iterations and then average the results. For time values that range too much use
Consider the following:
  • For bandwidth-limited kernels, which operate on data that does not fit in the last-level cache, the “warm-up” run does not have as much impact on the measurement.
  • For a kernel with a small number of instructions executed over a small data set, make sure there is a sufficient number of iterations, so the kernel runs for at least 20 milliseconds.
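One way to reach the 20-millisecond guideline for a very short kernel is to grow the iteration count until the whole timed batch runs long enough. The sketch below is only one possible approach, under the assumption that `kernel` is a blocking callable; the helper name, the doubling strategy, and the `max_iterations` cap are choices made for this illustration and do not come from the source.

```python
import time


def calibrate_iterations(kernel, min_total_seconds=0.020,
                         max_iterations=1 << 24):
    """Double the iteration count until one batch lasts long enough.

    Returns (iterations, elapsed_seconds) for the final batch.
    `kernel` is a hypothetical blocking callable.
    """
    iterations = 1
    while True:
        begin = time.perf_counter()
        for _ in range(iterations):
            kernel()
        elapsed = time.perf_counter() - begin
        # Stop once the batch meets the target, or at the safety cap.
        if elapsed >= min_total_seconds or iterations >= max_iterations:
            return iterations, elapsed
        iterations *= 2


if __name__ == "__main__":
    n, total = calibrate_iterations(lambda: sum(range(100)))
    print(f"{n} iterations, {total * 1e3:.2f} ms total, "
          f"{total / n * 1e6:.3f} us per call")
```

Dividing the batch time by the iteration count gives a per-call estimate that is less affected by timer resolution than timing a single short call.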
Kernels that are very lightweight do not give reliable data, so making them artificially heavier could give you important insights into the hotspots. For example, you can add a loop to the kernel, or replicate its heavy pieces.
Refer to the “OpenCL Optimizations Tutorial” SDK sample for code examples of performing the warm-up activities before starting performance measurement. You can download the sample from the Intel® SDK for OpenCL Applications website at intel.com/software/opencl/.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804