Optimizing Utilization of Execution Units

When you tune programs for the Intel® Graphics device, keep in mind how kernels are executed on the hardware. The main areas to address are:

  • Optimize the number of work-groups
  • Optimize the work-group size
  • Use barriers in kernels wisely
  • Optimize thread utilization

The primary goal of every throughput computing machine is to keep a sufficient number of work-groups active, so that if one is stalled, another can run on its hardware resource.

The primary considerations are:

  • Launch enough work-items to keep the EU threads busy. Keep in mind that the compiler may pack up to 32 work-items per thread (with SIMD-32).
  • In short, lightweight kernels, use short vector data types and compute multiple pixels per work-item to better amortize the thread launch cost.
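
The arithmetic behind these two points can be sketched in plain C. This is only a rough illustration, not part of any Intel API: the helper names `eu_threads_needed` and `work_items_for_pixels` are hypothetical, and the sketch assumes every hardware thread is packed at the full SIMD width, which the compiler does not guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: number of EU hardware threads a launch occupies,
   assuming the compiler packs simd_width work-items into each thread.
   Rounds up, because a partially filled thread still has to be launched. */
size_t eu_threads_needed(size_t work_items, size_t simd_width)
{
    return (work_items + simd_width - 1) / simd_width;
}

/* Hypothetical helper: number of work-items to launch when each work-item
   computes pixels_per_item pixels (for example, one float4 of output).
   Rounds up so that a trailing partial group of pixels is still covered. */
size_t work_items_for_pixels(size_t pixels, size_t pixels_per_item)
{
    return (pixels + pixels_per_item - 1) / pixels_per_item;
}
```

For example, under these assumptions a 1920x1080 image processed one pixel per work-item at SIMD-32 occupies `eu_threads_needed(1920 * 1080, 32)` = 64800 threads. Computing four pixels per work-item reduces the launch to 518400 work-items and 16200 threads, so each thread launch is amortized over four times as much work. Conversely, a launch of only 1024 work-items fills just 32 threads at SIMD-32, which may leave part of the device idle.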
