• 03/25/2021

Temporary Register Variable Usage

Each thread on an execution unit has its own set of registers to store values. The more work that can be done using register-to-register operations, the lower the memory penalties. However, if there are more temporary variables than registers, some of those variables have to be stored in memory, where reading and writing have a latency cost. Avoiding this spillover can help to improve performance.
On Xᵉ-LP, reducing register pressure not only allows a wider SIMD width but also enables significantly better code scheduling. More information is available at https://www.slideshare.net/IntelSoftware/the-architecture-of-11th-generation-intel-processor-graphics/14
When writing shaders, the following guidelines should be considered to help reduce spillover and improve performance:
  • Try to keep the number of temporaries to 16 or fewer per shader. This limits the number of register transfers to and from memory, which has higher latency costs. Check the instruction set assembly code output and look for the spill count. This can be done in Intel GPA by selecting a shader and choosing to look at the machine code generated by the compiler. Spills are a good opportunity for optimization, as eliminating them reduces the number of operations that depend on high-latency memory operations.
  • If possible, move the declaration and assignment of a variable closer to where it will be referenced.
  • Weigh the options between full and partial precision for variables, as partial precision allows more values to be stored in the same register space. Use caution when mixing partial precision with full precision in the same instruction, as it may cause redundant type conversions.
  • Move code that is common to all paths of a branch out of the branch. This can reduce duplicated temporary variables.
  • Avoid non-uniform access to constant buffer/buffer data. Non-uniform access requires more temporary registers to store data per SIMD lane.
  • Avoid control flow decisions based on constant buffer data, as this forces the compiler to generate sub-optimal machine code. Instead, use specialization constants, or generate multiple specialized shader permutations.
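
Two of the guidelines above, hoisting common code out of a branch and replacing a branch on constant buffer data with specialized permutations, can be illustrated with a minimal sketch. It is written in C++ as a stand-in for shader code, not in an actual shading language, and every name in it (Constants, useFog, fogAmount, shadeRuntime, shadePermutation) is hypothetical. The template parameter plays the role that a specialization constant or a shader permutation would play; how a given shader compiler treats the equivalent construct is an assumption to verify in the generated machine code.

```cpp
// Hypothetical stand-in for values read from a constant buffer.
struct Constants {
    bool  useFog;     // runtime flag: the branch below depends on buffer data
    float fogAmount;
};

// Runtime-branch version: the decision is made from constant buffer data,
// so the compiled code has to keep both paths available.
float shadeRuntime(float base, const Constants& cb) {
    // Work common to both paths is computed once, outside the branch,
    // instead of being duplicated inside each side.
    float lit = base * 0.5f;
    if (cb.useFog) {
        return lit + cb.fogAmount;
    }
    return lit;
}

// Permutation version: the flag is a compile-time parameter, so each
// instantiation only needs the path it actually uses (the condition is a
// constant the compiler can fold away).
template <bool UseFog>
float shadePermutation(float base, float fogAmount) {
    float lit = base * 0.5f;
    if (UseFog) {
        return lit + fogAmount;
    }
    return lit;
}
```

A caller would select shadePermutation<true> or shadePermutation<false> once, outside the hot path, much as an application would pick a pipeline built with a particular specialization constant. Both versions return the same values; the difference lies only in when the decision is made.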
