Wide BVH Traversal with a Short Stack

By Karthik Vaidyanathan, Carsten Benthin, and Sven Woop

Published: 11/20/2019   Last Updated: 11/19/2019

Diagram of BVH-4 traversal
Figure 1. Example BVH-4 traversal with a short stack, showing the sorted children Q and the restart trail count at each level. (i) The count at level 1 is set to 4, as there are no entries pushed onto the stack at this level. (ii) Processing node I yields no hit children. Therefore, node J is popped from the stack, incrementing the count at level 2. (iii) Processing node J yields no hit children. Therefore, node K (the last node corresponding to level 2) is popped from the stack and the counter is set to 3. (iv) Processing node K yields no hit children. The following pop operation skips levels 1 and 2 (gray), as they indicate the last child was already traversed, and node B (level 0) is popped from the stack.


Compressed wide bounding volume hierarchies can significantly improve the performance of incoherent ray traversal through a smaller working set of inner nodes and therefore a higher cache hit rate. While inner nodes in the hierarchy can be compressed, the size of the working set for a full traversal stack remains a significant overhead. In this paper we introduce an algorithm for wide bounding volume hierarchy (BVH) traversal that uses a short stack of just a few entries. This stack can be fully stored in scarce on-chip memory, which is especially important for GPUs and dedicated ray tracing hardware implementations. In particular, our approach generalizes the restart trail algorithm for binary BVHs to BVHs of arbitrary widths. Applying our algorithm to wide BVHs, we demonstrate that the number of traversal steps with just five stack entries is close to that of a full traversal stack. We also propose an extension to efficiently cull leaf nodes when a closer intersection has been found, which reduces ray-primitive intersections by up to 14%.
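
To make the short-stack idea more concrete, the following is a minimal Python sketch of a fixed-capacity traversal stack. It is not the paper's implementation: the class name `ShortStack`, the `capacity` parameter, and the `dropped` counter are hypothetical names chosen for illustration. The sketch assumes the usual short-stack convention, in which a push onto a full stack discards the oldest entry and a pop from an empty stack signals that traversal must recover the lost entries some other way. In the paper, that recovery is the role of the restart trail, which is not modeled here.

```python
from collections import deque


class ShortStack:
    """Hypothetical fixed-capacity stack; the oldest entry is lost on overflow."""

    def __init__(self, capacity=5):
        # A deque with maxlen discards from the opposite end when it is full,
        # so appending on the right drops the oldest entry on the left.
        self._entries = deque(maxlen=capacity)
        self.dropped = 0  # how many entries were lost to overflow

    def push(self, node):
        if len(self._entries) == self._entries.maxlen:
            self.dropped += 1
        self._entries.append(node)

    def pop(self):
        """Return the newest entry, or None when the stack is empty.

        None is the cue that a restart would be needed (assumption:
        a restart trail or similar mechanism handles it elsewhere).
        """
        return self._entries.pop() if self._entries else None

    def __len__(self):
        return len(self._entries)


# Usage: a two-entry stack receives three nodes, so the oldest one is dropped.
stack = ShortStack(capacity=2)
for node in ["B", "J", "K"]:
    stack.push(node)
popped = [stack.pop(), stack.pop(), stack.pop()]
```

In this usage, node B is discarded when K is pushed, so the third pop finds the stack empty. A full traversal would have to rediscover B at that point, which is why the number of stack entries trades on-chip memory against extra traversal steps.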

Research Area: Rendering, ray tracing, bounding volume hierarchy (BVH), GPU systems.

Wide BVH Traversal with a Short Stack (1.31 MB, PDF)
HPG 2019 Slides (4.351 MB, PPT)

Published in High-Performance Graphics 2019
