Distributing Shadow Map Calculations

Before I joined Intel, one of the recurring issues in balancing performance and quality in games was how to handle shadows for slowly changing or stationary lights such as the sun. For example, a recent game I worked on used cascaded shadow maps together with a low-resolution shadow map of the large-scale terrain covering the entire map. The computation for these shadow maps was distributed over several frames: the nearby shadow map was calculated every frame, while more distant shadow maps were calculated less frequently. Reducing the rate at which shadow maps are calculated improves performance at some cost in quality. However, one of the limiting factors is frame rate hitching (brief drops in frame rate), which can occur when the rendering load isn’t evenly balanced across frames.

A while back, I looked at whether slowly changing shadow maps could be calculated at a decent rate on the CPU, using the Windows Advanced Rasterization Platform (WARP), and then transferred to the graphics adapter for use in scene shading. Since most modern CPUs have several hardware threads that can run different code, this calculation could run asynchronously alongside normal rendering, with the result transferred to the graphics adapter when complete. The prototype was interesting enough that Zane Mankowski, along with Jeff Andrews and Steve Smith, took up the work of turning it into a sample and article. It will soon be available along with other samples targeting processors based on the microarchitecture codenamed Sandy Bridge.

I’ll update this blog (or perhaps even write a whole new post) when the sample goes live.

[Quick update] The sample, called Onloaded Shadows, has just gone live along with others here.

[Yet Another Quick update] There's now an article up on Gamasutra, along with some discussion.