Deferred Rendering for Current and Future Rendering Pipelines

By Andrew Lauritzen
Intel Corporation

Beyond Programmable Shading, SIGGRAPH Courses 2010

This sample demonstrates a number of deferred rendering techniques including conventional deferred shading, deferred lighting and tile-based deferred shading. Tile-based deferred shading is implemented in DirectX 11 Compute Shader and achieves high performance even with many lights by amortizing the light culling overhead over screen tiles as well as grouping lights to avoid wasting memory bandwidth. Multi-sample anti-aliasing (MSAA) is fully supported for all techniques. The tile-based techniques in particular use efficient user-space scheduling to apply per-sample shading to only the edge pixels that require it.
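To make the two ideas in the abstract concrete, here is a minimal CPU-side sketch in Python. It is not the sample's compute shader code. The first function bins lights once per screen tile, so the culling cost is shared by every pixel in that tile. The second builds a compact work list holding only the extra MSAA samples of edge pixels. The names `Light`, `TILE_SIZE`, `cull_lights_per_tile` and `deferred_sample_work` are hypothetical, and treating each light as a screen-space circle is an assumption made for brevity (a real implementation would typically test light volumes against per-tile frusta and depth bounds).

```python
from dataclasses import dataclass

TILE_SIZE = 16  # assumed tile edge in pixels, chosen only for illustration


@dataclass
class Light:
    # Hypothetical simplification: a light is a screen-space circle of influence.
    x: float
    y: float
    radius: float


def cull_lights_per_tile(lights, width, height, tile=TILE_SIZE):
    """Map each tile (tx, ty) to the indices of lights whose circle overlaps it."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile, ty * tile
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            hits = []
            for i, light in enumerate(lights):
                # Closest point of the tile rectangle to the light centre.
                cx = min(max(light.x, x0), x1)
                cy = min(max(light.y, y0), y1)
                if (light.x - cx) ** 2 + (light.y - cy) ** 2 <= light.radius ** 2:
                    hits.append(i)
            bins[(tx, ty)] = hits
    return bins


def deferred_sample_work(edge_flags, sample_count):
    """List the (pixel, sample) pairs still to shade after sample 0 of every pixel."""
    return [(pixel, sample)
            for pixel, is_edge in enumerate(edge_flags) if is_edge
            for sample in range(1, sample_count)]


# Usage: a 32x32 screen (four tiles) with two lights, then one tile's edge flags.
lights = [Light(8, 8, 4), Light(16, 16, 1)]
bins = cull_lights_per_tile(lights, 32, 32)
work = deferred_sample_work([False, True, False, True], sample_count=4)
```

In this sketch every pixel shades sample 0 against its tile's light list, and the entries in `work` would then be spread evenly over the threads of the tile. That is one plausible reading of the "user-space scheduling" described above, not a statement of how the sample implements it.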

Citation: Andrew Lauritzen, Deferred Rendering for Current and Future Rendering Pipelines, Beyond Programmable Shading, SIGGRAPH 2010, July 2010.

Slides: deferred_shading_siggraph_2010.pdf (990KB)

Code: deferred_shading.zip (40MB)

Videos & Screenshots

Screenshot 1

Screenshot 2
Andrew Lauritzen
08-14-2010

Comments

I implemented this method and was surprised that the DEFER_PER_SAMPLE version was actually slower than the classic way of handling deferred MSAA 'edge' texels.

I thought my lighting function might be too complicated, so I ran this sample again (on my Radeon HD 7950), and the sample itself is also slower with DEFER_PER_SAMPLE: I got ~300 fps with DEFER_PER_SAMPLE and ~400 fps with tiled deferred shading alone.

I remember that a few years ago this method was a gain, so I'm curious whether the authors could give some clues as to why it is no longer worthwhile to defer the MSAA samples, and why it now seems better to process them "brute-force" and pray for warp/branch coherency?

Thanks.