Software Occlusion Culling Update

Note: This version is obsolete but is being retained for historical purposes. Please check out the latest version: Software Occlusion Culling

This is an update to the Software Occlusion Culling sample. It adds new features and optimizations that reduce the total cull time by a factor of 4 and the total frame time by a factor of 2. Below is a screenshot of the updated sample.

Here is a list of the new features / updates that are included in this version of the sample:

  • New set of Occluders
  • New depth buffer view
  • VS2012 support
  • Rasterizer optimizations
  • Pipelining

New set of Occluders:
In the previous version, all castle walls, along with the wooden pillars, tiny wooden trim, and decorations, were used as occluders to avoid special pre-processing of the art assets. However, narrow pillars and tiny decorations are not good occluder candidates. In this version, we chose only objects that are large enough to occlude other objects in the scene: the castle walls (without the pillars and the wooden decorations) and the ground plane. This reduces the number of occluders that have to be rasterized to the depth buffer from 1628 in the previous version to 115. Below is a screenshot of the occluders in the scene.

New depth buffer view:
The sample supports a greyscale depth buffer view, as shown below.
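
One way such a view could be produced is to normalize the depth values and map them to 8-bit grey levels. The C++ sketch below assumes a linear min/max normalization of a float depth buffer. `DepthToGrey` is a hypothetical helper, not a function from the sample, and the sample's actual mapping may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the sample): maps float depth values to
// 8-bit grey levels using a linear min/max normalization. This is an assumed
// mapping chosen for illustration only.
std::vector<std::uint8_t> DepthToGrey(const std::vector<float>& depth)
{
    std::vector<std::uint8_t> grey(depth.size(), 0);
    if (depth.empty())
        return grey;

    const auto minMax = std::minmax_element(depth.begin(), depth.end());
    const float minDepth = *minMax.first;
    const float range = *minMax.second - minDepth;
    if (range <= 0.0f)
        return grey;  // flat buffer: every pixel maps to 0

    for (std::size_t i = 0; i < depth.size(); ++i)
    {
        const float t = (depth[i] - minDepth) / range;  // normalized to [0, 1]
        grey[i] = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
    }
    return grey;
}
```

With this mapping, the smallest depth value in the buffer becomes 0 and the largest becomes 255. Whether near or far surfaces should appear bright depends on the depth convention in use.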


VS2012 support:

The sample can now be compiled in VS2012. There are two projects each for VS2010 and VS2012. The projects with ‘AVX’ in their names (SoftwareOcclusionCullingDX_2012_AVX / SoftwareOcclusionCullingDX_2010_AVX) are compiled with the /arch:AVX flag and can be run only on systems that support AVX. Use the other projects (SoftwareOcclusionCullingDX_2012 / SoftwareOcclusionCullingDX_2010) on systems without AVX support.

Rasterizer Optimizations:
Fabian Giesen has been optimizing this sample on GitHub and maintaining a blog about the work. Most of his optimizations have been integrated into the sample.

Pipelining:
When software occlusion culling is enabled, the occluders are rasterized to the depth buffer on the CPU once every frame. The occludee AABBs are then rasterized and depth tested against this CPU-rasterized depth buffer to generate the list of visible models to send to the GPU for rendering. When pipelining is enabled, the sample does not wait for the software occlusion culling algorithm to complete and generate the list of visible models. Instead, occlusion culling is kicked off in frame n, and the resulting list of visible models is sent to the GPU for rendering in frame n+1.
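
The one-frame latency described above can be sketched in C++ as follows. This is a minimal illustration, not the sample's code: `PipelinedCuller`, `CullFn`, and `BeginFrame` are hypothetical names, `std::async` stands in for whatever task system the sample uses, and returning an empty list on the first frame is an assumption.

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for the culling pass: takes a frame number and
// returns the IDs of the models found visible for that frame.
using CullFn = std::function<std::vector<int>(int frame)>;

class PipelinedCuller
{
public:
    explicit PipelinedCuller(CullFn cull) : mCull(std::move(cull)) {}

    // Called once per frame. Collects the visible list produced by the
    // culling job started in the previous frame (empty on the first frame,
    // by assumption), then kicks off culling for the current frame and
    // returns without waiting for that new job.
    std::vector<int> BeginFrame(int frame)
    {
        std::vector<int> visible;
        if (mPending.valid())
            visible = mPending.get();  // result of the job started last frame
        mPending = std::async(std::launch::async, mCull, frame);
        return visible;
    }

private:
    CullFn mCull;
    std::future<std::vector<int>> mPending;
};
```

In this sketch, the list returned by `BeginFrame(n + 1)` is the one computed by the job started in `BeginFrame(n)`, so rendering always uses visibility that is one frame old. The call to `get()` blocks only if the previous frame's job has not finished yet.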


The performance of the updated Software Occlusion Culling sample was measured on a 2.3 GHz 3rd gen Intel® Core™ processor (Ivy Bridge) system with 4 cores / 8 threads and Intel® HD Graphics 4000. We set the rasterizer technique to SSE, the occluder size threshold to 1.5, the occludee size threshold to 0.01, and the number of depth test tasks to 20. We enabled frustum culling and multi-tasking and disabled vsync.
The castle scene has 115 occluder models and 48700 occluder triangles. It has 27025 occludee models (occluders are treated as occludees) and ~1.9 million occludee triangles.

The time taken to rasterize the occluders to the depth buffer on the CPU was ~0.71 milliseconds, and the time taken to depth test the occludees was ~0.67 milliseconds. The total time spent on software occlusion culling was ~1.38 milliseconds.

                    No Optimizations    Multi-threading + Frustum Culling    Multi-threading + Frustum Culling + Depth test Culling
Frame rate (fps)    7.51                19.56                                70.11
Frame time (ms)     133.51              51.12                                14.26
# of draw calls     23279               7360                                 1831

Cool stuff here! I believe I found a slight issue. It's an edge case, but I thought I'd share my findings :) If there are no occluders, the SSE implementation of the depth rasteriser doesn't update the depth summary (aka the hi-z).

i.e. DepthBufferRasterizerSSEST::RasterizeBinnedTrianglesToDepthBuffer() in the allBinsEmpty case, needs to break rather than return, so CreateCoarseDepth() is always called.

Regardless, great work! Very impressive occlusion solution.
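
The control-flow issue reported in the comment above can be illustrated with a simplified model. The C++ sketch below is hypothetical: only the names `allBinsEmpty` and `CreateCoarseDepth` come from the comment, while `TileRasterizerModel`, its `Rasterize` method, and the loop body are stand-ins that do not reproduce the sample's rasterizer.

```cpp
#include <vector>

// Simplified, hypothetical model of the early-exit behaviour. The real
// function processes binned triangles with SSE; here a bin is just a count.
struct TileRasterizerModel
{
    bool coarseDepthUpdated = false;

    void CreateCoarseDepth() { coarseDepthUpdated = true; }

    // trianglesPerBin[i] is the number of triangles binned to bin i.
    // exitWithBreak selects the suggested fix (break) over the reported
    // behaviour (return) when every bin is empty.
    void Rasterize(const std::vector<int>& trianglesPerBin, bool exitWithBreak)
    {
        coarseDepthUpdated = false;
        bool done = false;
        while (!done)
        {
            bool allBinsEmpty = true;
            for (int count : trianglesPerBin)
            {
                if (count > 0)
                    allBinsEmpty = false;
            }
            done = true;  // this model processes all bins in a single batch

            if (allBinsEmpty)
            {
                if (exitWithBreak)
                    break;  // leave the loop, still reach CreateCoarseDepth()
                return;     // leave the function, depth summary is skipped
            }
            // ... triangles would be rasterized to the depth buffer here ...
        }
        CreateCoarseDepth();
    }
};
```

In this model, calling `Rasterize({}, false)` leaves `coarseDepthUpdated` false, while `Rasterize({}, true)` sets it, which mirrors the difference between `return` and `break` that the comment describes.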

Hi Maarten, we have fixed the zip file; for some reason it failed to upload correctly the first time. We have also uploaded a tar.gz file. Hope this helps.

FYI, when I test the zip file it says;

End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of or, and cannot find, period.

Sample code ZIP file is still broken. Verified it on OSX, Windows and Linux. Can you provide a tar.gz file? Thanks

I have tested the zip file by unzipping it on a couple of different machines and it works fine. Can you please download the zip file again and try it?

I cannot unzip it, either.

I downloaded the zip and ran the sample from it. It seems to work fine for me. Can you give more details on what is broken or what errors you are seeing?

The file seems to be broken. Can you check the file for errors? Thanks.
