Software Occlusion Culling Update

This is an update to the Software Occlusion Culling sample. It adds new features and optimizations that reduce the total cull time by a factor of 4x and the total frame time by a factor of 2x. Below is a screenshot of the updated sample.

Here is a list of the new features / updates that are included in this version of the sample:

  • New set of Occluders
  • New depth buffer view
  • VS2012 support
  • Rasterizer optimizations
  • Pipelining

New set of Occluders:
In the previous version, all castle walls, together with the wooden pillars, tiny wooden trim, and decorations, were used as occluders to avoid special pre-processing of the art assets. However, narrow pillars and tiny decorations are poor occluder candidates because they hide very little of the scene. In this version, we chose only objects that are large enough to occlude other objects. As shown in the image below, only the castle walls (without the pillars and the wooden decorations) and the ground plane are used as occluders. This reduces the number of occluders that have to be rasterized to the depth buffer from 1628 in the previous version to 115. Below is a screenshot of the occluders in the scene.


New depth buffer view:
The sample supports a greyscale depth buffer view, as shown below.

VS2012 support:

The sample can now be compiled in VS2012. Two projects are provided for each of VS2010 and VS2012. The project with ‘AVX’ in its name (SoftwareOcclusionCullingDX_2012_AVX / SoftwareOcclusionCullingDX_2010_AVX) is compiled with the /arch:AVX flag and can be compiled only on systems that support AVX. Use the other project (SoftwareOcclusionCullingDX_2012 / SoftwareOcclusionCullingDX_2010) on systems that do not support AVX.

Rasterizer Optimizations:
Fabian Giesen has been optimizing this sample on GitHub and writing about the work on his blog. Most of his optimizations have been integrated into the sample.

Pipelining:
When software occlusion culling is enabled, the occluders are rasterized to the depth buffer on the CPU once every frame. The occludee AABBs are then rasterized and depth tested against this CPU-rasterized depth buffer to generate the list of visible models that is sent to the GPU for rendering. When pipelining is enabled, the sample does not wait for the software occlusion culling algorithm to finish and produce that list. Instead, occlusion culling is kicked off in frame n, and the resulting list of visible models is sent to the GPU for rendering in frame n+1, so rendering always uses the visibility results of the previous frame.
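The one-frame hand-off described above can be sketched as follows. This is a minimal illustration, not the sample's actual code: the names PipelinedCuller, VisibleList, and BeginFrame are hypothetical, and std::async merely stands in for whatever task system the sample uses to run the culling work.

```cpp
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical type: indices of the models found visible by culling.
using VisibleList = std::vector<int>;

// Hypothetical helper that overlaps culling for frame n with the
// rendering of the list produced in frame n-1.
class PipelinedCuller
{
public:
    // cull: any callable that returns the visible list for a frame number.
    explicit PipelinedCuller(std::function<VisibleList(int)> cull)
        : mCull(std::move(cull)) {}

    // Kicks off culling for `frame` and returns the list produced by the
    // previous call (empty on the very first frame).
    VisibleList BeginFrame(int frame)
    {
        VisibleList ready;
        if (mPending.valid())
        {
            // Normally finished already, since a whole frame has elapsed.
            ready = mPending.get();
        }
        // Start culling for this frame without waiting for its result.
        mPending = std::async(std::launch::async, mCull, frame);
        return ready;
    }

private:
    std::function<VisibleList(int)> mCull;
    std::future<VisibleList> mPending;  // culling job in flight
};
```

A render loop would call BeginFrame(n) at the start of frame n and submit the returned list to the GPU. The trade-off of this design is that visibility lags the camera by one frame, in exchange for never stalling the frame on the culling work.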

Performance:

The performance of the updated Software Occlusion Culling sample was measured on a 2.3 GHz 3rd gen Intel® Core™ processor (Ivy Bridge) system with 4 cores / 8 threads and Intel® HD Graphics 4000. We set the rasterizer technique to SSE, the occluder size threshold to 1.5, the occludee size threshold to 0.01, and the number of depth test tasks to 20. We enabled frustum culling and multi-tasking and disabled vsync.
The castle scene has 115 occluder models and 48700 occluder triangles. It has 27025 occludee models (occluders are treated as occludees) and ~1.9 million occludee triangles.

The time taken to rasterize the occluders to the depth buffer on the CPU was ~0.71 milliseconds, and the time taken to depth test the occludees was ~0.67 milliseconds. The total time spent on software occlusion culling was therefore ~1.38 milliseconds.

SSE              | No Optimizations | Multi-threading + Frustum Culling | Multi-threading + Frustum Culling + Depth test Culling
Frame rate (fps) | 7.51             | 19.56                             | 70.11
Frame time (ms)  | 133.51           | 51.12                             | 14.26
# of draw calls  | 23279            | 7360                              | 1831
For complete information about compiler optimizations, see the Optimization Notice.