Software Occlusion Culling Update

Note: This version is obsolete but is being retained for historical purposes. Please check out the latest version: Software Occlusion Culling

This is an update to the Software Occlusion Culling sample. It adds new features and optimizations that reduce the total cull time by a factor of 4 and the total frame time by a factor of 2. Below is a screenshot of the updated sample.

Here is a list of the new features / updates that are included in this version of the sample:

  • New set of Occluders
  • New depth buffer view
  • VS2012 support
  • Rasterizer optimizations
  • Pipelining

New set of Occluders:
In the previous version, all castle walls, along with the wooden pillars, tiny wooden trim, and decorations, were used as occluders to avoid special pre-processing of the art assets. However, narrow pillars and tiny decorations are poor occluder candidates. In this version, we chose only objects that are large enough to occlude other objects in the scene: the castle walls (without the pillars and wooden decorations) and the ground plane. This reduces the number of occluders rasterized to the depth buffer from 1628 in the previous version to 115. Below is a screenshot of the occluders in the scene.


New depth buffer view:
The sample supports a greyscale view of the depth buffer, as shown below.
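
As a rough illustration of how such a view can be produced, the sketch below maps a float depth buffer to 8-bit grey levels. The helper name DepthToGreyscale and the linear min/max normalization are assumptions made for this example; the article does not describe the mapping the sample actually uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the sample): converts a float depth buffer
// into 8-bit grey levels by linearly stretching [min, max] to [0, 255].
std::vector<std::uint8_t> DepthToGreyscale(const std::vector<float>& depth)
{
    std::vector<std::uint8_t> grey(depth.size(), 0);
    if (depth.empty())
        return grey;

    const auto minMax = std::minmax_element(depth.begin(), depth.end());
    const float lo = *minMax.first;
    const float range = *minMax.second - lo;
    if (range <= 0.0f)
        return grey;  // constant buffer: leave the image black

    for (std::size_t i = 0; i < depth.size(); ++i) {
        const float t = (depth[i] - lo) / range;  // normalized to [0, 1]
        grey[i] = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
    }
    return grey;
}
```

The resulting bytes can be uploaded as a single-channel texture or replicated into RGB for display.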

 

VS2012 support:

The sample can now be compiled in VS2012. Two projects are provided for each of VS2010 and VS2012. The one with ‘AVX’ in its name (SoftwareOcclusionCullingDX_2012_AVX / SoftwareOcclusionCullingDX_2010_AVX) is compiled with the /arch:AVX flag and can be compiled only on systems that support AVX. Use the other project (SoftwareOcclusionCullingDX_2012 / SoftwareOcclusionCullingDX_2010) on systems without AVX support.
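
If you are unsure whether a machine supports AVX, a small standalone check like the sketch below can help you pick a project. The function name CpuSupportsAvx is hypothetical and this code is not part of the sample. Because a binary built with /arch:AVX may use AVX instructions anywhere, a check like this belongs in a separate program built without that flag.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical helper: returns true when AVX can be used on this machine.
bool CpuSupportsAvx()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                                  // CPUID leaf 1: feature bits
    const bool osxsave = (regs[2] & (1 << 27)) != 0;   // OS uses XSAVE/XRSTOR
    const bool avx     = (regs[2] & (1 << 28)) != 0;   // CPU implements AVX
    if (!osxsave || !avx)
        return false;
    // XCR0 bits 1 and 2 set: the OS preserves XMM and YMM register state.
    const unsigned long long xcr0 = _xgetbv(0);
    return (xcr0 & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // non-x86 target or unknown compiler: assume no AVX
#endif
}
```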

Rasterizer Optimizations:
Fabian Giesen has been optimizing this sample on GitHub and writing about the work on his blog. Most of his optimizations have been integrated into the sample.

Pipelining:
When software occlusion culling is enabled, the occluders are rasterized to the depth buffer on the CPU once per frame. The occludee AABBs (axis-aligned bounding boxes) are then rasterized and depth tested against this CPU depth buffer to produce the list of visible models that is sent to the GPU for rendering. When pipelining is enabled, the sample does not wait for the culling pass to finish and produce this list. Instead, occlusion culling is kicked off in frame n, and the resulting list of visible models is sent to the GPU for rendering in frame n+1.
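
The frame n / frame n+1 hand-off can be sketched with a single in-flight task. This is a minimal illustration and not the sample's implementation: PipelinedCuller, Step, VisibleList and the stand-in CullFrame function are all hypothetical names, and std::async stands in for whatever task system the sample uses.

```cpp
#include <future>
#include <vector>

// Hypothetical representation: indices of the models judged visible.
using VisibleList = std::vector<int>;

// Stand-in for the CPU culling pass (rasterize occluders, depth test AABBs).
// To keep the sketch self-contained it simply alternates which models pass.
VisibleList CullFrame(int frame, int modelCount)
{
    VisibleList visible;
    for (int i = 0; i < modelCount; ++i)
        if ((i + frame) % 2 == 0)
            visible.push_back(i);
    return visible;
}

class PipelinedCuller {
public:
    // std::launch::async overlaps culling with the rest of the frame;
    // std::launch::deferred is handy for single-threaded debugging.
    explicit PipelinedCuller(std::launch policy = std::launch::async)
        : policy_(policy) {}

    // Call once per frame. Starts culling for `frame` and returns the list
    // produced by the previous call (empty on the very first frame).
    VisibleList Step(int frame, int modelCount)
    {
        VisibleList ready;
        if (pending_.valid())
            ready = pending_.get();  // result of culling started in frame n-1
        pending_ = std::async(policy_, CullFrame, frame, modelCount);
        return ready;                // what the GPU renders this frame
    }

private:
    std::launch policy_;
    std::future<VisibleList> pending_;
};
```

In this sketch the list rendered in a given frame was computed from the previous frame's state, which is the one-frame latency that pipelining trades for not stalling on the culling pass.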

Performance:

The performance of the updated Software Occlusion Culling sample was measured on a system with a 2.3 GHz 3rd gen Intel® Core™ processor (Ivy Bridge) with 4 cores / 8 threads and Intel® HD Graphics 4000. We set the rasterizer technique to SSE, the occluder size threshold to 1.5, the occludee size threshold to 0.01, and the number of depth test tasks to 20. We enabled frustum culling and multi-tasking and disabled vsync.
The castle scene has 115 occluder models and 48700 occluder triangles. It has 27025 occludee models (occluders are also treated as occludees) and ~1.9 million occludee triangles.

The time taken to rasterize the occluders to the depth buffer on the CPU was ~0.71 milliseconds, and the time taken to depth test the occludees was ~0.67 milliseconds. The total time spent on software occlusion culling was therefore ~1.38 milliseconds.
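
For readers unfamiliar with the depth test step timed above, the sketch below shows a simplified, conservative version of the idea. It tests the screen-space bounding rectangle of an occludee against the CPU depth buffer, whereas the sample rasterizes the AABB itself. The names OccludeeRect and IsPotentiallyVisible, and the convention that smaller depth values are closer to the camera, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical screen-space footprint of one occludee bounding box.
struct OccludeeRect {
    int x0, y0, x1, y1;  // inclusive pixel bounds
    float nearestDepth;  // depth of the box point closest to the camera
};

// Returns true if any covered pixel might show the occludee, that is, if the
// box is at least as close as the occluder depth stored at that pixel.
// Assumes smaller depth = closer, and a row-major width * height buffer.
bool IsPotentiallyVisible(const std::vector<float>& depth,
                          int width, int height, OccludeeRect r)
{
    // Clamp the rectangle to the buffer; a fully off-screen rectangle covers
    // no pixels and is reported as not visible.
    r.x0 = std::max(r.x0, 0);
    r.y0 = std::max(r.y0, 0);
    r.x1 = std::min(r.x1, width - 1);
    r.y1 = std::min(r.y1, height - 1);

    for (int y = r.y0; y <= r.y1; ++y) {
        for (int x = r.x0; x <= r.x1; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (r.nearestDepth <= depth[i])
                return true;  // not hidden at this pixel
        }
    }
    return false;  // every covered pixel has a closer occluder
}
```

Models for which the test returns false can be dropped from the list sent to the GPU, which is where the reduction in draw calls comes from.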

SSE                 No Optimizations    Multi-threading +    Multi-threading +
                                        Frustum Culling      Frustum Culling +
                                                             Depth test Culling
Frame rate (fps)    7.51                19.56                70.11
Frame time (ms)     133.51              51.12                14.26
# of draw calls     23279               7360                 1831

Comments
Phil B.:

Cool stuff here! I believe I found a slight issue. It's an edge case, but I thought I'd share my findings :) If there are no occluders, the SSE implementation of the depth rasteriser doesn't update the depth summary (aka the hi-z).

That is, DepthBufferRasterizerSSEST::RasterizeBinnedTrianglesToDepthBuffer(), in the allBinsEmpty case, needs to break rather than return, so that CreateCoarseDepth() is always called.

Regardless, great work! Very impressive occlusion solution.

Charumathi Chandrasekaran (Intel):

Hi Maarten, we have fixed the zip file. For some reason it failed to upload correctly the first time. We have also uploaded a tar.gz file. Hope this helps.

Maarten Hoeben:

FYI, when I test the zip file it says:

Archive: softwareocclusionculling.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of softwareocclusionculling.zip or
softwareocclusionculling.zip.zip, and cannot find softwareocclusionculling.zip.ZIP, period.

Maarten Hoeben:

Sample code ZIP file is still broken. Verified it on OSX, Windows and Linux. Can you provide a tar.gz file? Thanks

Charumathi Chandrasekaran (Intel):

I have tested the zip file by unzipping it on a couple of different machines, and it works fine. Can you please download the zip file again and retry?

Wenchao H.:

I cannot unzip it, either.

Charumathi Chandrasekaran (Intel):

I downloaded the zip and ran the sample from it. It seems to work fine for me. Can you give more details on what is broken or what errors you are seeing?

mhoeben:

The file softwareocclusionculling.zip seems to be broken. Can you check the file for errors? Thanks.
