As a follow-up to Adaptive Volumetric Shadow Maps for DirectX* 11, we present a port of the same algorithm adapted for Android* devices that support OpenGL ES* 3.1 and the GL_INTEL_fragment_shader_ordering OpenGL* extension.
More than a simple port, this version includes a number of optimizations and tradeoffs that allow the algorithm to run on low-power mobile devices (such as tablets and phones), in contrast to the previous sample, which targeted Ultrabook™ system-level hardware.
The AVSM algorithm allows for generation of dynamic shadowing and self-shadowing for volumetric effects such as smoke, particles, or transparent objects in real-time rendering engines using today’s Intel® hardware.
To achieve this, each texel of the AVSM shadow map stores a compact approximation of the transmittance curve along the corresponding light ray.
Transmittance curve expressed using 4 nodes
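To make the per-texel representation concrete, here is a minimal sketch of how a four-node curve could be stored and sampled. It is not taken from the sample source: the function name `sample_transmittance`, the `(depth, transmittance)` tuple layout, and the use of linear interpolation between nodes are assumptions made for illustration.

```python
from bisect import bisect_right

def sample_transmittance(nodes, depth):
    """Sample a piecewise-linear transmittance curve.

    nodes: list of (depth, transmittance) pairs sorted by depth
           (hypothetical layout, chosen for this illustration).
    depth: distance from the light along the ray.
    """
    depths = [d for d, _ in nodes]
    # In front of the first node: use the first node's value.
    if depth <= depths[0]:
        return nodes[0][1]
    # Behind the last node: all occluders have been passed.
    if depth >= depths[-1]:
        return nodes[-1][1]
    # Find the segment that contains `depth` and interpolate linearly.
    i = bisect_right(depths, depth)
    (d0, t0), (d1, t1) = nodes[i - 1], nodes[i]
    w = (depth - d0) / (d1 - d0)
    return t0 + w * (t1 - t0)

# Four nodes, as in the figure: light is gradually absorbed with depth.
curve = [(0.0, 1.0), (1.0, 0.8), (2.0, 0.4), (3.0, 0.3)]
print(sample_transmittance(curve, 1.5))
```

A receiver at depth 1.5 falls halfway between the second and third nodes, so it receives roughly 60% of the light.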
The main innovation of the technique is a streaming compression algorithm that builds a constant-storage, variable-error representation of the visibility curve along each light ray as it travels through the medium; this representation is then used in later shadow lookups. In practice, every time a new partial shadow caster (light occluder) is added to the shadow map, the algorithm performs an optimal lossy compression of the transmittance curve, for each texel individually.
Lossy compression: reduction from 4 to 3 nodes by removal of the least visually significant node (A)
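The node-removal step can be sketched as follows. This is an illustrative approximation, not the sample's shader code: it assumes that "least visually significant" means the interior node whose removal changes the area under the curve the least, measured as the area of the triangle the node forms with its two neighbors. The names `triangle_area` and `compress` are hypothetical.

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (depth, transmittance) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def compress(nodes, max_nodes):
    """Drop interior nodes until at most `max_nodes` remain.

    Assumed error metric: removing node i changes the area under the
    piecewise-linear curve by the area of triangle (i-1, i, i+1).
    The first and last nodes are always kept.
    """
    if max_nodes < 2:
        raise ValueError("max_nodes must be at least 2")
    nodes = list(nodes)
    while len(nodes) > max_nodes:
        victim = min(
            range(1, len(nodes) - 1),
            key=lambda i: triangle_area(nodes[i - 1], nodes[i], nodes[i + 1]),
        )
        del nodes[victim]
    return nodes

# Four nodes reduced to three: the second node lies almost on the straight
# line between its neighbors, so removing it costs the least.
curve = [(0.0, 1.0), (1.0, 0.9), (2.0, 0.82), (3.0, 0.3)]
print(compress(curve, 3))
```

Because the storage per texel is fixed, the error introduced depends on the shape of the curve: nearly collinear nodes are cheap to discard, while nodes at sharp drops in transmittance are preserved.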
This algorithm relies on the GL_INTEL_fragment_shader_ordering OpenGL extension, which enforces deterministic shader execution order at the per-pixel level based on triangle submission order. This accomplishes two important goals that are needed by the AVSM algorithm:
- Synchronization, allowing for thread-safe data structure access.
- Per-pixel shader execution ordering (based on triangle submission order), allowing for deterministic behavior of lossy compression between subsequent frames, which prevents temporal visual artifacts (“flickering”) that otherwise appear.
On DirectX 11, this feature is available through the Intel Pixel Synchronization extension or, more recently, natively through the DirectX 11.3 and DirectX 12 feature called Rasterizer Ordered Views.
Example of AVSM smoke shadows (disabled - left, enabled - right) in Lego Minifigures* Online by Funcom Productions A/S
Below is a list of the main differences between the Android and the original DirectX implementations. The differences mostly focus on optimizing the algorithm's performance for low-power target hardware such as tablets and phones:
- Transparent smoke particles are rendered into a lower resolution frame buffer and blended to the native resolution render target to reduce the blending cost of the high overdraw. This is not an AVSM-specific optimization but was necessary for such an effect to be practical on the target hardware.
- In some scenarios, mostly when shadow casters are not moving too quickly relative to the AVSM shadow map matrix, the shadow map can be updated only every second frame to reduce the cost. To balance computation across both frames, some operations (such as clearing the buffers) can then be performed in the alternate frame.
- Only every second smoke particle can be added to the AVSM map, with twice the opacity. This slightly reduces visual quality but improves insertion performance by a factor of two.
- To reduce the cost of sampling the AVSM shadows, the sampling can be moved from per-pixel to per-vertex frequency. The old (DirectX 11) sample uses screen-space tessellation to achieve this at no quality loss compared to per-pixel sampling. This sample, however, uses a geometry shader to output a fixed billboard quad made up of four triangles and five vertices. AVSM sampling and interpolation using a five-vertex quad (one vertex in the middle in addition to the four corners) provides a good balance between quality and performance that is better suited to the target hardware.
- For receiver geometry that is always behind shadow casters (such as the ground), full sampling is unnecessary and can be replaced by reading only the value of the last node.
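Two of the tradeoffs above can be checked with simple arithmetic. The sketch below assumes a simplified model in which every particle along a ray multiplies the transmittance by (1 - opacity); the function names and the `(depth, transmittance)` node layout are hypothetical and not taken from the sample source.

```python
def transmittance_all_particles(alpha, count):
    """Transmittance after light passes `count` particles of opacity `alpha`."""
    return (1.0 - alpha) ** count

def transmittance_every_second(alpha, count):
    """Same stack, but only every second particle is inserted, at doubled opacity."""
    doubled = min(2.0 * alpha, 1.0)  # clamp: opacity cannot exceed 1
    return (1.0 - doubled) ** (count // 2)

def ground_shadow(nodes):
    """Receiver known to lie behind every caster: read the last node only."""
    return nodes[-1][1]

full = transmittance_all_particles(0.05, 8)
half = transmittance_every_second(0.05, 8)
print(f"all particles: {full:.4f}, every second: {half:.4f}")

curve = [(0.0, 1.0), (1.0, 0.8), (2.0, 0.4), (3.0, 0.3)]
print(ground_shadow(curve))
```

Under this model, eight particles of 5% opacity transmit about 66.3% of the light, while four particles of 10% opacity transmit about 65.6%. The shadow is slightly darker because 1 - 2a is always a little less than (1 - a)², and the gap grows with per-particle opacity, which is consistent with the small quality loss described above.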
Sample running on Tesco hudl2* device with Android*
The sample UI provides ways of toggling on/off or tweaking most of the above listed optimizations, as a way of demonstrating the cost and visual quality tradeoffs.
The sample code will run on any OpenGL ES 3.1 device that supports the GL_INTEL_fragment_shader_ordering extension. Devices that support the extension include Intel® Atom™ processor-based tablets (code named Bay Trail or Cherry Trail).
Please refer to README.TXT included in the archive for build instructions.
Based on Adaptive Volumetric Shadow Maps, EGSR 2010 paper by Marco Salvi.