Pre-Compositing Textures for Terrain Rendering

Introduction

There are many ways to generate and render terrain, but for years one of the most common has been to generate the terrain from a heightmap and use a mask texture to blend, or composite, diffuse textures together. Depending on the number of textures blended, this can be a fairly time-consuming operation, as it typically involves multiple texture samples over a large number of pixels. One solution is to take advantage of the fact that both terrain geometry and terrain texturing are almost always static: they don’t change every frame, and often not during the entire game. This allows the programmer to move the texture compositing from the GPU, where it is calculated every frame, to the CPU, where it is calculated only when needed. Taken to the extreme, the textures for the entire terrain can be baked prior to run-time, resulting in a “mega-texture.” This sample focuses on the middle ground, showing how to composite both diffuse textures and normal maps in the area immediately surrounding the camera in order to save time on the GPU.

Texture Compositing Overview

Texture compositing (or texture splatting) is a technique where a mask (or blend) texture is used to determine how to blend textures together [1]. For example, if the terrain contains a grass texture, a single-channel texture can be used to denote how much grass should be visible, with 0.0 indicating no grass, 1.0 indicating fully opaque grass, and anything in between indicating partially transparent grass. Each additional texture added to the mix needs an additional channel in the blend texture. If an RGBA8888 format texture is used, four diffuse maps can be controlled, with a fifth texture blended in as a base texture. The base texture is applied first at full opacity and the remaining textures are blended on top of it. In this sample, the blend texture is used to blend together five diffuse textures and five normal maps. Each normal map is paired with a diffuse texture and regulated by the same blend map channel. To combine the textures, a linear interpolation between the source and destination textures is used, which means that the order in which the textures are composited makes a difference. If both grass and dirt have a 1.0 in their associated channels, whichever is blended last will obscure the other. Figure 1 shows an example composition of grass (base texture), dirt, stone, rock, and snow. The blend map stores the blend factor for the individual textures in its RGBA channels.
 



Figure 1 - Texture Splatting - In this example, a blend map is used to blend together 4 textures with a 5th being used as a base texture. The red square simply denotes that only a portion of the blend map is used to generate the composited texture.
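The blending order described above can be illustrated with a minimal C++ sketch. It is not the sample's code: the `Texel` struct, the function names, and the mapping of blend channels to layers (red for the first layer above the base, then green, blue, alpha) are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One RGBA8888 texel. In the blend map, each channel is the weight of one layer.
struct Texel {
    std::uint8_t r, g, b, a;
};

// Linear interpolation of one 8-bit channel: weight 0 keeps dst, 255 takes src.
inline std::uint8_t LerpChannel(std::uint8_t dst, std::uint8_t src, std::uint8_t weight) {
    const int d = dst, s = src, w = weight;
    return static_cast<std::uint8_t>((d * (255 - w) + s * w + 127) / 255);
}

inline Texel LerpTexel(Texel dst, Texel src, std::uint8_t weight) {
    return Texel{LerpChannel(dst.r, src.r, weight), LerpChannel(dst.g, src.g, weight),
                 LerpChannel(dst.b, src.b, weight), LerpChannel(dst.a, src.a, weight)};
}

// layers[0] is the base texture at full opacity; layers[1..4] are blended on
// top in order, weighted by the blend texel's r, g, b, a (assumed mapping).
inline Texel CompositeTexel(const std::array<Texel, 5>& layers, Texel blend) {
    const std::array<std::uint8_t, 4> weights = {blend.r, blend.g, blend.b, blend.a};
    Texel result = layers[0];
    for (std::size_t i = 0; i < weights.size(); ++i) {
        result = LerpTexel(result, layers[i + 1], weights[i]);
    }
    return result;
}

// Composites one row; every layer row must be as long as the blend row.
inline std::vector<Texel> CompositeRow(const std::array<std::vector<Texel>, 5>& layers,
                                       const std::vector<Texel>& blend) {
    for (const std::vector<Texel>& layer : layers) {
        assert(layer.size() == blend.size());
    }
    std::vector<Texel> out(blend.size());
    for (std::size_t x = 0; x < blend.size(); ++x) {
        const std::array<Texel, 5> texels = {
            {layers[0][x], layers[1][x], layers[2][x], layers[3][x], layers[4][x]}};
        out[x] = CompositeTexel(texels, blend[x]);
    }
    return out;
}
```

Because each step interpolates toward the next layer, a texel whose red and green weights are both 255 ends up showing only the layer tied to green, which matches the ordering behavior described above. The same routine could be run once over the diffuse layers and once over the normal-map layers, since both are regulated by the same blend channels.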

What to Composite

In order for compositing to work on the CPU, the terrain needs to be broken into tiles. If it is not, then the composited texture is either going to require a massive amount of memory as the entire terrain will be composited, or the resolution will be very low as it will be stretched out across the terrain. CPU-composited tiles are chosen based on the camera’s position. The sample uses the tile that contains the camera and the eight surrounding tiles. When the camera crosses into a new tile, tiles no longer in the grid are dropped and new ones are added; tiles that weren’t dropped out of the grid don’t need to be recomposited.
 



Figure 2 - The nine tiles surrounding the camera (C) are composited. If the camera moves to a new tile, composited textures for the out of bounds tiles are marked for deletion (D) while textures for newly in-range tiles are marked for compositing (N).
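The bookkeeping in Figure 2 amounts to a set difference between the old and new 3x3 grids. The following sketch shows one way to express it; `TileCoord`, `CompositingGrid`, and `DiffGrids` are hypothetical names, and clipping the grid at the terrain edge is an assumption, not something the sample is known to do.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using TileCoord = std::pair<int, int>;  // (column, row) in the terrain's tile grid

// Returns the camera's tile plus its eight neighbors, clipped to the terrain.
std::set<TileCoord> CompositingGrid(TileCoord camera, int tilesX, int tilesZ) {
    assert(tilesX > 0 && tilesZ > 0);
    std::set<TileCoord> grid;
    for (int dz = -1; dz <= 1; ++dz) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = camera.first + dx;
            const int z = camera.second + dz;
            if (x >= 0 && x < tilesX && z >= 0 && z < tilesZ) {
                grid.insert(TileCoord(x, z));
            }
        }
    }
    return grid;
}

struct GridUpdate {
    std::vector<TileCoord> dropped;  // "D" in Figure 2: composited texture can be freed
    std::vector<TileCoord> added;    // "N" in Figure 2: needs compositing
};

// Tiles present in both grids appear in neither list and keep their textures.
GridUpdate DiffGrids(const std::set<TileCoord>& oldGrid, const std::set<TileCoord>& newGrid) {
    GridUpdate update;
    std::set_difference(oldGrid.begin(), oldGrid.end(), newGrid.begin(), newGrid.end(),
                        std::back_inserter(update.dropped));
    std::set_difference(newGrid.begin(), newGrid.end(), oldGrid.begin(), oldGrid.end(),
                        std::back_inserter(update.added));
    return update;
}
```

When the camera moves one tile along an axis, three tiles are dropped, three are added, and the six shared tiles are left untouched. A diagonal move changes five tiles on each side.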

Tiles not in the compositing grid are shaded on the GPU in the standard manner, with the textures blended in the pixel shader. These tiles generally cover far fewer pixels, as the majority of the visible terrain is made up of composited tiles. In fact, since the GPU tiles are guaranteed to be farther away from the camera, a simpler shader can be used for them. For example, the shader can sample fewer textures or use less detailed mip levels.

Details

All of the work related to rendering the terrain occurs in the file Terrain.cpp and starts in the function Terrain::Render(). Each frame, all of the tiles are first sorted by their distance from the camera and culled to reduce unnecessary draw calls and pixel operations. Next, if CPU rendering is enabled, those of the nine tiles that are marked as having their compositing operations complete are rendered. Then any tiles that are still being composited or that lie outside the compositing area are rendered using the compositing shader on the GPU. After all the tiles are submitted for rendering, the function DetermineNewTiles() is called, which, as the name implies, determines whether there are any new tiles that need to be composited and, if so, kicks off the necessary asynchronous tasks.

The first thing DetermineNewTiles() does is check whether any mipmap tasks are complete. If so, the composited texture resources are copied from the CPU to the GPU and the tile is marked for CPU rendering. Next, the function checks whether any compositing work is still ongoing. If so, it exits without adding new work to the queue, which ensures that no work in progress is cancelled. In practice, this early exit will rarely be taken, as it requires the camera to cross multiple tile boundaries within a fraction of a second. The most likely scenario is the camera passing near an intersection of tiles.

Once it is determined that there is no ongoing work, the function determines which tiles are new and need to be composited and which are old and can be dropped. It then starts two sets of asynchronous tasks. The first set generates the composited texture, which is done in CompositeTileRange(). The second set, which is dependent on the first, generates the mipmaps for the composited texture and compresses it using the desired compression method, DXT5 in this sample. Since these tasks run asynchronously, the main thread can continue with its normal work. The dependent mipmap task set is what DetermineNewTiles() checks at the start of the next call in order to ascertain whether the CPU work is done, at which point the process repeats.
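The control flow of DetermineNewTiles() described above can be summarized in a single-threaded sketch. This is only an illustration of the ordering (upload finished tiles, exit if work is in flight, otherwise queue new tiles). The `Stage` enum, `TileSlot`, and `DetermineNewTilesSketch` are hypothetical, and in the real sample the stage transitions would be driven by the asynchronous task sets rather than set by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Stage { Compositing, BuildingMips, ReadyToUpload, OnGpu };

struct TileSlot {
    int tileId;
    Stage stage;
};

struct FrameResult {
    int uploaded;         // tiles whose finished textures were handed to the GPU
    bool startedNewWork;  // true if new compositing work was queued this frame
};

FrameResult DetermineNewTilesSketch(std::vector<TileSlot>& slots, const std::vector<int>& wanted) {
    assert(wanted.size() <= 9);  // the sample composites a 3x3 grid of tiles
    FrameResult result{0, false};

    // Step 1: tiles whose mipmap/compression task finished are copied to the GPU.
    for (TileSlot& slot : slots) {
        if (slot.stage == Stage::ReadyToUpload) {
            slot.stage = Stage::OnGpu;
            ++result.uploaded;
        }
    }

    // Step 2: if any work is still in flight, exit so that nothing is cancelled.
    for (const TileSlot& slot : slots) {
        if (slot.stage == Stage::Compositing || slot.stage == Stage::BuildingMips) {
            return result;
        }
    }

    // Step 3: drop tiles that left the grid and queue tiles that entered it.
    std::vector<TileSlot> kept;
    for (const TileSlot& slot : slots) {
        if (std::find(wanted.begin(), wanted.end(), slot.tileId) != wanted.end()) {
            kept.push_back(slot);
        }
    }
    for (int id : wanted) {
        const bool present = std::any_of(kept.begin(), kept.end(),
                                         [id](const TileSlot& s) { return s.tileId == id; });
        if (!present) {
            kept.push_back(TileSlot{id, Stage::Compositing});
            result.startedNewWork = true;
        }
    }
    slots = kept;
    return result;
}
```

Calling this once per frame with the current list of wanted tile IDs reproduces the behavior in the text: a frame that finds a tile still in the Compositing or BuildingMips stage leaves the slot list unchanged, and the grid is only updated on a later frame once that work has been uploaded.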

In this sample, the approach taken to compositing the textures is fairly straightforward. The blend texture is stretched over the entire terrain, while the diffuse textures are each stretched over one terrain tile. The source and destination (composited) textures are the same size, so the source textures are essentially just copied to the destination after being modified by the blend value. The main issue to be aware of is that, unlike sampling on the GPU, texture operations on the CPU are not done per screen pixel but at the destination texture resolution, which is most likely lower than the pixel resolution. The same problems of minification or magnification can occur, only they happen before the texture is sampled by the GPU. In an extreme case where point sampling is used on the CPU, half the data of the source texture is lost completely, as shown in Figure 3. This problem can be solved by choosing a destination resolution that is a whole-number multiple of the source resolution, large enough that the step size through the source never exceeds one texel.
 



Figure 3 - In this example both the source (left) and destination (right) texture resolutions are the same. Tiling the source texture twice in the horizontal direction does not produce the desired results. By tiling twice, the step size becomes two and every other texel is skipped and the information is lost.
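The loss shown in Figure 3 can be reproduced with a few lines of C++. The sketch below assumes point sampling that maps each destination texel to the nearest lower source texel. The function name and parameters are illustrative and do not come from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts how many distinct source texels are read when a source row of
// srcWidth texels is tiled `repeats` times across a destination row of
// dstWidth texels using point sampling.
std::size_t DistinctSourceTexels(std::size_t srcWidth, std::size_t dstWidth, std::size_t repeats) {
    assert(srcWidth > 0 && dstWidth > 0 && repeats > 0);
    std::set<std::size_t> visited;
    for (std::size_t x = 0; x < dstWidth; ++x) {
        // Scale the destination coordinate into tiled source space, then wrap.
        const std::size_t srcX = (x * srcWidth * repeats / dstWidth) % srcWidth;
        visited.insert(srcX);
    }
    return visited.size();
}
```

With an 8-texel source tiled twice into an 8-texel destination, only source texels 0, 2, 4, and 6 are ever read, so `DistinctSourceTexels(8, 8, 2)` yields 4. Doubling the destination width to 16 brings the step size back to one and all 8 source texels are used.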

To support blending textures on the CPU, the textures need to be accessible on the CPU. This sample stores each source texture at 8 bits per channel. In addition, there needs to be a place to store the results of the blending operation, so nine destination buffers, also at 8 bits per channel, are created along with enough extra space to store the mipmap chain. These are stored in an array of CompositedTexture structs along with other information specific to each composited tile, such as DirectX resource pointers, the current stage in the compositing pipeline, and timing information. After the mipmaps are created, the textures are compressed using DXT5 compression, with the results written directly to a mapped staging buffer. This ends the mipmap task. DetermineNewTiles() detects the completed task on the next frame and calls CopyResource() to copy the staging buffer data to a GPU texture resource.
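To get a feel for the memory cost of these buffers, the size of a full mipmap chain can be computed for both the uncompressed destination buffer and its DXT5 output. The sketch below relies only on the fact that DXT5 (BC3) stores each 4x4 block in 16 bytes. The function names are illustrative, and the article does not state the sample's texture resolution, so any dimensions passed in are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes needed for an uncompressed texture plus its full mipmap chain.
std::size_t MipChainBytes(std::size_t width, std::size_t height, std::size_t bytesPerTexel) {
    assert(width > 0 && height > 0 && bytesPerTexel > 0);
    std::size_t total = 0;
    for (;;) {
        total += width * height * bytesPerTexel;
        if (width == 1 && height == 1) break;
        width = std::max<std::size_t>(1, width / 2);
        height = std::max<std::size_t>(1, height / 2);
    }
    return total;
}

// Bytes needed for the same chain in DXT5: 16 bytes per 4x4 block, with each
// mip level rounded up to a whole number of blocks.
std::size_t Dxt5MipChainBytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    std::size_t total = 0;
    for (;;) {
        const std::size_t blocksX = (width + 3) / 4;
        const std::size_t blocksY = (height + 3) / 4;
        total += blocksX * blocksY * 16;
        if (width == 1 && height == 1) break;
        width = std::max<std::size_t>(1, width / 2);
        height = std::max<std::size_t>(1, height / 2);
    }
    return total;
}
```

For a hypothetical 1024x1024 RGBA tile, `MipChainBytes(1024, 1024, 4)` comes to 5,592,404 bytes against 1,398,128 bytes for `Dxt5MipChainBytes(1024, 1024)`, roughly a 4:1 ratio. Each of the nine tiles needs one such pair for the diffuse map and one for the normal map.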

Performance

As stated earlier, the goal of this sample is to save time on the GPU by reducing the complexity of the shader used for processing terrain pixels. Since this is basically a pixel shader optimization, the performance benefit varies greatly depending on how many pixels are processed, which in turn depends on resolution and camera orientation. For testing, the starting camera location and orientation were used (shown in Figure 4). The red area indicates the tiles being pre-composited on the CPU and the green areas are tiles being composited in the pixel shader on the GPU. The GPU compositing shader contains 12 texture samples by default: a blend map, a specular map, 5 diffuse maps, and 5 normal maps. The shader used for CPU-composited tiles contains three texture samples: a composited diffuse map, a composited normal map, and a specular map. The compositing time is fairly constant at about 300 ms per tile, which includes compositing both diffuse and normal maps, generating mipmaps, and DXT5 compression. All work was performed on a pre-release 3rd generation Intel® Core™ processor (code-named Ivy Bridge) in a GT2 quad-core system with hyperthreading and 4 GB of memory. The operating system was 64-bit Windows 7*.
 



Figure 4 - Sample with highlighting enabled. Red areas are pre-composited on the CPU while green areas are composited on the GPU.



Table 1 - Frame times in ms for CPU compositing and GPU compositing at 1366x768 resolution. The number of textures decreases by removing pairs of diffuse and normal map textures from the GPU compositing shader. The CPU composited shader remains unchanged at three texture reads.



Table 2 - Selected GPU metrics from Intel Graphics Performance Analyzer for rendering the tile closest to the camera. The main discrepancy between the two methods is in bandwidth used for all the texture reads and the total time required to render the terrain tile.

One issue to be aware of that can affect performance is a power-saving feature in Windows 7 called core parking. When activity is light on a core, Windows will “park” it, that is, stop scheduling work on that core until activity on the system reaches some threshold. This allows the processor to move parked cores into a low-power state and save energy. Unfortunately, it can also lead to slow frames in this sample. The reason is that, for the most part, only one core is being used for the main game loop. When the camera crosses a tile boundary, a burst of work is suddenly kicked off that fully subscribes all the cores. Since many of the cores are parked because activity has been light, the main thread, which is responsible for rendering, can be swapped out for a significant period of time until all of the cores become unparked. The end result is frame hitching when new tiles need to be composited.

Conclusion

For games that have either exceeded their frametime budget for terrain or would like to do more with terrain, pre-compositing terrain textures on the CPU is one approach that can help. Due to the relatively static nature of terrain, the work only needs to be done once and can be reused for many frames. For the test scene, the frametime improved by 20% and the actual GPU time for rendering a tile improved by 48%. Drawbacks include the increase in memory required on both the CPU and GPU, and restrictions on the relative dimensions of source and destination textures. Benefits include faster rendering times for terrain tiles and, since the work doesn’t need to occur every frame, the option of doing more intensive operations, such as additional texture layers or Wang tiles [3] for increased variability.

One way to mitigate the memory cost would be to combine the mipmap and compression work with the compositing work. The compositing task could decompress the source textures, do the compositing work, generate mipmaps, and recompress all at once. This would save quite a bit of memory by allowing both source and destination textures to be stored in compressed format. Further, the compositing task could output the compressed texture directly to a mapped staging buffer, which would eliminate the need for buffers to store the composited textures on the CPU. Additionally, the time taken for compositing could potentially be decreased through algorithm improvements and SSE/AVX usage.

Thanks

A special thanks to Frank Luna for allowing us to use code from his book “Introduction to 3D Game Programming with DirectX 10.”[2]

 

References

1. Charles Bloom, “Terrain Texture Compositing by Blending in the Frame-Buffer (aka "Splatting" Textures),” Nov 2, 2000. Retrieved 14:29, December 13, 2011, from /sites/default/files/m/d/e/1/splatting.txt

2. Frank D. Luna, “Introduction to 3D Game Programming with DirectX 10,” Wordware Publishing, Inc., 2008.

3. “Wang tile,” Oct 6, 2011, in Wikipedia, The Free Encyclopedia. Retrieved 22:28, December 13, 2011, from http://en.wikipedia.org/w/index.php?title=Wang_tile&oldid=454296576
 
