Rendering dynamic, complex scenes in real-time is a challenging task. Developers and gamers alike love to have environments that are as realistic in appearance as possible without sacrificing game-play or speed. One such environment that more and more games are adopting, and that is used as an example in this column, is the great outdoors. Unfortunately for performance, being outdoors means, among other things, rendering a lot of geometry. Terrain, water, clouds, and vegetation especially, are all factors that contribute to the challenge. However, with a bit of creative thinking, and smart use of the 3D API, be it OpenGL, Direct3D*, or any other API, developers can squeeze an amazing amount of performance out of today's hardware.
The example discussed in this column specifically pertains to rendering forests of trees, and uses Direct3D*, but the ideas presented could be applied, using any 3D API, to any 3D scene containing lots of objects with similar geometry. Trees make an interesting example because they supply all the necessary ingredients for the rendering challenge. They are geometrically complex, they may require a detailed texture, they are abundant in nature, and they are affected by natural elements (e.g. gravity and wind). Because trees exist in nature, people know what they look like, how they behave, and how they fit into their environment. A consequence of this is that people also have an expectation of what trees will be in a game. Since the goal is to create as realistic an environment as possible, developers must take all of that general knowledge into consideration when creating virtual environments.
Feasibility of Rendering Full Geometry
A great many species of trees live in the natural world, and modeling every one would be very difficult and time consuming. Most of the trees in a given geographic area, however, are relatively similar, and detailed modeling of as few as four different trees, which are then rotated and scaled differently, can result in an impressively realistic forest. The best part is that you don't need, or need to be, a talented artist. Each tree in the example project corresponding to this column is procedurally generated on the fly, and contains more than 10,000 polygons! A noise function determines how many trees end up on the land, as well as each tree's type, scale, rotation, and position. A constant threshold applied to the noise determines the tree density of the forests, and is set to allow a total of about 400 trees into the example's initial scene.
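The placement step can be sketched as follows. This is only an illustration under stated assumptions, not the example project's code: the `Tree` struct, the `hashNoise` function (a simple integer hash standing in for whatever noise function is actually used), the one-candidate-per-grid-cell layout, and the scale range are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-tree properties, mirroring those named in the text.
struct Tree {
    int   type;      // 0..3, one of the four modeled trees
    float scale;     // uniform scale factor
    float rotation;  // radians about the vertical axis
    float x, z;      // position on the terrain
};

// Integer hash mapped to [0, 1). A stand-in for a real noise function.
inline float hashNoise(std::uint32_t x, std::uint32_t z, std::uint32_t seed) {
    std::uint32_t h = x * 374761393u + z * 668265263u + seed * 2246822519u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h >> 8) / 16777216.0f;  // top 24 bits -> [0, 1)
}

// Visit every grid cell and plant a tree where the noise exceeds the threshold.
// A higher threshold gives a sparser forest.
std::vector<Tree> placeTrees(int cellsX, int cellsZ, float densityThreshold) {
    assert(cellsX > 0 && cellsZ > 0);
    std::vector<Tree> trees;
    for (int z = 0; z < cellsZ; ++z) {
        for (int x = 0; x < cellsX; ++x) {
            const std::uint32_t ux = static_cast<std::uint32_t>(x);
            const std::uint32_t uz = static_cast<std::uint32_t>(z);
            if (hashNoise(ux, uz, 0u) <= densityThreshold) continue;  // no tree here
            Tree t;
            t.type     = static_cast<int>(hashNoise(ux, uz, 1u) * 4.0f);  // 0..3
            t.scale    = 0.75f + 0.5f * hashNoise(ux, uz, 2u);            // [0.75, 1.25)
            t.rotation = 6.2831853f * hashNoise(ux, uz, 3u);              // [0, 2*pi)
            t.x = static_cast<float>(x) + hashNoise(ux, uz, 4u);          // jitter inside cell
            t.z = static_cast<float>(z) + hashNoise(ux, uz, 5u);
            trees.push_back(t);
        }
    }
    return trees;
}
```

Because every property is derived from the cell coordinates, the same forest is regenerated on every run, and tuning the single threshold is enough to hit a target tree count.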
Assuming that all the trees in the initial scene are in place, and that each tree's properties are set (type of tree, scale, rotation, and position), the interesting question of how to render the trees comes into play. The answer to this question is by no means set in stone, and may in fact vary depending on the desired effect. Let's get into a couple of the possible approaches, discussing the pros and cons of each, and expand on some important implementation decisions. Our desired effect here is the fastest possible rendering of realistic trees.
This first approach is by far the slowest, and the run-time performance is unacceptable for any real-time gaming environment. The idea is simple: render the full geometry of everything in the scene. Four hundred trees at more than ten thousand polygons per tree, not to mention land, sky, and water, is a lot of geometry to render: over 4 million triangles per frame! On the other hand, the amount of work that must be done in software is extremely small. Granted, our framework is not as efficient as the best engines available today, but it managed only 1 or 2 fps at best, which is unacceptable. This kind of performance quickly sparks thoughts of improvement, and as we will see next, huge improvements are possible without reducing the polygon count of the trees, or the tree density of the forests.
The first obvious improvement is to apply some sort of visibility culling to the scene. The example divides the world into a 2-dimensional grid, and arranges each quadrant of the grid such that it contains a single patch of terrain, a single plane at a fixed level for water, and a linked list of trees. Any quadrant that falls outside the view volume of the camera is not rendered. Granted, this is a relatively simple approach to culling, and adds some work to be done in software, but it serves the desired effect well. From here on out, assume culling is on. With this small improvement, only 40% of the quadrants in a 10x10 grid are drawn, resulting in a maximum frame rate of 3 fps. A finer division of quadrants (i.e. putting more, smaller quadrants in the scene) lowers the percentage of quadrants drawn to about 25%. Nonetheless, rendering the full geometry of the visible scene is still extremely expensive, and the question of what really needs to be rendered remains.
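A minimal sketch of this kind of per-quadrant test is shown below. It is a simplification under stated assumptions rather than the example's actual code: the view volume is reduced to a top-down wedge (two side planes plus a far distance), each quadrant is approximated by its bounding circle, and the `Camera2D` struct and function names are hypothetical. The wedge test assumes a half field of view below 90 degrees.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical top-down camera description.
struct Camera2D {
    float x, z;      // position on the ground plane
    float yaw;       // heading in radians, 0 = looking along +x
    float halfFov;   // half of the horizontal field of view, radians (< pi/2)
    float farDist;   // far clip distance
};

// Conservative test: returns false only when the quadrant's bounding circle
// lies completely outside the view wedge or beyond the far distance.
bool quadrantVisible(const Camera2D& cam, float centerX, float centerZ, float size) {
    assert(size > 0.0f);
    const float radius = size * 0.70710678f;  // half the diagonal of the square
    const float dx = centerX - cam.x;
    const float dz = centerZ - cam.z;

    // Beyond the far distance.
    if (std::sqrt(dx * dx + dz * dz) - radius > cam.farDist) return false;

    // Signed distances to the two side planes, using inward-facing normals.
    const float aL = cam.yaw + cam.halfFov;
    const float aR = cam.yaw - cam.halfFov;
    if ( dx * std::sin(aL) - dz * std::cos(aL) < -radius) return false;
    if (-dx * std::sin(aR) + dz * std::cos(aR) < -radius) return false;
    return true;
}

// Counts the quadrants of a gridSize x gridSize world that would be drawn.
int countVisibleQuadrants(const Camera2D& cam, int gridSize, float size) {
    int visible = 0;
    for (int j = 0; j < gridSize; ++j)
        for (int i = 0; i < gridSize; ++i)
            if (quadrantVisible(cam, (i + 0.5f) * size, (j + 0.5f) * size, size))
                ++visible;
    return visible;
}
```

Only the quadrants that pass the test have their terrain patch, water plane, and tree list submitted for rendering. The test costs a handful of multiplications per quadrant, which is the small software overhead mentioned above.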
Enter the Impostors
It turns out that it is not necessary to render the full geometry of a tree that is so far away from the camera that it takes up a single pixel of screen real estate. The user can barely see it anyway. In fact, it is not necessary to render the full geometry of a tree unless that tree is close enough to the camera that the user can see the difference between a 3D tree and a texture that looks like a tree, or an impostor. An impostor, in the context here, is a billboard that always faces the camera with an applied texture that visually represents the geometric object it replaces, as figures 1 and 2 illustrate.
Figure 1: A solid view of a tree and an impostor (the alpha channel of the impostor allows omission of the black region when rendering).
Figure 2: A wireframe view of the same tree and impostor
Using impostors will drastically reduce the polygon count, but it raises other issues:
- How will billboards be used to represent different views of the individual trees?
- When to render an impostor vs. a real tree?
- How to go about creating the impostors?
Now things become more interesting from a developer's standpoint. Consider walking a full circle around a tree in real life. The branches of the tree remain stationary with respect to the tree trunk. In other words, they do not spin around the tree's trunk to face your eyes at all times. Now consider moving a camera around a tree, or any object for that matter, in a game. If the tree were to look exactly the same from any point of view, then the goal of simulating realism would not be met. To avoid this problem, the full geometry of any tree within a certain distance of the camera is rendered, while trees beyond that distance are rendered as impostors. This solution, however, still leaves a problem to be solved.
As mentioned earlier, each tree in the scene has unique properties assigned to it at initialization. If the same texture were applied to all the impostors, then all the trees rendered as impostors would look exactly alike, which would effectively create a very unnatural forest. Also, when the distance between the camera and a tree became small enough to constitute rendering the tree's full geometry, a very noticeable popping would surely occur. All is not lost, however, for we have a powerful tool in our toolbox to help us maintain the illusion: render-to-texture.
Smart use of Render-to-Texture
Rendering just one view of a tree to a texture is not enough, for the reasons just discussed; therefore, a decision must be made as to how many views of a tree to render to texture, and whether to use one texture per view, or one texture representing all the chosen views of a tree. The trick to selecting how many views to use is careful selection of the swapping distance, the distance from the camera at which the full geometry of a tree is rendered instead of its impostor. In order to keep the memory footprint small, but maintain the appearance of realism, the example uses a swapping distance such that no noticeable popping occurs when using just eight views: N, NE, E, SE, S, SW, W, NW. Generally, just these eight are effective, but experiment with more or fewer views depending on your application's needs. Of course, keep in mind that it all depends on the scale of the world in which your trees live, but with some experimentation finding a suitable value is relatively easy.
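The two per-tree decisions described so far, full geometry versus impostor and which of the pre-rendered views to show, might be sketched like this. The function names, the convention that view 0 is the view seen from the tree's local +x direction, and the counter-clockwise ordering of the views are assumptions made for illustration; how the indices map onto N, NE, E and so on depends on how the views were rendered.

```cpp
#include <cassert>
#include <cmath>

enum class Representation { FullGeometry, Impostor };

// Trees closer than the swapping distance get full geometry; the rest are impostors.
Representation chooseRepresentation(float camX, float camZ,
                                    float treeX, float treeZ,
                                    float swapDistance) {
    const float dx = camX - treeX;
    const float dz = camZ - treeZ;
    // Compare squared distances to avoid a square root per tree.
    return (dx * dx + dz * dz < swapDistance * swapDistance)
               ? Representation::FullGeometry
               : Representation::Impostor;
}

// Returns the index (0..viewCount-1) of the pre-rendered view whose direction
// is closest to the direction from the tree to the camera, taking the tree's
// own rotation about the vertical axis into account.
int selectView(float camX, float camZ, float treeX, float treeZ,
               float treeRotation, int viewCount) {
    assert(viewCount > 0);
    const float twoPi  = 6.28318530718f;
    const float sector = twoPi / static_cast<float>(viewCount);
    // Angle of the camera as seen from the tree, in the tree's local frame.
    const float angle = std::atan2(camZ - treeZ, camX - treeX) - treeRotation;
    // Shift by half a sector so each view covers the angles on both sides of it.
    float shifted = std::fmod(angle + 0.5f * sector, twoPi);
    if (shifted < 0.0f) shifted += twoPi;
    const int index = static_cast<int>(shifted / sector);
    return index % viewCount;  // guards against rounding up to viewCount
}
```

With eight views each impostor covers a 45-degree sector, so the displayed view is never more than 22.5 degrees away from the true viewing direction. That error is what the swapping distance has to hide.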
Next consider a second decision concerning the textures: whether to use one texture per view or one texture representing all eight views of a tree. Texture swapping can be quite expensive with modern APIs, so choosing to render each view of a tree to its own texture would add some penalty. Rendering all eight views of a tree to a single texture reduces the amount of texture swapping enormously. Taking one step further, rendering all eight views of all four trees to a single texture eliminates texture swapping between impostors entirely! In the interest of maintaining a realistic appearance while keeping resource usage down, and the performance up, the example uses this last technique.
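With all 32 views in one texture, each impostor only needs the texture coordinates of its own sub-rectangle. The sketch below assumes a layout the article does not specify: one row per tree type and one column per view. The `UvRect` struct, the `atlasRect` function, and the optional `inset` parameter (a small border, in texture-coordinate units, that can reduce bleeding between neighboring views) are hypothetical.

```cpp
#include <cassert>

// Texture-coordinate rectangle of one view inside the shared texture.
struct UvRect {
    float u0, v0;  // top-left corner
    float u1, v1;  // bottom-right corner
};

// Assumed layout: rows are tree types, columns are views.
UvRect atlasRect(int treeType, int view,
                 int treeTypes = 4, int views = 8, float inset = 0.0f) {
    assert(treeTypes > 0 && views > 0);
    assert(treeType >= 0 && treeType < treeTypes);
    assert(view >= 0 && view < views);
    const float cellW = 1.0f / static_cast<float>(views);
    const float cellH = 1.0f / static_cast<float>(treeTypes);
    UvRect r;
    r.u0 = static_cast<float>(view)     * cellW + inset;
    r.v0 = static_cast<float>(treeType) * cellH + inset;
    r.u1 = static_cast<float>(view + 1)     * cellW - inset;
    r.v1 = static_cast<float>(treeType + 1) * cellH - inset;
    return r;
}
```

For example, `atlasRect(3, 7)` returns the bottom-right cell of the texture, spanning u from 0.875 to 1 and v from 0.75 to 1.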
Though an elegant solution, choosing to render all possible views to a single texture using render-to-texture has its drawbacks. The first drawback concerns rendering to a texture with an alpha channel, and this caveat actually applies to all render-to-texture techniques (the alpha channel is necessary so alpha testing may be done against the background of the impostors at render time). The example code creates a render target with an alpha channel, which is something that not all graphics hardware supports. We tested the example on several graphics cards (running the latest publicly available drivers), and it worked well, but there is hardware out there on which the example will not execute correctly.
If supporting graphics hardware that does not support render targets with an alpha channel is a requirement, there is a way to work around the problem. Instead of creating a render target with an alpha channel, create a basic (lockable) render target. Also create a system memory texture with the same format as the render target, a system memory texture with an alpha channel, and a video memory texture with an alpha channel. An illustration best describes how this solution works, but we'll also step through it:
Figure 3: Creating an impostor with alpha for hardware that does not directly support Render Target Textures with alpha.
First, render an object to the target. Lock the render target, copy the bits to the system memory texture with the same format as the render target, and unlock the render target. At this point the render target may be discarded, as it is no longer needed. Copy the bits from the system memory texture to the system memory texture with an alpha channel, and take care to set the alpha bits as necessary (opaque where the object was rendered, transparent otherwise). Finally, lock the video memory texture with alpha, copy over the bits from the system memory texture with alpha, and unlock the video memory texture with alpha. The final result is an exact impostor of the object with alpha values set for alpha testing when the impostor is rendered. Since this is all done at startup time, performance is not an issue.
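The copy that sets the alpha bits can be sketched on plain CPU buffers, leaving out the Direct3D lock and unlock calls entirely. This is an illustration under stated assumptions, not the example's code: pixels are taken to be 32-bit values laid out as 0xAARRGGBB, and a pixel is treated as background when its color equals the clear color (black by default). Note that with this simple color key, any pixel of the object that happens to match the clear color exactly also becomes transparent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies pixels from a buffer without meaningful alpha into a new buffer whose
// alpha is opaque where the object was rendered and transparent elsewhere.
// keyRgb is the clear color used as the background (an assumption).
std::vector<std::uint32_t> addAlphaFromColorKey(const std::vector<std::uint32_t>& source,
                                                int width, int height,
                                                std::uint32_t keyRgb = 0x000000u) {
    assert(width > 0 && height > 0);
    assert(source.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const std::uint32_t rgbMask = 0x00FFFFFFu;
    std::vector<std::uint32_t> result;
    result.reserve(source.size());
    for (std::uint32_t pixel : source) {
        const std::uint32_t rgb = pixel & rgbMask;  // ignore whatever was in the top byte
        const std::uint32_t alpha = (rgb == (keyRgb & rgbMask)) ? 0x00000000u : 0xFF000000u;
        result.push_back(alpha | rgb);
    }
    return result;
}
```

The returned buffer corresponds to the system memory texture with alpha in the steps above; its contents would then be copied into the locked video memory texture.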
A second drawback to rendering all possible views to a single texture concerns a possible alternative to impostors: point sprites. All eight views of all four trees rendered to a single texture means that each impostor must apply some sort of texture transform (see figure 4). Point sprites do not support texture transforms as of DirectX* 8.0, therefore enabling point sprites would mean rendering each view of each tree to its own texture. That particular path was avoided for reasons discussed earlier, thus we must rule out point sprites even though they have the advantage of only storing a z-position.
Figure 4: Impostor Texture Transform (selecting texture coordinates)
MIP-mapped textures become substantially more difficult as well because we would have to maintain a buffer around each tree's view in order to keep the different views from bleeding together.
With a texturing scheme in place, it is time to move on to another important aspect to consider when rendering hundreds (potentially thousands) of small objects with Direct3D*: smart use of vertex buffers. Since all the impostors are essentially quads with specific texture coordinates based on each tree's rotation, it is possible to use:
- a single 32-vertex VB for all the impostors, or
- a 4-vertex VB per impostor
Doing either requires selecting the appropriate texture coordinates per impostor on the fly, but that will be true in any case using a single texture representing multiple views. The performance hit comes with the hundreds of calls to DrawIndexedPrimitive(), with each call only rendering 4 vertices.
A better approach is to do the transformations for all the impostors in software, and stuff all the transformed vertices into a single, large vertex buffer. The example project uses a vertex buffer large enough to accommodate 1000 tree impostors (4000 vertices), though it could potentially support up to ~16,000 tree impostors (~64,000 vertices). The example only supports 1000 tree impostors in a VB because going beyond 4000 vertices yielded negligible performance gains. Although it may seem like a lot of work to use a single texture, do the impostor transforms (scale, rotate, position, calculate texture coordinates) in software, and use a single, large vertex buffer, the final result is really quite pleasing both visually and with respect to performance. Performance, as measured by the frame rate, jumps from a slow 3 fps to an awesome 90 fps, a thirtyfold increase!
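The batching idea can be sketched with ordinary containers standing in for the Direct3D vertex and index buffers. Everything named here is an assumption made for illustration: the `Vertex` and `Impostor` layouts, billboards that rotate only about the vertical axis, and the triangle winding, which would need adapting to the handedness and cull mode in use. The limit in the assertion is a property of 16-bit indices, which can address 65,536 vertices, or 16,384 quads; that is in line with the ~16,000 figure above, though the article does not say this is where its number comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: world-space position plus texture coordinates.
struct Vertex { float x, y, z, u, v; };

// Hypothetical impostor description: base position, size, and atlas rectangle.
struct Impostor {
    float x, y, z;          // position of the base of the tree
    float halfWidth;        // half the billboard width, scale already applied
    float height;           // billboard height, scale already applied
    float u0, v0, u1, v1;   // texture rectangle of the selected view
};

// Computes the four camera-facing corners in software and appends them,
// together with six indices (two triangles), to the shared batch.
void appendImpostor(std::vector<Vertex>& vertices,
                    std::vector<std::uint16_t>& indices,
                    const Impostor& imp, float camX, float camZ) {
    assert(vertices.size() + 4 <= 65536);  // limit of 16-bit indices

    // Horizontal direction from the impostor to the camera.
    float dx = camX - imp.x;
    float dz = camZ - imp.z;
    float len = std::sqrt(dx * dx + dz * dz);
    if (len < 1e-6f) { dx = 0.0f; dz = 1.0f; len = 1.0f; }  // camera directly above
    dx /= len;
    dz /= len;

    // Horizontal vector perpendicular to the view direction, scaled to half width.
    const float rx =  dz * imp.halfWidth;
    const float rz = -dx * imp.halfWidth;

    const std::size_t base = vertices.size();
    vertices.push_back({imp.x - rx, imp.y,              imp.z - rz, imp.u0, imp.v1});
    vertices.push_back({imp.x - rx, imp.y + imp.height, imp.z - rz, imp.u0, imp.v0});
    vertices.push_back({imp.x + rx, imp.y + imp.height, imp.z + rz, imp.u1, imp.v0});
    vertices.push_back({imp.x + rx, imp.y,              imp.z + rz, imp.u1, imp.v1});

    const std::uint16_t b = static_cast<std::uint16_t>(base);
    const std::uint16_t quad[6] = {
        b, static_cast<std::uint16_t>(b + 1), static_cast<std::uint16_t>(b + 2),
        b, static_cast<std::uint16_t>(b + 2), static_cast<std::uint16_t>(b + 3)};
    indices.insert(indices.end(), quad, quad + 6);
}
```

Each frame, the batch would be cleared, refilled by calling `appendImpostor` for every visible tree beyond the swapping distance, copied into the large vertex buffer, and drawn with a single call instead of one call per tree.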
Scene rendered with no impostors
Similar scene rendered with impostors
As the numbers demonstrate, the fast rendering of easy impostors is a great tool to have in a developer's toolbox. Combining techniques like render-to-texture with procedural content alleviates the need for additional artwork, and saves in content development time. A careful thought process concerning visibility culling, texturing, and API usage will increase the overall rendering speed of your pipeline, and ultimately improve the overall visual quality. Hopefully the ideas presented here will prove useful the next time you are faced with the difficult problem of rendering reality.