Rendering grass with Instancing in DirectX 10

Introduction



Rendering realistic-looking grass in real time is hard, especially on consumer graphics hardware, because of its geometric complexity. This article and the accompanying source introduce geometry instancing with the Direct3D 10 API and show how it can be used to render realistic-looking grass on consumer graphics hardware.

Instancing Grass



A typical patch of grass can easily have a few hundred thousand blades, each very similar to the others, with variations in color, position, and orientation. Rendering a large number of small objects, each made from a few polygons, is not optimal: current-generation graphics APIs such as DirectX and OpenGL are not designed to efficiently render models with a small number of polygons thousands of times per frame. To render thousands of blades of grass efficiently, the number of draw calls needs to be reduced drastically. If the geometry of the grass blades doesn't change, the best approach is to place all the grass elements in one vertex buffer and render them in a single draw call. However, if the geometry changes often (for example, if a level-of-detail scheme is used for geometry simplification), this approach does not work, because a large amount of data has to be sent to the graphics card every time the geometry changes. Geometry instancing allows geometry to be reused when drawing multiple similar objects in a scene. The common data is stored in a vertex buffer, and the differentiating parameters, such as position and color, are stored in a separate vertex (instance) buffer. The hardware uses the vertex and instance buffers together to render unique instances of the model.

Using the geometry instancing APIs factors the common data out from the unique data (the flyweight design pattern) and thus lowers memory use and bandwidth. The vertex buffer can stay resident in graphics memory while the instance buffer is updated more frequently if needed, giving both the performance and the flexibility required.
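
The memory argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the byte sizes used in the usage note are assumptions, not values taken from the sample.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed when every patch is copied into one large vertex buffer.
std::size_t expandedBytes(std::size_t verticesPerPatch, std::size_t patchCount,
                          std::size_t bytesPerVertex)
{
    return verticesPerPatch * patchCount * bytesPerVertex;
}

// Bytes needed when the patch geometry is stored once and each placement
// adds only one small instance record.
std::size_t instancedBytes(std::size_t verticesPerPatch, std::size_t patchCount,
                           std::size_t bytesPerVertex, std::size_t bytesPerInstance)
{
    return verticesPerPatch * bytesPerVertex + patchCount * bytesPerInstance;
}
```

For a hypothetical 600-vertex patch with 20 bytes per vertex, 8 bytes per instance and 1,000 placements, the expanded buffer needs 12,000,000 bytes, while the vertex and instance buffers together need 20,000 bytes.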

Implementation Details



In this sample, numerous small patches of grass are drawn across the terrain. A patch of grass consists of a vertex buffer containing a number of randomly placed, intersecting quads. Each quad is mapped with a texture containing a few blades of grass.
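
As a rough illustration of such a patch, the sketch below builds a triangle list of randomly placed, randomly rotated quads on the CPU. The GrassVertex layout, the buildPatch name and all parameters are assumptions made for this example; the sample's own patch class may build its geometry differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical vertex layout: position followed by texture coordinate.
struct GrassVertex { float x, y, z; float u, v; };

// Builds one patch as a triangle list. Each quad contributes two triangles
// (six vertices), is centred at a random point inside the patch, and is
// rotated by a random angle about the vertical axis so that quads intersect.
std::vector<GrassVertex> buildPatch(int quadCount, float patchSize,
                                    float quadWidth, float quadHeight,
                                    unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> offset(-0.5f * patchSize, 0.5f * patchSize);
    std::uniform_real_distribution<float> angle(0.0f, 3.14159265f);

    std::vector<GrassVertex> vertices;
    vertices.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (int q = 0; q < quadCount; ++q) {
        const float cx = offset(rng);
        const float cz = offset(rng);
        const float a  = angle(rng);
        const float dx = 0.5f * quadWidth * std::cos(a);
        const float dz = 0.5f * quadWidth * std::sin(a);

        // Texture v runs from 0 at the top of the quad to 1 at the ground.
        const GrassVertex bottomLeft  = { cx - dx, 0.0f,       cz - dz, 0.0f, 1.0f };
        const GrassVertex bottomRight = { cx + dx, 0.0f,       cz + dz, 1.0f, 1.0f };
        const GrassVertex topLeft     = { cx - dx, quadHeight, cz - dz, 0.0f, 0.0f };
        const GrassVertex topRight    = { cx + dx, quadHeight, cz + dz, 1.0f, 0.0f };

        // Two triangles per quad.
        vertices.push_back(bottomLeft);
        vertices.push_back(topLeft);
        vertices.push_back(topRight);
        vertices.push_back(bottomLeft);
        vertices.push_back(topRight);
        vertices.push_back(bottomRight);
    }
    return vertices;
}
```

The returned array is what would be uploaded once into the static vertex buffer; a fixed seed keeps the patch identical from run to run.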

A natural waving motion of the grass blades is achieved by animating the vertices of each quad using a combination of sine waves of different frequencies. Color changes that occur with the waving motion and wind are simulated using the same sine wave used to animate the grass. Refer to [4] for more details. Geometry instancing is used to place numerous small patches along a grid on the terrain. This method allows selective drawing of only the patches that are visible (see the figure below). Patches with various levels of detail, chosen by camera position, can also be introduced with relative ease.
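
The waving motion can be sketched as a sum of two sine waves. The code below is plain C++ for readability; in a renderer the same arithmetic would typically live in the vertex shader. The function names, frequencies, amplitudes and phase scale are illustrative assumptions, not the values used by the sample or by [4].

```cpp
#include <cassert>
#include <cmath>

// Horizontal sway for a vertex. heightFactor is 0 at the root of the blade
// and 1 at the tip, so the roots stay anchored while the tips move the most.
float swayOffset(float timeSeconds, float worldX, float heightFactor)
{
    const float phase = 0.5f * worldX; // desynchronises neighbouring quads
    const float slow  = 0.10f * std::sin(1.0f * timeSeconds + phase);
    const float fast  = 0.03f * std::sin(3.7f * timeSeconds + 1.3f * phase);
    return heightFactor * (slow + fast);
}

// Brightness factor derived from the same wave, so color follows the motion.
float swayTint(float timeSeconds, float worldX, float heightFactor)
{
    return 1.0f + 0.5f * swayOffset(timeSeconds, worldX, heightFactor);
}
```

Because the offset is scaled by the height factor, the bottom vertices of each quad never move, which keeps the grass attached to the terrain.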

The figure below highlights the dynamic culling of grass geometry (only the patches shown in blue are rendered). Refer to the code sample for more details.
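
One simple way to pick the patches to draw is a distance test combined with a view-cone test on the grid. This is a simplified stand-in for real frustum culling, and the PatchInstance struct, the visiblePatches name and all parameters are assumptions made for this sketch rather than the sample's actual culling code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-instance record: the patch position on the terrain grid.
struct PatchInstance { float x, z; };

// Collects the grid patches that lie within maxDistance of the camera and
// inside a view cone around the (normalised) view direction. Only these
// would be written to the instance buffer for the frame.
std::vector<PatchInstance> visiblePatches(int gridSize, float spacing,
                                          float camX, float camZ,
                                          float dirX, float dirZ,
                                          float maxDistance, float cosHalfAngle)
{
    std::vector<PatchInstance> visible;
    for (int row = 0; row < gridSize; ++row) {
        for (int col = 0; col < gridSize; ++col) {
            const float px = col * spacing;
            const float pz = row * spacing;
            const float dx = px - camX;
            const float dz = pz - camZ;
            const float dist = std::sqrt(dx * dx + dz * dz);
            if (dist > maxDistance)
                continue; // too far away
            // A patch directly under the camera is always kept; otherwise
            // compare the angle to the view direction with the cone angle.
            if (dist > 0.0f && (dx * dirX + dz * dirZ) / dist < cosHalfAngle)
                continue; // outside the view cone
            const PatchInstance inst = { px, pz };
            visible.push_back(inst);
        }
    }
    return visible;
}
```

The size of the returned array is the instance count for the draw call, so culled patches cost nothing on the GPU.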



Instancing with Direct3D10



This section shows the steps needed to implement geometry instancing using the Direct3D 10 API.

Step 1). Defining vertex & instance buffers. In Direct3D 10 there is no distinction between the various buffer types; they are all stored as ID3D10Buffer objects. To render instanced grass, two buffers are created: one holds the static geometry of the patch, and the other holds the positions at which the patches are to be drawn.

Step 2). Associating the input buffers with a vertex shader. The vertex & instance buffers are associated with a vertex shader in Direct3D 10 using an input-layout object, which describes how vertex buffer data is streamed into the IA (Input Assembler) pipeline stage.



An input-layout object is created from an array of input-element descriptions and a pointer to the compiled shader's input signature. Each element describes the data structure and layout of one field in the vertex buffers. The input-element array below was used for the instancing grass demo. The first two elements of the array describe the vertex data coming from the vertex buffer, and the third element describes the instance data coming from the instance buffer (the second vertex buffer). Notice that the input slot (the fourth entry of each element) differs between the vertex and instance buffers. For more details, refer to [1].

static const D3D10_INPUT_ELEMENT_DESC grassLayout[] =
{
    // semantic, index, format, input slot, byte offset, slot class, instance step rate
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,  D3D10_INPUT_PER_VERTEX_DATA,   0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 12, D3D10_INPUT_PER_VERTEX_DATA,   0 },
    { "vPPos",    0, DXGI_FORMAT_R32G32_FLOAT,    1, 0,  D3D10_INPUT_PER_INSTANCE_DATA, 1 },
};
UINT numElements = sizeof( grassLayout ) / sizeof( grassLayout[0] );

// Create the input layout
D3D10_PASS_DESC PassDesc;
// Get the pass description, which carries the compiled shader's input signature
pRenderTechnique->GetPassByIndex( 0 )->GetDesc( &PassDesc );
// Associate the shader with the layout description
hr = pd3dDevice->CreateInputLayout( grassLayout, numElements,
                                    PassDesc.pIAInputSignature,
                                    PassDesc.IAInputSignatureSize,
                                    &pVertexLayout );
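
To see what the layout implies, the fetch pattern of the input assembler can be emulated on the CPU. The sketch below is only a model: T_VERTEX and T_INSTANCE are the names used by the article's binding code, but their field definitions here are assumptions derived from the formats and offsets in grassLayout, and ShaderInput and assemble are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed CPU-side mirrors of the two streams described by grassLayout.
struct T_VERTEX   { float position[3]; float texCoord[2]; }; // input slot 0
struct T_INSTANCE { float patchPos[2]; };                    // input slot 1

// The byte offsets must agree with the layout: POSITION at 0, TEXCOORD at 12.
static_assert(offsetof(T_VERTEX, position) == 0,   "POSITION offset");
static_assert(offsetof(T_VERTEX, texCoord) == 12,  "TEXCOORD offset");
static_assert(offsetof(T_INSTANCE, patchPos) == 0, "vPPos offset");

// What one vertex-shader invocation receives.
struct ShaderInput { T_VERTEX vertex; T_INSTANCE instance; };

// Emulates the fetch pattern: per-vertex elements advance with every vertex,
// per-instance elements advance once every stepRate instances.
std::vector<ShaderInput> assemble(const std::vector<T_VERTEX>& vertices,
                                  const std::vector<T_INSTANCE>& instances,
                                  std::size_t instanceCount,
                                  std::size_t stepRate)
{
    assert(stepRate > 0);
    std::vector<ShaderInput> inputs;
    for (std::size_t i = 0; i < instanceCount; ++i) {
        const T_INSTANCE& inst = instances.at(i / stepRate);
        for (std::size_t v = 0; v < vertices.size(); ++v) {
            const ShaderInput in = { vertices[v], inst };
            inputs.push_back(in);
        }
    }
    return inputs;
}
```

With a step rate of 1, as in the third element of grassLayout, every instance reads its own patch position while all instances share the same vertex data.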



Step 3). Binding objects

Once the vertex buffers are ready, they are bound to the IA stage as shown in the listing below. An array of buffer pointers containing the vertex and instance buffers, together with arrays of strides and offsets, is created and bound to the IA stage along with the previously created layout.

ID3D10Buffer* pVB[2];
UINT strides[2];
UINT offsets[2] = { 0, 0 };
pVB[0] = pVertexBuffer;
pVB[1] = pInstanceBuffer;
strides[0] = sizeof( T_VERTEX );
strides[1] = sizeof( T_INSTANCE );
pd3dDevice->IASetVertexBuffers( 0,       // first input slot for binding
                                2,       // number of buffers in array
                                pVB,     // array of vertex buffer ptrs
                                strides, // array of strides
                                offsets  // array of offset values
                              );

// Set the input layout created previously
pd3dDevice->IASetInputLayout( pVertexLayout );



Step 4). Drawing the primitives

Once all the input resources have been bound to the pipeline, draw calls are issued to render the primitives. Direct3D 10 provides instanced draw calls for both non-indexed and indexed geometry (DrawInstanced and DrawIndexedInstanced); the primitive topology is set separately on the IA stage. The example below shows the draw call for triangle lists used in the instanced grass sample.

// Set primitive topology
pd3dDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// Set other resources
pDiffuseVariable->SetResource( pTextureRV );

// Render instanced billboards
D3D10_TECHNIQUE_DESC techDesc;
pRenderTechnique->GetDesc( &techDesc );

for( UINT p = 0; p < techDesc.Passes; ++p )
{
    pRenderTechnique->GetPassByIndex( p )->Apply( 0 );
    pd3dDevice->DrawInstanced( vertexCnt,   // number of vertices per instance
                               instanceCnt, // number of instances
                               0,           // index of the first vertex
                               0            // index of the first instance
                             );
}



The instancing portion of the source is split into two classes, InstancedBillboard and BBGrassPatch. InstancedBillboard is a generic class that hides the implementation details of instancing; it takes the vertex data structure and the instance data structure as template parameters. The grass patch class is responsible for creating and initializing the grass blades and patches.


Rendering grass blades with Alpha-to-coverage



The grass patch is rendered as a number of randomly placed, intersecting quads. Each quad is mapped with an alpha texture containing a few blades of grass. Rendering it as is with alpha blending would require the quads to be sorted back to front to render transparency correctly. This can be really expensive: either hundreds of thousands of quads are sorted every time the camera moves and the data is sent to the graphics card again, or depth peeling is used on the graphics card, which is equally prohibitive.

Alpha-to-coverage is used to solve this problem. Since the grass billboards use the alpha channel as a cutout (alpha is either 0 or 1), this method works well. However, the cutouts are not always binary; to make vegetation look better, the edges of the cutouts are often blurred. In this case, alpha-to-coverage is used together with multisample anti-aliasing (MSAA).

Alpha-to-coverage converts the alpha component output from the pixel shader into a coverage mask that is applied at the sub-pixel resolution of an MSAA render target. When the MSAA resolve is applied, the output pixel gets a transparency between 0 and 1, depending on the alpha coverage and the MSAA sample count. The images produced look correct and are devoid of artifacts. Refer to [2] for more details.
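
The conversion can be modelled in a few lines. This is a deliberately simplified sketch: it rounds alpha to a whole number of covered samples and always fills the lowest bits of the mask, whereas real hardware chooses its own sample pattern and may dither. The function names are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Converts an alpha value into a coverage mask for sampleCount MSAA samples.
unsigned coverageMask(float alpha, int sampleCount)
{
    assert(sampleCount > 0 && sampleCount <= 32);
    if (alpha < 0.0f) alpha = 0.0f;
    if (alpha > 1.0f) alpha = 1.0f;
    // Round to the nearest whole number of covered samples.
    const int covered = static_cast<int>(std::floor(alpha * sampleCount + 0.5f));
    // Set the lowest 'covered' bits of the mask.
    return covered >= 32 ? 0xFFFFFFFFu : ((1u << covered) - 1u);
}

// Effective opacity after the MSAA resolve averages the samples.
float resolvedOpacity(unsigned mask, int sampleCount)
{
    int covered = 0;
    for (int s = 0; s < sampleCount; ++s)
        if (mask & (1u << s)) ++covered;
    return static_cast<float>(covered) / static_cast<float>(sampleCount);
}
```

In this model, 4x MSAA can only produce five opacity levels (0, 0.25, 0.5, 0.75 and 1), which is enough to soften the edges of a cutout but too coarse for smooth gradients of transparency.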

This method is a pseudo order-independent transparency solution. It works well when the alpha channel is used for cutouts (alpha is either 0 or 1), as it is for our grass billboards. It does not work if correct transparency is desired, for example when looking through a window.

Future work and Conclusions



The Direct3D 10 API really simplifies the implementation of instancing and also offers improved performance. The mapping between vertex buffers and shader inputs is done by creating input layouts during initialization and doesn't need to be redone at draw time every frame as in earlier versions of the API, thereby improving performance.

The sample source provided can be modified with relative ease to use level of detail. Multiple static patches at different levels of detail can be generated, each with fewer polygons than the last. The patches can then be placed on the grid with instancing, depending on their distance from the camera. Further simplification can be added by using two animated textures for grass patches that are really far away.
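
A minimal sketch of the distance-based selection described above might look as follows. The sample does not include this; selectLod, bucketByLod and the range values used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the level of detail for a patch at the given distance.
// lodRanges holds the maximum distance for each level in ascending order
// (level 0 is the most detailed); distances beyond the last range use the
// last, coarsest level.
std::size_t selectLod(float distance, const std::vector<float>& lodRanges)
{
    assert(!lodRanges.empty());
    for (std::size_t level = 0; level < lodRanges.size(); ++level)
        if (distance <= lodRanges[level]) return level;
    return lodRanges.size() - 1;
}

// Sorts patch indices into one bucket per level. Each bucket would feed one
// instanced draw call that uses that level's static patch geometry.
std::vector<std::vector<std::size_t>> bucketByLod(
    const std::vector<float>& patchDistances,
    const std::vector<float>& lodRanges)
{
    std::vector<std::vector<std::size_t>> buckets(lodRanges.size());
    for (std::size_t i = 0; i < patchDistances.size(); ++i)
        buckets[selectLod(patchDistances[i], lodRanges)].push_back(i);
    return buckets;
}
```

With ranges of 10, 30 and 100 units, patches at distances 5, 20, 50 and 500 fall into levels 0, 1, 2 and 2 respectively, so three instanced draw calls would cover the whole field.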

The sample currently runs at reasonably good frame rates on consumer graphics hardware (30-60 fps) and could be further optimized by tuning the size of the static grass patch (and hence the number of blades) and the number of instances, along with the level-of-detail optimizations mentioned above.

The image below shows a field of grass rendered on an integrated graphics part with alpha-to-coverage, running at about 60 fps.

References and Related Articles



[1] "Getting Started with the Input-Assembler Stage (Direct3D 10)". http://msdn.microsoft.com/en-us/library/bb205117(VS.85).aspx

[2] "Instancing10 Sample", Microsoft DirectX SDK. http://msdn.microsoft.com/en-us/library/bb205317(VS.85).aspx

[3] Francesco Carucci. "Inside Geometry Instancing". In GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, 2005.

[4] "Animated grass with Pixel and Vertex Shaders". /sites/default/files/m/c/e/f/shaderx_animatedgrass.pdf

[5] Alexander Kharlamov, Iain Cantlay (NVIDIA Corporation). "Next-Generation SpeedTree Rendering". In GPU Gems 3.

About the Author



Anu Kalra is a Sr. Software Engineer at Intel. He works in the Visual Computing Software Division, developing technologies and influencing gaming ISVs in the adoption of multi-core and Larrabee. He has a Master's degree in Computer Science from the University of Illinois with a specialization in computer graphics and virtual reality.
