What is my polygon budget? That is a very common question that artists ask when making models for real-time rendering. It is also a difficult question to answer because it’s more complicated than just a number.
I started out as a 3D artist back in the days of PlayStation* One, and later I became a graphics coder. I wish this article had been available when I first started building 3D models for games. The fundamentals discussed should be useful to many artists. Although most of the information in this article will not make a huge impact on performance or your day-to-day work, it will give you a basic understanding of how a graphics processing unit (GPU) draws the meshes you create.
The number of polygons in a mesh typically indicates how fast it will render. However, even though the polygon count often correlates with the frames per second (FPS), you might discover that even after reducing the number of polygons, the mesh still renders slowly. With a basic understanding of how meshes get rendered you can apply selected techniques to achieve better rendering performance.
To understand how a GPU draws polygons, first consider the data structure used to map a polygon. A polygon consists of a collection of points—known as vertices—and references. Vertices are often stored as arrays of values, such as those shown in Figure 1.
Figure 1. An array of values for a simple polygon.
In this case, four vertices in three dimensions (x, y, and z) produce 12 values. To create polygons, a second array of values describes the vertices, as shown in Figure 2.
Figure 2. An array of references to vertices.
These vertices, bound together, form two polygons. Notice that two triangles, each with three corners, can be represented by four vertices because Vertices 1 and 2 are reused by both triangles. For this data to be handled by the GPU, each polygon is assumed to be a triangle. GPUs expect that you are working with triangles, because that is what they are designed to draw. If you want to draw polygons with a different number of vertices, the application will need to split them into triangles before the GPU draws them. For example, if you are creating a cube out of six polygons with four sides each, it is no more efficient than making a cube out of 12 polygons that each have three sides; those triangles are what the GPU will draw. As a rule, do not count polygons. Instead, count triangles.
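To make the two arrays concrete, here is a minimal sketch in Python. The coordinates, the order of the references, and the helper name `triangle_count` are my own assumptions for illustration; the values in Figures 1 and 2 may differ. The helper relies on the fact that a polygon with n sides splits into n - 2 triangles.

```python
# Four vertices, three values each (x, y, z): 12 values in total.
# The coordinates are made up for illustration.
vertices = [
    0.0, 0.0, 0.0,  # vertex 0
    1.0, 0.0, 0.0,  # vertex 1
    0.0, 1.0, 0.0,  # vertex 2
    1.0, 1.0, 0.0,  # vertex 3
]

# Six references form two triangles. Vertices 1 and 2 appear in both.
indices = [
    0, 1, 2,  # first triangle
    2, 1, 3,  # second triangle
]


def triangle_count(polygon_sides):
    """Triangles produced when each polygon with n sides is split into n - 2 triangles."""
    return sum(n - 2 for n in polygon_sides)


vertex_count = len(vertices) // 3
shared = set(indices[0:3]) & set(indices[3:6])

# A cube built from six four-sided polygons, and one built from 12 triangles.
cube_as_quads = triangle_count([4] * 6)
cube_as_triangles = triangle_count([3] * 12)
```

Both versions of the cube come out at 12 triangles, which is the number that matters to the GPU.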
The vertex data used in the above example is three-dimensional, but it doesn't have to be. You might only need two dimensions, but you will often store other data as well, such as UV coordinates for textures and normals for lighting.
When a polygon is being drawn the GPU first determines where to draw the polygon. To do this, it computes the position on the screen where the three vertices appear. This is known as a transform. A small program called a vertex shader performs the calculations on the GPU.
The vertex shader often performs other kinds of calculations as well, such as handling animations. Once all three vertices in a polygon have been computed, the GPU calculates which pixels are in that triangle and then proceeds to fill the pixels using another little program called a fragment shader. The fragment shader usually runs once per pixel. In some rare cases, however, it may run more than once per pixel to improve anti-aliasing. Fragment shaders are often called pixel shaders, because in most cases fragments correspond directly to pixels (see Figure 3).
Figure 3. One polygon drawn on screen.
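As a rough idea of what the transform does, the sketch below projects three vertices onto a screen. It is a simplification and not how any particular engine does it: real vertex shaders normally use 4x4 matrices, and the function name `project_to_screen`, the focal length, and the 640 by 480 screen are assumptions chosen for illustration.

```python
def project_to_screen(vertex, focal_length, width, height):
    """Map a camera-space point (x, y, z) to pixel coordinates.

    Assumes the camera sits at the origin looking down +z, and that z > 0.
    """
    x, y, z = vertex
    # Perspective divide: points that are further away move toward the center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Map from the range [-1, 1] to pixels, with y pointing down the screen.
    pixel_x = (ndc_x + 1.0) * 0.5 * width
    pixel_y = (1.0 - ndc_y) * 0.5 * height
    return pixel_x, pixel_y


# One triangle in camera space; the third vertex is twice as far away.
triangle = [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 4.0)]
screen_triangle = [project_to_screen(v, 1.0, 640, 480) for v in triangle]
```

Once the three screen positions are known, the GPU can work out which pixels the triangle covers.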
The timeline of what the GPU does when drawing a polygon is shown in Figure 4.
Figure 4. Timeline of a GPU drawing a polygon.
If you split the triangle in two and draw both triangles (see Figure 5), the timeline for the operations appears as shown in Figure 6.
Figure 5. Splitting a polygon in two.
Figure 6. Timeline of the GPU drawing two polygons.
This scenario requires twice as much transformation and setup, but—because the number of pixels is the same—the operation does not have to rasterize any more pixels. This shows that twice as many polygons does not necessarily require twice the time to render.
If you examine the two polygons in the previous example, you can see that they share two vertices. This suggests that it might be necessary to compute these vertices twice, but a mechanism called the vertex cache lets the computations be used again. The results of the vertex shader computations are stored in the cache—a small region of memory containing the last few vertices for reuse. The timeline for drawing the two polygons using the vertex cache is shown in Figure 7.
Figure 7. Drawing two polygons using the vertex cache.
Using a vertex cache, you can draw two polygons almost as fast as drawing one, if they have shared vertices.
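The effect of the cache can be simulated with a few lines of Python. This is only a sketch: the first-in, first-out policy and the cache size of 16 entries are assumptions, since real hardware varies in both, and `count_vertex_shader_runs` is a hypothetical helper.

```python
from collections import deque


def count_vertex_shader_runs(indices, cache_size):
    """Count vertex shader runs for a list of vertex references.

    Models the cache as a FIFO of the most recently computed vertices.
    A cache_size of 0 means no cache at all.
    """
    cache = deque(maxlen=cache_size)
    runs = 0
    for index in indices:
        if index in cache:
            continue  # cache hit: the stored result is reused
        runs += 1  # cache miss: the vertex shader has to run
        cache.append(index)  # the oldest entry drops out when the cache is full
    return runs


# Two triangles that share vertices 1 and 2.
two_triangles = [0, 1, 2, 2, 1, 3]
runs_without_cache = count_vertex_shader_runs(two_triangles, 0)
runs_with_cache = count_vertex_shader_runs(two_triangles, 16)
```

Without a cache the two triangles cost six vertex shader runs; with it they cost four, only one more than a single triangle.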
For the vertex to be reusable, it needs to be identical each time it is used. The position, of course, must be identical, but other parameters must also be the same. The parameters that are passed to the vertex depend on the engine you are using. Two very common parameters are UV coordinates and normals.
Whenever you perform UV mapping on a 3D object, any seam that is created means that the vertices along the seam cannot be shared. In general, it is therefore a good idea to avoid seams (see Figure 8).
Figure 8. UV mapping of texture seams.
To light a surface correctly, each vertex typically stores a normal, which is a vector that points away from the surface. By having all polygons that share a vertex defined by the same normal, the shape appears smooth. This is known as smooth shading. If each triangle has its own normals, the edges between the polygons become accentuated and the surface appears flat, which is why it is called flat shaded. Figure 9 shows two identical meshes, one with smooth shading and the other one with flat shading.
Figure 9. Smooth shading compared to flat shading.
This smooth-shaded geometry consists of 18 triangles using 16 shared vertices. Flat shading the 18 triangles requires 54 (18 times 3) vertices, because none of the vertices are shared. Even if the two meshes have the same number of polygons, they still differ in performance.
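The same counting can be tried on a much smaller mesh. The sketch below uses two triangles that share an edge (not the mesh in Figure 9) and counts how many distinct vertices the GPU would see when a vertex is identified by its position and its normal together. The helper name and the data are assumptions for illustration, and the normals are left unnormalized to keep the numbers simple. UV coordinates along a seam behave the same way.

```python
def count_unique_vertices(triangles):
    """Count distinct (position, normal) pairs across all triangle corners."""
    unique = set()
    for triangle in triangles:
        for position, normal in triangle:
            unique.add((position, normal))
    return len(unique)


a, b, c, d = (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)

# Smooth shading: each position has one normal, used by every triangle
# that touches it (b and c get the sum of the two face normals).
smooth = [
    [(a, (0, 0, 1)), (b, (-1, -1, 2)), (c, (-1, -1, 2))],
    [(c, (-1, -1, 2)), (b, (-1, -1, 2)), (d, (-1, -1, 1))],
]

# Flat shading: every corner of a triangle uses that triangle's face normal,
# so b and c carry a different normal in each triangle.
flat = [
    [(a, (0, 0, 1)), (b, (0, 0, 1)), (c, (0, 0, 1))],
    [(c, (-1, -1, 1)), (b, (-1, -1, 1)), (d, (-1, -1, 1))],
]

smooth_vertices = count_unique_vertices(smooth)
flat_vertices = count_unique_vertices(flat)

# The flat-shaded mesh described above: 18 triangles, nothing shared.
flat_vertices_18_triangles = 18 * 3
```

The smooth version needs four vertices and the flat version needs six, even though the triangles are the same.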
GPUs are fast primarily because they can do many things in parallel. Marketing materials for GPUs often emphasize the number of pipelines that they have, which determines how many things the GPU can do at the same time. When the GPU draws a polygon it assigns a number of pipelines to fill a square of pixels. This is typically a square with dimensions of about eight by eight pixels. The GPU keeps doing this until all pixels are filled in. Obviously triangles aren't square, so some of the pixels in the squares are going to be inside the triangle and some will be outside. Hardware will be assigned to all the pixels in the square, even those that fall outside of the polygon. Once all pixels in a square have been computed, the hardware discards the pixels outside the triangle.
Figure 10 shows a triangle that requires three tiles to draw it. Most of the blue pixels that get computed are used, but those shown in red fall outside the boundaries of the triangle and will be discarded.
Figure 10. Three tiles to draw a triangle.
The polygon in Figure 11—with the exact same number of pixels, but stretched out—requires more tiles to fill; most of the work in each tile (the red area) will be discarded.
Figure 11. Filling tiles in a stretched-out image.
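To get a feel for the numbers, the sketch below counts how many eight by eight tiles a triangle touches and how much of the work in those tiles is kept. It is a simplified model under my own assumptions: tiles are aligned to a fixed grid, a pixel counts as covered when its center is inside the triangle, and the two example triangles (both with an area of 512 square pixels) are invented. Real hardware differs in the details.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed test for which side of the line a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(triangle, px, py):
    """True if the point lies inside the triangle (either winding order)."""
    (ax, ay), (bx, by), (cx, cy) = triangle
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def tile_usage(triangle, tile=8):
    """Return (pixels covered, tiles touched, fraction of tile work kept)."""
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    covered = 0
    tiles = set()
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            if covers(triangle, x + 0.5, y + 0.5):  # sample at the pixel center
                covered += 1
                tiles.add((x // tile, y // tile))
    return covered, len(tiles), covered / (len(tiles) * tile * tile)


compact = [(0, 0), (32, 0), (0, 32)]  # short and wide
skinny = [(0, 0), (256, 0), (0, 4)]   # same area, stretched out

compact_pixels, compact_tiles, compact_kept = tile_usage(compact)
skinny_pixels, skinny_tiles, skinny_kept = tile_usage(skinny)
```

In this model the compact triangle touches 10 tiles and keeps roughly 83 percent of the work, while the stretched triangle touches 28 tiles and keeps roughly 29 percent.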
The number of pixels being drawn is one factor in the rendering, but the shape of the polygon is also important. For greater efficiency avoid long, skinny polygons and favor triangles in which each side is roughly equal in length, and the angle of each corner is close to 60 degrees. The two flat surfaces in Figure 12 have been triangulated in two different ways but will look identical when rendered.
Figure 12. Surfaces triangulated in two different ways.
They have exactly the same number of polygons and the same number of pixels, but because the surface on the left has more long, skinny polygons than the one on the right, it will be slower to render.
To draw a six-pointed star, you can build a mesh that has 10 polygons in it, but you can also draw the same shape using only two polygons, as shown in Figure 13.
Figure 13. Two different ways to draw a six-pointed star.
You might think it is faster to draw two polygons rather than ten. In this case, however, that is probably not true, because the pixels at the center of the star are drawn twice. This is known as overdraw, essentially drawing over pixels more than once. Overdraw is something that naturally happens in all rendering. For example, if a character stands partially obscured behind a pillar, the entire character will be drawn even if the pillar obscures parts of the character. Some engines employ advanced algorithms to avoid drawing objects that will not be in the final image, but this is difficult. It often takes longer for the CPU to figure out that something does not need to be drawn than it takes for the GPU to draw it.
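The cost of the two-polygon star can be estimated with simple arithmetic. The sketch below assumes a regular six-pointed star, which may not match Figure 13 exactly. Such a star divides into 12 equal small triangles: six form the hexagon in the center and six form the points. Each of the two large triangles covers the hexagon plus three points.

```python
# Areas measured in units of one small triangle (regular star assumed).
SMALL_TRIANGLES_IN_STAR = 12           # 6 in the hexagon + 6 points
SMALL_TRIANGLES_IN_LARGE_TRIANGLE = 9  # 6 in the hexagon + 3 points

# Two large overlapping triangles: the hexagon is filled twice.
two_polygon_area = 2 * SMALL_TRIANGLES_IN_LARGE_TRIANGLE

# Ten polygons that do not overlap: every part of the star is filled once.
ten_polygon_area = SMALL_TRIANGLES_IN_STAR

overdraw_area = two_polygon_area - ten_polygon_area
fill_ratio = two_polygon_area / ten_polygon_area
```

Under this assumption the two-polygon version fills one and a half times as many pixels as the ten-polygon version.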
Working as an artist, you must accept that at times there will be overdraw, but it is good practice to remove surfaces that cannot be seen whenever possible. If you work with a team of engineers, ask them to add a debug mode to your game engine in which everything becomes transparent. That makes it easier to spot hidden polygons that could be removed.
Figure 14 shows a simple scene: a box standing on a floor. The floor consists of just two triangles and the box consists of 10 triangles. The overdraw of this scene is shown in red.
Figure 14. A box standing on the floor.
In this case, the GPU will draw the floor under the box despite the fact that it will never be seen. If instead a hole is created in the floor under the box, you have a greater number of polygons but much less overdraw, as shown in Figure 15.
Figure 15. Hole beneath the box to avoid overdraw.
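The trade-off can be measured by counting how often each pixel gets drawn. The sketch below is a deliberately simplified model: the scene is viewed straight from above, so the floor and the box both appear as rectangles, and the sizes are invented. The helper names are hypothetical.

```python
def count_draws(width, height, rectangles):
    """Count how many times each pixel is drawn, given (x0, y0, x1, y1) rectangles."""
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rectangles:
        for y in range(y0, y1):
            for x in range(x0, x1):
                counts[y][x] += 1
    return counts


def overdraw(counts):
    """Number of pixel writes beyond the first one for each pixel."""
    return sum(max(c - 1, 0) for row in counts for c in row)


floor = (0, 0, 100, 100)
box = (30, 30, 70, 70)

# The floor with a hole: four rectangles (eight triangles) around the box.
floor_with_hole = [
    (0, 0, 100, 30),
    (0, 70, 100, 100),
    (0, 30, 30, 70),
    (70, 30, 100, 70),
]

overdraw_solid_floor = overdraw(count_draws(100, 100, [floor, box]))
overdraw_holed_floor = overdraw(count_draws(100, 100, floor_with_hole + [box]))
```

The solid floor wastes 1,600 pixel writes under the box; the floor with a hole wastes none, at the cost of six extra triangles.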
In scenarios such as this you need to make a judgment call. Sometimes it is worth accepting some overdraw in exchange for fewer polygons. Other times it is worth adding polygons to avoid overdraw. As another example, the following two figures show two identical-looking meshes of a surface with some spikes protruding from it. In the first mesh, Figure 16, the spikes are placed on top of the surface.
Figure 16. Spikes placed on the surface.
In the second mesh, Figure 17, holes are cut in the surface under the spikes to reduce overdraw.
Figure 17. Holes cut beneath the spikes.
In this case, a lot of polygons, many of them skinny, have been added to cut the holes. Additionally, the area of overdraw that is avoided is not very large, so for this scenario the technique is not effective.
Imagine modeling a house standing on the ground. To implement this, you can either leave the ground as it is or you can cut a hole for the house in the ground. There is more overdraw in the version of the house without a hole cut beneath it. However, how much this matters depends on the geometry and the point of view from which the house will be viewed. If you draw the ground under the floor of a house, this approach can create a lot of overdraw when viewed from inside the house and looking down. It won’t, however, make that big a difference if the house is seen from an airplane. The best practice in these cases is to have a debug mode in your game engine to make surfaces transparent so that you can see what is being drawn behind the surfaces that are visible to the end user.
When the GPU draws two polygons that overlap, how does it determine what polygon is in front of the other? Early computer graphics innovators spent a lot of time trying to solve this problem. Ed Catmull (who later became president of Pixar and Walt Disney Animation Studios) wrote a paper that outlined ten different approaches for solving the problem. At one point he notes that this problem would be trivial to solve if computers had enough memory to store one depth value per pixel. Back in the 1970s and 1980s, this was a lot of memory. Today, however, it is how most GPUs work: It is called a Z-buffer.
A Z-buffer (also known as a depth buffer) works in this way: Each pixel has a depth value associated with it. Whenever the hardware draws an object, it computes how far away from the camera the pixel being drawn is. It then checks the existing depth value of the pixel. If the existing value is further away from the camera than the new pixel, the new pixel gets drawn and its depth replaces the stored value. If the existing pixel is closer to the camera than the new pixel, then the new pixel is never drawn. This approach solves many issues and even works well if you have polygons that intersect each other.
Figure 18. Intersecting polygons processed by the depth buffer.
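The depth test is simple enough to sketch in a few lines. The example below is an illustration and not how any particular GPU implements it: it uses a single row of four pixels, and the names `draw_fragment` and `render` are hypothetical. Two surfaces cross each other, one getting further away to the right and one getting closer.

```python
def draw_fragment(color_buffer, depth_buffer, x, y, depth, color):
    """Write the new pixel only if it is closer than what is already stored."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


def render(surfaces, width=4, height=1):
    """Draw each (color, depth per x) surface in order and return the colors."""
    color_buffer = [["background"] * width for _ in range(height)]
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    for color, depths in surfaces:
        for x, depth in enumerate(depths):
            draw_fragment(color_buffer, depth_buffer, x, 0, depth, color)
    return color_buffer


red = ("red", [1.0, 2.0, 3.0, 4.0])    # moves away from the camera
blue = ("blue", [3.5, 2.5, 1.5, 0.5])  # moves toward the camera

red_first = render([red, blue])
blue_first = render([blue, red])
```

The result is the same regardless of the order in which the two surfaces are drawn: red wins on the left, blue wins on the right.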
However, the Z-buffer doesn't have infinite precision. If two surfaces are at almost exactly the same distance from the camera, the GPU gets confused and can randomly select one surface over the other, as shown in Figure 19.
Figure 19. Surfaces at the same depth causing display problems.
This is known as Z-fighting and it looks very glitchy and buggy. Often Z-fighting gets worse the further away the surface is from the camera. Engine designers can incorporate fixes to mitigate the problem, but if an artist builds overlapping polygons that are close enough to each other, the problem can still occur. Another example is a wall with a poster on it. The poster will be almost exactly at the same depth from the camera as the wall behind it, so the risk of Z-fighting is very high. The solution is to cut a hole in the wall behind the poster. Doing this also reduces overdraw.
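Why distance makes things worse can be shown with a small experiment. The sketch below assumes a 24-bit integer depth buffer, a common perspective depth mapping to the range 0 to 1, and near and far planes at 0.1 and 10,000 units. All of these are assumptions for illustration; engines differ in depth format and mapping.

```python
def quantized_depth(z, near=0.1, far=10000.0, bits=24):
    """Depth value stored for a surface at distance z from the camera.

    Uses the perspective mapping d = far * (z - near) / ((far - near) * z),
    rounded to an integer with the given number of bits.
    """
    d = far * (z - near) / ((far - near) * z)
    return round(d * (2 ** bits - 1))


# A wall and a poster placed 0.01 units in front of it, close to the camera.
near_wall = quantized_depth(10.01)
near_poster = quantized_depth(10.0)

# The same wall and poster, 1,000 units away.
far_wall = quantized_depth(1000.01)
far_poster = quantized_depth(1000.0)
```

Close to the camera the two surfaces get different depth values, so the poster reliably wins. Far away they end up with the same stored value, and the depth test can no longer tell them apart.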
Figure 20. Z-fighting example of overlapping polygons.
In extreme cases, Z-fighting can even happen when objects touch each other. In Figure 20 you can see a box on a floor, and since we haven’t cut a hole in the floor where the box stands, the Z-buffer can be confused near the edge where the floor meets the box.
GPUs have become exceptionally fast—so fast that CPUs can have a hard time keeping up with them. Because GPUs are essentially only designed to do one thing, they are much easier to make work fast. Graphics, by their nature, are about computing multitudes of pixels, and that makes it possible to build hardware that computes many pixels in parallel. However, the GPU only draws the things the CPU tells it to draw. If the CPU cannot feed the GPU fast enough, the GPU will go idle. Each time the CPU tells the GPU to draw something is called a draw call. A basic draw call consists of drawing one mesh, including one shader and one set of textures.
Imagine a slow CPU that can feed 100 draw calls per frame and a fast GPU that can draw a million polygons per frame. In this case, the ideal draw call could draw 10,000 polygons. If your meshes are only 100 polygons, the GPU will only be able to draw 10,000 polygons per frame. The GPU will be idle 99 percent of the time. In this case, you can easily increase the number of polygons in your meshes for free.
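The arithmetic in this example can be written out as a small sketch. The function name `gpu_utilization` is hypothetical, and the model is the same simplification used in the text: the CPU caps the number of draw calls, and the GPU caps the number of polygons.

```python
def gpu_utilization(draw_calls_per_frame, polygons_per_mesh, gpu_polygons_per_frame):
    """Return (polygons drawn per frame, fraction of GPU capacity used)."""
    submitted = draw_calls_per_frame * polygons_per_mesh
    drawn = min(submitted, gpu_polygons_per_frame)
    return drawn, drawn / gpu_polygons_per_frame


# 100 draw calls per frame, meshes of 100 polygons, a GPU good for a million.
drawn, busy = gpu_utilization(100, 100, 1_000_000)
idle = 1.0 - busy

# Mesh size at which 100 draw calls would keep the GPU fully busy.
ideal_polygons_per_mesh = 1_000_000 // 100
```

With these numbers the GPU draws 10,000 polygons per frame and sits idle 99 percent of the time; meshes of 10,000 polygons would use it fully.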
What constitutes a draw call, and how costly it is, can vary substantially between engines and architectures. Some engines can batch many meshes as a single draw call, but all meshes may need to have the same shader or could have other limitations. New APIs like Vulkan* and DirectX* 12 are specifically designed to remedy this problem by optimizing the way the program communicates with the graphics driver, increasing the number of draw calls that can be issued per frame.
If your team is writing their own engine, talk to the engine developers about what the draw call limits are. If you are using an off-the-shelf engine like Unreal* or Unity* software, consider doing some performance benchmarks to determine the limits of the engine. You may discover that you can increase your polygon counts without any slowdown.
I hope this serves as a good overview to help understand the different aspects of rendering performance. Each GPU vendor does things a little differently. There are many caveats and specific considerations for each engine and hardware platform. Always maintain an open dialog with the rendering engineers to learn the best practices for your application.
Eskil Steenberg is an independent developer of games and tools, who works as a consultant, as well as on independent projects. All screenshots were taken from active projects using tools developed by Eskil Steenberg. You can find more about his work at Quel Solaar and @quelsolaar on Twitter.