3D Modeling and Parallel Mesh Simplification

By Christopher G. Healey


3D geometric models are common in computer graphics-for example, in computer games, scientific simulations, or architectural design. These models are usually built as a collection of vertices connected together to form a polygon mesh. Complex models can contain tens of millions of polygons. For example, the power plant and DoubleEagle Tanker models (http://www.cs.unc.edu/~walk/models) available from the University of North Carolina at Chapel Hill contain 13 million and 82 million polygons, respectively.

As you can imagine, managing large polygonal models within an application presents numerous challenges for developers-the most pressing of which is how quickly the mesh can be rendered. Interactive applications need to render at least 30 frames a second. A number of clever approaches like view culling, point-based rendering, and geometry caching have been designed to help with this. Another example involves global illumination algorithms like radiosity. The speed of these algorithms depends in large part on the size of the polygon mesh they take as input. Reducing the mesh's size can produce significant speed-ups in the algorithm.

This article describes a collection of mesh simplification algorithms. Mesh simplification reduces the number of polygons in the mesh-often dramatically-while remaining faithful to the topology (that is, the shape) of the original mesh. In this article, you’ll learn how to simplify your own meshes. You can apply this in any situation where smaller models would help-for example, to improve character models in a game for better rendering speeds, or to simplify a complicated terrain model so it can be held entirely in memory.

Basic Simplification Operations

Any differences between the original and the simplified mesh are called geometric error. You use geometric error to control a simplification algorithm. One option is to define an error threshold, then ask the algorithm to reduce the number of polygons as much as possible without producing geometric error above the set threshold. Another approach is to pick a desired polygon count, then ask the algorithm to reduce the mesh to this size while minimizing the amount of geometric error being produced.

Different algorithms reduce the number of polygons in a mesh in different ways. Two common simplification operations are the edge collapse and vertex collapse.

Edge Collapse

An edge collapse (see Figure 1) takes an edge (v1, v2) in the mesh and collapses it to a single vertex vnew. The edge and any triangles that use the edge are removed. Common locations for vnew include either of the collapsed edge's endpoints (vnew=v1 or vnew=v2) or a position somewhere along the collapsed edge (vnew= v1+w(v2- v1), 0 < w < 1).

Vertex Collapse

A vertex collapse (see Figure 2) combines two vertices (v1, v2) into a single vertex vnew. Unlike an edge collapse, v1 and v2 do not need to be connected. Triangles connected to either v1 or v2 are updated to connect to vnew. You can place vnew somewhere along the line connecting v1 and v2, or in any other position that does not introduce inconsistencies like the mesh folding over or polygons interpenetrating one another. Notice that an edge collapse is a special case of a vertex collapse: it's a vertex collapse where v1 and v2 are connected and vnew lies somewhere along the edge between them.
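The collapse operations above can be sketched on an indexed triangle mesh. The representation (a vertex list plus index triples) and the function name collapse_vertices are illustrative choices, not from the article; degenerate triangles are dropped exactly as the text describes. An edge collapse is the same call with i1 and i2 connected by an edge.

```python
def collapse_vertices(vertices, triangles, i1, i2, vnew):
    """Merge vertices i1 and i2 into a single new vertex at position vnew."""
    new_index = len(vertices)
    vertices = vertices + [vnew]
    kept = []
    for tri in triangles:
        # Redirect any reference to i1 or i2 to the new vertex.
        tri = tuple(new_index if v in (i1, i2) else v for v in tri)
        # A triangle that now references the same vertex twice has
        # collapsed to a line or point; drop it.
        if len(set(tri)) == 3:
            kept.append(tri)
    return vertices, kept

# Collapsing the edge (2, 3) in a three-triangle fan removes the one
# triangle that used the edge and redirects the other two.
verts = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 2)]
tris = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]
new_verts, new_tris = collapse_vertices(verts, tris, 2, 3, (2.0, 0.5))
```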

Progressive Meshes

Edge collapses seem like a promising way to reduce a mesh's size. Collapsing an edge removes the triangles it shares, making the model smaller. The key question is: Which edges should we remove, and in what order? Progressive meshes (PM) solve exactly this problem. The PM algorithm simplifies or refines a mesh using three mesh modification operators: an edge collapse to simplify and reduce the size of the mesh, a vertex split to add detail and increase the size of the mesh, and an edge swap to improve the shape of the mesh [Hoppe 1993]. This article focuses on mesh simplification, so let’s concentrate on the edge collapse operator.

The PM algorithm simplifies an initial mesh Mn with n vertices into a progressive sequence of meshes (Mn−1, Mn−2, . . ., M1), each with one less vertex. This makes it easy to select a mesh with a desired number of vertices. If you want a mesh with m vertices, simply choose Mm from the sequence. It’s also easy to locate a mesh with a maximum geometric error. Walk down the sequence of meshes until you find the mesh Mm−1 with more error than you want. The previous mesh Mm will be the smallest mesh with an error below your threshold. You can also use the sequence to animate the simplification or refinement of the mesh-a technique known as geomorphing.

Simplification is controlled by an energy function that measures two competing quality measures: how tightly a simplified mesh fits the original (Edist) and the number of vertices the simplified mesh contains (Erep). Improving one measure will normally make the other worse (for example, fewer vertices will reduce the fit to the original mesh). A spring measure Espring is also included in the energy function. Its job is to keep the mesh tight along its edges. Adding these three measures together produces the final energy function:

E = Edist + Erep + Espring

Given a mesh Mm with V = { v1, …, vm } vertices and edges between the vertices defined by K, Edist(V,K) measures tightness as the squared distance from V to the original mesh. Erep(K) is simply a weighted vertex count crep·m. crep allows you to control how aggressively to simplify the mesh. A larger crep means the energy function will favor reducing vertices over maintaining a tight fit to the original mesh. Each edge collapse replaces an edge’s endpoints (v1, v2) with a single vertex vnew, as Figure 3 shows.
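The three energy terms can be sketched with toy scalar stand-ins. This is only the shape of the computation: the paper measures Edist against the original surface (not just its vertices) and uses per-edge spring constants, so the function below, its name, and its default weights are illustrative assumptions.

```python
def energy(simplified, original, edges, c_rep=1e-3, kappa=1e-2):
    """Toy version of E = Edist + Erep + Espring for 2D vertex lists."""
    # Edist: squared distance from each simplified vertex to its nearest
    # original vertex (a crude stand-in for distance to the surface).
    e_dist = sum(min((sx - ox) ** 2 + (sy - oy) ** 2
                     for ox, oy in original)
                 for sx, sy in simplified)
    # Erep: the weighted vertex count c_rep * m.
    e_rep = c_rep * len(simplified)
    # Espring: spring energy over the simplified mesh's edges, which
    # keeps the mesh tight along its edges.
    e_spring = kappa * sum((simplified[a][0] - simplified[b][0]) ** 2 +
                           (simplified[a][1] - simplified[b][1]) ** 2
                           for a, b in edges)
    return e_dist + e_rep + e_spring
```

With a perfect fit, only the vertex-count and spring terms contribute, which is exactly the trade-off the algorithm exploits: removing a vertex lowers Erep but usually raises Edist.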

Figure 3. A geographic mesh of meteorological readings: (a) original mesh; (b) frost coverage; (c) rainfall; (d) mesh simplified to 10% of the original vertices; (e) frost coverage; (f) rainfall

To optimize vnew, all other vertices are held constant. You derive a vnew position that minimizes the squared distance to the original mesh through conjugate gradient iteration. You can localize distance calculations to triangles incident on either v1 or v2. To reduce the number of iterations performed, you have three possible starting positions: v1, v2, and ½(v1+v2). The best overall position is then assigned to vnew.
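Selecting among the three starting positions can be sketched as follows. The scoring here is a plain squared-distance sum against sample points of the original mesh, standing in for the conjugate gradient step; best_start is a hypothetical name.

```python
def best_start(v1, v2, samples):
    """Pick the best of v1, v2, and the midpoint as the position for vnew."""
    mid = tuple(0.5 * (a + b) for a, b in zip(v1, v2))

    def cost(p):
        # Squared distance from p to sample points of the original mesh.
        return sum(sum((pc - sc) ** 2 for pc, sc in zip(p, s))
                   for s in samples)

    return min((v1, v2, mid), key=cost)
```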

Figure 4 shows a simple pseudocode overview of the progressive mesh (PM) algorithm.


start with initial K, V for mesh Mn
repeat {
	choose an edge collapse producing new connectivity K’
	optimize vnew position producing new vertices V’
	calculate E(K’,V’)
	if E(K’,V’) < E(K,V)
		set K=K’, V=V’
} until convergence


Figure 4. Simple pseudocode of the PM algorithm

This code produces a sequence of edge collapses that continues until no improvements to the energy function can be found.

You can extend the PM algorithm to consider both geometric error and errors in surface attributes like color and texture patterns [Hoppe 1996]. You do this by updating the energy function to include penalties for collapsing edges that straddle different surface colors or different textures. Now, a simplified model maintains its shape and any color and texture patterns on its surface.

Figure 3a–c shows a geographic mesh of North America with frost coverage and rainfall measured at each vertex position. A modified PM algorithm that respects frost and rainfall values was used to reduce the mesh to 10% of its original vertices (Figure 3d–f) [Walter 2001]. Although 90% of the original vertices and their corresponding weather values have been removed, the simplified frost and rainfall patterns (Figure 3e–f) remain faithful to the full-resolution versions (Figure 3b–c).

Simplification Using Quadric Errors

An alternative to collapsing an edge is to connect two vertices (a vertex collapse), then remove any triangles that reduce to lines or points. The quadric simplification approach uses this exact strategy [Garland 1997]. The algorithm chooses a set of vertex collapses** that reduce an initial mesh (Mn) into a sequence of meshes (Mn−1, Mn−2, . . . M1). A quadric error metric (QEM) selects which vertices to collapse.

Unlike an edge collapse, a vertex collapse can replace two vertices that are not connected (Figure 2). This means disconnected regions of the mesh can be joined together. Whether this is what you want depends on your models’ domain.

The main problem is to decide which pairs of vertices to collapse to produce simplified meshes that closely approximate the original mesh. To do so, perform these steps:

  1. Identify all vertex pairs that could be collapsed.
  2. Assign a cost to each of these collapses.
  3. Place the collapses on a heap that is ordered by minimum cost. This means the collapse with the lowest cost will always be at the front of the heap.

Once this is done, you can remove the vertex pair (v1, v2) with the minimum cost from the top of the heap and perform the collapse. Vertex pairs still on the heap that contain either v1 or v2 need to have their costs updated. This re-orders the heap, so the next best collapse is at the front. Continue retrieving, collapsing, and updating until your desired mesh size or error threshold is reached.
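The retrieve-collapse-update loop can be sketched with Python's heapq. A binary heap has no decrease-key operation, so a common workaround (an implementation assumption here, not something the paper prescribes) is lazy invalidation: tag each entry with per-vertex version counters and skip entries that went stale when an endpoint was collapsed.

```python
import heapq

def collapse_order(pairs, cost, target_collapses):
    """Return collapses in minimum-cost order, skipping stale entries."""
    version = {}                      # vertex -> number of collapses seen
    heap = []
    for a, b in pairs:
        heapq.heappush(heap, (cost(a, b), a, b, 0, 0))
    done = []
    while heap and len(done) < target_collapses:
        c, a, b, va, vb = heapq.heappop(heap)
        # Entry computed before one of its endpoints changed: stale.
        if version.get(a, 0) != va or version.get(b, 0) != vb:
            continue
        done.append((a, b))
        version[a] = version.get(a, 0) + 1
        version[b] = version.get(b, 0) + 1
        # A full implementation would recompute costs for surviving
        # pairs that touch a or b and push the updated entries here.
    return done
```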

A quadric error matrix Qi at each vertex vi defines the cost of a collapse. Qi is built by examining each triangle that includes vi. Once you have QEMs Q1 and Q2 for vertex pair (v1, v2), you can approximate an error matrix for the collapsed vertex vnew as Q = Q1 + Q2. The algorithm first tries to find a position for vnew that minimizes error. If the search fails, it looks for an optimal position along the edge v1 v2. As a fallback, the algorithm chooses the position with the smallest error from among v1, v2, and ½(v1+v2).
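A small sketch of the quadric machinery, assuming the standard homogeneous formulation: each triangle contributes the outer product of its plane coefficients [a, b, c, d] (with ax + by + cz + d = 0), and the optimal position solves a 4×4 system whose bottom row is constrained to [0, 0, 0, 1]. Function names are illustrative.

```python
import numpy as np

def plane_quadric(a, b, c, d):
    """Quadric for the plane ax + by + cz + d = 0 (outer product p p^T)."""
    p = np.array([a, b, c, d], dtype=float)
    return np.outer(p, p)

def quadric_error(Q, v):
    """Error v^T Q v of placing a vertex at v, in homogeneous coordinates."""
    h = np.append(v, 1.0)
    return float(h @ Q @ h)

def optimal_vnew(Q):
    """Minimize the error by solving a constrained linear system; return
    None when the matrix is singular (the algorithm then falls back to
    searching along the edge, and finally to v1, v2, or the midpoint)."""
    A = Q.copy()
    A[3] = [0.0, 0.0, 0.0, 1.0]
    try:
        v = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
    except np.linalg.LinAlgError:
        return None
    return v[:3]
```

For a vertex shared by the planes x = 0, y = 0, and z = 0, the summed quadric yields the origin as the optimal position, and the error grows with squared distance from those planes.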

Figure 5 shows the pseudocode for the quadric error algorithm.

**In his paper, Garland refers to the vertex collapse operation as a pair contraction.


compute Qi for all vi in Mn
select as valid pairs
	all edges (v1,v2)
	any pair (v1,v2) where ||v1 − v2|| < t for threshold value t
compute optimal vnew, cost of vnew for each valid pair
place all pairs on a heap ordered by minimum cost
repeat {
	remove (v1,v2) from the top of the heap
	contract to position vnew by setting v1=vnew
	replace all occurrences of v2 in heap with v1
	update costs of valid pairs involving v1
} until target vertex size or error threshold is achieved


Figure 5. Pseudocode for Garland’s QEM

As with PM, you can extend quadric error meshes to respect surface attributes-in particular, colors, texture coordinates, and normal directions [Garland 1998].

Simplification Envelopes

When you simplify a mesh, how can you make sure the shape stays true to the original model? Simplification envelopes offer a simple and elegant way to do this [Cohen 1996]. A mesh is bounded on both sides by two envelopes: offset surfaces guaranteed to be no more than a maximum distance ε from any point on the original model. You then simplify the mesh by removing vertices, which creates holes in the mesh. If you can fill a hole without falling outside of the simplification envelopes, the removal is allowed. Vertices are removed until no holes can be filled.

By varying ε, a sequence of simplified models can be constructed. Small ε produces a close match to the original model but with limited simplification. Large ε produces a significant reduction in vertex and polygon count but with a less accurate fit to the original model.

You build the simplification envelopes by constructing envelopes for each triangle in the mesh, then combining them. For an edge (v1, v2) and normals to the vertices (n1, n2), find the plane passing through the edge in the direction n1 at v1 and n2 at v2. Every triangle will have three such planes-one for each of its three edges. Extending the triangle’s edges a distance +ε and −ε along each plane produces two simplification envelopes-one above the triangle and one below it (see Figure 6). Envelopes for individual polygons are combined. Any intersections between polygon envelopes are corrected, producing a smooth surface bounding the object.
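The core acceptance test - does a candidate triangle stay inside the envelopes? - can be approximated with a much simpler stand-in: sample the original surface densely with points and require every sample point of the candidate to lie within ε of that point set. This is an assumption for illustration; true envelopes are the offset surfaces described above, not point clouds.

```python
import math

def within_envelope(candidate_points, surface_samples, eps):
    """Accept a candidate triangle only if all of its sample points lie
    within distance eps of the sampled original surface."""
    for p in candidate_points:
        if min(math.dist(p, s) for s in surface_samples) > eps:
            return False
    return True
```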

After you build the two envelopes, simplification can begin. You can use two different approaches to control simplification: a local method that uses vertex removal and a global method that uses overlap.

The local approach simplifies large models by minimizing preprocessing and considering only local changes to the mesh. Here, each vertex in the mesh is placed on a queue. A vertex is selected from the queue and removed from the mesh, producing a “hole.” The algorithm uses triangulation to try to fill the hole. Each new triangle is tested to ensure that it lies within the mesh’s envelopes. If the hole is filled successfully, the new triangles' vertices are saved in the queue. If the hole cannot be filled, the mesh is left unchanged. You continue until the queue is empty. At this point, no additional simplification is possible.

The global approach tries to remove as many vertices as possible at each step in the algorithm. To do this, the vertices of the original mesh are combined into all possible triangles that do not intersect the boundary envelopes. Count the number of mesh vertices each candidate triangle overlaps, place them on a heap in descending order of vertex overlap, then retrieve them one by one. For each candidate, remove all the triangles in the mesh that overlap the candidate, forming a large hole. As before, you use triangulation to try to fill the hole. Holes that you successfully fill are kept. This process continues until the heap is empty.

Figures 7 and 8 show the pseudocode for the local and global algorithms, respectively.


build simplification envelopes
assign all vi in Mn to a queue
repeat {
	retrieve vi from queue
	remove vi from mesh
	if resulting hole can be filled
		add new vj to queue
} until queue is empty


Figure 7. The local algorithm


build simplification envelopes
for all vi in Mn {
	build all triangles tj within the envelopes
	calculate vertex overlap for tj
	add tj to heap ordered by overlap
}
repeat {
	remove tj from heap
	remove triangles tj overlaps
	if resulting hole can be filled
		keep filled hole
} until heap is empty


Figure 8. The global algorithm

Parallel Mesh Simplification

Mesh simplification-especially methods that perform local modifications to a mesh-seems like an ideal candidate for parallelization. Various approaches have been suggested to divide simplification into independent subtasks that can be run in parallel. The potential advantages of parallel mesh simplification are two-fold.

  • By dividing the problem across several independent processes, the time needed to complete the simplification is the time needed by the slowest process, plus the time needed to subdivide the mesh and combine the processes’ results.
  • The amount of memory required by each process is reduced, because it operates on a subset of the original mesh. So, when a mesh does not fit in main memory, you can split it into subsets that do, eliminating the need to swap data to and from disk.

Both advantages can provide significant speed-ups, particularly for very large meshes.

Parallel simplification algorithms follow a common pattern for producing their final results (Figure 9):
  1. A parent processor partitions the original mesh into p sub-meshes Si.
  2. Each Si is sent to one of p child processors.
  3. Each child simplifies Si and returns it to the parent.
  4. The parent combines the results to produce a final, simplified mesh.
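The parent/child pattern can be sketched with a thread pool standing in for the p child processors. Both the partitioning (even slices of a vertex list) and the "simplification" (keep every other vertex) are toy stand-ins chosen so the pattern itself is visible.

```python
from concurrent.futures import ThreadPoolExecutor

def simplify_submesh(sub):
    # Toy "simplification": keep every other vertex of the sub-mesh.
    return sub[::2]

def parallel_simplify(mesh, p):
    # 1. The parent partitions the mesh into p sub-meshes.
    size = (len(mesh) + p - 1) // p
    subs = [mesh[i:i + size] for i in range(0, len(mesh), size)]
    # 2-3. Each child simplifies its sub-mesh and returns the result.
    with ThreadPoolExecutor(max_workers=p) as pool:
        results = list(pool.map(simplify_submesh, subs))
    # 4. The parent combines the results into the final mesh.
    return [v for sub in results for v in sub]
```

In a real system the children would be separate processes or machines, and the combine step would also stitch sub-mesh boundaries back together.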

Parallel Simplification

A good example of the split-simplify-combine technique is a method that uses vertex decimation to simplify a mesh. Vertex decimation is directly compatible with both the PM and QEM simplification algorithms. Other local approaches (for example, the local simplification envelope technique) should also work with only minor modifications.

One recent algorithm that follows the standard pattern of partitioning the original mesh into sub-meshes [Rasch 2000] distributes each sub-mesh to a child processor for parallel simplification. A child processor can create its own “worker” processes. These workers will themselves simplify in parallel, further improving performance.

The parent and child processes are usually run on separate machines. This means the processes have separate memory spaces. Rather than passing mesh information back and forth, you can use an external library (for example, the Adsmith library) to simulate a shared-memory environment. The library distributes the mesh data to all the processes, but presents it to each process as a single, large memory block.

The basic algorithm still leaves certain issues open. For example, if multiple workers are simplifying a sub-mesh in parallel, it’s important to make sure that two workers don't suggest different changes to the same area of the mesh. You could do this by asking each worker to lock the triangles that use the vertices it is trying to simplify. Similarly, simplifying boundary triangles can create cracks in the mesh if separate child processors make incompatible changes to edges shared across the boundary. Again, you can manage this by locking boundary triangles or by “re-connecting” mismatched boundaries in the master processor-for example, using a zippering algorithm to “sew” the edges back together [Turk 1994].
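The triangle-locking idea can be sketched with non-blocking lock acquisition: a worker must hold the locks of every triangle its collapse touches, and it backs off (to retry later) if any lock is busy. Acquiring in a consistent order to avoid deadlock is an implementation assumption on my part, not a detail from the paper.

```python
import threading

# One lock per triangle (six triangles in this toy mesh).
triangle_locks = {t: threading.Lock() for t in range(6)}

def try_collapse(touched_triangles):
    """Lock every affected triangle before changing it; give up if any
    lock is held by another worker."""
    acquired = []
    for t in sorted(touched_triangles):   # consistent order avoids deadlock
        if triangle_locks[t].acquire(blocking=False):
            acquired.append(t)
        else:
            for a in acquired:            # back off on contention
                triangle_locks[a].release()
            return False
    # ... the collapse itself would be performed here ...
    for a in acquired:
        triangle_locks[a].release()
    return True
```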

This algorithm was evaluated on a variety of models ranging from 15,000 to 922,000 polygons. Small models show linear speed-up for up to four processors. For more than four processors, proper load balancing becomes difficult. Complex models that do not fit in the main memory show super-linear speed-ups, because most of the time spent in the non-parallel case involves swapping data to and from disk. Given an estimate of how long a single processor would need if it had enough memory to store the entire model, you see a 26× improvement for a 36-processor environment.

Distributed Simplification

Partition and reconstruction approaches that use different communication strategies also exist. One example uses a portable message-passing interface (Local Area Multicomputer-Message Passing Interface, or LAM-MPI) to communicate between processes, rather than assuming a large shared memory space [Dehne 2000]. A master process partitions the original mesh into sub-meshes of roughly equal size, then sends them to child processors for simplification. A child processor may create additional children to further distribute the simplification tasks. Results are sent back to the master to be combined into a final result.

Simplification times with 1, 2, 4, 8, and 16 processors have been measured, together with the time to transfer data between the master and the children and the difference in time ΔS between the slowest and fastest child processes. Simplification times show a near linear speed-up as the number of processors increases. As with the shared memory algorithm, speed-ups also improve as the size of the model increases. Transfer times, even for 16 processors, do not exceed 5% of the total execution time. However, ΔS can be as high as 50% for certain tests. This shows a significant load imbalance. In spite of this, partitioning the original mesh to produce perfect load balancing is extremely difficult, and the time needed to do it may be more than the time saved during the improved simplification.

A recent re-evaluation of the message passing algorithm confirms the speed-up in execution times. Geometric error and execution times for 1, 8, and 16 processors were recorded and compared to the QEM algorithm [Tang 2007, Yongquan 2009]. Parallel results are similar to QEM, although in all cases QEM produces smaller absolute errors than the binary space-partitioning simplification used by the message passing algorithm. Execution times show a speed-up of approximately 7× for the 8-processor environment and approximately 11× for 16 processors, reaffirming the effectiveness of a parallel implementation.

Parallel Hierarchical Level-of-Detail

Another way to simplify a model before rendering is by creating hierarchical level-of-detail trees (HLODs) in parallel. HLODs are like any other tree structure. They have a root, internal nodes, and leaf nodes. The original, full-detail model is stored at the root. You simplify by performing the following steps:

  1. Simplify the original model stored at the root.
  2. At the same time, subdivide the original model into multiple sub-model parts.
  3. Create a child node to hold each sub-model.
  4. Ask the child nodes to simplify their sub-model.

Simplification in the child nodes is more aggressive than the simplification at the root node. This means the child nodes will be a more simplified version of the model.

You perform exactly the same "subdivide and simplify" strategy on internal nodes to further simplify the model. When finished, you have an HLOD tree [Cozzi 2008]. Nodes farther down the tree are more simplified. If you take all the leaf nodes and combine their sub-models, you recreate the original model at the highest level of simplification.
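The subdivide-and-simplify recursion can be sketched as follows. Here "simplify" just keeps a fraction of the vertex list and "subdivide" splits it in half - both stand-ins so the tree shape is visible - and, following the article's description, each level down simplifies more aggressively than its parent.

```python
def build_hlod(sub_model, depth, max_depth, keep=0.5):
    """Store a simplified copy at this node, then subdivide and recurse.
    Children simplify more aggressively than their parent."""
    node = {
        "model": sub_model[:max(1, int(len(sub_model) * keep))],
        "children": [],
    }
    if depth < max_depth and len(sub_model) > 1:
        mid = len(sub_model) // 2
        for part in (sub_model[:mid], sub_model[mid:]):
            node["children"].append(
                build_hlod(part, depth + 1, max_depth, keep * 0.5))
    return node
```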

Vertex clustering is used to perform simplification. Vertex clustering subdivides a mesh into cells. You choose a single vertex vnew to replace all the vertices {v1, ..., vm} in each cell. Triangles attached to {v1, ..., vm} are updated to use vnew. Triangles that do not degenerate into lines or points are kept. This strategy works well, since you don't need vertices that are outside the sub-model you are simplifying.
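Vertex clustering is easy to sketch. The 2D version below bins vertices into square grid cells, replaces each cluster with its centroid (one common choice for vnew; other representatives are possible), remaps the triangles, and drops the degenerate ones.

```python
def cluster_vertices(vertices, triangles, cell):
    """Cluster 2D vertices into grid cells of size `cell` and simplify."""
    cells = {}
    for i, (x, y) in enumerate(vertices):
        key = (int(x // cell), int(y // cell))
        cells.setdefault(key, []).append(i)
    new_verts, remap = [], {}
    for idxs in cells.values():
        # Replace every vertex in the cell with the cell centroid.
        cx = sum(vertices[i][0] for i in idxs) / len(idxs)
        cy = sum(vertices[i][1] for i in idxs) / len(idxs)
        for i in idxs:
            remap[i] = len(new_verts)
        new_verts.append((cx, cy))
    new_tris = []
    for tri in triangles:
        t = tuple(remap[i] for i in tri)
        if len(set(t)) == 3:          # keep only non-degenerate triangles
            new_tris.append(t)
    return new_verts, new_tris
```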

Building HLODs offers not just one, but two opportunities to parallelize. First, the simplification in different nodes can be run in parallel. Second, the recursive steps in the HLOD can also be run in parallel. That is, whenever a subtree is created, you can assign it to a separate thread for further processing.

You can use a hybrid approach that combines both of these advantages (Figure 10) by subdividing the model into eight sub-models (an octree), then processing pairs of subtrees in parallel. The root and each subtree create a separate thread to perform simplification so that subdividing and simplifying can run in parallel. Although you might expect more threads to produce even better results, runtime examination shows that the cost of thread management begins to outweigh the savings from parallel execution, particularly when all the cores in the CPU are busy with other tasks.

An example scenario simplified a city model with 5,646,041 triangles into HLODs of height six. Using separate threads for either simplification or subtree processing produced improvements of around 15-20%. Combining these steps using a hybrid parallel approach sped up the algorithm, generating a savings of approximately 40%. By increasing the number of cores in the CPU and optimizing thread management, you should see even larger improvements.

Parallel View-dependent Mesh Refinement

Parallel methods are not used only for simplifying a mesh. For instance, it is possible to refine a PM in parallel based on where a viewer is looking. The goal is to dynamically reduce detail in areas of the mesh that are small or hidden and maintain detail in areas that are in sight. You can do this by building a “vertex hierarchy” of all possible mesh refinements for Mn. You then use the view frustum, surface orientation, and screen-space size (in pixels) to choose an appropriate level of detail for different regions in the mesh [Hu 2009].

The vertex hierarchy contains the positions of the n vertices in the original mesh, plus their new positions as they are moved during edge collapses. A “vertex frontier” within the hierarchy contains the vertices that are currently being used to render the mesh. At each timestep, you process the frontier's vertices in parallel, searching for locations where new edge collapses or vertex splits are needed because of changes in where the viewer is located and looking. Updates are sent to a triangle buffer that keeps a list of triangles to render.

You can perform load balancing by measuring how long it took to complete each algorithm step during previous passes. Given a fixed time budget (for example, a minimum frames-per-second rendering requirement), you complete as many steps as possible. Partial steps are held and finished during the next timestep.
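The fixed-time-budget idea reduces to a simple loop: complete as many steps as fit in the budget, and carry the rest into the next frame. The step names and duration table below are hypothetical; in the real system, durations come from timing previous passes.

```python
def run_budgeted(steps, durations, budget):
    """Complete as many steps as fit in the budget; return the rest."""
    spent, done = 0.0, []
    while steps and spent + durations[steps[0]] <= budget:
        step = steps.pop(0)
        spent += durations[step]
        done.append(step)
    return done, steps
```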

Results show that very large models can be simplified and rendered at interactive speeds, even for an error threshold of at most 1 pixel. For example, a 10,000,000-triangle model is simplified to approximately 390,000 triangles and rendered in 22 ms. As the number of cores increases, so too should the performance of the algorithm, because more vertices on the vertex frontier can be updated during each processing step.

Code Resources

Numerous source code implementations exist for each of the three simplification algorithms described in this article.


Polygon meshes have long been a standard method for representing the shape and surface properties of three-dimensional models. You can use simplification algorithms to set the number of polygons in your models to just the right number to both maintain a desired level of detail and to minimize the number of polygons you need to render.

Numerous simplification algorithms have been proposed that apply basic operations like edge and vertex collapses. Each algorithm reduces the size of a mesh using cost and objective functions to search for a simplified result with a specific polygon count or a guaranteed error tolerance.

More recent work has described ways to parallelize the simplification algorithms. The parallel algorithms community uses a partition, simplify, and recombine approach to process subsets of the original mesh simultaneously. Computer graphics researchers have suggested other uses of parallel processing-for example, to dynamically update a simplified mesh based on a changing view location. Results from both areas show that parallel implementations can provide dramatic improvements, both in execution time and in the number of resources needed to complete the simplification task.

About the Author

Christopher G. Healey received a Bachelor’s degree in Math from the University of Waterloo in Waterloo, Canada, and an M.Sc. and Ph.D. from the University of British Columbia in Vancouver, Canada. Following a postdoctoral fellowship at the University of California at Berkeley, he joined the Department of Computer Science at North Carolina State University, where he is currently an Associate Professor. His research interests include visualization, graphics, visual perception, and areas of applied mathematics, databases, artificial intelligence, and aesthetics related to visual analysis and data management.


Cohen, Varshney, Manocha, Turk, Weber, Agarwal, Brooks, and Wright (1996). Simplification envelopes. In: ACM SIGGRAPH ’96 Conference Proceedings. New Orleans, Louisiana; pp. 119–28.

Cozzi, P. and Loo, B. T. (2008). Parallel Processing City Models for Real-Time Visualization. http://www.seas.upenn.edu/~pcozzi/PreprocessingCityModels.pdf.

Dehne, Langis, and Roth (2000). Mesh simplification in parallel. In: Proceedings 4th International Conference on Algorithms and Architectures for Parallel Processing. Hong Kong; pp. 281–90.

Garland and Heckbert (1997). Surface simplification using quadric error metrics. In: ACM SIGGRAPH ’97 Conference Proceedings. Los Angeles, Cal.; pp. 209–16.

Garland and Heckbert (1998). Simplifying surfaces with color and texture using quadric error metrics. In: Proceedings IEEE Visualization ’98. Research Triangle Park, NC; pp. 263–9.

Hoppe, DeRose, Duchamp, McDonald, and Stuetzle (1993). Mesh optimization. In: ACM SIGGRAPH ’93 Conference Proceedings, Anaheim, Cal.; pp. 19–26.

Hoppe (1996). Progressive meshes. In: ACM SIGGRAPH ’96 Conference Proceedings, New Orleans, Louisiana; pp. 99–108.

Hu, Sander, and Hoppe (2009). Parallel view-dependent refinement of progressive meshes. In: Proceedings of the 2009 Symposium on Interactive 3D Graphics. Boston, Mass.; pp. 169–76.

Rasch and Schmidt (2000). Parallel mesh simplification. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Algorithms (PDPTA 2000). Las Vegas, Nev.

Tang, Jia, and Li (2007). Simplification algorithm for large polygon model in distributed environment. In: Lecture Notes in Computer Science: Advanced Intelligent Computing Theories and Applications with Aspects of Theoretical and Methodological Issues. Heidelberg, Germany: Springer-Verlag; pp. 960–9.

Turk and Levoy (1994). Zippered polygon meshes from range images. In: ACM SIGGRAPH ’94 Conference Proceedings. Orlando, Fla.; pp. 311–8.

Walter and Healey (2001). Attribute preserving dataset simplification. In: Proceedings IEEE Visualization 2001, San Diego, Cal.; pp. 113–20.

Yongquan, Nan, Pengdong, Chu, Jintao, and Rui (2009). A parallel memory efficient framework for out-of-core mesh simplification. In: Proceedings 11th International Conference on High Performance Computing and Communications (HPCC-09). Seoul, Korea; pp. 666–71.
