Simulating Cloth for 3D Games
https://software.intel.com/en-us/articles/simulating-cloth-for-3d-games
<p><strong>by Dean Macri</strong></p>
<hr /><h3>1. Introduction</h3>
<p>We all live in the real world where things behave according to the laws of physics that we learned about in high school or college. Because of this, we're all expert critics about what looks right, or more often wrong, in many 3D games. We complain when a character's feet slide across the ground or when we can pick out the repeating pattern in the animation of a flag blowing in the wind. Adding realistic physical simulation to a game to improve these effects can be a giant effort, and the rewards for the time invested haven't yet proven to be worthwhile.<br /><br />
Often, though, it's possible to incrementally add elements to a game that provide increased realism without extremely high risk. Improving the animation of simple cloth objects like flags in the wind and billowing sails is one area where realism increases without the 18-month development risk of introducing a full-fledged physics engine. Not that I don't want to see more games with all-out physics happening, but I think there are some simple things that can be done with cloth objects in the meantime to improve realism and save modelers time.<br /><br />
At the Game Developers Conference in March 2000, I presented my implementation of two techniques for simulating cloth. Someone who attended the class pointed me to another, more recent, technique. In this paper I'll recap what I presented at the conference and include information about the newer technique. Hopefully you'll be able to take the ideas presented here and add some level of support for cloth simulation to your title.<br /><br /><strong>2. Background</strong><br /><br />
Various researchers have come up with different techniques for simulating cloth and other deformable surfaces. The approach used by all three methods presented here, and by far the most common, is the mass-spring system. Simply put, a continuous cloth surface is discretized into a finite number of particles, much as a sphere is divided into a group of vertices and triangles for drawing with 3D hardware. The particles are then connected in an orderly fashion with springs. Each particle is connected with springs to its four neighbors along both the horizontal and vertical axes. These springs are called "stretch" springs because they prevent the cloth from stretching too much. Additional springs are added from each particle to its four neighbors along the diagonal directions. These "shear" springs resist any shearing movement of the cloth. Finally, each particle is connected with springs to the four neighbors along both the horizontal and vertical axes, skipping over the closest particles. These springs are called "bend" springs and prevent the cloth from folding in on itself too easily.<br /><br />
<img border="0" height="264" src="/sites/default/files/m/d/4/1/d/8/17981_fig1.gif" width="315" /><br /><em>Figure 1 - Stretch (blue), Shear (green), and Bend (red) springs</em><br /><br /><strong>Figure 1</strong> shows a representation of a mass-spring system using the previously mentioned stretch, shear, and bend springs. When rendering this surface, the masses and springs themselves are not typically drawn but are used to generate triangle vertices. The nature of the cloth simulation problem involves solving for the positions of the particles at each frame of a simulation. The positions are affected by the springs keeping the particles together as well as by external forces acting on the particles like gravity, wind, or forces due to collisions with other objects or with the cloth itself.<br /><br />
In the next section we'll look at the problem that we're trying to solve to realistically animate a cloth patch. Much of this will be very familiar to anyone who has already experimented with cloth simulation. Feel free to skip to Section 4 if you just want details on the various implementations I tried.</p>
<hr /><h3>3. The Cloth Problem</h3>
<p>Like any other physical simulation problem, we ultimately want to find new positions and velocities for objects (cloth particles in our case) using Newton's classic law: <img border="0" height="22" src="/sites/default/files/m/d/4/1/d/8/17996_image80.gif" width="53" /> or more directly <img border="0" height="25" src="/sites/default/files/m/d/4/1/d/8/17997_image81.gif" width="40" />. This says that we can find the acceleration (<img border="0" height="18" src="/sites/default/files/m/d/4/1/d/8/17998_image82.gif" width="13" />) on a particle by taking the total force (<img border="0" height="21" src="/sites/default/files/m/d/4/1/d/8/17999_image83.gif" width="17" />) acting on the particle and dividing by the mass (<em>m</em>) of the particle. Using Newton's laws of motion, we can solve the differential equations <img border="0" height="24" src="/sites/default/files/m/d/4/1/d/8/18000_image84.gif" width="42" />and <img border="0" height="25" src="/sites/default/files/m/d/4/1/d/8/18001_image85.gif" width="41" />to find the velocity (<img border="0" height="18" src="/sites/default/files/m/d/4/1/d/8/18002_image86.gif" width="13" />) and position (<img border="0" height="21" src="/sites/default/files/m/d/4/1/d/8/18003_image87.gif" width="16" />) of the particle. For simple forces, it may be possible to analytically solve these equations, but realistically, we'll need to do numerical integration of the acceleration to find new velocities and integrate those to find the new positions. In Sections 3.1 through 3.3 we'll take a high level look at explicit integration, implicit integration, and adding post-integration deformation constraints for solving the equations of motion for cloth particles. Many excellent in-depth articles have been written about various aspects of physics simulation including cloth simulation. I'd highly recommend the articles by Jeff Lander<sup>i, ii</sup>, and Chris Hecker<sup>iii</sup> if you haven't already read them.<br /><br /><br /><strong>3.1. 
Explicit Integration</strong><br /><br />
One of the simplest ways to numerically integrate the differential equations of motion is to use the tried-and-true method known as Euler's method. For a given initial position, <img border="0" height="24" src="/sites/default/files/m/d/4/1/d/8/18004_image88.gif" width="20" />, and velocity, <img border="0" height="24" src="/sites/default/files/m/8/0/c/18005_image89.gif" width="16" />, at time <img border="0" height="24" src="/sites/default/files/m/d/4/1/d/8/18006_image90.gif" width="14" />and a time step, <img border="0" height="18" src="/sites/default/files/m/d/4/1/d/8/18007_image91.gif" width="20" />, we can calculate a new position, <img border="0" height="22" src="/sites/default/files/m/d/4/1/d/8/18008_image92.gif" width="18" />, and velocity, <img border="0" height="22" src="/sites/default/files/m/d/4/1/d/8/18009_image93.gif" width="14" />, using a Taylor series expansion of the above differential equations and then dropping some terms (which may introduce error, <img border="0" height="14" src="/sites/default/files/m/d/4/1/d/8/18010_image94.gif" width="12" />):<br /><br /><img border="0" height="26" src="/sites/default/files/m/d/4/1/d/8/18011_image95.gif" width="189" /> (1.1)<br /><br /><img border="0" height="28" src="/sites/default/files/m/d/4/1/d/8/18012_image96.gif" width="200" /> (1.2)<br /><br />
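As a concrete sketch of this step, here is what one Euler update for a single particle might look like in code (a minimal illustration; the <em>Vector3</em> type and names are mine, and the position update uses the freshly updated velocity, matching the description in Section 4.2.2.1):

```cpp
#include <cassert>

struct Vector3 { float x, y, z; };

// One explicit Euler step for a single particle. We store 1/m (invMass)
// so that pinned particles can simply use an inverse mass of zero.
// First v += (F/m)*dt, then x += v*dt (equations 1.1 and 1.2).
void EulerStep(Vector3& pos, Vector3& vel, const Vector3& force,
               float invMass, float dt)
{
    vel.x += force.x * invMass * dt;
    vel.y += force.y * invMass * dt;
    vel.z += force.z * invMass * dt;
    pos.x += vel.x * dt;
    pos.y += vel.y * dt;
    pos.z += vel.z * dt;
}
```

With gravity as the only force and a 0.02-second step, a unit-mass particle starting at rest picks up a downward velocity of 9.8 × 0.02 = 0.196 units per second in a single step.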
Unfortunately, Euler's method takes no notice of quickly changing derivatives and so does not work very well for the stiff differential equations that result from the strong springs connecting cloth particles. Provot<sup>iv</sup> introduced one method to overcome this problem and Desbrun<sup>v</sup> later expanded on this. We'll examine these in more depth in Section 3.3. Until then, let's look at implicit integration.<br /><br /><strong>3.2. Implicit Integration</strong><br /><br />
Given the problem with Euler's method for stiff differential equations and knowing that the problem still exists for other similar "explicit" integration methods, some researchers have worked with what are known as "implicit" integration methods. Baraff and Witkin<sup>vi</sup> presented a thorough examination of using implicit integration methods for the cloth problem. Implicit integration sets up a system of equations and then solves for a solution such that the derivatives are consistent both at the beginning and the end of the time step. In essence, rather than looking at the acceleration at the beginning of the time step, it finds an acceleration at the end of the time step that would point back to the initial position and velocity.<br /><br />
The formulation I'm using here is from the Baraff and Witkin paper except I've used <img border="0" height="21" src="/sites/default/files/m/d/4/1/d/8/18003_image87.gif" width="16" /> to represent the position of the particles rather than <img border="0" height="18" src="/sites/default/files/m/d/4/1/d/8/18013_image97.gif" width="13" />. The system of equations is<br /><br /><img border="0" height="53" src="/sites/default/files/m/d/4/1/d/8/18014_image98.gif" width="213" /> (1.3)<br /><br />
Here M<sup>-1</sup> is the inverse of a matrix with the mass of the individual particles along the diagonal. If all the particles have the same mass, we can just divide by the scalar mass, <em>m</em>. As was done in the explicit case, we use a Taylor series expansion of the differential equations to form the approximating discrete system:<br /><br /><img border="0" height="50" src="/sites/default/files/m/d/4/1/d/8/18015_image99.gif" width="238" /> (1.4)<br /><br />
The top row of this system is trivial to find once we've found the bottom row, so by plugging the top row into the bottom row, we get the linear system:<br /><br /><img border="0" height="28" src="/sites/default/files/m/d/4/1/d/8/17983_image100.gif" width="257" /> (1.6)<br /><br /><strong>3.3. Deformation Constraints</strong><br /><br />
When using either explicit integration or implicit integration to determine new positions and velocities for the cloth particles, it is possible to further improve upon the solution using deformation constraints after the integration process. Provot proposed this method in his paper and Desbrun further combined this with a partial implicit integration technique to achieve good performance with large time steps.<br /><br />
The technique is very simple and easy to implement. Once an integration of positions and velocities has been done, a correction is applied iteratively. The correction is formed by assuming that the particles moved in the correct direction but that they may have moved too far. Particles are then pulled together along the correct direction until they are within the limits of the deformation constraints. The process can be applied multiple times until convergence is reached within some tolerance or there is no time left for the process to be able to maintain a given frame rate. Using deformation constraints can take a normally unstable system and stabilize it quite well. I've found that using a fixed number of iterations typically works well.<br /><br />
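One fixup pass for a single spring might look like the following sketch (names are illustrative, not code from the sample; corrections are weighted by inverse mass so that pinned particles, with zero inverse mass, never move):

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { float x, y, z; };

// One Provot-style deformation-constraint pass for a single spring.
// If the spring has stretched past maxStretch times its rest length,
// pull the particles back together along the spring direction, weighting
// each correction by inverse mass so pinned particles (invMass == 0)
// never move.
void FixupSpring(Vector3& p0, Vector3& p1, float invMass0, float invMass1,
                 float restLength, float maxStretch)
{
    Vector3 d = { p1.x - p0.x, p1.y - p0.y, p1.z - p0.z };
    float len = std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    float maxLen = restLength * maxStretch;
    float wSum = invMass0 + invMass1;
    if (len <= maxLen || wSum == 0.0f || len == 0.0f)
        return;                              // within limits or fully pinned
    float s = (len - maxLen) / (len * wSum); // fraction of d to remove
    p0.x += d.x * s * invMass0;  p0.y += d.y * s * invMass0;  p0.z += d.z * s * invMass0;
    p1.x -= d.x * s * invMass1;  p1.y -= d.y * s * invMass1;  p1.z -= d.z * s * invMass1;
}
```

Running a pass like this over every spring, for a fixed number of iterations, is the correction loop described above.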
Now that we've taken a brief look at integration techniques and how to improve upon the results, let's have a look at the implementations I did. The source code for my implementations can be downloaded and used in your application or just examined for ideas.</p>
<p><a href="/en-us/articles/code-samples-license-2/" rel="nofollow">Click here to download source code</a> (366kb zip)</p>
<hr /><h3>4. Implementation</h3>
<p>I tried implementing a simple cloth patch using three techniques: explicit integration with deformation constraints, implicit integration, and semi-implicit integration with deformation constraints. The sample application depicted in <strong>Figure 2</strong> shows a simple cloth patch that can be suspended by any or all of its four corners.<br /><br /><img border="0" height="251" src="/sites/default/files/m/d/4/1/d/8/17992_image64.gif" width="335" /><br /><em>Figure 2 - Cloth Sample Application</em><br /><br />
Gravity pulls downward on the particles and stretch, shear, and bend springs keep the particles together as a cloth patch. A wireframe version of the cloth is shown in <strong>Figure 3</strong>. Two triangles are produced for every four particles forming a grid square.<br /><br /><img border="0" height="320" src="/sites/default/files/m/d/4/1/d/8/17993_image65.gif" width="427" /><br /><em>Figure 3 - Wireframe view of cloth patch</em><br /><br />
I'll discuss the implementation specifics here with a simple analysis of the results in Section 5.<br /><a name="basics" id="basics"></a><br /><br /><strong>4.1. Basics</strong><br /><br />
For the three implementations, I shared a lot of code. Everything is written in C++ with a rough attempt at modularizing the cloth specific code into a set of physics/cloth related classes. I used a 3D application wizard to create the framework and then added the cloth specific stuff. Information about the 3D AppWizard, for those interested, can be found in the article <em>Creating A Custom Appwizard for 3D Development</em>.<br /><br />
When wading through the source code, you'll find that there are quite a few files. Most of the files that pertain to the cloth simulation are in the files that begin with "Physics_". In addition to these I also created a "ClothObject" class with corresponding filenames which is instantiated and manipulated from the "ClothSample" class.<br /><br />
I experimented with performance using both single-precision and double-precision floating point numbers. To easily change this, I created a <em>typedef</em> in Physics.h for a "Physics_t" type that is used anywhere you would normally use "float" or "double". I found (as expected) that performance slowed when using double-precision numbers, and I didn't notice any improved stability. Your mileage may vary, especially if you add support for collision detection and response.</p>
<hr /><h3>4.2. Mass-Spring System</h3>
<p>The mass-spring system is implemented as a particle system. This basically means that I don't do any handling of torque or moments of inertia. Within the <em>Physics_ParticleSystem</em> class, I allocate necessary information for the various integration schemes and I allocate large vectors for holding the positions, velocities, forces, etc. of the individual particles. I maintain a linked list of forces that act on the particles. With this implementation there's no way of dynamically changing the number of particles in the system (although forces can be added and removed). For the implicit integration scheme, I allocate some sparse, symmetric matrices to hold the derivatives of the forces and temporary results. For the semi-implicit scheme, I allocate some dense, symmetric matrices to hold the Hessian matrix and inverse matrix, W, for filtering the linear component of the forces.<br /><br />
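Before looking at the steps individually, the overall per-frame skeleton can be sketched in pseudo-code like this (step names are illustrative, not verbatim from the sample):

```
Update(dt):
    clear the internal and external force accumulators
    for each force (gravity, each spring):
        accumulate its force (and, for the implicit schemes, its derivatives)
    integrate -> tentative new velocities and positions
    apply deformation constraints (a fixed number of fixup iterations)
    commit positions and velocities; rebuild vertex normals for rendering
```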
Regardless of which integration scheme is used, we'll use the same overall update algorithm. Pseudo-code for updating the cloth is shown in <strong>Figure 4</strong>. This routine, <em>Update</em>, is called once per frame and in my implementation uses a fixed time step. Ideally, you'll want to use a variable time step. Remember that doing so can have an impact on performance, especially in the semi-implicit implementation of Desbrun's algorithm, because a matrix inversion would be done at each frame where the step size changed. Clearing the accumulators is a no-brainer so I'll just dive into the other three steps of the algorithm in further detail.<br /><br /><strong>4.2.1. Calculating forces and derivatives</strong><br /><br />
My implementation has only two types of forces, a spring force and a gravity force. Both are derived from a <em>Physics_Force</em> base class. During the update routine of the particle system, each force is enumerated and told to apply itself to the force and force derivative accumulators. Force derivatives are only needed when using the implicit integration scheme (actually, they're needed for the semi-implicit integration scheme too, but are handled differently).<br /><br />
The gravity force is simple and just adds a constant (the direction and magnitude of gravity: 0,-9.8,0 in my case) to the "external" force accumulator. I maintain separate "internal" and "external" accumulators to support the split integration scheme proposed by Desbrun. The downside to this is that I would really need a separate spring force for handling user supplied force to the cloth because the spring force as implemented assumes that it is acting internally to the cloth only.<br /><br />
The spring force is a simple, linear spring with damping. I derived the force from a condition function as was done in the Baraff/Witkin paper. Unlike the Baraff/Witkin paper's use of separate condition functions for stretching, shearing and bending on a per triangle basis, I use just one condition function for a linear spring connecting two particles. The condition function I used was <img border="0" height="26" src="/sites/default/files/m/d/4/1/d/8/17988_image108.gif" width="146" />where <em>p</em><sub>0</sub> and <em>p</em><sub>1</sub> are the two particles affected by the spring and <em>dist</em> is the rest distance of the spring. Forces were calculated as derivatives of the energy function formed by the condition function: <img border="0" height="29" src="/sites/default/files/m/d/4/1/d/8/17989_image109.gif" width="116" />.<br />
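As a sketch, the force this condition function yields for one spring, with a damping term proportional to the time derivative of the condition function in the Baraff/Witkin style, might look like the following (names and the exact damping form are illustrative assumptions, not code from the sample):

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { float x, y, z; };

// Linear spring with damping, derived from the condition function
// C(p0, p1) = |p0 - p1| - dist. The elastic part is -ks * C * dC/dp0,
// the derivative of the energy E = ks/2 * C^2; the damping part is
// -kd * (dC/dt) * dC/dp0, with kd a small multiple of ks.
// Returns the force on p0; the force on p1 is its negation.
Vector3 SpringForce(const Vector3& p0, const Vector3& p1,
                    const Vector3& v0, const Vector3& v1,
                    float dist, float ks, float kd)
{
    Vector3 d = { p0.x - p1.x, p0.y - p1.y, p0.z - p1.z };
    float len = std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    Vector3 dir = { d.x/len, d.y/len, d.z/len };          // dC/dp0
    float C = len - dist;                                  // stretch amount
    float Cdot = (v0.x - v1.x)*dir.x + (v0.y - v1.y)*dir.y
               + (v0.z - v1.z)*dir.z;                      // dC/dt
    float mag = -ks * C - kd * Cdot;
    return Vector3{ mag * dir.x, mag * dir.y, mag * dir.z };
}
```

A spring stretched to twice its rest length pulls its endpoints back together; at exactly the rest length both C and the elastic force are zero.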
The Desbrun paper uses the time step and spring constant to apply damping but I apply damping as derived by the Baraff/Witkin paper. The damping constant I use is a small multiple of the spring constant.<br /><br /><strong>4.2.2. Integrating forces and updating positions and velocities</strong><br /><br />
By far, the trickiest code to understand is that for integrating the forces to determine new velocities and positions for the cloth particles. We'll start with the simplest case, the explicit integration scheme with deformation constraints.</p>
<p><strong>4.2.2.1. Explicit integration with deformation constraints</strong></p>
<p>Using explicit Euler integration is a straightforward application of equations (1.1) and (1.2). The acceleration is found by dividing the force for each particle by the particle's mass (actually, we store 1/mass and then do a multiplication). Then, the acceleration is multiplied by the time step to update the velocities. The new velocities are multiplied by the time step to update the positions. The new positions are actually stored in a temporary location so that the deformation constraints can be applied. To apply the deformation constraints, each spring force is asked to "fixup" its associated particles. Basically, if the length of the spring has exceeded a maximum value (determined as a multiple of the rest length of the spring), then the particles are pulled closer together. Finally, we take the fixed-up temporary positions, subtract the starting positions, and divide by the time step to get the actual velocities needed to achieve the end state. Then we copy the temporary positions to the actual positions vector and we're ready to render.</p>
<p><strong>4.2.2.2. Implicit integration</strong></p>
<p>At the other end of the spectrum in terms of difficulty is doing full implicit integration using equation (1.6). For this, we form a large, linear system of equations and then use an iterative solution method called the pre-conditioned conjugate gradient method. The Baraff/Witkin paper goes into details on this and explains the use of a filtering process for constraining particles. In my implementation, I inlined the filtering function everywhere it was used. I won't go into the ugly details of the conjugate gradient method, but I will explain briefly some of the tricks I used to improve performance. For one, the large sparse matrices that get formed are all symmetric, so I cut storage requirements almost in half by only storing the upper triangle of the matrices. In doing so, I had to think carefully about the matrix-vector multiply routines. Secondly, in cases where we would actually be using a matrix but one that only had non-zero elements along the diagonal, I just stored the matrix as a vector. I added some specialized routines to the <em>Physics_LargeVector</em> class for "inverting" the vector which just replaced each element with one over the element. Finally, I didn't do any dynamic allocation of the temporary sparse matrices because the overhead would have been too severe. So I ended up keeping some temporary matrices as private members of the <em>Physics_ParticleSystem</em> class.</p>
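The upper-triangle trick mentioned above hinges on the matrix-vector multiply visiting each stored element twice. A dense illustration of the idea (the sample's actual matrices are sparse, and the names here are illustrative):

```cpp
#include <cassert>
#include <vector>

// A symmetric matrix stored as its upper triangle only, row-major.
// In y = A*x, each stored element a(i,j) with i <= j contributes to two
// rows of the result: y[i] += a*x[j], and (when off-diagonal) also
// y[j] += a*x[i], standing in for the unstored lower-triangle mirror.
struct SymmetricUpper {
    int n;
    std::vector<float> a;   // n*(n+1)/2 elements, rows of the upper triangle

    std::vector<float> Multiply(const std::vector<float>& x) const {
        std::vector<float> y(n, 0.0f);
        int k = 0;
        for (int i = 0; i < n; ++i)
            for (int j = i; j < n; ++j, ++k) {
                y[i] += a[k] * x[j];
                if (i != j)
                    y[j] += a[k] * x[i];   // mirrored lower-triangle term
            }
        return y;
    }
};
```

For example, SymmetricUpper{2, {1, 2, 3}} represents the 2x2 matrix with rows (1, 2) and (2, 3) in roughly half the storage.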
<p><strong>4.2.2.3. Semi-implicit integration with deformation constraints</strong></p>
<p>The last integration method I tried was a semi-implicit method as described by Desbrun. Desbrun divided the internal forces acting on the cloth into linear components and non-linear components. The linear components could then be easily integrated using implicit integration without having to solve a linear system. Instead, a large constant matrix is inverted once and then just a matrix multiply is required to do the integration. The non-linear components are approximated as torque changes on a global scale when using his technique. In addition, deformation constraints are used to prevent overly large stretching. As mentioned previously, I created a <em>Physics_SymmetricMatrix</em> class for storing the Hessian matrix of the linear portion of the internal cloth forces. The Hessian matrix is used in place of <img border="0" height="26" src="/sites/default/files/m/d/4/1/d/8/17990_image110.gif" width="18" />from equation (1.6) and because of the linear nature imposed by Desbrun's splitting of the forces, <img border="0" height="25" src="/sites/default/files/m/d/4/1/d/8/17990_image110.gif" width="18" />is zero. Due to the splitting of the problem into a linear and non-linear portion, we don't need to solve a linear system as we did in the Baraff/Witkin implementation. Rather, we can just "filter" the internal forces by multiplying by the inverse matrix <img border="0" height="26" src="/sites/default/files/m/d/4/1/d/8/17991_image112.gif" width="78" />where <em>I</em> is the identity matrix, <em>dt</em> is the time step, <em>m</em> is the mass of a particle, and <em>H</em> is the Hessian matrix. We then need to compensate for errors in torque introduced by the splitting. I'd refer the reader to the Desbrun article for more information about the technique. As in the explicit integration scheme, once we've integrated the forces and obtained new velocities and positions (again stored in a temporary vector) we can apply the deformation constraints. 
The fixup step is identical to the one described for explicit integration.</p>
<hr /><h3>4.3. Extra Tidbits</h3>
<p>While the above explanations of the update loops give the core information about how the cloth patch animates, there is some secondary information that is useful to know when looking through the code. I'll go through several different areas and unless otherwise noted, the text refers to all three update methodologies.<br /><br />
Each particle in the mesh can belong to at most six triangles. I generate a normal for each triangle and then add these and normalize to get the normal at each particle. This process doesn't seem to consume much time, but if every processor cycle is critical, you can choose to average fewer than six normals.<br /><br />
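That per-particle normal computation might be sketched as follows (illustrative; the sample accumulates over whichever triangles actually share the particle):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vector3 { float x, y, z; };

// Average the face normals of the (up to six) triangles sharing a
// particle, then renormalize to get the particle's normal.
Vector3 AverageNormals(const std::vector<Vector3>& faceNormals)
{
    Vector3 n = { 0.0f, 0.0f, 0.0f };
    for (const Vector3& fn : faceNormals) {
        n.x += fn.x; n.y += fn.y; n.z += fn.z;
    }
    float len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }
    return n;
}
```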
For the semi-implicit implementation, I need to form the Hessian matrix that corresponds to the way the particles are connected by the springs. I do this once, upfront, because the spring constants don't change and so the Hessian matrix doesn't change. For each spring, its <em>PrepareMatrices</em> method is called. This method sets the appropriate elements in the Hessian matrix that the spring affects. <em>PrepareMatrices</em> is also called to "touch" elements of the sparse matrices that will be used by the implicit implementation. This enables the memory allocation to happen only once.<br /><br />
I incorporated a <em>very</em> simplistic collision detection for the cloth with the ground plane. If you use the number keys (0,1,2,3) to toggle constraints on the corners, you can get the cloth to move downward. When it hits the floor, I stop all movement in the downward direction and fix the particles to the plane of the floor. There's no friction, so it's not very realistic. For the implicit implementation, I imposed constraints and particle adjustments as described by Baraff and Witkin; however, things tend to jump unstably as the cloth hits the floor. It's possible a smaller time step is needed, but I didn't investigate further.<br /><br />
Both the explicit and semi-implicit routines use particles with infinite mass to constrain them. Because of this, the <em>Fixup</em> routine for applying the deformation constraints looks at the inverse mass of each particle and only moves the particle if its mass is non-infinite (which means the inverse mass is non-zero).<br /><br />
While running the demo the following keys affect the behavior of the cloth:</p>
<ul><li>P - Pauses the animation of the cloth</li>
<li>W - Toggles wireframe so you can see the triangles</li>
<li>X - Exits the demo</li>
<li>F - Toggles to fullscreen mode</li>
<li>H - Brings up a help menu showing these keys</li>
<li>R - Resets the cloth to its initial position - horizontal to the floor and a bit above it</li>
<li>0, 1, 2, 3 - Toggles constraints for the four corners of the cloth</li>
</ul><p> </p>
<p>Finally, the configuration of the cloth simulation (number of particles, strength of springs, time step, etc.) is contained in <em>Cloth.ini</em>. I added comments for each entry in the file so look there if you want to play around with things. By default the integration method is explicit.</p>
<hr /><h3>5. Which Method is Best?</h3>
<p>Since I've covered three different techniques for updating the cloth, I'm sure you're wondering what the best method is. Well, for the case I tried, the explicit implementation is clearly the fastest, as the results in <strong>Figure 5</strong> show. This table was generated by running the sample code on an Intel® Pentium® III processor-based system running at 600 MHz with Microsoft Windows* 98 and DirectX* 7.0. The graphics card was a Matrox* G-400 with the resolution set to 1024x768 @ 60Hz and the color depth set to 16-bit. I used a fixed time step of 0.02 seconds, which would be appropriate for a frame rate of 50 frames per second.<br /><br /><img border="0" height="198" src="/sites/default/files/m/d/4/1/d/8/17995_image72.gif" width="335" /><br /><em>Figure 5 - Performance results for various cloth sizes</em><br /><br />
Some interesting things to note about the performance that aren't shown in the figure:</p>
<ul><li>Initialization time for the implicit method can be fairly large as the sparse matrices are allocated.</li>
<li>Initialization time for the semi-implicit method can be considerably larger than that for the implicit method because a large matrix (1089x1089 in the 33x33 patch case) needs to be inverted. The same amount of computation would be required any time the time step changed.</li>
<li>The implicit method is the only one that uses the actual spring strengths to hold the cloth together. Because of this, it may be necessary to increase the spring constants when using the implicit method.</li>
<li>Desbrun claimed to be able to vary the strength of the spring constant by a factor of 10<sup>6</sup> without causing instability. I was only able to achieve a factor of 10<sup>5</sup>, which makes me think that other simulation specifics (like particle masses) may have been different.</li>
<li>For the explicit and semi-implicit cases I needed to make the mass of the particles fairly large to achieve stability with a time step of 0.02 seconds. This could cause the cloth to have unusual properties if incorporated with other physics simulation involving inertia and collisions. In your game you may want to maintain separate masses for the updating of the cloth and the interaction of the cloth with the world.</li>
<li>Because I haven't implemented real collision detection it's uncertain how collision with other objects will affect the stability and hence the performance of the various implementations.</li>
<li>I maintained a linked list of spring forces that needed to be applied and then have their deformation constraints applied. Performance could be improved by storing these in an array that could be more quickly walked through.</li>
</ul><p> </p>
<p>Even though explicit integration seems to work best for my test case, the benefits of implicit integration should not be overlooked. Implicit integration can stably handle extremely large forces without blowing up. Explicit integration schemes cannot make such a claim. And while deformation constraints can be used with explicit integration to provide realistic looking cloth, implicit integration would have to be used if a more physically accurate simulation of cloth was required.</p>
<hr /><h3>6. Conclusion</h3>
<p>I breezed through some of the math and background with the hope that the accompanying source code would be even more valuable than the theoretical explanations found in other, more academic papers. Feel free to take parts of the code and incorporate them in your title. There's a lot more that can be done than what I've presented here. Start simple and add a wind force, remembering that it should affect the triangles formed by the particles, not the particles themselves. Or try adding a user-controllable mouse force to drag the cloth around. Depending on whether you want to use cloth simulation for eye candy in your game (like flags blowing in the wind or the sail on a ship) or as a key element, you'll probably need collision detection at some point. Keep in mind that cloth-cloth collision detection can be difficult to do efficiently.<br /><br />
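As a starting point for that wind suggestion, here is one hedged sketch of a per-triangle wind force (the scaling constant, the equal three-way split among the particles, and all names are illustrative choices, not code from the sample):

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { float x, y, z; };

static Vector3 Cross(const Vector3& a, const Vector3& b) {
    return Vector3{ a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Wind applied per triangle rather than per particle. The edge cross
// product has length equal to twice the triangle's area, so size and
// facing direction are handled together; the resulting force is split
// equally among the triangle's three particles.
void ApplyWind(const Vector3& p0, const Vector3& p1, const Vector3& p2,
               const Vector3& wind, float kWind,
               Vector3& f0, Vector3& f1, Vector3& f2)
{
    Vector3 n = Cross(Vector3{ p1.x-p0.x, p1.y-p0.y, p1.z-p0.z },
                      Vector3{ p2.x-p0.x, p2.y-p0.y, p2.z-p0.z });
    float nLen = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    if (nLen == 0.0f) return;                           // degenerate triangle
    Vector3 nHat = { n.x/nLen, n.y/nLen, n.z/nLen };
    float facing = nHat.x*wind.x + nHat.y*wind.y + nHat.z*wind.z;
    float mag = kWind * facing * (0.5f * nLen) / 3.0f;  // per-particle share
    f0.x += mag*nHat.x; f0.y += mag*nHat.y; f0.z += mag*nHat.z;
    f1.x += mag*nHat.x; f1.y += mag*nHat.y; f1.z += mag*nHat.z;
    f2.x += mag*nHat.x; f2.y += mag*nHat.y; f2.z += mag*nHat.z;
}
```

A triangle edge-on to the wind receives no force, and reversing the wind direction flips the sign, so a flag flutters rather than inflating in one direction only.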
Well, I've taken a brief look at real-time simulation of realistic looking cloth and hopefully have presented something of use to you in your game development. I look forward to seeing new games that incorporate various aspects of physics simulation with cloth simulation as one of them.</p>
<p><a href="/sites/default/files/m/d/4/1/d/8/41050_clothsample.zip" rel="nofollow">Click here to download source code</a> (366kb zip)</p>
<hr /><h3>References</h3>
<p>i Jeff Lander. Lone Game Developer Battles Physics Simulator. On <a href="http://www.gamasutra.com/" rel="nofollow">www.gamasutra.com</a>*, February 2000.<br /><br />
ii Jeff Lander. Graphic Content: Devil in the Blue-Faceted Dress: Real-time Cloth Animation. In <em>Game Developer Magazine</em>. May 1999.<br /><br />
iii Chris Hecker. Physics Articles at <a href="http://chrishecker.com/Rigid_Body_Dynamics" rel="nofollow">http://chrishecker.com/Rigid_Body_Dynamics</a>* originally published in <em>Game Developer Magazine</em>. October 1996 through June 1997.<br /><br />
iv Xavier Provot. Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior. In <em>Graphics Interface</em>, pages 147-155, 1995.<br /><br />
v Mathieu Desbrun, Peter Schroder and Alan Barr. Interactive Animation of Structured Deformable Objects. In <em>Graphics Interface '99</em>. June 1999.<br /><br />
vi D. Baraff and A. Witkin. Large Steps in Cloth Simulation. <em>Computer Graphics (Proc. SIGGRAPH)</em>, pages 43-54, 1998.</p>
<hr /><h3>About the Author</h3>
<p><img align="left" border="0" src="/sites/default/files/m/d/4/1/d/8/16047_mugshot.jpg" /> Dean Macri's research has focused on tessellating NURBS surfaces in real-time, simulating cloth surfaces in real-time and procedurally generating 3D content. Currently, he is helping game developers achieve maximum performance in their titles.</p>
<p> </p>
<hr />Tue, 06 Mar 12 22:04:30 -0800 | Dean Macri (Intel)
An Introduction to Neural Networks with an Application to Games
https://software.intel.com/en-us/articles/an-introduction-to-neural-networks-with-an-application-to-games
<strong>by Dean Macri</strong>
<h3>Introduction</h3>
<p><em>Speech recognition, handwriting recognition, face recognition</em>: just a few of the many tasks that we as humans are able to quickly solve but which present an ever increasing challenge to computer programs. We seem to be able to effortlessly perform tasks that are in some cases impossible for even the most sophisticated computer programs to solve. The obvious question that arises is <em>"What's the difference between computers and us?"</em>.<br /><br />We aren't going to fully answer that question, but we are going to take an introductory look at one aspect of it. In short, the biological structure of the human brain forms a massive parallel network of simple computation units that have been trained to solve these problems quickly. This network, when simulated on a computer, is called an artificial neural network or neural net for short.<br /><br />Figure 1 shows a screen capture from a simple game that I put together to investigate the concept. The idea is simple: there are two players each with a paddle, and a ball that bounces back and forth between them. Each player tries to position his or her paddle to bounce the ball back towards the other player. I used a neural net to control the movement of the paddles and through <em>training</em> (we'll cover this later) taught the neural nets to play the game well (perfectly to be exact).<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38780_neural1.jpg" border="0" /><br /><strong>Figure 1 -- Simple Ping-pong Game for Experimentation</strong><br /><br />In this article, I'll cover the theory behind one subset of the vast field of neural nets: back-propagation networks. I'll cover the basics and the implementation of the game just described. Finally, I'll describe some other areas where neural nets can be used to solve difficult problems. We'll begin by taking a simplistic look at how neurons work in your brain and mine.</p>
<hr /><h3>Neural Network Basics<br /></h3>
<p><strong><em>Neurons in the Brain</em></strong><br />Shortly after the turn of the 20<sup>th</sup> century, the Spanish anatomist Ramón y Cajal introduced the idea of <em>neurons</em> as components that make up the workings of the human brain. Later, work by others added details about <em>axons</em>, or output connections between neurons, and about <em>dendrites</em>, which are the receptive inputs to a neuron as seen in Figure 2.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38792_neural2.jpg" border="0" /><br /><strong>Figure 2 -- Simplified Representation of a Real Neuron</strong><br /><br />Put simplistically, a neuron functionally takes many inputs and combines them to produce either an excitatory or an inhibitory output in the form of a small voltage pulse. The output is then transmitted along the axon to many inputs (potentially tens of thousands) of other neurons. With approximately 10<sup>10</sup> neurons and 6x10<sup>13</sup> connections in the human brain¹, it's no wonder that we're able to perform the complex processes we do. In nervous systems, massive parallel processing compensates for the slow (millisecond+) speed of the processing elements (neurons).<br /><br />In the remainder of this article, we'll cover how artificial neurons, based on the model just described, can be used to mimic behaviors common to humans and other animals. While we can't simulate 10 billion neurons with 60 trillion connections, we can give you a simple but worthy opponent to enrich your game play.</p>
<hr /><h3>Artificial Neurons</h3>
<p>Using the simple model just discussed, researchers in the middle of the 20<sup>th</sup> century derived mathematical models for simulating the workings of neurons within the brain. They chose to ignore several aspects of real neurons such as their pulse-rate decay and came up with an easy-to-understand model. As illustrated in Figure 3, a neuron is depicted as a computation block that takes inputs (X<sub>0</sub>, X<sub>1</sub>, …, X<sub>n</sub>) and weights (W<sub>0</sub>, W<sub>1</sub>, …, W<sub>n</sub>), multiplies each input by its corresponding weight, and sums the results to produce an induced local field, v, which then passes through a decision function, φ(v), to produce a final output, y.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38798_neural3.jpg" border="0" /><br /><strong>Figure 3 -- Mathematical model of a neuron</strong><br /><br />Put in the form of a mathematical equation, this reduces to:<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38799_neural4.jpg" border="0" /><br /><br />I introduced two new terms, <em>induced local field</em> and <em>decision function</em>, while describing the components of this model, so let's take a look at what these mean. The induced local field of a neuron is the output of the summation unit, as indicated in the diagram. If the inputs and the weights can have values that range from -∞ to +∞, then the range of the induced local field is the same. If just the induced local field were propagated to other neurons, then a neural network could perform only simple, linear calculations. To enable more complex computation, the idea of a <em>decision function</em> was introduced. McCulloch and Pitts introduced one of the simplest decision functions in 1943. Their function is just a threshold function that outputs one if the induced local field is greater than or equal to zero and outputs zero otherwise. While some simple problems can be solved using the McCulloch-Pitts model, more complex problems require a more complex decision function. 
Perhaps the most widely used decision function is the <em>sigmoid</em> function given by:<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38800_neural5.jpg" border="0" /><br /><br />The sigmoid function has two important properties that make it well-suited for use as a decision function:</p>
<ul><li>It is everywhere differentiable (unlike the threshold function), which enables an easy way to train networks, as we'll see later. </li>
<li>Its output includes ranges that exhibit both nonlinear and linear behavior. </li>
</ul><p> </p>
<p>Other decision functions, like the hyperbolic tangent φ(v)=tanh(v), are sometimes used as well. For the examples we'll cover, we'll use the sigmoid decision function unless otherwise noted.</p>
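<p>In code, the neuron model of Figure 3 with a sigmoid decision function comes down to a weighted sum and one function call. A minimal sketch (the input and weight values in the usage note are arbitrary, chosen only for illustration):</p>

```python
import math

def sigmoid(v):
    """Sigmoid decision function: maps the induced local field into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights):
    """y = phi(v), where v is the induced local field: the sum of
    each input multiplied by its corresponding weight."""
    v = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(v)
```

For example, `neuron_output([1.0, 0.5], [0.3, -0.2])` computes v = 0.3 - 0.1 = 0.2 and returns sigmoid(0.2), a value slightly above 0.5. A bias is added simply by appending a constant input of 1 with its own weight.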
<hr /><h3>Connecting the Neurons</h3>
<p>We've covered the basic building blocks of neural networks with our look at the mathematical model of an artificial neuron. A single neuron can be used to solve some relatively simple problems, but for more complex problems we have to examine a <em>network</em> of neurons, hence the term: neural network.<br /><br />A neural network consists of one or more neurons connected into one or more <em>layers</em>. For most networks, a layer contains neurons that are <em>not</em> connected to one another in any fashion. While the interconnect pattern between layers of the network (its "topology") may be regular, the weights associated with the various inter-neuron links may vary drastically. Figure 4 shows a three-layer network with two nodes in the first layer, three nodes in the second layer, and one node in the third layer. The first-layer nodes are called <em>input nodes</em>, the third-layer node is called an <em>output node</em>, and nodes in the layers in between the input and output layers are called <em>hidden nodes</em>.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38801_neural6.jpg" border="0" /><br /><strong>Figure 4 -- A Three-Layer Neural Network</strong><br /><br />Notice the input labeled, x<sub>6</sub>, on the first node in the hidden layer. The fixed input (x<sub>6</sub>) is not driven by any other neurons, but is labeled as being a constant value of one. This is referred to as a <em>bias</em> and is used to adjust the firing characteristics of the neuron. It has a weight (not shown) associated with it, but the input value will never change. Any neuron can have a bias added by fixing one of its inputs to a constant value of one. We haven't covered the training of a network yet, but when we do, we'll see that the weight affecting a bias can be trained just like the weights of any other input.<br /><br />The neural networks we'll be dealing with will be structurally similar to the one in Figure 4. 
A few key features of this type of network are:</p>
<ul><li>The network consists of several layers: one input layer and one output layer, with zero or more hidden layers. </li>
<li>The network is <em>not</em> recurrent which means that the outputs from any node only feed inputs of a following layer, not the same or any previous layer. </li>
<li>Although the network shown in Figure 4 is fully connected, it is not necessary for every neuron in one layer to feed every neuron in the following layer. </li>
</ul><p> </p>
<hr /><h3>Neural Networks for Computation</h3>
<p>Now that we've taken a brief look at the structure of a neural network, let's take a quick look at how computation can be performed using a neural network. Later in the paper we'll learn how to go about adjusting weights or <em>training</em> a network to perform a desired computation.<br /><br />At the simplest level, a single neuron produces one output for a given set of inputs and the output is always the same for that set of inputs. In mathematics, this is known as a function or <em>mapping</em>. For that neuron, the exact relationship between inputs and outputs is given by the weights affecting the inputs and by the particular decision function used by the neuron.<br /><br />Let's look at a simple example that's commonly used to illustrate the computational power of neural networks. For this example, we will assume that the decision function used is the McCulloch-Pitts threshold function. We want to examine how a neural network can be used to compute the truth table for an AND logic gate. Recall that the output of an AND gate is one if both inputs are one and zero otherwise. Figure 5 shows the truth table for the AND operator.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38802_neural7.jpg" border="0" /><br /><strong>Figure 5 -- Truth Table for AND Operator</strong><br /><br />We want to construct a neural network that has two inputs, one output, and calculates the truth table given in Figure 5.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38803_neural8.jpg" border="0" /><br /><strong>Figure 6 -- Neuron for Computing an AND Operation</strong><br /><br />Figure 6 shows a possible configuration of a neuron that does what we want. The decision function is the McCulloch-Pitts threshold function mentioned previously. Notice that the bias weight (w<sub>0</sub>) is -0.6. This means that if both X<sub>1</sub> and X<sub>2</sub> are zero then the induced local field, v, will be -0.6 resulting in a 0 for the output. 
If either X<sub>1</sub> or X<sub>2</sub> is one, then the induced local field will be 0.5+(-0.6)= -0.1 which is still negative resulting in a zero output from the decision function. Only when both inputs are one will the induced local field go non-negative (0.4) resulting in a one output from the decision function.<br /><br />While this use of a neural network is overkill for the problem and has a fairly trivial solution, it's the start of illustrating an important point about the computational abilities of a single neuron. We're going to examine this problem and another one to understand the concept of <em>linearly separable</em> problems.<br /><br />Look at the "graph" in Figure 7. Here, the x-axis corresponds to input 0 and the y-axis corresponds to input 1. The outputs are written into the graph and correspond to the truth table from Figure 5. The gray shaded area represents the region of values that produce a one as output (if we assume the inputs are valid along the real line from zero to one).<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38804_neural9.jpg" border="0" /><br /><strong>Figure 7 -- Graph of an AND Function</strong><br /><br />The key thing to note is that there is a line (the lower left slope of the gray triangle) that separates input values that yield an output of one from input values that yield an output of zero. Problems for which such a "dividing line" can be drawn (such as the AND problem), are classified as <em>linearly separable</em> problems.<br /><br />Now let's look at another Boolean operation, the exclusive-or (XOR) operation as given in Figure 8.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38781_neural10.jpg" border="0" /><br /><strong>Figure 8 -- Truth Table for the XOR Operator</strong><br /><br />Here, the output is one only if one, but not both, of the inputs is one. 
The "graph" of this operator is shown in Figure 9.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/figure_9.JPG" border="0" /><br /><strong>Figure 9 -- Graph of an XOR Function</strong><br /><br />Notice that the gray region surrounding the "one" outputs is separated from the zero outputs by not one line, but two lines (the lower and upper sloping lines of the gray region). This problem is <em>not</em> linearly separable. If we try to construct a single neuron that can calculate this function, we won't succeed.<br /><br />Early researchers thought that this was a limitation of <em>all</em> computation using artificial neurons. It is only with the addition of multiple layers that it was realized that neurons that were linear in behavior could be combined to solve problems that were not linearly separable. Figure 10 shows a simple, three-neuron network that can solve the XOR problem. We're still assuming that the decision function is the McCulloch-Pitts threshold function.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38783_neural12.jpg" border="0" /><br /><strong>Figure 10 -- Network for Calculating XOR Function</strong><br /><br />All the weights are fixed at 1.0 with the exception of the weight labeled as w=-2. For completeness, let's quickly walk through the outputs for the four different input combinations.</p>
<ul><li>If both inputs are 0, then neurons 0 and 1 both output 0 (because of their negative biases). Thus, the output of neuron 2 is also 0 due to its negative bias and zero inputs. </li>
<li>If X<sub>0</sub> is 1 and X<sub>1</sub> is 0, then neuron 0 outputs 0, neuron 1 outputs 1 (because 1.0+(-0.5)=0.5 is greater than 0) and neuron 2 then outputs a 1 also. </li>
<li>If X<sub>0</sub> is 0 and X<sub>1</sub> is 1, then neuron 0 outputs 0, neuron 1 outputs 1, and neuron 2 outputs 1. </li>
<li>Finally, if both inputs are 1, then neuron 0 outputs a 1 that becomes a -2 input to neuron 2 (because of the negative weight). Neuron 1 outputs a 1 which combines with -2 and the -0.5 bias to produce an output of 0 from neuron 2. </li>
</ul><p> </p>
<p>The takeaway from this simple example is that to solve non-linearly separable problems, multi-layer networks are needed. In addition, while the McCulloch-Pitts threshold function works fine for these easy to solve problems, a more mathematically friendly (i.e. differentiable) decision function is needed to solve most real world problems. We'll now get into the way a neural network can be <em>trained</em> (rather than being programmed or structured) to solve a particular problem.</p>
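<p>The AND neuron of Figure 6 and the three-neuron XOR network of Figure 10 are easy to verify in code. One caveat: the walkthrough above only implies that neuron 0's bias is strongly negative, so the value -1.5 below is my own choice, picked to reproduce the four cases:</p>

```python
def threshold(v):
    """McCulloch-Pitts decision function: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0

def and_gate(x1, x2):
    """AND neuron from Figure 6: input weights 0.5, 0.5 and bias weight -0.6."""
    return threshold(0.5 * x1 + 0.5 * x2 - 0.6)

def xor_gate(x0, x1):
    """Three-neuron XOR network from Figure 10. All weights are 1.0 except
    the -2 weight from neuron 0 into neuron 2. Neuron 0's bias of -1.5 is
    an assumption consistent with the walkthrough."""
    y0 = threshold(x0 + x1 - 1.5)    # fires only when both inputs are 1
    y1 = threshold(x0 + x1 - 0.5)    # fires when either input is 1
    return threshold(-2.0 * y0 + y1 - 0.5)
```

Running both gates over all four input combinations reproduces the truth tables of Figures 5 and 8: `and_gate` outputs 1 only for (1, 1), and `xor_gate` outputs 1 only when exactly one input is 1.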
<hr /><h3>Learning Processes</h3>
<p>Let's go way back to the definition of the output of a single neuron (we've added a parameter for a particular set of data, <em>k</em>):<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38784_neural13.jpg" border="0" /></p>
<h3>Equation 1</h3>
<p>Note here that <em>x<sub>i</sub>(k) = y<sub>i</sub>(k)</em>, the output from neuron <em>i</em>, if neuron <em>j</em> is not an input neuron. Also, <em>w<sub>ji</sub></em> is the weight connecting the output of neuron <em>i</em> to an input of neuron <em>j</em>.<br /><br />We want to determine how to change the values of the various weights, <em>w<sub>ji</sub>(k)</em>, when the output, <em>y<sub>j</sub>(k)</em>, doesn't agree with the result we expect or require from a given set of inputs, <em>x<sub>i</sub>(k)</em>. Formally, let <em>d<sub>j</sub>(k)</em> be the desired output for a given set of inputs, <em>k</em>. Then, we can look at the error function, <em>e<sub>j</sub>(k)=d<sub>j</sub>(k)-y<sub>j</sub>(k)</em>. We want to modify the weights to reduce the error (ideally to zero). We can look at the <em>error energy</em> as a function of the error:<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38785_neural14.jpg" border="0" /></p>
<h3>Equation 2</h3>
<p>Adjusting the weights now becomes a problem of minimizing <em>ξ(k)</em>. We want to look at the gradient of the error energy with respect to the various weights, <img src="/sites/default/files/m/d/4/1/d/8/38805_neural_ex1.jpg" border="0" />. Combining <strong>Equation 1</strong> and <strong>Equation 2</strong> and using the chain rule (and recalling that <em>y<sub>j</sub>(k)=φ(v<sub>j</sub>(k))</em> and <em>v<sub>j</sub>(k)=Σ<sub>i</sub>w<sub>ji</sub>(k)y<sub>i</sub>(k)</em>), we can expand this derivative to something more manageable:<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38786_neural15.jpg" border="0" /></p>
<h3>Equation 3</h3>
<p>Each of the terms in <strong>Equation 3</strong> can be reduced so we get:<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38787_neural16.jpg" border="0" /></p>
<h3>Equation 4</h3>
<p>Where <em>φ'()</em> signifies differentiation with respect to the argument. Adjustments to the weights can be written using the <em>delta rule</em>:<br /><img src="/sites/default/files/m/d/4/1/d/8/38788_neural17.jpg" border="0" /></p>
<h3>Equation 5</h3>
<p>Here, η is a <em>learning-rate</em> parameter that varies from 0 to 1. It determines how far the weights move down the error gradient on each update. If η is 0, no learning will take place. We can re-write <strong>Equation 5</strong> to include what is known as the local gradient, <em>δ<sub>j</sub>(k)</em>:<br /><img src="/sites/default/files/m/d/4/1/d/8/38789_neural18.jpg" border="0" /></p>
<h3>Equation 6</h3>
<p>Here, <img src="/sites/default/files/m/d/4/1/d/8/38806_neural_ex2.jpg" border="0" />.<br /><br /><strong>Equation 6</strong> can be used directly to update the weights of a neuron in the output layer of a neural network. For neurons in hidden and input layers of a network, the calculations are slightly more complex. To calculate the weight changes for these neurons, we use what is known as the <em>back-propagation formula</em>. I won't go through the details of the derivation, but the formula for the local gradient reduces to:<br /><img src="/sites/default/files/m/d/4/1/d/8/38790_neural19.jpg" border="0" /></p>
<h3>Equation 7</h3>
<p>In this formula, <em>w<sub>nj</sub>(k)</em> represents the weights connecting the output of neuron <em>j</em> to an input of neuron <em>n</em>. Once we've calculated the local gradient, δ<sub>j</sub>, for this neuron, we can use <strong>Equation 6</strong> to calculate the weight changes.<br /><br />To compute the weight changes for all the neurons in a network, we start with the output layer. Using <strong>Equation 6</strong> we first compute the weight changes for all the neurons in the output layer of the network. Then, using <strong>Equation 6</strong> and <strong>Equation 7</strong> we compute the weight changes for the hidden layer closest to the output layer. We use these equations again for each additional hidden layer, working from the output layer back toward the input layer, until weight changes for all the neurons in the network have been calculated. Finally, we apply the weight changes to the weights, at which point we can recompute the network output to see if we've gotten closer to the desired result.<br /><br />Network training can occur in several different ways:</p>
<ul><li>The weight changes can be accumulated over several input patterns and then applied after all input patterns have been presented to the network. </li>
<li>The weight changes can be applied to the network after each input pattern is presented. </li>
</ul><p> </p>
<p>Method 1 is most commonly used. When method 2 is used, the patterns are presented to the network in a random order. This is necessary to keep the network from possibly becoming "trapped" in some presentation-order-sensitive local energy minimum. <br /><br />Before looking at an example problem, let me wrap up this section by noting that I've only discussed one type of learning process: <em>back-propagation</em> using <em>error-correction</em> learning. Other types of learning processes include <em>memory-based</em> learning, <em>Hebbian</em> learning and <em>competitive</em> learning. Refer to the references at the end of this article for more information on these techniques.</p>
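<p>Collecting Equations 5 through 7 into code, the sketch below trains a small sigmoid network on the XOR problem from earlier, using on-line updates (method 2 above) with random presentation order. The network size (2-2-1), the learning rate of 0.5, the iteration count, and the restart loop are my own choices for illustration, not values from any particular implementation; the restarts guard against the local minima that small XOR networks are prone to.</p>

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_net(rng, n_hidden=2):
    """Random 2-input / n_hidden / 1-output network. Each weight row keeps
    its bias weight as the last entry (the bias input is fixed at 1)."""
    hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, out

def forward(hidden, out, x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in hidden]
    y = sigmoid(sum(out[j] * h[j] for j in range(len(h))) + out[-1])
    return h, y

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def train_xor(eta=0.5, iterations=40000, max_restarts=8):
    """On-line back-propagation: one randomly chosen pattern per step.
    Restarts from fresh random weights if training gets stuck."""
    for seed in range(max_restarts):
        rng = random.Random(seed)
        hidden, out = make_net(rng)
        for _ in range(iterations):
            x, d = rng.choice(XOR)            # random presentation order
            h, y = forward(hidden, out, x)
            # Output-layer local gradient: delta = e * phi'(v) = e*y*(1-y).
            delta_o = (d - y) * y * (1.0 - y)
            # Hidden-layer gradients via the back-propagation formula.
            delta_h = [h[j] * (1 - h[j]) * delta_o * out[j] for j in range(len(h))]
            # Delta rule: w += eta * delta * input (bias input is 1).
            for j in range(len(h)):
                out[j] += eta * delta_o * h[j]
            out[-1] += eta * delta_o
            for j in range(len(h)):
                for i in range(2):
                    hidden[j][i] += eta * delta_h[j] * x[i]
                hidden[j][2] += eta * delta_h[j]
        mse = sum((d - forward(hidden, out, x)[1]) ** 2 for x, d in XOR) / 4
        if mse < 0.01:
            break
    return lambda x: forward(hidden, out, x)[1]
```

Note that each hidden gradient is computed from the <em>old</em> output weights before those weights are updated, matching the layer-by-layer sweep described above.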
<hr /><h3>Putting Neural Nets to Work</h3>
<p>Let's take a closer look at the game I described in the introduction. <strong>Figure 11</strong> shows a screen capture of the game after several generations of training.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38795_neural20.jpg" border="0" /><br /><strong>Figure 11 -- Ping-Pong Sample Application</strong><br /><br />The training occurs by shooting the ball from the center with a random direction (the speed is fixed). The neural network is given as input the (x,y) position of the ball as well as the direction and the <em>y</em> position of the paddle (either red or blue depending upon which paddle the ball is heading towards). The network is trained to output the <em>y</em> direction in which the paddle should move.<br /><br />I created a three-layered network with three nodes in the input layer, ten nodes in the hidden layer, and one node in the output layer. The input nodes each get the same five inputs corresponding to the (x,y) position and direction of the ball and the y position of the paddle. These nodes are fully connected to the nodes in the hidden layer, which are in turn fully connected to the output node. Figure 12 shows the layout of the network with inputs and outputs. Weights, biases, and decision functions are not shown.<br /><br /><img src="/sites/default/files/m/d/4/1/d/8/38797_neural21.jpg" border="0" /><br /><strong>Figure 12 -- Neural Network for Ping Pong Game</strong><br /><br />The network learns to move the paddle in the same <em>y</em>-direction that the ball is heading. After several thousand generations of training, the neural network learns to play perfectly (the exact number of generations varies because the network weights are initialized to random values).<br /><br />I experimented with using a paddle speed that was slower than the speed of the ball so that the networks would have to do some form of prediction. 
With the network from Figure 12 some learning took place but the neural nets weren't able to learn to play perfectly. Some additional features would have to be added to the network to enable it to fully learn this problem.<br /><br />In this example, the neural network is the only form of AI that the computer controlled opponent has. By varying the level of training, the computer opponent can vary from poor play to perfect play. Deciding when to stop the training is a non-trivial challenge. One easy solution would be to train the network for some number of iterations up front (say 1000) and then each time the human player wins, train the network an additional 100 iterations. Eventually this would produce a perfect computer-controlled opponent, but should also produce a progressively more challenging opponent.</p>
<hr /><h3>Non-trivial Applications of Neural Nets</h3>
<p>While the ping-pong example provides an easy to understand application of neural nets to artificial intelligence, real-world problems require a bit more thought. I'll briefly mention a few possible uses of neural nets, but realize that there isn't going to be a one-size-fits-all neural network that you can just plug into your application and solve all your problems. Good solutions to specific problems require considerable thought and experimentation with what variables or "features" to use as network input and outputs, what size and organization of network to use, and what training sets are used to train the network.<br /><br />Using a neural network for the complete AI in a game probably isn't going to work well for anything beyond the simple ping-pong example previously discussed. More likely than not, you're going to use a traditional state machine for the majority of AI decisions but you may be able to use neural nets to complement the decisions or to enhance the hard-coded state machine. An example of this might be a neural net that takes as input such things as health of the character, available ammunition, and perhaps health and ammunition of the human opponent. Then, the network could decide whether to fight or flee at which point the traditional AI would take over to do the actual movement, path-finding, etc. Over several games, the network could improve its decision making process by examining whether each decision produced a win or a loss (or maybe less globally, an increase or decrease in health and/or ammunition).<br /><br />One area that intrigues me and which has had some research devoted to it is the idea of using neural networks to perform the actual physics calculations in a simulation². I think this has promise because training a neural network is ultimately a process of finding a function that fits several sets of data. 
Given the challenge of creating physical controllers for physically simulated games, I think neural networks are one possibility for solutions there as well.<br /><br />The use of neural nets for pattern recognition of various forms is their ultimate strength. Even in the problems described, the nets would be used for recognizing patterns, whether health and ammunition, forces acting on an object, or something else, and then taking an appropriate action. The strength lies in the ability of the neural nets to be trained on a set of well-known patterns and then be able to extract meaningful decisions when presented with unknown patterns. This feature of extrapolation from existing data to new data can be applied to the areas of speech recognition, handwriting recognition and face recognition mentioned in the introduction. And it can also be beneficial to "fuzzy" areas like finding trends in stock market analysis.</p>
<hr /><h3>Conclusion</h3>
<p>I've tried to keep the heavy math to a minimum, the details about sample code pretty much out of the picture, and still provide a solid overview of back-propagation neural networks. Hopefully this article has provided a simple overview of neural networks and given you some simple sample code to examine to see if neural networks might be worth investigating for decision-making in your applications. I'd recommend checking out some of the references to gain a more solid understanding of all the quirks of neural networks. I encourage you to experiment with neural networks and come up with novel ways in which they can add realism to upcoming game titles or enhance your productivity applications. I welcome feedback and I'm available to answer questions that pertain to this topic. Feel free to e-mail me at <a href="mailto:dean.p.macri@intel.com" rel="nofollow">dean.p.macri@intel.com</a>.</p>
<hr /><h3>References</h3>
¹Haykin, S., 1999. <em>Neural Networks: A Comprehensive Foundation</em>, 2<sup>nd</sup> Ed. New Jersey: Prentice Hall.<br /><br />²Grzeszczuk, R., Terzopoulos, D., and Hinton, G., 1998. "NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models." <em>Computer Graphics (SIGGRAPH '98 Proceedings)</em>, pp. 9-20.
<hr />Fri, 09 Sep 11 15:31:25 -0700, Dean Macri (Intel)
Using NURBS Surfaces in Real-time Applications
https://software.intel.com/en-us/articles/using-nurbs-surfaces-in-real-time-applications
<h3>Alternatives to Polygonal Models</h3>
<p><a target="_blank" href="/sites/default/files/m/d/4/1/d/8/40951_real_time_nurbs.pdf" rel="nofollow"><img src="/sites/default/files/m/d/4/1/d/8/18046_print_button_3d.gif" alt="Printable PDF" border="0" height="40" width="115" /></a><br /><br />Despite the widespread use of polygonal models for representing 3D geometry, the quest goes on to find suitable alternatives, particularly since the limitations of polygonal data have become glaringly obvious to current-generation developers. Because PC developers need to create content that scales across many levels of processor performance (including both host processor and 3D graphics accelerator), they're forced to either create multiple models or to use mesh reduction algorithms for dynamically producing the lower detail models. Creating multiple models clearly taxes the efforts of 3D artists, who must spend even more time modeling, manipulating, and animating models composed of large numbers of polygons. As games become more content intensive (not just in terms of the levels of detail, but more actual game content), the time required to produce the content grows considerably. Alternatives to polygonal models offer artists an acceptable means to streamline the creation process and save time along the way.<br /><br />This article deals with one of the more promising alternatives to polygonal modeling: NURBS (Non-Uniform Rational B-Spline) surfaces. First, I'll introduce you to the concepts and terminology associated with parametric curves and surfaces. Next, I'll describe in detail how to render NURBS surfaces and discuss some of the difficulties encountered when using NURBS surfaces in place of polygonal models. Finally, if I've done my job well, this article will whet your appetite for the exciting types of 3D content that can be created using parametric surfaces and inspire you to investigate developing this type of content.</p>
<h3>Parametric Curve Basics</h3>
<p>Let's start with the basics. Normal "functions" we've seen presented in algebra or calculus (or whatever mathematics course we've taken recently or not so recently) are defined as the dependent variable (often <em>y</em>) given as a function of the independent variable (usually <em>x</em>) so that we have an equation such as: <em>y = 2x</em><sup>2</sup><em> – 2x + 1</em>. By plugging in various values for <em>x</em> we can calculate corresponding values for <em>y</em>. We can create a graph of the function by plotting the corresponding <em>x</em> and <em>y</em> values on a two-dimensional grid, as shown in <strong>Figure 1</strong>.</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18028_image18.gif" alt="Figure 1" border="0" height="218" width="367" /></div>
<div><strong>Figure 1</strong></div>
<p>Parametric functions also match values of <em>x</em> with values of <em>y</em> but the difference is that both <em>x</em> and <em>y</em> are given as functions of a third variable (often represented by u) called the parameter. So we could have a set of equations expressed as follows:</p>
<blockquote><em>y = 2u</em><sup>2</sup><em> – 2u +1</em><br /><em>x = u</em></blockquote>
<p> </p>
<p>These equations produce the same curve that the "implicit" function given above produces. An additional restriction often added to parametric functions is that the functions are only defined for a given set of values of the parameter. In our simple example, <em>u</em> could be any real number, but for many sets of equations, the equations will only be considered valid on a range such as <em>0 ≤ u ≤ 1</em>.</p>
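<p>To make the equivalence concrete, here's a quick numerical check (a throwaway snippet, not part of any article source) that the parametric form traces exactly the same curve as the function given first:</p>

```python
def explicit_y(x):
    """y = 2x^2 - 2x + 1, with y given directly as a function of x."""
    return 2 * x * x - 2 * x + 1

def parametric(u):
    """x = u, y = 2u^2 - 2u + 1: the same curve, driven by the parameter u."""
    return (u, 2 * u * u - 2 * u + 1)

# Sample the parameter: every resulting (x, y) pair lies on the original curve.
points = [parametric(u / 10.0) for u in range(11)]
assert all(abs(y - explicit_y(x)) < 1e-12 for x, y in points)
```

The payoff of the parametric form comes when a curve cannot be written as a single-valued function of x at all, such as a circle, yet is still easy to trace by sweeping u.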
<hr /><h3>B-Spline Basis Functions</h3>
<p>Now we're going to define a powerful set of parametric functions called the b-spline basis functions (the b in b-spline stands for "basis" so this term is kind of redundant). These equations are defined for a given knot vector <strong><em>U</em></strong> = {<em>u</em><sub>0</sub><em>, u</em><sub>1</sub><em>, …, u</em><sub>n</sub>} as:</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18029_image21.gif" alt="Equation 1" border="0" height="87" width="365" /></div>
<p> </p>
<h3>Equation 1</h3>
<p>Whoa, that's scary! Let's take a close look at it to see what makes it useful. The <em>p</em> subscript in the second equation is the degree of the function (points are zero'th degree, lines are first degree, and so on). The first equation expresses that for zero'th degree curves, the function is either constant zero or constant one depending on the parameter, <em>u</em>, and where it falls in the knot vector. Looking at this pictorially for the knot vector <strong><em>U</em></strong> = {0,1,2} and <em>B</em><sub>0,0</sub>, <em>B</em><sub>1,0</sub>, and <em>B</em><sub>2,0</sub> we get the plots shown in <strong>Figure 4</strong>.</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18030_image22.gif" alt="Figure 4 plot" border="0" height="203" width="364" /><p> </p>
<img src="/sites/default/files/m/d/4/1/d/8/18031_image23.gif" alt="Figure 4 plot" border="0" height="206" width="384" /></div>
<p> </p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18032_image24.gif" alt="Figure 4 plot" border="0" height="203" width="381" /></div>
<div><strong>Figure 4</strong></div>
<p><br /><br />For degrees other than zero, we must recursively calculate the value of the function using a linear combination of the functions that are one degree lower. For first-degree functions, we use a linear combination of the zeroth-degree functions. For second-degree functions, we use a linear combination of the first-degree functions (which are themselves defined as a linear combination of the zeroth-degree functions), and so on. As an example, for the knot vector <strong><em>U</em></strong> = {0,1,2,3} we produce the plots shown in <strong>Figure 5</strong> for B<sub>0,1</sub>, B<sub>1,1</sub>, B<sub>2,1</sub>, and B<sub>3,1</sub>.</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18033_image25.gif" alt="Figure 5 plot" border="0" height="203" width="381" /></div>
<p> </p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18034_image26.gif" alt="Figure 5 plot" border="0" height="203" width="381" /></div>
<p> </p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18035_image27.gif" alt="Figure 5 plot" border="0" height="203" width="381" /></div>
<p> </p>
<div><strong>Figure 5</strong></div>
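<p><strong>Equation 1</strong> translates almost directly into a recursive function. The sketch below uses hypothetical names rather than the sample code's, and adopts the usual convention that a term whose knot span has zero length contributes zero. One wrinkle: four degree-one basis functions actually require six knots, so a clamped knot vector {0,0,1,2,3,3} is assumed here; it reproduces the basis values used in the worked example later in the article:</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Equation 1: the zeroth degree is an indicator over one knot span; higher
// degrees blend the two functions one degree lower. Terms whose knot span
// has zero length are treated as zero (the usual 0/0 convention).
double Basis(const std::vector<double>& knots, std::size_t i, int p, double u)
{
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0 : 0.0;
    double result = 0.0;
    double d1 = knots[i + p] - knots[i];
    if (d1 > 0.0)
        result += (u - knots[i]) / d1 * Basis(knots, i, p - 1, u);
    double d2 = knots[i + p + 1] - knots[i + 1];
    if (d2 > 0.0)
        result += (knots[i + p + 1] - u) / d2 * Basis(knots, i + 1, p - 1, u);
    return result;
}

// The four degree-one functions discussed above, over the assumed clamped
// knot vector {0,0,1,2,3,3}.
double FigureFiveBasis(std::size_t i, double u)
{
    static const std::vector<double> knots{0.0, 0.0, 1.0, 2.0, 3.0, 3.0};
    return Basis(knots, i, 1, u);
}
```

<p>At <em>u</em> = 1.5, only B<sub>1,1</sub> and B<sub>2,1</sub> are non-zero (each 0.5), matching the values in the worked example.</p>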
<p>Interestingly enough, with the four control points, <strong>P</strong><sub>0</sub>, <strong>P</strong><sub>1</sub>, <strong>P</strong><sub>2</sub>, and <strong>P</strong><sub>3</sub> defined in our previous example, we can now represent the curve, <strong>C</strong> from <strong>Figure 3</strong>, as a parametric curve by the equation:<br /><br /><strong>C</strong>(u) = B<sub>0,1</sub>(u) * <strong>P</strong><sub>0</sub> + B<sub>1,1</sub>(u) * <strong>P</strong><sub>1</sub> + B<sub>2,1</sub>(u) * <strong>P</strong><sub>2</sub> + B<sub>3,1</sub>(u) * <strong>P</strong><sub>3</sub> with knot vector <strong><em>U</em></strong><strong> =</strong> {0,1,2,3}.<br /><br />This can be expressed more compactly as:</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18036_image28.gif" alt="Equation 2" border="0" height="45" width="124" /></div>
<p> </p>
<hr /><h3>Parametric Surfaces</h3>
<p>Now that we know how to describe parametric curves using a set of control points (which is what <strong>P</strong><sub>0</sub>, <strong>P</strong><sub>1</sub>, <strong>P</strong><sub>2</sub>, and <strong>P</strong><sub>3</sub> were in the previous example), we can begin to understand parametric surfaces. The control points that we're going to use for parametric surfaces will be 3-dimensional points. Let's construct an example using the points shown in <strong>Figure 6</strong>.</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18026_fig6.gif" border="0" height="149" width="280" /></div>
<div><strong>Figure 6</strong></div>
<p><br /><br />Starting with 16 points labeled <strong>P</strong><sub>0,0</sub> through <strong>P</strong><sub>3,3</sub>, we want to "blend" these points together to form a surface. This process is actually quite easy. To generate a surface point that we'll call <strong>S</strong>, start with two knot vectors, <strong><em>U</em></strong> and <strong><em>V</em></strong>, used to create two sets of b-spline basis functions, <em>B</em><sub>i,p</sub>(u) and <em>B</em><sub>j,q</sub>(v). Here <em>p</em> and <em>q</em> give the degrees of the surface (for example: linear, quadratic, cubic) in each direction. Now we can define the function for the surface that corresponds to the function for a curve shown in <strong>Equation 2</strong>:</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18037_image29.gif" border="0" height="46" width="206" /></div>
<p> </p>
<h3>Equation 3</h3>
<p>Simple enough? Let's look at it in greater depth just to be sure that the process is clear. To calculate a surface point, <strong><em>S</em></strong>(u,v), we loop over all the control points (the two summation signs in the equation) and scale each control point, <strong><em>P</em></strong><sub><em>i,j</em></sub>, by the appropriate blending functions evaluated at <em>u</em> and <em>v</em>. Keep in mind that for a surface with many control points, some of the blending functions will be equal to zero over large regions of the surface. In particular, for a surface of degree <em>n</em> x <em>m</em>, at most (<em>n</em>+1)*(<em>m</em>+1) blending functions will be non-zero at a given (<em>u,v</em>) parameter value.<br /><br />We can generate different surfaces by using different knot vectors and changing the degrees of the blending functions (<em>p</em> and <em>q</em>). For example, if you generate a surface that is 3<sup>rd</sup> degree in both dimensions with knot vectors <strong><em>U</em></strong> = {0,0,0,0,1,1,1,1} and <strong><em>V</em></strong> = {0,0,0,0,1,1,1,1}, the result would look like the image in <strong>Figure 7</strong> if we use the control point mesh from <strong>Figure 6</strong>.</p>
<div><img src="http://software.intel.com/file/m/3709" border="0" height="268" width="358" /></div>
<div><strong>Figure 7</strong></div>
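<p>To make the double sum in <strong>Equation 3</strong> concrete, here is a minimal sketch (hypothetical names, not the sample code) that evaluates a degree 1 x 1 surface over an assumed 2 x 2 grid of control points with clamped knot vectors {0,0,1,1}. At (<em>u</em>,<em>v</em>) = (0.5, 0.5) the result is simply the average of the four corners:</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point3D { double x, y, z; };

// Cox-de Boor recursion for Equation 1 (zero-length spans contribute zero).
double Basis(const std::vector<double>& knots, std::size_t i, int p, double u)
{
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0 : 0.0;
    double result = 0.0;
    double d1 = knots[i + p] - knots[i];
    if (d1 > 0.0)
        result += (u - knots[i]) / d1 * Basis(knots, i, p - 1, u);
    double d2 = knots[i + p + 1] - knots[i + 1];
    if (d2 > 0.0)
        result += (knots[i + p + 1] - u) / d2 * Basis(knots, i + 1, p - 1, u);
    return result;
}

// Equation 3: loop over all control points, scaling each one by the
// product of the two blending functions evaluated at (u, v).
Point3D SurfacePoint(double u, double v)
{
    static const std::vector<double> knotsU{0.0, 0.0, 1.0, 1.0};
    static const std::vector<double> knotsV{0.0, 0.0, 1.0, 1.0};
    static const Point3D P[2][2] = {
        {{0.0, 0.0, 0.0}, {0.0, 1.0, 0.0}},
        {{1.0, 0.0, 0.0}, {1.0, 1.0, 1.0}}};
    Point3D s{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j) {
            double b = Basis(knotsU, i, 1, u) * Basis(knotsV, j, 1, v);
            s.x += b * P[i][j].x;
            s.y += b * P[i][j].y;
            s.z += b * P[i][j].z;
        }
    return s;
}
```

<p>With a real surface, the loops run over all the control points in each direction, but the structure is identical.</p>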
<p>If you're wondering why we chose these particular knot vectors, the reason is simple. By having the repeated knot values at the beginning and end of the vectors, the resulting surface interpolates (in other words, it passes through) the four corner control points. In contrast, the surface drawn does <em>not</em> pass through the other control points, although it does approach them.<br /><br />Before getting to the sample code, let's cover one more thing. The basis functions that we've described have an interesting property (actually it's by design). If you expand them for a given degree, <em>n</em>, and a fixed knot vector, you end up with a polynomial equation of the form: A<sub>0</sub> + A<sub>1</sub>u + A<sub>2</sub>u<sup>2</sup> + A<sub>3</sub>u<sup>3</sup> + … + A<sub>n</sub>u<sup>n</sup> where the A<sub>i</sub> are coefficients determined entirely by the knot vector and degree. Polynomials are good for approximating (or, in some cases, representing exactly) other functions. However, there are some three-dimensional surfaces that can't be represented exactly using polynomials as bases; specifically, the conic sections and the surfaces built from them: spheres, cylinders, cones, and so on. To represent these surfaces exactly, you can use a ratio of polynomials. For two polynomial equations, <strong><em>F</em></strong> and <strong><em>G</em></strong>, a rational polynomial <strong><em>R</em></strong> would be defined by:</p>
<div><img src="http://software.intel.com/file/m/3710" border="0" height="41" width="48" /></div>
<p>Using the b-spline functions from <strong>Equation 1</strong>, we can define a "rational" parametric surface by adding to the control points a fourth component (the first three are <em>x</em>, <em>y</em>, and <em>z</em>) that "weights" each control point. We'll call the fourth component <em>w</em>. In this manner, the equation for the surface becomes:</p>
<div><img src="http://software.intel.com/file/m/3711" border="0" height="93" width="232" /></div>
<h3>Equation 4</h3>
<p>In case you were wondering, this is the equation for a rational b-spline surface. If the knot vector used for the basis functions is a non-uniform knot vector, then this is the equation for a non-uniform rational b-spline surface: a NURBS surface! <strong>Equation 4</strong> is the equation for a generalized parametric surface. Other common parametric surfaces are just subsets of these surfaces. Specifically, a non-rational, uniform or non-uniform, b-spline surface is one where the weights, <em>w</em><sub>i,j</sub>, are all equal to 1. This causes the division to accomplish nothing (and hence we don't have to evaluate the denominator at all). Also, you may have heard of a Bézier surface, which is a non-rational b-spline surface with a knot vector that is all zeros followed by all ones. So, for a 3<sup>rd</sup> degree Bézier surface, the knot vector would be <strong><em>U</em></strong> = {0,0,0,0,1,1,1,1}.<br /><br />Rational parametric surfaces offer one more nicety that isn't available for non-rational surfaces. Any projective transformation (translation, rotation, scale, shear, <em>and</em> perspective projection) can be applied to the control points of a rational parametric surface, and the surface points generated in the transformed space will still be correct. This means that if you have a small number of control points, you can transform just the control points and generate a large number of surface points without having to transform all the generated surface points. Using non-rational surfaces, you would at least have to perform the projection transformation on the generated surface points.</p>
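<p>The payoff of the rational form is easiest to demonstrate with a curve rather than a surface: a quadratic rational b-spline with control points (1,0), (1,1), (0,1), weights {1, √2/2, 1}, and knot vector {0,0,0,1,1,1} traces a quarter of the unit circle <em>exactly</em>, something no polynomial curve can do. A sketch (hypothetical names, not the sample code):</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cox-de Boor recursion for Equation 1 (zero-length spans contribute zero).
double Basis(const std::vector<double>& knots, std::size_t i, int p, double u)
{
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0 : 0.0;
    double result = 0.0;
    double d1 = knots[i + p] - knots[i];
    if (d1 > 0.0)
        result += (u - knots[i]) / d1 * Basis(knots, i, p - 1, u);
    double d2 = knots[i + p + 1] - knots[i + 1];
    if (d2 > 0.0)
        result += (knots[i + p + 1] - u) / d2 * Basis(knots, i + 1, p - 1, u);
    return result;
}

// Equation 4 specialized to a curve: weighted control points in the
// numerator, the weights alone in the denominator.
void RationalCurvePoint(double u, double& x, double& y)
{
    static const std::vector<double> knots{0.0, 0.0, 0.0, 1.0, 1.0, 1.0};
    static const double px[3] = {1.0, 1.0, 0.0};
    static const double py[3] = {0.0, 1.0, 1.0};
    static const double w[3]  = {1.0, std::sqrt(2.0) / 2.0, 1.0};
    double nx = 0.0, ny = 0.0, d = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        double b = Basis(knots, i, 2, u);
        nx += b * w[i] * px[i];
        ny += b * w[i] * py[i];
        d  += b * w[i];
    }
    x = nx / d;
    y = ny / d;
}

// Distance-squared from the origin; should be exactly 1 along the arc.
double RadiusSquaredAt(double u)
{
    double x, y;
    RationalCurvePoint(u, x, y);
    return x * x + y * y;
}
```

<p>Every point the code produces satisfies <em>x</em><sup>2</sup> + <em>y</em><sup>2</sup> = 1, which is the sense in which rational curves and surfaces represent conics exactly rather than approximately.</p>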
<p> </p>
<h3>Equation 2</h3>
<p>In our example, <em>n</em> = 3 and <em>p</em> = 1.<br /><br />To verify that this approach works, pick a value for <em>u</em>, say 1.5. Looking at the plots in <strong>Figure 5</strong> we can see that:</p>
<blockquote><em>B</em><sub>0,1</sub>(1.5) = 0<br /><em>B</em><sub>1,1</sub>(1.5) = 0.5<br /><em>B</em><sub>2,1</sub>(1.5) = 0.5<br /><em>B</em><sub>3,1</sub>(1.5) = 0</blockquote>
<p><br /><br />Looking at just the <em>x</em> values of the points, we get:<br /><br /></p>
<blockquote>C<sub>X</sub>(1.5) = <em>B</em><sub>0,1</sub>(1.5) * P<sub>0,X</sub> + <em>B</em><sub>1,1</sub>(1.5) * P<sub>1,X</sub> + <em>B</em><sub>2,1</sub>(1.5) * P<sub>2,X</sub> + <em>B</em><sub>3,1</sub>(1.5) * P<sub>3,X</sub><br />= 0 * 0 + 0.5*1 + 0.5*2 + 0*0<br />= 1.5</blockquote>
<p><br /><br />Looking at the <em>y</em> values of the points, we get :<br /><br /></p>
<blockquote>C<sub>Y</sub>(1.5) = <em>B</em><sub>0,1</sub>(1.5) * P<sub>0,Y</sub> + <em>B</em><sub>1,1</sub>(1.5) * P<sub>1,Y</sub> + <em>B</em><sub>2,1</sub>(1.5) * P<sub>2,Y</sub> + <em>B</em><sub>3,1</sub>(1.5) * P<sub>3,Y</sub><br />= 0 * 0 + 0.5*2 + 0.5*2 + 0*0<br />= 2</blockquote>
<p><br /><br />Therefore, <strong>C</strong>(1.5) = (1.5, 2), which is just what we expect it to be!<br /><br />We've covered a lot of ground here. We know that a parametric function is a set of equations that produces one or more values for a given parameter. In our examples, we produced <em>x</em> and <em>y</em> values and could easily have produced <em>z</em> values to generate points in 2-space or 3-space. I've also shown how several parametric functions can be used to "blend" points in 2-space (again, blending in 3-space is a trivial extension of this process). We also learned what a knot vector is and how knot vectors can be used together with the b-spline basis functions to create some interesting "blending" functions.</p>
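<p>The worked example is easy to script. In the sketch below (hypothetical names), the interior control points P<sub>1</sub> = (1,2) and P<sub>2</sub> = (2,2) come from the text; the endpoints P<sub>0</sub> and P<sub>3</sub> are assumed values, which is harmless here because their basis functions vanish at <em>u</em> = 1.5. A clamped knot vector {0,0,1,2,3,3} is assumed so that all four degree-one basis functions are defined:</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cox-de Boor recursion for Equation 1 (zero-length spans contribute zero).
double Basis(const std::vector<double>& knots, std::size_t i, int p, double u)
{
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0 : 0.0;
    double result = 0.0;
    double d1 = knots[i + p] - knots[i];
    if (d1 > 0.0)
        result += (u - knots[i]) / d1 * Basis(knots, i, p - 1, u);
    double d2 = knots[i + p + 1] - knots[i + 1];
    if (d2 > 0.0)
        result += (knots[i + p + 1] - u) / d2 * Basis(knots, i + 1, p - 1, u);
    return result;
}

// Equation 2: C(u) = sum over i of B_{i,1}(u) * P_i.
void CurvePoint(double u, double& x, double& y)
{
    static const std::vector<double> knots{0.0, 0.0, 1.0, 2.0, 3.0, 3.0};
    // P1 = (1,2) and P2 = (2,2) come from the text; P0 and P3 are assumed.
    static const double px[4] = {0.0, 1.0, 2.0, 3.0};
    static const double py[4] = {0.0, 2.0, 2.0, 0.0};
    x = 0.0;
    y = 0.0;
    for (std::size_t i = 0; i < 4; ++i) {
        double b = Basis(knots, i, 1, u);
        x += b * px[i];
        y += b * py[i];
    }
}

double CurveXAt(double u) { double x, y; CurvePoint(u, x, y); return x; }
double CurveYAt(double u) { double x, y; CurvePoint(u, x, y); return y; }
```

<p>Evaluating at <em>u</em> = 1.5 reproduces the hand calculation: (1.5, 2).</p>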
<p> </p>
<hr /><h3>Implementing a NURBS surface renderer</h3>
<p>At this point, we can take <strong>Equation 4</strong> and write a straightforward implementation of it. That would not be too difficult, but there are some optimizations we can make first so that our implementation performs better; after all, it's real-time performance that we want. First, let's discuss "tessellation". Tessellation is the process of taking the continuous, mathematical equation of a surface and approximating it with polygons (we'll use triangles). This process can be accomplished in a number of ways, with the potential for vastly different visual results.<br /><br />For simplicity, we're going to use what's called uniform tessellation. Uniform tessellation means we step equally between the minimum and maximum values of the parameters over which the surface is valid. For example, assume that the surface is valid for the ranges <em>u</em> ∈ [0,3] and <em>v</em> ∈ [2,3]. We can divide these ranges into some number of subdivisions and then just loop over the resulting values, calculating surface points that will be used as the vertices of triangles. If we decide to use 20 subdivisions, we would calculate <strong><em>S</em></strong>(u,v) at <em>u</em>=0, <em>u</em>=0.15, <em>u</em>=0.30, …, <em>u</em>=3 for each of <em>v</em>=2<em>, v</em>=2.05<em>, v</em>=2.10, <em>v</em>=2.15, …, <em>v</em>=3.<br /><br />So we'd end up generating 441 points (21 times 21, because we include the end points) that we could then connect into triangles and render using a 3D API such as OpenGL* or Direct3D*. To speed up the calculation of <strong><em>S</em></strong>(u,v), we can evaluate <em>B</em><sub>i,p</sub>(u) and <em>B</em><sub>j,q</sub>(v) at the subdivision points once and store the results in arrays. With that done, the inner loop of the surface-point calculation needs only a lookup of the pre-computed values and a multiplication. If at some point we change the number of subdivisions we want, we can simply recalculate the stored arrays of basis function values at the new subdivision points.</p>
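<p>The pre-computation step might look like the following sketch (hypothetical names, not the sample code). Each basis function is evaluated once per subdivision point and stored; the tessellation inner loop then reduces to table lookups and multiplies. A handy correctness check is that clamped b-spline basis functions sum to one at any parameter value inside the valid range (the partition-of-unity property):</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cox-de Boor recursion for Equation 1 (zero-length spans contribute zero).
double Basis(const std::vector<double>& knots, std::size_t i, int p, double u)
{
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0 : 0.0;
    double result = 0.0;
    double d1 = knots[i + p] - knots[i];
    if (d1 > 0.0)
        result += (u - knots[i]) / d1 * Basis(knots, i, p - 1, u);
    double d2 = knots[i + p + 1] - knots[i + 1];
    if (d2 > 0.0)
        result += (knots[i + p + 1] - u) / d2 * Basis(knots, i + 1, p - 1, u);
    return result;
}

// Pre-evaluate table[s][i] = B_{i,degree}(u_s) at uniform samples of u, so
// the surface loop never touches the recursion.
std::vector<std::vector<double>> PrecomputeBasisTable(
    const std::vector<double>& knots, int degree,
    std::size_t numFuncs, int subdivisions)
{
    double minU = knots[degree];
    double maxU = knots[knots.size() - 1 - degree];
    std::vector<std::vector<double>> table(subdivisions + 1,
                                           std::vector<double>(numFuncs));
    for (int s = 0; s <= subdivisions; ++s) {
        double u = minU + (maxU - minU) * s / subdivisions;
        for (std::size_t i = 0; i < numFuncs; ++i)
            table[s][i] = Basis(knots, i, degree, u);
    }
    return table;
}

// Partition-of-unity check over the interior samples (the last sample sits
// on the half-open boundary of the final knot span, so it is skipped).
bool TableRowsSumToOne()
{
    std::vector<double> knots{0.0, 0.0, 1.0, 2.0, 3.0, 3.0};
    auto table = PrecomputeBasisTable(knots, 1, 4, 20);
    for (int s = 0; s < 20; ++s) {
        double sum = 0.0;
        for (double b : table[s]) sum += b;
        if (std::fabs(sum - 1.0) > 1e-12) return false;
    }
    return true;
}
```

<p>Changing the subdivision count just means rebuilding this table, exactly as described above.</p>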
<h3>What About Surface Normals?</h3>
<p>So now that we have a general idea of a way to tessellate a NURBS surface (or any other parametric surface, for that matter), what else do we need? For one, we need a way to generate surface normals so that we can let the 3D API (Direct3D* in the sample code) do lighting calculations for us. How do we generate these? Well, remember those Calculus classes that we all loved? One of the things we learned is that the derivative of a function is the instantaneous slope of the line tangent to the function at the point where the derivative is evaluated. By creating two tangent vectors (one in the <em>u</em> and one in the <em>v</em> parameter) we can take a cross product and wind up with a surface normal. Simple enough, you say, but what's the derivative of the function <strong><em>S</em></strong>(u,v)?<br /><br />Well, there are two partial derivatives: one with respect to <em>u</em> and one with respect to <em>v</em>, and they're ugly! Using the quotient rule:</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18041_image33.gif" border="0" height="158" width="532" /><br /><br /><br /><br /><img src="/sites/default/files/m/d/4/1/d/8/18042_image34.gif" border="0" height="157" width="518" /></div>
<div><br /><br /><h3>Equation 5</h3>
<p>And, not only is that ugly, we don't yet know how to take the derivatives of <em>B</em><sub>i,p</sub><em>(u)</em> and <em>B</em><sub>j,q</sub><em>(v)</em>. It's possible to derive them directly from the recursive definition, but there's an easier way. It's possible to come up with a set of equations for calculating the coefficients of the polynomial to which <em>B</em><sub>i,p</sub><em>(u)</em> is equivalent. Then, taking the derivative of <em>B</em><sub>i,p</sub><em>(u)</em> is as simple as multiplying coefficients by powers and reducing the powers by one (recall that d(Ax<sup>n</sup> + Bx<sup>m</sup>)/dx = nAx<sup>n-1</sup> + mBx<sup>m-1</sup>). You still have to use <strong>Equation 5</strong> to compute the derivatives of <strong><em>S</em></strong>(u,v), but it's really not that bad: you're going to be computing some of the terms anyway, and the ones with the derivatives are calculated the same way as the non-derivative terms. We need to be able to calculate the coefficients of the b-spline basis functions when they're represented as follows:</p>
</div>
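<p>The cross-product step itself is simple and can be illustrated with a stand-in surface whose partial derivatives are easy to write by hand, here S(u,v) = (u, v, uv), rather than the full <strong>Equation 5</strong> machinery (hypothetical names, not the sample code):</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Normal = (dS/du) x (dS/dv), normalized. A near-zero cross product means
// the tangents are parallel or vanishing (a degenerate normal), which the
// caller should detect and handle, e.g. by sampling a neighboring point.
Vec3 SurfaceNormal(const Vec3& du, const Vec3& dv)
{
    Vec3 n{du.y * dv.z - du.z * dv.y,
           du.z * dv.x - du.x * dv.z,
           du.x * dv.y - du.y * dv.x};
    double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len < 1e-8)
        return Vec3{0.0, 0.0, 0.0};  // degenerate; caller must recover
    return Vec3{n.x / len, n.y / len, n.z / len};
}

// Stand-in surface S(u,v) = (u, v, u*v): dS/du = (1,0,v), dS/dv = (0,1,u).
Vec3 SaddleNormal(double u, double v)
{
    return SurfaceNormal(Vec3{1.0, 0.0, v}, Vec3{0.0, 1.0, u});
}
```

<p>At the origin the saddle is flat, so the normal comes out as (0, 0, 1).</p>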
<div><img src="/sites/default/files/m/d/4/1/d/8/18043_image35.gif" border="0" height="26" width="406" /></div>
<p>Using a lot of paper and a bit of head scratching, I derived the following formulas to compute the coefficients, <em>C</em><sub>i,p,k</sub><em>(u)</em>.</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18044_image36.gif" border="0" height="178" width="548" /></div>
<p> </p>
<h3>Equation 6</h3>
<p>This seems complex, but unless the knot vector changes, you never have to re-compute these coefficients after the first time. Also note that <em>C</em><sub>i,p,k</sub> depends only on the knot span that <em>u</em> falls in, not on <em>u</em> itself, so we can evaluate the <em>C</em><sub>i,p,k</sub> once for each knot span and store those values. Now we can write the derivative of <em>B</em><sub>i,p</sub>(u) as:</p>
<div><img src="/sites/default/files/m/d/4/1/d/8/18045_image37.gif" border="0" /></div>
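<p>The power-rule step is mechanical once the coefficients are in hand. A sketch, with coefficients stored lowest power first (hypothetical names, not the sample code):</p>

```cpp
#include <cassert>
#include <vector>

// Given c[k] for f(u) = c0 + c1*u + ... + cn*u^n, produce the coefficients
// of f'(u): multiply each coefficient by its power and shift down one slot.
std::vector<double> DerivativeCoefficients(const std::vector<double>& c)
{
    std::vector<double> d;
    for (std::size_t k = 1; k < c.size(); ++k)
        d.push_back(static_cast<double>(k) * c[k]);
    return d;
}
```

<p>For example, the coefficients {2, 3, 1} (that is, 2 + 3u + u<sup>2</sup>) become {3, 2} (that is, 3 + 2u).</p>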
<p> </p>
<hr /><h3>Sample Code</h3>
<p>At this point we know what we need to know to talk about the sample code you can download and how to implement this fun stuff. First, everything in the sample code is written in C++ and spread across many files, of which two are specific to this article: <strong>DRGNURBSSurface.h</strong> and <strong>DRGNURBSSurface.cpp</strong>. Actually, you'll also dive into <strong>NURBSSample.cpp</strong> if you want to play with the surface control points and knot vectors. <strong>DRGNURBSSurface.h</strong> contains a class definition for a class called <em>CDRGNURBSSurface</em> (for the curious, C is for "class" and DRG is for "Developer Relations Group", which is what the group I'm in at Intel used to be called). The methods of this class of interest to us are <em>Init()</em>, <em>ComputeBasisCoefficients()</em>, <em>ComputeBasisCoefficient()</em>, <em>SetTessellations()</em>, <em>EvaluateBasisFunctions()</em>, <em>TessellateSurface()</em>, and <em>TessellateSurfaceSSE()</em>.<br /><br />The sample requires the Microsoft DirectX 7 SDK to build or run. If your system meets this requirement, <a href="/protected-download/267266/142397" rel="nofollow">download the sample code</a> (ZIP, 122KB).<br /><br />Going through these in order, <em>Init()</em> is called to initialize a newly created <em>CDRGNURBSSurface</em> object. The function takes a pointer to a <em>CDRGWrapper</em> class that is part of the framework we wrote for getting at the Direct3D* API. <em>Init()</em> also takes two surface degrees, <em>u</em> and <em>v</em>, and the number of control points in the <em>u</em> and <em>v</em> directions. It takes an array of <em>Point4D</em> structures that contain the weighted control points (<em>x</em>, <em>y</em>, <em>z</em>, and <em>w</em>) stored in <em>u</em>-major order (meaning the <em>v</em> values are consecutive in the array). It takes two float arrays that contain the <em>u</em> knots and the <em>v</em> knots.
It also takes two optional values that specify the number of tessellations in the <em>u</em> and <em>v</em> directions of the surface. <em>Init()</em> does some calculations to determine how many knots are in the knot vectors and then allocates memory to store some of the information needed to render the surface. Finally, <em>Init()</em> makes a local copy of the incoming data (control points and knots) and then calls <em>ComputeBasisCoefficients()</em>.<br /><br /><em>ComputeBasisCoefficients()</em> calls <em>ComputeBasisCoefficient()</em>, which uses the formulas from <strong>Equation 6</strong> to compute the coefficients of the polynomials formed from the knot vectors and the degrees of the surface. <em>ComputeBasisCoefficient()</em> calls itself recursively, mirroring the definitions in <strong>Equation 6</strong>. The coefficients are stored in arrays to be used by <em>EvaluateBasisFunctions()</em>. Because the <em>C</em><sub>i,p,k</sub><em>(u)</em> depend only on the knot span that <em>u</em> belongs in, <em>ComputeBasisCoefficient()</em> takes this knot span (referred to as an "interval" in the code) as an argument rather than the actual value of <em>u</em>.<br /><br />After <em>Init()</em> has called <em>ComputeBasisCoefficients()</em> to do the one-time calculation of the polynomial coefficients, <em>SetTessellations()</em> is called to set the number of <em>u</em> and <em>v</em> tessellations that will be used for rendering the surface. <em>SetTessellations()</em> can be called at any time after initialization to change the fineness of tessellation of the surface. The sample application calls <em>SetTessellations()</em> whenever the plus key (+) or minus key (-) is pressed to increase or decrease the tessellation of the surface.
<em>SetTessellations()</em> allocates memory that depends on the number of tessellations used for rendering the surface, sets up some triangle indices for rendering the surface, and then calls <em>EvaluateBasisFunctions()</em>.<br /><br /><em>EvaluateBasisFunctions()</em> uses the coefficients computed in <em>ComputeBasisCoefficients()</em> and a technique called "Horner's method" to evaluate the polynomials that are the expanded form of the basis functions. Horner's method says that f = a<sub>n</sub>x<sup>n</sup>+a<sub>n-1</sub>x<sup>n-1</sup>+…+a<sub>1</sub>x + a<sub>0</sub> can be evaluated using <em>n</em> multiplications and <em>n</em> additions by rewriting it as f = a<sub>0</sub>+x*(a<sub>1</sub>+x*(a<sub>2</sub>+…x*(a<sub>n-1</sub>+x*a<sub>n</sub>)…)). If you expect to call <em>EvaluateBasisFunctions()</em> often because your tessellations will be changing, other optimizations could be made here (e.g. using a technique called "forward differences" to eliminate the multiplications in the inner loop). Additionally, this method could be optimized using the Streaming SIMD Extensions of the Intel® Pentium® III processor.<br /><br />At this point, everything is initialized for tessellating a NURBS surface. Now, at each frame that the sample application renders, the <em>Render()</em> method of the CDRGNURBSSurface object is called, which in turn calls <em>TessellateSurface()</em> or <em>TessellateSurfaceSSE()</em> depending on whether or not we've told the object to use the Streaming SIMD Extensions of an Intel® Pentium® III processor.<br /><br /><em>TessellateSurface()</em> (or <em>TessellateSurfaceSSE()</em>) uses <strong>Equation 4</strong> and <strong>Equation 5</strong> to compute the surface points and derivatives at the tessellation steps. A cross-product of the derivatives is used to compute the normal to the surface. We don't check for degenerate normals (see the pitfalls section below), so you'll need to modify these routines if degenerate normals become an issue. During tessellation, a row of triangle vertices is generated at a time. We alternate between putting the vertices in the odd or the even indices of the vertex buffer. Starting with the second row of generated vertices, we call Direct3D* to render a triangle strip using the strip indices generated in <em>SetTessellations()</em>. We alternate between the sets of indices as well, due to the winding order of the triangle strip.</p>
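<p>Horner's method itself is only a few lines. The sketch below (hypothetical names, not the sample code) evaluates the same polynomial both ways so the saving in multiplications is easy to see:</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive evaluation: an extra multiply per term to maintain the power of x.
double EvalNaive(const std::vector<double>& a, double x)
{
    double sum = 0.0, power = 1.0;
    for (double coeff : a) {  // a[k] is the coefficient of x^k
        sum += coeff * power;
        power *= x;
    }
    return sum;
}

// Horner's method: the same polynomial with n multiplies and n adds,
// working from the highest coefficient inward.
double EvalHorner(const std::vector<double>& a, double x)
{
    double result = 0.0;
    for (std::size_t k = a.size(); k-- > 0; )
        result = result * x + a[k];
    return result;
}
```

<p>For 1 + 2x + 3x<sup>2</sup> at x = 2, both forms give 17.</p>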
<hr /><h3>Real-Time Optimizations</h3>
<p>We already talked about some optimizations that can be done to evaluate NURBS surfaces more quickly. The first, which is used by the sample code, is to use uniform tessellation and pre-evaluate the basis functions and their derivatives at the tessellation points. We also mentioned the possibility of transforming surface control points into projected space and doing our surface tessellation in that space. While this works, lighting can be difficult (or impossible) if you use anything other than directional lights because distance is not preserved in perspective projected space. If you're using light maps in your engine I would highly recommend transforming control points and generating vertices in projected space. You can modify <em>TessellateSurface()</em> to do the divide by homogeneous <em>w</em> and viewport scaling to generate vertices in screen space.<br /><br />To keep memory requirements minimal, we render the surface by generating two rows of surface points and then passing a triangle strip to the API (Direct3D* in our case). If a surface didn't need to be re-tessellated at every frame, then we could generate all the surface points and store these in an array. Depending on the application, it may still be quicker to tessellate the surface at every frame rather than having to fetch the generated vertices from memory (with corresponding cache misses). You'll need to experiment with your particular application to see what works best.<br /><br />Aside from the algorithmic optimizations just discussed, we can achieve better performance by using the new Streaming SIMD Extensions supported by the Intel® Pentium® III processor. 
These extensions allow us to perform mathematical operations on four floating point values at one time (for more information on the Streaming SIMD Extensions of the Intel® Pentium® III processor, visit <a target="_blank" href="http://developer.intel.com/design/archives/processors/pentiumiii/index.htm">http://developer.intel.com/design/archives/processors/pentiumiii/index.htm</a>). Since for NURBS surfaces we're dealing with four coordinates (<em>x</em>, <em>y</em>, <em>z</em>, and <em>w</em>), we can apply the same operations to all four at once. <em>TessellateSurfaceSSE()</em> uses intrinsic functions provided by the Intel C/C++ Compiler version 4.0 to evaluate all four coordinates of a NURBS surface point at once.<br /><br />Other optimizations are possible depending on the quality vs. speed tradeoffs acceptable to a particular application. For example, one could choose to generate normals at only every other surface point (or less frequently) and then linearly interpolate the normals in between.</p>
<hr /><h3>More notes on the sample code</h3>
<p>I should mention a few last things about the sample code contained in the download. The sample requires the Microsoft DirectX 7 SDK to build or run and was written using C++ and built using Microsoft Visual C++ 6.0. If your system meets these requirements, and if you have the Intel C/C++ compiler version 4.0 included with version 4 of the Intel VTune product, <a href="/protected-download/267266/142397" rel="nofollow">download the sample code</a> (ZIP, 122KB).<br /><br />If you don't have the Intel C/C++ compiler version 4.0 included with version 4 of the Intel VTune product, you'll need to change a line in DRGNURBSSurface.h. The line reads "#define SUPPORT_PENTIUM_III 1" and should be changed to "#define SUPPORT_PENTIUM_III 0". You can then rebuild everything using the Microsoft compiler (or another C++ compiler) and see the code working. You won't be able to enable the tessellation routine that uses the Streaming SIMD Extensions of the Intel® Pentium® III processor, though.<br /><br />While running the application, pressing 'H' will bring up a help screen of available keys. Most are self-explanatory. One worth mentioning is the 'M' key, which causes the display to switch between two different "Objects". The two objects are:</p>
<ul><li>A single NURBS surface with 100 control points</li>
<li>Nine NURBS surfaces with 16 control points each</li>
</ul><p> </p>
<p>You'll notice when viewing the nine surfaces that there are hard creases between them; this doesn't happen with the single surface. When changing the tessellation level of the single NURBS surface, nine times as many points are actually generated as the number indicates. This is done to keep a somewhat consistent look between the shapes of the two different "Objects".</p>
<h3>Additional Details and Potential Pitfalls</h3>
<p>I've discussed the math behind parametric surfaces and the basics of rendering them and hopefully made them seem appealing as an alternative to polygonal models. What I haven't addressed are some of the problems that are unique to parametric surfaces and some of the trickier aspects of using parametric surfaces in place of polygonal models.<br /><br />Some of the more common issues with parametric surfaces are:</p>
<ul><li><strong>Texture Mapping</strong> – A simple approach to texture mapping a parametric surface is to use the <em>u</em> and <em>v</em> parameter values as texture coordinates (scaled appropriately to the 0 to 1 range). This works fine in some cases (and is what the sample code does), but there may be cases that this won't work for (if the knot vector is very non-uniform, then the texture will be stretched and squashed). To fix this problem, a second parametric surface can be used to generate texture coordinates. This increases overhead substantially, but may be the only solution (and it provides the most flexibility). Many rendering packages allow artists to apply textures to a parametric surface by using a second surface to map the texture coordinates. Keep this in mind as you use parametric surfaces for your applications.</li>
<li><strong>Cracking</strong> – When two parametric surfaces meet at an edge (or one parametric surface meets a polygonal surface), it's possible for a crack to appear between the surfaces if their degrees of tessellation differ (or if they're just different sizes). This problem can be solved on a per-application basis by adding connectivity information to the surfaces. It's not trivial to fix, but it's not impossible.</li>
<li><strong>Collision Detection</strong> – If you're doing collision detection in your application, you have several choices with parametric surfaces:
<ul><li>Do collision detection on the mesh of control points by treating it as a polygonal mesh – this is approximate and may be too coarse in some instances.</li>
<li>Store all the generated triangles and do collision detection on these – while more accurate, it's more memory intensive as well as computationally intensive.</li>
<li>Depending on what types of objects may be colliding, you can solve the parametric surface equations together with equations representing the other objects (even lines are difficult, though) and then just plug-and-chug to find collision points.</li>
<li>Use a combination of the first two approaches: start with the control-point mesh, then refine the surface to triangles to determine an exact hit.</li>
</ul></li>
<li><strong>Clipping</strong> – For surfaces that are partially within the viewing frustum, it can be difficult to clip prior to generating triangles. The problem is that you can't just clip control points, because doing so would make the tessellation of the surface difficult or impossible. The easiest solution is to generate the triangles and then clip them – the downside is the possibility of generating many more triangles than needed.</li>
<li><strong>Back-surface Culling</strong> – Aside from clipping, it is also difficult to easily cull back-facing surfaces or portions of surfaces for similar reasons to the clipping problem. For example, a sphere can be defined with one surface but only half of the sphere is ever visible at one time. It would be nice to be able to cull the back-facing portion of the sphere before tessellation, but this is difficult to do.</li>
<li><strong>Tessellation</strong> – Although a uniform tessellation algorithm is easy to implement and can run fast, in some instances other algorithms may provide better performance/quality. Surfaces that have very curvy areas as well as very flat areas may be better tessellated with a non-uniform tessellation algorithm.</li>
<li><strong>Non-local refinement not supported</strong> – When refining a surface (i.e. adding detail), you must add control points in complete rows and columns so the control mesh remains a regular grid of points. This causes excessive control points to be added just to add detail in a small, localized region of a surface. Note that this is not an implementation issue, but rather an issue with NURBS surfaces (and other parametric surfaces).</li>
<li><strong>Degenerate Normals</strong> – Because it's possible to have control points that are at the same location, it's possible for the derivatives of the surface to vanish (i.e. go to zero). This causes the calculation of surface normals to fail. To solve this, it is necessary to look at surrounding points and derivatives if one of the tangents gets too close to zero.</li>
</ul><p> </p>
<hr /><h3>Conclusion</h3>
<p><a target="_blank" href="/sites/default/files/m/d/4/1/d/8/40951_real_time_nurbs.pdf" rel="nofollow"><img src="/sites/default/files/m/d/4/1/d/8/18046_print_button_3d.gif" alt="Printable PDF" border="0" height="40" width="115" /></a><br /><br />We've covered a lot of information in this article. We've been introduced to parametric curves and surfaces and should now have a decent understanding of the concepts behind them. We've learned what's involved in rendering parametric surfaces, and we can see how their data requirements are smaller than those of the polygonal meshes they generate. And we should now have an idea how to implement some of the creative types of 3D content we talked about in the introduction.<br /><br />Given that the field of study of parametric surfaces is enormous, we've only lightly touched the surface (no pun intended) of what's possible. Experimenting with parametric surfaces is exciting. I encourage you to check out the sample code and get a feel for how you can incorporate NURBS surface rendering into your 3D engine today.</p>
<h3>References and Further Reading</h3>
<p>Piegl, Les and Tiller, Wayne. <em>The NURBS Book, 2nd Edition</em>. Berlin, Germany: Springer-Verlag, 1996.<br />Foley, J., van Dam, A., Feiner, S., and Hughes, J. <em>Computer Graphics: Principles and Practice</em>. Reading, MA: Addison-Wesley, 1990.</p>
<h3>About the Author</h3>
<p>Dean Macri is a Senior Technical Marketing Engineer with Intel's Developer Relations Division. He is currently researching real-time physics with an emphasis on cloth simulation. He welcomes e-mail regarding NURBS and other parametric surfaces, or anything mathematical and related to real-time 3D graphics. He can be reached at <a href="mailto:dean.p.macri@intel.com" rel="nofollow">dean.p.macri@intel.com</a>.</p>
<hr />Fri, 09 Sep 2011 14:28:51 -0700. Dean Macri (Intel).