A Performance Optimization Study for the DreamWorks Animation Fluid Solver

by John J. O'Neill, Charles Congdon, Ram Ramanujam
Intel Corporation

Silviu Borac, Ron Henderson
DreamWorks Animation



We present a performance optimization study for an incompressible fluid simulation code designed for animated special effects. The solver uses a marker-and-cell (MAC) method to simulate the motion of liquids and gases. The main challenges in the optimization were identifying efficient data structures for managing the particle data, and rewriting the iterative pressure projection step to introduce threading without compromising the convergence rate. On average we improved the performance of the serial implementation by over 300%, and by threading portions of the application we achieved a parallel efficiency of 65% on an 8-core desktop.
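For reference, under the usual definition of parallel efficiency (an assumption here, since the abstract does not define it), the efficiency E on p cores is the speedup S divided by p, where T_1 and T_p are the run times on one core and on p cores. The 8-core figure therefore corresponds to a speedup of roughly 5.2 over a single core:

$$
E = \frac{S}{p} = \frac{T_1}{p\,T_p}, \qquad S = E\,p = 0.65 \times 8 = 5.2
$$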


Fluid simulation is a widely used technique in special effects to create smoke, fire, splashes and similar visual elements. Figure 1 shows a typical example from DreamWorks* Animation's Megamind where a liquid surface is animated using a fluid simulation. While there are a variety of methods used to simulate fluid motion, most of them involve algorithms that are computationally intensive and run sequentially over a number of frames to produce an animated result. For this reason the performance of a given algorithm, in addition to its visual characteristics, is a critical factor in how useful the algorithm may be as an artistic tool.

In this report we discuss a performance optimization study for a fluid simulation package developed at DreamWorks* Animation and based on techniques now commonly used in computer graphics [1, 2, 3]. Our report includes a description of the algorithm, an analysis of the serial performance, and a discussion of modifications required to achieve good scalability for threading.

Figure 1. In DreamWorks* Animation's Megamind we see the character Roxanne Ritchie suspended over a tank of alligators. The dynamic fluid surface was animated using the solver discussed in this report.


Download the PDF

To read the rest of this white paper, download A Performance Optimization Study for the DreamWorks Animation Fluid Solver [PDF 838KB].






Jerry Baugh (Intel)

Excellent article! Must be fascinating to work with DreamWorks Animation on this.

Charles Congdon (Intel)

@Steve - Sorry about the confusion. You are right that 'red,' 'green,' and 'blue' are confusing, and that they are simply a way of annotating subsets of the overall data set, not references to actual colors. Choosing something like 'alpha,' 'beta,' and 'gamma,' as you suggest, might have been a better choice (although it would have made for a more challenging illustration).

As for the number of planes, there is a good reason to choose as few as possible, and three in particular. All threads work on one set of slices at a time, say 'red,' before moving on to the next set (Algorithm 3). Increasing the number of sets means there is less work in each set, and therefore less work for each thread (which of course drops even further as the number of threads increases). At some point, for a given problem size, it becomes all overhead.

The number three comes from the fact that a thread working on a cell in a 'red' slice is *updating* the residual values in the adjacent cells, including those in the neighboring slices (see Figure 5 and Algorithm 2). So, to keep updates by one thread from causing a race condition with a thread working on an adjoining slice, threads must work on slices at least three apart. The only reason to space them further apart (and thus use more sets of slices) would be if work on a given cell affects, or is affected by, more than its immediate neighbors.

I hope this makes sense.
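The three-set scheme described above can be sketched in C++ as follows. This is a minimal illustration, not the paper's Algorithms 2 and 3: it assumes a 7-point Poisson stencil with zero Dirichlet boundaries and a Gauss-Seidel relaxation that updates residuals in place, and the names `Grid`, `relaxSlice`, `coloredSweep`, and `residualNorm` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical grid: pressure p and residual r = b - A p on an nx*ny*nz grid,
// split into x-slices. A is assumed to be the 7-point Laplacian (6 on the
// diagonal, -1 to each neighbor) with zero values outside the grid.
struct Grid {
    int nx, ny, nz;
    std::vector<double> p, r;
    Grid(int x, int y, int z)
        : nx(x), ny(y), nz(z),
          p(std::size_t(x) * y * z, 0.0), r(std::size_t(x) * y * z, 0.0) {}
    std::size_t idx(int i, int j, int k) const {
        return (std::size_t(i) * ny + j) * nz + k;
    }
};

// Gauss-Seidel relaxation of every cell in slice i. Each update removes the
// cell's own residual and *updates* the residuals of its six neighbors,
// including cells in slices i-1 and i+1.
void relaxSlice(Grid& g, int i) {
    const double diag = 6.0;
    for (int j = 0; j < g.ny; ++j)
        for (int k = 0; k < g.nz; ++k) {
            const std::size_t c = g.idx(i, j, k);
            const double delta = g.r[c] / diag;
            g.p[c] += delta;
            g.r[c] -= diag * delta;
            // Off-diagonal entries are -1, so each neighbor's residual grows by delta.
            if (i > 0) g.r[g.idx(i - 1, j, k)] += delta;
            if (i + 1 < g.nx) g.r[g.idx(i + 1, j, k)] += delta;
            if (j > 0) g.r[g.idx(i, j - 1, k)] += delta;
            if (j + 1 < g.ny) g.r[g.idx(i, j + 1, k)] += delta;
            if (k > 0) g.r[g.idx(i, j, k - 1)] += delta;
            if (k + 1 < g.nz) g.r[g.idx(i, j, k + 1)] += delta;
        }
}

// One full sweep: colors 0, 1, 2 run one after another; within a color, the
// slices i with i % 3 == color are shared out among threads. Slices of one
// color are 3 apart and slice i writes only slices i-1..i+1, so no two
// threads ever touch the same residual entry.
void coloredSweep(Grid& g, int nthreads) {
    assert(nthreads >= 1);
    for (int color = 0; color < 3; ++color) {
        std::vector<int> slices;
        for (int i = color; i < g.nx; i += 3) slices.push_back(i);
        std::vector<std::thread> pool;
        for (int t = 0; t < nthreads; ++t)
            pool.emplace_back([&g, &slices, t, nthreads] {
                for (std::size_t s = static_cast<std::size_t>(t); s < slices.size();
                     s += static_cast<std::size_t>(nthreads))
                    relaxSlice(g, slices[s]);
            });
        for (std::thread& th : pool) th.join();  // barrier before the next color
    }
}

// Euclidean norm of the residual, used to check convergence.
double residualNorm(const Grid& g) {
    double sum = 0.0;
    for (double v : g.r) sum += v * v;
    return std::sqrt(sum);
}
```

Because every residual entry is written by exactly one slice in each color phase, and always in the same order, `coloredSweep(g, 1)` and `coloredSweep(g, 4)` should produce bitwise-identical results from the same starting grid. The price of the scheme is the join between colors, which is why adding more sets of slices only adds overhead.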

anonymous

Excellent paper, and excellent analysis techniques. I need to go back to my physics texts and study the fluid dynamics a bit to fully understand it, but all in all, it's very interesting.

I have one question, though. In the multi-threading section at the end, you partition the particle array into two-dimensional planes, and then refer to them as the 'red', 'green', and 'blue' planes. This is somewhat confusing to me, since we are dealing with a highly visual end product that ultimately consists of RGBA pixels.

I gather that you don't mean to imply that you are somehow altering the color of any of the pixels (the computation is purely positional, not chromatic), but rather that you are simply annotating the three computational planes by assigning each one a color name. If so, to avoid confusion, it might have been more appropriate to annotate them with something more generic, such as the 'alpha', 'beta' and 'gamma' planes.

Also, I assume that there is no inherent reason why you chose three planes. Could you not choose eight or sixteen, or any finite number less than the number of physical cores available for processing?
