by Geoff Koch
Intel® Threading Tools help Autodesk optimize its Maya* digital content creation software. Find out more in this case study.
Times are good for Toronto-based Autodesk. Maya* software, the company’s flagship product, dominates the digital content creation (DCC) market for 3D modeling, animation, effects and rendering. Maya-wielding artists create stunning images across industries, from digital publishing and design visualization to broadcasting and game development.
And don't forget feature films. Despite Maya's understated Canadian roots, Autodesk tools have helped create visual effects for some of the most ostentatious Hollywood blockbusters in recent memory, including the "Lord of the Rings" trilogy and the "Star Wars*" prequels.
But for Autodesk and other DCC companies, the digital media bar only gets higher. Photorealistic images and the suspension of disbelief are becoming hard and fast expectations for filmgoers and gamers. Building and rendering these ever more detailed images remains one of the ultimate cycle-suckers in all of computing.
"Leading studios rely on Maya software to deliver state-of-the-art results every day, and productivity is a huge factor for them," said Autodesk engineer Martin Watt. "Whether it's something that improves the interactive drawing rate or something that reduces the time taken to render thousands of frames, anything we can possibly provide is key."
[Image courtesy Duncan Brinsmead, Autodesk: a Maya-created atmospherics example.]
The Autodesk-Intel engineering team worked to thread Maya Fluid Effects. This compute-intensive tool includes a true 3D solver and a highly interactive 2D solver that simulates 2D fluid motion in near real time. Atmospherics, including smoke such as that from a mushroom cloud, are one of the fluid types that can be simulated.
The problem is that CPU clock frequencies aren't increasing like they used to. As chip feature sizes have shrunk and transistor counts have soared, heat dissipation has become devilishly difficult to manage, and pushing clock speeds higher only generates more heat.
To compensate, microprocessor vendors are increasing chip-level parallelism, mostly by implementing multi-core architectures. Instead of scaling up and offering ever faster (and hotter) CPUs, these companies are scaling out and dividing computing tasks among multiple processing cores.
The good news is that this aggressive move to multi-core architectures will keep delivering what most have come to expect from the chip industry: increasingly powerful hardware platforms, well into the future. The challenge is that taking advantage of this new power will require software companies to increase the concurrency in their code.
Aware of these changes in the chip industry, Autodesk decided in spring 2005 to take a stab at parallelizing, or threading, Maya Fluid Effects*. The target was the core of Maya Fluid Effects: a fluids solver that cranks numbers through the Navier-Stokes equations, well known to the physics crowd for describing fluid flow.
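For reference, the incompressible form of the Navier-Stokes equations that fluid solvers commonly discretize is shown below, where u is the velocity field, p the pressure, ρ the density, ν the kinematic viscosity and f external forces such as buoyancy. The article does not say which formulation or discretization Maya's solver uses, so treat this as general background rather than a description of Maya's implementation.

```latex
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u} + \mathbf{f},
\qquad
\nabla \cdot \mathbf{u} = 0
```

The first equation balances momentum; the second states that the fluid neither compresses nor expands.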
Autodesk collaborated with Intel engineers on the threading project as part of an Intel effort to help customers increase the parallelism in their software. Yet despite the onsite presence of Intel threading pros, Watt was wary.
"Debugging threaded applications is a notoriously difficult problem," Watt said. "In the past, we have spent a large amount of time making parts of our code threadsafe."
The fluids solver, which performs a set of uniform transformations over a regular array of data, turned out to be especially amenable to threading. Because Maya software has long run on multi-processor workstations, the algorithm was already structured so that multiple processors could each work in parallel on a separate section of the array without interfering with one another.
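A minimal sketch of that kind of partitioning, in C: the hypothetical helpers chunk_bounds and scale_chunk (not taken from Maya, whose code the article does not show) divide a regular array into contiguous, non-overlapping ranges and apply one uniform operation to each range. A simple scaling stands in for the solver's real transformations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: split n array elements into nthreads contiguous,
 * non-overlapping chunks and return chunk t as [begin, end).
 * Chunk sizes differ by at most one element. */
void chunk_bounds(size_t n, size_t nthreads, size_t t,
                  size_t *begin, size_t *end)
{
    assert(nthreads > 0 && t < nthreads);
    size_t base = n / nthreads;   /* minimum elements per chunk */
    size_t rem  = n % nthreads;   /* the first rem chunks get one extra */
    *begin = t * base + (t < rem ? t : rem);
    *end   = *begin + base + (t < rem ? 1 : 0);
}

/* Apply the same transformation (here, scaling) to one chunk only.
 * Because chunks never overlap, separate threads could each run this on
 * their own [begin, end) range without writing to shared elements. */
void scale_chunk(float *data, size_t begin, size_t end, float factor)
{
    for (size_t i = begin; i < end; ++i)
        data[i] *= factor;
}
```

A threaded driver would hand each worker t its own range from chunk_bounds. Because the ranges never overlap and together cover the whole array exactly once, the workers can run in any order, or at the same time, without touching each other's elements.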
Much of the success in threading Maya software came from the use of Intel® Threading Tools. For example, the team used Intel® Thread Checker to quickly track down several subtle threading problems. With a project deadline looming, the alternative, manually debugging the large and complex Maya application, was unappealing and perhaps impossible.
Intel® Thread Profiler helped analyze bottlenecks and solve a particularly embarrassing problem that cropped up early in the project. One part of the threaded fluids solver was actually running slower than the original unthreaded code.
"Intel Thread Profiler quickly pinpointed the problem areas and showed us the reasons for the slowdown, so we were able to restructure the code for better threading performance," Watt said.
The team pursued additional parallelism by inserting several OpenMP* directives into the fluids solver's code. To compilers without OpenMP support, these directives are effectively invisible (in C and C++ they are #pragma lines, which such compilers ignore), so the code still builds and runs serially. OpenMP-aware compilers, such as the Intel® C++ Compilers, recognize the directives and generate multithreaded code.
"OpenMP is useful for prototyping work when threading algorithms, since it is so easy to apply to existing code," Watt said. "It is also easy to enable and disable, thus providing a fast way to compare the results with or without threading enabled."
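As a hedged illustration of the directive approach, the C sketch below parallelizes a simple grid update with a single OpenMP directive. The function names and the Jacobi-style relaxation kernel are hypothetical stand-ins chosen for brevity; the article does not show Maya's solver code or say which loops received directives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not Maya code: one Jacobi-style relaxation sweep
 * on a w x h grid stored row-major. Each interior cell of out becomes
 * the average of its four neighbours in in; boundary cells are copied.
 * Reading from in and writing to out keeps loop iterations independent. */
void relax_step(const float *in, float *out, int w, int h)
{
    assert(w >= 1 && h >= 1 && in != out);
    /* With OpenMP enabled (e.g. -fopenmp in GCC or Clang), the rows are
     * divided among threads. Without it, the compiler ignores this pragma
     * and the loop runs serially, producing identical results. */
    #pragma omp parallel for
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[i] = in[i];
            } else {
                out[i] = 0.25f * (in[i - 1] + in[i + 1] + in[i - w] + in[i + w]);
            }
        }
    }
}

/* _OPENMP is predefined only when the compiler builds with OpenMP. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Since the directive is the only threading-specific line, flipping the compiler's OpenMP switch is enough to compare threaded and unthreaded builds of the same source.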
[Image courtesy Martin Watt, Autodesk.]
Martin Watt, an Autodesk engineer, was part of the Autodesk-Intel engineering team. "For the threading work in Maya 7, we were able to make use of Intel Thread Checker and Intel Thread Profiler to analyze our code for thread-safety and measure the resulting performance," Watt said.
The results of the team's prototyping work were striking. According to Watt, the threaded fluids solver showed a speed-up of more than 85 percent over the unthreaded code on a dual Intel® Xeon® processor-based workstation, and more than 115 percent on a dual Intel Xeon processor-based workstation with Hyper-Threading Technology.
The threading investment will pay similar dividends on multi-core platforms, which are expected to proliferate in the years ahead. Intel, for example, forecasts that a significant share of its server processor shipments, and of its mobile and desktop Pentium® processor family shipments, will be multi-core-based by the end of 2006.
The Autodesk-Intel engineering team also used the Intel® VTune™ Performance Analyzer to determine where the fluids solver was spending the most CPU clock ticks. The team then studied that code to see how much of it might be amenable to threading. This data, in turn, allowed the team to apply Amdahl's Law to estimate the possible performance gains from threading.
Amdahl's Law quantifies a commonsense coding truth: the smaller the share of a program's run time locked into serial, sequential segments, the greater the opportunity to boost performance through parallelism. The serial portion also caps the achievable speed-up, no matter how many processors are added.
"Why do I want to apply Amdahl's Law? To judge how much more performance gain might be lying on the table," said Robert Reed, one of the Intel engineers on the project.
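Amdahl's Law is usually written S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the original run time that can be parallelized (the kind of figure a profiler hotspot analysis helps estimate) and N is the number of processors. The C sketch below evaluates it, along with the rearranged form that infers p from a measured speed-up. The function names are illustrative, and the model deliberately ignores threading overhead, memory bandwidth and load imbalance.

```c
#include <assert.h>

/* Amdahl's Law: predicted speed-up on n processors when a fraction p
 * (0 <= p <= 1) of the serial run time can be parallelized. */
double amdahl_speedup(double p, double n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Limit as n grows without bound: the serial part alone caps the gain. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

/* Rearranged law: the parallel fraction implied by a measured speed-up s
 * on n > 1 processors, from 1/s = (1 - p) + p/n. */
double amdahl_parallel_fraction(double s, double n)
{
    assert(s > 0.0 && n > 1.0);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
}
```

Purely as an illustration: reading an 85 percent speed-up on two processors as s = 1.85, n = 2 and assuming the simple model holds, amdahl_parallel_fraction returns roughly 0.92, meaning about 8 percent of the run time would still behave as serial work in this model.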
The threaded and performance-enhanced code was finished in time for inclusion in the Maya 7 release. Though just one anecdote of a quick and straightforward threading success, the Intel-Autodesk example bodes well for end users who are about to be deluged by multi-core platforms – from handsets and laptops to workstations and servers.
To reap continued performance gains on these platforms, users will need access to an expanding portfolio of threaded programs. One such Maya software user is Computer Graphics Supervisor Rob van den Bragt at The Mill, a London-based Oscar*-winning effects company.
"Due to the complexity of fluid simulations, we need all the power we can get," said van den Bragt. "The additional performance improvements made by threading the fluids solver in Maya 7 are therefore welcomed with open arms."
The Intel collaboration made a lasting mark on Watt and his Autodesk engineering colleagues, who are tasked with making sure performance-hungry customers like van den Bragt stay happy.
"We can see that the future clearly belongs to machines with an increasing number of cores, and threading will be required to take advantage of all that power," said Watt. "We want to optimize our code to ensure that we are able to take full advantage of as many threads as the hardware can support."