Graduate Intern at Intel - Parallel Ray-Tracing

Ray-tracing is a classic example of an embarrassingly parallel algorithm: since each pixel is typically independent of the rest, in theory every pixel could be computed in parallel (given enough cores). In practice, though, there are far fewer cores than pixels, so the pixels are divided among the available cores. And because rendering time can differ greatly from one pixel to the next, that work has to be shared out carefully to keep every core busy.
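The sketch below shows what "sharing out the work" can look like. It is a minimal example using plain C++11 threads, not code from the original project. Each thread repeatedly claims the next unrendered row from a shared atomic counter, so a thread that draws cheap rows simply claims more of them. `shadePixel` is a hypothetical stand-in whose cost deliberately varies from pixel to pixel, and `renderDynamic` is likewise an illustrative name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for tracing one pixel. Its cost varies with (x, y),
// mimicking pixels that hit simple vs. expensive (glossy, translucent) surfaces.
std::uint32_t shadePixel(int x, int y) {
    int iterations = ((x * 7 + y * 13) % 50) * 100;  // 0..4900 steps
    std::uint32_t h = static_cast<std::uint32_t>(x) * 2654435761u +
                      static_cast<std::uint32_t>(y);
    for (int i = 0; i < iterations; ++i) {
        h = h * 1664525u + 1013904223u;  // cheap deterministic busywork
    }
    return h;
}

// Render width x height pixels on numThreads threads. Rows are not assigned
// up front; each thread claims the next unrendered row from a shared atomic
// counter, so the load balances itself even when rows differ in cost.
std::vector<std::uint32_t> renderDynamic(int width, int height, int numThreads) {
    assert(width > 0 && height > 0 && numThreads > 0);
    std::vector<std::uint32_t> image(static_cast<std::size_t>(width) * height);
    std::atomic<int> nextRow{0};

    auto worker = [&]() {
        // fetch_add returns the old value, so every row index is claimed exactly once.
        for (int y = nextRow.fetch_add(1); y < height; y = nextRow.fetch_add(1)) {
            for (int x = 0; x < width; ++x) {
                // Pixels are independent and each row has a single owner,
                // so these writes need no locking.
                image[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
            }
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) threads.emplace_back(worker);
    for (std::thread& th : threads) th.join();
    return image;
}
```

Handing out one row at a time keeps traffic on the shared counter low while still balancing uneven rows. A smaller grain (single pixels) would balance slightly better but touch the counter far more often.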

This project started as a three-week assignment for a graphics class at BYU, but the version I made for that class lacked many features. I later added more (depth-of-field, a median-split bounding volume hierarchy, area lights from shapes, etc.) with the intent of using the project for my master’s thesis. I wanted to keep extending it into a full-fledged path tracer and parallelize it across multiple CPUs, GPUs, and a distributed network. However, I changed my thesis topic, so I decided to use this project for my internship at Intel instead (no sense in wasting all that work).
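The median-split volume hierarchy mentioned above is a kind of bounding volume hierarchy (BVH). The following is a rough sketch of the general technique, not the project's code. It assumes primitives are represented only by their axis-aligned bounding boxes, and `AABB`, `BVHNode`, and `buildBVH` are hypothetical names. At each node, the primitives are split at the median centroid along the node's longest axis, and the two halves are built recursively.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <memory>
#include <vector>

// Axis-aligned bounding box; starts "empty" so grow() can expand it.
struct AABB {
    std::array<float, 3> lo{1e30f, 1e30f, 1e30f};
    std::array<float, 3> hi{-1e30f, -1e30f, -1e30f};
    void grow(const AABB& b) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], b.lo[a]);
            hi[a] = std::max(hi[a], b.hi[a]);
        }
    }
    float centroid(int axis) const { return 0.5f * (lo[axis] + hi[axis]); }
};

struct BVHNode {
    AABB bounds;
    std::unique_ptr<BVHNode> left, right;
    std::vector<int> prims;  // primitive indices; non-empty only in leaves
};

// Build a BVH over order[begin, end), where order holds indices into boxes.
std::unique_ptr<BVHNode> buildBVH(const std::vector<AABB>& boxes,
                                  std::vector<int>& order, int begin, int end,
                                  int maxLeafSize = 2) {
    assert(begin < end && maxLeafSize >= 1);
    auto node = std::make_unique<BVHNode>();
    for (int i = begin; i < end; ++i) node->bounds.grow(boxes[order[i]]);

    if (end - begin <= maxLeafSize) {  // few enough primitives: make a leaf
        node->prims.assign(order.begin() + begin, order.begin() + end);
        return node;
    }

    // Choose the axis along which this node's box is longest.
    int axis = 0;
    for (int a = 1; a < 3; ++a) {
        if (node->bounds.hi[a] - node->bounds.lo[a] >
            node->bounds.hi[axis] - node->bounds.lo[axis]) axis = a;
    }

    // nth_element puts the median primitive at mid, with smaller centroids
    // before it, in linear time rather than a full sort.
    int mid = begin + (end - begin) / 2;
    std::nth_element(order.begin() + begin, order.begin() + mid, order.begin() + end,
                     [&](int a, int b) {
                         return boxes[a].centroid(axis) < boxes[b].centroid(axis);
                     });

    node->left = buildBVH(boxes, order, begin, mid, maxLeafSize);
    node->right = buildBVH(boxes, order, mid, end, maxLeafSize);
    return node;
}
```

Splitting at the median always yields a balanced tree about log2(n) levels deep. However, it ignores how much space each child box covers, which is why surface-area-heuristic splits usually trace faster at the cost of a slower build.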

The project was extremely easy to parallelize on both Windows and Linux since, as I mentioned, the computation of each pixel is completely independent of the others. I started by parallelizing the outermost for-loop (one version used OpenMP and another used Cilk Plus), and voilà! I did have to make sure a few variables were thread-private, and I also had to use dynamic scheduling (because not all pixels rendered at the same rate), but those were trivial additions.
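Below is a minimal OpenMP sketch of that change. It assumes the image is stored as a flat row-major array, and `Color`, `tracePixel`, and `renderOpenMP` are hypothetical names, not taken from the project. It shows the two details mentioned: a variable made thread-private with a `private` clause, and dynamic scheduling. If the code is compiled without OpenMP support (e.g. without `-fopenmp` in GCC), the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel type and per-pixel routine standing in for the ray tracer.
struct Color { float r, g, b; };

Color tracePixel(int x, int y, int width, int height) {
    // Placeholder: a gradient. A real tracer casts one or more rays here,
    // and the cost varies widely from pixel to pixel.
    return Color{static_cast<float>(x) / width, static_cast<float>(y) / height, 0.5f};
}

void renderOpenMP(std::vector<Color>& image, int width, int height) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    Color c;  // declared outside the loop, so it must be listed in private(...)

    // Only the outermost loop is parallelized. schedule(dynamic, 1) hands rows
    // out one at a time to whichever thread is idle, instead of splitting them
    // into equal blocks up front. The loop index y is private automatically.
    #pragma omp parallel for schedule(dynamic, 1) private(c)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {  // x is declared in the loop, so it is private too
            c = tracePixel(x, y, width, height);
            image[static_cast<std::size_t>(y) * width + x] = c;
        }
    }
}
```

In Cilk Plus, the same loop would be written with the `cilk_for` keyword in place of the pragma and `for`. Its runtime balances the load through work stealing, so there is no schedule clause to choose.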

My first iteration of this project, back in school, used a simple ColorImage class to store the computed pixel values. However, I wanted to watch the image as it was rendered, so before I started parallelizing it I again turned to OpenCV (as I did with the Mandelbrot demo I wrote about in an earlier post). This slowed down the runtime a bit, but it made for a more visually appealing product. Unfortunately, I worked on this project before I had managed to get OpenCV running on the Red Hat MTL machine, so I ended up removing all the OpenCV code and going back to the ColorImage class. As a replacement, I added a poor man’s progress bar for the command line: I displayed the percent complete as text (e.g., “Rendering: 55%”). Adding the “progress bar” required a critical section and an atomic increment of the counter variable. Because the computation times of the various pixels were so disparate, the threads seldom contended for them, and the runtime did not slow down noticeably. OpenMP has atomics and critical sections built in, but Cilk Plus does not provide them, so in the Cilk Plus version I used TBB to get the same effect. Although TBB and Cilk Plus were designed to work together, in practice it took a little trial and error to find the semantics that worked.
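Here is a sketch of such a progress indicator in OpenMP. It assumes the counter tracks completed rows rather than individual pixels, and `renderRow` and `renderWithProgress` are hypothetical names. The `atomic capture` construct increments the shared counter and reads back the new value in one step, and a named critical section ensures only one thread prints at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical per-row work standing in for tracing every pixel in row y.
void renderRow(std::vector<float>& image, int width, int y) {
    for (int x = 0; x < width; ++x)
        image[static_cast<std::size_t>(y) * width + x] = static_cast<float>(x + y);
}

// Renders all rows while printing "Rendering: NN%" on one console line.
// Returns the number of rows completed, which should equal height.
int renderWithProgress(std::vector<float>& image, int width, int height) {
    assert(height > 0 && image.size() == static_cast<std::size_t>(width) * height);
    int rowsDone = 0;    // shared counter, only ever updated atomically
    int lastShown = -1;  // last percentage printed, guarded by the critical section

    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < height; ++y) {
        renderRow(image, width, y);

        int done;  // declared inside the loop, so private to each thread
        #pragma omp atomic capture
        done = ++rowsDone;  // increment and read the new value as one atomic step

        int percent = done * 100 / height;
        #pragma omp critical(progress)
        {
            // Threads can arrive here out of order, so never print a
            // smaller percentage than one already shown.
            if (percent > lastShown) {
                lastShown = percent;
                std::printf("\rRendering: %d%%", percent);
                std::fflush(stdout);
            }
        }
    }
    std::printf("\n");
    return rowsDone;
}
```

Entering the critical section once per row is cheap compared with tracing a whole row of pixels. This fits the negligible overhead described above.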

As with my other projects, I split out the code that was common to all three versions of the project (serial, OpenMP, and Cilk Plus), each of which ran on both Windows and Linux. That left behind only a single function per version, containing the outer for loop. Despite all the features I added, there are quite a few other additions I would like to make to this project, time permitting:
1) Put the OpenCV code back in.
2) Add command-line arguments for the input file, output file, number of threads (OpenMP only), recursive ray depth, samples per pixel, and image width and height, plus toggles for the acceleration structure, anti-aliasing, soft shadows, and translucency and glossiness.
3) Fix the code for using shapes as area lights (it doesn’t currently work correctly).
4) Add path-tracing capabilities.
5) Get more RIB files.
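One way to arrange the split between shared and version-specific code is sketched below with hypothetical names (`Image`, `renderPixel`, `renderSerial`, `renderWithOpenMP`). Everything except the outer loop is shared, so each build only needs its own small loop function. A Cilk Plus version would be a third function of the same shape using `cilk_for`. It is omitted here because GCC dropped Cilk Plus support in version 8.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- Shared code: identical in every version of the renderer ----
struct Image {
    int width, height;
    std::vector<float> pixels;  // row-major
    Image(int w, int h) : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {}
    float& at(int x, int y) { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// Hypothetical per-pixel routine; the real one would build camera rays,
// traverse the acceleration structure, and shade the hits.
float renderPixel(const Image& img, int x, int y) {
    assert(x >= 0 && x < img.width && y >= 0 && y < img.height);
    return static_cast<float>(x * img.height + y);
}

// ---- Version-specific code: one small function per version ----
void renderSerial(Image& img) {
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            img.at(x, y) = renderPixel(img, x, y);
}

void renderWithOpenMP(Image& img) {
    // Same loop as renderSerial; only the pragma differs.
    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            img.at(x, y) = renderPixel(img, x, y);
}
```

Because the version-specific functions are this small, a fix in the shared code applies to every version and platform at once.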

Despite the simplicity of converting the serial project to a parallel one, this demo was perhaps the hardest of the three I worked on while interning at Intel. It had the largest codebase and the most pieces working together, and those pieces were often tightly coupled. Basically, there were a lot of little things that could go wrong that wouldn’t necessarily show up in every image (especially if you didn’t know what the image was supposed to look like to begin with). But in the end, I got enough of the project working to produce some very nice images. And since the pixels are independent and the threads almost never had to wait on one another, the speed-up was nearly linear in both OpenMP and Cilk Plus.
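For reference, speed-up S and parallel efficiency E are conventionally defined as follows, with T_1 the serial runtime and T_p the runtime on p cores. "Nearly linear" means S(p) stays close to p, i.e. E(p) stays close to 1. The numbers in the second line are purely illustrative, not measurements from this project: a render taking 80 s serially and 10.5 s on 8 cores.

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}

S(8) = \frac{80}{10.5} \approx 7.62, \qquad E(8) = \frac{7.62}{8} \approx 0.95
```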
