Rendering: AOBench

 

Ambient occlusion is an algorithm that approximates the reflection of light off non-reflective surfaces. Because calculating true light reflection is too expensive to be practical on today's hardware, algorithms like ambient occlusion are used to get convincingly close. The algorithm casts a ray from the origin through each pixel on the screen and looks for intersections between that ray and the objects in the scene. If there is an intersection (a "hit"), it searches for intersections again, this time using the "hit" as the origin. The number of intersections found determines how light or dark the pixel is drawn, which imitates shadows; this is the goal of ambient occlusion.

Intel® Cilk™ Plus cilk_for is used to render multiple horizontal lines in parallel, while Intel Cilk Plus Array Notation is used to speed up the search for intersections with the "hit" as an origin. In the scalar implementation, the auto-vectorizer does a somewhat poor job of vectorizing the ambient occlusion calculation (intersections with the "hit" as an origin), which can be improved by adding a single Intel Cilk Plus SIMD Notation line.
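The secondary search from a "hit" described above can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the sample's actual code: the names Vec3, Sphere, hits_sphere and ambient_occlusion are hypothetical, the scene contains only spheres, and evenly spaced sample values stand in for the random table so the result is repeatable.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

struct Sphere { Vec3 center; float radius; };  // hypothetical occluder type

// True if the ray origin + t * dir (dir normalized) enters the sphere at some t > 0.
static bool hits_sphere(const Sphere& s, Vec3 origin, Vec3 dir) {
    Vec3 oc = sub(origin, s.center);
    float b = dot(oc, dir);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc <= 0.0f) return false;
    return (-b - std::sqrt(disc)) > 0.0f;
}

// Fraction of sample rays leaving the hit point that escape the scene
// (1 = fully lit, 0 = fully occluded).
static float ambient_occlusion(Vec3 hit, Vec3 normal,
                               const Sphere* occluders, int count, int nsamples) {
    assert(nsamples > 0);
    const float kPi = 3.14159265358979f;
    // Build an orthonormal basis around the surface normal.
    Vec3 helper = (std::fabs(normal.x) < 0.6f) ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 tangent = normalize(cross(helper, normal));
    Vec3 bitangent = cross(normal, tangent);
    // Nudge the origin off the surface so the rays do not re-hit it.
    Vec3 origin = add(hit, scale(normal, 1.0e-4f));

    int blocked = 0;
    for (int j = 0; j < nsamples; ++j) {
        for (int i = 0; i < nsamples; ++i) {
            // Evenly spaced stand-ins for the two random numbers per sample.
            float theta = std::sqrt((j + 0.5f) / nsamples);
            float phi = 2.0f * kPi * (i + 0.5f) / nsamples;
            Vec3 dir = add(add(scale(tangent, std::cos(phi) * theta),
                               scale(bitangent, std::sin(phi) * theta)),
                           scale(normal, std::sqrt(1.0f - theta * theta)));
            for (int k = 0; k < count; ++k) {
                if (hits_sphere(occluders[k], origin, dir)) { ++blocked; break; }
            }
        }
    }
    int total = nsamples * nsamples;
    return static_cast<float>(total - blocked) / total;
}
```

A caller would pass the hit point and surface normal found by the primary ray, and use the returned fraction as the brightness of the pixel: the more sample rays are blocked, the darker the result.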

Partially based on code from Syoyo Fujita

 

Code Change Highlights:

cilk_for is used to render each row of pixels in parallel, while pragma simd is used to find intersections inside the ambient occlusion function.
  • cilk_for
    linear version:
      for (int y = 0; y < height; ++y) {
          render_x(width, height, y, ambient_occlusion_scalar, fimg);
      }
    cilk_for version:
      cilk_for (int y = 0; y < height; ++y) {
          render_x(width, height, y, ambient_occlusion_scalar, fimg);
      }
  • pragma simd
    The autovectorizer does an OK job of vectorizing the inner loop of ambient_occlusion_scalar, but because there are so many different kinds of objects, it uses a vector length of only 2 even though 4 floats will fit. pragma simd lets the programmer extend the compiler's knowledge by supplying more details, such as the vectorlength.
    scalar version:
      int ntheta = c_num_ao_samples;
      int nphi = c_num_ao_samples;
      float occlusion = 0.0f;
      for (int j = 0; j < ntheta; ++j) {
          for (int i = 0; i < nphi; ++i) {
              float theta = sqrtf(random_table[2*(nphi*j+i)]);
              float phi = 2.0f * static_cast<float>(M_PI) * random_table[2*(nphi*j+i)+1];
              ...
          }
      }
    pragma simd version:
      int ntheta = c_num_ao_samples;
      int nphi = c_num_ao_samples;
      float occlusion = 0.0f;
      for (int j = 0; j < ntheta; ++j) {
          #pragma simd vectorlength(4)
          for (int i = 0; i < nphi; ++i) {
              float theta = sqrtf(random_table[2*(nphi*j+i)]);
              float phi = 2.0f * static_cast<float>(M_PI) * random_table[2*(nphi*j+i)+1];
              ...
          }
      }
  • cilk_for + pragma simd
    Simply combine the above cilk_for that iterates over each line with the pragma simd that calculates ambient occlusion for each pixel.
    combined version:
      cilk_for (int y = 0; y < height; ++y) {
          render_x(width, height, y, ambient_occlusion_simd, fimg);
      }
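The combination above works because each row writes only its own pixels, so the rows can run in any order. The following self-contained sketch illustrates that property. It is a toy, not the sample's code: shade_pixel_sketch, render_row_sketch, render_serial, render_parallel and the ROW_LOOP macro are hypothetical names, and the "occlusion test" is a stand-in arithmetic comparison. It assumes that compilers with Intel Cilk Plus enabled define __cilk and that the Intel C++ Compiler defines __INTEL_COMPILER; with any other compiler both constructs fall back to plain serial loops. The reduction clause on pragma simd is an addition for this sketch, because the accumulation into occlusion is visible here; the source elides that part of its loop.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumption: compilers with Cilk Plus enabled define __cilk.
// Everywhere else the row loop falls back to an ordinary serial for.
#if defined(__cilk)
#include <cilk/cilk.h>
#define ROW_LOOP cilk_for
#else
#define ROW_LOOP for
#endif

// Toy stand-in for the occlusion sampling loop (hypothetical).
static float shade_pixel_sketch(int x, int y, int width, int height, int nsamples) {
    assert(nsamples > 0);
    const float threshold = static_cast<float>(x + y) / (width + height);
    float occlusion = 0.0f;
#if defined(__INTEL_COMPILER)
#pragma simd vectorlength(4) reduction(+:occlusion)
#endif
    for (int i = 0; i < nsamples; ++i) {
        float theta = std::sqrt((i + 0.5f) / nsamples);
        if (theta < threshold) occlusion += 1.0f;  // this sample counts as blocked
    }
    return (nsamples - occlusion) / nsamples;
}

// Each call writes only row y of the image, so rows are independent of each other.
static void render_row_sketch(int width, int height, int y, int nsamples,
                              std::vector<float>& image) {
    for (int x = 0; x < width; ++x) {
        image[y * width + x] = shade_pixel_sketch(x, y, width, height, nsamples);
    }
}

// Baseline: rows rendered one after another.
static void render_serial(int width, int height, int nsamples, std::vector<float>& image) {
    for (int y = 0; y < height; ++y) {
        render_row_sketch(width, height, y, nsamples, image);
    }
}

// Same loop body; the rows run in parallel when Cilk Plus is available.
static void render_parallel(int width, int height, int nsamples, std::vector<float>& image) {
    ROW_LOOP (int y = 0; y < height; ++y) {
        render_row_sketch(width, height, y, nsamples, image);
    }
}
```

Rendering the same image with render_serial and render_parallel should give identical buffers, which is a cheap sanity check to run before measuring any speedup.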

 

Performance Data:

Note: Modified Speedup shows the performance speedup of each modified version relative to the serial implementation.

  • Intel® C++ Compiler 15.0 for Windows (Intel® 64)
    • Modified Speedup: SIMD: 1.01x; cilk_for: 3.5x; Both: 3.3x
    • Compiler options: /O2 /Oi /Ot /fp:fast /QxHost /Qip /MD
    • System specifications: Microsoft Windows Server 2008*; Intel® Core™ i5 3550 CPU @ 3.50GHz (ES); 4GB memory
  • Intel® C++ Compiler 15.0 for Linux (Intel® 64)
    • Modified Speedup: SIMD: 1.13x; cilk_for: 4.48x; Both: 4.72x
    • Compiler options: -O2 -fp-model fast -xHost -ip
    • System specifications: RHEL 7 (x64); 4th Generation Intel Core™ i7-4790 CPU @ 3.60GHz; 32GB memory
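The source does not show how these numbers were computed. A plausible reading, given that the performance build runs the example 5 times and averages the time, is that speedup is the average serial time divided by the average time of the modified version. The sketch below encodes that assumption; average_seconds and speedup are hypothetical helper names, not part of the sample.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock time in seconds over `runs` calls of `work`.
template <typename F>
double average_seconds(F work, int runs = 5) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto start = clock::now();
        work();
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return total / runs;
}

// Speedup of a modified version relative to the serial baseline
// (assumed definition: serial time divided by modified time).
inline double speedup(double serial_seconds, double modified_seconds) {
    assert(modified_seconds > 0.0);
    return serial_seconds / modified_seconds;
}
```

Under this definition, a value such as 3.5x means the modified version finished in 1/3.5 of the serial time on that system.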

 

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
  • Open the solution .sln file
    [Optional] To collect performance numbers (will run example 5 times and take average time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: uses Intel® C++ compiler
    • VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
  • For Windows* Command Line users:
  • Enable your particular compiler environment
    For Intel® C++ Compiler:
    • Open the appropriate Intel C++ compiler command prompt
    • Navigate to project folder
    • Compile with Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • To run: Build.bat run [help|0|1|2|3|4]
    For Visual C++ Compiler (only linear/scalar will run):
    • Open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt
    • Navigate to project folder
    • To compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • To run: Build.bat run
  • For Linux* or OS X* users:
  • You must have SDL 1.2 and its extension, SDL_ttf 2.0 installed
    You can use your package manager (e.g. apt-get, aptitude, yum, homebrew) to install the development (dev) packages.
    • Ubuntu users: apt-cache search libSDL-dev and apt-cache search libsdl-ttf
      sudo apt-get install <package-name>
    • OS X users with Homebrew installed: brew install sdl_ttf
    Or use the included source code in sdl_source or download the source code from the SDL 1.2 website and SDL_ttf 2.0 website
    You may also need SDL_ttf's dependency, FreeType 2.4.8
    • To install on Linux, simply ./configure; make; make install each source package, in order of SDL, freetype, SDL_ttf
    • To install on OS X, ./configure --without-x; make; make install for SDL because OS X does not use the X Window System by default

     

    Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    Navigate to project folder
    For Intel® C++ compiler:
    • To compile: make [icpc] [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • To run: make run [option=help|0|1|2|3|4]
    For gcc (only linear/scalar will run):
    • Compile with make gcc [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • To run: make run
    If make run fails with a font location not valid error, set the FONT_LOCATION environment variable with
    export FONT_LOCATION=<existing font> or
    make run FONT_LOCATION=<existing font>.
For more complete information about compiler optimizations, see the Optimization Notice.