Image Processing: Mandelbrot

The Mandelbrot set is a well known application of visual mathematics. It is a good example of simple math with complex (imaginary) numbers on a complex plane leading to visually impressive results. It works by keeping track of how many iterations of the equation zn+1 = zn2 + c will occur before a complex number is no longer bound. This can go on forever, and the image generated is unique at every depth, but for the sake of running time there is usually a maximum depth that the iteration can occur. The simulations in this example are run serially, with Intel® Cilk™ Plus pragma simd for vectorization, with Intel Cilk Plus cilk_for for parallelization, and with both vectorization and cilk_for.

In addition, this code uses an image base class (image_base.h) and BMPImage class (bmp_image.cpp) to write the results to a viewable image. Do not worry about how this is implemented, as it is unaffected by the code changes listed below.

 

Code Change Highlights:

All changes occur while traversing the complex plane, using one for loop for the y-axis, and one for loop for the x-axis.

  • cilk_for
  • linear version:
    // Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast<unsigned char>(static_cast<double> (depth) / max_depth * 255); } }
    cilk_for version:
    // Traverse the sample space in equally spaced steps with width * height samples cilk_for (int j = 0; j < height; ++j) { cilk_for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }
  • pragma simd
  • scalar version:
    // Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }
    pragma simd version:
    // Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { #pragma simd for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }
  • cilk_for + pragma simd
  • linear/scalar version:
    // Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }
    cilk_for + pragma simd version:
     // Traverse the sample space in equally spaced steps with width * height samples cilk_for (int j = 0; j < height; ++j) { #pragma simd for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }

Performance Data:

Note: Modified Speedup shows performance speedup with respect to serial implementation.

Modified Speedup Compiler (Intel® 64) Compiler options System specifications
simd: 1.7x
cilk_for: 5.1x
Both: 9.7x
Intel® C++ Compiler 15.0 for Windows /O2 /Oi /Ot /fp:fast /QxHost /Qip /MD Microsoft Windows Server 2012* (x64)
2nd Generation Intel® Xeon® E3 1280 CPU @ 3.50GHz
8GB memory
simd: 1.6x
cilk_for: 5.0x
Both: 7.6x
Intel® C++ Compiler 15.0 for Linux -O2 -fast -fp-model fast -xHost
-ip
RHEL* 7 (x64)
4th Generation Intel® Core™ i7-4790 CPU @ 3.60GHz
32GB memory

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
  • Open the solution .sln file
    [Optional] To collect performance numbers (will run example 5 times and take average time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: uses Intel® C++ compiler
    • VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
  • For Windows* Command Line users:
  • Enable your particular compiler environment
    For Intel® C++ Compiler:
    • Open the appropriate Intel C++ compiler command prompt
    • Navigate to project folder
    • Compile with Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • To run: Build.bat run [help|0|1|2|3|4]
    For Visual C++ Compiler (only linear/scalar will run):
    • Open the appropriate MicrosoftVisual Studio* 2010 or 2012 command prompt
    • Navigate to project folder
    • To compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • To run: Build.bat run
  • For Linux* or OS X* users:
  • From a terminal window, navigate to the project folder
    Using Intel C++ compiler:
    • Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • To compile: make [icpc] [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • To run: make run [option=help|0|1|2|3|4]
    Using gcc (only linear/scalar will run):
    • To compile: make gcc [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • To run: make run
Para obtener información más completa sobre las optimizaciones del compilador, consulte nuestro Aviso de optimización.