The Mandelbrot set is a well known application of visual mathematics. It is a good example of simple math with complex (imaginary) numbers on a complex plane leading to visually impressive results. It works by keeping track of how many iterations of the equation `zn+1 = zn2 + c` will occur before a complex number is no longer bound. This can go on forever, and the image generated is unique at every depth, but for the sake of running time there is usually a maximum depth that the iteration can occur. The simulations in this example are run serially, with Intel® Cilk™ Plus `pragma simd` for vectorization, with Intel Cilk Plus `cilk_for` for parallelization, and with both vectorization and `cilk_for`.

In addition, this code uses an image base class (`image_base.h`) and `BMPImage` class (`bmp_image.cpp`) to write the results to a viewable image. Do not worry about how this is implemented, as it is unaffected by the code changes listed below.

### Code Change Highlights:

All changes occur while traversing the complex plane, using one for loop for the y-axis, and one for loop for the x-axis.

• `cilk_for`
• linear version:
`// Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast<unsigned char>(static_cast<double> (depth) / max_depth * 255); } }`
`cilk_for` version:
`// Traverse the sample space in equally spaced steps with width * height samples cilk_for (int j = 0; j < height; ++j) { cilk_for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }`
• `pragma simd`
• scalar version:
`// Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }`
pragma simd version:
`// Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { #pragma simd for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }`
• `cilk_for + pragma simd`
• linear/scalar version:
`// Traverse the sample space in equally spaced steps with width * height samples for (int j = 0; j < height; ++j) { for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }`
cilk_for + pragma simd version:
` // Traverse the sample space in equally spaced steps with width * height samples cilk_for (int j = 0; j < height; ++j) { #pragma simd for (int i = 0; i < width; ++i) { double z_real = x0 + i*xstep; double z_imaginary = y0 + j*ystep; double c_real = z_real; double c_imaginary = z_imaginary; double depth = 0; // Figures out how many recurrences are required before divergence, up to max_depth while(depth < max_depth) { if(z_real * z_real + z_imaginary * z_imaginary > 4.0) { break; // Escape from a circle of radius 2 } double temp_real = z_real*z_real - z_imaginary*z_imaginary; double temp_imaginary = 2.0*z_real*z_imaginary; z_real = c_real + temp_real; z_imaginary = c_imaginary + temp_imaginary; ++depth; } output[j*width + i] = static_cast(static_cast(depth) / max_depth * 255); } }`

### Performance Data:

Note: Modified Speedup shows performance speedup with respect to serial implementation.

Modified Speedup Compiler (Intel® 64) Compiler options System specifications
simd: 1.7x
cilk_for: 5.1x
Both: 9.7x
Intel® C++ Compiler 15.0 for Windows /O2 /Oi /Ot /fp:fast /QxHost /Qip /MD Microsoft Windows Server 2012* (x64)
2nd Generation Intel® Xeon® E3 1280 CPU @ 3.50GHz
8GB memory
simd: 1.6x
cilk_for: 5.0x
Both: 7.6x
Intel® C++ Compiler 15.0 for Linux -O2 -fast -fp-model fast -xHost
-ip
RHEL* 7 (x64)
4th Generation Intel® Core™ i7-4790 CPU @ 3.60GHz
32GB memory

### Build Instructions:

• For Microsoft Visual Studio* 2010 and 2012 users:
• Open the solution `.sln` file
[Optional] To collect performance numbers (will run example 5 times and take average time):
• Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add `PERF_NUM`
Choose a configuration (for best performance, choose a release configuration):
• Intel-debug and Intel-release: uses Intel® C++ compiler
• VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
• For Windows* Command Line users:
• Enable your particular compiler environment
For Intel® C++ Compiler:
• Open the appropriate Intel C++ compiler command prompt
• Navigate to project folder
• Compile with `Build.bat [perf_num]`
• `perf_num`: collect performance numbers (will run example 5 times and take average time)
• To run: `Build.bat run [help|0|1|2|3|4]`
For Visual C++ Compiler (only linear/scalar will run):
• Open the appropriate MicrosoftVisual Studio* 2010 or 2012 command prompt
• Navigate to project folder
• To compile: `Build.bat [perf_num]`
• `perf_num`: collect performance numbers (will run example 5 times and take average time)
• To run: `Build.bat run`
• For Linux* or OS X* users:
• From a terminal window, navigate to the project folder
Using Intel C++ compiler:
• Set the icc environment: `source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}`
• To compile: `make [icpc] [perf_num=1]`
• `perf_num=1`: collect performance numbers (will run example 5 times and take average time)
• To run: `make run [option=help|0|1|2|3|4]`
Using gcc (only linear/scalar will run):
• To compile: `make gcc [perf_num=1]`
• `perf_num=1`: collect performance numbers (will run example 5 times and take average time)
• To run: `make run`
Для получения подробной информации о возможностях оптимизации компилятора обратитесь к нашему Уведомлению об оптимизации.
Возможность комментирования русскоязычного контента была отключена. Узнать подробнее.