Computational Fluid Dynamics (CFD): Fluid Animate


Fluid Animate implements one of a class of algorithms for calculating fluid flow; specifically, it uses the Smoothed-Particle Hydrodynamics (SPH) model. In this model, the fluid is represented as a gridless collection of particles that move in response to the forces applied to them. The density and pressure of the fluid are calculated from the proximity of each particle's neighbors, and the particles move accordingly. This approach has several benefits: mass is conserved by construction (the particles themselves represent the mass), and the computation involves only nearest-neighbor interactions rather than solving linear systems of equations. The kernel is broken into four parts, as seen in advance_frame(). Most of the work is done in compute_forces(), which is parallelized with Intel® Cilk™ Plus cilk_for. Additionally, the Vec3 class can be rewritten to take advantage of Intel® Cilk™ Plus Array Notation.
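
The neighbor-based density calculation described above can be sketched in standard C++. This is a minimal illustration, not the sample's code: the Particle struct, the sph_densities() function, and the choice of the poly6 smoothing kernel (a common SPH kernel that is nonzero only within a smoothing radius h) are assumptions, and the neighbor search is a brute-force O(n^2) loop rather than the cell grid that Fluid Animate uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration; not the sample's data layout.
struct Particle {
  float x, y, z;
};

// Poly6 smoothing kernel: W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3 for r < h, else 0.
// Taking r^2 avoids a square root; particles farther than h contribute nothing.
float poly6(float r2, float h) {
  const float pi = 3.14159265f;
  const float h2 = h * h;
  if (r2 >= h2) return 0.0f;
  const float d = h2 - r2;
  return 315.0f / (64.0f * pi * std::pow(h, 9.0f)) * d * d * d;
}

// Density of particle i = sum over all particles j within h of mass * W(|r_i - r_j|, h).
// Brute-force neighbor search (assumption); Fluid Animate bins particles into cells instead.
std::vector<float> sph_densities(const std::vector<Particle>& ps, float mass, float h) {
  std::vector<float> rho(ps.size(), 0.0f);
  for (std::size_t i = 0; i < ps.size(); ++i) {
    for (std::size_t j = 0; j < ps.size(); ++j) {  // j == i adds the particle's own term
      const float dx = ps[i].x - ps[j].x;
      const float dy = ps[i].y - ps[j].y;
      const float dz = ps[i].z - ps[j].z;
      rho[i] += mass * poly6(dx * dx + dy * dy + dz * dz, h);
    }
  }
  return rho;
}
```

Because every particle carries a fixed mass, total mass never changes; a particle's density rises only when neighbors move inside its smoothing radius.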

 

This code was originally written as part of the Princeton Parsec benchmark suite by Richard O. Lee and later modified by Christian Bienia and Christian Fensch.

 

Code Change Highlights:

  • cilk_for
  • In fluid_animate.cpp, linear version:
    compute_densities() and compute_accelerations():

      for (int cell_z = 0; cell_z < numCells_z; ++cell_z) {
        for (int cell_y = 0; cell_y < numCells_y; ++cell_y) {
          for (int cell_x = 0; cell_x < numCells_x; ++cell_x, ++cell_index) {
            ... // kernel work
          }
        }
      }

    In the linear code, two cells can share a neighbor, so processing them in parallel could cause a data race. The parallel version must therefore only process cells concurrently when they cannot share neighbors.
    In fluid_animate.cpp, cilk_for version:
    compute_densities_cilk() and compute_accelerations_cilk():

      // Divide cells into partitions such that they are at least 3 cells apart
      for (int mod_z = 0; mod_z < 3; ++mod_z) {
        // Because they are 3 cells apart, there can be no overlapping neighbors.
        // This means there are no potential data races, so the loop can safely run in parallel
        cilk_for (int cell_z = mod_z; cell_z < numCells_z; cell_z += 3) {
          int index_z = cell_z*numCells_y*numCells_x;
          for (int cell_y = 0; cell_y < numCells_y; cell_y++) {
            int index_y = cell_y*numCells_x + index_z;
            for (int cell_x = 0; cell_x < numCells_x; cell_x++) {
              int cell_index = index_y + cell_x;
              ... // kernel work
            }
          }
        }
      }
  • Array Notation (AN)
  • Storing the x, y, and z values of the Vec3 class in an array (padded to four floats) instead of three separate members allows Vec3 operations to use Array Notation math
    In fluid_animate.h, scalar version:
    class Vec3:

      class Vec3 {
      public:
        float x, y, z;
        ...
        Vec3 operator + (Vec3 const &v) const { return Vec3(x+v.x, y+v.y, z+v.z); }
        Vec3 operator - (Vec3 const &v) const { return Vec3(x-v.x, y-v.y, z-v.z); }
        Vec3 operator * (float s) const { return Vec3(x*s, y*s, z*s); }
        Vec3 operator / (float s) const { return Vec3(x/s, y/s, z/s); }
      };
    In fluid_animateAN.h, array notation version:
    class Vec3AN:

      class Vec3AN {
      public:
        // Add an extra float so that vector math lines up on the cache line
        float vec[4];
        ...
        Vec3AN operator + (Vec3AN const &v) const { Vec3AN tmp(*this); tmp.vec[:] += v.vec[:]; return tmp; }
        Vec3AN operator - (Vec3AN const &v) const { Vec3AN tmp(*this); tmp.vec[:] -= v.vec[:]; return tmp; }
        Vec3AN operator * (float s) const { Vec3AN tmp(*this); tmp.vec[:] *= s; return tmp; }
        Vec3AN operator / (float s) const { Vec3AN tmp(*this); tmp.vec[:] /= s; return tmp; }
      };
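
The three-way slab partitioning used by the cilk_for version can be checked with a small standard C++ model. In this sketch, all names (Grid, process_slab, run_serial, run_partitioned) are hypothetical: each cell adds a contribution to every cell in its 3x3x3 neighborhood, mimicking the situation described above in which work on one cell updates its neighbors. One std::thread per slab stands in for cilk_for, which instead schedules iterations on a work-stealing runtime; joining all threads at the end of each pass plays the role of the implicit sync at the end of a cilk_for loop.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cell grid: acc[i] accumulates contributions written by neighboring cells.
struct Grid {
  int nx, ny, nz;
  std::vector<int> acc;
  Grid(int x, int y, int z) : nx(x), ny(y), nz(z), acc(x * y * z, 0) {}
  int index(int x, int y, int z) const { return (z * ny + y) * nx + x; }
};

// Process one z-slab: every cell writes into its neighbors in slabs cz-1, cz, and cz+1.
void process_slab(Grid& g, int cz) {
  for (int cy = 0; cy < g.ny; ++cy)
    for (int cx = 0; cx < g.nx; ++cx)
      for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
          for (int dx = -1; dx <= 1; ++dx) {
            const int x = cx + dx, y = cy + dy, z = cz + dz;
            if (x < 0 || y < 0 || z < 0 || x >= g.nx || y >= g.ny || z >= g.nz) continue;
            g.acc[g.index(x, y, z)] += 1;  // unsynchronized write to a neighbor cell
          }
}

// Reference: the linear version, one slab after another.
void run_serial(Grid& g) {
  for (int cz = 0; cz < g.nz; ++cz) process_slab(g, cz);
}

// Three passes; within a pass, slabs mod_z, mod_z + 3, ... run concurrently.
// Slabs 3 apart write z-ranges [cz-1, cz+1] that never overlap, so no element of
// acc is written by two threads in the same pass.
void run_partitioned(Grid& g) {
  for (int mod_z = 0; mod_z < 3; ++mod_z) {
    std::vector<std::thread> workers;
    for (int cz = mod_z; cz < g.nz; cz += 3)
      workers.emplace_back(process_slab, std::ref(g), cz);
    for (auto& t : workers) t.join();  // barrier before the next partition starts
  }
}
```

run_serial and run_partitioned produce identical grids. Launching every slab at once instead would let adjacent slabs update the same cells concurrently, which is exactly the data race the partitioning avoids.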

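Cilk Plus Array Notation (vec[:]) is a compiler extension that standard C++ does not have. A portable way to express the same idea as Vec3AN is a fixed-length elementwise loop over a padded four-float array, which optimizing compilers can typically vectorize. The Vec3A name and layout below are hypothetical; this is a sketch, not the sample's header.

```cpp
#include <array>
#include <cstddef>

// Hypothetical portable analogue of Vec3AN: x, y, z plus one padding element kept at 0.
// Each operator is one loop over all four floats, the standard-C++ counterpart of
// Array Notation statements such as tmp.vec[:] += v.vec[:].
struct alignas(16) Vec3A {
  std::array<float, 4> vec{};

  Vec3A() = default;
  Vec3A(float x, float y, float z) : vec{{x, y, z, 0.0f}} {}

  Vec3A operator+(const Vec3A& v) const {
    Vec3A tmp(*this);
    for (std::size_t i = 0; i < 4; ++i) tmp.vec[i] += v.vec[i];
    return tmp;
  }
  Vec3A operator-(const Vec3A& v) const {
    Vec3A tmp(*this);
    for (std::size_t i = 0; i < 4; ++i) tmp.vec[i] -= v.vec[i];
    return tmp;
  }
  Vec3A operator*(float s) const {
    Vec3A tmp(*this);
    for (std::size_t i = 0; i < 4; ++i) tmp.vec[i] *= s;
    return tmp;
  }
  Vec3A operator/(float s) const {
    Vec3A tmp(*this);
    for (std::size_t i = 0; i < 4; ++i) tmp.vec[i] /= s;  // padding stays 0 for nonzero s
    return tmp;
  }
};
```

Operating on all four elements, padding included, keeps every operation a whole 16-byte vector, which mirrors the intent of the extra float in Vec3AN.
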
Performance Data:

Note: Modified Speedup is the speedup relative to the serial implementation.

  • Windows system
    Modified Speedup: AN 1.1x; cilk_for 3.2x; Both 3.6x
    Compiler (Intel® 64): Intel® C++ Compiler 15.0 for Windows
    Compiler options: /O3 /QxAVX /Qipo
    System specifications: Microsoft Windows Server* 2012 (x64); 2nd generation Intel Xeon E3 1280 CPU @ 3.50GHz; 8GB memory
  • Linux system
    Modified Speedup: AN 1.1x; cilk_for 3.2x; Both 3.8x
    Compiler (Intel® 64): Intel® C++ Compiler 15.0 for Linux
    Compiler options: -O3 -xAVX -std=c++11 -ipo
    System specifications: Red Hat* Enterprise Linux 7 (x64); Intel® Core™ i7-4790 CPU @ 3.60GHz; 32GB memory
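
Modified Speedup is a ratio of run times. Written out (the symbols are ours, not the sample's), with the 3.2x cilk_for result as a worked example:

```latex
S = \frac{T_{\text{serial}}}{T_{\text{modified}}}, \qquad
S = 3.2 \;\Rightarrow\; T_{\text{modified}} = \frac{T_{\text{serial}}}{3.2} \approx 0.31\,T_{\text{serial}}
```

By the same arithmetic, the combined 3.6x and 3.8x results mean the fully modified version ran in roughly 28% and 26% of the serial time on the two systems.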

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
    Open the solution (.sln) file.
    [Optional] To collect performance numbers (runs the example 5 times and averages the run time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    [Optional] To write results to a file:
    • Project Properties -> Configuration Properties -> Debugging -> Command Arguments: add -f <filename>
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: use the Intel® C++ Compiler
    • VSC-debug and VSC-release: use the Visual C++ compiler (only the linear and scalar versions run)
  • For Windows command line users:
    Enable your particular compiler environment.
    For the Intel® C++ Compiler:
    • Open the appropriate Intel® C++ Compiler command prompt and navigate to the project folder
    • To compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (runs the example 5 times and averages the run time)
    • To run: Build.bat run [-o help|0|1|2] [-f <filename>]
      • -f <filename>: name of the file to write the final values to (omit the .fluid extension)
    For the Visual C++ compiler (only the linear and scalar versions run):
    • Open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt and navigate to the project folder
    • To compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (runs the example 5 times and averages the run time)
    • To run: Build.bat run [-f <filename>]
      • -f <filename>: name of the file to write the final values to (omit the .fluid extension)
  • For Linux* or OS X* users:
    From a terminal window, navigate to the project folder.
    Using the Intel® C++ Compiler:
    • Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • To compile: make or make icpc [perf_num=1]
      • perf_num=1: collect performance numbers (runs the example 5 times and averages the run time)
    • To run: make run [option=help|0|1|2] [outfile=<filename>]
      • outfile=<filename>: name of the file to write the final values to (omit the .fluid extension)
    Using gcc (only the linear and scalar versions run):
    • To compile: make gcc [perf_num=1]
      • perf_num=1: collect performance numbers (runs the example 5 times and averages the run time)
    • To run: make run [outfile=<filename>]
      • outfile=<filename>: name of the file to write the final values to (omit the .fluid extension)
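
The perf_num / PERF_NUM option runs the example 5 times and averages the run time. A minimal standard C++ sketch of that kind of measurement follows; the average_seconds() helper is hypothetical and is not the sample's own timing code. It uses std::chrono::steady_clock, which is monotonic and therefore unaffected by changes to the system clock.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not the sample's PERF_NUM code): runs `work` `repetitions`
// times and returns the mean wall-clock time per run, in seconds.
double average_seconds(const std::function<void()>& work, int repetitions = 5) {
  using Clock = std::chrono::steady_clock;
  double total = 0.0;
  for (int i = 0; i < repetitions; ++i) {
    const auto start = Clock::now();
    work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    total += elapsed.count();
  }
  return total / repetitions;
}
```

Dividing the average serial time by the average time of a modified version gives the Modified Speedup figure reported in the performance data.
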
Please refer to the Optimization Notice page for more information regarding performance and optimization in Intel software products.