Image Processing: Averaging Filter

An averaging filter is a commonly used image-processing filter, mainly used to remove noise from an image. Noise is any pixel value that does not blend with the actual content of the photo, such as salt-and-pepper grains. The averaging filter relies on the assumption that a pixel's value does not differ drastically from those of its immediate neighbors; in other words, the value of a pixel depends largely on its immediate neighbors. The filter size determines how many neighbors are considered when computing each output pixel. The most commonly used filter size is 3x3.
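The idea can be illustrated with a minimal C++ sketch of a 3x3 averaging filter on a single-channel image. This is an illustration only, not the sample's code: the function name average3x3 is hypothetical, the image is assumed to be row-major grayscale, and border pixels are simply copied rather than computed from padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the sample): 3x3 mean filter on a
// row-major grayscale image. Border pixels are copied unchanged so the
// sketch needs no padding around the image.
std::vector<std::uint8_t> average3x3(const std::vector<std::uint8_t>& in,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(in);  // borders keep their input value
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            unsigned sum = 0;
            // Accumulate the pixel and its eight immediate neighbors.
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += in[(y + dy) * width + (x + dx)];
            out[y * width + x] = static_cast<std::uint8_t>(sum / 9);
        }
    }
    return out;
}
```

For a 3x3 image whose pixels are all 10 except a single "salt" pixel of 255 in the center, the filtered center becomes (8 * 10 + 255) / 9 = 37 in integer arithmetic, so the outlier is pulled toward its neighbors.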

This sample demonstrates how to increase the performance of the averaging filter using Intel(R) Cilk(TM) Plus. Both threading and SIMD solutions are explored during performance tuning, and their respective contributions to the speedup are evaluated.

 

Code Change Highlights:

Below are snapshots of the changes made to the application code to gain performance.
  • cilk_for
    linear version: AveragingFilter.cpp, Line Number 53
    for(int i = 1; i < (h+1); i++) {
        int x = ((resized_width * i) + 1);
        for(int j = x; j < (x + w); j++) {
            unsigned int red = 0, green = 0, blue = 0;
            for(int k1 = (-1); k1 <= 1; k1++) {
                int pos = j + (k1 * resized_width);
                for(int k2 = (-1); k2 <= 1; k2++) {
                    red += resized_indataset[(pos + k2)].red;
                    green += resized_indataset[(pos + k2)].green;
                    blue += resized_indataset[(pos + k2)].blue;
                }
            }
            resized_outdataset[j].red = red/9;
            resized_outdataset[j].green = green/9;
            resized_outdataset[j].blue = blue/9;
        }
    }
    cilk_for version: AveragingFilter.cpp, Line Number 139
    cilk_for(int i = 1; i < (h+1); i++) {
        int x = ((resized_width * i) + 1);
        for(int j = x; j < (x + w); j++) {
            unsigned int red = 0, green = 0, blue = 0;
            for(int k1 = (-1); k1 <= 1; k1++) {
                int pos = j + (k1 * resized_width);
                for(int k2 = (-1); k2 <= 1; k2++) {
                    red += resized_indataset[(pos + k2)].red;
                    green += resized_indataset[(pos + k2)].green;
                    blue += resized_indataset[(pos + k2)].blue;
                }
            }
            resized_outdataset[j].red = red/9;
            resized_outdataset[j].green = green/9;
            resized_outdataset[j].blue = blue/9;
        }
    }
  • Array Notation
    scalar version: AveragingFilter.cpp, Line Number 53 (the same loop as the linear version listed under cilk_for above)
    array notation version: AveragingFilter.cpp, Line Number 101
    for(int i = 1; i < (h+1); i++) {
        int x = ((resized_width * i) + 1);
        for(int j = x; j < (x + w); j+=8) {
            int row2index = 3*j - 3;
            int jump = (resized_width * 3);
            int row1index = row2index - jump;
            int row3index = row2index + jump;
            // Each array section spans 24 elements: 8 pixels x 3 color channels.
            out[(row2index + 3):24] =
                (in[row1index:24] + in[row1index+3:24] + in[row1index+6:24] +
                 in[row2index:24] + in[row2index+3:24] + in[row2index+6:24] +
                 in[row3index:24] + in[row3index+3:24] + in[row3index+6:24])/9;
        }
    }
  • cilk_for + Array Notation
    scalar version: AveragingFilter.cpp, Line Number 53 (the same loop as the linear version listed under cilk_for above)
    cilk_for + array notation version: AveragingFilter.cpp, Line Number 186
    cilk_for(int i = 1; i < (h+1); i++) {
        int x = ((resized_width * i) + 1);
        for(int j = x; j < (x + w); j+=8) {
            int row2index = 3*j - 3;
            int jump = (resized_width * 3);
            int row1index = row2index - jump;
            int row3index = row2index + jump;
            // Each array section spans 24 elements: 8 pixels x 3 color channels.
            out[(row2index + 3):24] =
                (in[row1index:24] + in[row1index+3:24] + in[row1index+6:24] +
                 in[row2index:24] + in[row2index+3:24] + in[row2index+6:24] +
                 in[row3index:24] + in[row3index+3:24] + in[row3index+6:24])/9;
        }
    }
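
The index arithmetic behind the array notation version can be checked with a portable C++ sketch. Array notation is a Cilk Plus extension that standard compilers do not accept, so the sketch swaps each 24-element section for an ordinary loop over byte lanes. The names averageSpan and averageChannel are hypothetical, and the sketch assumes that in and out are byte views over a padded, interleaved RGB buffer, which the snippets suggest but do not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one array-notation statement: filters `count`
// consecutive pixels starting at pixel index j of a padded, interleaved
// RGB byte buffer whose rows are paddedWidth pixels wide (assumed layout).
void averageSpan(const std::uint8_t* in, std::uint8_t* out,
                 int paddedWidth, int j, int count) {
    assert(count > 0 && j > paddedWidth);  // row above and left neighbor exist
    const int lanes = 3 * count;           // 8 pixels -> 24 byte lanes
    const int row2 = 3 * j - 3;            // byte index of the left neighbor
    const int jump = 3 * paddedWidth;      // bytes per padded row
    const int row1 = row2 - jump;          // same column, row above
    const int row3 = row2 + jump;          // same column, row below
    for (int k = 0; k < lanes; ++k) {      // one iteration per byte lane
        const int sum =
            in[row1 + k] + in[row1 + 3 + k] + in[row1 + 6 + k] +
            in[row2 + k] + in[row2 + 3 + k] + in[row2 + 6 + k] +
            in[row3 + k] + in[row3 + 3 + k] + in[row3 + 6 + k];
        out[row2 + 3 + k] = static_cast<std::uint8_t>(sum / 9);
    }
}

// Reference value for one channel of one pixel, computed neighbor by
// neighbor, used to cross-check the lane-based version.
std::uint8_t averageChannel(const std::uint8_t* in, int paddedWidth,
                            int pixel, int channel) {
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += in[3 * (pixel + dy * paddedWidth + dx) + channel];
    return static_cast<std::uint8_t>(sum / 9);
}
```

Because the channels are interleaved, shifting a byte index by 3 moves exactly one pixel to the right within the same channel, so the nine offsets (three rows, each at +0, +3 and +6) cover the full 3x3 neighborhood for every lane. Calling averageSpan on a span of 8 pixels should give the same bytes as calling averageChannel for each pixel and channel.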

 

Performance Data:

Note: Modified Speedup is the performance speedup relative to the serial implementation.

  • Windows system:
    • Modified Speedup: Array Notation (AN): 1.72x, cilk_for: 2.53x, Both: 3.04x
    • Compiler (Intel® 64): Intel C++ Compiler 15.0 for Windows
    • Compiler options: /Qrestrict /QxAVX /O2 /Qipo
    • System specifications: Windows Server 2012*, 2nd Generation Intel Xeon® E3 1280 CPU @ 3.50GHz, 8GB memory
  • Linux system:
    • Modified Speedup: Array Notation (AN): 1.43x, cilk_for: 1.90x, Both: 2.70x
    • Compiler (Intel® 64): Intel C++ Compiler 15.0 for Linux
    • Compiler options: -restrict -xAVX -O2 -ipo
    • System specifications: Ubuntu* 10.04 (x64), 3rd Generation Intel Core™ i7-2600K CPU @ 3.40GHz, 8GB memory
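
A speedup figure of this kind is the serial time divided by the optimized time. The following C++ sketch shows one way such a number might be measured, averaging over 5 runs as the sample's performance mode is described as doing. The helper names averageSeconds and speedup are hypothetical and are not taken from the sample's source.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `runs` times and returns the mean
// wall-clock time per run in seconds.
double averageSeconds(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return total / runs;
}

// Speedup relative to the serial baseline: values above 1 mean the
// optimized version is faster.
double speedup(double serialSeconds, double optimizedSeconds) {
    assert(serialSeconds > 0.0 && optimizedSeconds > 0.0);
    return serialSeconds / optimizedSeconds;
}
```

For example, if the serial filter averaged 6.0 seconds and an optimized variant averaged 2.0 seconds, speedup(6.0, 2.0) would report 3.0.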

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
    • Open the solution .sln file
    • [Optional] To collect performance numbers (runs the example 5 times and takes the average time):
      • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    • Choose a configuration (for best performance, choose a release configuration):
      • Intel-debug and Intel-release: use the Intel® C++ compiler
      • VSC-debug and VSC-release: use the Visual C++ compiler (only linear/scalar will run)
  • For Windows* Command Line users:
    • Enable your particular compiler environment
    • For Intel® C++ Compiler:
      • Open the appropriate Intel C++ compiler command prompt
      • Navigate to project folder
      • To compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: Build.bat run
    • For Visual C++ Compiler (only linear/scalar will run):
      • Open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt
      • Navigate to project folder
      • To compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: Build.bat run
  • For Linux* or OS X* users:
    • Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • Navigate to project folder
    • For Intel® C++ compiler:
      • To compile: make [icpc] [perf_num=1]
        • perf_num=1: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: make run
    • For gcc (only linear/scalar will run):
      • To compile: make gcc [perf_num=1]
        • perf_num=1: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: make run
For more complete information about compiler optimizations, see our Optimization Notice.