Image Processing: Averaging Filter with Intel® SDLT

An averaging filter is a commonly used filter in image processing, mainly used to remove noise from an image. Noise is any pixel value that does not blend in with the actual content of the picture, such as salt-and-pepper grains. The averaging filter relies on the assumption that a pixel's value does not change drastically from those of its immediate neighbors, so each output pixel is computed as the mean of the pixel and its neighbors. The filter size determines how many neighbors are included in that mean; the most commonly used size is 3x3.

This sample demonstrates how to improve the performance of an averaging filter using the Intel® SIMD Data Layout Template library (Intel® SDLT). Intel® SDLT provides an Array of Structures (AoS) interface to the user but stores the data in memory in Structure of Arrays (SoA) format.

 
System Requirements:
  • Hardware:
    • Any Intel® processor with Intel® Advanced Vector Extensions (Intel® AVX) support, such as 2nd Generation Intel® Core™ i3, i5, or i7 processors and the Intel® Xeon® E3 or E5 processor family, or newer
  • Software requirements on Microsoft* Windows*:
    • Microsoft Visual Studio 2012*, 2013*, or 2015* Professional Edition or above
    • Intel® Parallel Studio XE 2016 Composer Edition Update 1 (or higher) for C++ Windows*
  • Software requirements on Linux*:
    • GNU* GCC 4.5 or above
    • Intel® Parallel Studio XE 2016 Composer Edition Update 1 (or higher) for C++ Linux*
  • Note: the sample application requires the Intel® C++ compiler to build.
 

Code Change Highlights:

The code snippets below show the changes made to the application to improve performance.
  • Intel® SDLT
  • scalar version: AveragingFilter.cpp, Line Number 41
                ALIGN
                void process_image_serial(rgb8 *indataset, rgb8 *outdataset, int w, int h) {
                    #ifdef __INTEL_COMPILER
                    __assume_aligned(indataset, ALIGNMENT);
                    __assume_aligned(outdataset, ALIGNMENT);
                    #endif
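                    // Skip the one-pixel border: the 3x3 window fits inside the image only
                    // for rows 1..h-2 and columns 1..w-2, i.e. w - 2 pixels per row.
                    int reduced_width = w - 2;
                    t1.start();
                    for(int i = 1; i < (h-1); i++)
                    {
                        int x = (w * i) + 1;  // flat index of column 1 in row i
                        for (int j = x; j < (x + reduced_width); j++)
                        {
                            // Widen the nine neighbors to 16 bits so their sum cannot overflow.
                            const rgb16 p00(indataset[j - w - 1]);
                            const rgb16 p01(indataset[j - w]);
                            const rgb16 p02(indataset[j - w + 1]);
                            const rgb16 p10(indataset[j - 1]);
                            const rgb16 p11(indataset[j]);
                            const rgb16 p12(indataset[j + 1]);
                            const rgb16 p20(indataset[j + w - 1]);
                            const rgb16 p21(indataset[j + w]);
                            const rgb16 p22(indataset[j + w + 1]);
                            // Sum all nine pixels of the window: row above, current row, row below.
                            rgb16 sum = p00 + p01 + p02 + p10 + p11 + p12 + p20 + p21 + p22;
                            const rgb16 sum1 = sum / 9;
                            outdataset[j] = rgb8(sum1);
                        }
                    }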
                    t1.stop();
                    average += t1.get_ticks();
                    return;
                }
                
    Intel® SDLT version: AveragingFilter.cpp, Line Number 94
                SDLT_NOINLINE ALIGN
                void process_image_sdlt(Container::accessor indataset, Container::accessor outdataset, int w, int h)
                {
                    int reduced_width = w - 2;
                    t1.start();
    
                    SDLT_INLINE_BLOCK
                    {
                        for (int i = 1; i < (h - 1); i++)
                        {
                            int x = ((w * i) + 1);
                            // Ask the compiler to vectorize the inner loop over j.
                            #pragma simd
                            for (int j = x; j < (x + reduced_width); j++)
                            {
                                // indataset[k] returns an SDLT proxy; unproxy() yields the rgb8
                                // value, which is then widened to rgb16 for accumulation.
                                const rgb16 p00(unproxy(indataset[j - w - 1]));
                                const rgb16 p01(unproxy(indataset[j - w]));
                                const rgb16 p02(unproxy(indataset[j - w + 1]));
                                const rgb16 p10(unproxy(indataset[j - 1]));
                                const rgb16 p11(unproxy(indataset[j]));
                                const rgb16 p12(unproxy(indataset[j + 1]));
                                const rgb16 p20(unproxy(indataset[j + w - 1]));
                                const rgb16 p21(unproxy(indataset[j + w]));
                                const rgb16 p22(unproxy(indataset[j + w + 1]));
                                // Sum all nine pixels of the window: row above, current row, row below.
                                rgb16 sum = p00 + p01 + p02 + p10 + p11 + p12 + p20 + p21 + p22;
                                const rgb16 sum1 = sum / 9;
                                outdataset[j] = rgb8(sum1);
                            }
                        }
                    }
                    t1.stop();
                    average += t1.get_ticks();
                    return;
                }
                

Performance Gain with Intel® SDLT

  • Intel® SDLT: 3.3x speedup
    • Compiler (Intel® 64): Intel® Parallel Studio XE 2016 Composer Edition Update 1 for C++ Windows*
    • Compiler options: /Qrestrict /QxAVX /O2 /Qstd=c++11
    • System specifications: Windows Server 2012*, 6th Gen Intel® Core™ i7-6700 CPU @ 3.40GHz, 8GB memory
  • Intel® SDLT: 3.8x speedup
    • Compiler (Intel® 64): Intel® Parallel Studio XE 2016 Composer Edition Update 1 for C++ Linux*
    • Compiler options: -restrict -xAVX -O2 -std=c++11
    • System specifications: Ubuntu* 12.04, 4th Gen Intel® Core™ i5-4670T CPU @ 2.30GHz, 8GB memory

Build Instructions:

  • For Microsoft Visual Studio 2012*, 2013*, or 2015* users:
    • Open Visual Studio and load the solution.sln file.
    • [Optional] To collect performance numbers (runs the example 5 times and takes the average time):
      • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    • Choose a configuration (for best performance, choose a release configuration):
      • Intel-debug: Debug configuration using the Intel C++ compiler
      • Intel-release: Release configuration using the Intel C++ compiler
  • For Windows command line users:
    • Enable your particular compiler environment.
    • For the Intel C++ compiler:
      • Open the appropriate Intel C++ compiler command prompt.
      • Navigate to the project folder.
      • Compile with Build.bat [perf_num]
        • perf_num: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: Build.bat run
  • For Linux* users:
    • From a terminal window, navigate to the project folder.
    • Using the Intel® C++ compiler:
      • Set the environment: source <icc-install-dir>/bin/compilervars.sh ia32 or intel64
      • To compile: make [icpc] [perf_num=1]
        • perf_num=1: collect performance numbers (runs the example 5 times and takes the average time)
      • To run: make run
Last Updated: Tuesday, November 10, 2015