Image Processing: Discrete Cosine Transforms

Discrete Cosine Transform (DCT) and quantization are two core stages of the JPEG compression standard. This sample shows how the DCT and quantization stages can be made to run faster using Intel® Cilk™ Plus. The DCT expresses a finite block of data points as a sum of cosine functions oscillating at different frequencies. These cosine basis functions are mutually orthogonal. The transform itself is invertible and does not lose information. The loss in JPEG comes from quantization, which rounds the DCT coefficients to a coarser precision. To make the resulting reduction in image quality visible, the program passes the output of the quantization stage to the dequantizer, applies the Inverse Discrete Cosine Transform (IDCT), and saves the result as a bitmap image.
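
The following minimal C++ sketch makes the pipeline concrete by sending one 8x8 block through DCT, quantization, dequantization and IDCT. It uses the direct JPEG formulas rather than the sample's matrix code. The quantization table kLumaQ is the example luminance table from Annex K of the JPEG standard, assumed here for illustration because the sample's own table is not shown. All function names are hypothetical.

```cpp
#include <cmath>

constexpr int N = 8;  // JPEG block edge

// Assumed quantization table: the example luminance table from JPEG Annex K.
const int kLumaQ[N][N] = {
    {16, 11, 10, 16, 24, 40, 51, 61},
    {12, 12, 14, 19, 26, 58, 60, 55},
    {14, 13, 16, 24, 40, 57, 69, 56},
    {14, 17, 22, 29, 51, 87, 80, 62},
    {18, 22, 37, 56, 68, 109, 103, 77},
    {24, 35, 55, 64, 81, 104, 113, 92},
    {49, 64, 78, 87, 103, 121, 120, 101},
    {72, 92, 95, 98, 112, 100, 103, 99}};

// C(u) = 1/sqrt(2) for u == 0, otherwise 1.
inline double dct_scale(int u) { return u == 0 ? 1.0 / std::sqrt(2.0) : 1.0; }

// Cosine basis term cos((2x + 1) * u * pi / 16).
inline double basis(int x, int u) {
    const double pi = std::acos(-1.0);
    return std::cos((2 * x + 1) * u * pi / (2 * N));
}

// Forward 2D DCT: F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) basis(x,u) basis(y,v).
void fdct8x8(const double f[N][N], double F[N][N]) {
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v) {
            double sum = 0.0;
            for (int x = 0; x < N; ++x)
                for (int y = 0; y < N; ++y)
                    sum += f[x][y] * basis(x, u) * basis(y, v);
            F[u][v] = 0.25 * dct_scale(u) * dct_scale(v) * sum;
        }
}

// Inverse 2D DCT: f(x,y) = 1/4 sum_u sum_v C(u) C(v) F(u,v) basis(x,u) basis(y,v).
void idct8x8(const double F[N][N], double f[N][N]) {
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y) {
            double sum = 0.0;
            for (int u = 0; u < N; ++u)
                for (int v = 0; v < N; ++v)
                    sum += dct_scale(u) * dct_scale(v) * F[u][v] * basis(x, u) * basis(y, v);
            f[x][y] = 0.25 * sum;
        }
}

// Quantization (the lossy step): divide by the table entry, round to nearest.
void quantize(const double F[N][N], int q[N][N]) {
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v)
            q[u][v] = static_cast<int>(std::lround(F[u][v] / kLumaQ[u][v]));
}

// Dequantization multiplies back; the rounding error cannot be undone.
void dequantize(const int q[N][N], double F[N][N]) {
    for (int u = 0; u < N; ++u)
        for (int v = 0; v < N; ++v)
            F[u][v] = q[u][v] * static_cast<double>(kLumaQ[u][v]);
}
```

Calling fdct8x8 and then idct8x8 reproduces the input to within floating-point rounding. If quantize and dequantize are inserted between them, each coefficient can change by up to half of its table entry. That bounded rounding error is where the visible quality loss comes from.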

This sample includes a serial implementation of the 2D-DCT (two-dimensional DCT) algorithm and an Array Notation (AN) version that adds explicit vectorization. It also includes versions that add threading with cilk_for, both on its own and combined with Array Notation so that threading and vectorization are used together.
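
The 8x8 two-dimensional DCT is separable, so it can be computed as two ordinary matrix products. This is presumably why the sample implements matrix multiplication (the operator* snapshots below). The equations here are the standard textbook matrix form of the JPEG DCT. They are not taken from the sample's source.

```latex
T_{ux} = \frac{C(u)}{2}\cos\frac{(2x+1)u\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}}, \quad C(u) = 1 \text{ for } u > 0, \quad u, x = 0, \dots, 7

F = T\, f\, T^{\mathsf{T}}, \qquad f = T^{\mathsf{T}} F\, T \qquad (\text{since } T\,T^{\mathsf{T}} = I)

F(u,v) = \sum_{x=0}^{7}\sum_{y=0}^{7} T_{ux}\, f(x,y)\, T_{vy} = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}
```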

 

Code Change Highlights:

Below are snapshots of the code changes made in the application to improve performance.
  • cilk_for
    Linear version (DCT.cpp, line 293):
      for (int i = 0; i < size_of_image / 64; i++) {
          startindex = i * 64;
          process_image_serial(indata, outdata, startindex);
      }
    cilk_for version (DCT.cpp, line 303):
      cilk_for (int i = 0; i < size_of_image / 64; i++) {
          // Declared inside the loop so that each parallel iteration has its own
          // copy; assigning to a variable shared across iterations would be a data race.
          int startindex = i * 64;
          process_image_serial(indata, outdata, startindex);
      }
  • Array Notation
    Scalar version (matrix.cpp, line 81):
      matrix_serial matrix_serial::operator*(matrix_serial &y) {
          int size = y.row_size;
          matrix_serial temp(size);
          for (int i = 0; i < size; i++) {
              for (int j = 0; j < size; j++) {
                  temp.ptr[(i * size) + j] = 0;
                  for (int k = 0; k < size; k++)
                      temp.ptr[(i * size) + j] += ptr[(i * size) + k] * y.ptr[(k * size) + j];
              }
          }
          return temp;
      }
    Array Notation version (matrix.cpp, line 17):
      matrix_AN matrix_AN::operator*(matrix_AN &y) {
          int size = row_size;
          matrix_AN temp(size);
          for (int i = 0; i < size; i++) {
              // Zero row i of the result with a single array-section assignment.
              temp.ptr[(i * size):size] = 0;
              // Add element (i, j) times row j of y to row i of the result;
              // each statement works on a whole row, so it vectorizes.
              for (int j = 0; j < size; j++) {
                  temp.ptr[(i * size):size] = temp.ptr[(i * size):size]
                                            + ptr[(i * size) + j] * y.ptr[(j * size):size];
              }
          }
          return temp;
      }
  • cilk_for + Array Notation
    The cilk_for and Array Notation implementations shown above are combined to compute the DCT and IDCT of the image.
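
The combined approach can be approximated in portable C++. This is a hypothetical sketch, not the sample's code. Cilk Plus is no longer supported by current GCC releases, so two stand-ins are used:

  • an OpenMP parallel for takes the place of cilk_for
  • a row-oriented loop that compilers can auto-vectorize takes the place of the Array Notation statement

The sketch also assumes that each consecutive group of 64 values in the image buffer holds one 8x8 block in row-major order, as the size_of_image / 64 loop suggests. The names matmul_rows, dct_matrix, transpose, dct_block and dct_image are illustrative.

```cpp
#include <cmath>
#include <vector>

constexpr int B = 8;            // block edge
constexpr int kBlockSize = 64;  // values per block (assumed block-ordered buffer)

// c = a * b for B x B row-major matrices. As in the Array Notation version,
// the innermost loop updates a whole contiguous row of c (auto-vectorizable).
void matmul_rows(const double* a, const double* b, double* c) {
    for (int i = 0; i < B; ++i) {
        double* crow = c + i * B;
        for (int k = 0; k < B; ++k) crow[k] = 0.0;
        for (int j = 0; j < B; ++j) {
            const double aij = a[i * B + j];
            const double* brow = b + j * B;
            for (int k = 0; k < B; ++k) crow[k] += aij * brow[k];
        }
    }
}

// DCT matrix T[u][x] = C(u)/2 * cos((2x + 1) * u * pi / 16).
std::vector<double> dct_matrix() {
    const double pi = std::acos(-1.0);
    std::vector<double> T(kBlockSize);
    for (int u = 0; u < B; ++u) {
        const double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
        for (int x = 0; x < B; ++x)
            T[u * B + x] = 0.5 * cu * std::cos((2 * x + 1) * u * pi / (2 * B));
    }
    return T;
}

std::vector<double> transpose(const std::vector<double>& m) {
    std::vector<double> t(kBlockSize);
    for (int r = 0; r < B; ++r)
        for (int col = 0; col < B; ++col) t[col * B + r] = m[r * B + col];
    return t;
}

// Forward DCT of one block: out = T * in * T^T.
void dct_block(const double* T, const double* Tt, const double* in, double* out) {
    double tmp[kBlockSize];
    matmul_rows(T, in, tmp);
    matmul_rows(tmp, Tt, out);
}

// Blocks are independent, so the block loop can run in parallel. Without
// -fopenmp the pragma is ignored and the loop runs serially.
void dct_image(const std::vector<double>& in, std::vector<double>& out) {
    const std::vector<double> T = dct_matrix();
    const std::vector<double> Tt = transpose(T);
    out.assign(in.size(), 0.0);
    const long nblocks = static_cast<long>(in.size()) / kBlockSize;
    #pragma omp parallel for
    for (long i = 0; i < nblocks; ++i) {
        const long startindex = i * kBlockSize;  // private to each iteration
        dct_block(T.data(), Tt.data(), in.data() + startindex, out.data() + startindex);
    }
}
```

Compile with -fopenmp (GCC or Clang) to run the block loop on multiple threads. T and Tt are read-only, and each iteration writes a disjoint 64-value slice of out, so the loop is free of data races. The inverse transform follows the same pattern with Tt * F * T.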

 

Performance Data:

Note: Modified Speedup is the speedup of each optimized version relative to the serial implementation.

| Modified Speedup | Compiler (Intel® 64) | Compiler options | System specifications |
| --- | --- | --- | --- |
| AN: 2.05x; cilk_for: 4.37x; Both: 8.01x | Intel C++ Compiler 15.0 for Windows | /O2 /Oi /fp:fast /QxAVX | Windows Server 2012*; 2nd Generation Intel® Xeon® E3-1280 CPU @ 3.50GHz; 8GB memory |
| AN: 2.35x; cilk_for: 3.63x; Both: 8.53x | Intel C++ Compiler 15.0 for Linux | -O2 -fp-model fast -xAVX | Ubuntu* 10.04; 2nd Generation Intel® Core™ i7-2600K CPU @ 3.40GHz; 8GB memory |
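
Each speedup in the table is a ratio of run times, T_serial / T_variant. According to the build instructions, defining PERF_NUM makes the sample run the example 5 times and average the time. The sketch below shows one way to take such a measurement, assuming wall-clock timing with std::chrono::steady_clock. average_seconds and speedup are hypothetical helpers, and the sample's own timer may work differently.

```cpp
#include <chrono>
#include <functional>

// Run `work` `runs` times and return the mean wall-clock time in seconds.
double average_seconds(const std::function<void()>& work, int runs = 5) {
    using steady = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = steady::now();
        work();
        const std::chrono::duration<double> elapsed = steady::now() - start;
        total += elapsed.count();
    }
    return total / runs;
}

// Speedup of a variant relative to the serial version.
double speedup(double serial_seconds, double variant_seconds) {
    return serial_seconds / variant_seconds;
}
```

For example, speedup(average_seconds(run_serial), average_seconds(run_both)) would produce a "Both" figure. Here run_serial and run_both are placeholders for the sample's serial and combined entry points.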

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
    • Open the solution (.sln) file.
    • [Optional] To collect performance numbers (runs the example 5 times and reports the average time):
      • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    • Choose a configuration (for best performance, choose a release configuration):
      • Intel-debug and Intel-release: use the Intel® C++ Compiler
      • VSC-debug and VSC-release: use the Visual C++ compiler (only the linear/scalar version runs)
  • For Windows* command-line users:
    • Enable your particular compiler environment.
    • For the Intel® C++ Compiler:
      • Open the appropriate Intel C++ Compiler command prompt.
      • Navigate to the project folder.
      • To compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (runs the example 5 times and reports the average time)
      • To run: Build.bat run
    • For the Visual C++ compiler (only the linear/scalar version runs):
      • Open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt.
      • Navigate to the project folder.
      • To compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (runs the example 5 times and reports the average time)
      • To run: Build.bat run
  • For Linux* or OS X* users:
    • Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • Navigate to the project folder.
    • For the Intel® C++ Compiler:
      • To compile: make [icpc] [perf_num=1]
        • perf_num=1: collect performance numbers (runs the example 5 times and reports the average time)
      • To run: make run
    • For gcc (only the linear/scalar version runs):
      • To compile: make gcc [perf_num=1]
        • perf_num=1: collect performance numbers (runs the example 5 times and reports the average time)
      • To run: make run
Last Updated: Monday, September 30, 2013
For more complete information about compiler optimizations, see our Optimization Notice.