3D Running Average SSE algorithm


January 25, 2009 10:00 PM PST


The 3D running-average algorithm is implemented with SSE for single-precision floating-point (FP, SP) input data. The averaging window is fixed at 11, the value requested by the customer who initiated this work. The ideas behind the current implementation make it straightforward to build versions for other averaging-window sizes as well.
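To make clear what is being computed, here is a minimal scalar C sketch. It is not the SSE code from the attached project. It assumes that the output at each voxel is the mean over the W x W x W cube centered on it, with the window truncated at the volume borders. The border policy, the x-fastest memory layout, and the names `running_average_3d` and `running_mean_1d` are assumptions made for illustration. Because a box average is separable, it can be computed as three 1D running means (along x, then y, then z). Each pass costs O(1) per sample regardless of W, since one sample enters and one leaves the running sum per step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1D running mean over n samples spaced `stride` floats apart.
   Window is [i-h, i+h], truncated at both ends (assumed border policy).
   src and dst must not overlap. */
static void running_mean_1d(const float *src, float *dst,
                            size_t n, size_t stride, size_t h)
{
    double sum = 0.0;      /* double accumulator limits add/sub drift */
    size_t lo = 0, hi = 0; /* samples [lo, hi) are currently in `sum` */
    for (size_t i = 0; i < n; ++i) {
        size_t new_lo = (i > h) ? i - h : 0;
        size_t new_hi = (i + h + 1 < n) ? i + h + 1 : n;
        for (; hi < new_hi; ++hi) sum += src[hi * stride]; /* sample enters */
        for (; lo < new_lo; ++lo) sum -= src[lo * stride]; /* sample leaves */
        dst[i * stride] = (float)(sum / (double)(hi - lo));
    }
}

/* Hypothetical 3D box average with odd window w (w = 11 in this article).
   Layout: x fastest, index = (z*ny + y)*nx + x. `tmp` is scratch space of
   the same size as the volume. Three separable 1D passes: x, y, then z. */
void running_average_3d(const float *in, float *out, float *tmp,
                        size_t nx, size_t ny, size_t nz, size_t w)
{
    assert(w % 2 == 1);
    size_t h = w / 2;
    for (size_t z = 0; z < nz; ++z)            /* pass 1: x, in -> tmp */
        for (size_t y = 0; y < ny; ++y)
            running_mean_1d(in + (z * ny + y) * nx,
                            tmp + (z * ny + y) * nx, nx, 1, h);
    for (size_t z = 0; z < nz; ++z)            /* pass 2: y, tmp -> out */
        for (size_t x = 0; x < nx; ++x)
            running_mean_1d(tmp + z * ny * nx + x,
                            out + z * ny * nx + x, ny, nx, h);
    for (size_t y = 0; y < ny; ++y)            /* pass 3: z, out -> tmp */
        for (size_t x = 0; x < nx; ++x)
            running_mean_1d(out + y * nx + x, tmp + y * nx + x,
                            nz, nx * ny, h);
    memcpy(out, tmp, nx * ny * nz * sizeof *out);
}
```

For W = 11 the half-width is h = 5, so interior voxels average 1331 inputs, while each pass only does one add and one subtract per sample.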

Please find attached:

  1. A PowerPoint presentation describing the algorithm.
  2. A ZIP file containing the C implementation, embedded in a simple benchmarking application. The project is built with Microsoft Visual Studio 2005.

The command line has the form <appName Xsize YSize Zsize AveragingWindow NumberOfRunnings>, for example <Averaging 96 116 56 11 30>. As stated above, AveragingWindow can currently only be 11.
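A minimal sketch of how such a command line could be validated is shown below. The struct `avg_args` and the function `parse_avg_args` are hypothetical names, not taken from the attached project. The sketch assumes that all five values must be positive decimal integers and rejects any window other than 11, mirroring the restriction stated above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the five numeric arguments. */
typedef struct {
    long xsize, ysize, zsize;  /* volume dimensions */
    long window;               /* AveragingWindow */
    long runs;                 /* NumberOfRunnings */
} avg_args;

/* Parses <appName Xsize YSize Zsize AveragingWindow NumberOfRunnings>.
   Returns 0 on success, -1 if the argument count, a value, or the window
   is invalid. Overflow checking via errno is omitted in this sketch. */
int parse_avg_args(int argc, char **argv, avg_args *out)
{
    assert(argv != NULL && out != NULL);  /* caller bug, not user input */
    if (argc != 6)
        return -1;
    long v[5];
    for (int i = 0; i < 5; ++i) {
        char *end;
        v[i] = strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0' || v[i] <= 0)
            return -1;  /* not a positive decimal integer */
    }
    out->xsize = v[0];
    out->ysize = v[1];
    out->zsize = v[2];
    out->window = v[3];
    out->runs = v[4];
    /* Only the 11-point window is implemented in the described version. */
    return (out->window == 11) ? 0 : -1;
}
```

In a `main(int argc, char **argv)`, the program would call `parse_avg_args(argc, argv, &args)` and print the usage line if it returns -1.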

The speed-up (serial run time / SSE run time) achieved by this implementation is about 4x on Merom and Penryn platforms. Multithreading with OpenMP adds little further improvement, because the application becomes limited by memory bandwidth.
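A gain of about 4x is what one would expect from SSE processing four single-precision values per 128-bit register. The hypothetical sketch below is not taken from the attached project. It shows one way to vectorize a running mean along y: four adjacent x-columns share one `__m128` accumulator, so each `_mm_add_ps` or `_mm_sub_ps` advances four running sums at once. It assumes an x-fastest layout (index `(z*ny + y)*nx + x`), a window truncated at the borders, and `nx` a multiple of 4, and it needs an x86 compiler with SSE. The `#pragma omp parallel for` over independent z-slices illustrates the OpenMP split. With only a few arithmetic operations per loaded value, a loop like this tends to be bound by memory traffic rather than computation, which fits the small multithreading gain reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps, _mm_div_ps, ... */

/* Running mean along y (window [y-h, y+h], truncated at the ends) for every
   z-slice of an nx*ny*nz volume laid out as index = (z*ny + y)*nx + x.
   Four adjacent x-columns share one __m128 accumulator.
   src and dst must not overlap. */
void running_mean_y_sse(const float *src, float *dst,
                        size_t nx, size_t ny, size_t nz, size_t h)
{
    assert(nx % 4 == 0);  /* sketch: no scalar remainder loop */
    /* Slices are independent; this splits them across OpenMP threads when
       compiled with OpenMP enabled and is ignored otherwise. A signed loop
       index keeps it valid for OpenMP 2.0 compilers. */
    #pragma omp parallel for
    for (long z = 0; z < (long)nz; ++z) {
        const float *s = src + (size_t)z * ny * nx;
        float *d = dst + (size_t)z * ny * nx;
        for (size_t x = 0; x < nx; x += 4) {
            __m128 sum = _mm_setzero_ps();  /* four running sums */
            size_t lo = 0, hi = 0;          /* rows [lo, hi) are in sum */
            for (size_t y = 0; y < ny; ++y) {
                size_t new_lo = (y > h) ? y - h : 0;
                size_t new_hi = (y + h + 1 < ny) ? y + h + 1 : ny;
                for (; hi < new_hi; ++hi)   /* rows entering the window */
                    sum = _mm_add_ps(sum, _mm_loadu_ps(s + hi * nx + x));
                for (; lo < new_lo; ++lo)   /* rows leaving the window */
                    sum = _mm_sub_ps(sum, _mm_loadu_ps(s + lo * nx + x));
                __m128 cnt = _mm_set1_ps((float)(hi - lo));
                _mm_storeu_ps(d + y * nx + x, _mm_div_ps(sum, cnt));
            }
        }
    }
}
```

Keeping the running sums in single precision is what allows four lanes per register, at the cost of some rounding drift along long lines. Periodically re-summing the window is one possible mitigation.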