January 25, 2009 10:00 PM PST
A 3D running-average algorithm using SSE has been implemented for single-precision floating-point (FP, SP) input data. The averaging window is fixed at 11, the value requested by the customer who initiated this work. Using the ideas in the current implementation, it is straightforward to build versions for other averaging windows as well.
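The post does not spell out the averaging step itself, so here is a minimal scalar sketch. It assumes that a 3D running average means the mean over a W x W x W cube (W = 11 here) and that only fully covered ("valid") output positions are kept. Under that assumption the box filter is separable into three 1D running-sum passes along x, y and z, and each pass costs O(1) per element regardless of W. The names `running_avg_1d` and `running_avg_3d`, the x-fastest memory layout and the edge handling are hypothetical choices for illustration. This is not the attached project's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: 1D running average of window w over n samples read
 * at in[i * in_stride]; writes the n - w + 1 "valid" results to
 * out[j * out_stride]. */
static void running_avg_1d(const float *in, size_t in_stride,
                           float *out, size_t out_stride,
                           size_t n, size_t w)
{
    assert(w >= 1 && w <= n);
    const float inv = 1.0f / (float)w;
    float sum = 0.0f;
    for (size_t i = 0; i < w; ++i)            /* sum of the first window */
        sum += in[i * in_stride];
    out[0] = sum * inv;
    for (size_t i = w; i < n; ++i) {          /* slide: add newest, drop oldest */
        sum += in[i * in_stride] - in[(i - w) * in_stride];
        out[(i - w + 1) * out_stride] = sum * inv;
    }
}

/* Hypothetical driver: mean over each w*w*w cube of an nx*ny*nz array stored
 * x-fastest, i.e. index (z * ny + y) * nx + x. out must hold mx*my*mz floats
 * with m? = n? - w + 1. Returns 0 on success, -1 if scratch allocation fails. */
static int running_avg_3d(const float *in, size_t nx, size_t ny, size_t nz,
                          size_t w, float *out)
{
    assert(w >= 1 && w <= nx && w <= ny && w <= nz);
    const size_t mx = nx - w + 1, my = ny - w + 1;
    float *t1 = malloc(mx * ny * nz * sizeof *t1);   /* after the x pass */
    float *t2 = malloc(mx * my * nz * sizeof *t2);   /* after the y pass */
    if (!t1 || !t2) { free(t1); free(t2); return -1; }

    for (size_t z = 0; z < nz; ++z)           /* x pass: contiguous lines */
        for (size_t y = 0; y < ny; ++y)
            running_avg_1d(in + (z * ny + y) * nx, 1,
                           t1 + (z * ny + y) * mx, 1, nx, w);

    for (size_t z = 0; z < nz; ++z)           /* y pass: stride mx */
        for (size_t x = 0; x < mx; ++x)
            running_avg_1d(t1 + z * ny * mx + x, mx,
                           t2 + z * my * mx + x, mx, ny, w);

    for (size_t y = 0; y < my; ++y)           /* z pass: stride mx * my */
        for (size_t x = 0; x < mx; ++x)
            running_avg_1d(t2 + y * mx + x, mx * my,
                           out + y * mx + x, mx * my, nz, w);

    free(t1);
    free(t2);
    return 0;
}
```

For the example size 96 x 116 x 56 with W = 11, `out` would need 86 x 106 x 46 floats. Because a single-precision running sum accumulates rounding error along long lines, accumulating in `double` or periodically recomputing the window sum are common safeguards.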
Please find attached:
- A PowerPoint presentation describing the algorithm.
- A ZIP file containing the C implementation, built into a simple benchmarking application. The project is set up for Microsoft Visual Studio 2005.
The command line has the form <appName Xsize YSize Zsize AveragingWindow NumberOfRunnings>, for example <Averaging 96 116 56 11 30>. As noted above, AveragingWindow can currently only be 11.
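The following sketch shows one way to parse and validate the documented argument order. The struct `avg_args` and the functions `parse_positive` and `parse_avg_args` are hypothetical names. Rejecting any window other than 11 mirrors the stated limitation and does not reproduce the attached benchmark's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical container for:
 * appName Xsize YSize Zsize AveragingWindow NumberOfRunnings */
struct avg_args {
    long xsize, ysize, zsize;
    long window;   /* AveragingWindow: only 11 is supported for now */
    long runs;     /* NumberOfRunnings: benchmark repetitions */
};

/* Parses one strictly positive decimal integer; returns 0 on success. */
static int parse_positive(const char *s, long *value)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0)
        return -1;
    *value = v;
    return 0;
}

/* Returns 0 on success, -1 on a malformed or unsupported command line. */
static int parse_avg_args(int argc, char **argv, struct avg_args *a)
{
    assert(a != NULL);
    if (argc != 6)
        return -1;
    long *fields[5] = { &a->xsize, &a->ysize, &a->zsize, &a->window, &a->runs };
    for (int i = 0; i < 5; ++i)
        if (parse_positive(argv[i + 1], fields[i]) != 0)
            return -1;
    if (a->window != 11)   /* mirrors the current implementation limit */
        return -1;
    return 0;
}
```

A `main` would call `parse_avg_args(argc, argv, &args)` and print the usage line on failure.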
The speed-up (serial run time / SSE run time) achieved by this implementation is about 4x on Merom/Penryn platforms. Multithreading with OpenMP adds only a small further improvement, because the application becomes limited by memory bandwidth.
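One common way to vectorize such a filter with SSE is to work across the contiguous x axis. A running sum along z can then process four adjacent columns in one `__m128` register, so each SSE instruction advances four independent running averages. This sketch assumes x86 with SSE, an x-fastest layout and "valid" output only. The name `running_avg_z_sse` is hypothetical, and unaligned loads are used for simplicity. The column loop also carries an OpenMP `parallel for`, which compilers ignore when OpenMP is not enabled. For each group of four outputs the loop does only two loads, one store and three arithmetic operations. After vectorization, memory traffic therefore likely dominates, which is consistent with OpenMP adding little on top.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical kernel: running average of window w along z for an
 * nx*ny*nz array stored x-fastest; out receives nz - w + 1 planes. */
static void running_avg_z_sse(const float *in, float *out,
                              size_t nx, size_t ny, size_t nz, size_t w)
{
    assert(w >= 1 && w <= nz);
    const size_t plane = nx * ny;               /* floats per z-plane */
    const float inv = 1.0f / (float)w;
    const __m128 vinv = _mm_set1_ps(inv);
    const long nvec = (long)(plane / 4 * 4);    /* columns handled by SSE */

    /* Each iteration owns 4 adjacent columns, so iterations are independent
     * and can be split across OpenMP threads. */
    #pragma omp parallel for
    for (long i = 0; i < nvec; i += 4) {
        __m128 sum = _mm_setzero_ps();
        for (size_t z = 0; z < w; ++z)          /* first window */
            sum = _mm_add_ps(sum, _mm_loadu_ps(in + z * plane + i));
        _mm_storeu_ps(out + i, _mm_mul_ps(sum, vinv));
        for (size_t z = w; z < nz; ++z) {       /* add newest, drop oldest */
            __m128 diff = _mm_sub_ps(_mm_loadu_ps(in + z * plane + i),
                                     _mm_loadu_ps(in + (z - w) * plane + i));
            sum = _mm_add_ps(sum, diff);
            _mm_storeu_ps(out + (z - w + 1) * plane + i, _mm_mul_ps(sum, vinv));
        }
    }

    for (size_t i = (size_t)nvec; i < plane; ++i) {  /* scalar remainder */
        float sum = 0.0f;
        for (size_t z = 0; z < w; ++z)
            sum += in[z * plane + i];
        out[i] = sum * inv;
        for (size_t z = w; z < nz; ++z) {
            sum += in[z * plane + i] - in[(z - w) * plane + i];
            out[(z - w + 1) * plane + i] = sum * inv;
        }
    }
}
```

Along x the data is contiguous within one register, so a running sum in that direction would need shuffles. Vectorizing the y and z passes across x avoids that.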

