If you are familiar with the Intel® Integrated Performance Primitives (Intel® IPP) library you know that it
The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit input data.
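A plain scalar reference is handy for checking an SIMD convolution against known-correct output. The sketch below is not the presentation's code. It assumes unsigned 16-bit samples, signed 16-bit kernel taps, an x-fastest volume layout, "valid" output (no border handling), and an unflipped kernel (correlation, as is common in image processing). The function name `conv3d_valid_u16` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a "valid" 3D convolution.
 * Assumptions: src is uint16_t, ker is int16_t, layout is x-fastest,
 * i.e. index = (z * ny + y) * nx + x. The kernel is not flipped.
 * Output size is (nx-kx+1) x (ny-ky+1) x (nz-kz+1); requires nx>=kx, ny>=ky, nz>=kz.
 * A 64-bit accumulator is used because a sum of uint16*int16 products
 * can overflow 32 bits. */
void conv3d_valid_u16(const uint16_t *src, size_t nx, size_t ny, size_t nz,
                      const int16_t *ker, size_t kx, size_t ky, size_t kz,
                      int64_t *dst)
{
    size_t ox = nx - kx + 1, oy = ny - ky + 1, oz = nz - kz + 1;
    for (size_t z = 0; z < oz; ++z)
        for (size_t y = 0; y < oy; ++y)
            for (size_t x = 0; x < ox; ++x) {
                int64_t acc = 0;
                /* multiply-accumulate over the kx*ky*kz neighbourhood */
                for (size_t k = 0; k < kz; ++k)
                    for (size_t j = 0; j < ky; ++j)
                        for (size_t i = 0; i < kx; ++i)
                            acc += (int64_t)src[((z + k) * ny + (y + j)) * nx + (x + i)]
                                 * ker[(k * ky + j) * kx + i];
                dst[(z * oy + y) * ox + x] = acc;
            }
}
```

An SSE version would typically vectorize the innermost x loop. For example, `_mm_madd_epi16` multiplies signed 16-bit lanes and adds adjacent pairs into 32-bit results, so unsigned data or large sums need extra care. This reference can be used to validate such code.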
The 3D running-average SSE algorithm is implemented for single-precision floating-point (FP, SP) input data. The averaging window is fixed at 11, the value requested by the customer who initiated this work.
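A running average keeps a window sum and updates it by adding the sample entering the window and subtracting the one leaving it. The cost per sample is then constant regardless of window length. The sketch below assumes the 3D filter is an 11x11x11 box, computed separably as three 1D passes along x, y, and z, with "valid" output. This is an illustration only, not the algorithm from the presentation. The names `running_avg_1d` and `running_avg_3d` are hypothetical.

```c
#include <stddef.h>

#define WIN 11  /* averaging window, fixed at 11 as stated in the text */

/* 1D running average over n samples read with the given stride.
 * Writes n - WIN + 1 outputs ("valid" mode) with the same stride.
 * A double accumulator limits drift from repeated add/subtract. */
static void running_avg_1d(const float *src, float *dst, size_t n, size_t stride)
{
    if (n < WIN) return;
    double sum = 0.0;
    for (size_t i = 0; i < WIN; ++i) sum += src[i * stride];
    dst[0] = (float)(sum / WIN);
    for (size_t i = WIN; i < n; ++i) {
        sum += (double)src[i * stride] - (double)src[(i - WIN) * stride];
        dst[(i - WIN + 1) * stride] = (float)(sum / WIN);
    }
}

/* Hypothetical separable 3D box average (assumed WIN x WIN x WIN window).
 * vol: input of nx*ny*nz floats, x-fastest; it is overwritten as scratch.
 * tmp: buffer of the same size; receives the result.
 * Valid result region: x < nx-WIN+1, y < ny-WIN+1, z < nz-WIN+1.
 * Requires nx, ny, nz >= WIN. */
void running_avg_3d(float *vol, float *tmp, size_t nx, size_t ny, size_t nz)
{
    size_t vx = nx - WIN + 1, vy = ny - WIN + 1;

    /* pass 1, along x (contiguous lines): vol -> tmp */
    for (size_t z = 0; z < nz; ++z)
        for (size_t y = 0; y < ny; ++y) {
            size_t base = (z * ny + y) * nx;
            running_avg_1d(vol + base, tmp + base, nx, 1);
        }
    /* pass 2, along y (stride nx), only valid x columns: tmp -> vol */
    for (size_t z = 0; z < nz; ++z)
        for (size_t x = 0; x < vx; ++x) {
            size_t base = z * ny * nx + x;
            running_avg_1d(tmp + base, vol + base, ny, nx);
        }
    /* pass 3, along z (stride nx*ny), only valid x and y: vol -> tmp */
    for (size_t y = 0; y < vy; ++y)
        for (size_t x = 0; x < vx; ++x) {
            size_t base = y * nx + x;
            running_avg_1d(vol + base, tmp + base, nz, nx * ny);
        }
}
```

In an SSE version, the y and z passes are natural candidates for vectorization. Neighbouring x columns are independent, so four adjacent columns can be updated with one `__m128` add and subtract. In single precision, the running sum accumulates rounding error over long lines, and periodically recomputing the sum is one common remedy.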
The uploaded presentation describes the SSE implementation of a 2x image shrink in which each pixel occupies 4 bytes: three color components (R, G, and B) and a fourth component, the weight A.
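A 2x shrink maps each 2x2 block of source pixels to one output pixel. The minimal scalar sketch below assumes a plain rounded box average applied to all four bytes, including A. Even image dimensions and tightly packed rows are also assumed. If the presentation instead uses A to weight the color components, the formula differs, and that variant is not shown. The name `shrink2x_rgba` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a 2x shrink of an interleaved
 * R, G, B, A image (4 bytes per pixel).
 * Assumptions: w and h are even; rows are tightly packed (4*w bytes);
 * A is averaged like R, G and B. Each output byte is the rounded
 * mean of the corresponding byte in a 2x2 source block. */
void shrink2x_rgba(const uint8_t *src, size_t w, size_t h, uint8_t *dst)
{
    size_t sstride = 4 * w, dstride = 4 * (w / 2);
    for (size_t y = 0; y < h / 2; ++y) {
        const uint8_t *r0 = src + 2 * y * sstride;  /* upper source row */
        const uint8_t *r1 = r0 + sstride;           /* lower source row */
        uint8_t *out = dst + y * dstride;
        for (size_t x = 0; x < w / 2; ++x)
            for (size_t c = 0; c < 4; ++c) {
                unsigned s = r0[8 * x + c] + r0[8 * x + 4 + c]
                           + r1[8 * x + c] + r1[8 * x + 4 + c];
                out[4 * x + c] = (uint8_t)((s + 2) / 4);  /* round half up */
            }
    }
}
```

Because every byte is processed independently, an SSE implementation can handle several pixels per 16-byte register. Note that chaining `_mm_avg_epu8` (which computes `(a + b + 1) >> 1`) twice does not always match the exact `(s + 2) / 4` rounding used here, so a reference like this helps quantify any difference.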