The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit input data.
The SSE speed-up over the serial code is ~3x. OpenMP on a 2-way Harpertown (Penryn) machine raises it by a further ~6x, so the overall SSE+OpenMP speed-up is ~18x.
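The combination of vectorization and threading can be sketched roughly as follows. This is not the code from the attached project, only a minimal illustration under stated assumptions: the function names (tap_sum, conv3d_ref, conv3d_simd) are hypothetical, the samples are assumed to be signed 16-bit, the kernel is applied without flipping and without border padding, and only SSE2 intrinsics are used rather than the SSE3/SSE4 instructions covered in the presentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics (assumption: x86 target) */
#endif

/* Sum of all k*k*k taps for one output voxel (scalar). */
static int32_t tap_sum(const int16_t *in, int nx, int ny,
                       const int16_t *ker, int k, int x, int y, int z)
{
    int32_t acc = 0;
    for (int kz = 0; kz < k; kz++)
        for (int ky = 0; ky < k; ky++)
            for (int kx = 0; kx < k; kx++)
                acc += (int32_t)in[((size_t)(z + kz) * ny + (y + ky)) * nx + (x + kx)]
                     * ker[(kz * k + ky) * k + kx];
    return acc;
}

/* Serial reference: output is (nx-k+1) x (ny-k+1) x (nz-k+1) int32 samples. */
void conv3d_ref(const int16_t *in, int nx, int ny, int nz,
                const int16_t *ker, int k, int32_t *out)
{
    int ox = nx - k + 1, oy = ny - k + 1, oz = nz - k + 1;
    assert(k >= 1 && ox >= 1 && oy >= 1 && oz >= 1);
    for (int z = 0; z < oz; z++)
        for (int y = 0; y < oy; y++)
            for (int x = 0; x < ox; x++)
                out[((size_t)z * oy + y) * ox + x] = tap_sum(in, nx, ny, ker, k, x, y, z);
}

/* Same result, computing 8 neighbouring outputs along x per SIMD step. */
void conv3d_simd(const int16_t *in, int nx, int ny, int nz,
                 const int16_t *ker, int k, int32_t *out)
{
    int ox = nx - k + 1, oy = ny - k + 1, oz = nz - k + 1;
    assert(k >= 1 && ox >= 1 && oy >= 1 && oz >= 1);
    /* Output slices are independent, so z can be split across threads.
       The pragma only takes effect when OpenMP is enabled at compile time. */
    #pragma omp parallel for
    for (int z = 0; z < oz; z++) {
        for (int y = 0; y < oy; y++) {
            int32_t *dst = out + ((size_t)z * oy + y) * ox;
            int x = 0;
#if defined(__SSE2__)
            for (; x + 8 <= ox; x += 8) {
                __m128i acc_lo = _mm_setzero_si128(); /* outputs x..x+3   */
                __m128i acc_hi = _mm_setzero_si128(); /* outputs x+4..x+7 */
                for (int kz = 0; kz < k; kz++)
                    for (int ky = 0; ky < k; ky++)
                        for (int kx = 0; kx < k; kx++) {
                            const int16_t *src =
                                in + ((size_t)(z + kz) * ny + (y + ky)) * nx + x + kx;
                            __m128i v = _mm_loadu_si128((const __m128i *)src);
                            __m128i c = _mm_set1_epi16(ker[(kz * k + ky) * k + kx]);
                            /* Low and high halves of the signed 16x16 products,
                               interleaved into full 32-bit products. */
                            __m128i lo = _mm_mullo_epi16(v, c);
                            __m128i hi = _mm_mulhi_epi16(v, c);
                            acc_lo = _mm_add_epi32(acc_lo, _mm_unpacklo_epi16(lo, hi));
                            acc_hi = _mm_add_epi32(acc_hi, _mm_unpackhi_epi16(lo, hi));
                        }
                _mm_storeu_si128((__m128i *)(dst + x), acc_lo);
                _mm_storeu_si128((__m128i *)(dst + x + 4), acc_hi);
            }
#endif
            for (; x < ox; x++) /* scalar tail (or whole row without SSE2) */
                dst[x] = tap_sum(in, nx, ny, ker, k, x, y, z);
        }
    }
}
```

Both functions assume that the sums fit in 32 bits. Comparing conv3d_simd against conv3d_ref on a small volume is a simple way to check the vector path before timing it.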
Please find attached:
- A PowerPoint presentation describing the algorithm.
- A ZIP file containing the C implementation, embedded in a simple benchmarking application. The project is built for MS Visual Studio 2005.
The command line has the form <appName XYSize Zsize NumberOfRunnings>, for example <Conv3D 512 512 3>.
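As an illustration of how three positional arguments of this form could be read and validated, here is a minimal sketch. It is not taken from the attached project: BenchArgs, parse_bench_args and volume_bytes are hypothetical names, and the sketch assumes that XYSize is used for both the X and Y dimensions and that each sample occupies 16 bits.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the three positional arguments. */
typedef struct {
    int xy_size; /* XYSize: assumed to be both the X and the Y dimension */
    int z_size;  /* Zsize */
    int runs;    /* NumberOfRunnings */
} BenchArgs;

/* Parse a strictly positive decimal integer; returns 0 on success. */
static int parse_positive(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Expects argv = { appName, XYSize, Zsize, NumberOfRunnings }. */
int parse_bench_args(int argc, char **argv, BenchArgs *args)
{
    assert(args != NULL);
    if (argc != 4)
        return -1;
    if (parse_positive(argv[1], &args->xy_size) != 0) return -1;
    if (parse_positive(argv[2], &args->z_size) != 0) return -1;
    if (parse_positive(argv[3], &args->runs) != 0) return -1;
    return 0;
}

/* Size in bytes of one volume of 16-bit samples under the assumptions above. */
size_t volume_bytes(const BenchArgs *a)
{
    return (size_t)a->xy_size * (size_t)a->xy_size * (size_t)a->z_size * sizeof(int16_t);
}
```

Under these assumptions, the example arguments 512 512 3 would correspond to a 512 x 512 x 512 volume of 16-bit samples (256 MiB per buffer) processed 3 times.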