January 25, 2009 10:00 PM PST
The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit input data.
The SSE speed-up over the serial code is ~3x. OpenMP on a 2-way Harpertown (Penryn) machine adds a further ~6x, so the overall SSE + OpenMP speed-up is ~18x.
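To make the combination concrete, here is a minimal sketch of how SSE and OpenMP can be layered in a 3D convolution over 16-bit data. It is not the code from the attached ZIP file: the function names, the 3x3x3 kernel size, the x-fastest memory layout, and the 32-bit output are all assumptions chosen for illustration, and it uses only SSE2 intrinsics rather than the SSE3/SSE4 instructions the presentation covers. OpenMP splits the Z slices across threads, and the inner loop produces eight output voxels per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>              /* SSE2 intrinsics */
#define CONV_HAVE_SSE2 1
#endif

/* Assumed layout: x varies fastest, then y, then z. */
#define IDX(x, y, z, nx, ny) (((size_t)(z) * (ny) + (y)) * (nx) + (x))

/* Scalar reference for one interior voxel. The 3x3x3 kernel is applied
   without flipping (correlation form). */
static int32_t conv_point(const int16_t *in, const int16_t k[27],
                          int x, int y, int z, int nx, int ny)
{
    int32_t acc = 0;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                acc += (int32_t)in[IDX(x + dx, y + dy, z + dz, nx, ny)] *
                       k[(dz + 1) * 9 + (dy + 1) * 3 + (dx + 1)];
    return acc;
}

/* Convolves interior voxels only; border voxels of out are left untouched. */
void conv3d_3x3x3(const int16_t *in, int32_t *out, const int16_t k[27],
                  int nx, int ny, int nz)
{
    int z;
    assert(nx >= 3 && ny >= 3 && nz >= 3);
#ifdef _OPENMP
    #pragma omp parallel for        /* one group of Z slices per thread */
#endif
    for (z = 1; z < nz - 1; ++z) {
        for (int y = 1; y < ny - 1; ++y) {
            int x = 1;
#ifdef CONV_HAVE_SSE2
            /* Eight outputs per iteration while x..x+7 stay interior. */
            for (; x + 8 <= nx - 1; x += 8) {
                __m128i acc_lo = _mm_setzero_si128();   /* outputs x..x+3   */
                __m128i acc_hi = _mm_setzero_si128();   /* outputs x+4..x+7 */
                for (int t = 0; t < 27; ++t) {
                    int dz = t / 9 - 1, dy = (t / 3) % 3 - 1, dx = t % 3 - 1;
                    const int16_t *p = &in[IDX(x + dx, y + dy, z + dz, nx, ny)];
                    __m128i v  = _mm_loadu_si128((const __m128i *)p);
                    __m128i c  = _mm_set1_epi16(k[t]);
                    /* Low and high halves of the signed 16x16 products,
                       interleaved into full 32-bit products. */
                    __m128i lo = _mm_mullo_epi16(v, c);
                    __m128i hi = _mm_mulhi_epi16(v, c);
                    acc_lo = _mm_add_epi32(acc_lo, _mm_unpacklo_epi16(lo, hi));
                    acc_hi = _mm_add_epi32(acc_hi, _mm_unpackhi_epi16(lo, hi));
                }
                int32_t *q = &out[IDX(x, y, z, nx, ny)];
                _mm_storeu_si128((__m128i *)q, acc_lo);
                _mm_storeu_si128((__m128i *)(q + 4), acc_hi);
            }
#endif
            /* Scalar tail, or the whole row when SSE2 is unavailable. */
            for (; x < nx - 1; ++x)
                out[IDX(x, y, z, nx, ny)] = conv_point(in, k, x, y, z, nx, ny);
        }
    }
}
```

The OpenMP pragma only takes effect when the compiler's OpenMP option is switched on (for example `/openmp` in Visual Studio or `-fopenmp` in GCC); without it the loop silently runs on a single thread.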
Please find attached:
- A PowerPoint presentation describing the algorithm.
- A ZIP file containing the C project, embedded in a simple benchmarking application. The project is built for Microsoft Visual Studio 2005.
The command line has the form <appName XYSize Zsize NumberOfRunnings>, for example <Conv3D 512 512 3>.
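As an illustration of that command-line form, the sketch below parses the three positional arguments. The `conv_args` struct and both function names are hypothetical and are not taken from the attached project; the sketch also assumes all three values must be positive integers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical holder for the three positional arguments. */
typedef struct {
    long xy_size;   /* XYSize */
    long z_size;    /* Zsize */
    long runs;      /* NumberOfRunnings */
} conv_args;

/* Parses one positive decimal integer; returns 0 on success, -1 otherwise. */
static int parse_positive(const char *s, long *out)
{
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0)
        return -1;
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 if the argument count or any value is invalid. */
int parse_conv_args(int argc, char **argv, conv_args *a)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s XYSize Zsize NumberOfRunnings\n",
                argc > 0 ? argv[0] : "Conv3D");
        return -1;
    }
    if (parse_positive(argv[1], &a->xy_size) != 0 ||
        parse_positive(argv[2], &a->z_size) != 0 ||
        parse_positive(argv[3], &a->runs) != 0)
        return -1;
    return 0;
}
```

With the example from the post, `Conv3D 512 512 3`, this would yield `xy_size = 512`, `z_size = 512`, and `runs = 3`.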
Comments (2) 
April 8, 2010 4:57 PM PDT
Sayan Ghosh
Hi, thanks for such a nice example. We've been wanting to use OpenMP + SSE in the kernel convolution routines in our lab, and used your code for benchmarking. But we couldn't find any speed-up between the serial and the OpenMP + SSE versions (even after removing the OpenMP pragmas, the stats remain the same). I am using an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz to run the code. It would be great if you could provide any advice. Regards, Sayan