16-bit 3D Convolution: SSE4 + OpenMP Implementation on a Penryn CPU

The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit source data.

SSE alone gives a speed-up of about 3x over the serial code. OpenMP on a two-way Harpertown (Penryn) machine raises that by a further ~6x, so the overall SSE + OpenMP speed-up is about 18x (3 x 6).
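To make the combination concrete, here is a minimal sketch of a 16-bit 3D convolution with an SSE inner product and an OpenMP outer loop. It is not the code from the attached ZIP. The function names (row_dot8, conv3d_16bit), the fixed 3x3x3 kernel, the padding of each kernel row to 8 coefficients, the signed int16_t data type, and the "valid" output region without border handling are all assumptions made for illustration. It uses only the SSE2 intrinsic _mm_madd_epi16 and falls back to scalar code where SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_madd_epi16 */
#define HAVE_SSE2 1
#endif

#define KSIZE 3   /* assumed kernel edge length (3x3x3) */
#define KPAD  8   /* each kernel row is zero-padded to 8 x int16 = 128 bits */

/* Dot product of 8 consecutive 16-bit samples with one padded kernel row. */
static int32_t row_dot8(const int16_t *src, const int16_t *krow)
{
#ifdef HAVE_SSE2
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    __m128i k = _mm_loadu_si128((const __m128i *)krow);
    __m128i p = _mm_madd_epi16(s, k);   /* 8 products summed pairwise into 4 x int32 */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, p);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    int32_t acc = 0;
    for (int i = 0; i < KPAD; ++i)
        acc += (int32_t)src[i] * krow[i];
    return acc;
#endif
}

/*
 * "Valid" 3D convolution (correlation form, kernel not flipped).
 * in     : nx * ny * nz samples, x fastest, followed by at least
 *          KPAD - KSIZE readable padding samples (row_dot8 reads 8 at a time).
 * kernel : KSIZE * KSIZE rows of KPAD coefficients; entries past KSIZE must be 0.
 * out    : (nx-KSIZE+1) * (ny-KSIZE+1) * (nz-KSIZE+1) 32-bit results.
 * Sums are assumed to fit in 32 bits.
 */
void conv3d_16bit(const int16_t *in, int nx, int ny, int nz,
                  const int16_t *kernel, int32_t *out)
{
    const int ox = nx - KSIZE + 1;
    const int oy = ny - KSIZE + 1;
    const int oz = nz - KSIZE + 1;

    /* Each output slice is independent, so slices are split across threads. */
    #pragma omp parallel for
    for (int z = 0; z < oz; ++z) {
        for (int y = 0; y < oy; ++y) {
            for (int x = 0; x < ox; ++x) {
                int32_t acc = 0;
                for (int kz = 0; kz < KSIZE; ++kz) {
                    for (int ky = 0; ky < KSIZE; ++ky) {
                        const int16_t *src =
                            in + ((size_t)(z + kz) * ny + (y + ky)) * nx + x;
                        acc += row_dot8(src, kernel + (kz * KSIZE + ky) * KPAD);
                    }
                }
                out[((size_t)z * oy + y) * ox + x] = acc;
            }
        }
    }
}
```

The OpenMP pragma only takes effect when the compiler's OpenMP switch is on (for example -fopenmp with GCC or /openmp with Visual Studio); without it the pragma is ignored and the loop runs on one thread.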

Please find attached:

  1. A PowerPoint presentation describing the algorithm.
  2. A ZIP file containing the C project with the implementation, embedded in a simple benchmarking application. The project is built for Microsoft Visual Studio 2005.

The command line has the form <appName XYSize Zsize NumberOfRunnings>, for example <Conv3D 512 512 3>.
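As an illustration of that command-line form, the following sketch parses the three positional arguments. It is not taken from the attached project: the Conv3DArgs struct and the parse_positive and parse_conv3d_args functions are hypothetical names, and the rule that every value must be a positive integer is an assumption.

```c
#include <stdlib.h>

/* Hypothetical container for the three benchmark parameters. */
typedef struct {
    long xy_size;   /* XYSize: edge length used for both X and Y */
    long z_size;    /* Zsize: number of slices along Z */
    long runs;      /* NumberOfRunnings: how many times to repeat the benchmark */
} Conv3DArgs;

/* Parses a strictly positive decimal integer; returns 1 on success, 0 on failure. */
static int parse_positive(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0)
        return 0;
    *out = v;
    return 1;
}

/* Expects argv = { appName, XYSize, Zsize, NumberOfRunnings }. */
int parse_conv3d_args(int argc, char **argv, Conv3DArgs *args)
{
    if (argc != 4)
        return 0;
    return parse_positive(argv[1], &args->xy_size)
        && parse_positive(argv[2], &args->z_size)
        && parse_positive(argv[3], &args->runs);
}
```

With the example above, Conv3D 512 512 3 would yield xy_size = 512, z_size = 512 and runs = 3.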

conv3d.zip (10.02 KB)
convol3d16bit.ppt (399.5 KB)

Thanks for such a nice example. We've been wanting to use OpenMP + SSE in the kernel convolution routines in our lab, and used your code for benchmarking. But we couldn't measure any speed-up of the OpenMP + SSE version over the serial one (even after removing the OpenMP pragmas, the timings remain the same).
I am running the code on an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz. It would be great if you could offer any advice.

