Example of image filter SSE acceleration


I am working on image processing and have started to optimize a filtering algorithm.
Is there an example of a simple 3x3 pixel-domain filter using SSE (e.g. SSE4) on x86?

Please take a look at:


or search for an article at:


PS: I remember there being an example of motion estimation built on a set of SSE instructions, and the
performance improvements were pretty good.
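
For the 3x3 filter asked about above, here is a minimal sketch of one way it could look. It is not taken from the linked material. It uses only SSE2 intrinsics (so it also runs on any SSE4 machine) and processes 8 pixels per iteration by widening bytes to 16-bit lanes. The function name filter3x3_sse2, the row-major kernel layout, the "copy the border unchanged" policy and the power-of-two normalisation via a right shift are all assumptions made for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any library): applies a 3x3 integer kernel
 * k (row-major) to an 8-bit grayscale image and writes
 *   dst = clamp((sum + rounding) >> shift, 0, 255).
 * Border pixels are copied unchanged.
 * Assumption: 255 * sum(|k[i]|) plus the rounding term fits in int16. */
static void filter3x3_sse2(const uint8_t *src, uint8_t *dst,
                           int width, int height, ptrdiff_t stride,
                           const int16_t k[9], int shift)
{
    assert(width >= 1 && height >= 1 && shift >= 0 && shift < 15);

    const int rounding = (shift > 0) ? 1 << (shift - 1) : 0;
    const __m128i zero  = _mm_setzero_si128();
    const __m128i vrnd  = _mm_set1_epi16((short)rounding);
    const __m128i count = _mm_cvtsi32_si128(shift);

    for (int y = 0; y < height; ++y) {
        const uint8_t *s = src + y * stride;
        uint8_t *d = dst + y * stride;

        if (y == 0 || y == height - 1) {      /* top/bottom border rows */
            memcpy(d, s, (size_t)width);
            continue;
        }

        d[0] = s[0];                          /* left border pixel */
        int x = 1;

        /* Vector part: 8 interior pixels at a time. The widest load touches
         * column x + 8, so it must not exceed width - 1. */
        for (; x + 8 <= width - 1; x += 8) {
            __m128i acc = vrnd;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    /* Unaligned 8-byte load, then zero-extend to 16 bits. */
                    __m128i p = _mm_loadl_epi64(
                        (const __m128i *)(s + ky * stride + x + kx));
                    p = _mm_unpacklo_epi8(p, zero);
                    __m128i w = _mm_set1_epi16(k[(ky + 1) * 3 + (kx + 1)]);
                    acc = _mm_add_epi16(acc, _mm_mullo_epi16(p, w));
                }
            }
            acc = _mm_sra_epi16(acc, count);  /* arithmetic shift */
            /* Pack with unsigned saturation: clamps each lane to 0..255. */
            _mm_storel_epi64((__m128i *)(d + x), _mm_packus_epi16(acc, acc));
        }

        /* Scalar tail for the remaining interior pixels. */
        for (; x < width - 1; ++x) {
            int sum = rounding;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += k[(ky + 1) * 3 + (kx + 1)] * s[ky * stride + x + kx];
            /* >> on a negative int is implementation-defined in C;
             * mainstream x86 compilers shift arithmetically. */
            sum >>= shift;
            d[x] = (uint8_t)(sum < 0 ? 0 : (sum > 255 ? 255 : sum));
        }

        d[width - 1] = s[width - 1];          /* right border pixel */
    }
}
```

Calling it with the kernel {1,2,1, 2,4,2, 1,2,1} and shift = 4 gives a small Gaussian-like blur. On SSE4.1 hardware the unpack step could be replaced by _mm_cvtepu8_epi16, and precomputing the nine weight vectors outside the pixel loop would likely help, but measure before assuming a speedup.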

