I am just starting with SSE optimizations. I tried a very simple task: adding an array
of 2-dimensional vectors. I made three versions: one without using
any sort of SIMD instructions (http://pastebin.com/m3e8838c2), using
SSE2 instructions via Intel intrinsics (http://pastebin.com/m783f8e7d)
and using SSE2 instructions through GCC vector intrinsics
(http://pastebin.com/m6f36194e). The best times were obtained without
any SIMD instructions. I used the gcc 4.2 compiler with -march=prescott and -O3.
When I compiled without the -O3 flag, the GCC vector intrinsics
code was 1.5 times faster than the code without SIMD
instructions, and the Intel intrinsics code was the slowest :-(.
Any help will be greatly appreciated.
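For reference, here is a minimal sketch of what the three variants could look like. It is not the code behind the pastebin links. It assumes element-wise addition of two arrays of n 2-D vectors of doubles stored interleaved (x0, y0, x1, y1, ...), so that one vector fills exactly one 128-bit SSE2 register. The function names add_scalar, add_sse2 and add_gccvec are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */
#include <stddef.h>     /* size_t */
#include <string.h>     /* memcpy */

/* Assumed layout: n 2-D vectors stored as 2*n interleaved doubles (x, y, x, y, ...). */

/* 1. Plain scalar version, no explicit SIMD. */
void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < 2 * n; i += 2) {
        out[i]     = a[i]     + b[i];      /* x component */
        out[i + 1] = a[i + 1] + b[i + 1];  /* y component */
    }
}

/* 2. SSE2 via Intel intrinsics: one (x, y) pair per __m128d register.
   Unaligned loads/stores are used because the arrays are not assumed
   to be 16-byte aligned. */
void add_sse2(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < 2 * n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);            /* load a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);            /* load b[i], b[i+1] */
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  /* add both lanes and store */
    }
}

/* 3. GCC vector extension: a 16-byte vector holding two doubles. */
typedef double v2df __attribute__((vector_size(16)));

void add_gccvec(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < 2 * n; i += 2) {
        v2df va, vb, vr;
        /* memcpy avoids assuming 16-byte alignment of the double arrays */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                  /* element-wise add on both lanes */
        memcpy(out + i, &vr, sizeof vr);
    }
}
```

All three versions perform the same loads, one addition per component, and the same stores, so the loop does very little arithmetic per byte of memory traffic. It is likely memory-bound, which may limit how much any SIMD variant can gain over the scalar one.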
Thread: performance issues with SSE2
Posted: Tue, 06/10/2008 - 11:41