No speedup AVX over SSE

Hi. I'm trying to speed up some serial code using SSE and AVX (computational code with SoA data structures). The SSE version gives a good speedup: up to 2x with double, and somewhat more with float. But when I use AVX the same way, I get the same speed as with SSE. Searching on Google suggests that the problem is memory speed.

Is it possible to speed up this code using AVX?

OS: linux, ubuntu, x86_64
CPU: i7-2670QM
Compilers: gcc and icc
Compile: cd src && make
Run: cd tests/sse2 && ./
See result: cd tests/sse2 && gnuplot -p plot1.gnu



You're probably aware, as you implied you researched the subject, that speedup from SSE to AVX often depends on several factors, including 32-byte data alignment, L1 cache locality, and optimum number of operations per loop.
We've seen cases where stuffing lots of operations into a loop in order to optimize SSE performance could bring SSE up to the performance of AVX.
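To make the point about operations per loop concrete, here is a minimal sketch of a loop where wider registers tend to bring little. The kernel `axpy` is a hypothetical example, not taken from the poster's code. It does two loads and one store for every multiply-add, so it is usually limited by load/store throughput rather than by arithmetic, whether it runs with 16-byte or 32-byte vectors. The AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`); otherwise the plain loop handles everything.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical kernel: y[i] += a * x[i].
   Two loads and one store per multiply-add, so memory traffic dominates. */
void axpy(double a, const double *x, double *y, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    /* AVX path: 4 doubles per iteration. Unaligned loads/stores are used so
       the sketch works for any pointers; with 32-byte aligned data the
       aligned variants (_mm256_load_pd / _mm256_store_pd) could be used. */
    const __m256d va = _mm256_set1_pd(a);
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_add_pd(vy, _mm256_mul_pd(va, vx));
        _mm256_storeu_pd(y + i, vy);
    }
#endif
    /* Scalar loop: remainder elements, or the whole array without AVX. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

A loop that does many more arithmetic operations on each loaded value has a better chance of showing a difference between the SSE and AVX versions.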

Thanks for your comment.
This program uses float and double numbers, 32-byte memory alignment, SoA data structures and a lot of computation per element.
I have also done some work to make the code more cache friendly.
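For readers following along, a minimal sketch of what a 32-byte aligned SoA layout can look like in C11. The struct `ParticlesSoA` and the helper names are hypothetical and only illustrate the idea; the actual program may be organized differently. `aligned_alloc` is the C11 standard function, which expects the size to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define AVX_ALIGN 32u  /* width of one AVX register in bytes */

/* Hypothetical SoA container: one contiguous array per field. */
typedef struct {
    double *x;
    double *y;
    double *z;
    size_t  n;
} ParticlesSoA;

/* Allocate n doubles on a 32-byte boundary. The byte count is rounded up
   to a multiple of 32 because aligned_alloc expects that. */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = (n * sizeof(double) + (AVX_ALIGN - 1)) & ~(size_t)(AVX_ALIGN - 1);
    if (bytes == 0)
        bytes = AVX_ALIGN;
    double *p = aligned_alloc(AVX_ALIGN, bytes);
    assert(p == NULL || ((uintptr_t)p % AVX_ALIGN) == 0);
    return p;
}

void particles_free(ParticlesSoA *p)
{
    free(p->x);
    free(p->y);
    free(p->z);
    p->x = p->y = p->z = NULL;
    p->n = 0;
}

/* Returns 0 on success, -1 if any allocation failed. */
int particles_init(ParticlesSoA *p, size_t n)
{
    p->x = alloc_aligned_doubles(n);
    p->y = alloc_aligned_doubles(n);
    p->z = alloc_aligned_doubles(n);
    p->n = n;
    if (!p->x || !p->y || !p->z) {
        particles_free(p);
        return -1;
    }
    return 0;
}
```

With this layout, aligned 32-byte loads can start at index 0 of each array, and the rounded-up allocation leaves room for a full final vector when `n` is not a multiple of 4.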

Maybe it's possible to speed up the AVX version using software prefetch? As far as I can see, AVX would only be faster when all the data is in L1 cache.


Software prefetch may help if the data don't remain local to L1, but in that case performance of 2x SSE is unlikely. I've found it difficult to predict usefulness of software prefetch.
Sometimes (and this may be the case in your example) SSE code can already take full advantage of L1 performance, even on AVX-capable CPUs.
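If you want to experiment with software prefetch, a minimal sketch might look like the following. It uses `__builtin_prefetch`, a GCC builtin that Clang and icc also accept. The function `dot_with_prefetch` and the distance `PREFETCH_DISTANCE` are assumptions for illustration only; the right distance depends on the loop and the machine and has to be found by measurement.

```c
#include <stddef.h>

/* Assumed tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 64

/* Hypothetical kernel: dot product with a prefetch hint issued once per
   8 doubles (one 64-byte cache line) for each input array. */
double dot_with_prefetch(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if ((i & 7u) == 0 && i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read,
               third argument 3 = keep in all cache levels. */
            __builtin_prefetch(&x[i + PREFETCH_DISTANCE], 0, 3);
            __builtin_prefetch(&y[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += x[i] * y[i];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result, so the simplest test is to time the loop with and without it on arrays larger than L1.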

Using AVX speeds up matrix multiplication by almost a factor of 2.



YuriiSig wrote:

Using AVX speeds up matrix multiplication by almost a factor of 2.

That's with careful hand coding, among other things gaining maximum register and L1 cache data locality, as in the new versions of MKL.
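As a rough illustration of what register and L1 locality mean here, this is a minimal sketch of a tiled matrix multiply in plain C. It is not how MKL is implemented. The function `matmul_tiled` and the tile size `TILE` are assumptions chosen for the example, and a tuned kernel would add explicit vectorization and register blocking on top of this.

```c
#include <stddef.h>

/* Assumed tile edge; a real code would tune this to the L1 cache size. */
#define TILE 32

/* C += A * B for n x n row-major matrices, processed tile by tile so that
   the working set of the inner loops stays small enough to remain in cache. */
void matmul_tiled(const double *A, const double *B, double *C, size_t n)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = (ii + TILE < n) ? ii + TILE : n;
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = (kk + TILE < n) ? kk + TILE : n;
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = (jj + TILE < n) ? jj + TILE : n;
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        /* a is loaded once and reused for the whole row of the tile */
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Each loaded value of `A` is reused across a whole tile row, and each tile of `B` is reused for every row of the `A` tile, which is the kind of data reuse that lets the wider AVX units stay busy.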
