AVX with faster memory


I am currently running an algorithm using AVX on an i5-2500K and an H61 motherboard with DDR3-1033 memory. I get an 8% speedup over the equivalent algorithm using SSE.

It seems that the data flow to the i5 when using AVX is bottlenecked by a clogged memory channel, by the memory not being able to supply data any faster, or by both. I am going to get a DDR3-2133 memory kit to speed up the supply of data to the chip, to determine whether I can get more than the 8% speedup out of AVX.

Would someone who has tested the effect of faster memory on AVX performance be kind enough to share their results?

From what I have read, I could also use a P67 motherboard instead of the H61, as some have reported better memory performance from the P67 alone (i.e. without any upgrade in memory frequency). Any comments?

If memory bandwidth is the bottleneck of the algorithm, vectorization with SSE,
let alone AVX, is not going to provide a notable speedup, because the execution
units are idle most of the time anyway, waiting for data.

The better the algorithm is structured to improve data locality (and hence
cacheability), the better it performs, and the bigger the benefit it can
realize from vectorization.

The use of 256-bit vectorization with AVX will show the greatest benefit
over 128-bit vectorization with SSE in algorithms that consistently
hit the L1 cache when accessing data. That is achieved by increasing locality,
i.e. the amount of computation done on the data fetched from memory. The
simplest example is matrix multiplication on big matrices: the schoolbook
algorithm vs. a memory-blocked (tiled) version.
