Penalty for 256-bit loads and stores with cache line splits

Hi,

What is the penalty, in clock cycles, for a 256-bit load or store that crosses a cache line boundary?

Thanks!

-Jeremy

For Sandy Bridge, the compilers avoid a possible cache line split on 256-bit AVX accesses that may be unaligned by always splitting them explicitly into two 128-bit AVX instructions, which are expected to be faster in that case. To measure the penalty of a split 256-bit access, you would have to write intrinsics yourself. I have no data for other AVX CPUs, so your guess is as good as mine.
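
As a rough starting point for such a test, here is a minimal sketch of the two access patterns written with intrinsics. It assumes 64-byte cache lines and GCC or Clang (for the `target("avx")` attribute). The names `splits_line`, `copy_256` and `copy_2x128` are made up for illustration. No timing code is included, so you would still need to wrap the copies in your own measurement loop.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define LINE_BYTES 64u

/* Returns 1 if an access of `width` bytes starting at `p` spans two
   cache lines, 0 otherwise. Hypothetical helper for choosing test
   addresses. */
int splits_line(const void *p, size_t width)
{
    size_t offset = (size_t)((uintptr_t)p % LINE_BYTES);
    assert(width <= LINE_BYTES);
    return offset + width > LINE_BYTES;
}

/* Copies 8 floats with one unaligned 256-bit load and one unaligned
   256-bit store. This is the form the compilers reportedly avoid. */
__attribute__((target("avx")))
void copy_256(float *dst, const float *src)
{
    _mm256_storeu_ps(dst, _mm256_loadu_ps(src));
}

/* Copies the same 8 floats as two 128-bit halves, similar in spirit
   to the explicit split described above. */
__attribute__((target("avx")))
void copy_2x128(float *dst, const float *src)
{
    _mm_storeu_ps(dst,     _mm_loadu_ps(src));
    _mm_storeu_ps(dst + 4, _mm_loadu_ps(src + 4));
}
```

With a 64-byte-aligned buffer, a source pointer at byte offset 48 makes the 256-bit load split a line, while each 128-bit half at offsets 48 and 64 stays within one line. At other offsets, such as 56, one of the 128-bit halves would itself split, so the choice of offset matters when comparing the two versions.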
