I have some code, where I iterate over an array in reverse order. I already use SSE,AVX (depending on what CPU supports). Normally prefetching of CPU should be finde, if I iterate over an arry from begin to end. But what about end to begin, so reverse? Does the CPU realize this pattern?
Or should I give hints, with _mm_prefetch? If so, how do I use this intrinsic. Should I always give L1 as cache level. And how many iterations before should I prefetch data?