The other day I was experimenting with SSE intrinsics and came across an issue that warrants mentioning. I have not submitted an issue with Premier Support as it has little impact on me, but this may be of interest to the other readers (including Intel) of this forum.
One of the techniques used when hand-optimizing an application is to control the scheduling of memory accesses, reads as well as writes. To do this, the programmer re-orders instruction sequences; in this case, by re-ordering the _mm_xxx intrinsic calls in the source code.
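To make the idea concrete, here is a minimal sketch of what I mean by a hand-scheduled sequence. The function name and the kernel itself (a plain element-wise add) are made up for illustration; the point is only the source order: all loads first, then the arithmetic, then the stores.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical kernel: dst[i] = a[i] + b[i].
   n is assumed to be a multiple of 8 (two SSE vectors per iteration). */
void add_hand_scheduled(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        /* Intended schedule: issue all four loads up front ... */
        __m128 a0 = _mm_loadu_ps(a + i);
        __m128 b0 = _mm_loadu_ps(b + i);
        __m128 a1 = _mm_loadu_ps(a + i + 4);
        __m128 b1 = _mm_loadu_ps(b + i + 4);

        /* ... then do the arithmetic ... */
        __m128 s0 = _mm_add_ps(a0, b0);
        __m128 s1 = _mm_add_ps(a1, b1);

        /* ... and only then write the results back. */
        _mm_storeu_ps(dst + i, s0);
        _mm_storeu_ps(dst + i + 4, s1);
    }
}
```

The order written here is only the order requested in the source; the order of the emitted instructions has to be checked in the generated assembly.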
The problem is that, with compiler optimizations enabled, the _mm_xxx intrinsics are re-arranged relative to their order in the source code. In situations where the programmer is not attempting to schedule memory references, this re-arrangement is generally a good thing. But where the programmer is attempting to schedule the sequence of memory references, the optimizer interferes with the programmer's declared sequence.
The correction for this behavior is NOT to disable optimizations (e.g. with #pragma...). Although disabling optimization preserves the instruction sequence, it also prevents the indexes used in address calculations from being kept in integer registers.
I think the proper way to handle this is to have a compiler option and a matching #pragma that specify that the code sequence of the _mm_xxx intrinsics is to be maintained while everything else is still optimized.
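In the absence of such an option, one partial workaround that may be worth trying is a compiler-level barrier between the groups of memory operations. The sketch below assumes a compiler that accepts GCC-style inline assembly; the COMPILER_BARRIER macro and the scale_pinned function are hypothetical names of my own, not part of any intrinsic header. The empty asm statement emits no instruction, but its "memory" clobber tells the compiler not to move loads or stores across it. It does not pin register-only arithmetic, and it does not fix the order of the memory operations within each group, so it is only an approximation of the requested #pragma.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Assumption: GCC-style inline asm is supported by the compiler in use.
   Emits no machine instruction; the "memory" clobber only stops the
   compiler from moving memory accesses across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical kernel: dst[i] = k * src[i].
   n is assumed to be a multiple of 8. */
void scale_pinned(float *dst, const float *src, float k, size_t n)
{
    assert(n % 8 == 0);
    const __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 8) {
        __m128 x0 = _mm_loadu_ps(src + i);
        __m128 x1 = _mm_loadu_ps(src + i + 4);
        COMPILER_BARRIER();  /* loads above are not sunk below this point */

        x0 = _mm_mul_ps(x0, vk);
        x1 = _mm_mul_ps(x1, vk);

        COMPILER_BARRIER();  /* stores below are not hoisted above this point */
        _mm_storeu_ps(dst + i, x0);
        _mm_storeu_ps(dst + i + 4, x1);
    }
}
```

The barrier can also force the compiler to reload values it would otherwise have kept in registers, so the generated code should be inspected and timed before relying on this.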