automatic non-temporal store and LOOP COUNT

I have not seen the new automatic non-temporal (cache-bypassing) store feature of ifort 10 documented anywhere; I came upon it by accident. When data are stored into part of an array declared with a size greater than about 160KB (a rough threshold, according to my experiment), the 10.0 vectorizer by default behaves as if the directive cdir$ vector nontemporal had been given. This is probably not what you want if you store a section of less than 16KB: non-temporal stores bypass the cache, so values that are read back soon afterward have to come from memory instead. The compiler uses non-temporal stores even when it splits (distributes) the loop without issuing a PARTIAL VECTOR report, and even when the stored values are read back in the second part of the distributed loop.

The problem can be corrected with the LOOP COUNT directive, e.g. cdec$ loop count (1000), which tells the compiler to optimize for a loop count of that size.
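A minimal sketch of the workaround, assuming free-form source, where the directive prefix is written !DEC$ instead of the fixed-form cdec$. The module, routine and array names are hypothetical, and the sizes are chosen only to match the thresholds described above: the array is declared at 800KB, while each call stores and then reads back only about 8KB. Whether a particular ifort version actually switches back to ordinary stores is worth confirming in the vectorization report or in the generated assembly (non-temporal stores show up as movnt* instructions).

```fortran
module section_update
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Declared size: 100000 * 8 bytes = 800 KB, well above the ~160 KB
  ! threshold at which ifort 10 was observed to pick non-temporal stores.
  integer, parameter :: big = 100000
  real(dp) :: work(big)
contains
  ! Hypothetical routine: store s*x(1:n) into work(1:n), then read the
  ! stored values back to form a sum (the "read back" case from the post).
  function scale_and_sum(x, n, s) result(total)
    integer, intent(in) :: n
    real(dp), intent(in) :: x(n), s
    real(dp) :: total
    integer :: i

    ! Ask the compiler to optimize for about 1000 iterations (~8 KB of
    ! stores), so it should not assume a huge trip count and choose
    ! cache-bypassing stores. Non-Intel compilers see an ordinary comment.
    !DEC$ LOOP COUNT (1000)
    do i = 1, n
      work(i) = s * x(i)
    end do

    ! Second pass reads the just-stored section back; with non-temporal
    ! stores these loads would miss the cache.
    total = 0.0_dp
    !DEC$ LOOP COUNT (1000)
    do i = 1, n
      total = total + work(i)
    end do
  end function scale_and_sum
end module section_update
```

The directive has to come immediately before the DO statement it applies to, which is why each loop gets its own copy.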
