I've been seeing some incorrect results with the IVDEP pragma, and I don't think it's because I'm using it wrong. I'm doing something very similar to what's shown in Example 1 in http://software.intel.com/sites/products/documentation/doclib/stdxe/2013...
except in Fortran, not C. Basically, when I do something directly analogous to that simple loop, it works. When I try the backward loop version, it gets the wrong answer some of the time (depending on compiler optimization and vectorization flags) if the vectorized loop is called inside an outer loop.
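For anyone reading without the attachment, here is a minimal sketch of the two loop shapes described above. It is not the actual test case: the names `forward_loop` and `backward_loop`, the offset `k`, the scale factor `c`, and the array sizes are all hypothetical, and the backward loop is only one plausible way to mirror the forward one. Both loops carry only a write-after-read dependence, so the serial result can be checked against an untouched copy of the input.

```fortran
program ivdep_sketch
  implicit none
  ! All names and sizes here are hypothetical, chosen only for illustration.
  integer, parameter :: n = 32, k = 4, nouter = 3
  real, parameter :: c = 2.0
  real :: a(n + k), b(n + k), old(n + k)
  integer :: i, iter
  logical :: ok

  ok = .true.
  do iter = 1, nouter                 ! outer loop around the IVDEP loops
     do i = 1, n + k
        a(i) = real(i + iter)
     end do
     b = a
     old = a                          ! untouched copy used as the reference

     call forward_loop(a, n, k, c)
     call backward_loop(b, n, k, c)

     ! In serial order each element is read before it is overwritten,
     ! so the expected values come straight from the original data.
     if (any(a(1:n) /= old(k+1:n+k) * c)) ok = .false.
     if (any(b(k+1:n+k) /= old(1:n) * c)) ok = .false.
  end do

  if (ok) then
     print *, 'results match'
  else
     print *, 'MISMATCH'
  end if

contains

  ! Forward loop, analogous to the C example: reads ahead of the write.
  subroutine forward_loop(x, m, off, s)
    integer, intent(in) :: m, off
    real, intent(in) :: s
    real, intent(inout) :: x(m + off)
    integer :: j
    !DIR$ IVDEP
    do j = 1, m
       x(j) = x(j + off) * s
    end do
  end subroutine forward_loop

  ! Backward loop: runs from high to low index and reads below the write.
  subroutine backward_loop(x, m, off, s)
    integer, intent(in) :: m, off
    real, intent(in) :: s
    real, intent(inout) :: x(m + off)
    integer :: j
    !DIR$ IVDEP
    do j = m + off, off + 1, -1
       x(j) = x(j - off) * s
    end do
  end subroutine backward_loop

end program ivdep_sketch
```

With a correct compiler this should print `results match` at every optimization level. Building it once with `-O1` and once with `-O2` is one way to check whether a given ifort version shows the problem on loops of this shape.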
All the compilers I tried are OK with -O1. With -O2 (and without -no-vec), all compilers report that the relevant loops (lines 34, 47, and 73) were <em>not</em> vectorized because it would be inefficient. When I remove the !DIR$ IVDEP pragmas, all compilers and options give correct results.
ifort 11.1.080: bad with -O2, OK with "-no-vec -O2".
ifort 18.104.22.1681 and 13.0: bad with just -O2 and also with "-no-vec -O2".
So, can anyone tell me: is this really a compiler bug, or am I somehow subtly misusing IVDEP?