ivdep wrong results with backward loop

I've been seeing some incorrect results with the IVDEP pragma, and I don't think it's because I'm using it wrong.  I'm doing something very similar to what's shown in Example 1 in http://software.intel.com/sites/products/documentation/doclib/stdxe/2013...

except in Fortran, not C.  Basically, when I do something directly analogous to that simple loop, it works.  When I try the backward loop version, it gets the wrong answer some of the time (depending on compiler optimization and vectorization flags)  if the vectorized loop is called inside an outer loop.

All the compilers I tried are OK with -O1.  With -O2 (and without -no-vec), all compilers report that the relevant loops (lines 34, 47, and 73) were not vectorized because it would be inefficient.  When I remove the !DIR$ IVDEP pragmas, all compilers and options work right.

ifort 11.1.080: bad with -O2, OK with "-no-vec -O2".

ifort 12.1.6.361 and 13.0, bad with just -O2 and also "-no-vec -O2"

So, can anyone tell me: is this really a compiler bug, or am I somehow subtly misusing IVDEP?

Attachments: ivdep-1.f90 (1.97 KB), ivdep-out.txt (2 KB)

I wouldn't call it subtle, but the compiler takes your use of IVDEP as an assertion that loop reversal won't break your code, as the only reason the compiler found for avoiding that change was the "vector dependence." Perhaps the compiler might see a "proven dependence" if you put the calculation of il closer to the loop.
In past discussions, I've been told that it's not feasible for ifort to deal with auto-vectorization of loops which need to run backwards, although it's implemented in a few compilers (e.g. Oracle).
With such short loops as you have in this example, auto-vectorization can't be of much use, except possibly in cases of known alignment (hence the compiler comments about "seems inefficient").

Thanks. The reason I called it subtle is because the forward loop with k > 0 (in the example I linked to) is equivalent to the backward loop with k < 0 (my case): both would break if you reversed the loop. So I figured if IVDEP is OK for the example, it should be OK for me.
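
To make the equivalence concrete, here is a minimal sketch that runs both loop shapes in their written order and in the reversed order. The array size n, the offset k, and all variable names are made up for illustration; the forward loop mirrors the shape of the documentation example (without its scaling factor), and no IVDEP directive is involved, so this only shows what reversing the iteration order does to the results.

```fortran
program reversal_demo
  implicit none
  integer, parameter :: n = 8, k = 2   ! hypothetical sizes, for illustration only
  integer :: fwd(n + k), fwd_rev(n + k), bwd(n + k), bwd_rev(n + k)
  integer :: i

  ! Same starting data for all four variants: 1, 2, ..., n+k.
  fwd = [(i, i = 1, n + k)]
  fwd_rev = fwd
  bwd = fwd
  bwd_rev = fwd

  ! Forward loop reading k elements ahead of the element written.
  do i = 1, n
    fwd(i) = fwd(i + k)
  end do

  ! The same statement with the iteration order reversed.
  do i = n, 1, -1
    fwd_rev(i) = fwd_rev(i + k)
  end do

  ! Backward loop reading k elements behind the element written.
  do i = n + k, k + 1, -1
    bwd(i) = bwd(i - k)
  end do

  ! The same statement with the iteration order reversed.
  do i = k + 1, n + k
    bwd_rev(i) = bwd_rev(i - k)
  end do

  print '(a, 10i3)', 'forward, as written:  ', fwd
  print '(a, 10i3)', 'forward, reversed:    ', fwd_rev
  print '(a, 10i3)', 'backward, as written: ', bwd
  print '(a, 10i3)', 'backward, reversed:   ', bwd_rev
  print '(a, l1)', 'forward loop unchanged by reversal:  ', all(fwd == fwd_rev)
  print '(a, l1)', 'backward loop unchanged by reversal: ', all(bwd == bwd_rev)
end program reversal_demo
```

In both cases the written order only reads elements that have not been overwritten yet, while the reversed order re-reads values it has just stored, so both comparisons print F.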

In practice, ifort reverses only backward running loops, so you won't see this problem in a loop which works forward.
Automatic reversal with checking of context for correctness has been known to be desirable at least since f90 was introduced, but I don't know of any compilers which are entirely successful with it.

My point is that if IVDEP is correctly used in the simple example in the forward loop with positive k, it should also be fine for the backward loop with negative k. If IVDEP means you can reverse the loop, then it's _not_ appropriate for the forward loop with positive k (i.e. it's just luck that the compiler doesn't break things), and the documentation I linked to is incorrect.

I suppose I'll have to go through the code that inspired this (VASP), and remove all the IVDEP directives when the loop is backward.

By using IVDEP correctly in a case where ifort won't do anything with it, you must be referring to some other (hypothetical?) compiler, in this case one which may reverse forward-going loops. Unfortunately, there is no standard about IVDEP, and this would not be the first time when a change in compilers breaks an IVDEP usage.
A case could certainly be made against unnecessary use of IVDEP. The earlier idea of providing more specific directives didn't catch on. ifort's !dir$ simd vectorlength(16) has some of that flavor, if it is seen as an assertion that the source and destination are at least 16 array elements apart (not valid in your cases). The simd directive also includes some IVDEP flavor beyond that. It remains to be seen whether renewed proposals for a standard on vectorization directives will overcome the current deficiencies in documentation.

Well, as a user who's reading the IVDEP documentation, I have no idea whether the Intel compiler does anything different for forward vs. backward loops. So when I see an example that says IVDEP is OK in one situation (forward loop with references used in the direction shown in the example), I have no way of knowing that in what seems like an equivalent situation (a backward loop and references in the other direction), IVDEP will lead to incorrect behavior. Hence, I found the documentation to be confusing. Now I know.

FWIW
!!DIR$ IVDEP
do nzz=nz, 1, -1
C(ndest+nzz) = C(ndest+nzz-2*il)
end do
il is either 2 or 1, thus making a vector dependency on current cell-2 or current cell-4.
Vector size prior to Sandy Bridge was 2xREAL(8); Sandy Bridge has 4xREAL(8) capability. This means the backward loop, with a -2 dependency, may have worked only if using SSE or AVX prior to Sandy Bridge .AND. if the generated code did not unroll the loop.
When you specify IVDEP, you are making a contract with the compiler, that assures it that the code will not have vector dependencies.
In this case you violated this contract.
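
As a rough way to explore the numbers above, the sketch below emulates vector execution with a deliberately simplified, hypothetical model: the loop runs in chunks of w iterations, and each chunk loads all of its source elements before storing any result (a Fortran array-section assignment has exactly that semantics). The loop length nz, the choice ndest = 2*il, and the function name matches are assumptions for illustration; the model ignores unrolling, peeling, and anything else a real compiler may do.

```fortran
program chunk_model
  implicit none
  integer, parameter :: nz = 8      ! hypothetical loop length
  integer :: il, w

  do il = 1, 2                      ! dependence distance 2*il = 2 or 4
    do w = 2, 4, 2                  ! 2 or 4 REAL(8) lanes
      print '(a, i0, a, i0, a, l1, a, l1)', 'il=', il, ' w=', w, &
        '  backward chunks match scalar: ', matches(il, w, .false.), &
        '  reversed chunks match scalar: ', matches(il, w, .true.)
    end do
  end do

contains

  ! Compare a chunked run of the loop with the scalar backward loop.
  logical function matches(step, width, reversed)
    integer, intent(in) :: step, width
    logical, intent(in) :: reversed
    integer :: ref(nz + 4), c(nz + 4)
    integer :: d, nzz, lo, hi, j

    d = 2 * step                    ! distance between source and destination
    ref = [(j, j = 1, nz + 4)]
    c = ref

    ! Reference: the scalar backward loop from the post, with ndest = d.
    do nzz = nz, 1, -1
      ref(d + nzz) = ref(d + nzz - d)
    end do

    if (reversed) then
      ! Chunks processed from low to high indices (loop order reversed).
      do lo = 1, nz, width
        hi = min(lo + width - 1, nz)
        c(d + lo:d + hi) = c(lo:hi) ! right-hand side is read before any store
      end do
    else
      ! Chunks processed from high to low indices (original loop order).
      do hi = nz, 1, -width
        lo = max(hi - width + 1, 1)
        c(d + lo:d + hi) = c(lo:hi)
      end do
    end if

    matches = all(c == ref)
  end function matches

end program chunk_model
```

Under this model the backward chunks reproduce the scalar result for every combination tried, and the reversed chunks do not, so here it is the direction of traversal that decides the outcome. Whether a particular ifort version behaves like either variant is not something this sketch can show.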
Jim Dempsey

www.quickthreadprogramming.com