# Index offset leads to failure of vectorizing simple loops

Hi,

I have a quick question (maybe a compiler bug?) regarding vectorization. I am experimenting with the following piece of code:

Declaration of variables:

```
real(kind=8), allocatable :: F(:), dF(:)
real(kind=8) :: value
```

Loop that is not being vectorized:

```
1486         do k = ks, ke
1487           do ii = iis, iie
1488             value = vals(ii)
1489             do j = js, je
1490               ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
1491               ioffset = ii + ind_offset
1492               do i = is, ie
1493                 dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
1494               end do
1495             end do
1496           end do
1497         end do
```

Vectorization report for the inner loop (Line 1492 to Line 1494):

```
src/ModDeriv.f90(1492): (col. 15) remark: loop was not vectorized: existence of vector dependence.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed FLOW dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed FLOW dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 1493 and  line 1493.
```

I compiled the code using ifort 13.1.0 with -O2 -vec-report6. According to the vectorization report, the compiler could not vectorize the inner loop (Line 1492 to Line 1494). It fails to recognize that, with `ind_offset` and `ioffset` constant inside the inner loop, this loop is the same situation as

`dF( i ) = dF( i ) + v * F( i )`

which should be no different than

`A( i ) = B( i )  + c * D( i )`

all of which are vectorizable. Any suggestions?

Thanks for your time and help.

Best regards,
Wentao


In front of your DO i loop insert:
`!DEC$ SIMD`

Jim Dempsey
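For reference, a minimal self-contained sketch of where the directive goes. The array sizes, loop bounds, and the contents of `vals` are made up for illustration (none of them come from the original code); only the loop nest itself is taken from the post. ifort accepts both the `!DEC$` and `!DIR$` prefixes, and other compilers simply see a comment. Keep in mind that the SIMD directive tells the compiler to vectorize regardless of its own dependence analysis, so it is up to you to be sure the iterations really are independent. Since the report speaks of *assumed* dependences, `!DIR$ IVDEP` may also be worth trying, though whether it is enough for this loop nest is untested here.

```fortran
program simd_directive_sketch
  implicit none
  ! Hypothetical sizes and bounds, chosen only so the sketch runs in bounds.
  integer, parameter :: N1g = 16, N2 = 4, N3 = 3
  integer, parameter :: is = 1, ie = 12, js = 1, je = N2, ks = 1, ke = N3
  integer, parameter :: iis = 1, iie = 4
  real(kind=8), allocatable :: F(:), dF(:)
  real(kind=8) :: vals(iis:iie), value
  integer :: i, j, k, ii, ind_offset, ioffset

  allocate(F(N1g*N2*N3), dF(N1g*N2*N3))
  F    = 1.0d0      ! placeholder data
  dF   = 0.0d0
  vals = 0.25d0     ! placeholder coefficients

  do k = ks, ke
    do ii = iis, iie
      value = vals(ii)
      do j = js, je
        ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
        ioffset = ii + ind_offset
        ! The directive must immediately precede the DO statement it applies to.
        !DIR$ SIMD
        do i = is, ie
          dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
        end do
      end do
    end do
  end do

  ! Sanity check for the placeholder data: dF(1) receives 4 updates of 0.25 * 1.0.
  if (abs(dF(1) - 1.0d0) > 1.0d-12) stop 1
  print *, 'dF(1) =', dF(1)
end program simd_directive_sketch
```

Rebuilding with the same `-O2 -vec-report6` flags should show whether the remark for the inner loop changes.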

Hi Jim,

Thanks for your reply. I was able to vectorize the loop by adding `!DIR$ SIMD` in front of the DO i loop. Actually, I am just curious why the compiler is so conservative in the face of such an easy loop :-)

Best regards,
Wentao

Not sure, probably an oversight by the compiler optimization gurus.

TimP mentioned in a different thread that the optimizer will at times re-order and/or collapse the loop nesting when it thinks the performance would be better. This is a good test case where it is not.

Jim Dempsey

Hi Jim,

Thanks for your reply. I found that if I keep only the inner loop (no loop nest), the compiler vectorizes the code without `!DIR$ SIMD`:

```
k = ks
ii = iis
value = vals(ii)
j = js
ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
ioffset = ii + ind_offset
do i = is, ie
  dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
end do
```

So it seems to be the loop nesting that leads to this issue. Thanks!

Best regards,
Wentao
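Building on the observation that the loop vectorizes once it stands alone, one thing that might be worth trying is to move the inner loop into a small helper, so that the compiler sees a single loop over two dummy arrays. This is only a sketch: the helper name `axpy_slice`, the module name, and all sizes, bounds, and data are hypothetical, and whether ifort 13.1 then vectorizes without a directive would have to be confirmed with the vectorization report. The idea relies on the Fortran rule that dummy arguments which are modified may not alias one another, which leaves the compiler no dependence to assume between `x` and `y`.

```fortran
module axpy_sketch
  implicit none
contains
  ! Hypothetical helper: y(1:n) = y(1:n) + a * x(1:n)
  subroutine axpy_slice(n, a, x, y)
    integer, intent(in) :: n
    real(kind=8), intent(in) :: a
    real(kind=8), intent(in) :: x(n)
    real(kind=8), intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy_slice
end module axpy_sketch

program nested_loop_sketch
  use axpy_sketch
  implicit none
  ! Hypothetical sizes and bounds, chosen only so the sketch runs in bounds.
  integer, parameter :: N1g = 16, N2 = 4, N3 = 3
  integer, parameter :: is = 1, ie = 12, js = 1, je = N2, ks = 1, ke = N3
  integer, parameter :: iis = 1, iie = 4
  real(kind=8), allocatable :: F(:), dF(:)
  real(kind=8) :: vals(iis:iie), value
  integer :: j, k, ii, n, ind_offset, ioffset

  allocate(F(N1g*N2*N3), dF(N1g*N2*N3))
  F    = 1.0d0      ! placeholder data
  dF   = 0.0d0
  vals = 0.25d0     ! placeholder coefficients
  n = ie - is + 1   ! trip count of the original inner loop

  do k = ks, ke
    do ii = iis, iie
      value = vals(ii)
      do j = js, je
        ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
        ioffset = ii + ind_offset
        ! Same arithmetic as the original inner loop, on contiguous slices.
        call axpy_slice(n, value, F(is+ioffset : ie+ioffset), &
                        dF(is+ind_offset : ie+ind_offset))
      end do
    end do
  end do

  ! Sanity check for the placeholder data: dF(1) receives 4 updates of 0.25 * 1.0.
  if (abs(dF(1) - 1.0d0) > 1.0d-12) stop 1
  print *, 'dF(1) =', dF(1)
end program nested_loop_sketch
```

The slices passed here are contiguous, so no temporary copy should be needed, but the call overhead (if the helper is not inlined) is a trade-off to measure against simply keeping the `!DIR$ SIMD` directive.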