I found that ifort (and gfortran) create a temporary for the following array assignment:
presumably because of the possibility that inc is less than zero. The result is stored in a stride-1 temporary and then copied to the destination, and both the evaluation and the copy are reported as vectorized.
If I write
do i= 1,n-inc,inc
ifort decides not to vectorize with /QxAVX2. Apparently that's a good decision: adding a !dir$ simd to force vectorization with simulated gather-scatter makes it slower, even in the case inc==1 (though still not as slow as the array assignment with a temporary).
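
For anyone who wants to try the comparison, here is a minimal sketch. The array section below is a hypothetical stand-in of the same general shape (an overlapping strided update), not necessarily the exact assignment measured above; the names a_arr, a_loop and b and the size n = 1000 are likewise assumptions. It only checks that the array-assignment form and the explicit loop give the same result for positive inc, so the two can then be timed against each other.

```fortran
program strided_assign_sketch
  implicit none
  integer, parameter :: n = 1000      ! assumed problem size
  integer :: inc, i
  real :: a_arr(n), a_loop(n), b(n)

  do inc = 1, 4
     call random_number(b)
     call random_number(a_arr)
     a_loop = a_arr                   ! identical starting data for both forms

     ! Hypothetical array-assignment form. Source and destination
     ! sections overlap, and inc is not known at compile time, so the
     ! compiler may evaluate the right-hand side into a temporary first.
     a_arr(1:n-inc:inc) = a_arr(1+inc:n:inc) + b(1:n-inc:inc)

     ! Explicit loop form. For inc > 0 each iteration reads a_loop(i+inc)
     ! before any later iteration overwrites it, so no temporary is needed.
     do i = 1, n-inc, inc
        a_loop(i) = a_loop(i+inc) + b(i)
     end do

     if (any(a_arr /= a_loop)) error stop "array assignment and loop disagree"
  end do

  print *, "array assignment and loop agree for inc = 1..4"
end program strided_assign_sketch
```

To look at the forced-vectorization case with ifort, the !dir$ simd line would go immediately before the inner do loop; other compilers should treat it as an ordinary comment.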
Intel's vecanalysis script:
reports heavy-overhead vectorization.
Just one more data point in the continuing question about marginal vectorization.