I realize that F90 gives us some array operations, but I'm just trying to understand this. Old-school thinking has us loop over the last array index in the outermost loop so that memory is addressed consecutively.
The results I'm getting are not what I expect. With default optimization I used -opt-report, and for the "slow" code the compiler optimizes by switching the order of the loops. For the "fast" code (where the last index is in the outermost loop) it does not, and that version runs *slower*. What is going on? If I set -O0 then I get the expected result: the code below runs faster with j in the outer loop.
Source code attached.
What do I take away from this? Should we not try to be smart about the index order in loops? Thanks for any insight.
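To make the index-order rule concrete, here is a minimal sketch that is not part of the attached sources (the program name `colmajor` and the array contents are made up for illustration). It fills a small 2x3 array and prints it in array element order, which shows that the first index varies fastest in memory.

```fortran
program colmajor
  implicit none
  integer :: a(2,3), i, j

  ! Encode the indices in the value: a(i,j) = 10*i + j
  do j = 1, 3
     do i = 1, 2
        a(i,j) = 10*i + j
     end do
  end do

  ! reshape to rank 1 follows array element order (first index fastest),
  ! so this prints the elements in the order they are stored
  print *, reshape(a, (/ 6 /))
end program colmajor
```

This prints the values 11 21 12 22 13 23, so stepping through i in the innermost loop is what touches consecutive elements.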
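      parameter (ndimi=2000, ndimj=3000, ntimes=1000)
      integer x(ndimi,ndimj), y(ndimi,ndimj), i, j, k
      integer timesec1, timesec2
! (the call that sets timesec1 is not shown in this excerpt)
      print *, 'time: ', timesec1
      do k = 1, ntimes
        do j = 1, ndimj
          do i = 1, ndimi
            x(i,j) = 5
            y(i,j) = 6
            x(i,j) = x(i,j) * y(i,j)
          enddo
        enddo
      enddo
! (the call that sets timesec2 is not shown in this excerpt)
      print *, 'time: ', timesec2
      print *, 'diff: ', timesec2 - timesec1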
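For anyone without the attachments, here is a self-contained sketch of the comparison that times both loop orders in one program. It is an approximation rather than the attached source: the `system_clock` timer, the allocatable arrays (used so that -mcmodel=medium should not be needed), the program name, and the reduced `ntimes` are all my assumptions.

```fortran
program loopindex_sketch
  implicit none
  ! Array sizes as in the post; ntimes is reduced (assumption) to keep the run short
  integer, parameter :: ndimi = 2000, ndimj = 3000, ntimes = 10
  integer, allocatable :: x(:,:), y(:,:)
  integer :: i, j, k
  integer :: t0, t1, t2, rate

  allocate(x(ndimi,ndimj), y(ndimi,ndimj))

  call system_clock(t0, rate)

  ! Variant A: j (last index) outermost, i innermost: unit-stride access
  do k = 1, ntimes
     do j = 1, ndimj
        do i = 1, ndimi
           x(i,j) = 5
           y(i,j) = 6
           x(i,j) = x(i,j) * y(i,j)
        end do
     end do
  end do

  call system_clock(t1)

  ! Variant B: i outermost, j innermost: stride of ndimi elements
  do k = 1, ntimes
     do i = 1, ndimi
        do j = 1, ndimj
           x(i,j) = 5
           y(i,j) = 6
           x(i,j) = x(i,j) * y(i,j)
        end do
     end do
  end do

  call system_clock(t2)

  print *, 'j outer (s): ', real(t1 - t0) / real(rate)
  print *, 'i outer (s): ', real(t2 - t1) / real(rate)
  ! Both corner elements should be 30 (5 * 6) after either variant
  print *, 'check: ', x(1,1), x(ndimi,ndimj)

  deallocate(x, y)
end program loopindex_sketch
```

Building this once with -O0 and once with default optimization, with -opt-report on the second build, should show whether the compiler interchanges the loops of either variant.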
ifort (IFORT) 12.1.6 20130222
ifort -mcmodel=medium -shared-intel -opt-report loopindex_slow.f >& report_slow.txt
ifort -mcmodel=medium -shared-intel -opt-report loopindex.f >& report.txt
LOOP INTERCHANGE in loops at line: 10 12 13
Loopnest permutation ( 1 2 3 ) --> ( 3 1 2 )