I'm in the middle of trying to optimize a large F90 program and was wondering if anyone knows of tools that can help with the following:
1) Showing where computed values are used, or where values come from. That is, I want to see a data-flow analysis, especially across procedure and file boundaries.
2) Something that can suggest how to restructure loops to optimize the code. I was looking at some code whose innermost loops are:
    sumrd = 0.0
    DO k = 1,nkin
      DO np = 1,nreactmin(k)
        DO i2 = 1,ncomp+nexchange
          sumrd(i2) = sumrd(i2) + &
I thought the loop nest was in a bad order, so I moved the i2 loop to just outside the k loop, and it absolutely killed performance: the whole program ran 30% slower, and this loop nest went from less than 5% of the run time to 30%. What did the compiler do, and how can I use that information to help speed up other parts of the code?
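To make the change concrete, here is a small self-contained sketch of the two loop orders. The real right-hand side is longer, so the array `rate`, its index order, and all the sizes below are hypothetical stand-ins that I made up purely for illustration; only the loop structure matches the real code.

```fortran
program loop_order_sketch
  implicit none
  ! All sizes and the array "rate" are hypothetical placeholders,
  ! not taken from the real program.
  integer, parameter :: nkin = 4, ncomp = 6, nexchange = 2, maxreact = 3
  integer :: nreactmin(nkin)
  real :: rate(ncomp+nexchange, maxreact, nkin)
  real :: sumrd_a(ncomp+nexchange), sumrd_b(ncomp+nexchange)
  integer :: k, np, i2

  nreactmin = (/ 1, 3, 2, 3 /)
  call random_number(rate)

  ! Original order: i2 is the innermost loop
  sumrd_a = 0.0
  do k = 1, nkin
    do np = 1, nreactmin(k)
      do i2 = 1, ncomp+nexchange
        sumrd_a(i2) = sumrd_a(i2) + rate(i2, np, k)
      end do
    end do
  end do

  ! Reordered version: i2 moved to just outside the k loop
  sumrd_b = 0.0
  do i2 = 1, ncomp+nexchange
    do k = 1, nkin
      do np = 1, nreactmin(k)
        sumrd_b(i2) = sumrd_b(i2) + rate(i2, np, k)
      end do
    end do
  end do

  ! Both orders accumulate the same terms for each i2
  print *, 'max difference:', maxval(abs(sumrd_a - sumrd_b))
end program loop_order_sketch
```

Both versions compute the same sums; the only difference is which loop index varies fastest, which is the change that caused the slowdown in the real code.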
I have 800 lines of code that I need to speed up. There are very few IF statements, and it's all just crank and grind over a big grid. It all fits in L2 cache without a problem. But I'm getting only 10% of peak performance, and I'd expect better.
I'm using the 9.0 compiler on a Linux/Itanium machine. I use
as this is some magic I found on the net