In a compute-bound program, there was a loop that called two functions.
Each of these functions declared a large local array. The allocation of
these arrays on the stack increased the execution time by a factor of
10 to 20. What can be done?
The original code, running under CVF6, used a REAL array for two
distinct purposes. Under some circumstances, the entire array contained
real data as input to the function, and the function returned the data
modified. Under other circumstances, only the first few elements of the
array contained inputs to the function, and they were returned
unmodified.
So, we have something like
REAL, DIMENSION(100000) :: LongData
REAL, DIMENSION(3) :: ShortData
x = MyFunc(LongData)   ! the first case
x = MyFunc(ShortData)  ! the second case

REAL FUNCTION MyFunc(RealArray)
REAL, DIMENSION(100000), INTENT(INOUT) :: RealArray
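Filled out into a minimal, self-contained sketch (the names and sizes come from the snippet above; the module wrapper and the trivial function body are assumptions added so the interface is explicit and the example compiles):

```fortran
MODULE Funcs
CONTAINS
  REAL FUNCTION MyFunc(RealArray)
    ! Explicit-shape dummy: the compiler assumes 100000 elements.
    REAL, DIMENSION(100000), INTENT(INOUT) :: RealArray
    MyFunc = RealArray(1)
  END FUNCTION MyFunc
END MODULE Funcs

PROGRAM Demo
  USE Funcs
  REAL, DIMENSION(100000) :: LongData
  REAL, DIMENSION(3)      :: ShortData
  REAL :: x
  LongData  = 0.0
  ShortData = 0.0
  x = MyFunc(LongData)    ! the first case: fine
  x = MyFunc(ShortData)   ! the second case: rejected, because
                          ! ShortData has only 3 elements
END PROGRAM Demo
```

With the interface explicit, the second call is exactly the shape mismatch IF9 complains about.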
The trouble is, this gives a compiler error under IF9, because MyFunc
can exceed the bounds of ShortData. I tried to get around this by
creating a new array, LongData2, and copying ShortData into its initial
elements. This works, but now the large array LongData2 must be created
on the stack each time MySub is called, and MySub is called millions of
times. Even in a fully optimized release version, allocation of
temporary space on the stack is done one page at a time, and that turns
out to take about 10 times the execution time of everything else!
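The workaround looks roughly like this (MySub's argument list, the body shown, and the copy length are assumptions filled in for illustration; MyFunc is as sketched earlier):

```fortran
SUBROUTINE MySub(ShortData, x)
  USE Funcs
  REAL, DIMENSION(3), INTENT(IN) :: ShortData
  REAL, INTENT(OUT) :: x
  ! Local scratch array, allocated on the stack at every call:
  ! this per-call, page-at-a-time allocation is the 10x cost.
  REAL, DIMENSION(100000) :: LongData2
  LongData2(1:3) = ShortData
  x = MyFunc(LongData2)
END SUBROUTINE MySub
```

The copy itself is cheap; it is the repeated creation of the 100000-element local that dominates.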
My first question is: have the compiler designers already considered
this problem and used a more efficient way of allocating stack space?
If the stack space is already reserved and committed, the process
shouldn't need to touch every page, and that page-by-page probing
probably blows the cache, too.
If not, may I suggest that you could save a lot of instructions by
allocating stack space with just a few: a compare against the end of
known-good space, followed by a move to ESP.