long compile times

I have found that when I compile my program with bounds checking on, it takes 8 minutes to compile on a 933 MHz machine. However, when I turn off bounds checking, the program compiles in about 2 minutes. Why does it take so much more compile time to put in bounds checking? The Lahey and Salford compilers are able to put in bounds checking with minimal impact on compile times. Anyone else have this problem?
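
As I understand it (this is a rough sketch of the idea, not what any particular compiler actually emits), bounds checking means the compiler adds a range test around every array reference, so there is simply more code to generate and, with optimization, more code to analyze:

program bounds_sketch
  ! Rough illustration only - not the code any particular compiler emits.
  implicit none
  real :: a(100)
  integer :: i

  do i = 1, 100
     ! With bounds checking enabled, the compiler effectively inserts
     ! the equivalent of
     !    if (i < 1 .or. i > 100) <report the error and stop>
     ! before each indexed reference like the one below.
     a(i) = real(i)
  end do

  print *, sum(a)
end program bounds_sketch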

This is not a common situation. Usually, there is no detectable difference in compilation speed. Please send us an example at vf-support@compaq.com and we'll be glad to investigate.

Steve

Steve - Intel Developer Support

I was going to post about this same thing, but I just realized that we
are talking about the exact same piece of code. I can confirm that what
he says is true, even with the 6.5a update installed. To throw a bit
more information into this, the old Microsoft Powerstation compiler can
compile this code in about 8 seconds. On an old version of the code,
both compilers get it done quickly. One of the big differences in the new
code is there are a lot of modules, and allocatable arrays. The program
is also mostly one large main program, so perhaps this combination of
factors plays some part in it.

I've been experimenting with his code, and it definitely seems to be
related to his use of lots of allocatable arrays.
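
For anyone following along, the pattern in question looks roughly like this (the names are invented, not taken from the actual code): module-level allocatable arrays that one large main program USEs and allocates at run time.

module state_mod
  implicit none
  ! Module-level allocatable arrays shared with the main program.
  real, allocatable :: conc(:,:), flow(:), depth(:)
end module state_mod

program big_main
  use state_mod
  implicit none
  integer :: n

  n = 1000
  allocate (conc(n,10), flow(n), depth(n))
  conc = 0.0
  flow = 0.0
  depth = 0.0
  print *, size(conc), size(flow), size(depth)
end program big_main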

Did either of you send us an example? If not, please do so.

Steve

Steve - Intel Developer Support

I think it has been sent in, but I'll send it again, just in case.

Thanks - we appreciate it.

Steve

Steve - Intel Developer Support

I have also got excessively long compile times - three source files that took 2.5 minutes to compile under Watcom Fortran now take over 20 minutes with full optimisation on Compaq Fortran. Without optimisation the code takes 6 minutes to compile, which is still very long.

I am using Compaq Visual Fortran Version 6.5 on a 933 MHz Pentium III processor with 384 MB of memory running Windows NT. The code is automatically generated (thus not very efficient) and, including comments, is 60,000 lines long.

We have seen some problems with automatically-generated code that consists of very long sequences of IF statements. This is due to a particular algorithm in the compiler which works fine in "normal" programs, but is slow for this particular, unusual usage. At this time, we don't have plans to rework this part of the compiler - it would be a several month effort with minimal gain overall.
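
To give the flavor of the usage meant here (an invented fragment - the real generated files run to thousands of such blocks in a row):

program if_chain_sketch
  ! Invented fragment: machine-generated code of this kind is a long,
  ! flat chain of near-identical IF blocks.
  implicit none
  real :: x, y

  x = 2.5
  y = 0.0
  if (x < 1.0) then
     y = 0.1*x
  end if
  if (x >= 1.0 .and. x < 2.0) then
     y = 0.2*x + 1.0
  end if
  if (x >= 2.0 .and. x < 3.0) then
     y = 0.3*x + 2.0
  end if
  print *, y
end program if_chain_sketch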

Note - this comment does not apply to coletm or jparsly's code, which is something else entirely and is still under investigation.

Steve

Steve - Intel Developer Support

A partial solution to this problem seems to be to add more memory.
I went from 128M to 576M and eliminated a lot of disk thrashing, which
made significant improvements to my compile times, although not as
much as I would like to see.

I was going to guess that you were experiencing pagefile thrashing. Your source is quite large and complex. I looked at the times in the various compiler phases and didn't see anything that looked out of place. However, the program compiled in just about 10 minutes, with optimization, on my system, so I am not sure what's going on with yours!

You can't really compare to Watcom - that's a much smaller compiler with a much simpler language, and it doesn't do a lot of optimization.

Steve

Steve - Intel Developer Support

The compile time in the release configuration doesn't really bother me too much.
With the added memory it's down to 13 minutes and I can live with that.

What bothers me is that it takes 9 minutes to compile in the normal debug
configuration and 5 minutes to compile with array bounds checking turned off.
I got used to relatively fast compiles (30 seconds or so) with the MSFPS
compiler and with the Compaq compiler on the older version of the code.
coletm tells me that the Salford compiler compiles the code in less than 10
seconds.

Somehow adding modules and allocatable arrays to the program has
slowed down the compile with Compaq, but not with Salford and MSFPS,
and I wonder why. What is the compiler doing, even with optimizations
turned off, that makes it so slow?

Steve,

There is still something out of whack with the long compile times. I have written a conversion utility that converts all input files from a previous version of the code we are talking about to a new version. It's < 2000 lines of code, of which > 1000 lines are just I/O statements or simple assignment statements. There is very little logic and very few computations taking place (it just reads in the old version files and does a little bit of rearranging of the inputs). There are also only a few allocatable arrays compared to the code James is talking about. However, it takes about 30 seconds to compile on my 433 MHz machine, which is comparable to what compile times would have been on a VAX 11/750 with 1 MB of memory back in the mid '80s.

FYI, the earlier version of the code before converting to F90 took < 5 minutes to compile on the same VAX machine. Computer speeds are now orders of magnitude faster than the VAX. As James stated, the F77 version of the code using CVF 6.5 takes < 1 minute to compile, but essentially the same code converted to F90 takes > 5 minutes to compile. The times are all relative depending upon the speed of the computer, but it seems strange that going from F77 to F90 causes such a huge increase in compilation times.

Also, the Salford compiler compiles the code in < 10 seconds and there is no discernible difference with array bounds checking on, which is what prompted my question a long time ago about why it takes so much longer to compile with array bounds checking on (I'm still waiting on some comment as to why this is occurring). However, as I stated before, the final code produced by CVF with optimizations on runs at least half again as fast as the Salford-generated code, so CVF is very successful at generating code that runs fast for a release configuration.

However, it would be nice if the CVF folks could put some effort into speeding up the compile times for the debug configuration. During program development, I don't care how fast the program runs - I just want it to give me a quick turnaround time during compilation so that I can get more debugging work done. It's clear from the Salford (and Lahey) compile times that this can be accomplished more efficiently than in the CVF compiler.

Also, FYI, the Absoft and PlusFort compilers won't compile the program at all.

I would be glad to furnish all the codes if CVF would like to investigate further. I have a feeling that it wouldn't take much effort to greatly reduce the compile times during debugging, but I could be, and probably am, wrong.

Tom

We'll be glad to take a look. We don't have an explicit goal that debug compilations are faster - often, optimizations reduce the amount of intermediate representation that needs to be processed. But we have seen some examples that lead to our recognizing that a particular compiler algorithm needs improvement. Send us the sources (in a ZIP file) and we'll see what we can figure out.

Steve

Steve - Intel Developer Support

Steve,

I'll send them later on today - thanks for taking a look. For me at least, any speedup of compilation times in the debug configuration would be a lifesaver.

Tom

Tom

The fact that it compiles with some compilers and not with others makes me think
that perhaps the program is close to some arbitrary limit on symbol table or code
size for these compilers. The compilers that fail may have 'hard' limits that
can't be exceeded, and the Compaq compiler either has bigger limits or no such
limits at all.

If the Compaq compiler is really clever, it has dynamic limits and makes more
space available as needed. If so, one thing to watch out for is whether it
adds space in fairly large chunks, so as not to have to reallocate a bunch of
times.

Similarly, the thrashing problem that I got rid of by adding more memory is, at its
root, caused not just by having a lot of memory allocated, but by accessing
it in such a way that the program has to sweep across it repeatedly. Sometimes
this sort of thing can be as easy to fix as changing the order that loop indices
run in, so that memory is accessed in order.
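
As a concrete example of the loop-order point (made-up code, not from either of our programs): Fortran stores arrays column-major, so the inner loop should run over the first index to walk through memory in order.

program loop_order_sketch
  implicit none
  integer, parameter :: n = 2000
  real, allocatable :: a(:,:)
  integer :: i, j

  allocate (a(n,n))

  ! Friendly order: the inner loop varies the first index, so memory
  ! is touched sequentially.
  do j = 1, n
     do i = 1, n
        a(i,j) = real(i + j)
     end do
  end do

  ! The same loops with j innermost stride across the array and, once
  ! the data no longer fits in physical memory, cause exactly the kind
  ! of thrashing described above:
  !   do i = 1, n
  !      do j = 1, n
  !         a(i,j) = real(i + j)
  !      end do
  !   end do

  print *, a(n,n)
end program loop_order_sketch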

I'd also have to agree with Tom that execution speed in the debug
configuration is not nearly as important as compile time. I actually spend most
of my time programming in QB and VB, and I'm used to the instant gratification
that they provide.

Jim,

The problem with the other compilers (and somewhat with CVF) is that they take up a huge chunk of virtual memory, which causes thrashing (the Absoft compiler was using over 3GB and climbing after 12 hours of compilation). After a few hours of compilation time with the FortPlus compiler, I also called a halt to it as it was thrashing about all over the place. I have had similar experiences with the generic compilers that came standard on SGI and HP machines, depending upon the optimizations. I have never had problems with the older DEC or Cray compilers.

I think the problem (Steve would know for certain) is that the code consists of a large main code (> 6000 lines) with a large number of variables. When the compiler starts building tables during the compilation process, the memory required to keep track of all the information becomes overwhelming, particularly for any sort of global optimization. By adding additional memory, you now have enough for the tables to be built without ever having to resort to virtual memory.

I suppose I could have broken the main code into numerous subroutines, but my programming philosophy is that a subroutine is nothing but a GO TO with a whole lot of baggage associated with it (hence all the new stuff in F90/F95 to make subroutines easier to write correctly), whereas a code with numerous subroutines ends up being the same as spaghetti code if they are used indiscriminately. I want code to be easily understandable by someone other than myself - it is not easy to discern the logic flow in a program that has 500 subroutines. I only turn code into a subroutine if:

1) The code is repeated several times in the main code (or is very similar)

2) It is peripheral to understanding the overall logic flow in the code

3) The code section will be replaced in the future with other code (such as different transport/turbulence schemes for water quality models).

A good analogy is the way computer program documentation has evolved. I used to sit down and read a manual from front to back cover. The best documentation had a logical progression to it and information that was not necessary for getting the "big picture" of the software was relegated to appendices. Unfortunately (for me at least) documentation now consists of a bunch of hyperlinked text that is great for finding immediately related information, but is nearly impossible to read with the idea of getting the big picture.

Unfortunately, I have a feeling compiler writers don't write compilers for this type of programming style. Apparently, I've become that old dog that I used to complain about in my younger graduate school days.

Tom

Tom - it's not necessarily bad to use a lot of memory, if you can manage to
confine yourself to using it in discrete chunks. My current worst case compile
uses over 700M of memory, but doesn't cause any thrashing because it
doesn't use it all at once. Somewhere between 128 and 576M of physical
memory I got enough to avoid the thrashing. My hope, based on nothing
more than speculation, is that there may be some simple changes that could
be made in the order of memory references to eliminate the thrashing without
needing the extra memory. I have only installed extra memory on my own computer
so far, but there are 5 or 6 others here that are going to need it if a fix isn't
found for the compiler.

For anyone interested, Polyhedron Software's web site has a comparison of a number of different FORTRAN compilers for the PC and it's worth taking a look at. Their results for compilation times echo my experiences with my code using CVF, FTN95, LF95, Absoft, and NAS compilers.

CVF produces the fastest execution times, followed closely by LF95, with FTN95 coming in a somewhat distant fourth. Absoft and NAS are nothing to write home about in any category of the comparison.

However, when it comes to compilation times, CVF is almost dead last. For Polyhedron's sample program, FTN95 compiled it in 6.7 seconds while CVF took 31 seconds. For my program, the difference is a whole lot more: CVF takes almost 10 minutes to compile it, while FTN95 takes less than 10 seconds, with no difference in compilation time whether array bounds checking is on or off.

They also have a very interesting comparison of each compiler's ability to ferret out programming errors, both statically and dynamically, and their results again echo mine. FTN95 gives the most comprehensive set of error checking, followed by LF95, with CVF falling in the middle of the pack.

This reinforces a conclusion I reached a long time ago that it is worthwhile to invest in several compilers when producing standard-conforming, complex scientific code (hence the reason why I have so many compilers). Different compilers are better at certain things than others. I just wish CVF would put more effort into detecting programming errors and decreasing their compile times. What the hey, one can always wish.

Another interesting thing in the comparisons is that the Intel compiler basically falls behind the competition (or is in the middle of the pack). I wonder how this will work out in the future when Intel takes over CVF.

Tom

ABOUT COMPILATION TIME

As can be seen from the answers to this question, there are many strategies for developing an application. The choice affects both the debugging procedure and the compilation time.

When the programmer makes a mistake, discovers it, and corrects it, he needs to compile the program again. Ideally the compiler would detect and recompile only the section that was modified. Unfortunately this is not yet done automatically, but the programmer can approximate it.

I have found that building the application out of procedures - functions and subroutines gathered into modules - helps a lot in decreasing compilation time, because it is only necessary to compile the "procedure" that is being debugged, and once it is done it does not need to be compiled again unless the file is "corrupted".

This is done by using the "Build" selection in the main menu and then "Compile My_Module.f90". When the module is clean and compiles without errors, the executable is made with "Build My_application.exe" under the same "Build" menu; this selection compiles only the modules that have been modified since the last time it was used. Later on, it is not necessary to compile this module again unless it is modified.

Once the module is done, the programmer can add it to another finished module or leave it as an independent one. It is made part of the application by using the "USE My_Module" statement in the main program.
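
A minimal sketch of this arrangement (My_Module and the procedure name are just placeholders; in a real project the module goes in its own .f90 file, which is what lets "Build" skip it when it has not changed):

module My_Module
  implicit none
contains
  subroutine say_hello()
    print *, 'compiled once, reused until the source changes'
  end subroutine say_hello
end module My_Module

program My_Application
  use My_Module   ! makes the separately compiled module part of the application
  implicit none
  call say_hello()
end program My_Application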

This strategy gives good results as long as the main program or function does not have to be compiled many times.

Oscar Piedrahita
