Very slow compilation

In some cases compilation is unreasonably slow.
For example, a simple compilation of the attached file takes about 650 times longer than gcc:

> time icc ibm2.c

real 1m3.930s
user 1m2.690s
sys 0m0.320s

> time gcc ibm2.c

real 0m0.091s
user 0m0.050s
sys 0m0.040s

This is with the latest (11.1.072) version of icc for Linux (64-bit):

> icc -V
Intel C Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100414 Package ID: l_cproc_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

Attachment: ibm2.c (909 bytes)
Dale Schouten (Intel):

Well, sure enough, I'm seeing the same problem. Thanks for the self-contained test case; this is exactly the kind of problem we hope to find (and solve, cross my fingers :-). I will definitely look into it and update here.

Thanks!
Dale

mecej4:

I can reproduce the problem with 11.1.069 (x64) on Suse 11.1. However, with -O0 or -O1 compile time drops to less than 1 second.

That's a fair point when the comparison is against gcc -O0. Even if you set the nearest equivalent gcc options to the icc defaults, and use a current gcc which implements auto-vectorization, something like
gcc -O3 -ffast-math -fno-cx-limited-range -fno-strict-aliasing -funroll-loops --param max-unroll-times=2
gcc isn't going to attempt all the optimizations which OP has requested under icc.
icc -O1 would be roughly equivalent to gcc -O2 -ffast-math -fno-cx-limited-range -fno-strict-aliasing

It looks like the original code is not intended for any practical purpose, only to see whether a compiler can be provoked into attempting optimization beyond the bounds of sanity. You can't fault any compiler for taking longer with aggressive options set than gcc takes with optimizations disabled.
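
To illustrate the general flavor, here is a hypothetical fragment I made up for this post (it is not the attached ibm2.c): a tiny source file whose nested short loops and long floating-point dependence chains can hand an aggressive optimizer a combinatorial amount of unrolling and vectorization work, so compile time bears little relation to source size.

/* Hypothetical illustration only -- not the attached ibm2.c. */
#include <stdio.h>

#define N 8

int main(void)
{
    double a[N][N][N] = {{{0.0}}};
    double s = 0.0;
    int i, j, k, l;

    /* Four nested short loops over a small array: with aggressive
       unrolling plus vectorization the optimizer can do a great deal
       of work even though the source is well under 1 KB. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                for (l = 0; l < N; l++)
                    s += a[i][j][k] * a[j][k][l]
                       + a[k][i][j] / (1.0 + a[l][i][k]);

    printf("s = %g\n", s);
    return 0;
}
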
We have had continual arguments over whether it should be necessary, but the fact remains that large parts of some commercial applications have to be built with icc cut back to -O1, where gcc options -O3 or -ffast-math would never be considered.

mecej4:

Tim, I don't know if you remember: I made a similar complaint in comp.lang.fortran regarding IFort, and requested feedback on a switch that would say to the compiler, "do a great job of optimization, I know you are very capable, but don't kill yourself at it!" You shot the suggestion down, and it was harder for me to make a case since the source files in question were several hundred kilobytes long and were covered by a non-disclosure agreement.

In the present case, as well, I don't think that the user, who did not specify any compiler switches explicitly, was conscious of all the optimizations that he had 'requested'. Had he known what they were, and what the likely cost was going to be, he might have looked for switches to get within 95% of the best optimization possible, or something similar.

However, the fact that the C example is less than 1 kilobyte long and took over a minute to compile casts new light on the issue. Perhaps, if the back ends of the C and Fortran compilers share some common portions, we shall all benefit when the developers work on it.

I think there are conflicting priorities here which will never be resolved.
What will give 95% is highly application dependent.
gcc users are expected to know that they should set switches when they want optimized code.
icc marketers want the compiler to show to its best advantage when used by the people who write magazine articles based on trivial benchmarks with default compiler options, and when running SPEC baseline benchmarks. Those goals conflict to some extent with usability, and gcc is in a better position, as it clearly puts those goals at a lower priority than usability.
Another conflict is posed by the desire to have consistent defaults between Linux and Windows. It is felt strongly that the icc default must be competitive with the VC default, and include auto-vectorization, since there is no equivalent VC option. The inexplicable part is the decision to make /fp:fast inconsistent with VC.
At one time, there was an effort to persuade the Intel compiler team to adopt a simple option which would be consistent with typical gcc options like -O2. This failed. The picture has changed now that gcc includes auto-vectorization in -O3. I'd like to see more coverage in the documentation of icc options which are equivalent to options normally used in the reference compiler, but that has become more difficult as those compilers have evolved to include more optimization.
On the Fortran side, Steve Lionel has agreed that some of the standard compliance options should be used in normal practice, and he has made an effort to consolidate them.

Maybe the Intel compiler should introduce a new set of optimization options (say --gnu-O0 ... --gnu-O3) which somehow correspond to the analogous gcc optimizations.

Thanks, -- Victor
jimdempseyatthecove:

Quoting Victor Pasko (Intel): Maybe the Intel compiler should introduce a new set of optimization options (say --gnu-O0 ... --gnu-O3) which somehow correspond to the analogous gcc optimizations.

Sounds good on the surface, however...

Considering that the incentive for the compiler vendor is to produce

Our -On produces faster code than their -On

Then the incentive would be for Vendor A's interpretation of Vendor B's equivalent options to be a selection of options that produces lower-performing code.

The comparable options would be best selected from an un-biased (neutral) party.

Please note, the interpretation of -On is

-O0 is no optimization
-O1 is more effort than -O0
-O2 is more effort than -O1
-O3 is more effort than -O2
-Oother is additional features beyond -O3

And the vendor is more or less free to choose which optimization features go into which level.

IMHO - it is the user's responsibility to select, from the available options, a set that trades off compile time against the performance returned (and quality of results).

It is likely that on rare occasions a given sample code may cause a compiler to choke or produce erroneous results. Last week I encountered a problem where g++ could not handle a simple template that both icc and msvc had no problems with.

Jim Dempsey

www.quickthreadprogramming.com
dpeterc:

Quoting jimdempseyatthecove:
Please note, the interpretation of -On is

-O0 is no optimization
-O1 is more effort than -O0
-O2 is more effort than -O1
-O3 is more effort than -O2
-Oother is additional features beyond -O3

And the vendor is more or less free to choose which optimization features go into which level.

IMHO - it is the user's responsibility to select, from the available options, a set that trades off compile time against the performance returned (and quality of results).

My problem with Intel's choice of group of optimizations for O0, O1, O2, O3 is not with the effort but with the result.
I mostly get the best results with -O1, while higher optimization levels take more time to compile and produce code which is both slower and bigger. Or much bigger for a negligible speedup.
Like with every toy or gadget, when the initial experimentation phase is over, one sticks with the options which have worked best so far, and can't continuously flip through an n-dimensional space of compiler options in search of a jackpot.
So I wish more time were devoted to making sure that a higher optimization level is actually better, not just a particularly smart way of doing something which sometimes works and sometimes doesn't; it is the programmer's responsibility to check.
In comparison, GCC very rarely behaves worse with higher optimization.

jimdempseyatthecove:

>>I mostly get the best results with -O1, while higher optimization levels take more time to compile and produce code which is both slower and bigger. Or much bigger for a negligible speedup.

This indicates your programs tend to perform better without loop unrolling. A small percentage of applications behave this way.

Try -Os

Jim

www.quickthreadprogramming.com

Intel compilers usually optimize for loop trip counts of about 100 when no clear information on that subject is present in the source code. Turning off vectorization as well as unrolling, as -O1 has done since the Intel 11.0 compilers, is likely to improve the performance of short loops, as well as speed up compilation. In principle, profile-guided feedback should induce the compiler to optimize for the trip counts seen in the training data set, at least when they don't vary much (though it certainly doesn't address compilation speed).
As you apparently leave unrolling off when you use gcc, Jim's suggestion may be on target. gcc actually makes it more difficult to get a useful level of unrolling for loop trip counts of 20 or more.
-Os is intended to reduce generated code size at the expense of performance in comparison with -O1. It's possible it may perform relatively well when loop trip counts are 0 to 1.
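
As a made-up illustration of the kind of loop I have in mind (not taken from any code posted in this thread): with a trip count this short, the startup cost of the vectorized and unrolled code generated at icc -O2/-O3 often cannot be recovered, so the plainer code from -O1 may compile faster and run no slower.

/* Illustrative sketch only: a short-trip-count loop where the
   vectorization and unrolling applied at -O2/-O3 are unlikely
   to pay off. */
#include <stdio.h>

#define TRIPS 4   /* short trip count, typical of setup loops */

static double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++)   /* n is tiny; vector startup cost dominates */
        s += x[i] * y[i];
    return s;
}

int main(void)
{
    double x[TRIPS] = {1.0, 2.0, 3.0, 4.0};
    double y[TRIPS] = {4.0, 3.0, 2.0, 1.0};
    printf("dot = %g\n", dot(x, y, TRIPS));
    return 0;
}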

Though icc takes more time to compile, the sample testcase runs 4 times faster when compiled with the Intel compiler.

$ time icc tstcase.cpp

real 0m33.722s
user 0m33.647s
sys 0m0.053s

$ time ./a.out
count = 991

real 0m0.005s
user 0m0.003s
sys 0m0.001s

$ time g++ tstcase.cpp

real 0m0.091s
user 0m0.061s
sys 0m0.029s

$ time ./a.out
count = 991

real 0m0.019s
user 0m0.017s
sys 0m0.003s

$ uname -a
Linux maya11 2.6.29.4-167.fc11.x86_64 #1 SMP Wed May 27 17:27:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

$ icc -V
Intel C Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

I have reported the issue to the Intel compiler development team. I will update this forum thread when there is an update on this.

jimdempseyatthecove:

Run times on the order of 3 ms are too short for meaningful results.
Rework main() so that it takes an input argument for use as a major loop count.
Based on your run times, I would expect an iteration count of 1000 to produce more meaningful results.
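
A minimal sketch of what I mean (do_testcase() is just a stand-in for whatever the posted testcase computes; I haven't seen its source, so link this driver against it):

/* Driver sketch: repeat the existing work 'reps' times so the
   measured run time is long enough to be meaningful. */
#include <stdio.h>
#include <stdlib.h>

extern int do_testcase(void);   /* stand-in for the original testcase body */

int main(int argc, char **argv)
{
    long reps = (argc > 1) ? strtol(argv[1], NULL, 10) : 1000;
    long r;
    int count = 0;

    for (r = 0; r < reps; r++)
        count = do_testcase();

    printf("count = %d (reps = %ld)\n", count, reps);
    return 0;
}

Then run it as, say, ./a.out 1000 and time that instead.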

Jim

www.quickthreadprogramming.com
