ICC vs GCC vs LLVM/Clang

The "conventional wisdom" was that icc was best by large margin (both as code size and speed), gcc most widespread and multiplatform, and Clang immature, but promising. Something along those lines:

http://www.hortont.com/blog/icc-and-mandelbrot/

But I have recently tested those compilers on my project (about 120k lines of C) on OpenSUSE 12.2, and things have changed radically. GCC 4.7.1 is on par with icc 12.1.5, while Clang is approximately 25% slower. But Clang has excellent compile errors and warnings, and its static analysis is just superb. So some projects are switching from gcc to clang as the default compiler.

Has anyone recently (in 2012) done any serious benchmarking of these compilers? Or can you share benchmarks of your production code? Which compile options give you the best results? What is your justification for using icc, now that the free compilers have improved so much?


icc has become more dependent on pragmas to keep ahead of gcc. Evidently, we know nothing of your code, so we can't suggest which pragmas, or suggest equivalent compile command line parameters.
With use of pragmas, in my experience icc can vectorize more loops effectively than gcc, and maintain optimization of loops which don't vectorize effectively. There are significant changes in pragma usage for 13.0, not yet fully documented.
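Purely as an illustration (an assumed example, not taken from your code), the kind of loop where such icc pragmas matter might look like this:

// Assumed example: a simple loop where icc-specific pragmas assert that the
// pointers do not alias and request unrolling of the vectorized loop.
void scale_add(float *a, const float *b, const float *c, int n)
{
#pragma ivdep      // icc: ignore assumed vector dependences between a, b, c
#pragma unroll(4)  // icc: unroll the (vectorized) loop four times
    for (int i = 0; i < n; ++i)
        a[i] = 2.0f * b[i] + c[i];
}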
Typical roughly equivalent (aggressive) command lines:
gcc -std=c99 -O2 -ftree-vectorize -funroll-loops --param max-unroll-times=4 -march=native -ffast-math
icc -std=c99 -ansi-alias -O3 -complex-limited-range -xHost

I can't answer your question about whether adding pragmas so as to get full performance from icc is "justified," or whether you can "justify" use of icc when you don't make that effort, unless you are lucky enough to have an application which comes out ahead with no effort.

Hi everybody,

...
But I have recently tested those compilers on my project (about 120k lines of C) on OpenSUSE 12.2, and things have changed radically. GCC 4.7.1 is on par with icc 12.1.5,
while Clang is approximately 25% slower. But Clang has excellent compile errors and warnings, and its static analysis is just superb. So some projects are switching from gcc to
clang as the default compiler.
...

[SergeyK] I didn't have a chance to work with the Clang C++ compiler. However, Warning Level 5 '/W5' of the Intel C++ compiler is awesome. When I turned it on for a medium-sized C/C++ project, a couple of hundred issues in the source code were detected ( to be honest, I was simply overwhelmed and it took a couple of weeks to go through all of them ).

...
Has anyone recently (in 2012) done any serious benchmarking of these compilers? Or can you share the benchmark tests of your production code? Which compile options give you best results?
...

[SergeyK] During the last couple of months we had a couple of short discussions on different forums, like TBB or Software Optimization, about benchmarking, and I'd like to repeat that it depends on many factors, like:

- project or algorithm
- if some threading is used
- optimization options
- data type ( single- or double-precision )
- CPU ( SSE / SSEx.x / AVX )
- etc

For example, I tested a MergeSort algorithm and Intel C++ compiler v12.x gave the best results when all optimizations were disabled (!). In a more complex case, with a Strassen heap-based algorithm for matrix multiplication, a ~12-year-old Borland C++ v5.x outperformed (!) modern C++ compilers ( Intel / MS / MinGW ) with one thread and a single-precision data type, but it "lost the battle" for a double-precision data type.

A task of testing a Linpack 100x100 benchmark in C/C++ for the PC has been on my list for a long time, and I hope that I'll be able to allocate some time and complete it.

Best regards,
Sergey

Hello Tim,
I do not use pragmas, and I am not sure whether I want to in the first place.
I prefer having a source which is not adapted to a particular compiler; for me, guessing the best compiler switches for each compiler is hard enough.
Maybe if my code had a single hot spot, some matrix multiplication or one well-defined algorithmic bottleneck, it would be worth the effort.
I am sure pragmas are a valid tool in some situations, but just not for me.
Just yesterday I discovered the -flto switch in gcc 4.7.1. Link-time optimization is roughly similar to -ipo in Intel's compiler.
This gave me an extra 5% speedup, and gcc was marginally faster in the end.
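To make that concrete, here is a minimal two-file sketch (assumed file names, code and commands, not my actual project): without link-time optimization the call in the loop cannot be inlined across translation units, with -flto (gcc) or -ipo (icc) it can.

// util.cpp
float blend(float a, float b)
{
    return 0.5f * (a + b);
}

// main.cpp
float blend(float a, float b);   // defined in util.cpp

int main()
{
    float s = 0.0f;
    for (int i = 0; i < 1000000; ++i)
        s += blend(float(i), float(i + 1));   // candidate for cross-file inlining
    return s > 0.0f ? 0 : 1;
}

// Assumed build commands:
//   g++  -O2 -flto util.cpp main.cpp -o lto_demo
//   icpc -O2 -ipo  util.cpp main.cpp -o lto_demo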

Hi Sergey,
Thank you for your comments. Please post your benchmarks once you find time to make them, or links to relevant sites which, in your opinion, contain solid benchmarks.

Best regards,

Dušan

Quote:

dpeterc wrote:

Hi Sergey,
Thank you for your comments. Please post your benchmarks once you find time to make them, or links to relevant sites which, in your opinion, contain solid benchmarks.

Best regards,

Dušan

Hi, I'll do it as soon as I receive a new system with a 64-bit Windows...

Hi,

I'm using the GCC 4.8.1 compiler and the Intel ICC 13 compiler, on Ubuntu Server 12.04 64-bit on an Intel Sandy Bridge Core i7-3930K. I found that with these command lines, g++ -O3 -march=corei7 -mtune=corei7 -mavx and icpc -O3 -xAVX, GCC produces about 15% better performance than ICC.

I'm amazed by the results; probably I'm forgetting some optimization flag for the Intel compiler. Otherwise it's a great advance for GCC.

Best Regards

icpc 14.0 (recently completed beta, release expected in a few weeks) gains optimizations for * __restrict (no longer depending on #pragma ivdep) so as to match performance of g++.

You do need the -ansi-alias option (equivalent to the g++ default -fstrict-aliasing) for icpc to be competitive.  That will not become a default for Linux within the next year (and is not under consideration as a default for Windows).  Except for that, the more aggressive default optimizations of icpc are considered a selling point.  By using the (somewhat complicated) unrolling options in g++ you could gain advantages over icpc in additional cases.
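As a minimal sketch of what the __restrict change means in source code (an assumed example, not from any project discussed here): with restrict-qualified pointers the compiler may assume the destination does not overlap the sources, which is what previously required #pragma ivdep.

// Assumed example: with __restrict the compiler may assume dst does not
// overlap src1/src2, so no runtime overlap test or #pragma ivdep is needed.
void add_scaled(float *__restrict dst,
                const float *__restrict src1,
                const float *__restrict src2,
                int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src1[i] + 2.0f * src2[i];
}
// Assumed compile lines:
//   icpc -O3 -xHost -ansi-alias -c add_scaled.cpp
//   g++  -O3 -march=native      -c add_scaled.cpp   (-fstrict-aliasing is on by default at -O2/-O3)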

>>... I found that with these command lines, g++ -O3 -march=corei7 -mtune=corei7 -mavx and icpc -O3 -xAVX, GCC produces
>>about 15% better performance than ICC...

I simply would like to note that with the following set of command line options:

/O3
/Ob1
/Oi
/Ot
/Oy
/GF
/MT
/GS-
/fp:fast=2
/Zi
/Gd
/Qfp-speculation:fast
/Qopt-matmul
/Qstd=c++0x
/Qrestrict
/QxAVX
/Qunroll:4
/Qopt-block-factor:64
/Qopt-streaming-stores:auto

the Intel C++ compiler outperforms MinGW ( a GCC-like compiler for Windows ), Microsoft and Borland C++ compilers by more than 50%.

However, our statements are very fuzzy, because we do not provide any complete test cases that would let somebody else reproduce the results, and these comparisons will never end as long as both compilers exist.

Sergey, your option set is more complicated than g++'s!

The gcc/g++/gfortran equivalent of /fp:fast=2 is -ffast-math.   These options imply /complex-limited-range (-fcx-limited-range) and have some limitations on the handling of division/sqrt.  They can break many applications (a small example follows below).

gfortran equivalent of /Qopt-matmul is -fexternal-blas (can use MKL, ACML, libblas et al.)

gcc/g++/gfortran equivalent of /Qunroll:4 is -funroll-loops --param max-unroll-times=4

As pointed out earlier, Intel compilers under VS GUI default to /Qipo, which is equivalent to gnu -flto.  This could invalidate benchmarks such as the one I quote below.

I mention these as it's entirely possible to use consistent options to compare the compilers.  You appear to agree with me on the fallacy of comparing with the simplest possible (inconsistent) option settings.
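To illustrate the "can break many applications" caveat for /fp:fast=2 and -ffast-math mentioned above (an assumed example, not anyone's production code): -ffast-math implies -ffinite-math-only, so code that relies on NaN propagation may silently change behavior.

#include <cmath>

// Assumed example: built with g++ -O2 -ffast-math (or a comparable icc
// fast fp model), the compiler may assume no NaNs exist and fold the
// isnan() guard to "false", letting a NaN slip through.
bool safe_mean(const double *x, int n, double *out)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i];
    if (std::isnan(s))      // may be optimized away under -ffast-math
        return false;
    *out = s / n;
    return true;
}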

I extended the benchmarks at https://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors to include Intel(r) Cilk(tm) Plus (not that I expect anyone to be much impressed).

Most cases perform essentially the same with most of the compilers tested.  Claiming some score such as some compiler is 10% better on geometric mean basis isn't very meaningful.  Excluding non-vectorizable cases boosts the relative score of Intel compilers.

>>...Claiming some score such as some compiler is 10% better on geometric mean basis isn't very meaningful...

I agree with that and this is exactly what I wanted to demonstrate:

nedo n. made a claim that GCC outperformed ICC by 15% and did Not provide the set of command line options used to test the performance of the Intel C++ compiler.

Sergey Kostrov made a claim that ICC outperformed MinGW, MSC and BCC by 50% and did Not provide the set of command line options used to test the performance of the MinGW, MSC and BCC C++ compilers.

Also, neither of us provided any details about the test cases or algorithms used to evaluate the performance of all these C++ compilers.

Is there any sense in our statements? I do not think so.

Hi,

I provided minimal options for both compilers: g++ -O3 -march=corei7 -mtune=corei7 -mavx and icpc -O3 -xAVX. I wonder whether the Intel compiler performs worse with these minimal options.

Thank you

Quote:

nedo n. wrote:

I provided minimal options for both compilers: g++ -O3 -march=corei7 -mtune=corei7 -mavx and icpc -O3 -xAVX. I wonder whether the Intel compiler performs worse with these minimal options.

The absence of unrolling options or one which permits vectorized sum reduction will frequently pose a significant handicap for g++, as will the absence of -ansi-alias for icpc.  These evidently would affect different cases.

For g++, the option -march=native could be used to replace all the corei7 and avx options.  There's probably no need to duplicate under -mtune options which are already given in -march.

>>I provided minimal options for both...

This is still Not the right way of evaluating performance, because both C++ compilers have different sets of Default options, and in the case of the GCC compiler these Default options could be more aggressive (!) in terms of optimizations.

Hi Sergey,

please can you provide a set of the best options for both compilers so I can do a fair comparison?

Best Regards


>>... can you provide a set of the best options for both compilers so I can do a fair comparison?

I've already posted a set of compiler options for the Intel C++ compiler, and it is impossible to match these command line options one-to-one in the GCC compiler. I use the following set of command line options for GCC-like compilers ( for Release builds ):

-O3 or -O2
-m32 or -m64
-m[ instruction set ]
-ffast-math
-Wuninitialized
-fomit-frame-pointer
-DNDEBUG
-o
-Xlinker --stack=268435456

It is always a challenge when a comparison of the performance of C++ compilers needs to be done, because all of them have unique features. There are always performance differences and, as I've mentioned many times in the past, if the difference is less than 5% it can be neglected. However, a greater number is always a concern.

PS: It is almost like asking which car is better. You know that there are too many things which need to be taken into account...

I also would like to follow up on the following statement:

>>...But Clang has excellent compile errors and warnings...

In the case of the Intel C++ compiler, use the /W5 and /Wcheck options; if they have never been used on a project ( for example ~100K C/C++ code lines and more ), a developer could be overwhelmed by hundreds and hundreds ( if not thousands ) of warning and diagnostic messages.

My ICC version on Linux is 12.1.5, and -W5 does not work; the documentation only mentions -Wremarks, but it does nothing on my code.

I also have ICC on OS X, version 11.1, and the same compile options as on Linux produce a lot of useful remarks, just like you mention for -W5. I have spent a week cleaning my code ;-(

It is strange that slightly different versions of the same compiler produce very different levels of warnings and remarks.

Anyway, I must say that icc, while being powerful, also requires a lot of "babysitting": with each new release you must study the options, since the defaults for O1, O2, O3 change. I could run gcc, upgrading Linux to new versions, for years without knowing that much about compiler options. On icc, if you want good results, you need to study and try the options. It is especially difficult since the number of options is very high, some cause the compiler to fail or take a very long time to compile, and some produce very big executables. And doing run-time tests is difficult, since a particular set of compile optimizations may benefit your version of the CPU, but not the one your customer is using.

If you're looking for inconsistencies among versions of gcc, they're not difficult to find.   People still use versions whose defaults differ from current gcc.

-fprotect-parens was a default for one major gcc version, regardless of whether -ffast-math was set, but afterwards only gfortran stayed that way.

-finline-functions used to be in effect by default only for -O3, and it used to inline only functions which appeared earlier in the file.

-ftree-vectorize was not always implied by -O3.

-fstrict-aliasing used not to be a default, but has been one for years now.  icc will not make such a change for another year at the earliest, and then it would differ between linux and Windows.

People still get hung up over gcc -m32 defaulting to i486, so the change to icc defaulting to -msse2 some time ago made sense.

>>...On icc, if you want good results, you need to study and try the options. It is especially difficult since the number of
>>options is very high...

Please take a look at a post:

Forum Topic: Evolution of Intel C++ compiler options - v7.1 -> v8.1 -> v12.0 -> v13.0
Web-link: http://software.intel.com/en-us/forums/topic/456342

and I agree that lots of time needs to be spent on learning and testing in order to achieve the best possible results.

Tim,

I agree that gcc also changes the options from release to release; it is normal evolution. But in my opinion icc is more drastic in that respect.

Sergey, thanks for the list, really helpful for someone who uses several compilers. Maybe for MinGW, mention the actual version of the gcc compiler. If you have time, please add clang.

>>...Maybe for MinGW, mention the actual version of the gcc compiler...

It is 3.4.2 and an upgrade to a newer version is scheduled.

Please also verify how options /Wport and /Qeffc++ work.
...
/Wport issue portability diagnostics
/Qeffc++ enable effective C++ diagnostic warnings
...

>>...Please also verify how options /Wport and /Qeffc++ work...

Only 8 issues were detected when I used the /Qeffc++ Intel C++ compiler option. Here is a consolidated list of the warnings:

Warning 2012 - Effective C++ Item 1 prefer const and inline to #define
Warning 2013 - Effective C++ Item 2 prefer iostream to stdio
Warning 2014 - Effective C++ Item 3 prefer new and delete to malloc and free
Warning 2015 - Effective C++ Item 4 prefer C++ style comments
Warning 2017 - Effective C++ Item 6 missing delete of member pointer member "..." in destructor
Warning 2021 - Effective C++ Item 11 declare a copy constructor and assignment operator for "..."
Warning 2022 - Effective C++ Item 12 field member "..." not initialized (preferable to assignment in constructors)
Warning 2027 - Effective C++ Item 15 make sure operator = returns a *this
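As an assumed illustration (not code from the project mentioned above), a class like the following would trip several of these /Qeffc++ items at once:

#include <cstdlib>

#define BUFFER_SIZE 256        // Item 1: prefer const and inline to #define

// Assumed example class that triggers several Effective C++ diagnostics.
class Record
{
public:
    Record() { data = (char *) std::malloc(BUFFER_SIZE); }  // Item 3: prefer new/delete; Item 12: assignment instead of initialization
    ~Record() { }              // Item 6: member pointer "data" is never freed
    // Item 11: no copy constructor or assignment operator declared
private:
    char *data;
    int   count;               // Item 12: not initialized in the constructor
};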

Hi,

I have just bought the latest Intel compiler 14 and one of the Xeon Phi devices: http://ark.intel.com/it/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core

I would like to know what the best optimization options are, since I tried to compile my code and the latest gcc 4.8.2 is still faster than Intel 14.

I read this guide about building code for Xeon Phi: http://software.intel.com/en-us/articles/building-a-native-application-f.... So it seems that with the flag -mmic I can run my code natively on the Xeon Phi. Are there any tips to take into account?

Thank you

Are we to assume that your code which works well with gcc is suitable also for Intel(r) Xeon Phi(tm)?  Then presumably it is vectorizable without difficulty and thread parallel, and the optimization reports should be meaningful, so you can check whether you have specified unrolling effectively (e.g. -unroll2 for Haswell, if that is what you use for gcc).

If you use -fcx-limited-range (implied by gcc -ffast-math) you should use icc -complex-limited-range.

It is possible for automatic memcpy substitutions to be slower than gcc code.  Such substitutions should be reported in opt-report.

If gcc is giving full performance, it may be tough for icc to come out ahead.

Under -mmic, if you use -fp-model source to gain accuracy, you should try setting -ftz -no-prec-div -no-prec-sqrt to retain some performance.  The KMP_PLACE_THREADS environment variable is important, along with OMP_PROC_BIND or equivalent settings.  I assume you're not talking about MIC when you say gcc performs well.
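For what it is worth, a minimal native-build sketch along those lines (the file name, flags, thread counts and environment values below are assumptions to be tuned, not a recommendation specific to your code):

#include <vector>

// Assumed sketch: a thread-parallel, vectorizable kernel built natively for
// the coprocessor with something like
//   icpc -mmic -O3 -fopenmp phi_demo.cpp -o phi_demo
// and run on the card with assumed settings such as
//   export KMP_PLACE_THREADS=60c,4t
//   export OMP_PROC_BIND=true
int main()
{
    const int n = 1 << 24;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

#pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + 2.0f * b[i];   // simple enough to vectorize per thread

    return c[n - 1] > 0.0f ? 0 : 1;
}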
