“Why should I update GCC x86 compiler?” or “GCC compiler performance on Intel® Atom™ from version to version”

      In this post I look at what each new GCC version brings for the Intel® Atom™ architecture and how that affects performance and code size on the well-known EEMBC CoreMark* benchmark: www.coremark.org

      The chart below shows CoreMark performance results for base and peak option sets on various GCC versions relative to GCC 4.4.6 base performance (higher is better):


base: “-O2 -ffast-math -mfpmath=sse -m32 -march=atom”

base + if conversion: “-O2 -ffast-math -mfpmath=sse -ftree-loop-if-convert -m32 -march=atom”

peak: “-Ofast -funroll-loops -mfpmath=sse -m32 -march=atom”; for versions 4.4 and 4.5, “-Ofast” is replaced with “-O3 -ffast-math”

See: http://software.intel.com/en-us/blogs/2012/09/26/gcc-x86-performance-hints for details on the performance options. One of the peak performance options, “-flto”, delivers no extra performance on CoreMark.

Here we can see that the base option set with “-ftree-loop-if-convert” reached peak performance on CoreMark.
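To make "if conversion" concrete, here is a minimal sketch of the kind of loop the pass targets. It is not taken from CoreMark, and the function names are hypothetical. The first function has a branch inside the loop body; the second is a hand-written equivalent in which the condition only selects a value, which is roughly the shape the compiler aims for. Whether GCC actually emits branch-free code for either form depends on the version, target and options.

```c
#include <stddef.h>

/* Branchy form: the accumulator update sits behind a conditional jump. */
int sum_above_branchy(const int *data, size_t n, int threshold)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (data[i] > threshold)
            sum += data[i];
    }
    return sum;
}

/* Hand-written "if-converted" form: the condition selects the addend,
 * so the loop body is straight-line code with no control flow inside. */
int sum_above_selected(const int *data, size_t n, int threshold)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        int addend = (data[i] > threshold) ? data[i] : 0;
        sum += addend;
    }
    return sum;
}
```

Both functions return the same result for any input; comparing the assembly generated for the first one with and without “-ftree-loop-if-convert” is one way to see what the option changes on a given GCC version.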


      The chart below shows peak to base binary code size ratio on CoreMark for different GCC versions:

      The chart below shows binary code size increase on CoreMark for base option set relative to GCC 4.4.6 base option set:

“-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables -Wl,--strip-all” were added to the base and peak option sets to get the numbers in the chart. These options do not affect performance on CoreMark.

See: http://software.intel.com/en-us/blogs/2013/01/17/x86-gcc-code-size-optimizations for details.

Here we can see that code size with the peak option set is ~2 times larger than with base and keeps growing, while code size with the base option set stays stable or improves slightly.

All measurements were made for a single-thread run on Fedora 17 on an Intel® Atom™ CPU D525, 1.80 GHz, 4 GB memory, 2 cores.
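The relative numbers in the charts are plain ratios against the GCC 4.4.6 base result. A minimal sketch of that arithmetic, assuming raw scores where higher is better (for CoreMark, iterations per second); the helper names are hypothetical and no measured values are included:

```c
/* Ratio of a score to the baseline score; a value above 1.0 means
 * the measured configuration is faster than the baseline. */
double relative_score(double score, double baseline_score)
{
    return score / baseline_score;
}

/* The same comparison expressed as a percentage gain over the baseline;
 * a negative result means a slowdown. */
double percent_gain(double score, double baseline_score)
{
    return (score / baseline_score - 1.0) * 100.0;
}
```

The same ratio works for the code size charts if binary sizes are passed in instead of scores, with the difference that for size a smaller value is better.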

      GCC made very good progress from version 4.4 to 4.8 (mostly from 4.6 to 4.7, and from if conversion on the base option set in 4.8). Code size on the base option set is unchanged; on peak it keeps growing.

Below is a short summary of how the optimizations in each version influence CoreMark:

  • GCC 4.5 is the first version to introduce “-march=atom” (see http://gcc.gnu.org/gcc-4.5/changes.html). GCC 4.4 is shown here only as a reference point; CoreMark for this version was built with “-march=i686 -mtune=generic -mssse3”. Most current Unix systems use gcc-4.4 or newer. Note that some gcc-4.4 builds may have the “-march=atom” option backported from 4.5, for example the Android NDK gcc-4.4.
  • GCC 4.6 introduces a much better inlining algorithm and a new opportunity to improve CoreMark performance: “-ftree-loop-if-convert”, which is enabled by default at “-O3” (“-Ofast”) and gives ~8% on the “base” option set. Official changes: http://gcc.gnu.org/gcc-4.6/changes.html
  • In GCC 4.7, “-march=atom” gets LEA and IMUL tuning as well as other Atom™ architecture-specific improvements. IMUL tuning means grouping IMUL instructions together, because the architecture has to switch into a special mode to calculate IMUL (fixed in the latest Atom™ processor, Silvermont). LEA tuning means replacing LEA with moves and adds when the LEA result goes to the ALU (also fixed in Silvermont). Official changes: http://gcc.gnu.org/gcc-4.7/changes.html
  • GCC 4.8 improves bool optimizations, which results in less register pressure for some functions in CoreMark (this affects only the base option set with “-ftree-loop-if-convert” turned on). Version 4.8 also makes it possible to lower scheduler pressure with “-fschedule-insns -fsched-pressure” at a high level of stability on x86 (this gives ~1% for CoreMark on the peak option set). In general, “-fschedule-insns -fsched-pressure” adds performance when the “-funroll-loops” option is set. Official changes: http://gcc.gnu.org/gcc-4.8/changes.html
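Because some gcc-4.4 builds carry a backported “-march=atom” and others do not, it can help to check what a given toolchain actually enabled for a translation unit. The sketch below relies on macros that GCC predefines (`__GNUC__`, `__GNUC_MINOR__`, `__atom__`, `__SSSE3__`, `__FAST_MATH__`); the function name and the output format are hypothetical, and other compilers may define some of these macros as well.

```c
#include <stdio.h>

/* Compiler version as reported by the predefined macros.
 * Note: compilers other than GCC may also define __GNUC__. */
#if defined(__GNUC__)
#define BUILD_GCC_MAJOR __GNUC__
#define BUILD_GCC_MINOR __GNUC_MINOR__
#else
#define BUILD_GCC_MAJOR 0
#define BUILD_GCC_MINOR 0
#endif

/* __atom__ is defined when the code is compiled with -march=atom. */
#if defined(__atom__)
#define BUILD_MARCH_ATOM "yes"
#else
#define BUILD_MARCH_ATOM "no"
#endif

/* __SSSE3__ is defined when SSSE3 code generation is enabled,
 * either by -mssse3 or implied by the selected -march. */
#if defined(__SSSE3__)
#define BUILD_SSSE3 "yes"
#else
#define BUILD_SSSE3 "no"
#endif

/* __FAST_MATH__ is defined when -ffast-math is in effect. */
#if defined(__FAST_MATH__)
#define BUILD_FAST_MATH "yes"
#else
#define BUILD_FAST_MATH "no"
#endif

/* Writes a one-line summary of the build settings into buf and
 * returns the value returned by snprintf. */
int describe_build(char *buf, size_t size)
{
    return snprintf(buf, size,
                    "gcc=%d.%d march=atom:%s ssse3:%s fast-math:%s",
                    BUILD_GCC_MAJOR, BUILD_GCC_MINOR,
                    BUILD_MARCH_ATOM, BUILD_SSSE3, BUILD_FAST_MATH);
}
```

Calling `describe_build` from a small `main` and printing the buffer, once per option set, shows whether “-march=atom” was really accepted or whether the fallback “-march=i686 -mtune=generic -mssse3” is in use.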

      What if “-march=atom” in GCC 4.8 were just “-march=i686 -mtune=generic -mssse3”? Performance would drop by ~5%. “-ftree-loop-if-convert” would yield an additional 13% at “base”. That is another reason to switch to a newer version of GCC.

      So if you want to tune Atom™ application performance and care about code size, try GCC 4.8 with:

“-O2 -ffast-math -mfpmath=sse -ftree-loop-if-convert -fschedule-insns -fsched-pressure -m32 -march=atom”

      If code size is not critical, use GCC 4.8 and:

“-Ofast -flto -funroll-loops -mfpmath=sse -fschedule-insns -fsched-pressure -m32 -march=atom”