GCC x86 Code Size Optimizations

By Evgeny V Stupachenko, Published: 01/17/2013, Last Updated: 01/17/2013

      The days when programmers did their best to minimize application code size are gone. The root cause is the significant growth of memory and hard drive sizes on PCs over the past several years. The only exceptions are programmers writing code for embedded systems, who usually get tasks like: “You need to develop a particular algorithm. Your program should fit in N bytes and use no more than N bytes of memory”. Today, phones and tablets are taking programmers “back to the future”.

      This article should help GCC programmers minimize application code size. Please note that all data in the article was collected using the x86 GCC compiler (version 4.7.2) on Fedora 17, targeting the Intel® Atom™ architecture.

      What is the default GCC behavior?

      By default GCC produces dynamically linked binaries. Since statically linked binaries are much bigger, this is a great advantage for the code size of GCC-compiled applications everywhere. The size of this advantage strongly depends on how many libraries are used.

      The GCC option most frequently used when code size matters is “-Os”. The table below shows code size geometric means for a set of applications common on phones and tablets.

      The results in the table are relative to “-Os”. Smaller is better (less code size). “-m32, -mfpmath=sse, -march=atom” are assumed to be turned on.

Options                        Code size vs. “-Os”
-O2                            6%
-O2 -flto                      -5%
-Ofast                         11.5%
-Ofast -flto                   3%
-Ofast -funroll-loops          19%
-Ofast -funroll-loops -flto    10.5%
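
      For readers who want to produce figures of this kind for their own applications, here is a minimal sketch in C of how a relative code size could be computed as a geometric mean of per-application size ratios. The function names and the choice of byte counts as input are assumptions made for illustration; the article does not describe the exact measurement procedure.

```c
/* Relative code size as a geometric mean of per-application ratios.
 * All names here are hypothetical; this is a sketch, not the script
 * that produced the table. */

/* n-th root of x (x > 0) by bisection, to avoid depending on libm. */
static double nth_root(double x, int n)
{
    double lo = 0.0;
    double hi = x > 1.0 ? x : 1.0;
    int iter, k;

    for (iter = 0; iter < 200; iter++) {
        double mid = (lo + hi) / 2.0;
        double power = 1.0;
        for (k = 0; k < n; k++)
            power *= mid;
        if (power < x)
            lo = mid;
        else
            hi = mid;
    }
    return (lo + hi) / 2.0;
}

/* Returns (geomean(size[i] / baseline[i]) - 1) * 100, i.e. the change
 * in percent relative to the baseline build. Negative means smaller. */
double relative_size_percent(const long *size, const long *baseline, int count)
{
    double product = 1.0;
    int i;

    for (i = 0; i < count; i++)
        product *= (double)size[i] / (double)baseline[i];
    return (nth_root(product, count) - 1.0) * 100.0;
}
```

      As a sanity check, two applications that each shrink from 100 to 90 bytes give -10%. Note that the geometric mean of the ratios equals the ratio of the geometric means, so both readings of "geometric means relative to -Os" lead to the same number.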

      “-Ofast” (or “-O3”) and “-funroll-loops” obviously increase code size. “-flto”, which makes inlining more aggressive, should increase code size as well. Why, then, does it show the opposite result?

      The root cause is the removal of redundant functions. A function can become redundant because it is unused in the current application configuration or because it has been fully inlined. “-ffunction-sections -Wl,--gc-sections” is an alternative way to delete redundant functions. This technique should help most when you link against internal static libraries.
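
      To make the garbage collection idea concrete, here is a small sketch of a source file with one function that is used and one that is not. The file name, the function names, and the assumption that a separate main (not shown) calls only scale_sample are all hypothetical.

```c
/* gc_demo.c: hypothetical file for illustration, not from the article.
 *
 * Assumed build, with a main.c (not shown) that calls only scale_sample():
 *   gcc -Os -ffunction-sections -Wl,--gc-sections main.c gc_demo.c -o app
 */

/* Referenced from the rest of the program, so the linker keeps it. */
int scale_sample(int value)
{
    return value * 2;
}

/* Never referenced. With -ffunction-sections it is emitted into its own
 * section (typically .text.unused_checksum on ELF targets), which
 * -Wl,--gc-sections can then discard at link time. */
int unused_checksum(const unsigned char *data, int length)
{
    int sum = 0;
    int i;

    for (i = 0; i < length; i++)
        sum += data[i];
    return sum;
}
```

      With GNU ld, adding “-Wl,--print-gc-sections” to the link line lists the sections that were removed, which is a convenient way to check whether a function such as unused_checksum was actually dropped.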

      Is your application still too big? Let’s try some other ways to minimize its code size. By default GCC enables “-fasynchronous-unwind-tables”, which results in an extended EH (exception handling) section even when compiling applications written in C. This certainly makes debugging easier, but on the other hand it adds some kilograms (sorry, kilobytes) to the application's weight. Adding “-fno-asynchronous-unwind-tables” to the compilation options shrinks the application.
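
      One way to see the cost of the unwind tables is to compile the same file with and without the option and compare section sizes. The sketch below assumes a hypothetical file unwind_demo.c; the object file names are illustrative, and the actual difference depends on the compiler defaults of your distribution.

```c
/* unwind_demo.c: hypothetical file used only to compare section sizes.
 *
 * Assumed comparison:
 *   gcc -Os -c unwind_demo.c -o with_tables.o
 *   gcc -Os -fno-asynchronous-unwind-tables -c unwind_demo.c -o without_tables.o
 *   size -A with_tables.o without_tables.o
 *
 * When unwind tables are generated, each function below normally
 * contributes an entry to the .eh_frame section.
 */

/* Clamps a value to the range 0..255. */
int clamp_to_byte(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}

/* Midpoint of a and b, rounded toward a. */
int average_of_two(int a, int b)
{
    return a + (b - a) / 2;
}
```

      In the “size -A” listing, look at the .eh_frame line: it is the part of the object file that the option is expected to shrink or remove.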

      What else can GCC do to decrease code size? “-Wl,--strip-all” forces the linker to remove all symbol information. This makes debugging very complicated, but not impossible. If you care about release code size: “strip all”!

      Below is a summary table showing the effect of adding:

    1. “-ffunction-sections -Wl,--gc-sections” (garbage collect)
    2. “-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables” (+ no unwind)
    3. “-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables -Wl,--strip-all” (+ strip all)

      to different optimization options.

      The results in the table use the same set of applications as the previous table. Each value is the ratio of the code size geometric mean for the given option set to that of default “-Os”. Smaller is better (less code size). “-m32, -mfpmath=sse, -march=atom” are assumed to be turned on.

Options                        default     + garbage collect    + no unwind    + strip all
-Os                            baseline    -5%                  -10.5%         -22.5%
-O2                            6%          0.5%                 -3.5%          -13.5%
-O2 -flto                      -5%         -5%                  -8%            -17%
-Ofast                         11.5%       6%                   2%             -6.5%
-Ofast -flto                   3%          2.5%                 0.5%           -6.5%
-Ofast -funroll-loops          19%         12.5%                9.5%           3%
-Ofast -funroll-loops -flto    10.5%       10%                  8.5%           2.5%

      Below is a short summary of the GCC compiler options used. You can find the full list of options and their descriptions at http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html

  • "-Ofast" same as "-O3 -ffast-math"; enables high-level optimizations and aggressive optimizations of arithmetic calculations (like floating point reassociation)
  • "-flto" enables link-time optimizations
  • "-m32" generates code for a 32-bit environment
  • "-mfpmath=sse" enables use of XMM registers in floating point instructions (instead of the register stack in x87 mode)
  • "-funroll-loops" enables loop unrolling
  • "-ffunction-sections" places each function into its own section in the output file
  • "-Os" optimizes for size
  • "-fno-asynchronous-unwind-tables" disables generation of unwind tables that are exact at each instruction boundary

      Below is a short summary of the linker options used. You can find the full list of options and their descriptions at http://sourceware.org/binutils/docs/ld/Options.html

  • “--gc-sections” enables garbage collection of unused input sections
  • “--strip-all” omits all symbol information from the output file
