Limit Performance Impact of Global Symbols on Linux

Global symbols on Linux are preemptible by default: if a global symbol is defined in a shared object, it can be preempted by a symbol of the same name loaded earlier, either in the main executable or in a shared object loaded before the current one.

When object files for shared objects are generated with the compiler option -fpic, the compiler uses the full preemption model: any global symbol, data or function, defined in a shared object can be preempted. The compiler must therefore generate the most expensive relocations, which allow preemption and symbol resolution to occur at load time. On IA-32 architecture processors such as the Intel® Atom™ processor, a reference to global data needs a GOT (Global Offset Table) entry (R_386_GOT32) instead of just an offset relative to the current GOT (R_386_GOTOFF). A function call needs a procedure linkage table (PLT) entry (R_386_PLT32) instead of just a PC-relative relocation (R_386_PC32).
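To make the cost difference concrete, the two access paths can be modeled in plain C. This is only a conceptual sketch: the table and slot below are hypothetical stand-ins, and real GOT and PLT entries are created by the linker and filled in by the dynamic loader, not written by hand.

```c
/* Conceptual model only. All names are hypothetical stand-ins. */

static int dso_data[2] = { 2, 3 };          /* stands in for the DSO's data */

/* Modeled GOT: one pointer per preemptible data symbol. */
static int *model_got[1] = { &dso_data[0] };

/* Preemptible data access (GOT-entry style): load the address from the
   table, then load the value. Two dependent memory loads. */
int load_preemptible(void)
{
    int *addr = model_got[0];
    return *addr;
}

/* Non-preemptible data access (GOT-relative offset style): the address is
   a fixed offset from a known base. One memory load. */
int load_local(void)
{
    return dso_data[1];
}

static int add_two(int a)
{
    return a + 2;
}

/* Modeled PLT slot: an indirect target that the loader could redirect. */
static int (*model_plt_slot)(int) = add_two;

/* Preemptible call: goes through the slot (indirect call). */
int call_preemptible(int a)
{
    return model_plt_slot(a);
}

/* Non-preemptible call: the target is known at link time (direct call). */
int call_local(int a)
{
    return add_two(a);
}
```

Both variants compute the same results. The preemptible ones pay for an extra level of indirection on every reference, which is the overhead that symbol visibility can remove.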

The Intel® C/C++ Compiler for Linux supports a number of ways to improve performance using ELF (Executable and Linkable Format) symbol visibility attributes. With the correct symbol visibility, it is possible not only to improve the generated code but also to limit the number of symbols in the dynamic symbol table, which speeds up load-time symbol resolution.

Suggestion for Dealing With Global Symbols and Shared Objects

Typically, when building a shared library, a number of entry points need to be exported and made available to other components that use the library. There will also be several functions for internal use only that do not need to be available to other components. On Windows this is handled by explicitly requiring that functions exported from a DLL (dynamic-link library) be declared with __declspec(dllexport) and imported with __declspec(dllimport). On Linux the default behavior is to export all global symbols, since they have visibility “default” by default. It is, however, possible to change that. A few ways available through source code and/or compiler options are listed below.

1. Explicitly mark the functions that are to be exported with the visibility attribute “default” in the source code, and then compile the module with -fvisibility=hidden. This gives all other defined symbols visibility “hidden”, meaning that references to them are resolved when the shared object is linked and the hidden symbols do not end up in the dynamic symbol table.

2. It is also possible to explicitly mark all routines that are internal to the shared object as having visibility “hidden” and then compile with the normal options. The first model has the advantage that a macro can expand to __declspec(dllexport) on Windows and to __attribute__ ((visibility ("default"))) on Linux.

3. The desired effect can also be achieved without source changes. The compiler accepts a text file listing symbols, together with an option that indicates what visibility the symbols in the file should have. For example, if all exported symbols are listed in a text file, add the options:

-fvisibility-default=symbol_text_file -fvisibility=hidden

These options apply visibility “default” to all the symbols listed in the symbol text file and visibility “hidden” to all other defined symbols.
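The macro approach mentioned in the second item could look roughly like the sketch below. The macro and function names (MYLIB_EXPORT, MYLIB_LOCAL, mylib_scale, mylib_entry) are hypothetical and chosen only for illustration; the __declspec and __attribute__ spellings are the ones named in the text.

```c
/* Hypothetical macro names for illustration only. */
#if defined(_WIN32)
#  define MYLIB_EXPORT __declspec(dllexport)
#  define MYLIB_LOCAL
#else
#  define MYLIB_EXPORT __attribute__ ((visibility ("default")))
#  define MYLIB_LOCAL  __attribute__ ((visibility ("hidden")))
#endif

/* Internal helper: global within the shared object, but not meant to
   appear in its dynamic symbol table. */
MYLIB_LOCAL int mylib_scale(int a)
{
    return a * 2;
}

/* Exported entry point: stays visible to users of the shared object. */
MYLIB_EXPORT int mylib_entry(int a)
{
    return mylib_scale(a) + 1;
}
```

When the module is compiled with -fvisibility=hidden (the first model), only MYLIB_EXPORT is strictly needed and MYLIB_LOCAL is redundant. When it is compiled with normal options (the second model), MYLIB_LOCAL is what keeps the helper out of the dynamic symbol table.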

Example:

static int global_static    = 1;

int global_export __attribute__ ((visibility ("default"))) = 2;
int global_dso_only  = 3;

static int func_static(int a)
{
   return a + 1;
}

extern int func_export1(int a) __attribute__ ((visibility ("default")));
extern int func_export1(int a)
{
   return a + 2;
}

extern int func_dso_only(int a)
{
   return a + 3;
}

extern int func_export2(void) __attribute__ ((visibility ("default")));
extern int func_export2(void)
{
   return func_static(global_static) +
          func_export1(global_export) +
          func_dso_only(global_dso_only);
}



Compile the module for a shared object and inspect its symbol table:

$ icc -xSSE3_ATOM -fpic -c dso1.c
$ readelf -s dso1.o

Symbol table '.symtab' contains 16 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
…
    12: 00000000     4 OBJECT  GLOBAL DEFAULT    4 global_export
    13: 00000040    16 FUNC    GLOBAL DEFAULT    3 func_export1
    14: 00000004     4 OBJECT  GLOBAL DEFAULT    4 global_dso_only
    15: 00000050    16 FUNC    GLOBAL DEFAULT    3 func_dso_only
…



Notice that all global symbols have visibility “default”. If we build a shared object from this object all symbols with default visibility end up in the dynamic symbol table:

$ icc -shared -o dso1.so dso1.o
$ readelf -s dso1.so

Symbol table '.dynsym' contains 14 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     …
     3: 000016b0     4 OBJECT  GLOBAL DEFAULT   20 global_export
     4: 000004f0    16 FUNC    GLOBAL DEFAULT   10 func_export1
     6: 00000500    16 FUNC    GLOBAL DEFAULT   10 func_dso_only
     9: 000016b4     4 OBJECT  GLOBAL DEFAULT   20 global_dso_only
    11: 000004b0    64 FUNC    GLOBAL DEFAULT   10 func_export2



Looking at the relocations generated for the symbol references in dso1.o we can see that they are the most general and expensive relocations allowing symbol preemption to occur even if it is not desired by the programmer:

$ icc -shared -o dso1.so dso1.o
$ readelf -r dso1.o

Relocation section '.rel.text' at offset 0x27d contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
…
00000014  00000c03 R_386_GOT32       00000000   global_export
0000001c  00000d04 R_386_PLT32       00000040   func_export1
00000024  00000e03 R_386_GOT32       00000004   global_dso_only
00000030  00000f04 R_386_PLT32       00000050   func_dso_only

$ readelf -r dso1.so

Relocation section '.rel.dyn' at offset 0x318 contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
…
00001684  00000306 R_386_GLOB_DAT    000016b0   global_export
00001688  00000906 R_386_GLOB_DAT    000016b4   global_dso_only
…

Relocation section '.rel.plt' at offset 0x350 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
…
000016a0  00000407 R_386_JUMP_SLOT   000004f0   func_export1
000016a4  00000607 R_386_JUMP_SLOT   00000500   func_dso_only
…



Now if we compile the source example with the visibility option -fvisibility=hidden, we can see that the ELF visibility has been set to “hidden” for all defined symbols except the ones declared with the attribute visibility “default”. A visibility specified in the source using the attribute syntax always overrides the command-line option.

$ icc -xSSE3_ATOM -fpic -fvisibility=hidden -c dso1.c
$ readelf -s dso1.o

Symbol table '.symtab' contains 16 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
…
     9: 00000000    16 FUNC    GLOBAL HIDDEN    3 func_dso_only
    13: 00000000     4 OBJECT  GLOBAL DEFAULT   4 global_export
    14: 00000050    16 FUNC    GLOBAL DEFAULT   3 func_export1
    15: 00000004     4 OBJECT  GLOBAL HIDDEN    4 global_dso_only



The symbols with visibility “hidden” no longer appear in the shared object's dynamic symbol table. Since most symbols in a shared object can typically have visibility “hidden”, this reduces the load time of the shared object, improves the code generated for references to these symbols, and allows more optimization involving them.

$ icc -shared -o dso1.so dso1.o
$ readelf -s dso1.so

Symbol table '.dynsym' contains 12 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     …
     4: 00000490    16 FUNC    GLOBAL DEFAULT   10 func_export1
     9: 00000450    64 FUNC    GLOBAL DEFAULT   10 func_export2
     …



The relocations generated will also show the improvements due to use of ELF visibility options.

$ readelf -r dso1.o

Relocation section '.rel.text' at offset 0x27d contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
…
00000022  00000d03 R_386_GOT32       00000000   global_export
0000002a  00000e04 R_386_PLT32       00000050   func_export1
00000030  00000f09 R_386_GOTOFF      00000004   global_dso_only

$ readelf -r dso1.so

Relocation section '.rel.dyn' at offset 0x2cc contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
...
00001614  00000306 R_386_GLOB_DAT    00001638   global_export

Relocation section '.rel.plt' at offset 0x2fc contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
…
0000162c  00000407 R_386_JUMP_SLOT   00000490   func_export1



As this very simplified example shows, performance improvements can be achieved through proper symbol management on Linux. In cases where thousands of entries were removed from the shared object's dynamic symbol table, improvements of up to 20% have been seen, and even smaller cases can see improvements of 5-7%. The Intel® Atom™ processor is affected more than Intel® 64 processors because of its in-order execution. Note also that it may be an option not to use -fpic for IA-32 shared objects; however, the code will then be copied for each use rather than shared by all applications loading the shared object. This is possible only for IA-32 shared objects, for historical compatibility reasons.


Mike Chynoweth discusses how "short functions" and "zero length calls" can affect performance on Atom and how the issue gets worse with the use of -fpic and support for symbol preemption.

An earlier white paper describes use of ELF visibility attributes for the Intel® Itanium processor family.

For more complete information about compiler optimizations, see our Optimization Notice.