Software Convention Models Using ELF Visibility Attributes

Last Modified On :   October 17, 2008 9:21 AM PDT
Knud J. Kirkegaard, Compiler Development Engineer, Intel Corporation
Paul Winalski, Compiler Development Engineer, Intel Corporation
David C. Sehr, Compiler Architect, Intel Corporation


Introduction

Many applications developed for Linux* suffer performance degradation from a little-known and even less frequently used feature. That feature is symbol preemption, which is used by some developers of shared objects. Linux implements the Executable and Linking Format (ELF) object file format, which provides options to control the impact of symbol preemption. This paper describes the use of compiler options for the Intel® compiler and GCC* that enable full use of the ELF symbol visibility features. This common set of compiler directives and command-line options was developed in collaboration with the GCC team at Red Hat and will be implemented in GCC from version 3.5. Our experience with several large applications running on the Intel® Itanium® Processor Family has shown that very significant performance gains can be had with relatively little change to the customer application and build environment. What follows consists of four sections. The first discusses the class of applications that might encounter the overhead from preemption. The second describes the user model and options used to control preemption on a symbol-by-symbol basis. The third section presents an example and describes how performance is improved. The final section presents some conclusions.

Definitions

At run time, an application consists of one or more files that are mapped into a process’ address space by the runtime loader. Each distinct file is called a component of the application. There are two types of components. The main program component is the file that is loaded first; an application always has exactly one. The other components, if any, are called shareable objects. As the name implies, a shareable object may be a component of more than one program. An example is libc.so, which is the shareable object version of the C run-time library.

A symbol is a name that represents a numeric value defined in an object file or a component file. Symbols typically represent the addresses of data items or routines. The linker (ld) is the program that builds components from object files produced by a compiler or by the assembler. One of the linker’s main jobs is to resolve symbolic references between the object files that comprise a component. References to symbols in other components are resolved at execution time by the runtime loader (ld.so).

One key feature of symbol resolution on Linux is symbol preemption. By default, all global symbols in a component are visible to all other components. When the runtime loader loads a component that defines a symbol already defined in a previously-loaded component, the definition in the new component is overridden (preempted) by the existing definition. The runtime loader re-binds references to the symbol in the new component to refer to the existing definition. Thus, if the runtime loader is loading component x.so that defines a routine foo(), and a previously-loaded component of the application has already defined foo(), calls to foo() in x.so are modified to call the existing definition of foo(), not the one in x.so. Note that symbols defined in the main program of an application cannot be preempted, since the main program is always loaded first.
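The first-loaded-wins rule can be sketched as a small model in C. This is only an illustration of the lookup order described above, not how ld.so is implemented; the struct and function names (definition, lookup, load_definition) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one symbol definition:
 * the symbol name and the component that defines it. */
struct definition {
    const char *name;
    const char *component;
};

#define MAX_SYMBOLS 16

/* Global symbol scope as the loader sees it, in load order. */
static struct definition global_scope[MAX_SYMBOLS];
static size_t global_count = 0;

/* Return the earliest-loaded definition of a name, or NULL if none. */
const struct definition *lookup(const char *name)
{
    for (size_t i = 0; i < global_count; i++) {
        if (strcmp(global_scope[i].name, name) == 0)
            return &global_scope[i];
    }
    return NULL;
}

/* "Load" one definition from a component. If the name is already
 * defined, the existing definition preempts the new one and is the
 * one every reference is bound to. */
const struct definition *load_definition(const char *name,
                                         const char *component)
{
    const struct definition *existing = lookup(name);
    if (existing != NULL)
        return existing;   /* the new definition is preempted */

    assert(global_count < MAX_SYMBOLS);
    global_scope[global_count].name = name;
    global_scope[global_count].component = component;
    return &global_scope[global_count++];
}
```

In this model, loading foo from the main program and then from x.so leaves every reference to foo bound to the main program's definition.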

Because of symbol preemption, the final value of a global symbol might not be known until run time. This inhibits many useful code optimizations. Fortunately, as we shall see, there are several techniques available to avoid the performance penalties that symbol preemption imposes.

The global offset table (GOT) is a data structure that contains a list of addresses of symbols in a component. All references to symbols that are preemptable must be made indirectly, by first loading the symbol’s address from the GOT. This allows the runtime loader to preempt all of the component’s references to a symbol simply by changing the value in the symbol’s GOT entry.
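As a rough illustration of why the indirection makes preemption cheap for the loader, the sketch below models a one-entry GOT as an array of pointers in C. The names (library_got, got_load, preempt_data) are hypothetical, and a real GOT is laid out by the linker rather than written by hand.

```c
#include <assert.h>

/* Two competing definitions of the same global data item. */
static int data_in_main = 1;     /* definition in the main program */
static int data_in_library = 2;  /* definition in the shareable object */

/* Hypothetical one-entry GOT belonging to the shareable object. */
enum { GOT_DATA = 0, GOT_SIZE = 1 };
static int *library_got[GOT_SIZE] = { &data_in_library };

/* First load: fetch the symbol's address from its GOT entry. */
static int *got_load(int slot)
{
    assert(slot >= 0 && slot < GOT_SIZE);
    return library_got[slot];
}

/* Every reference in the library goes through the GOT entry. */
int library_read_data(void)
{
    return *got_load(GOT_DATA);   /* second load: the value itself */
}

int library_read_data_twice(void)
{
    return *got_load(GOT_DATA) + *got_load(GOT_DATA);
}

/* What preemption amounts to in this model: patch one GOT entry,
 * and all references in the library follow it. */
void preempt_data(void)
{
    library_got[GOT_DATA] = &data_in_main;
}
```

After preempt_data() runs, both reader functions see the main program's value without any of their code being modified.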


Analysis to Identify Performance Opportunities

In general, applications that frequently call global functions or reference statically allocated global data that never need to be preempted are the most heavily impacted by preemption overheads. To diagnose whether your application is incurring overheads due to preemption, begin by examining the hottest functions in the application, as determined by gprof or a similar execution time profiler. The hottest functions typically contain the largest opportunities for improvement. Once the hottest functions have been identified, inspect them to determine whether they perform large numbers of calls to functions that should have been marked as non-preemptable, or large numbers of direct references to global data items that should have been marked non-preemptable. A direct reference is a reference without any *'s in C or C++, and a global data item is a data item declared outside any function without using the "static" keyword.
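To make the terminology concrete, here is a minimal C sketch of the kinds of reference to look for in a hot function. All names are hypothetical, and whether a given reference actually goes through the GOT depends on the compiler, the target, and the options in use.

```c
#include <assert.h>

int request_count = 0;      /* global data item: preemptable by default */
static int cache_hits = 0;  /* declared static: not a global data item */

/* Direct reference to a global data item. In position independent
 * code with default visibility, this is the kind of access that may
 * be routed through the GOT. */
void count_request(void)
{
    request_count++;
}

/* Reference to file-local data: the symbol cannot be preempted. */
void count_hit(void)
{
    cache_hits++;
}

/* Not a direct reference: the caller supplies the address, so the
 * function itself names no global symbol. */
void count_through_pointer(int *counter)
{
    assert(counter != 0);
    (*counter)++;
}

int total_events(void)
{
    return request_count + cache_hits;
}
```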

A direct reference to a preemptable global data item requires 2 levels of indirection to access the data value. First the address of the item's global offset table entry must be computed, then the pointer to the global object is loaded from the global offset table, and finally the object value itself is loaded. In addition to increasing code size, the 2 levels of indirection also increase the pressure on the data caches. By creating assembly listings for hot functions, using gcc or the Intel compiler with the -S option, one can get an idea of the number of references through the global offset table by looking for the ltoff relocations. Through the use of the software models described in the following section it may be possible to reduce code size and improve data cache behavior.

Position independent code is a requirement for a symbol to be preemptable. However, once it has been determined that a symbol will not be preempted, it may also be possible to use position dependent code to reference the symbol if it will be linked into the main executable. It is of course not possible to use position dependent code for any symbol in an object that will be used in a shared object. To analyze the effect of marking symbols as non-preemptable, and possibly as referenced by position dependent code, one should see the static number of ltoff references decrease in assembly and disassembly listings and the number of gprel and movl references increase.

The overhead of calling a preemptable function is twofold: the global domain pointer must be saved and restored across the call, and the linker must resolve the relocation in such a way that preemption can occur. By marking a global function as non-preemptable we know it will be bound within the same global domain, and therefore it is no longer necessary to save and restore the global domain pointer across the call site to the function. The linker also benefits from knowing that a global function is not preemptable, as it may bind the symbol at link time instead of generating an import stub for the call site that has to be resolved at load time. On the other hand, if it is known that a call to a global function will always go to another shared object, the software model options described below make it possible to mark the symbol as always resolved from another component, so that the import stub can be inlined for better code locality.

The benefit of identifying symbols that don’t need to be preemptable is not only more efficient references to global data and less call overhead; it is also protection against accidentally preempting a symbol that wasn’t intended to be preempted in the first place. For example, when building a shared object from several object files, it may be necessary for the linker to resolve references between those object files, even though the symbols need not be visible outside the shared object. In order for the linker to resolve symbol references between object files the symbols must be global. Making the symbols non-preemptable by, for example, setting the hidden ELF visibility attribute on them not only prevents the symbols from being accidentally preempted but also prevents any other component from linking against them. We recommend that during development of new code the visibility attributes of global symbols are carefully considered and set explicitly on the symbol declarations in header files. This will enable better code generation as well as protection against accidental preemption.


User Models/Options

On Linux* the default application model and the generic ELF ABI require the compiler to generate position independent code and, on Intel Itanium®, to ensure that symbol preemption is allowed for global symbols. Relatively few applications take advantage of position independence or symbol preemption, and support for these features can cause significant run-time overhead. Therefore several performance opportunities are available to applications that don't require position independent code or the default symbol preemption model as described in the generic ELF ABI. We have worked on developing safe software convention models for Linux together with the Linux community and other industry leaders. The models rely on the ELF visibility attributes and the user providing input to tell the compiler that position independent code is not required for the main executable.

By default, global variables must be addressed indirectly, through the global offset table (GOT). This introduces an extra level of indirection to load a global scalar variable. In the example below there is a load of a 4 byte object named data:

add r3 = @ltoff(data),gp
ld8 r2 = [r3]
ld4 r8 = [r2]

 

The add instruction obtains the address of data’s GOT entry using its offset from the component’s global pointer (gp) address. The linker, which is responsible for allocating address slots in the GOT, fills in this offset in response to the ltoff (literal offset) relocation referencing the symbol data. The ld8 instruction fetches data’s address from the GOT, and the ld4 instruction actually loads data’s value.

The Intel Itanium ABI states that small data items such as data must be allocated in a special short data (.sdata) section that is directly addressable as an offset from gp. However, the compiler cannot take advantage of this, as that would violate symbol preemption rules. The object referred to by the symbol data may change at load time as a result of symbol preemption, and may end up at a location that cannot be directly addressed from the component’s gp value. However, using the ELF visibility attributes we can tell the compiler and linker that a symbol cannot be preempted and therefore can be bound at link time, thus allowing the use of gp-relative addressing for the symbol. The default ELF visibility is default, meaning that the symbol can be preempted. If the ELF visibility of the global symbol is made protected or hidden, the compiler can assume that the symbol will be bound at link time. Item data can now be accessed as an offset from the component’s gp:

add r2 = @gprel(data),gp
ld4 r8 = [r2]

 

If the object is compiled for the main executable component, and therefore will not be used in a dynamic shared library, it is possible to tell the compiler that the object being compiled does not require position independent code. The access generated for global symbols with ELF visibility other than default can now use an absolute address.

movl r2 = ltconst(data)
ld4 r8 = [r2]

 

Another benefit of using the non-default ELF visibility attributes is that calls to function symbols that are protected or hidden do not need to save and restore the gp value. Since the call will be bound at link time and cannot be preempted, the callee’s gp value is guaranteed to be the same as the caller’s.

Preemption Models

With some of the optimization opportunities in mind we can now enumerate the software or preemption models that have been defined and are supported on Linux.

  • Full Preemption
    • This is the default application model as defined by the ELF ABI
    • Specified on the compiler command line by use of the option -fpic
    • Requires position independent code
    • Global symbols with default visibility may be replaced at load time by an already loaded symbol with the same name (symbol preemption)
    • This model is required for shared objects but may be used for the main executable as well
  • Module Preemption
    • The default model on Linux (no command line option required)
    • Defined global symbols are treated as if they have visibility protected or hidden and may be bound at link time
    • Requires position independent code
    • This model may be used for either component, however, defined symbols that are considered bound at link time must have the appropriate visibility attribute set, either protected or hidden.
  • No Preemption
    • All symbols can be bound at link time
    • This model exists for both position independent code and position dependent code.
    • This model can only be used for shared objects when position independent code is used.

The table below shows how to achieve the different models for gcc and the Intel® C++ Compiler:

Model / Compiler | Full Preemption (PIC only) | Module Preemption (PIC only) | No Preemption (PIC) | No Preemption (No PIC)
gcc | -fpic | Default | -fvisibility=protected or -fvisibility=hidden | n/a
Intel C++ Compiler | -fpic | Default | -fvisibility=protected or -fvisibility=hidden | -fminshared and one of -fvisibility=protected or -fvisibility=hidden

 

We can also mark symbols that can be guaranteed to be resolved to a different component and therefore will require a function import stub. Since it is known at compile time that the stub will be necessary, we can make the compiler generate the stub inline (rather than have the linker generate it) and improve code locality. A new symbol attribute extern has been added to tell the compiler to inline the import stub.

It is possible to set the visibility attribute for individual symbols by using the gcc attribute syntax for the declaration of a global symbol.

int data __attribute__((__visibility__("hidden")));
void func() __attribute__ ((visibility("protected")));

 

or

int __attribute__((__visibility__("hidden"))) data;
void __attribute__ ((visibility("protected"))) func();
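When many declarations in a header should share the same visibility, GCC and compatible compilers also accept a pragma that applies a visibility to every declaration between a push and a pop. The sketch below assumes the compiler in use supports #pragma GCC visibility; the lib_ names are hypothetical and only illustrate a component with one exported routine and two hidden internals.

```c
#include <assert.h>

/* Interface of the component: left at default visibility. */
int lib_area(int width, int height);

/* Internal declarations: everything between push and pop is hidden,
 * as if each declaration carried visibility("hidden"). */
#pragma GCC visibility push(hidden)
extern int lib_scale_factor;
int lib_scale(int value);
#pragma GCC visibility pop

/* The definitions inherit the visibility of the declarations above. */
int lib_scale_factor = 3;

int lib_scale(int value)
{
    return value * lib_scale_factor;
}

int lib_area(int width, int height)
{
    assert(width >= 0 && height >= 0);
    /* Calls to hidden routines are bound within the component. */
    return lib_scale(width) * lib_scale(height);
}
```

Grouping internal declarations this way keeps the visibility decision in the header, next to the declarations, which is where the earlier recommendation suggests putting it.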

 

The software convention models described have been proposed and discussed with the Linux community and other industry leaders to achieve widespread acceptance. The intent is to maintain binary compatibility by using existing ELF ABI symbol visibility attributes as well as compile time options that can change the default software convention used. The intent is that the linker will report errors if an attempt is made to link objects that have incompatible visibility attributes on symbols. Any problems we encountered during development of the software models with the linker not detecting violations in the use of visibility attributes should be fixed by ld version 2.14.90.0.3.


Examples

Three code examples are provided.

The first example illustrates symbol preemption. Invoke it with the command:

The key points to notice here are that although both the main program and the shareable object define and call routine dr1 (main_program.c and sharable_object.c), due to symbol preemption all calls invoke the version in the main program. The main program’s version of dr1 (main_program.c) has preempted the one in the shareable object. Similarly, both components define data item d1, but the value seen is always the one from the main program. However, the two components maintain their own respective values for routine pr1 and data item p1 because these are declared with protected visibility. See the comments in the source code for more details.

The second example illustrates some of the optimization opportunities enabled by declaring items with non-default visibility (vis_opt.c). Invoke it with the command:

sh -v build_vis.sh

 

This script produces assembly file build_vis.s, which you can examine to see the differences in the generated code with different visibility attributes. The example illustrates use of gp-relative addressing for non-preemptable short data items, inlining of non-preemptable routines, avoiding unnecessary saves and restores of gp when calling routines known to be in the same component as the caller, and inline generation of import stubs for routine calls known to be cross-component. See the comments in the source code for more details.

The third example illustrates how to use -fminshared and other command line options to change symbol visibility (prot1.txt, prot2.txt, user_models.c and def.txt), and thus the generated code. Invoke it with the command:

sh -v build_user_models.sh

 

The example produces a series of assembly files. See the comments in the source code and in the script for more details.


Conclusion

Many applications developed for Linux suffer performance degradation from symbol preemption, which is used by some developers of shared objects. We have described the problem, characterized the sort of applications most affected by it, and proposed the use of compiler options for the Intel compiler and GCC that reduce these performance problems by exploiting ELF symbol visibility features. Lastly, we have presented examples that illustrate the problem and show how a developer can observe, by hand examination, the improvements from the compiler options.

 






About the Authors

David C. Sehr is Compiler Architect in the Intel Compiler Laboratory. He obtained his B.S. in Physics and Mathematics from Butler University, and his M.S. and Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. He has been a member of the compiler team at Intel since 1992, working to develop loop transformations, interprocedural optimizations, and profile-guided optimizations. He has also managed the scalar optimizer and code generator teams of the compiler for the Itanium Processor Family. He is currently the lead of the Compiler Architecture and Advanced Development team in the Intel Compiler Laboratory.

Paul Winalski is a Senior Software Engineer at Intel’s Nashua Software Laboratory, where he is a member of the scalar optimizer team for Intel’s compilers. Before joining Intel, Paul worked for over twenty years at Digital Equipment Corporation and Compaq Computer Corporation developing software tools and compilers. He was the team leader responsible for object file generation for Digital’s GEM compiler back end. Paul has an A. B. with honors in Biology from the College of the Holy Cross, and studied Computer Science at Worcester Polytechnic Institute.

Knud J. Kirkegaard is a Principal Engineer in the Intel Compiler Laboratory. He has a M.S. in Information and Control Systems from the University of Aalborg, Denmark. Before joining the Intel Compiler Lab in 1991 he worked on Ada compilers for 5 years at DDC International A/S, and 3 years for EDP Consulting on Office Automation. Since he joined Intel he has worked on scalar optimizations, interprocedural optimizations, and profile guided optimizations for IA-32, EM64T, and Itanium processor families.

 


Appendix: ELF Implementation of Visibility

Linux uses the Executable and Linking Format (ELF) for both object files and loadable components (executable main programs and shareable objects). The symbol table section of an ELF file contains the symbolic names that the linker uses to bind object files together into a component and that the run-time loader uses to bind the components of an executing program. An ELF symbol table entry has two attributes that together implement symbol visibility:

Binding. This attribute describes the symbol’s scope with respect to other files, and affects how the symbol is handled when the linker combines object files to produce a component. There are three possible bindings:

  • STB_LOCAL. The symbol is not visible outside the file in which it is defined. Local symbols with the same name may exist in multiple files without interfering with each other.
  • STB_GLOBAL. At link time, the symbol is visible to all files being combined. One file’s definition of a global symbol satisfies another file’s reference to the same global symbol. Except in the case of common symbols, it is an error for there to be multiple global definitions of the same name. The Linker will extract archive members to resolve undefined global symbols. It is an error if there is no matching definition for a globally-referenced name.
  • STB_WEAK. A weak symbol behaves as does a global symbol, with a few differences. If there are both a weak and a global definition of a name, the global definition takes precedence and any weak definitions are ignored. The Linker will not extract archive members to resolve undefined weak symbols. It is not an error to have unresolved weak references; unresolved weak symbols have the value zero.
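Weak binding can be requested from C with the GCC attribute syntax. The following is a minimal sketch, assuming an ELF target and a compiler that accepts __attribute__((weak)); the function names are hypothetical. It shows a weak reference that is left unresolved, and therefore has the value zero, alongside a weak definition that a global definition elsewhere would override.

```c
#include <assert.h>
#include <stddef.h>

/* Weak reference: no definition is provided in this program, so the
 * symbol stays unresolved and its address is zero. */
extern void optional_trace_hook(int event) __attribute__((weak));

/* Weak definition: a global (strong) definition of the same name in
 * another object file would take precedence over this one. */
__attribute__((weak)) int default_buffer_size(void)
{
    return 4096;
}

/* Call the hook only if some component supplied it.
 * Returns 1 if the hook was called, 0 if it was absent. */
int trace_event(int event)
{
    if (optional_trace_hook != NULL) {
        optional_trace_hook(event);
        return 1;
    }
    return 0;
}

int buffer_bytes(int buffers)
{
    assert(buffers >= 0);
    return buffers * default_buffer_size();
}
```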

 

Visibility. This attribute describes the symbol’s scope with respect to other run-time components, and affects how the symbol is handled when the run-time loader maps a component into process address space. There are four possible visibility values:

  • STV_DEFAULT. The symbol’s binding dictates its visibility. Symbols with local binding are hidden (visible only within their defining component; see below). Symbols with global or weak binding are visible outside their defining component. Furthermore, they are preemptable. When the run-time loader loads a component, a preemptable symbol that is already defined in a previously-loaded component is ignored and the existing definition is used (i.e., the existing definition preempts the definition in the newly-loaded component).
  • STV_PROTECTED. The symbol’s definition in the current component is visible to other components, but it is not preemptable—its definition is not overridden by an existing value from another component. Note that symbols with local binding cannot have protected visibility.
  • STV_HIDDEN. The symbol’s definition is not visible outside its own component. This of course means that the definition cannot be preempted.
  • STV_INTERNAL. This is a stronger, more restrictive version of STV_HIDDEN. The ELF specification defines the semantics of internal visibility as processor-specific, with the proviso that generic tools can safely treat internal symbols as hidden. Intel’s compilers currently do not implement any special meaning to internal visibility, beyond that of STV_HIDDEN.

 

When the Linker combines object modules to produce an executable image or shareable object, the visibility attribute that it assigns to a symbol is the most restrictive one that it encountered. For example, if foo is defined with STV_PROTECTED visibility in one compilation unit but referenced with STV_HIDDEN in another compilation unit, the visibility of foo in the final component that the Linker produces will be STV_HIDDEN.
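The merging rule can be sketched in C as follows. This is a model of the rule as stated above, not the linker's actual code. It assumes that restrictiveness increases in the order default, protected, hidden, internal, which follows from the descriptions of the four values; the enum and function names are hypothetical, while the numeric values are those the ELF specification assigns to the STV_ constants.

```c
#include <assert.h>

/* Visibility values as encoded in an ELF symbol table entry. */
enum elf_visibility {
    VIS_DEFAULT   = 0,  /* STV_DEFAULT */
    VIS_INTERNAL  = 1,  /* STV_INTERNAL */
    VIS_HIDDEN    = 2,  /* STV_HIDDEN */
    VIS_PROTECTED = 3   /* STV_PROTECTED */
};

/* Rank from least (0) to most (3) restrictive. The encoded values
 * are not in this order, so an explicit mapping is needed. */
static int restrictiveness(enum elf_visibility v)
{
    switch (v) {
    case VIS_DEFAULT:   return 0;
    case VIS_PROTECTED: return 1;
    case VIS_HIDDEN:    return 2;
    case VIS_INTERNAL:  return 3;
    }
    assert(!"unknown visibility value");
    return 0;
}

/* Visibility of a symbol after combining two occurrences of it:
 * the more restrictive of the two wins. */
enum elf_visibility merge_visibility(enum elf_visibility a,
                                     enum elf_visibility b)
{
    return restrictiveness(a) >= restrictiveness(b) ? a : b;
}
```

With this model, merging STV_PROTECTED with STV_HIDDEN yields STV_HIDDEN, matching the foo example above.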

An undefined reference with non-default visibility is an indication that the compilation unit in question was produced under the assumption that all components into which the unit might be linked will have at least that restrictive a visibility for the symbol in question. The compiler may have performed optimizations that depend on that visibility. Non-default visibility on an undefined reference allows the Linker to catch cases where different compilation units in an image have made conflicting assumptions about a symbol’s visibility.

Note that there is no way in ELF to express the concept of EXTERNAL visibility (i.e., the symbol definitely will not be defined in the current component, but will come from elsewhere). “EXTERNAL visibility” is merely information provided to the compiler to aid in optimization (such as inlining of the import stub for a cross-component routine call).