Option -Bsymbolic can cause dangerous side effects

-Bsymbolic (defined at the following location: http://sourceware.org/binutils/docs-2.16/ld/Options.html#Options)† may seem to solve many problems. Unfortunately, -Bsymbolic is a dangerous option that can often produce nonintuitive side effects.

Normally on Linux* (that is, when not using -Bsymbolic), the first instance of any object loaded is the one used throughout the program, whether it is defined in the static executable part or in one of the shared objects. This is achieved by symbol preemption: the dynamic loader builds a symbol table, and all dynamic symbols are resolved against it. So normally, if a symbol instance appears in a shared library (DSO) but was already defined by the static executable or a previously loaded DSO, that earlier definition is also used by the current DSO.

-Bsymbolic changes this behavior by turning off symbol preemption in the DSO to which it is applied. As a result, that DSO always uses its own instance. This can cause unintended and dangerous behavior.

Sometimes, of course, this is exactly the behavior you want: your DSO has a well-defined interface, and anything that is not part of the interface is internal to the DSO. It is usually to achieve such encapsulation that developers flirt with -Bsymbolic in the first place. Below we show a more controlled way to achieve this that avoids the problems of -Bsymbolic.

The problem with -Bsymbolic is the unintended use of multiple instances of some object.

Having multiple instances of an object immediately makes a C++ or Fortran program nonconforming. The ANSI C++ Standard (ISO/IEC 14882:1998) section 3.2 says:

“Every program shall contain exactly one definition of every noninline function or object that is used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is used.”

The Fortran 2003 Language standard (sect. 16.1) says:

"Program units, common blocks, external procedures, procedure binding labels, and variables that have the BIND attribute are global entities of a program. The name of a program unit, common block, or external procedure is a global identifier and shall not be the same as the name of any other such global entity in the same program, except that an intrinsic module may have the same name as another program unit, common block, or external procedure in the same program. A binding label of a global entity of the program is a global identifier and shall not be the same as the binding label of any other global entity of the program; nor shall it be the same as a name of any other global entity of the program that is not an intrinsic module, ignoring differences in case. A global entity of the program shall not be identified by more than one binding label."

So as soon as you turn on -Bsymbolic for C++ or Fortran, you might expect trouble. But as Case 2 below shows, some of the trouble comes from unexpected sources, so you can still end up in trouble even if you have programmed carefully with this in mind. Also, as Case 1 below shows, you can get into trouble even in plain C unless you have been careful to design your code to work with -Bsymbolic.

Case 1: A variable definition gets duplicated between the static executable and the DSO.

For example:
$ cat main.c
#include <stdio.h>

int MyGlobalVar;

extern void init(void);

int main(void) {
    init();
    printf("MyGlobalVar = %d\n", MyGlobalVar);
    return 0;
}

$ cat so.c
int MyGlobalVar;

void init(void) {
    MyGlobalVar = 1;
}

Compiled without -Bsymbolic, the printf in main prints 1.

$ icc -shared -Wl,-soname,so.so.1 -o so.so.1 so.c -lc
$ icc main.c so.so.1 -Wl,-rpath,/home/user01/bsymbolic/globalvar_example
$ ./a.out
MyGlobalVar = 1

Compiled with -Bsymbolic, the printf in main prints 0. Even though main calls the init routine, the variable that was initialized is not the one in main.

$ icc -shared -Wl,-soname,so.so.1 -o so.so.1 so.c -lc -Bsymbolic
$ icc main.c so.so.1 -Wl,-rpath,/home/user01/bsymbolic/globalvar_example -Bsymbolic
$ ./a.out
MyGlobalVar = 0


Case 2: Exception Handling

Type equivalence is determined by comparing pointers to "type tables", and with -Bsymbolic you can end up with a separate "type table" in your main program and in your DSO. When that happens, catching an exception in your main program that is thrown in the DSO (or vice versa) will not work.

For example:

$ cat main.cpp
class X {
    int I;
public:
    X() : I(1) {}
};
extern void thrower();

int main() {
    try {
        thrower();
    }
    catch(X x) {}
    return 0;
}

$ cat so.cpp
class X {
    int I;
public:
    X() : I(1) {}
};

void thrower() {
    throw X();
}

When compiled without -Bsymbolic, the main program catches the exception.

$ icpc -shared -Wl,-soname,so.so.1 -o so.so.1 -lc so.cpp
$ icpc main.cpp so.so.1 -Wl,-rpath,/home/user01/bsymbolic/except_example
$ ./a.out
$

When compiled with -Bsymbolic, the main program does not catch the exception.

$ icpc -shared -Wl,-soname,so.so.1 -o so.so.1 -lc so.cpp -Bsymbolic
$ icpc main.cpp so.so.1 -Wl,-rpath,/home/user01/bsymbolic/except_example -Bsymbolic
$ ./a.out
Aborted

Case 3: Dynamic Casts

The trouble with "type tables" also shows up in dynamic casts. If the DSO tries to dynamic_cast a pointer to an object created in the static executable, the cast will fail because the two "type table" pointers differ.

Case 4: VTables

There are potential issues with vtables, since these will also be duplicated. However, in this case your program probably has to be non-conforming C++ to start with. Cases do exist where the main program and the DSO each have their own global delete and new, which would not work when applying -Bsymbolic, but this is clearly not a conforming program.

Case 5: Duplicated I/O Buffers

Your program likely won’t fail in this case since initialization happens “behind your back” in C++ and will happen in both the DSO and the static executable. But if both are writing to stdout, there is a good chance the output will become garbled as each flushes its buffers independently.

A Safer Alternative to -Bsymbolic

Using a linker version script to force non-public symbols to be local is preferable to -Bsymbolic. It forces the developer of a .so to explicitly identify every object or function for which the DSO's own version should be used. Additionally, if used correctly, it guarantees that other developers won’t use unintended interfaces.

To do this, create a script file (for this example, we'll call it version_script) that looks like the following:

VERS_1.1 { local: symbol_name; symbol_name;};

where symbol_name is the name of the symbol. * can be used as a wild card to specify multiple symbols.

You then make the symbols local by adding “-Xlinker --version-script -Xlinker version_script” to the link line when building the DSO (not when linking the DSO into the executable or another DSO).

An example follows:

$ icc my_func.c -o libmy_lib.so -i_static -shared -Xlinker --version-script -Xlinker version_script
$ icc main.c -L. -lmy_lib -o main -i_static
$ cat version_script
VERS_1.1 {local: foo*;};

For more info on linker scripts, go to http://sourceware.org/binutils/docs-2.18/ld/index.html.† 

† This link will take you off of the Intel Web site. Intel does not control the content of the destination Web Site.

Comments

Your analysis is heavily biased.
1) It would be fair to also mention the disadvantages of version scripts.
2) You make unfair use of the C/C++/Fortran standards

1) Disadvantages of version scripts

1.a) It creates a heavy maintenance burden. This version script is a new genuine input file
in the build that has to be maintained in sync with the content of the C/C++ sources.
To generate it, one needs the power of the C/C++ parser (meaning the compiler should generate
it in fact but it doesn't). To avoid generating it, one needs to rely on patterns, like your
foo*, which means new rules for the code developers, rules that just wait to be violated.
The problem is made worse by the fact that you need to list the library
private stuff. It would have been more logical to list the public API of the lib and hide all
the rest by default (the public API is usually fewer symbols and less frequent changes).

1.b) Version scripts reduce portability of the code base. They are not supported
by many Unix-inspired embedded platforms. They are not supported by the Intel
compiler suite on Microsoft Windows platforms. Using version scripts requires you to find
an alternative on all other platforms where you ship .so/.dll. By contrast, -Bsymbolic
is supported on more Unix-like platforms, and it also mimics (to some extent) .dll linking.

2) Unfair use of the standard of programming languages

2.a) Programs having non-static file-scope variables or constants with the same name
in several source files is wrong. They shouldn't even link OK, with or without
libraries involved (including dynamic libraries). So your case 1 example, with
main.c and so.c and MyGlobalVar, is a non-conformant program! And any good static
code checker will signal it as such. You should mention this, as you do for your case 4.
Note that classes defined twice fall under similar non-conformance
(because implementation of classes usually involves non-static data).
So your case 2 is bogus as well. It shouldn't even link, and if it does link in one or
several particular combinations of options, it's a hack by nature
(and different hacks behave differently at run-time, no surprise).

2.b) Writing C/C++ code for dynamic libraries cannot be based on the C or C++
language standards. As shocking as it may be, these standards do not cover
at all the making of one executable in several link steps. Therefore, the dynamic
libraries (.so, .dso, .dll, etc.), however popular they may be, are all just
non-standard extensions from the language standard point of view.
It is idiosyncratic to pull off the standard book to mandate
one or another way to implement out-of-standard extensions.

Despite this bias that you have, I appreciate your post. It does contain very
useful examples of bad C and C++ code and clear explanation of what may go wrong
with it. Thank you.