Fun with weak symbols
Consider the following foo.c source file:
extern int bar() __attribute__((weak));
int foo() {
    return bar();
}
And the following bar.c source file:
int bar() {
    return 42;
}
Compile both sources:
$ gcc -o foo.o -c foo.c -fPIC
$ gcc -o bar.o -c bar.c -fPIC
In the resulting object for foo.c, we have an undefined symbol reference to bar. That symbol is marked weak.
In the resulting object for bar.c, the bar symbol is defined and not weak.
What we expect from linking both objects is that the weak reference is fulfilled by the existence of the bar symbol in the second object, and that in the resulting binary, the foo function calls bar.
$ ld -shared -o test1.so foo.o bar.o
And indeed, this is what happens.
$ objdump -T test1.so | grep "\(foo\|bar\)"
0000000000000260 g DF .text 0000000000000007 foo
0000000000000270 g DF .text 0000000000000006 bar
What do you think happens if the bar.o object file is embedded in a static library?
$ ar cr libbar.a bar.o
$ ld -shared -o test2.so foo.o libbar.a
$ objdump -T test2.so | grep "\(foo\|bar\)"
0000000000000260 g DF .text 0000000000000007 foo
0000000000000000 w D *UND* 0000000000000000 bar
The bar function is now undefined and there is only a weak reference for the symbol. Unless some other loaded object defines bar, the reference resolves to address zero and calling foo will crash at runtime.
This is apparently a feature of the linker. If anyone knows why, I would be interested to hear about it. Good to know, though.
2012-02-23 10:46:50+0900
2012-02-23 11:48:04+0900
Did you ask on the binutils mailing list? People like Ian Lance Taylor should know.
2012-02-23 12:01:32+0900
This is just a wild guess.
The linker probably only resolves non-weak symbols by extracting archive members. Having extracted those archive members, it can probably use those members to resolve weak symbols as well.
For example, *printf probably has a weak reference to the floating-point string conversion function. If you don’t use any floating-point in your code, you won’t trigger the extraction of the floating-point library, and *printf won’t be able to print floating-point numbers. But if you do use floating-point in your code, that will trigger the extraction of the appropriate object in the library, which will resolve the weak reference inside *printf.
2012-02-23 12:08:06+0900
Either “--whole-archive” or “-u bar” should help.
2012-02-23 12:47:19+0900
As Octoploid said, those two options would help. The link editor (ld) does not treat bar as truly undefined: there is only a weak reference to it, and if you have weak references you’re _supposed_ to have a valid default value. So bar is not used to decide that libbar.a is needed.
And since there is no other symbol in libbar.a that foo.o requires, ld discards the library as unused/unrequested.
2012-02-23 13:40:20+0900
What purpose do weak externs have, anyway?
I only know weak on functions to allow them to be easily overridden, cf. the pthreads-related functions (where _thread_sys_open is the syscall caller and open is weak calling it, and libpthread provides a non-weak open).
2012-02-23 14:18:51+0900
You can also use weak externs when you want backward compatibility with a given library without rebuilding the software.
For instance you might want to use a function if it’s available in the version of the library you’re running against, because it’s faster than doing the same thing “manually”, but you don’t want to require all your users to link against a newer version of that library, so you use a weak reference, and check against the function’s pointer being non-zero.
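The non-zero check described above can be sketched as follows. This is only an illustration, assuming an ELF target and a compiler that supports GCC-style attributes; bar_fallback is a hypothetical name for the “manual” path and does not come from any real library.

```c
/* Weak declaration: if no definition of bar is linked in or loaded,
 * the symbol's address resolves to zero instead of causing an error. */
extern int bar(void) __attribute__((weak));

/* Hypothetical "manual" implementation, used only when bar is absent. */
static int bar_fallback(void) {
    return -1;
}

int foo(void) {
    /* The address of a weak function is zero when it was not resolved. */
    if (bar)
        return bar();
    return bar_fallback();
}
```

Built on its own, without any definition of bar, foo takes the fallback path. When an object or library that defines bar is part of the link or is loaded at run time, foo calls it instead.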
There are more tricks that relate to preloaded/interposed libraries (pthread, as you noted, falls in the latter category); most of the weakref tricks only concern binary compatibility though, so they are used in prebuilt software (both proprietary and non; OpenOffice and Firefox are also using them as far as I can tell; LibreOffice is usually not used prebuilt).
2012-02-23 21:39:03+0900
It’s just written in the ELF standard:
The link editor does not extract archive members to resolve undefined weak.
Btw,
[quote]
What we expect from linking both objects is that the weak reference is fulfilled by the existence of the bar symbol in the second object, and that in the resulting binary, the foo function calls bar.
$ ld -shared -o test1.so foo.o bar.o
And indeed, this is what happens.
[/quote]
This is not true. Since you are building a shared library, you cannot know whether the reference to “bar” in “foo” will use the definition from bar.c until the dynamic loader resolves the reference at run time, even if “bar” in foo.c is a non-weak undefined symbol.
2012-02-24 08:34:46+0900
Jie: well, that applies whether the symbol is weak or not.
2012-02-24 14:38:35+0900
@Diego: A pointer to an object is never NULL in ISO C, and GCC is known to optimise your check for that away.
(Yes, ISO C is removed from reality, especially C1x…)
2012-02-24 22:04:18+0900
Uhm, I’m pretty sure I have seen working code with similar checks that were not optimised away. But it’s probably a matter of telling the compiler what you want to do there.