libgcc.a symbol visibility considered harmful

I recently got to rebuild an Android NDK with a fresh toolchain again, and hit an interesting problem. I had actually hit it before, but only this time did I fully analyze what was going on. [As a side note, if you build such an NDK, don't use mpfr 3.1.0, as there is a bug in the libtool it ships]

Linking an application or a library pulls in many things that aren't part of the code being built. One of these many things is the libgcc static library. Part of libgcc consists of an implementation of the platform ABI. On Android systems, this means the ARM EABI. GCC, when compiling some operations, will generate ABI calls. For example, integer divisions may call __aeabi_idiv.
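To make the ABI calls more concrete, here is a small sketch of C operations and the ARM EABI helpers GCC typically calls for them. The function names (sdiv, smod, udiv, lmod) are made up for illustration, and the mapping in the comments assumes an ARM target without hardware integer division; on other targets the compiler may inline the operation or call a different helper.

```c
/* Illustrative only: the helper named in each comment is what GCC
 * typically calls on ARM EABI targets lacking a hardware divider. */

/* 32-bit signed division: typically __aeabi_idiv */
int sdiv(int n, int d) { return n / d; }

/* 32-bit signed remainder: typically __aeabi_idivmod */
int smod(int n, int d) { return n % d; }

/* 32-bit unsigned division: typically __aeabi_uidiv */
unsigned udiv(unsigned n, unsigned d) { return n / d; }

/* 64-bit signed remainder: typically __aeabi_ldivmod */
long long lmod(long long n, long long d) { return n % d; }
```

Running nm on the resulting object file should show which helpers are left as undefined references for the linker to satisfy.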

Consider the following minimized real world scenario:

$ echo "int foo(int a) { return 42 % a; }" > foo.c
$ arm-linux-androideabi-gcc -o libfoo.so -shared foo.c -mandroid

GCC will emit a call to __aeabi_idivmod for the % operation. With GCC 4.6.3, this function is in _divsi3.o under libgcc.a. That function itself calls __aeabi_idiv0, which lives in _dvmd_lnx.o under libgcc.a.

When statically linking, ld will thus include foo.o, _divsi3.o and _dvmd_lnx.o, meaning it will include all functions from these object files. That is, foo, __divsi3, __aeabi_idiv, __aeabi_idivmod, __aeabi_idiv0 and __aeabi_ldiv0. And more than being included, these functions are exported, because symbol visibility in libgcc.a is default. So while we expect exporting foo from our library, we're actually exporting much more, including functions that just happened to be near the ones that our code (indirectly) uses.
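For readers less familiar with symbol visibility, here is a minimal sketch of the difference between default and hidden visibility using GCC's visibility attribute. The helper_mod function is hypothetical and merely stands in for a libgcc helper; since libgcc.a is prebuilt, the attribute can't be applied to its functions from foo.c, so this only illustrates the concept.

```c
/* Hidden: usable from anywhere inside the shared library being built,
 * but not listed in its dynamic symbol table, so other libraries
 * cannot bind to it. (Hypothetical stand-in for a libgcc helper.) */
__attribute__((visibility("hidden")))
int helper_mod(int n, int d) { return n % d; }

/* Default: exported from the shared library. This is how the libgcc.a
 * functions described above end up being treated. */
__attribute__((visibility("default")))
int foo(int a) { return helper_mod(42, a); }
```

For prebuilt archives, GNU ld has an --exclude-libs option that treats symbols coming from the listed archives as hidden on ELF targets (for example -Wl,--exclude-libs,libgcc.a). Whether it fits a given build is an assumption to check against the toolchain in use.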

Now, let's say we want to build another library, using that foo function from libfoo:

$ cat > bar.c <<EOF
extern int foo(int a);
long long bar(long long a) { return foo(a) % a; }
EOF
$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid

(The code above has absolutely no meaning, it just triggers the same function calls as what I was getting in the actual real world case)

When compiling the above code, GCC will generate a call to __aeabi_ldivmod, which calls __aeabi_ldiv0 and many other things, directly or indirectly. When linking as above, nothing particularly nasty is going to happen. However, linking as above is actually wrong: the resulting library has an undefined reference to the foo symbol, and doesn't declare a dependency on libfoo. At runtime, if libfoo wasn't already loaded somehow, loading libbar would fail.

The proper way to link is the following:

$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid -L. -lfoo

A feature of ELF static linking is that when it resolves undefined symbols, the linker uses the first definition of a symbol it finds in the objects and libraries given on its command line, in order. So with the command line above, for each __aeabi_* symbol, it will first check whether libfoo provides one, because libgcc.a only comes later, among the libraries GCC adds implicitly. And while __aeabi_ldivmod is not in libfoo, __aeabi_ldiv0 is (see above).
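The first-match rule can be sketched with a toy model. This is not how ld is implemented (it ignores archive member extraction, weak symbols, versioning, and so on); it only walks the inputs in command-line order and reports the first one defining a symbol. The symbol lists are taken from the scenario above, with the libgcc.a list reduced to the few symbols that matter here.

```c
#include <stddef.h>
#include <string.h>

/* One linker input and the symbols it defines (NULL-terminated). */
struct input {
    const char *name;
    const char *const *defined;
};

static const char *const bar_syms[] = { "bar", NULL };
static const char *const libfoo_syms[] = {
    "foo", "__divsi3", "__aeabi_idiv", "__aeabi_idivmod",
    "__aeabi_idiv0", "__aeabi_ldiv0", NULL
};
/* Only a small, illustrative subset of what libgcc.a defines. */
static const char *const libgcc_syms[] = {
    "__aeabi_idivmod", "__aeabi_ldivmod",
    "__aeabi_idiv0", "__aeabi_ldiv0", NULL
};

/* Inputs in command-line order: libgcc.a is appended last by GCC. */
static const struct input link_order[] = {
    { "bar.o", bar_syms },
    { "libfoo.so", libfoo_syms },
    { "libgcc.a", libgcc_syms },
};

/* Return the name of the first input defining `symbol`, or NULL. */
const char *first_definer(const char *symbol)
{
    size_t count = sizeof link_order / sizeof link_order[0];
    for (size_t i = 0; i < count; i++)
        for (const char *const *s = link_order[i].defined; *s; s++)
            if (strcmp(*s, symbol) == 0)
                return link_order[i].name;
    return NULL;
}
```

In this model, first_definer("__aeabi_ldivmod") yields libgcc.a, while first_definer("__aeabi_ldiv0") yields libfoo.so, which is the situation described here.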

So instead of including the code for __aeabi_ldiv0 from libgcc.a, it will call the copy from libfoo.

This wouldn't be so much of a problem if __aeabi_ldiv0 wasn't a weak symbol.
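As a reminder of what a weak definition looks like at the source level, here is a minimal sketch using GCC's weak attribute. The names div0_handler and safe_div are hypothetical; the handler only plays a role similar to __aeabi_ldiv0, a default implementation that a strong definition elsewhere in the link may replace.

```c
/* Weak definition: used only if no strong (non-weak) definition of
 * div0_handler is provided by another object in the same link.
 * Hypothetical stand-in for a handler such as __aeabi_ldiv0. */
__attribute__((weak)) int div0_handler(void)
{
    return 0;
}

/* Divide, deferring to the (possibly overridden) handler on zero. */
int safe_div(int n, int d)
{
    if (d == 0)
        return div0_handler();
    return n / d;
}
```

In the object file's symbol table, such a function is marked with weak binding instead of global binding.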

Enter faulty.lib. In the real world case, libfoo is loaded by the system dynamic linker, and libbar by faulty.lib. When resolving symbols for libbar, faulty.lib has to resolve libfoo symbols with the system linker, using dlsym(). On Android, dlsym() returns NULL for weak (defined) symbols, so faulty.lib can't resolve __aeabi_ldiv0.

The real world case wasn't a problem with GCC 4.4.3 from the vanilla Android NDK because in that GCC version, __aeabi_ldivmod doesn't call __aeabi_ldiv0.

This wouldn't happen if shared libraries didn't expose random platform ABI specific bits depending on what they use and on which other symbols happen to be in the same object files.

A similar issue happened a little while ago on Debian powerpc because a shared library was exporting ABI specific bits. Even worse, the toolchain was assuming the symbols would come from libgcc.a and generated wrong relocations for these symbols.

Update: Interestingly, the __aeabi_* symbols are hidden in libgcc.a as provided on the Debian armel port.

2012-03-06 17:19:34+0900

faulty.lib, p.d.o, p.m.o


5 Responses to “libgcc.a symbol visibility considered harmful”

  1. Z.T. Says:

    1. Can you link libfoo with a linker script that exports only the symbols you wish to export?

    2. At least the android build system passes a flag to the linker to abort if it ends up with undefined symbols.

    3. Do I understand correctly that with faulty.lib, using a newer toolchain on android is safe (when statically linking)? Or does the difference between libgcc versions cause any other problems?

  2. glandium Says:

    1. It’s possible, but cumbersome. The converse is also possible, hiding the __aeabi_* symbols, but that ends up exporting other symbols that would not normally be exported.
    2. I guess it passes -Wl,-z,defs.
    3. As long as you don’t end up depending on symbols which are defined as weak in system libraries or any library loaded by the system linker.

  3. Jan Hudec Says:

    From what I understood so far, not exporting those symbols would be a violation of the C and C++ standards, because those require that each symbol is only defined once in each process, which would not be the case if multiple shared libraries used those symbols, got them via static linking and didn’t export them. Windows doesn’t give a damn about standards and doesn’t export such symbols (it only exports symbols specifically marked for export), while ELF-based systems export them.

    I would consider dlsym() to be the actual problem. If the symbol is defined anywhere, it should find it, weak or not. It could be worked around by using custom version in faulty.lib. It’s not very nice, but on the other hand it should be standard ELF, so the chance of breaking in future should be reasonably low.

  4. glandium Says:

    Jan: Except you still end up with multiple copies, because libc.so has one, libgcc.a has one, and when linking, ld will choose (and include) libgcc.a’s copies because libgcc comes before libc on ld’s command line.

    Arguably, they are called through the PLT, so the extra copies wouldn’t be used at runtime, but I’d expect -Bsymbolic to change that assumption.

    I agree that in this particular case, dlsym() is the real culprit, that I am trying to circumvent. However, the debian problem is a similar beast, and isn’t remotely related to dlsym.

  5. David Turner Says:

    This can be avoided by ensuring that libgcc appears in your linker command line between object/static library files and shared libraries, i.e. something like:

    arm-linux-androideabi-gcc -shared -o libbar.so bar.o -lgcc libfoo.so

    This ensures that any libgcc symbol referenced by bar.o is copied into libbar.so.

    Note that the Android NDK build system enforces this automatically. Also, since NDK r7, the link-time system libraries (e.g. libc.so, libm.so, liblog.so) intentionally don’t export any libgcc symbols.
