{"id":2510,"date":"2012-03-06T17:19:34","date_gmt":"2012-03-06T16:19:34","guid":{"rendered":"http:\/\/glandium.org\/blog\/?p=2510"},"modified":"2012-03-08T09:58:00","modified_gmt":"2012-03-08T08:58:00","slug":"libgcc-a-symbol-visibility-considered-harmful","status":"publish","type":"post","link":"https:\/\/glandium.org\/blog\/?p=2510","title":{"rendered":"libgcc.a symbol visibility considered harmful"},"content":{"rendered":"<p>I recently got to <a href=\"\/blog\/?p=2146\">rebuild an Android NDK with a fresh toolchain<\/a> again, and hit an interesting problem. I had actually hit it before, but only this time did I fully analyze what's going on. [As a side note, if you build such an NDK, don't use mpfr 3.1.0, as there is a bug in the libtool it ships.]<\/p>\n<p>Linking an application or a library pulls in many things that aren't part of the code being built. One of them is the <code>libgcc<\/code> static library. Part of <code>libgcc<\/code> consists of an implementation of the platform ABI; on Android systems, this means the ARM EABI. When compiling some operations, GCC will generate calls to these ABI functions. For example, integer divisions may call <code>__aeabi_idiv<\/code>.<\/p>\n<p>Consider the following minimized real-world scenario:<\/p>\n<blockquote><p><code>$ echo \"int foo(int a) { return 42 % a; }\" &gt; foo.c<br \/>\n$ arm-linux-androideabi-gcc -o libfoo.so -shared foo.c -mandroid<\/code><\/p><\/blockquote>\n<p>GCC will emit a call to <code>__aeabi_idivmod<\/code> for the <code>%<\/code> operation. With GCC 4.6.3, this function is in <code>_divsi3.o<\/code> in <code>libgcc.a<\/code>. That function itself calls <code>__aeabi_idiv0<\/code>, which lives in <code>_dvmd_lnx.o<\/code>, also in <code>libgcc.a<\/code>.<\/p>\n<p>At static link time, <code>ld<\/code> will thus include <code>foo.o<\/code>, <code>_divsi3.o<\/code> and <code>_dvmd_lnx.o<\/code>, which means it will include all the functions from these object files. 
That is, <code>foo<\/code>, <code>__divsi3<\/code>, <code>__aeabi_idiv<\/code>, <code>__aeabi_idivmod<\/code>, <code>__aeabi_idiv0<\/code> and <code>__aeabi_ldiv0<\/code>. Not only are these functions included, they are also exported, because symbol visibility in <code>libgcc.a<\/code> is <code>default<\/code>. So while we only expect to export <code>foo<\/code> from our library, we're actually exporting much more, including functions that just happen to sit next to the ones our code (indirectly) uses.<\/p>\n<p>Now, let's say we want to build another library that uses the <code>foo<\/code> function from <code>libfoo<\/code>:<\/p>\n<blockquote><p><code>$ cat &gt; bar.c &lt;&lt;EOF<br \/>\nextern int foo(int a);<br \/>\nlong long bar(long long a) { return foo(a) % a; }<br \/>\nEOF<br \/>\n$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid<\/code><\/p><\/blockquote>\n<p>(The code above is meaningless; it just triggers the same function calls as the actual real-world case.)<\/p>\n<p>For the code above, GCC will generate a call to <code>__aeabi_ldivmod<\/code>, which calls <code>__aeabi_ldiv0<\/code> among many other things, directly or indirectly. When linking as above, nothing particularly nasty happens. However, linking as above is actually wrong: the resulting library has an undefined reference to the <code>foo<\/code> symbol but doesn't depend on <code>libfoo<\/code>. At runtime, unless <code>libfoo<\/code> has somehow already been loaded, loading <code>libbar<\/code> fails.<\/p>\n<p>The proper way to link is the following:<\/p>\n<blockquote><p><code>$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid -L. -lfoo<\/code><\/p><\/blockquote>\n<p>A feature of ELF static linking is that, when resolving undefined symbols, the linker uses the first definition of a symbol it finds among the objects and libraries given on its command line. 
So with the command line above, for each <code>__aeabi_*<\/code> symbol, it will first look for a definition in <code>libfoo<\/code>. And while <code>__aeabi_ldivmod<\/code> is not in <code>libfoo<\/code>, <code>__aeabi_ldiv0<\/code> is (see above).<\/p>\n<p>As a result, instead of pulling in the code for <code>__aeabi_ldiv0<\/code> from <code>libgcc.a<\/code>, the linker makes <code>libbar<\/code> use the copy exported by <code>libfoo<\/code>.<\/p>\n<p>This wouldn't be much of a problem if <code>__aeabi_ldiv0<\/code> weren't a weak symbol.<\/p>\n<p>Enter <a href=\"\/blog\/?p=2436\">faulty.lib<\/a>. In the real-world case, <code>libfoo<\/code> is loaded by the system dynamic linker, and <code>libbar<\/code> by faulty.lib. When resolving symbols for <code>libbar<\/code>, faulty.lib has to look up <code>libfoo<\/code> symbols through the system linker, using <code>dlsym()<\/code>. On Android, <code>dlsym()<\/code> returns NULL for weak (defined) symbols, so faulty.lib can't resolve <code>__aeabi_ldiv0<\/code>.<\/p>\n<p>The real-world case wasn't a problem with GCC 4.4.3 from the vanilla Android NDK because, in that GCC version, <code>__aeabi_ldivmod<\/code> doesn't call <code>__aeabi_ldiv0<\/code>.<\/p>\n<p>None of this would happen if shared libraries didn't expose arbitrary platform-ABI-specific bits that depend on which helpers they use and on which other symbols happen to live in the same object files.<\/p>\n<p>A similar issue happened a little while ago on Debian powerpc because <a href=\"http:\/\/bugs.debian.org\/cgi-bin\/bugreport.cgi?msg=167;bug=624354\">a shared library was exporting ABI-specific bits<\/a>. 
Even worse, the toolchain assumed these symbols would come from <code>libgcc.a<\/code> and generated wrong relocations for them.<\/p>\n<p><b>Update:<\/b> Interestingly, the <code>__aeabi_*<\/code> symbols are hidden in the <code>libgcc.a<\/code> shipped with the Debian armel port.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently got to rebuild an Android NDK with a fresh toolchain again, and hit an interesting problem. I actually had hit it before, but only this time I fully analyzed what&#8217;s going on. [As a side note, if you build such a NDK, don&#8217;t use mpfr 3.1.0, as there is a bug in the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28,5,25],"tags":[23],"class_list":["post-2510","post","type-post","status-publish","format-standard","hentry","category-faulty-lib","category-pdo","category-planet-mozilla","tag-en"],"_links":{"self":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2510","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2510"}],"version-history":[{"count":14,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2510\/revisions"}],"predecessor-version":[{"id":2524,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2510\/revisions\/2524"}],"wp:attachment":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2510"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2510"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2510"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}