{"id":4297,"date":"2023-08-30T11:16:38","date_gmt":"2023-08-30T02:16:38","guid":{"rendered":"https:\/\/glandium.org\/blog\/?p=4297"},"modified":"2023-08-30T11:16:38","modified_gmt":"2023-08-30T02:16:38","slug":"hacking-the-elf-format-for-firefox-12-years-later-doing-better-with-less","status":"publish","type":"post","link":"https:\/\/glandium.org\/blog\/?p=4297","title":{"rendered":"Hacking the ELF format for Firefox, 12 years later ; doing better with less"},"content":{"rendered":"<p>(I haven't posted a lot in the past couple years, except for git-cinnabar announcements. This is going to be a long one, hold tight)<\/p>\n<p>This is quite the cryptic title, isn't it? What is this all about? ELF (Executable and Linkable Format) is a file format used for binary files (e.g. executables, shared libraries, object files, and even core dumps) on some Unix systems (Linux, Solaris, BSD, etc.). A little over 12 years ago, I wrote a blog post about <a href=\"\/blog\/?p=1177\">improving libxul startup I\/O by hacking the ELF format<\/a>. For context, libxul is the shared library, shipped with Firefox, that contains most of its code.<\/p>\n<p>Let me spare you the read. Back then I was <a href=\"\/blog\/?p=1016\">looking at I\/O patterns during Firefox startup on Linux<\/a>, and sought ways to reduce disk seeks that were related to loading libxul. One particular pattern was caused by relocations, and the way we alleviated it was through elfhack.<\/p>\n<p><a href=\"\/blog\/?p=1177#relocations\">Relocations<\/a> are necessary in order for executables to work when they are loaded in memory at a location that is not always the same (because of e.g. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Address_space_layout_randomization\">ASLR<\/a>). Applying them requires reading the section containing the relocations, and adjusting the pieces of code or data that are described by the relocations. 
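<\/p>\n<p>To make that concrete, here is a minimal sketch of what applying classic, unpacked relative relocations looks like on x86-64. It is not code from Firefox or from any dynamic loader. It assumes the <code>Elf64_Rela<\/code> table and the load base address have already been located, and the <code>apply_relative_rela<\/code> name is hypothetical. It only handles <code>R_X86_64_RELATIVE<\/code>, the relative relocation type this post is mostly concerned with. Each such entry boils down to &quot;store <em>base + addend<\/em> at <em>base + offset<\/em>&quot;.<\/p>\n<pre><code class=\"language-C\">#include &lt;elf.h&gt;\n#include &lt;stddef.h&gt;\n#include &lt;stdint.h&gt;\n\n\/\/ Hypothetical helper, for illustration only: apply the R_X86_64_RELATIVE\n\/\/ entries of an Elf64_Rela table. base is the address the object was loaded\n\/\/ at; r_offset and r_addend are both relative to it.\n\/\/ Returns the number of relocations applied.\nsize_t apply_relative_rela(uintptr_t base, const Elf64_Rela *rela,\n                           size_t count) {\n  size_t applied = 0;\n  for (size_t i = 0; i &lt; count; i++) {\n    \/\/ Other types need a symbol lookup; this sketch skips them.\n    if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)\n      continue;\n    \/\/ Each entry points at a different place in the image, hence the\n    \/\/ back-and-forth between the relocation table and the data.\n    Elf64_Addr *where = (Elf64_Addr *)(base + rela[i].r_offset);\n    *where = (Elf64_Addr)(base + rela[i].r_addend);\n    applied++;\n  }\n  return applied;\n}<\/code><\/pre>\n<p>In this form, every relocated pointer costs a 24-byte <code>Elf64_Rela<\/code> entry, and the addend lives in the table rather than in the relocated word. That is why packing relative relocations more compactly pays off.<\/p>\n<p>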
When the relocation section is very large (as was the case for libxul back then, and even more so now), that means going back and forth (via disk seeks) between the relocation section and the pieces to adjust.<\/p>\n<h2>Elfhack to the rescue<\/h2>\n<p>Shortly after the aforementioned blog post, the elfhack tool was born and made <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=606145\">its way into the Firefox code base<\/a>.<\/p>\n<p>The main idea behind elfhack was to reduce the size of the relocation section. How? By <a href=\"\/blog\/?p=1177#packing-relocations\">storing it in a more compact form<\/a>. But how? By taking the executable apart, rewriting its relocation section, injecting code to apply those relocations, moving sections around, and adjusting the ELF program header, section header, section table, and string table accordingly. I will spare you the gory details (especially the part about splitting segments, or the hack that uses the .bss section as a temporary Global Offset Table). Elfhack itself is essentially a minimalist linker that works on already-linked executables. 
<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=628283\">That<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=628618\">has<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=629635\">caused<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=661800\">us<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=725284\">a<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=771569\">number<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1233963\">of<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1378986\">issues<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1385783\">over<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1388713\">the<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1423822\">years<\/a> (<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1525510\">and<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1495733\">much<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1840931\">more<\/a>). In fact, it's known not to work on binaries created with lld (the linker from the LLVM project), because the way lld lays things out does not provide space for the tricks we pull (although it seems to work with the latest version of lld; who knows what will happen with the next one).<\/p>\n<p>Hindsight is 20\/20, and if I were to redo it, I'd take a different route. Wait, I'm actually kind of doing that! But first, let me fill you in on what happened in the past 12 years.<\/p>\n<h2>Android packed relocations<\/h2>\n<p>In 2014, Chrome started <a href=\"https:\/\/chromium.googlesource.com\/chromium\/src\/+\/3e7b7388d6e38371c1f0dbc4b68c9888dc6fc5b1\">using a similar-ish approach for Android on ARM<\/a>, with an even more compact format than the crude packing elfhack was doing. 
Instead of injecting initialization code into the executable, it would <a href=\"https:\/\/chromium.googlesource.com\/chromium\/src.git\/+\/4fafc2d4326cdf7c644698abf976820fbd63614d\">use a custom dynamic loader\/linker to handle the packed relocations<\/a> (that loader\/linker was forked from <a href=\"https:\/\/android-review.googlesource.com\/c\/platform\/ndk\/+\/65190\">the one in the Android NDK<\/a>, which solved problems similar to <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=683127\">the ones our own custom linker had<\/a>, but that's another story).<\/p>\n<p>That approach eventually <a href=\"https:\/\/android.googlesource.com\/platform\/bionic\/+\/87a0617ebe7561bf28d3a19fbe192372598969b8\">made its way into Android itself<\/a>, in 2015, with <a href=\"https:\/\/android.googlesource.com\/platform\/bionic\/+\/18a6956b76a071097fc658c5fe13ef010e31864a\">support from the dynamic loader in bionic<\/a> (the Android libc), and later support for emitting those packed relocations <a href=\"https:\/\/github.com\/llvm\/llvm-project\/commit\/5c54f15c55e7d5f73496b9a2957bd673dd785414\">was added to lld<\/a> in October 2017. Interestingly, the packer added to lld created smaller packed relocations than the packer in Android (for the same format).<\/p>\n<h2>The road to standardization<\/h2>\n<p>Shortly after bionic got its native packed relocation support, a <a href=\"https:\/\/sourceware.org\/legacy-ml\/gnu-gabi\/2017-q2\/msg00000.html\">conversation started on the gnu-gabi mailing list<\/a> about the general problem of relocations representing a large portion of Position Independent Executables. What we had observed on a shared library had started to creep into programs as well, because PIE binaries were becoming prominent around that time, with some compilers and linkers starting to default to them for hardening reasons. <a href=\"https:\/\/sourceware.org\/legacy-ml\/gnu-gabi\/2017-q2\/msg00004.html\">Both Chrome's and Firefox's prior art were mentioned<\/a>. 
This was April 2017.<\/p>\n<p>A few months went by, and a <a href=\"https:\/\/sourceware.org\/legacy-ml\/gnu-gabi\/2017-q4\/msg00005.html\">simpler format was put forward<\/a>, with great results, which led, a few days later, to a <a href=\"https:\/\/groups.google.com\/g\/generic-abi\/c\/bX460iggiKg\">formal proposal for RELR relocations in the Generic System V Application Binary Interface<\/a>.<\/p>\n<h2>More widespread availability<\/h2>\n<p>Shortly after the proposal, <a href=\"https:\/\/android.googlesource.com\/platform\/bionic\/+\/b7feec74547f84559a1467aca02708ff61346d2a\">Android got experimental support for it<\/a>, and a few months later, in July 2018, <a href=\"https:\/\/github.com\/llvm\/llvm-project\/commit\/11479daf2f0652d3e11f308a0810cc44da04f31d\">lld gained experimental support as well<\/a>.<\/p>\n<p>The <a href=\"https:\/\/git.kernel.org\/pub\/scm\/linux\/kernel\/git\/torvalds\/linux.git\/commit\/?id=5cf896fb6be3effd9aea455b22213e27be8bdb1d\">Linux kernel got support for it too<\/a>, for KASLR relocations, but for arm64 only (I suppose this was for Android kernels; to this day, it is still the only architecture with support for it).<\/p>\n<p>GNU binutils <a href=\"https:\/\/sourceware.org\/git\/?p=binutils-gdb.git;a=commit;h=a619b58721f0a03fd91c27670d3e4c2fb0d88f1e\">gained support for the proposal<\/a> (via a <code>-z pack-relative-relocs<\/code> flag) at the end of 2021, and glibc eventually <a href=\"https:\/\/sourceware.org\/git\/?p=glibc.git;a=commit;h=e895cff59aa562cad83fa0fdd187bfe4b45312d5\">caught up<\/a> in 2022. These shipped in binutils 2.38 and glibc 2.36, respectively. 
These versions should by now have reached the latest releases of most major Linux distros.<\/p>\n<p>Lld subsequently <a href=\"https:\/\/github.com\/llvm\/llvm-project\/commit\/4a8de2832a2a730f63b71bdf1c1b446285ec5b6f\">got support for the same flag as binutils<\/a>, with the same side effect of adding a version dependency on <code>GLIBC_ABI_DT_RELR<\/code>, to avoid crashes when running executables with packed relocations against an older glibc.<\/p>\n<h2>What about Firefox?<\/h2>\n<p>Elfhack was <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1747782\">updated to use the format from the proposal<\/a> at the very end of 2021 (or rather, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1839746\">close enough to that format<\/a>). More recently (as in, two months ago), <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1839743\">support for the <code>-z pack-relative-relocs<\/code> flag was added<\/a>, so that when Firefox is built against a recent enough glibc and with a recent enough linker, the build automatically uses it instead of elfhack. This means that in some cases, Firefox packages in Linux distros use those relocations (for instance, that has been the case in Debian unstable since Firefox 116).<\/p>\n<p>Which (finally) brings us to the next step, and the meat of this post.<\/p>\n<h2>Retiring Elfhack<\/h2>\n<p>It's actually still too early for that. The Firefox binaries Mozilla provides need to run on a broad variety of systems, including many that don't support those new packed relocations. That includes Android systems older than Red Velvet Cake (11), and desktop systems that aren't necessarily very old.<\/p>\n<p>Android Pie (9) shipped experimental, but incompatible, support for the same packed relocation format, using different constants. 
Hacking the PT_DYNAMIC segment (the segment containing metadata for dynamic linking) for compatibility with all Android versions &gt;= 9 would technically be possible, but again, Mozilla needs to support even older versions of Android.<\/p>\n<p>That's where the idea behind what I now call <code>relrhack<\/code> comes in: injecting code that can apply the packed relocations created by the linker if the system dynamic loader hasn't.<\/p>\n<p>To some extent, that sounds similar to what elfhack does, doesn't it? But elfhack packs the relocations itself. And because its input is a fully linked binary, it has to do complex things that we know don't always work reliably.<\/p>\n<p>For the past few years, an idea had been floating in the back of my mind: changing elfhack to start from a relocatable binary (also known as partially linked). It would then rewrite the sections it needs to, and invoke the linker to link the result with its initialization code and produce the final binary. In theory, that would avoid all the kinds of problems we've hit, and work more reliably with lld.<\/p>\n<p>The idea I've toyed with more recently, though, is even simpler: use the <code>-z pack-relative-relocs<\/code> linker support, and add the initialization code on the linker command line so that it does everything in one go. We're at this sweet spot in time where we can actually start doing this.<\/p>\n<h2>Testing the idea<\/h2>\n<p>My first attempts used a small executable, linked with lld's older <code>--pack-dyn-relocs=relr<\/code> flag, which does the same as <code>-z pack-relative-relocs<\/code> but skips adding the <code>GLIBC_ABI_DT_RELR<\/code> version dependency. That allowed me to avoid post-processing the binary in this first experimentation step.<\/p>\n<p>I quickly got something working on a Debian Bullseye system (using an older glibc that doesn't support the packed relocations). 
Here's how it goes:<\/p>\n<pre><code class=\"language-C\">\/\/ Compile with: clang -fuse-ld=lld -Wl,--pack-dyn-relocs=relr,--entry=my_start,-z,norelro -o relr-test relr-test.c\n#include &lt;stdio.h&gt;\n\nchar *helloworld[] = {&quot;Hello, world&quot;};\n\nint main(void) {\n  printf(&quot;%s\\n&quot;, helloworld[0]);\n  return 0;\n}<\/code><\/pre>\n<p>This is a minimal Hello world program that contains a relative relocation: the <code>helloworld<\/code> variable is an array of pointers, and those pointers need to be relocated. Optimizations would get rid of the array, which is why we specifically don't enable them. We also disable &quot;Relocation Read-Only&quot;, a protection that makes the dynamic loader mark relocated sections read-only once it's done applying relocations. That would prevent us from applying the missing relocations on our own. We're just testing; we'll deal with that later.<\/p>\n<p>Compiling this without <code>--entry=my_start<\/code> (because we haven't defined that yet) and running it yields a segmentation fault. We don't even reach <code>main<\/code>, because there is an initialization function that runs before it, and its address, stored in the <code>.init_array<\/code> section, is behind a relative relocation, which <code>--pack-dyn-relocs=relr<\/code> packed. This is exactly why <code>-z pack-relative-relocs<\/code> adds a dependency on a symbol version that doesn't exist in older glibcs. With that flag, the error becomes:<\/p>\n<pre><code>\/lib\/x86_64-linux-gnu\/libc.so.6: version `GLIBC_ABI_DT_RELR&#039; not found<\/code><\/pre>\n<p>which is more user-friendly than a plain crash.<\/p>\n<p>At this point, what do we want? Well, we want to apply the relocations ourselves, as early as possible. The first thing that will run in an executable is its &quot;entry point&quot;, which defaults to <code>_start<\/code> (provided by the C runtime, aka CRT). 
As hinted in the code snippet above, we can set our own with <code>--entry<\/code>.<\/p>\n<pre><code class=\"language-C\">static void real_init();\nextern void _start();\n\nvoid my_start() {\n  real_init();\n  _start();\n}<\/code><\/pre>\n<p>Here's our own entry point. It starts by calling the &quot;real&quot; initialization function we forward-declare here. Let's see if that actually works by temporarily adding the following:<\/p>\n<pre><code class=\"language-C\">void real_init() {\n  printf(&quot;Early Hello world\\n&quot;);\n}<\/code><\/pre>\n<p>Running the program now yields:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test\nEarly Hello world\nSegmentation fault<\/code><\/pre>\n<p>There we go: we've executed code before anything relies on the relative relocations being applied. By the way, adding function calls like this printf that early was an interesting challenge with elfhack. This is pleasantly simpler.<\/p>\n<h2>Applying the relocations for real<\/h2>\n<p>Let's replace that <code>real_init<\/code> function with some boilerplate for the upcoming real <code>real_init<\/code>:<\/p>\n<pre><code class=\"language-C\">#include &lt;link.h&gt;\n\n#ifndef DT_RELRSZ\n#define DT_RELRSZ 35\n#endif\n#ifndef DT_RELR\n#define DT_RELR 36\n#endif\n\nextern ElfW(Dyn) _DYNAMIC[];\nextern ElfW(Ehdr) __executable_start;<\/code><\/pre>\n<p>The defines are there because older systems don't have them in link.h. 
<code>_DYNAMIC<\/code> is a symbol that gives access to the PT_DYNAMIC segment at runtime, and the <code>__executable_start<\/code> symbol gives access to the base address of the program, which non-relocated addresses in the binary are relative to.<\/p>\n<p>Now we're ready for the real work:<\/p>\n<pre><code class=\"language-C\">void real_init() {\n  \/\/ Find the relocations section.\n  ElfW(Addr) relr = 0;\n  ElfW(Word) size = 0;\n  for (ElfW(Dyn) *dyn = _DYNAMIC; dyn-&gt;d_tag != DT_NULL; dyn++) {\n    if (dyn-&gt;d_tag == DT_RELR) {\n      relr = dyn-&gt;d_un.d_ptr;\n    }\n    if (dyn-&gt;d_tag == DT_RELRSZ) {\n      size = dyn-&gt;d_un.d_val;\n    }\n  }\n  uintptr_t elf_header = (uintptr_t)&amp;__executable_start;\n\n  \/\/ Apply the relocations.\n  ElfW(Addr) *ptr, *start, *end;\n  start = (ElfW(Addr) *)(elf_header + relr);\n  end = (ElfW(Addr) *)(elf_header + relr + size);\n  for (ElfW(Addr) *entry = start; entry &lt; end; entry++) {\n    if ((*entry &amp; 1) == 0) {\n      \/\/ Even entry: the address of a single word to relocate.\n      ptr = (ElfW(Addr) *)(elf_header + *entry);\n      *ptr += elf_header;\n    } else {\n      \/\/ Odd entry: a bitmap. Bit i (i &gt;= 1) flags the word i words\n      \/\/ after the last one covered by the previous entry (ptr).\n      size_t remaining = 8 * sizeof(ElfW(Addr)) - 1;\n      ElfW(Addr) bits = *entry;\n      do {\n        bits &gt;&gt;= 1;\n        remaining--;\n        ptr++;\n        if (bits &amp; 1) {\n          *ptr += elf_header;\n        }\n      } while (bits);\n      \/\/ Skip to the last word this bitmap covers.\n      ptr += remaining;\n    }\n  }\n}<\/code><\/pre>\n<p>It's all kind of boring here. We scan the PT_DYNAMIC segment to get the location and size of the packed relocations section, and then read and apply them.<\/p>\n<p>And does it work?<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test\nHello, world<\/code><\/pre>\n<p>It does! Mission accomplished? 
If only...<\/p>\n<h2>The devil is in the details<\/h2>\n<p>Let's try running this same binary on a system with a more recent glibc:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test \n.\/relr-test: error while loading shared libraries: .\/relr-test: DT_RELR without GLIBC_ABI_DT_RELR dependency<\/code><\/pre>\n<p>Oh come on! Yes, glibc insists that when the PT_DYNAMIC segment contains these types of relocations, the binary must have that symbol version dependency: the very dependency we need to avoid in order to work on older systems. I have no idea why the glibc developers went out of their way to prevent that. Someone even <a href=\"https:\/\/sourceware.org\/pipermail\/libc-alpha\/2022-March\/136773.html\">asked, back when this was all at the patch stage, with no answer<\/a>.<\/p>\n<p>We'll figure out a workaround later. Let's use <code>-Wl,-z,pack-relative-relocs<\/code> for now and see how it goes.<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test \nSegmentation fault<\/code><\/pre>\n<p>Oops. Well, that actually didn't happen when I was first testing, but for the purpose of this post, I didn't want to touch on this topic before it was strictly necessary. Because we're now running on a system that does support the packed relocations, by the time our initialization code is reached, the relocations have already been applied, and we're applying them again. That adds the base address a second time to every relocated address, which leads to accesses to unmapped memory.<\/p>\n<p>But how can we know whether relocations were applied? Well, conveniently, the address of a function, from within that function, doesn't need a relative relocation to be known. That's one half. The other half requires &quot;something&quot; that uses a relative relocation to know that same address. 
We insert this before <code>real_init<\/code>, but after its forward declaration:<\/p>\n<pre><code class=\"language-C\">void (*__real_init)() = real_init;<\/code><\/pre>\n<p>Because it's a global variable that points to the address of the function, it requires a relocation. And because the function is static and defined in the same compilation unit, it needs a relative relocation, not one that would require symbol resolution.<\/p>\n<p>Now we can add this at the beginning of <code>real_init<\/code>:<\/p>\n<pre><code class=\"language-C\">  \/\/ Don&#039;t apply relocations when the dynamic loader has applied them already.\n  if (__real_init == real_init) {\n    return;\n  }<\/code><\/pre>\n<p>And we're done. This works:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test \nHello, world<\/code><\/pre>\n<p>Unfortunately, we're back to square one on an older system:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test \n.\/relr-test: \/lib\/x86_64-linux-gnu\/libc.so.6: version `GLIBC_ABI_DT_RELR&#039; not found (required by .\/relr-test)<\/code><\/pre>\n<h2>Hacking the ELF format, again<\/h2>\n<p>And here we go again, having to post-process a binary. So what do we need this time around? Well, starting from a binary linked with <code>--pack-dyn-relocs=relr<\/code>, we need to avoid the &quot;DT_RELR without GLIBC_ABI_DT_RELR&quot; check. If we change the PT_DYNAMIC segment so that it doesn't contain DT_RELR-related tags, the error will be avoided. Sadly, that means we'll always apply the relocations ourselves, even on systems that could have done it for us, but so be it.<\/p>\n<p>How do we do that? Open the file, find the PT_DYNAMIC segment, scan it, overwrite a few tags with different values, and done. Damn, that's much less work than everything elfhack was doing. I will spare you the code required to do that. Heck, that can trivially be done in a hex editor. Hey, you know what? 
That would actually be less stuff to write here than ELF parsing code, and would still allow you to follow along at home.<\/p>\n<p>Let's start from that binary we built earlier with <code>--pack-dyn-relocs=relr<\/code>.<\/p>\n<pre><code class=\"language-shell\">$ objcopy --dump-section .dynamic=dyn relr-test<\/code><\/pre>\n<p>We now have a <code>dyn<\/code> file with the contents of the PT_DYNAMIC segment.<\/p>\n<p>In that segment, each block of 16 bytes (assuming a 64-bit system) stores an 8-byte tag and an 8-byte value. We want to change the <code>DT_RELR<\/code>, <code>DT_RELRSZ<\/code> and <code>DT_RELRENT<\/code> tags. Their hex values are, respectively, 0x24, 0x23 and 0x25.<\/p>\n<pre><code class=\"language-shell\">$ xxd dyn | grep 2[345]00\n00000060: 2400 0000 0000 0000 6804 0000 0000 0000  $.......h.......\n00000070: 2300 0000 0000 0000 1000 0000 0000 0000  #...............\n00000080: 2500 0000 0000 0000 0800 0000 0000 0000  %...............<\/code><\/pre>\n<p>(we got a bit lucky here: the pattern doesn't match anywhere other than in the tags)<\/p>\n<p>Let's set an extra, arbitrary, high-ish bit. Since the tags are stored little-endian, turning the fourth byte into 0x80 ORs 0x80000000 into each tag.<\/p>\n<pre><code class=\"language-shell\">$ xxd dyn | sed -n &#039;\/: 2[345]00\/s\/ 0000\/ 0080\/p&#039;\n00000060: 2400 0080 0000 0000 6804 0000 0000 0000  $.......h.......\n00000070: 2300 0080 0000 0000 1000 0000 0000 0000  #...............\n00000080: 2500 0080 0000 0000 0800 0000 0000 0000  %...............<\/code><\/pre>\n<p>That went well; let's do it for real.<\/p>\n<pre><code class=\"language-shell\">$ xxd dyn | sed &#039;\/: 2[345]00\/s\/ 0000\/ 0080\/&#039; | xxd -r &gt; dyn.new\n$ objcopy --update-section .dynamic=dyn.new relr-test<\/code><\/pre>\n<p>Let me tell you I'm glad we're in 2023, because these objcopy options we just used didn't exist 12+ years ago.<\/p>\n<p>So, how did it go?<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test \nSegmentation fault<\/code><\/pre>\n<p>Uh oh. 
Well duh, we didn't change the code that applies the relocations, so it can't find the packed relocation section.<\/p>\n<p>Let's edit the loop to use this:<\/p>\n<pre><code class=\"language-C\">    if (dyn-&gt;d_tag == (DT_RELR | 0x80000000)) {\n      relr = dyn-&gt;d_un.d_ptr;\n    }\n    if (dyn-&gt;d_tag == (DT_RELRSZ | 0x80000000)) {\n      size = dyn-&gt;d_un.d_val;\n    }<\/code><\/pre>\n<p>And start over:<\/p>\n<pre><code class=\"language-shell\">$ clang -fuse-ld=lld -Wl,--pack-dyn-relocs=relr,--entry=my_start,-z,norelro -o relr-test relr-test.c\n$ objcopy --dump-section .dynamic=dyn relr-test\n$ xxd dyn | sed &#039;\/: 2[345]00\/s\/ 0000\/ 0080\/&#039; | xxd -r &gt; dyn.new\n$ objcopy --update-section .dynamic=dyn.new relr-test\n$ .\/relr-test\nHello, world<\/code><\/pre>\n<p>Copy it over to the newer system, and try:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test\nHello, world<\/code><\/pre>\n<p><a href=\"https:\/\/youtu.be\/watch?v=wUS5l-MU2Hs\">Flawless victory<\/a>. We now have a binary that works on both old and new systems, using packed relocations created by the linker, with only minimal post-processing of the binary (and we no longer need that <code>if (__real_init == real_init)<\/code> check).<\/p>\n<h2>Generalizing a little<\/h2>\n<p>Okay, while we're here: we'd rather use <code>-z pack-relative-relocs<\/code>, because it works across more linkers. That means we need to get rid of the <code>GLIBC_ABI_DT_RELR<\/code> symbol version dependency it adds, so that the output is more or less equivalent to what <code>--pack-dyn-relocs=relr<\/code> would produce.<\/p>\n<pre><code class=\"language-shell\">$ clang -fuse-ld=lld -Wl,-z,pack-relative-relocs,--entry=my_start,-z,norelro -o relr-test relr-test.c<\/code><\/pre>\n<p>You know what, we might as well learn new things. 
Objcopy is nice, but as I started writing this section, I figured that doing this part in the same style as above was going to be annoying.<\/p>\n<p>Have you heard of <a href=\"https:\/\/jemarch.net\/poke.html\">GNU poke<\/a>? I saw a <a href=\"https:\/\/fosdem.org\/2023\/schedule\/event\/bintools_poke\/\">presentation about it at FOSDEM 2023<\/a> and hadn't had the occasion to try it yet; I guess today is the day. We'll be using GNU poke 3.2 (the latest version as of writing).<\/p>\n<p>Of course, that version doesn't contain the necessary bits. But this is Free Software, right? After <a href=\"https:\/\/git.savannah.gnu.org\/cgit\/poke\/poke-elf.git\/commit\/?id=7b5361f32f7366ce8cd403b180a67af68f5bc80e\">a<\/a> <a href=\"https:\/\/git.savannah.gnu.org\/cgit\/poke\/poke-elf.git\/commit\/?id=e068f0762be79322cfec4b0edd877205a104701b\">few<\/a> <a href=\"https:\/\/git.savannah.gnu.org\/cgit\/poke\/poke-elf.git\/commit\/?id=28291722adb0aa40c198fc9d78ceb6ff01fabe49\">patches<\/a>, we're all set.<\/p>\n<pre><code class=\"language-shell\">$ git clone https:\/\/git.savannah.gnu.org\/git\/poke\/poke-elf\n$ POKE_LOAD_PATH=poke-elf poke relr-test\n(poke) load elf\n(poke) var elf = Elf64_File @ 0#B<\/code><\/pre>\n<p>Let's get the section containing the symbol version information. It starts with a Verneed header.<\/p>\n<pre><code class=\"language-shell\">(poke) var section = elf.get_sections_by_type(ELF_SHT_GNU_VERNEED)[0]\n(poke) var verneed = Elf_Verneed @ section.sh_offset\n(poke) verneed\nElf_Verneed {vn_version=1UH,vn_cnt=2UH,vn_file=110U,vn_aux=16U,vn_next=0U}<\/code><\/pre>\n<p><code>vn_file<\/code> identifies the library file expected to contain those <code>vn_cnt<\/code> versions. Let's check that this is about the libc. 
The section's <code>sh_link<\/code> will tell us which entry of the section header table (<code>shdr<\/code>) corresponds to the string table that <code>vn_file<\/code> points into.<\/p>\n<pre><code class=\"language-shell\">(poke) var strtab = elf.shdr[section.sh_link].sh_offset\n(poke) string @ strtab + verneed.vn_file#B\n&quot;libc.so.6&quot;<\/code><\/pre>\n<p>Bingo. Now let's scan the two (per <code>vn_cnt<\/code>) Vernaux entries that the Verneed header points to via <code>vn_aux<\/code>. The first one:<\/p>\n<pre><code class=\"language-shell\">(poke) var off = section.sh_offset + verneed.vn_aux#B\n(poke) var aux = Elf_Vernaux @ off\n(poke) aux\nElf_Vernaux {vna_hash=157882997U,vna_flags=0UH,vna_other=2UH,vna_name=120U,vna_next=16U}\n(poke) string @ strtab + aux.vna_name#B\n&quot;GLIBC_2.2.5&quot;<\/code><\/pre>\n<p>And the second one, which <code>vna_next<\/code> points to:<\/p>\n<pre><code class=\"language-shell\">(poke) var off = off + aux.vna_next#B\n(poke) var aux2 = Elf_Vernaux @ off\n(poke) aux2\nElf_Vernaux {vna_hash=16584258U,vna_flags=0UH,vna_other=3UH,vna_name=132U,vna_next=0U}\n(poke) string @ strtab + aux2.vna_name#B\n&quot;GLIBC_ABI_DT_RELR&quot;<\/code><\/pre>\n<p>This is it: the symbol version we want to get rid of. We could proceed by adjusting <code>vna_next<\/code> in the first entry and reducing <code>vn_cnt<\/code> in the header, but thinking ahead to automating this for binaries that may contain more than two symbol versions from more than one dependency, it's simpler to pretend this version is a repeat of the previous one. 
So we copy all its fields except <code>vna_next<\/code>.<\/p>\n<pre><code class=\"language-shell\">(poke) aux2.vna_hash = aux.vna_hash\n(poke) aux2.vna_flags = aux.vna_flags\n(poke) aux2.vna_other = aux.vna_other\n(poke) aux2.vna_name = aux.vna_name<\/code><\/pre>\n<p>We could stop here and go back to the <code>objcopy<\/code>\/<code>xxd<\/code> way of adjusting the PT_DYNAMIC segment, but while we're in poke, it can't hurt to try doing the adjustment with it.<\/p>\n<pre><code class=\"language-shell\">(poke) var dyn = elf.get_sections_by_type(ELF_SHT_DYNAMIC)[0]\n(poke) var dyn = Elf64_Dyn[dyn.sh_size \/ dyn.sh_entsize] @ dyn.sh_offset\n(poke) for (d in dyn) if (d.d_tag in [ELF_DT_RELR,ELF_DT_RELRSZ,ELF_DT_RELRENT]) d.d_tag |= 0x80000000L\n&lt;stdin&gt;:1:20: error: invalid operand in expression\n&lt;stdin&gt;:1:20: error: expected uint&lt;32&gt;, got Elf64_Sxword<\/code><\/pre>\n<p>Gah, that seemed straightforward. It turns out <code>in<\/code> is not lenient about integer types. Let's just use the plain values.<\/p>\n<pre><code class=\"language-shell\">(poke) for (d in dyn) if (d.d_tag in [0x23L,0x24L,0x25L]) d.d_tag |= 0x80000000L\nunhandled constraint violation exception\nfailed expression\n  elf_config.check_enum (&quot;dynamic-tag-typ                       elf_mach, d_tag)\nin field Elf64_Dyn.d_tag<\/code><\/pre>\n<p>This time, it's because poke is actually validating the tag values, which is both a blessing and a curse. 
It can keep you from shooting yourself in the foot (after all, we're setting a non-existent value), but it can also get in the way of getting things done (before I actually got here, many of the <code>d_tag<\/code> values in the binary straight out of the linker weren't even supported).<\/p>\n<p>Let's tell poke's validator about the values we're about to set:<\/p>\n<pre><code class=\"language-shell\">(poke) for (n in [0x23L,0x24L,0x25L]) elf_config.add_enum :class &quot;dynamic-tag-types&quot; :entries [Elf_Config_UInt { value = 0x80000000L | n }]\n(poke) for (d in dyn) if (d.d_tag in [0x23L,0x24L,0x25L]) d.d_tag |= 0x80000000L\n(poke) .exit\n$ .\/relr-test\nHello, world<\/code><\/pre>\n<p>And it works on the newer system too!<\/p>\n<h2>Repeating for a shared library<\/h2>\n<p>Let's set up a new testcase, using a shared library:<\/p>\n<ul>\n<li>Take our previous testcase, and rename the <code>main<\/code> function to <code>relr_test<\/code>.<\/li>\n<li>Compile it with <code>clang -fuse-ld=lld -Wl,--pack-dyn-relocs=relr,--entry=my_start,-z,norelro -fPIC -shared -o librelr-test.so<\/code><\/li>\n<li>Create a new file with the following content and compile it:<\/li>\n<\/ul>\n<pre><code class=\"language-C\">\/\/ Compile with: clang -o relr-test -L. -lrelr-test -Wl,-rpath,&#039;$ORIGIN&#039;\nextern int relr_test(void);\n\nint main(void) {\n  return relr_test();\n}<\/code><\/pre>\n<ul>\n<li>Apply the same GNU poke commands as before to the <code>librelr-test.so<\/code> file.<\/li>\n<\/ul>\n<p>So now, it should work, right?<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test\nSegmentation fault<\/code><\/pre>\n<p>Oops. 
What's going on?<\/p>\n<pre><code class=\"language-shell\">$ gdb -q -ex run -ex backtrace -ex detach -ex quit .\/relr-test\nReading symbols from .\/relr-test...\n(No debugging symbols found in .\/relr-test)\nStarting program: \/relr-test \nBFD: \/librelr-test.so: unknown type [0x13] section `.relr.dyn&#039;\nwarning: `\/librelr-test.so&#039;: Shared library architecture unknown is not compatible with target architecture i386:x86-64.\n\nProgram received signal SIGSEGV, Segmentation fault.\n0x00000000000016c0 in ?? ()\n#0  0x00000000000016c0 in ?? ()\n#1  0x00007ffff7fe1fe2 in call_init (l=&lt;optimized out&gt;, argc=argc@entry=1, argv=argv@entry=0x7fffffffdfc8, \n    env=env@entry=0x7fffffffdfd8) at dl-init.c:72\n#2  0x00007ffff7fe20e9 in call_init (env=0x7fffffffdfd8, argv=0x7fffffffdfc8, argc=1, l=&lt;optimized out&gt;) at dl-init.c:30\n#3  _dl_init (main_map=0x7ffff7ffe180, argc=1, argv=0x7fffffffdfc8, env=0x7fffffffdfd8) at dl-init.c:119\n#4  0x00007ffff7fd30ca in _dl_start_user () from \/lib64\/ld-linux-x86-64.so.2\n#5  0x0000000000000001 in ?? ()\n#6  0x00007fffffffe236 in ?? ()\n#7  0x0000000000000000 in ?? ()\nDetaching from program: \/relr-test, process 3104868\n[Inferior 1 (process 3104868) detached]<\/code><\/pre>\n<p>Side note: it looks like we'll also need to change some section types if we want to keep tools like gdb happy.<\/p>\n<p>So, this is crashing on what looks like a jump\/call to an address that has not been relocated (judging by how low it is). 
Let's pull the libc6 source and see what's around <a href=\"https:\/\/sourceware.org\/git\/?p=glibc.git;a=blob;f=elf\/dl-init.c;h=55d528c7a5fd88baed99184fb47e043b0cb6b232;hb=9ea3686266dca3f004ba874745a4087a89682617#l72\">dl-init.c:72<\/a>:<\/p>\n<pre><code class=\"language-C\">addrs = (ElfW(Addr) *) (init_array-&gt;d_un.d_ptr + l-&gt;l_addr);\nfor (j = 0; j &lt; jm; ++j)\n  ((init_t) addrs[j]) (argc, argv, env);<\/code><\/pre>\n<p>This is where it goes through <code>.init_array<\/code> and calls each of the functions in the table. So, <code>.init_array<\/code> is not relocated, which means our initialization code hasn't run. But why? Well, that's because the ELF entry point is not used for shared libraries. So, we need to execute our code some other way. What runs on shared library loading? Well, functions from the <code>.init_array<\/code> table... but they need to be relocated: we've got ourselves a chicken-and-egg problem. Does something else run before that? It turns out something does: right before that dl-init.c:72 code, there is this:<\/p>\n<pre><code class=\"language-C\">if (l-&gt;l_info[DT_INIT] != NULL)\n  DL_CALL_DT_INIT(l, l-&gt;l_addr + l-&gt;l_info[DT_INIT]-&gt;d_un.d_ptr, argc, argv, env);<\/code><\/pre>\n<p>And the good news here is that it doesn't require <code>DT_INIT<\/code> to be relocated: <code>l_addr<\/code> is the base address the loader used for the library, so the loader relocates the address itself. Thank goodness.<\/p>\n<p>So, how do we get a function into <code>DT_INIT<\/code>? Well... 
we already have one:<\/p>\n<pre><code class=\"language-shell\">$ readelf -d librelr-test.so | grep &#039;(INIT)&#039;\n 0x000000000000000c (INIT)               0x18a8\n$ readelf -sW librelr-test.so | grep 18a8\n     7: 00000000000018a8     0 FUNC    GLOBAL DEFAULT   13 _init\n    20: 00000000000018a8     0 FUNC    GLOBAL DEFAULT   13 _init<\/code><\/pre>\n<p>So we want to wrap it similarly to what we did for <code>_start<\/code>, adding the following to the code of the library:<\/p>\n<pre><code class=\"language-C\">extern void _init();\n\nvoid my_init() {\n  real_init();\n  _init();\n}<\/code><\/pre>\n<p>And we replace <code>--entry=my_start<\/code> with <code>--init=my_init<\/code> when relinking <code>librelr-test.so<\/code> (without forgetting the whole GNU poke dance), and it finally works:<\/p>\n<pre><code class=\"language-shell\">$ .\/relr-test\nHello, world<\/code><\/pre>\n<p>(and obviously, it works on the newer system too)<\/p>\n<h2>But does this work for Firefox?<\/h2>\n<p>We now have a manual procedure that gets us most of what we want and works with two tiny testcases. But does it scale to Firefox? Before implementing the whole thing, let's test a little more. First, let's build two <code>.o<\/code> files based on our code so far, without the <code>relr_test<\/code> function: one with the <code>my_init<\/code> wrapper, the other with the <code>my_start<\/code> wrapper. 
We'll call the former <code>relr-test-lib.o<\/code> and the latter <code>relr-test-bin.o<\/code> (compile with <code>clang -c -fPIC -O2<\/code>).<\/p>\n<p>Then, let's add the following to the <code>.mozconfig<\/code> we use to build Firefox:<\/p>\n<pre><code class=\"language-shell\">export MOZ_PROGRAM_LDFLAGS=&quot;-Wl,-z,pack-relative-relocs,--entry=my_start,-z,norelro \/path\/to\/relr-test-bin.o&quot;\nmk_add_options &#039;export EXTRA_DSO_LDOPTS=&quot;-Wl,-z,pack-relative-relocs,--init=my_init,-z,norelro \/path\/to\/relr-test-lib.o&quot;&#039;<\/code><\/pre>\n<p>This leverages some arcane Firefox build system knowledge to use the flags we need and inject our code in a minimally intrusive way. However, because of how the Firefox build system works, it also means some Rust build scripts will be built with these flags (unfortunately). In turn, this means those build scripts won't run on a system without packed relocation support in glibc, so we need to build Firefox on the newer system.<\/p>\n<p>And because we're on the newer system, running this freshly built Firefox will just work, because the init code is skipped and relocations are applied by the dynamic loader. Things will only get spicy when we start applying our hack to make our initialization code handle the relocations itself. Because Firefox is bigger than our previous testcases, scanning through to find the right versioned symbol to remove is going to be cumbersome, so we'll just skip that part. In fact, we can just use our first approach with objcopy, because it's smaller. 
After a successful build, let's first do that for libxul.so, which is the largest binary in Firefox.<\/p>\n<pre><code class=\"language-shell\">$ objcopy --dump-section .dynamic=dyn obj-x86_64-pc-linux-gnu\/dist\/bin\/libxul.so\n$ xxd dyn | sed &#039;\/: 2[345]00\/s\/ 0000\/ 0080\/&#039; | xxd -r &gt; dyn.new\n$ objcopy --update-section .dynamic=dyn.new obj-x86_64-pc-linux-gnu\/dist\/bin\/libxul.so\n$ .\/mach run\n 0:00.15 \/path\/to\/obj-x86_64-pc-linux-gnu\/dist\/bin\/firefox -no-remote -profile \/path\/to\/obj-x86_64-pc-linux-gnu\/tmp\/profile-default\n$ echo $?\n245<\/code><\/pre>\n<p>Aaaand... it doesn't start. Let's try again in a debugger.<\/p>\n<pre><code class=\"language-shell\">$ .\/mach run --debug\n&lt;snip&gt;\n(gdb) run\n&lt;snip&gt;\nThread 1 &quot;firefox&quot; received signal SIGSEGV, Segmentation fault.\nreal_init () at \/tmp\/relr-test.c:55\n55          if ((*entry &amp; 1) == 0) {<\/code><\/pre>\n<p>It's crashing while applying the relocations?! But why?<\/p>\n<pre><code class=\"language-shell\">(gdb) print entry\n$1 = (Elf64_Addr *) 0x303c8<\/code><\/pre>\n<p>That's way too small to be a valid address. What's going on? 
Let's look at where this value is stored and where it comes from.<\/p>\n<pre><code class=\"language-shell\">(gdb) print &amp;entry\nAddress requested for identifier &quot;entry&quot; which is in register $rax<\/code><\/pre>\n<p>So where does the value of the <code>rax<\/code> register come from?<\/p>\n<pre><code class=\"language-shell\">(gdb) set pagination off\n(gdb) disassemble\/m\n&lt;snip&gt;\n41          if (dyn-&gt;d_tag == (DT_RELR | 0x80000000)) {\n42            relr = dyn-&gt;d_un.d_ptr;\n   0x00007ffff2289f47 &lt;+71&gt;:    mov    (%rcx),%rax\n&lt;snip&gt;\n52        start = (ElfW(Addr) *)(elf_header + relr);\n   0x00007ffff2289f54 &lt;+84&gt;:    add    0x681185(%rip),%rax        # 0x7ffff290b0e0\n&lt;snip&gt;<\/code><\/pre>\n<p>So <code>rax<\/code> starts with the value from <code>DT_RELR<\/code>, and the value stored at address 0x7ffff290b0e0 is added to it. What's at that address?<\/p>\n<pre><code class=\"language-shell\">(gdb) print *(void**)0x7ffff290b0e0\n$1 = (void *) 0x0<\/code><\/pre>\n<p>Well, no surprise here. Wanna bet it's another chicken-and-egg problem?<\/p>\n<pre><code class=\"language-shell\">(gdb) info files\n&lt;snip&gt;\n        0x00007ffff28eaed8 - 0x00007ffff290b0e8 is .got in \/path\/to\/obj-x86_64-pc-linux-gnu\/dist\/bin\/libxul.so\n&lt;snip&gt;<\/code><\/pre>\n<p>It's in the Global Offset Table, which is typically something that gets relocated. It smells like there's a packed relocation for this, which would confirm our new chicken-and-egg problem. First, we find the non-relocated virtual address of the <code>.got<\/code> section in libxul.so.<\/p>\n<pre><code class=\"language-shell\">$ readelf -SW obj-x86_64-pc-linux-gnu\/dist\/bin\/libxul.so | grep &#039;.got &#039;\n  [28] .got              PROGBITS        000000000ab7aed8 ab78ed8 020210 00  WA  0   0  8<\/code><\/pre>\n<p>So that 0x000000000ab7aed8 is loaded at 0x00007ffff28eaed8. 
Then we check whether there's a relocation at the non-relocated virtual address corresponding to 0x7ffff290b0e0.<\/p>\n<pre><code class=\"language-shell\">$ readelf -r obj-x86_64-pc-linux-gnu\/dist\/bin\/libxul.so | grep -e Relocation -e $(printf %x $((0x7ffff290b0e0 - 0x00007ffff28eaed8 + 0x000000000ab7aed8)))\nRelocation section &#039;.rela.dyn&#039; at offset 0x28028 contains 1404 entries:\nRelocation section &#039;.relr.dyn&#039; at offset 0x303c8 contains 13406 entries:\n000000000ab9b0e0\nRelocation section &#039;.rela.plt&#039; at offset 0x4a6b8 contains 2635 entries:<\/code><\/pre>\n<p>And there is, and it's a <code>RELR<\/code> one, one of those we're supposed to apply ourselves... we're kind of doomed, aren't we? But how come this wasn't a problem with librelr-test.so? Let's look at the corresponding code there:<\/p>\n<pre><code class=\"language-shell\">$ objdump -d librelr-test.so\n&lt;snip&gt;\n    11e1:       48 8b 05 30 21 00 00    mov    0x2130(%rip),%rax        # 3318 &lt;__executable_start@Base&gt;\n&lt;snip&gt;\n$ readelf -SW librelr-test.so\n&lt;snip&gt;\n  [20] .got              PROGBITS        0000000000003308 002308 000040 08  WA  0   0  8\n&lt;snip&gt;\n$ readelf -r librelr-test.so | grep -e Relocation -e 3318\nRelocation section &#039;.rela.dyn&#039; at offset 0x450 contains 7 entries:\n000000003318  000300000006 R_X86_64_GLOB_DAT 0000000000000000 __executable_start + 0\nRelocation section &#039;.rela.plt&#039; at offset 0x4f8 contains 1 entry:\nRelocation section &#039;.relr.dyn&#039; at offset 0x510 contains 3 entries:<\/code><\/pre>\n<p>We had a relocation through symbol resolution, which the dynamic loader applies before calling our initialization code. That's what saved us, but all things considered, that's not exactly great either.<\/p>\n<p>How do we avoid this? Well, let's take a step back and consider why the GOT is being used. 
Our code is just using the address of <code>__executable_start<\/code>, and the compiler doesn't know where it is (the symbol is extern). Since it doesn't know where the symbol is, or whether it will end up in the same binary, and because we are building Position Independent Code, it goes through the GOT, where a relocation will put the right address. At link time, when the linker knows the symbol is in the same binary, it ends up using a relative relocation, which causes our problem.<\/p>\n<p>So, how do we avoid using the GOT? By making the compiler aware that the symbol is eventually going to be in the same binary, which we can do by giving it hidden visibility.<\/p>\n<p>Replacing<\/p>\n<pre><code class=\"language-C\">extern ElfW(Ehdr) __executable_start;<\/code><\/pre>\n<p>with<\/p>\n<pre><code class=\"language-C\">extern __attribute__((visibility(&quot;hidden&quot;))) ElfW(Ehdr) __executable_start;<\/code><\/pre>\n<p>will do that for us. And after rebuilding and re-hacking, our Firefox works, yay!<\/p>\n<h2>Let's try other binaries<\/h2>\n<p>Let's now try with the main Firefox binary.<\/p>\n<pre><code class=\"language-shell\">$ objcopy --dump-section .dynamic=dyn obj-x86_64-pc-linux-gnu\/dist\/bin\/firefox\n$ xxd dyn | sed &#039;\/: 2[345]00\/s\/ 0000\/ 0080\/&#039; | xxd -r &gt; dyn.new\n$ objcopy --update-section .dynamic=dyn.new obj-x86_64-pc-linux-gnu\/dist\/bin\/firefox\n$ .\/mach run\n 0:00.15 \/path\/to\/obj-x86_64-pc-linux-gnu\/dist\/bin\/firefox -no-remote -profile \/path\/to\/obj-x86_64-pc-linux-gnu\/tmp\/profile-default\n$ echo $?\n245<\/code><\/pre>\n<p>We crashed again. Come on! What is it this time?<\/p>\n<pre><code class=\"language-shell\">$ .\/mach run --debug\n&lt;snip&gt;\n(gdb) run\n&lt;snip&gt;\nProgram received signal SIGSEGV, Segmentation fault.\n0x0000000000032370 in ?? ()\n(gdb) bt\n#0  0x0000000000032370 in ?? 
()\n#1  0x00005555555977be in phc_init (aMallocTable=0x7fffffffdb38, aBridge=0x555555626778 &lt;gReplaceMallocBridge&gt;)\n    at \/path\/to\/memory\/replace\/phc\/PHC.cpp:1700\n#2  0x00005555555817c5 in init () at \/path\/to\/memory\/build\/mozjemalloc.cpp:5213\n#3  0x000055555558196c in Allocator&lt;ReplaceMallocBase&gt;::malloc (arg1=72704) at \/path\/to\/memory\/build\/malloc_decls.h:51\n#4  malloc (arg1=72704) at \/path\/to\/memory\/build\/malloc_decls.h:51\n#5  0x00007ffff7ca57ba in (anonymous namespace)::pool::pool (this=0x7ffff7e162c0 &lt;(anonymous namespace)::emergency_pool&gt;)\n    at ..\/..\/..\/..\/src\/libstdc++-v3\/libsupc++\/eh_alloc.cc:123\n#6  __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1)\n    at ..\/..\/..\/..\/src\/libstdc++-v3\/libsupc++\/eh_alloc.cc:262\n#7  _GLOBAL__sub_I_eh_alloc.cc(void) () at ..\/..\/..\/..\/src\/libstdc++-v3\/libsupc++\/eh_alloc.cc:338\n#8  0x00007ffff7fcfabe in call_init (env=0x7fffffffdd00, argv=0x7fffffffdcd8, argc=4, l=&lt;optimized out&gt;) at .\/elf\/dl-init.c:70\n#9  call_init (l=&lt;optimized out&gt;, argc=4, argv=0x7fffffffdcd8, env=0x7fffffffdd00) at .\/elf\/dl-init.c:26\n#10 0x00007ffff7fcfba4 in _dl_init (main_map=0x7ffff7ffe2e0, argc=4, argv=0x7fffffffdcd8, env=0x7fffffffdd00) at .\/elf\/dl-init.c:117\n#11 0x00007ffff7fe5a60 in _dl_start_user () from \/lib64\/ld-linux-x86-64.so.2\n#12 0x0000000000000004 in ?? ()\n#13 0x00007fffffffdfae in ?? ()\n#14 0x00007fffffffdfe2 in ?? ()\n#15 0x00007fffffffdfed in ?? ()\n#16 0x00007fffffffdff6 in ?? ()\n#17 0x0000000000000000 in ?? ()\n(gdb) info symbol 0x00007ffff7ca57ba\n_GLOBAL__sub_I_eh_alloc.cc + 58 in section .text of \/lib\/x86_64-linux-gnu\/libstdc++.so.6<\/code><\/pre>\n<p>Oh boy! 
So here, what's going on is that the libstdc++ initializer is called before Firefox's. That initializer calls malloc, which is provided by the Firefox binary, but because Firefox's initializer hasn't run yet, the code in its allocator that depends on relative relocations fails...<\/p>\n<p>Let's... just work around this by disabling the feature of the Firefox allocator that requires those relocations:<\/p>\n<pre><code class=\"language-shell\">ac_add_options --disable-replace-malloc<\/code><\/pre>\n<p>Rebuild, re-hack, and... <a href=\"https:\/\/youtu.be\/ZmInkxbvlCs?t=108\">Victory is mine!<\/a><\/p>\n<h2>Getting this into production<\/h2>\n<p>So far, we've looked at how we can achieve the same result as elfhack with a simpler and more reliable strategy, one that will allow us to consistently use lld across platforms and build types. Now that the approach has been validated, we can proceed with writing the actual code and hooking it into the Firefox build system. Our strategy here will be for our new tool to act as the linker. It will take all the arguments the compiler passes to it, and will itself call the real linker with all the required extra arguments, including the object file containing the code to apply the relocations.<\/p>\n<p>Of course, I also ran into a few more grievances. For example, GNU ld doesn't define the <code>__executable_start<\/code> symbol when linking shared libraries, unlike lld. Thankfully, it defines <code>__ehdr_start<\/code>, with the same meaning (and so does lld). There are also some details I left out about the <code>_init<\/code> function, which normally takes 3 arguments; the actual solution will have to deal with those. 
It will also have to deal with &quot;Relocation Read-Only&quot; (relro), but for that, we can just reuse the code from elfhack.<\/p>\n<p>The code already exists, and <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1839740\">is up for review<\/a> (this post was written in large part to give reviewers some extra background). The code handles desktop Linux for now (Android support will come later ; it will require a couple adjustments), and is limited to shared libraries (until the allocator is changed to avoid using relative relocations). It's also significantly smaller than elfhack.<\/p>\n<pre><code class=\"language-shell\">$ loc build\/unix\/elfhack\/elf*\n--------------------------------------------------------------------------------\n Language             Files        Lines        Blank      Comment         Code\n--------------------------------------------------------------------------------\n C++                      2         2393          230          302         1861\n C\/C++ Header             1          701          120           17          564\n--------------------------------------------------------------------------------\n Total                    3         3094          350          319         2425\n--------------------------------------------------------------------------------\n$ loc build\/unix\/elfhack\/relr* \n--------------------------------------------------------------------------------\n Language             Files        Lines        Blank      Comment         Code\n--------------------------------------------------------------------------------\n C++                      1          443           32           62          349\n C\/C++ Header             1           25            5            3           17\n--------------------------------------------------------------------------------\n Total                    2          468           37           65          
366\n--------------------------------------------------------------------------------<\/code><\/pre>\n<p>(this excludes the code to apply relocations, which both tools share)<\/p>\n<p>This is the beginning of the end for elfhack. Once &quot;relrhack&quot; is enabled in its place, elfhack will be left around for downstream Firefox builds on systems with older linkers that don't support the necessary flags. It will eventually be removed when support for those systems is dropped, in a few years. Further down the line, we'll be able to retire both tools, as support for RELR relocations becomes ubiquitous.<\/p>\n<p>As anticipated, this was a long post. Thank you for sticking with it to the end.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(I haven&#8217;t posted a lot in the past couple years, except for git-cinnabar announcements. This is going to be a long one, hold tight) This is quite the cryptic title, isn&#8217;t it? What is this all about? ELF (Executable and Linkable Format) is a file format used for binary files (e.g. 
executables, shared libraries, object [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[23],"class_list":["post-4297","post","type-post","status-publish","format-standard","hentry","category-planet-mozilla","tag-en"],"_links":{"self":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4297","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4297"}],"version-history":[{"count":17,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4297\/revisions"}],"predecessor-version":[{"id":4344,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4297\/revisions\/4344"}],"wp:attachment":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4297"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}