Archive for February, 2024

When undefined behavior causes a nonsensical error (in Rust)

This all started when I looked at whether it would be possible to build Firefox with Pointer Authentication Code for arm64 macOS. In case you're curious, the quick answer is no, because Apple essentially hasn't upstreamed the final ABI for it yet, only Xcode clang can produce it, and obviously Rust can't.

Anyways, the Rust compiler did recently add the arm64e-apple-darwin target (which, as mentioned above, turns out to be useless for now), albeit without a prebuilt libstd (so, requiring the use of the -Zbuild-std flag). And by recently, I mean in 1.76.0 (in beta as of writing).

So, after tricking the Firefox build system into accepting to build for that target, I ended up with a Firefox build that... crashed on startup, saying:

Hit MOZ_CRASH(unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed isize::MAX) at /builds/worker/fetches/rustc/lib/rustlib/src/rust/library/core/src/panicking.rs:155"

(MOZ_CRASH is what we get on explicit crashes, like MOZ_ASSERT in C++ code, or assert!() in Rust)

The caller of the crashing code was NS_InvokeByIndex, so at this point, I was thinking XPConnect might need some adjustement for arm64e.

But that was a build I had produced through the Mozilla try server. So I did a local non-optimized debug build to see what's up, which crashed with a different message:

Hit MOZ_CRASH(slice::get_unchecked requires that the index is within the slice) at /Users/glandium/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:228

This comes from this code in rust libstd:

    unsafe fn get_unchecked(self, slice: *const [T]) -> *const T {
        debug_assert_nounwind!(
            self < slice.len(),
            "slice::get_unchecked requires that the index is within the slice",
        );
        // SAFETY: the caller guarantees that `slice` is not dangling, so it
        // cannot be longer than `isize::MAX`. They also guarantee that
        // `self` is in bounds of `slice` so `self` cannot overflow an `isize`,
        // so the call to `add` is safe.
        unsafe {
            crate::hint::assert_unchecked(self < slice.len());
            slice.as_ptr().add(self)
        }
    }

(I'm pasting the whole thing because it will be important later)

We're hitting the debug_assert_nounwind.

The calling code looks like the following:

let end = atoms.get_unchecked(STATIC_ATOM_COUNT) as *const _;

And what the debug_assert_nounwind means is that STATIC_ATOM_COUNT is greater or equal to the slice size (spoiler alert: it is equal).

At that point, I started to suspect this might be a more general issue with the new Rust version, rather than something limited to arm64e. And I was kind of right? Mozilla automation did show crashes on all platforms when building with Rust beta (currently 1.76.0). But that was a different, and non-sensical crash:

Hit MOZ_CRASH(attempt to add with overflow) at servo/components/style/gecko_string_cache/mod.rs:77

But this time, it was in the same vicinity as the crash I was getting locally.

Since this was talking about an overflowing addition, I wrapped both terms in dbg!() to see the numbers and... the overflow disappeared but now I was getting a plain crash:

application crashed [@ <usize as core::slice::index::SliceIndex<[T]>>::get_unchecked]

(still from the same call to get_unchecked, at least)

The problem was fixed by essentially removing the entire code that was using get_unchecked. めでたしめでたし.

But this was too weird to leave it at that.

So what's going on?

Well, first is that despite there being a debug_assert, debug builds don't complain about the out-of-bounds use of get_unckecked. Only when using -Zbuild-std does it happen. I'm not sure whether that's intended, but I opened an issue about it to figure out.

Second, in the code I pasted from get_unchecked, the hint::assert_unchecked is new in 1.76.0 (well, it was intrinsics::assume in 1.76.0 and became hint::assert_unchecked in 1.77.0, but it wasn't there before). This is why our broken code didn't cause actual problems until now.

What about the addition overflow?

Well, this is where undefined behavior leads the optimizer to do what the user might perceive as weird things, but they actually make sense (as usual with these things involving undefined behavior). Let's start with a standalone version of the original code, simplifying the types used originally:

#![allow(non_upper_case_globals, non_snake_case, dead_code)]

#[inline]
fn static_atoms() -> &'static [[u32; 3]; STATIC_ATOM_COUNT] {
    unsafe {
        let addr = &gGkAtoms as *const _ as usize + kGkAtomsArrayOffset as usize;
        &*(addr as *const _)
    }
}

#[inline]
fn valid_static_atom_addr(addr: usize) -> bool {
    unsafe {
        let atoms = static_atoms();
        let start = atoms.as_ptr();
        let end = atoms.get_unchecked(STATIC_ATOM_COUNT) as *const _;
        let in_range = addr >= start as usize && addr < end as usize;
        let aligned = addr % 4 == 0;
        in_range && aligned
    }
}

fn main() {
    println!("{:?}", valid_static_atom_addr(0));
}

Stick this code in a newly created crate (with e.g. cargo new testcase), and run it:

$ cargo +nightly run -q
false

Nothing obviously bad happened. So what went wrong in Firefox? In my first local attempt, I had -Zbuild-std, so let's try that:

$ cargo +nightly run -q -Zbuild-std --target=x86_64-unknown-linux-gnu
thread 'main' panicked at /home/glandium/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:228:9:
slice::get_unchecked requires that the index is within the slice
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

There we go, we hit that get_unchecked error. But what went bad in Firefox if the reduced testcase doesn't crash without -Zbuild-std? Well, Firefox is always built with optimizations on by default, even for debug builds.

$ RUSTFLAGS=-O cargo +nightly run -q
thread 'main' panicked at src/main.rs:10:20:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Interestingly, though, changing the addition to

        let addr = dbg!(&gGkAtoms as *const _ as usize) + dbg!(kGkAtomsArrayOffset as usize);

doesn't "fix" it like it did with Firefox, but it shows:

[src/main.rs:10:20] &gGkAtoms as *const _ as usize = 94400145014784
[src/main.rs:10:59] kGkAtomsArrayOffset as usize = 61744
thread 'main' panicked at src/main.rs:10:20:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

which is even funnier, because you can see that adding those two numbers is definitely not causing an overflow.

Let's take a look at what LLVM is doing with this code across optimization passes, with the following command (on the initial code without dbg!(), and with a #[inline(never)] on valid_static_atom_addr):

RUSTFLAGS="-C debuginfo=0 -O -Cllvm-args=-print-changed=quiet" cargo +nightly run -q 

Here is what's most relevant to us. First, what the valid_static_atom_addr function looks like after inlining as_ptr into it:

*** IR Dump After InlinerPass on (_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E) ***
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc noundef zeroext i1 @_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E(i64 noundef %0) unnamed_addr #3 {
  %2 = call fastcc noundef align 4 dereferenceable(31212) ptr @_ZN8testcase12static_atoms17hde3e2dda1d3edc34E()
  call void @llvm.experimental.noalias.scope.decl(metadata !4)
  %3 = call fastcc noundef align 4 dereferenceable(12) ptr @"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$13get_unchecked17he5e8081ea9f9099dE"(ptr noalias noundef nonnull readonly align 4 %2, i64 noundef 2601, i64 noundef 2601)
  %4 = icmp eq ptr %2, null
  ret i1 %4 
}

At this point, we've already done some constant propagation, and we can see the call to get_unchecked is done with constants.

What comes next, after inlining both static_atoms and get_unchecked:

*** IR Dump After InlinerPass on (_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E) ***
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc noundef zeroext i1 @_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E(i64 noundef %0) unnamed_addr #2 {
  %2 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 ptrtoint (ptr @_ZN8testcase8gGkAtoms17h338a289876067f43E to i64), i64 61744)
  %3 = extractvalue { i64, i1 } %2, 1             
  br i1 %3, label %4, label %5, !prof !4

4:                                                ; preds = %1
  call void @_ZN4core9panicking5panic17hae453b53e597714dE(ptr noalias noundef nonnull readonly align 1 @str.0, i64 noundef 28, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @2) #9                      
  unreachable

5:                                                ; preds = %1
  %6 = extractvalue { i64, i1 } %2, 0
  %7 = inttoptr i64 %6 to ptr
  call void @llvm.experimental.noalias.scope.decl(metadata !5)
  unreachable

8:                                                ; No predecessors!
  %9 = icmp eq ptr %7, null
  ret i1 %9
}

The first basic block has two exits: 4 and 5, depending on how the add with overflow performed. Both of these basic blocks finish in... unreachable. The first one because it's the panic case for the overflow, and the second one because both values passed to get_unchecked are constants and equal, which the compiler has been hinted (with hint::assert_unchecked) that it's not possible. Thus, once get_unchecked is inlined, what's left is unreachable code. And because we're not rebuilding libstd, the debug_assert is not there before the unreachable annotation. Finally, the last basic block is now orphan.

Imagine you're an optimizer, and you want to optimize this code considering all its annotations. Well, you'll start by removing the orphan basic block. Then you see that the basic block 5 doesn't do anything, and doesn't have side effects, so you just remove it. Which means the branch leading to it can't happen. Basic block 4? There's a function call, so it would have to stay there, and so would the first basic block.

Guess what the Control-Flow Graph pass did? Just that:

*** IR Dump After SimplifyCFGPass on _ZN8testcase22valid_static_atom_addr17h778b64d644106c67E ***
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc noundef zeroext i1 @_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E(i64 noundef %0) unnamed_addr #2 {
  %2 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 ptrtoint (ptr @_ZN8testcase8gGkAtoms17h338a289876067f43E to i64), i64 61744)
  %3 = extractvalue { i64, i1 } %2, 1
  call void @llvm.assume(i1 %3)
  call void @_ZN4core9panicking5panic17hae453b53e597714dE(ptr noalias noundef nonnull readonly align 1 @str.0, i64 noundef 28, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @2) #9
  unreachable
}

Now, there's no point doing the addition at all, since we're not even looking at its result:

*** IR Dump After InstCombinePass on _ZN8testcase22valid_static_atom_addr17h778b64d644106c67E ***
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc noundef zeroext i1 @_ZN8testcase22valid_static_atom_addr17h778b64d644106c67E(i64 noundef %0) unnamed_addr #2 {
  call void @llvm.assume(i1 icmp uge (i64 ptrtoint (ptr @_ZN8testcase8gGkAtoms17h338a289876067f43E to i64), i64 -61744))
  call void @_ZN4core9panicking5panic17hae453b53e597714dE(ptr noalias noundef nonnull readonly align 1 @str.0, i64 noundef 28, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @2) #9
  unreachable
}

And this is how a hint that undefined behavior can't happen transformed get_unchecked(STATIC_ATOM_COUNT) into an addition overflow that never happened.

Obviously, this all doesn't happen with -Zbuild-std, because in that case the get_unchecked branch has a panic call that is still relevant.

$ RUSTFLAGS=-O cargo +nightly run -q -Zbuild-std --target=x86_64-unknown-linux-gnu
thread 'main' panicked at /home/glandium/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:228:9:
slice::get_unchecked requires that the index is within the slice
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

What about non-debug builds?

$ cargo +nightly run --release -q
Illegal instruction

In those builds, because there is no call to display a panic, the entire function ends up unreachable:

define internal fastcc noundef zeroext i1 @_ZN8testcase22valid_static_atom_addr17h9d1fc9abb5e1cc3aE(i64 noundef %0) unnamed_addr #4 {
  unreachable
} 

So thanks to the magic of hints and compiler optimization, we have code that invokes undefined behavior that

  • crashes when built with cargo build --release
  • works when built with cargo build
  • says there's an addition overflow when built with RUSTFLAGS=-O cargo build.

And none of those give a hint as to what the real problem is.

2024-02-01 10:38:09+0900

p.m.o | No Comments »