EDIT 1: I initially wrote that the compiler-builtins crate is used by alloc, but it’s just pinned in alloc. Actually, it’s used everywhere during the linking process, exposing symbols for the compiler to use as a fallback when the backend can’t replace trivial routines with assembly code.

Also fixed a dead-link to Leptos.

Thanks @tgross35 for the feedback 🙏!

Context

I recently got my first commit merged into rust-lang, co-authored with @tgross35. It was a hotfix for the compiler-builtins crate, which is used by the alloc crate. The hotfix targeted a bug that occurred on some platforms when rebuilding the toolchain libraries like alloc and std using the -Zbuild-std flag on nightly cargo in a non-optimized profile (e.g., dev).

I know! That sounds really specific 🙄 But the blast radius of that bug was huge—it basically rendered any rust code compiled in this way inoperable on these platforms 😳!

Any bitwise left shift would recurse indefinitely. You may think that’s nothing but a lot of arithmetic operations actually cause shifts.

At the time of writing, only WebAssembly (wasm32-unknown-unknown) and SPARC have been affected by this bug, but I suspect more platforms were impacted.

How I stumbled upon this bug?

I am working on making Leptos, a Rust web framework, capable of compiling its server-side as a WebAssembly Component so it can be served on wasmCloud. ✨

I will likely present my work in another blog post, but for now, let’s focus on this bug.

Basically, the framework allows us to host a server that serves a basic HTML document and streams content that takes time to load as they become ready, improving SEO and UX. (Learn more here)

The client is a thin JavaScript shim that loads a wasm32-unknown-unknown module in the browser. This module is actually responsible for the heavy lifting (reactivity, routing, etc…).

I implemented a simple counter that increments as you click on a button. Nothing fancy, right? 😏

Increments as you click the button
A trivial reactive counter

Increments as you click the button

It worked well until I actually limit-tested my server-side implementation. I made the counter persistent on the server-side to show that the server (that is a WebAssembly component) was indeed handling the counter behind the scenes, storing it using a simple wasi:filesystem.

It worked until the counter reached 10, after that, the WebAssembly module running in the browser would crash with a stack overflow.

Me in front of the stack trace
POV

Me in front of the stack trace

I immediately blamed myself for being a poor developper and started to look at my server-side code since it was what I had touched. Turned out it was also bugged if you used stable Leptos without my changes. 😌

The issue was arising in serde_json in the browser wasm module. 💥

What’s the problem?

Here is the stack trace:

We can see it recurses on $__ashlti3
In-Browser Stack Trace

We can see it recurses on $__ashlti3

I forked the serde_json repository to isolate the instruction causing the issue by implementing a super simple logger that uses web_sys to log into the browser console.

I ended up on (source)

significand = significand * 10 + digit;

Wow! That’s a trivial operation on a primitive type (u64) 😶‍🌫️! I went to look for information on the $__multi3 symbol and ended-up on this. I learned a bit about the compiler-rt library, which seems to be used by LLVM to abstract away low-level and target-specific implementations of what I would call “fundamental” operations.

It turns out that the Rust toolchain has its own implementation of the builtins of compiler-rt in a crate called compiler-builtins.

I knew I was in for something pretty mind-blowing, so I reached out to people smarter than me and headed to the Rust Language Zulip, where @tgross35 promptly (like legit 1 minute after my initial message) came to the rescue 🦸!

You can find the 185 messages-long thread here.

How we identified it?

With good intuitions from @tgross35! 💪

The thing is the code, on paper, is meant to recurse indefinitely.

Click here if you have interest in the theoretical explaination.

If you go into the int module of the crate, you will see two traits: HInt (for Half Int) and DInt (for Double Int), which are used to mark relationships between different integer types. HInt has an associated type type D: DInt and vice-versa. An exemple is worth a thousand words: u8 is HInt<D = u16>, u16 is DInt<H = u8> but is also HInt<D = u32>, does it make sense? 🤓

Now, let val be a value of type X: HInt. This implies that it has an X::D type, which itself have an X::D::H type that is, by definition, X (X::D::H = X).

Let’s call val.widen_hi() (it is invoked in the context of __multi3). The implementation of widen_hi() shifts X::BITS bits left on val cast to an X::D (<val as X::D> << X::BITS).

Using our previous example, if val was a u8, then casting it to D would make it a u16. Calling widen_hi(), we shift u8::BITS (= 8) bits left, so the initial 8 bits end up in the higher half of the u16.

But how is left shift implemented?

With a builtin aswell. Do you remember this __ashlti3 in the stack trace? Well that’s our left shift implementation! You can find its definition here.

And it ends up calling X::D::ashl(val, X::BITS)).

I leave it to the reader to explore the body of this function!

But let shl = X::BITS, and n_h = X::D::H::BITS.

Since X::D::H = X, we have n_h = X::BITS. We hence have n_h = shl, which implies shl & n_h != 0 if shl != 0 (which is the case since there exists no X such that X::BITS = 0).

We enter the first branch and execute val.lo().wrapping_shl(shl - n_h).widen_hi(). Notice that, IIUC, val.lo() = val: since we doubled the width of the integer but haven’t shifted it left yet, the lower half (lo()) is equal to the initial value!

The operand to wrapping_shl is 0 since shl = n_h. Now, that’s where theory stops and I need to make a guess:

wrapping_shl maps directly to LLVM intrinsics and we need to wonder if shifting left by 0 is defined behaviour, I will assume it is, and it just return the input value as-is.

(Please, if that’s not correct, reach out to me!)

Which means val.lo().wrapping_shl(0) can just be written val.lo(), which we said is val. So we can reduce val.lo().wrapping_shl(shl - n_h).widen_hi() to val.widen_hi(), and we have our recursion. 🎉


So what is the trick? It turns out our fate is left in the hands of rustc. We rely on the compiler to break the recursion by replacing the call to left shift with native code instead of calling the builtin implementing the left shift (__ashlti3)! 😱

Though, don’t ask me at which compilation step this happens, IIUC, that may be when we work with MIR but Je ne parle pas rustc for now, so I let experts answer that point!

Reading the WebAssembly machine code, we realised that the faulty implementation was making calls while the working one was pure machine code! 💡

Here is our bug! 😏

How we fixed it?

With a certain degree of luck and educated guesses from @tgross35.

We tried to give strong inlining hints to rustc, hopping that it would replace the calls with machine code, but it didn’t work.

In the end, the problem was that the code had changed. Instead of having widen_hi() defined in the context of an impl HInt for primitive, we had a default implementation of widen_hi(), defined at the trait-level.

I reversed this in my commit and it worked ™️ ✨.

Why?

That’s not formally answered as of today.

My completly empirical guess is that having a default implementation in the trait caused monomorphisation changes, meaning the compiler no longer interpreted the self.widen_hi() as a primitive type but as a MinInt. When increasing the optimisation level though, the compiler (or LLVM) seems to be able to fix that.

I don’t have time to deep-dive into that. I need to get back to my work on Leptos, but if anyone debugs that deeper, please ping me on the Rust Lang Zulip 🙏 I would be really interested!

Conclusion

This experience made me reflect on several points.

First, I realised how shaky our foundations are. I don’t mean it in a harsh way! Current software foundations were there first; they gave us precious data and experience and, as a personal side-effect, gave me a job. Only a fool would ignore the knowledge that this experience conveys.

But, it is clear that we must be more careful as an industry going forward. That reinforced my alignment with the Bytecode Alliance’s vision even though, to be fair, I had an interest in capability-based OSes and other security 🍬 long before that.

Finally, I am amazed by the dedication of the open-source community and pepole like @tgross35! He made me feel really welcome, which, in turn, made me comfortable going on this debugging journey with him!

We started debugging at 21:20 CET and I left at 4:07 CET 🙃 Time flies when you are having fun! ❤️ 🦀

Anyway, it is time for me to go back to my work on Leptos!

Peace 👋