EDIT 1: I initially wrote that the
compiler-builtins
crate is used byalloc
, but it’s just pinned inalloc
. Actually, it’s used everywhere during the linking process, exposing symbols for the compiler to use as a fallback when the backend can’t replace trivial routines with assembly code.Also fixed a dead-link to Leptos.
Thanks @tgross35 for the feedback 🙏!
Context
I recently got my first commit merged into rust-lang
, co-authored with
@tgross35. It was a hotfix for the compiler-builtins
crate, which is used
by the alloc
crate. The hotfix targeted a bug that occurred on some platforms
when rebuilding the toolchain libraries like alloc
and std
using the
-Zbuild-std
flag on nightly cargo
in a non-optimized profile (e.g., dev
).
I know! That sounds really specific 🙄 But the blast radius of that bug was huge—it basically rendered any rust code compiled in this way inoperable on these platforms 😳!
Any bitwise left shift would recurse indefinitely. You may think that’s nothing but a lot of arithmetic operations actually cause shifts.
At the time of writing, only WebAssembly (wasm32-unknown-unknown
) and SPARC
have been affected by this bug, but I suspect more platforms were impacted.
How I stumbled upon this bug?
I am working on making Leptos, a Rust web framework, capable of compiling its server-side as a WebAssembly Component so it can be served on wasmCloud. ✨
I will likely present my work in another blog post, but for now, let’s focus on this bug.
Basically, the framework allows us to host a server that serves a basic HTML document and streams content that takes time to load as they become ready, improving SEO and UX. (Learn more here)
The client is a thin JavaScript shim that loads a wasm32-unknown-unknown
module in the browser. This module is actually responsible for the heavy lifting
(reactivity, routing, etc…).
I implemented a simple counter that increments as you click on a button. Nothing fancy, right? 😏
It worked well until I actually limit-tested my server-side implementation.
I made the counter persistent on the server-side to show that the server (that is a
WebAssembly component) was indeed handling the counter behind the scenes, storing it
using a simple wasi:filesystem
.
It worked until the counter reached 10, after that, the WebAssembly module running in the browser would crash with a stack overflow.
I immediately blamed myself for being a poor developper and started to look at my server-side code since it was what I had touched. Turned out it was also bugged if you used stable Leptos without my changes. 😌
The issue was arising in serde_json
in the browser wasm module. 💥
What’s the problem?
Here is the stack trace:
I forked the serde_json
repository to isolate the instruction causing the issue
by implementing a super simple logger that uses web_sys
to log into the browser
console.
I ended up on (source)
significand = significand * 10 + digit;
Wow! That’s a trivial operation on a primitive type (u64
) 😶🌫️!
I went to look for information on the $__multi3
symbol and ended-up on
this.
I learned a bit about the compiler-rt
library, which seems to be used by LLVM
to abstract away low-level and target-specific implementations of what I would
call “fundamental” operations.
It turns out that the Rust toolchain has its own implementation of the builtins
of compiler-rt
in a crate called compiler-builtins
.
I knew I was in for something pretty mind-blowing, so I reached out to people smarter than me and headed to the Rust Language Zulip, where @tgross35 promptly (like legit 1 minute after my initial message) came to the rescue 🦸!
You can find the 185 messages-long thread here.
How we identified it?
With good intuitions from @tgross35! 💪
The thing is the code, on paper, is meant to recurse indefinitely.
If you go into the Now, let Let’s call Using our previous example, if But how is left shift implemented? With a builtin aswell. Do you remember this And it ends up calling
I leave it to the reader to explore the body of this function! But let Since We enter the first branch and execute The operand to (Please, if that’s not correct, reach out to me!) Which means Click here if you have interest in the theoretical explaination.
int
module of the crate, you will see two traits: HInt
(for Half Int) and DInt
(for Double Int), which are used to mark relationships
between different integer types. HInt
has an associated type type D: DInt
and vice-versa. An exemple is worth a thousand words: u8
is HInt<D = u16>
,
u16
is DInt<H = u8>
but is also HInt<D = u32>
, does it make sense? 🤓val
be a value of type X: HInt
. This implies that it has an X::D
type, which itself have an X::D::H
type that is,
by definition, X
(X::D::H = X
).val.widen_hi()
(it is invoked in the context of __multi3
).
The implementation of widen_hi()
shifts X::BITS
bits left on val
cast to
an X::D
(<val as X::D> << X::BITS
).val
was a u8
,
then casting it to D
would make it a u16
. Calling widen_hi()
,
we shift u8::BITS
(= 8
) bits left, so the initial 8 bits end up
in the higher half of the u16
.__ashlti3
in the stack trace? Well that’s our left shift implementation!
You can find its definition
here.X::D::ashl(val, X::BITS))
.shl = X::BITS
, and n_h = X::D::H::BITS
.X::D::H = X
, we have n_h = X::BITS
.
We hence have n_h = shl
, which implies shl & n_h != 0
if shl != 0
(which is the case since there exists no X
such that X::BITS = 0
).val.lo().wrapping_shl(shl - n_h).widen_hi()
.
Notice that, IIUC, val.lo() = val
: since we doubled the width of the integer
but haven’t shifted it left yet, the lower half (lo()
) is equal to the initial
value!wrapping_shl
is 0 since shl = n_h
. Now, that’s where theory stops
and I need to make a guess:wrapping_shl
maps directly to LLVM intrinsics and we need to
wonder if shifting left by 0
is defined behaviour, I will assume it is,
and it just return the input value as-is.val.lo().wrapping_shl(0)
can just be written val.lo()
, which we said is val
.
So we can reduce val.lo().wrapping_shl(shl - n_h).widen_hi()
to val.widen_hi()
,
and we have our recursion. 🎉
So what is the trick? It turns out our fate is left in the hands of rustc
.
We rely on the compiler to break the recursion by replacing
the call to left shift with native code instead of calling the
builtin implementing the left shift (__ashlti3
)! 😱
Though, don’t ask me at which compilation step this happens,
IIUC, that may be when we work with MIR but Je ne parle pas
rustc
for now, so I let experts answer that point!
Reading the WebAssembly machine code, we realised
that the faulty implementation was making call
s while the working one
was pure machine code! 💡
Here is our bug! 😏
How we fixed it?
With a certain degree of luck and educated guesses from @tgross35.
We tried to give strong inlining hints to rustc
, hopping that it would replace
the calls with machine code, but it didn’t work.
In the end, the problem was that the code had changed. Instead of having
widen_hi()
defined in the context of an impl HInt for primitive
,
we had a default implementation of widen_hi()
, defined at the trait-level.
I reversed this in my commit and it worked ™️ ✨.
Why?
That’s not formally answered as of today.
My completly empirical guess is that having a default
implementation in the trait caused monomorphisation changes,
meaning the compiler no longer interpreted the self.widen_hi()
as a primitive type but as a MinInt
. When increasing the optimisation
level though, the compiler (or LLVM) seems to be able to fix that.
I don’t have time to deep-dive into that. I need to get back to my work on Leptos, but if anyone debugs that deeper, please ping me on the Rust Lang Zulip 🙏 I would be really interested!
Conclusion
This experience made me reflect on several points.
First, I realised how shaky our foundations are. I don’t mean it in a harsh way! Current software foundations were there first; they gave us precious data and experience and, as a personal side-effect, gave me a job. Only a fool would ignore the knowledge that this experience conveys.
But, it is clear that we must be more careful as an industry going forward. That reinforced my alignment with the Bytecode Alliance’s vision even though, to be fair, I had an interest in capability-based OSes and other security 🍬 long before that.
Finally, I am amazed by the dedication of the open-source community and pepole like @tgross35! He made me feel really welcome, which, in turn, made me comfortable going on this debugging journey with him!
We started debugging at 21:20 CET and I left at 4:07 CET 🙃 Time flies when you are having fun! ❤️ 🦀
Anyway, it is time for me to go back to my work on Leptos!
Peace 👋