How global is your context? And does it really need atomic reference counting?
Show Notes
- American Psycho business card scene
- Apple Keynote, Google Slides, SVG (Scalable Vector Graphics), EPS (encapsulated PostScript)
- Arc<Mutex<T>>, mut and references and borrowing, variables and mutability in Rust
- Aleksey Kladov (@matklad)
- Aliasing XOR mutability (referred to as "Mutability XOR sharing")
- Amos' website fasterthanli.me, io_uring, Kubernetes reverse proxy, HTTP/2
- tokio, axum, warp, tide, surf
- Ferrous Systems, async-std, dogfooding
- Re: Send constraint in tokio, Local Async Executors and Why They Should be the Default
- Boats' blog post on thread-per-core (also called "share-nothing"), Glommio
- jsonc (json with comments), serde, arc-swap, Box::new and Box::leak
- grass_compiler
- Poll::Pending or Poll::Ready
- thread_local! macro, LocalKey with with_borrow & with_borrow_mut, RefCell
- Liquid templating engine
Transcript
Amos Wenger: Hey everyone, Amos here. Just wanted to apologize for the audio quality of this episode: I had my gain set up too high, I was too far away from the mic, there was AC in the background, we had to get rid of echo, had to get rid of clipping. You can hear that it's heavily processed. It was interesting, so we didn't want to throw away the episode or record it over again.
It gets better in the next few episodes. So thanks for your patience. I hear you. I love you. G'bye.
James Munns: I'm, I'm digging the new slide format. You went very like, uh, what is it? American Psycho eggshell and embossed print on us.
Amos Wenger: Oh, that's what the, the GIF was about!
James Munns: Oh, yeah.
Amos Wenger: I didn't know how to interpret that. And I was too scared to ask. It is one of the built-in Keynote themes.
Now you might be wondering: "Amos, why are you using Keynote? Do you, do you love Apple that much?" No, but I do hate Google and its Slides product that doesn't let you paste any vector format in there.
So you can't paste SVG. It prints an error about EPS: embedded PostScript. Which I know what that is-
James Munns: The other vector format.
Amos Wenger: I don't know what the fuck that's doing in their dialogue, but I know what that is. Yeah.
So apparently there's like a 13-step workaround to still import vector things in Google Slides. And of course, everybody who wants to get stuff done, just rasterizes it and gets a huge PNG in there instead.
But I don't want my presentations to be, like, really large, I don't know.
James Munns: Gotcha. Okay. I'm excited about this.
Amos Wenger: Cool! So should I just go? I, I'm gonna share my screen, I suppose. I can't see y'all anymore. This is just a tiny laptop. Okay, I guess I will do this one blind. That's okay.
Today I want to talk about Arc<Mutex<T>> and ways to avoid it.
First off, James, how do you say mutex? Do you say "mutt-ex"? "Mew-tek"?
James Munns: I say "mew-tex."
Amos Wenger: You say "mew-tex!"
James Munns: 'mu' as in like the SI prefix and then 'tex' as in Texas.
Amos Wenger: I noticed you say "mutt", you don't say "mute" for references, exclusive references.
James Munns: There's a couple things that I feel like I say contextually different. Like mut is definitely one. There's a bunch. Like, Rust has a bunch of non-word abbreviations.
Amos Wenger: That is true. It's a struggle when I make videos, I have to pick one and stick to it.
James Munns: Heh heh heh heh heh
Amos Wenger: And I think I'm switching from 'immutable reference' and 'mutable reference' to 'exclusive reference' and 'shared reference'. Even though those don't have the keyword in there.
James Munns: Yeah, Aleksey is the one that, when we were teaching together, he's the one that got me on that because he's like, it's better because one of the things we always had to teach when we were teaching was interior mutability, which immediately breaks like the immutable reference and mutable reference thing because you're passing around an immutable reference to something that you mutate-
Amos Wenger: Yeah, yeah yeah.
James Munns: Inside of it, like a mutex.
Amos Wenger: Also... yeah, also, because of Rust's, like, mutability XOR sharing property, you sometimes use mutable references just to enforce some other invariants. You don't actually care about mutating it, but it's just a way to only have one of something. Um, anyway, that was the first, the title slide.
Uh, let's get going. So the thing with Arc<Mutex<T>> or Arc<RwLock<T>> or things like that is that Rust beginners use a lot of it. Like we recommend Rust beginners use it to get out of situations and not have to think about lifetimes. And I think that's good, but there comes a time where you want to get rid of them and I want to talk about some of the alternatives you can use specifically in the context of my website.
My website, uh, fasterthanli.me is 18,000 lines of Rust, which I realize actually doesn't sound like a big number, but it is a big number. And when you have to maintain it, especially when a lot of it is async Rust code and it mixes, uh, sync and async- I started writing it four years ago, and I've been maintaining it more or less.
Like every couple of months I go like, "Oh, there's new major versions of all the crates I rely on." There's lots of synchronous code and asynchronous code that are mixed up, but that's, that's not today's topic. Short version is just use channels. [laughter] And, uh, it's, it's pretty typical. I do weird things with HTTP.
I have, um, an HTTP implementation based on io_uring, which I'm getting funding for. But my website does not use it yet. I am planning on using that, but because my website is a web application, it's not directly facing the web. It is actually on Kubernetes with a reverse proxy in front, so I don't actually need to care about HTTPS. I don't need to care about HTTP/2, I just do plain text HTTP/1 and then the reverse proxy in front takes care of everything else.
So my plan is to actually use my own HTTP implementation for the reverse proxy as part of Kubernetes.
So my website is just axum on tokio. I used to use warp, which is also based on tokio, but has weird combinators that I didn't like and blew up compilation times. And before that, I used tide, which is in the async-std ecosystem, let's say- cinematic universe.
I don't know if you remember that. Have you ever used, uh, tide and surf and all that stuff, James?
James Munns: I'm definitely familiar with it. When I was at Ferrous, we were, I think, the one company that funded async-std because the people who were working on it worked at Ferrous. So, I'm familiar with it, but this was before I did a lot of web stuff. But I was wondering if there's a specific French word for dogfooding.
Amos Wenger: Not that I know of... but I don't really talk about my work in French. That would require going out of the house and talking to other developers, which I do on occasion, but it's mostly an excuse to drink alcohol in, like, a socially acceptable setting. I don't know.
James Munns: I was gonna say the same in German. Like, my restaurant German is way better than my technical German because all the technical stuff that I do, even locally, everyone's speaking English for.
So, yeah. Okay. We'll stick with dogfooding.
Amos Wenger: So tokio works with a thread pool. What that means is... let me- I want to try and give a short explanation, but I'm in danger of James jumping in and correcting me at any time, which is good- a good thing, this is why we have two hosts, I suppose. But basically what it means is that: you can do work in the background.
You can spawn tokio tasks like you could spawn libstd, "lib stud" threads for the operating system level threads. But then because those tasks are futures, they can be polled, they can return saying, poll me later when something else has happened. There's no- there's nothing saying that the poll function of the future trait of that future will be called from the same thread the next time.
So they can move across threads. So they are Send by definition, which is a huge headache because there's a lot of things that aren't Send or a lot of things that aren't Sync and you want to hold a reference to them. So that kind of restricts what you do.
And specifically in tokio codebases with that model, you do end up having a lot of "Arc all the things", because you want several tasks to hold a reference to the configuration of your server or reference to any sort of context, any sort of secrets, anything, anything that like you might want to access in all the workers that service HTTP requests, for example.
James, do you have any thoughts so far? I'm catching my breath.
James Munns: Nah, you nailed it. I was gonna say, it's a, the "has to be Send" is a tokio specificism, just because they use, like you said, thread-stealing executors, where if one worker thread gets really busy, then the other ones can come along and go, "I'll take that off your plate." And so stuff doesn't get stalled for a very long time.
It's not specific, or it's not required by Rust async, but for tokio, it's a design decision they make because they have... essentially, they looked at a lot of different people. Well, I don't know. I don't need to go into why. But yeah, it's a design decision they made, which is usually pretty reasonable.
But it is a point of a lot of half-informed blog posts discussing that decision.
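To make the Send requirement concrete, here is a minimal sketch that uses only the standard library. The `spawn_on_pool` helper is a hypothetical stand-in for a multi-threaded executor's spawn function (it is not tokio's API); it only mirrors the shape of the bound, which is that the work may run on another thread and so must be `Send + 'static`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a multi-threaded executor's spawn function:
/// the work may run on another thread, so it must be `Send + 'static`.
fn spawn_on_pool<F, T>(work: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(work)
}

/// Compiles only for types that can be moved across threads.
fn assert_send<T: Send>() {}

fn shared_config_len() -> usize {
    assert_send::<Arc<String>>();
    // assert_send::<std::rc::Rc<String>>(); // does not compile: `Rc` is not `Send`

    let config = Arc::new(String::from("listen=127.0.0.1:8080"));
    // Each task gets its own handle to the same allocation.
    let for_task = Arc::clone(&config);
    let handle = spawn_on_pool(move || for_task.len());
    handle.join().unwrap()
}
```

This is why shared state in such codebases tends to end up behind Arc: a plain borrow is not `'static`, and `Rc` is not `Send`.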
Amos Wenger: That is true. Every, every month at least, every couple of weeks there's a new article saying why Rust async is bad. But that is, that is a tokio specific thing or like this style executor. The alternative being thread-per-core pretty much where you, you have, can imagine, a current thread executor per CPU core, however many cores you want to dedicate to the app, and then you explicitly pass things between those.
You do message passing, for example, if you want to have a central, I don't know, API, that you'll query. If you want to share something, you usually do message passing... it's a different take on it.
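As a rough illustration of the message-passing style described here, the sketch below has one thread own some state outright while other threads query it over channels. It uses plain `std::thread` and `std::sync::mpsc` rather than an async executor, and the `HitRequest` type and function names are made up for the example.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: bump and return the hit count for a page.
struct HitRequest {
    page: &'static str,
    reply: mpsc::Sender<u64>,
}

/// Spawns a thread that exclusively owns the counters and returns a handle for sending requests.
fn start_counter_owner() -> mpsc::Sender<HitRequest> {
    let (tx, rx) = mpsc::channel::<HitRequest>();
    thread::spawn(move || {
        // Only this thread ever touches the map, so no lock is needed.
        let mut hits: HashMap<&'static str, u64> = HashMap::new();
        for req in rx {
            let count = hits.entry(req.page).or_insert(0);
            *count += 1;
            // The requester may have gone away; ignore a failed reply.
            let _ = req.reply.send(*count);
        }
    });
    tx
}

/// Sends one request to the owner thread and waits for its answer.
fn record_hit(owner: &mpsc::Sender<HitRequest>, page: &'static str) -> u64 {
    let (reply, response) = mpsc::channel();
    owner
        .send(HitRequest { page, reply })
        .expect("owner thread is alive");
    response.recv().expect("owner thread replied")
}
```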
James Munns: I know Boats has a better name than thread-per-core. Because, like, tokio does have a thread-per-core. It's just a worker thread, and stuff can move between that. I guess, like, there's like Glommio, I think...
Amos Wenger: tokio can have less or more or...
James Munns: Yeah, it's true...
Amos Wenger: Like it, it manages the pool automatically. And then it has blocking threads, which is a separate pool and not really a pool because you can have any number of them if you just do a lot of file reads, for example.
Anyway, we're trying to get rid of Arc. So let me talk about the alternative. The first, the first thing is, for example, for my website. There's a config, it's a bunch of jsonc, I think, it's parsed with serde, but it never changes. Sometimes you need to reload the config in some codebases, because you have a, yeah, dynamic reloading configuration.
In that case, you might want to reach for something like arc-swap, which you just told me about, I just remembered about. But in my case, it never ever changes. And it's a long-running program. We need the config from the very start to the very end of the program. So you can just leak it. You just put it on the heap with Box::new, and then do Box::leak, and then you get a "mut", uh, well, an exclusive reference to it, that has lifetime 'static and then you can just have that everywhere. You can have a global function that returns an immutable reference to that and that's something you don't need to pass everywhere and you know that except for the first few milliseconds of the program it is always initialized.
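A minimal sketch of the leak-the-config approach, assuming a made-up `Config` struct (the real one is not shown in the episode). `Box::leak` and `OnceLock` are standard library APIs; the `init_config` and `config` function names are hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical config shape, for illustration only.
#[derive(Debug)]
struct Config {
    base_url: String,
    production: bool,
}

/// Holds the leaked reference so a global accessor can hand it out.
static CONFIG: OnceLock<&'static Config> = OnceLock::new();

/// Call once at startup: moves the config to the heap and never frees it.
fn init_config(config: Config) -> &'static Config {
    // `Box::leak` returns `&'static mut Config`; coercing it to a shared
    // reference makes it `Copy`, so it can be handed to every task.
    let leaked: &'static Config = Box::leak(Box::new(config));
    // If a config was already installed, that first one wins.
    *CONFIG.get_or_init(|| leaked)
}

/// Global accessor; panics if called before `init_config`.
fn config() -> &'static Config {
    CONFIG.get().copied().expect("config not initialized")
}
```

A `static OnceLock<Config>` would also yield a `&'static Config` without an explicit leak; the version above just mirrors the Box::new plus Box::leak steps from the discussion.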
The other thing is that, you know, as a beginner you might naturally reach for Arc<Mutex<T>> or Arc<RwLock<T>>. But if you don't actually ever change it, you could just do Arc<T> directly. You don't need to wrap it in some sort of locking thing. It's not a magical combo.
They can be used separately.
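For instance, a read-only value can be shared across threads with a bare Arc<T> and no lock at all. The `Settings` struct and worker function below are hypothetical, and `std::thread` stands in for whatever task system is in use.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical read-only settings shared by several workers.
struct Settings {
    site_name: String,
    max_body_bytes: usize,
}

fn describe_from_workers(workers: usize) -> Vec<String> {
    // No Mutex or RwLock: nothing mutates `Settings` after construction,
    // so shared references through `Arc` are all the workers need.
    let settings = Arc::new(Settings {
        site_name: "example".to_string(),
        max_body_bytes: 1024,
    });

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let settings = Arc::clone(&settings);
            thread::spawn(move || {
                format!(
                    "worker {id}: {} ({} bytes)",
                    settings.site_name, settings.max_body_bytes
                )
            })
        })
        .collect();

    // Join in spawn order so the output order is deterministic.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```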
Next up, if your value fits in an atomic, for example, on my website I have a dev environment and a production environment that's part of a config, but it's also handy to have in various functions.
For example, in development I show stack traces, in production I set secure cookies. If it's just a boolean, you can fit it in a u8, you have an atomic u8 type in the standard library, you can just do that. You don't need an Arc, you don't need reference counting at all, it's just a u8 somewhere that you write to and read from atomically.
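A small sketch of the atomic approach, using `AtomicU8` from the standard library. The `Env` enum and the helper functions are assumptions made for the example, not code from the site.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical environment flag, stored as a single byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Env {
    Development = 0,
    Production = 1,
}

/// Process-wide flag: no Arc, no lock, just one atomic byte.
static ENV: AtomicU8 = AtomicU8::new(Env::Development as u8);

fn set_env(env: Env) {
    // Relaxed is enough here: the flag does not guard any other data.
    ENV.store(env as u8, Ordering::Relaxed);
}

fn env() -> Env {
    match ENV.load(Ordering::Relaxed) {
        1 => Env::Production,
        _ => Env::Development,
    }
}

fn show_stack_traces() -> bool {
    env() == Env::Development
}

fn secure_cookies() -> bool {
    env() == Env::Production
}
```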
The trick I do want to talk about today is: if your global does not change while blocking, then you can use a thread-local.
James Munns: What do you mean by blocking?
Do you want to describe to me the use case that you usually use this for? Is it easier to work backwards from reality rather than trying to teach it?
Amos Wenger: Yes. So let's say you have an async task and you're going to call a blocking function. My use case is I have a bunch of SASS files in my website that define style sheets, they compile down to CSS and I use a crate called grass_compiler. grass_compiler lets you define functions that you can call from SASS and they end up being Rust functions so they can evaluate to anything, you can call anything from it.
But those Rust functions are freestanding functions. They're not closures. You cannot capture any context. So in my case, that doesn't work. Maybe I need to do a database query to figure out the result of the function call. I need to know whether we're in development or production. I need to know a lot of things.
So I need to pass in context somehow. And I need, I used to do something really dirty, which is I had a process wide lock, and then when you compile the CSS, it first acquired that lock and then wrote something in a process global, and then called the blocking function and then freed it. But then the insight was that even though we're in an async task, as long as we're being polled, as long as we don't yield back to the executor, which we can only do by returning, we own the current thread pretty much.
It's ours. There's only two ways we're letting go of it. Is that if we return with Pending::Poll or pen, uh, what is the name of the... Poll::Pending or-
James Munns: Poll::Pending or Poll::Ready or whatever.
Amos Wenger: Or if we panic and then we unwind. If we, if we abort, that's not our problem. We can't deal with that. So. We're holding onto this thread, which means we can set a thread-local and then from the Rust function that's been registered as a grass extension, we can read that, as long as we clean up things properly and, and to cover both the return and the panic unwind case, we can just have a guard struct that has a Drop implementation that clears the thread-local when it's dropped, which happens, yeah, either on panic or on return.
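The first half of the trick can be sketched with an owned value in the thread-local, before any pointer casting comes into play. Everything named here (`BASE_URL`, `ClearOnDrop`, `with_base_url`, `absolute_url`) is hypothetical; only `thread_local!`, `RefCell`, and the `with_borrow`/`with_borrow_mut` methods on `LocalKey` come from the standard library.

```rust
use std::cell::RefCell;

thread_local! {
    /// Per-thread slot for the context; `None` when no blocking call is running.
    static BASE_URL: RefCell<Option<String>> = RefCell::new(None);
}

/// Guard that clears the slot when dropped, on normal return and on unwind.
struct ClearOnDrop;

impl Drop for ClearOnDrop {
    fn drop(&mut self) {
        BASE_URL.with_borrow_mut(|slot| *slot = None);
    }
}

/// Runs blocking `work` with `base_url` visible to free functions on this thread.
fn with_base_url<R>(base_url: String, work: impl FnOnce() -> R) -> R {
    BASE_URL.with_borrow_mut(|slot| *slot = Some(base_url));
    let _guard = ClearOnDrop;
    work()
}

/// A free function with no captures, like one registered as an extension callback.
fn absolute_url(path: &str) -> String {
    BASE_URL.with_borrow(|slot| {
        let base = slot.as_deref().expect("base URL not set on this thread");
        format!("{base}{path}")
    })
}
```

Because `work` is an ordinary synchronous closure, it cannot yield to an executor halfway through, so the value set on this thread is the one the free function reads.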
James Munns: So the trick is that you're calling the compilation, which might then call a bunch of other functions like what is my base address? Am I in prod or not? So essentially you need to populate all the settings when you start the compilation, use those values while you're compiling one thing and then when you're done compiling one thing — which is one big blocking operation — then you just purge the settings cache, which is a thread-local, which means you were able to stuff context in, retrieve it back out from the free functions. And then when you're done with the whole compilation you go: "Cool! Just... wipe that away."
So the next time I'm compiling something else, I don't get stale "cache data" from compiling some other page or something like that.
Amos Wenger: I, I, I'm not comfortable calling it a cache. I am comfortable calling it context, which I also believe is what you would call it when tokio does it. I have a slide that says, uh, "implicit context is bad unless I'm the one doing it," which is definitely the way tokio does feel, but, uh, yeah, that's the first bit of the trick, which is not a trick at all.
It's just realizing, oh, we have a thread, we can use thread-locals and multiple concurrent invocations of CSS compilation from different tasks are not going to interfere with each other because they are currently being polled from different threads. So it all works.
The next insight is usually you can do that and that's fine, but you still need the thing you put in the thread-local to be 'static, to be owned. So you can have an Arc<S> where S is a struct that has strings and then you have a reference counted struct with a bunch of strings or you could like put the struct itself which has the strings so you're cloning every string which might be better for cache locality or something. But I don't want to do that! I don't need to do that! Because as long as nothing is changing the thread-local while we block, so like we're doing CSS compilation, it only ever reads from the thread-local, it never modifies it and nothing else modifies the thing it's pointing to, then I think, I think it's the only, like one of the only legitimate use cases to cast a lifetime away, if that makes sense?
James Munns: Oh no, Amos, this got very serious very quickly.
Amos Wenger: Because hear me out: uh, casting pointer types and like references and stuff is generally a bad idea. Don't do it. Definitely creating an exclusive reference from a shared reference. So going from like const to mut is a big no-no. Because the compiler will create optimizations because it will assume that you're not mutating anything.
And so that is definitely a big no-no. And, generally: pointers are unsafe for a reason. There's a lot of ways to get it wrong. But. I do believe this is fine. And I have a "(show code)" slide here because, ironically, perhaps my, my website was broken at this time. I was going through a big refactoring. I do believe it is sound.
James Munns: So, okay: let me make sure I understand this right..
So you have sort of like the... the load context step before you do a compilation, which goes and makes all the database queries and things like that and stores it into a, either like a stack local &str or a buffer or something like that, and then what you're actually smuggling into the thread-local is a pointer to something. So you're pretending that it's 'static because it's a pointer inside of the thread-local, so that whenever anyone accesses it, it's acting... I don't know. Bump allocator is not the right word, but something that you go, "Well, it's going to live as long as it lives. And I know that I'm gonna manually handle invalidation." So I'm going to go back to the cache metaphor that you don't like, but I'm going to manually invalidate the cache because when I leave the context of a single thread, so your blocking code is acting like a guard on, "I'm not going to be moved to another thread".. and when you leave that context, you wipe the cache or invalidate the cache or the context and things like that.
So, okay... I'm with you I think.
Amos Wenger: So the thread_local! macro from std gives you... you don't see it in the declaration. But it actually wraps whatever type you give it into a LocalKey. Which, I've become very intimate with as I try to do things across shared objects. So I, I know what's in there.
It's just a struct with a function that gets the address of the ThreadLocal. But, then the ThreadLocal type has a bunch of methods defined. Some of them are only available if the type inside of it is a RefCell. And so I have a RefCell in there. So the actual type is LocalKey<RefCell<Option<*const T>>> which is definitely not cursed. And then I have a run_with_ambient_context: it takes a reference to an ambient context, which is definitely not 'static. It could definitely be freed as soon as this returns. And then it casts that lifetime away by turning it into a raw pointer and puts it in the ambient context thread-local, which is very naughty because if we forget to clear it, we have a dangling pointer.
So anyone who dereferences it is going to be in for a surprise. Surprisingly, the run_with_ambient_context function is all safe because in Rust, creating pointers is not a problem, dereferencing them is, which, that's always the case. But yeah, there's just a ResetOnDrop struct that has a Drop implementation that resets the context back to None.
So we have a guard here, then we set it to the reference cast to pointer, and then we call the function, cause it takes a function, which is blocking. It cannot, it cannot yield. You know, people complain about the fact that synchronous functions and asynchronous functions in Rust look different.
In that case, very good thing. We don't want people to actually return all the way back to tokio because then this scheme would not work at all. And then the other function is with_ambient_context, which is to actually get/look/read whatever is from the ThreadLocal... well, there's a safety comment just above the unsafe block, which says, "We borrowed a ThreadLocal, no other thread can drop the context while we're holding a reference to it."
Which I believe is true. If you look at the API that LocalKey provides for RefCell inners: if you do with_borrow and then with_borrow_mut somewhere else, it's going to panic. So there is some checking going on.
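The code shown on the slide is not part of this transcript, so what follows is only a reconstruction from the description above, under stated assumptions: the `AmbientContext` fields and the `current_base_url` helper are invented, and the bodies of `run_with_ambient_context`, `with_ambient_context`, and `ResetOnDrop` are a guess at the shape, not the original source. This sketch stores the pointer first and then creates the guard.

```rust
use std::cell::RefCell;

/// Hypothetical context; the real one holds whatever the callbacks need.
struct AmbientContext {
    base_url: String,
    production: bool,
}

thread_local! {
    /// Raw pointer to a context borrowed by a blocking call on this thread.
    static AMBIENT: RefCell<Option<*const AmbientContext>> = RefCell::new(None);
}

/// Resets the slot to `None` when dropped, on normal return and on unwind.
struct ResetOnDrop;

impl Drop for ResetOnDrop {
    fn drop(&mut self) {
        AMBIENT.with_borrow_mut(|slot| *slot = None);
    }
}

/// Safe: it only creates and stores a pointer. `blocking` is synchronous,
/// so it cannot yield back to an executor while the pointer is installed.
fn run_with_ambient_context<R>(ctx: &AmbientContext, blocking: impl FnOnce() -> R) -> R {
    AMBIENT.with_borrow_mut(|slot| *slot = Some(ctx as *const AmbientContext));
    let _guard = ResetOnDrop;
    blocking()
}

/// Lends the ambient context to `f`; panics if none is set on this thread.
fn with_ambient_context<R>(f: impl FnOnce(&AmbientContext) -> R) -> R {
    AMBIENT.with_borrow(|slot| {
        let ptr = slot.expect("ambient context not set on this thread");
        // SAFETY (as argued in the episode): the pointer is only stored by
        // `run_with_ambient_context`, which keeps the referent borrowed for
        // the whole blocking call and clears the slot before it returns or
        // unwinds. The slot is thread-local, and the RefCell borrow held
        // here makes a concurrent `with_borrow_mut` panic instead of racing.
        let ctx: &AmbientContext = unsafe { &*ptr };
        f(ctx)
    })
}

/// A free function with no captures, like an extension callback would be.
fn current_base_url() -> String {
    with_ambient_context(|ctx| ctx.base_url.clone())
}
```

One limitation of this sketch: nested calls to `run_with_ambient_context` are not supported, since the inner guard resets the slot to `None` rather than restoring the outer pointer.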
James Munns: And you don't have to worry about Drop because you're passing a reference in. So the actual owned one is outside of this context, so you're just passing a pointer in. So when you drop your RefCell containing a pointer, it's not gonna call drop on the T, because...
Amos Wenger: Yes.
James Munns: ...dropping a *T doesn't drop T.
Yeah.
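A quick way to check that claim, using a made-up `Noisy` type that counts its own drops: dropping a RefCell that holds a raw pointer leaves the pointee untouched, and the destructor only runs when the real owner goes away.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical type whose destructor is observable.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the drop count after the pointer slot is dropped, then after the owner is dropped.
fn drops_observed() -> (usize, usize) {
    let start = DROPS.load(Ordering::SeqCst);
    let owner = Noisy;

    // The slot holds only an address; it does not own the `Noisy`.
    let slot: RefCell<Option<*const Noisy>> = RefCell::new(Some(&owner as *const Noisy));
    drop(slot);
    let after_slot = DROPS.load(Ordering::SeqCst) - start;

    // Dropping the actual owner is what runs the destructor.
    drop(owner);
    let after_owner = DROPS.load(Ordering::SeqCst) - start;

    (after_slot, after_owner)
}
```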
Amos Wenger: So this is safe because sure, we are taking some lifetime and casting it away into a raw pointer, but then we're immediately casting that again into an even smaller lifetime. So it's okay because the second, like the inner lifetime is smaller than the outer lifetime.
And that's the trick! Uh, it's, it's fast. You don't have like process-wide locks. It works great. You don't actually have to use Arc and clone Arcs to just pass context to arbitrary functions. I use that a lot. And then the drawback is of course that you have to make sure that it's set before you call a function, but it panics friendly, like in a friendly way if you forgot to set it.
It's just, you know, the usual debate between do I pass the context down explicitly as a parameter or do I set it to a thread-local and hope that it's there. And tokio definitely has the same problem.
James Munns: Gotcha. I... phew. This is some spicy code, Amos...
Amos Wenger: I know it is!
James Munns: I follow what you're doing.
And... as far as you've explained it, it makes sense. I don't, I don't know if I could recommend this, and I, like, I wonder how... what are you gaining here? Like was mutex contention or RefCells or reference counting too expensive?
Or is this-
Amos Wenger: No...
James Munns: Is, was this a fever dive into "what can I get away with?"
Amos Wenger: So this is one use case and it doesn't matter because I only compile CSS when deploying new versions of the site, so that's... very rare. But what happens a lot is executing liquid filters.
Well... this could be a whole other episode explaining all the crimes I do in liquid filters. Liquid is a templating engine. Again, you can extend it. And I have a filter that lets you execute arbitrary SQL queries... which, you know, like, yeah, why not talk to the database from the templating language? Why not?
And so, yeah, it needs to have a database connection. And again, that's a thread-local. And I do the same trick here. And in this case, it is actually required, because it's from a thread pool. So we have a reference to it, but we don't own it. And it's going to go back. Like we have something that dereferences to a Connection, but it's actually a pooled item connection or something.
It's a wrapper type. So in that scenario, it's unavoidable.
James Munns: Like I said, I think I follow you. I'm, I'm, I, is this a...? Yeah, I don't know. I don't want to put engineer brain on because this is a cool thing and I could go "What about different ways to solve this problem?" but I imagine you've spent quite a bit of time with this.
Amos Wenger: I'm curious! Come at me! If you have a less cursed solution in mind, I mean... just like the several copies of tokio playing nice with each other thing, I wish there was a simpler solution, but...
James Munns: Yeah... Yeah, I guess I guess the... you probably don't want to touch the SASS compilation library to take like a boxed closure or something because like I've written a bunch of stuff that uses function pointers rather than impl Fn. Because, especially with closures, you get unnameable types. And for example, like when I was writing a scripting engine, it was very easy to have an array of like, okay, I'm binding this function to this name in my scripting language.
So when you type this word, this Rust function gets called. And that's really easy when everything is an fn, even if it takes context or whatever, that's very easy. But the second that you want to do closures- or actually the other spicy thing that gets you is async functions because every async function is an unnameable type too. In my scripting language, I ended up having to just have one big match statement that was like all the text keys that you could use and it mapped them to async functions.
But I guess the alternative might be modifying the library to take a boxed closure instead, like a boxed dyn Fn, because then you would be able to like pre-populate all of them with their context and then call them, but that'd probably be a, I imagine, a fairly invasive change to the library.
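The trade-off between the two styles can be sketched as two lookup tables, one of plain function pointers and one of boxed closures. The table layout and the function names are invented for the example and do not reflect the API of any SASS or scripting library.

```rust
use std::collections::HashMap;

fn double(x: i64) -> i64 {
    x * 2
}

fn negate(x: i64) -> i64 {
    -x
}

/// Function pointers: every entry has the same nameable type, `fn(i64) -> i64`,
/// but an entry cannot carry any captured state.
fn fn_pointer_table() -> HashMap<&'static str, fn(i64) -> i64> {
    let mut table: HashMap<&'static str, fn(i64) -> i64> = HashMap::new();
    table.insert("double", double);
    table.insert("negate", negate);
    table
}

/// Boxed closures: each closure has its own unnameable type, erased behind
/// `dyn Fn`, so an entry can be pre-populated with context (here, `offset`).
fn boxed_closure_table(offset: i64) -> HashMap<&'static str, Box<dyn Fn(i64) -> i64>> {
    let mut table: HashMap<&'static str, Box<dyn Fn(i64) -> i64>> = HashMap::new();
    table.insert("add_offset", Box::new(move |x: i64| x + offset));
    // Plain functions still fit, since they implement `Fn` too.
    table.insert("double", Box::new(double));
    table
}
```

A library that only accepts the first kind of table is what pushes context into thread-locals or globals; accepting the second kind lets the caller bake the context into each callback instead.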
Amos Wenger: It would be... And my strategy to survive in open source now, is: I'm either gonna write my own thing and then tell people that I'm not gonna do their weird use case because I wrote it for me. Or: use something exactly as it is and never require any big change from it because I value the maintainers too much, or: fork an existing thing and then, you know, make it do exactly what I need. And then, you know, thank the maintainer, the original maintainer profusely, but never get up in their business and ask them to change their thing completely just to suit my use case. So in this instance, you know, I have at least two use cases where I don't want to be in charge of maintaining a fork of the liquid crate.
I don't want to be in charge of maintaining a fork of a SASS compiler. So, this works, and it's safe demonstrably. So
James Munns: Hee-hee-hee-hee Spicy But safe
Amos Wenger: I'm happy for you to prove me wrong or anyone for that matter.
Episode Sponsor
This episode is sponsored by the Ladybird browser.
Today, every major web browser is funded or powered by Google's advertising empire. Choice is good, but your only choice is Google. The Ladybird browser wants to do something about this. Ladybird is a brand new browser and web engine written from scratch and free of the influences of Big Tech. Driven by a web-standards-first approach, Ladybird aims to render the modern web with good performance, stability, and security.
From its humble beginnings as an HTML viewer for the SerenityOS hobby operating system project, Ladybird has since grown into a cross-platform browser supporting Linux, macOS, and other Unix-like systems.
In July, Ladybird launched a non-profit to support development and announced a first Alpha for early adopters targeting 2026, but you can support the project on GitHub or via donations today.
Visit ladybird.org for more information and to join the mailing list.