A somewhat reasonable use of dynamic linking
Amos presents rubicon, which, through terrible dynamic linking crimes, brought joy back into developing their website
Video
Audio
Show Notes
- baud, Amos' post on X about the recording, Scarlett 2i2 audio interface
- rubicon
- I Was Wrong About Rust Build Times
- Minify CSS
- rust-cache GitHub Action, crater
- sccache, Bazel, Buck
- Alex Crichton
- What are Rust editions?
- Rust crate types / linkage
- odht crate
- the inline attribute
- rustup
- Amos' PR about the prefer-dynamic docs
- tokio, tracing, parking_lot
- nix or https://nixos.org/
- monomorphization
- rust orphan rules
- -Z randomize-layout
- Sentry
- GDB (GNU Debugger), LLDB (the LLVM project's debugger)
- tokio's intrusive linked list crash bug
- eyre
- liquid, minijinja
Transcript
Amos Wenger: Amazingly, the podcast thing that we use generates files named full-name--pronouns.mp4. So it pops all those notifications with your pronouns. And as we discussed earlier, I just want to have this on the record: it shows that I'm recording at 19 kilohertz, which is not a thing. What's actually happening is that we're recording at 192 kilohertz and that they only tested for 48,000 and 44,100.
And so they only take the first two characters of the full string. And so it chose 48 for James and 19 for me.
James Munns: Which it took me a while to wrap my head around because in serial ports, you might have a baud rate of 19200 or 19.2. So I was like, "It's weird that they pick like a serial baud rate for your audio interface," until I realized that you were talking about 192,000 and not 19,200.
Amos Wenger: I mean, it's, it's a Scarlett 2i2. I would be kind of surprised if it was doing worse than whatever USB mic.
James Munns: Yeah.
Intro: Fixing build times with rubicon
Amos Wenger: Okay, so today's presentation is called "Fixing build times with rubicon: a somewhat reasonable use of dynamic linking." So in Self-Directed Research tradition, we have a nice subtitle.
This is the follow-up to "I Was Wrong About Rust Build Times."
James Munns: Our very first episode.
Amos Wenger: Yes, I was wrong to say you should just split things into different crates. Actually, you should split things into different dynamic libraries. And immediately, if you've tried that, you might be thinking, "Oh, but rustc lets you do that, cargo lets you do that."
And it kind of does, but also not in a way that makes me happy. So we're going to get into that, this is why I made rubicon. Let's just follow the slides along.
So building large Rust projects is slow. And by large Rust projects, I mean anything that's trying to do webby stuff- like HTTP, serving endpoints and whatnot- is going to have anywhere from a hundred crates at the beginning, and it can easily go up to 300, 400.
It's easy to rack up dependencies. And it's really hard to shed them. I have tried for my own website, but, uh, sometimes you just need to transcode images or videos or something, minify CSS. There's always a lot of things to do. So you can either call an external tool or you can link to something.
And if you link to something, then that's more build time.
Why is it that slow to build these projects?
Amos Wenger: Why is it that slow to build these projects? Well, because simply, there's just a lot of code. There's just lots to do. There's lots of code to parse, and then to typecheck, and then to borrowcheck, lots to codegen, lots to optimize, and then lots to link.
And rustc already has incremental builds, which are enabled by default for development, I believe, but not for release builds? Does that sound true?
James Munns: I'd have to check the- I know on default profiles incremental's turned on, I can't remember. And a lot of people turn 'em off in CI too, because when you're doing a clean build in CI, you want them off because they just add overhead, if you're never- if you're gonna throw away the environment as soon as you're done.
So I know a lot of people in their CI, like production builds, turn them off completely, even if the release profile has them on. Just because you go: well, it adds 10% to the build time for a context I'm gonna throw away immediately.
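For reference, a hedged sketch of how that's commonly done- either turning incremental off in a Cargo profile, or via an environment variable in CI. These are the standard Cargo knobs, not anything project-specific:

```toml
# Cargo.toml: turn incremental compilation off for release builds
[profile.release]
incremental = false

# Or, in CI only, without touching the manifest:
#   CARGO_INCREMENTAL=0 cargo build --release
```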
Amos Wenger: Yeah, even people who actually store cache with like the rust-cache action on GitHub Actions tend to disable it for some reason. I know for sccache, it doesn't really work for the incremental builds. So incremental builds are sort of unfinished, let's say. Also even when they do work for you, they're not incremental enough.
Not everything has been incrementalized, if that makes sense. It's something that you, as a rustc developer, you need to manually do for all sorts of different tasks and just not all of them have been done due to lack of time and funding and mental health, as always.
James Munns: Cache invalidation is not the fun problem to solve most of the time.
Amos Wenger: No, it's not. It's, it's pretty- well, we have crater, so when, when they make a change to rustc, it's actually tested against, I want to say all the crates, but is that actually all the crates?
I'm not actually sure it's all the crates. I mean, there's not all the publicly available ones, right?
James Munns: It's all the ones on crates.io and then some smattering of GitHub projects. But if you have private infrastructure at your large tech company, then it's obviously not going to be touching all of your private, you know, "only tracked in your internal repo" type stuff.
Amos Wenger: Well, chances are large companies don't use cargo anyway. They have their own build system. But yeah. Why is incremental not enough? Because it's not incremental enough. Proc macros aren't cached because they can have side effects. So you just simply cannot cache them. So you have to build them and run them. And then if they produce the same code, I guess you can... not do anything? Just reuse the artifact from last time, but you still had to run them. So if you use a lot of proc macros, it might make sense to build them with optimizations, which is a bad situation to be in, but that's how it is sometimes.
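One common way to do that "build proc macros with optimizations" trick is a build-override in the dev profile- a sketch, assuming a standard Cargo setup:

```toml
# Cargo.toml: optimize build scripts and proc macros even in dev builds,
# so heavy proc-macro crates expand faster
[profile.dev.build-override]
opt-level = 3
```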
Static linking takes time because you're generating files that can be hundreds of megabytes with debug information and everything, and you're just copying a lot of things in different places and doing relocations. It's still, it's a lot of work. We've done, we've made faster linkers, but it's still a lot of work.
Link time optimization takes time. You're, you're delaying optimizations from compile time to link time, which all happens within the same cargo invocation, but still: it's work. There's just a lot of work to do. And like I said, there's a chance that other big companies use something that isn't cargo: they use Bazel, they use Buck, they use something else we haven't heard of yet.
And those build systems are typically designed to avoid doing any work that doesn't need to be done by making everything content-addressed. And yeah, they're aiming for perfect caching. And they think about remote building as a first class citizen. Constraints that, yeah, weren't there when cargo was designed.
And it's worth mentioning that, I don't know, I tend to think that rustc needs more resources, but I think that's even more true of cargo.
James Munns: I know it's changed recently. It's gone from being like just the Alex Crichton show to now there's quite a few more full-time contributors. So I think that trend has been changing and I've started seeing a lot of motion on stuff that had been stalled for a long time and I think other people are seeing the same thing. Because cargo is one of those- like you mentioned all these other build tools- you'll have these big companies that have internal tooling teams and a consistent environment and consistent resources for build servers and things like that. Cargo is kind of hard because it's gotta fit for everyone. It's an amazing tool for like, the center 95 percent of mass of the problem, but... like you said, when you are designing something that ends up having 800 dependencies that change, that all have proc macros, like, just extenuating circumstances to death, it becomes hard to have a one-size-fits-all tool if you aren't willing to constrain the problem, like: hey, we always build with these build machines and we always have these profiles or whatever- and whatever companies can get away with when they control the environment.
Amos Wenger: Yeah, there's only so much you can do without completely breaking compatibility and saying, "Okay, everyone who uses cargo now has to migrate over to cargo 2." And then that's, I don't think that's something that anybody wants. So anytime you introduce something new in cargo, it has to be opt-in or it has to be compatible, support both versions, like the edition system.
It's, it's a hard job. I have respect for the maintainers of cargo, even though I'm the first one to complain when things take too much time.
crate-type = "dylib"
Amos Wenger: So building big projects takes time. And like I said, the Rust compiler already knows how to use dynamic libraries. There's an attribute you can put into a Cargo manifest, called crate-type, under [lib].
You can set it by default for libraries. I think it's rlib? And then for executables, what is it for executables? Just static or something? I don't know. staticlib? No.
James Munns: It's just, uh, binary. Yeah, staticlib is one, but I think it's just bin for binaries when you're doing a fully linked, statically linked executable.
Amos Wenger: That makes sense. Yeah. So, but the only ones you really set by hand are like, well, dylib, or "die lib" as you call it, or cdylib.
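For context, this is roughly what that looks like in a Cargo manifest- a sketch, not taken from any particular project:

```toml
# Cargo.toml of a library crate
[lib]
# "rlib" is the usual default for Rust-to-Rust static linking;
# "dylib" asks for a Rust dynamic library, "cdylib" for a C-compatible one.
crate-type = ["dylib"]
```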
It doesn't help much...
Amos Wenger: So it has that capability, but it doesn't help much because it's still one big graph when you do cargo build. If you have this big project with 700 dependencies and all sorts of features enabled and disabled, it has to do a lot of work to just figure out: what is in that graph? What is enabled? What is disabled? And then load a bunch of information that it's already computed from last time from disk. And it's actually really cool how they do that. They have a package called odht- an on-disk hash table, for hash tables that don't fit in memory and/or that you want to persist to disk.
So it's, it's a really, really cool trick, but it also means, uh, usually lots of I/O. I feel like if you measure the number of syscalls that cargo has to do for a no-op build, you will realize that.
So, not only with d-y-lib or dy-lib is it still one big graph.
Cargo has a lot of work to do, to come to the conclusion that there is actually no work to be done. But it's also kind of informational. It's "prefer dynamic linking", it's not "do dynamic linking". You're, you're asking nicely, just like #[inline].
James Munns: I was gonna say like #[inline] versus #[inline(always)].
Amos Wenger: Exactly. It's a global setting saying, "Hey compiler, if you can find any boundaries where you can do dynamic linking instead of static linking, please do that.
And then I guess you find out the result by just looking at your target folder. And if there's a bunch of .sos or .dylibs or DLLs, then I guess it found some.
It's a very active area of development of rustc because the docs for this were wrong for four years and nobody noticed until I submitted a pull request. I love bragging about my little pull requests. I'm like, "I read the docs, nobody did! Or: noticed the mistake, nobody noticed." and now it's correct. So, yay!!! For me!!
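For reference, the setting being discussed is rustc's -C prefer-dynamic codegen flag; a hedged sketch of one way to turn it on for a whole workspace:

```toml
# .cargo/config.toml
[build]
rustflags = ["-C", "prefer-dynamic"]

# After a build, look in target/ for .so / .dylib / .dll files to see
# which boundaries the compiler actually chose to link dynamically.
```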
The fix: compose smaller projects
Amos Wenger: So because building big projects is hard, let's build smaller projects instead. Let's compose smaller projects together so you can pick your own boundaries, if that makes sense, because maybe you don't want libtokio. Maybe you just want like lib-download-files-from-somewhere, or lib-use-object-storage.
James Munns: And, this isn't so weird, it's already kind of like that today: as a maintainer, you have to pick where your crate boundaries are, and when do I split this up into multiple crates, versus when is the orphan rule so annoying that it's worth having one really big crate.
Amos Wenger: Yes. Unfortunately, making that boundary happen is trivial for command line interface binaries. It's trivial for HTTP servers, trivial for gRPC, but it's real tricky with dlopen.
Why is it tricky?
Amos Wenger: Why is it tricky? Because Rust has no stable ABI. That's not a bug, that's a feature. It lets compiler developers experiment with optimizations. They're not, you know, bound to a specific way of doing things. They can, if they need to, change things from one version to the other.
Because in the README for rubicon, I said, "Okay, you have to respect a few rules. One of them is build everything with the same rustc version."
Amos Wenger: And then someone was like, "Well, if you enable the option that randomizes all struct layouts, it would still break." I was like, "Yes, but if you do that... I mean, you're kind of asking for it, literally."
And then the other problem is that globals get duplicated. Because, well, if you trust rustc to do the dynamic linking, you will be disappointed because you have to patch everything. It's going to pick its own boundaries that are not necessarily going to match what you want, and then it's going to rebuild more than you want. Like, it's surprising. You change the call site and it rebuilds libtokio. But if you do it yourself, by design- like if you, I don't know- you have your main program and then you have a module, dynamically loaded module that talks to object storage. So something like Amazon S3. Well, they both have tokio, so you have two copies of tokio now, and the problem is tokio uses globals to work. For example the current runtime.
I think we've already talked about that at some point, I don't know when or if it will be published... But you have that problem where the application starts the tokio runtime, it calls into the module and the module says, "There's no ambient runtime. This thread does not currently have a runtime set up," and they're both right.
So A thinks there's a runtime, B thinks there's no runtime, because there's two copies of the thread-local variable that stores what runtime is currently active- the ambient runtime, as I like to call it.
James Munns: You've now opened yourself to a multiverse. You've gone full Marvel, and now you have a multiverse of worlds where in one universe the globals exist and in the other one they don't, and they get very confused when you try and pretend that there's only one true universe.
Amos Wenger: Yes.
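A minimal sketch of that failure mode, with the module boundary only imagined in comments- in reality the two functions would live in separately linked artifacts, each carrying its own statically linked copy of tokio:

```rust
use tokio::runtime::{Handle, Runtime};

fn main() {
    let rt = Runtime::new().unwrap();
    // Sets "the current runtime" in a thread-local... but only in the
    // app's copy of that thread-local.
    let _guard = rt.enter();

    // Imagine this call crossing into a dlopen'd module that has its
    // own statically linked copy of tokio:
    module_entry_point();
}

fn module_entry_point() {
    // The module consults *its* copy of the thread-local, which was never set.
    match Handle::try_current() {
        Ok(_) => println!("found the ambient runtime"),
        Err(_) => println!("no runtime on this thread"), // this is the branch you hit
    }
}
```

The workaround mentioned below- passing a handle into the module explicitly- works for tokio itself, but as the parking_lot story later shows, it doesn't save you from every duplicated global.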
It gets worse - installs tracing subscriber
Amos Wenger: But it gets worse than that. Because having several copies of tokio- okay, find a way to make them use the same global. You can pass the runtime handle explicitly. That's fine. Here's another scenario.
The app installs a tracing subscriber, which is what you usually do at the beginning of an app. You say like: okay, tracing events should go to the console, or they should go to that file as JSON, or pipe somewhere, to my log pipeline. And then if you have some other modules that also use tracing... again, their version of tracing has a different copy of the global that says: here's what the subscriber is for that process or that thread.
And so their events silently go nowhere. So you're trying to debug why they don't work and you're like, "Hey, we can't enable logging for them, that's weird..." and you're like, "Are we using the wrong syntax? What's happening? Are we not setting the environment variables in the right place?" And then you're like, "Oh, okay. It's just different copies of the global."
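The same shape of sketch for the tracing case- again, the module boundary is imagined, and tracing_subscriber::fmt().init() stands in for whatever the app actually installs:

```rust
fn main() {
    // The app installs a global subscriber -- into *its* copy of tracing's
    // global dispatcher.
    tracing_subscriber::fmt().init();
    tracing::info!("this shows up on the console");

    // A dlopen'd module with its own copy of tracing still has the default
    // no-op dispatcher, so its events silently go nowhere:
    module_entry_point();
}

fn module_entry_point() {
    tracing::info!("this is never seen anywhere");
}
```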
Even worse: panic handlers
Amos Wenger: Even worse: panic handlers. Again, those are process-wide... This one is tricky because, even if you use rubicon, you actually have to use prefer-dynamic, because they need to use the same libstd. Otherwise you have one panic handler per shared object, and anything like color-backtrace, whatever- anything that would record crashes from the application to some sort of web service, like Sentry- wouldn't work if it happened in a module.
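A sketch of why that matters: panic hooks live in libstd's process-wide state, so with one statically linked libstd per shared object, only the copy that installed the hook ever runs it.

```rust
use std::panic;

fn main() {
    // The app wires panics up to its crash reporting
    // (Sentry, color-backtrace, ...):
    panic::set_hook(Box::new(|info| {
        eprintln!("would report this to the crash service: {info}");
    }));

    // A panic inside a module that carries its own libstd runs *that*
    // libstd's default hook instead, and the report never happens --
    // hence needing -C prefer-dynamic so everyone shares one libstd.
}
```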
And something even more fun...
Amos Wenger: And something even more fun. So if you try to work around the fact that modules think that there's no, there's no running or ambient tokio runtime, and you pass a handle explicitly- well, because tokio uses parking_lot, and parking_lot also has globals: it has a global table of threads that need to be woken up in the case of certain events.
And so what happens is that not only can it just hang forever sometimes, because it gets an event from the wrong copy of the parking_lot code, puts it in the wrong global, and something's never checked. It can also sometimes wake up the wrong thread, and I've had memory corruption, because there's some unsafe code in there that assumes that there's only one copy of the globals, which is a perfectly reasonable assumption to make, to be fair... but it's just not true anymore. So yeah, uh, that was fun.
James Munns: You're really selling me on dynamic linking right now.
Amos Wenger: It's just- it's, if you thought you liked Rust, but you missed all the memory unsafety and the random crashing.
Supposed to be a quick hack...
Amos Wenger: So, it was supposed to be a quick hack, all right. I thought I would do just enough crimes and just be careful, which is something I say never works... it's a typical case of do as I say, not as I do.
You should use a memory safe language. You can't just be careful because everybody makes mistakes. "Just be careful" is not a plan. It's wishful thinking. But it was going to be enough for me because I'm special, right? And, uh, no, I just spent eight weeks debugging this. Mostly memory corruption and hangs...
James Munns: Think of all the learning! There was so much learning going on!
Amos Wenger: I had to relearn to use a debugger. I'm on macOS now, so GDB is kind of a fifth-class citizen. So I have to use LLDB. And yeah, you- you learn about internals. And then the last two weeks of that were actually a tokio bug.
I was getting weird crashes around intrusive linked lists for two weeks and I posted about it on Discord because that's what I do, all right? I post completely incomprehensible screenshots and then people just kind of go, "Oh he's cooking something!"
James Munns: It's one of those things though, when you get used to Rust, especially when you're writing unsafe code: if something goes wrong, you go, "It must be me," like, "Other people surely would have done the right thing, and..." so it's always such a surprise when you go, "Oh no, there was just a weird edge case-" which is such a- that's almost the default assumption for my days in like C and things like that.
You go, "Well, someone must have screwed up," but in Rust, it feels like it's the opposite. You go, "_I_ must have screwed up somewhere."
Amos Wenger: Well, not only that, but also I was doing pretty explicit crimes, right? Anything that went wrong at this point, especially memory corruption related, must have been because I was, like, trying to deduplicate globals.
But no, what happened was: when I forked tokio to add support for rubicon, there was a bug in the default branch. At some point, I was just checking the issue tracker... And I was like, "Wait- wait- wait a fucking second, when was anyone gonna ping me about this?"
I tweeted about it. This is how it works, right? You just tweet a screenshot and then people ping you when it's fixed. Right? That, that's, that's my normal experience.
So now that it actually works, I'm pretty confident about it. I've deployed the version of my website that uses that maybe a month ago. Uh, I present rubicon, which is just a way to import and export globals.
You just wrap things in macros. You wrap your thread-locals and your process-locals... nobody talks like this. I just like the symmetry, you know?
James Munns: You're inventing precise vocabulary for a problem that you are the explorer of.
Amos Wenger: A problem I made for myself, yes. So rubicon works through source-level patching. I have patches ready for parking_lot, tokio, tracing, and eyre- that's e-y-r-e. Those are very small patch sets.
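Roughly the shape of that patching, going from memory of the rubicon README- the macro names, exact syntax, and required build setup should be checked against the README itself:

```rust
// Instead of declaring its globals directly, a patched crate wraps them in
// rubicon's macros; depending on whether the export-globals or import-globals
// cargo feature is enabled, the same source either exports the symbol (in the
// app) or imports it (in a module). Illustrative globals, not tokio's real ones:
use std::cell::Cell;

rubicon::thread_local! {
    static REQUESTS_HANDLED: Cell<u64> = Cell::new(0);
}

rubicon::process_local! {
    static STARTED_AT_UNIX_MS: u64 = 0;
}
```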
The rubicon README goes into the specific setup you have to do if you want to use this. I'm not aware of anyone else using this, but everyone's missing out because my link times are great now. I used to have to wait like... I don't know... 12, 20 seconds just for the damn thing to assemble itself so I could run it.
And now, some modules, yeah, they just link in less than a second and, uh, I can restart the app and it works great. I could do actually dynamic hot reloading, but I'm, I'm not actually doing it. That's, uh...
James Munns: That's spooky for another day.
End result: I'm having fun again
Amos Wenger: Yes, but the end result is: I'm having fun working on my site again.
I've discovered so many things that I actually did in the templating language, like liquid... I migrated to minijinja. That's another episode we can do. There's so many weird hacks I did as either client-side JavaScript or in the templating language, because I didn't want to touch the Rust code again, because it was so slow to compile and deploy and everything. But now I'm able to ship lots of small changes, because testing a new change in debug- I think, for the app itself, if I change the binary, the executable crate, it takes, I think, three seconds to compile and link and then reload. And then it's building and loading modules at runtime in the order that it needs them.
And also preemptively before it actually needs them. So it kind of warms up the cache as it does that. And the idea was that in CI, I could upload the built version of those modules somewhere, and then just download them, because it's not like the image encoder changes every week, right?
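Not rubicon's actual loading code, but the general shape of dlopen-style module loading in Rust, sketched with the libloading crate and made-up module/entry-point names:

```rust
use libloading::{Library, Symbol};

fn load_module(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // dlopen the module that was built separately (or downloaded pre-built).
        let lib = Library::new(path)?;
        {
            // Look up an agreed-upon entry point and call it.
            let entry: Symbol<unsafe extern "C" fn()> = lib.get(b"module_init")?;
            entry();
        }
        // Leak the handle so the module's code stays mapped for the
        // lifetime of the process.
        std::mem::forget(lib);
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    load_module("modules/libmod_object_storage.so")
}
```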
James Munns: So you still have your three or four hundred dependencies, they're just sharded into 7 or 8 reasonable blocks, and it's rare that anything other than the leaf block that is your actual site is what actually changes.
Amos Wenger: That is exactly it. And it also means, because each module is its own layer in the container image that you deploy, when they don't change, they don't change. You don't get like one layer with the whole several-hundred-megabyte executable that changes. You get like blobs between 2 and 12 megabytes that just stay exactly the same across deploys. And deploying is now really fast. I have a server in Singapore, which is usually pretty slow to deploy to.
Future steps: kill this crate
Amos Wenger: Real quick, future steps: the future step is to kill this crate, because it shouldn't exist. It should be a single flag called globals-linkage, import or export. And if it was a single flag, then it could keep name mangling, which rubicon completely disables, because I've tried my best, but I don't think there is a way to abuse current Rust syntax to export and import globals with name mangling.
I may be wrong. But, I want to get people more interested in it because the benefits are really cool. And it could also be something you only use in development, but then in production, you do a static build. I think this should be a built-in Rust feature because it would be really, really cool.
James Munns: I really like the trailblazing. Like, the trailblazing is super fun because you go: look, these are the biggest problems, and other people can come up with even better solutions, but at least I've walked down the road far enough to go, "That's a problem, that's a problem, that's a problem," ... even if the solution's incomplete or requires you to source-patch things- which isn't unreasonable, honestly, for people maintaining an application.
Like, I sometimes carry dependency source patches for a fairly long time because it- you know, my solution that works great for me isn't reasonable for other people, but it's something that cargo already has handles for. So I like the idea of: let's play around with this and then kill it and get it upstreamed because you're certainly not the only one who wants or needs this.
So I think it's about being able to show it off and getting people to take it for a spin. I've seen some of your builds where you just go: save! And it's deployed three seconds later. Which- I have a customer project right now that's in a similar size, like three, four hundred deps, and our main crate with all the business logic is still one large crate. And even just running tests.
I changed one line of a test and it took like nine seconds to build on my silly fast MacBook. And I'm like: Oh no, this is... Coming from mostly embedded stuff, this is the first time where I go: Oh yeah, I see that problem. And I could definitely use something like this.
Amos Wenger: Yeah, so contributing to rustc is probably going to take a hot minute just because... well, you need to get it right. It needs to be useful to everyone, not just like your weird thing in a basement. So in the meantime, I think I'm on good terms with the tokio developers, and tracing's, and others.
So I'm going to try to get those merged, because it's completely opt-in: if you don't enable the import-globals or export-globals features, it's not gonna mean extra dependencies. It's gonna be the exact same code as before.
Other people, mostly in game development, have looked into that. I just want to encourage people to write about what you're doing, because then when others try it, you go a little further each time. You're standing on the shoulders of giants, except the giants are like... someone who blogged about it two years ago.
James Munns: Very cool.
Episode Sponsor
The Self-Directed Research podcast is made possible by our sponsors. We offer a 30-second host-read ad at the end of every episode. Not sure how to get your message out, or what to say? Let us help!
If you'd like to promote your company, project, conference, or open job positions to an audience interested in programming and technical deep dives, send us an email to contact@sdr-podcast.com for more information about sponsorship.
Thanks to all of our sponsors for their support.