WEBVTT

NOTE
This file was generated by Descript <www.descript.com>

00:00:13.887 --> 00:00:16.977
<v Amanda Majorowicz>This is Self-Directed
Research, where our hosts, James

00:00:16.977 --> 00:00:19.947
and Amos, give each other 20-minute
presentations that usually last

00:00:19.947 --> 00:00:22.777
longer than that about whatever
they've been obsessing over lately.

00:00:23.087 --> 00:00:27.647
For more episodes, show notes, and
transcripts, visit sdr-podcast.com.

00:00:27.977 --> 00:00:29.717
New episodes drop every Wednesday.

00:00:30.737 --> 00:00:33.387
Today's episode is brought to
you by the Ladybird web browser.

00:00:33.677 --> 00:00:38.157
Check out the description or our show
notes at sdr-podcast.com and listen to the

00:00:38.157 --> 00:00:39.937
end of the episode for more information.

00:00:40.287 --> 00:00:43.257
This week, Amos presents
"Thread Locals Galore."

00:00:48.380 --> 00:00:50.780
<v Amos Wenger>We're getting better at
this, but I hate how awkward I am.

00:00:50.990 --> 00:00:51.580
I'm impressed.

00:00:51.630 --> 00:00:54.220
So I spent the whole morning working
on a video script, and that was

00:00:54.220 --> 00:00:55.660
very smooth and very well spoken.

00:00:55.870 --> 00:00:58.170
Then I sign on to this call
with two Americans, and I'm

00:00:58.170 --> 00:00:59.250
like, "Oh, is my accent okay?

00:00:59.250 --> 00:01:00.160
Can they understand me?

00:01:00.490 --> 00:01:02.080
They can tell I'm French,
and I'm like, I know it!"

00:01:02.185 --> 00:01:03.535
<v James Munns>You were very well spoken.

00:01:03.755 --> 00:01:04.205
<v Amanda Majorowicz>Oh yeah.

00:01:04.620 --> 00:01:06.410
<v Amos Wenger>Yes, thank you.

00:01:06.703 --> 00:01:08.793
<v James Munns>So do you have
like your punchy sentence of

00:01:08.793 --> 00:01:10.653
like, "I'm gonna focus on this"?

00:01:10.963 --> 00:01:13.593
<v Amos Wenger>No, I'm not
going to focus on anything ever.

00:01:13.913 --> 00:01:14.483
That's your thing.

00:01:15.273 --> 00:01:18.183
No, I'm jealous of your slides because
I have actual bullet points still.

00:01:18.183 --> 00:01:21.653
You have like, catchy sentences
that drive you on each slide.

00:01:21.653 --> 00:01:23.703
But at least I'm looking at
the presenter view right now so

00:01:23.703 --> 00:01:24.813
I know what's coming up next.

00:01:25.093 --> 00:01:28.443
But mostly I make slides so
that I can ignore them later.

00:01:29.263 --> 00:01:29.653
<v James Munns>Yeah.

00:01:30.193 --> 00:01:32.833
I make slides that are like
subtitles basically, and they're

00:01:32.833 --> 00:01:34.303
just punchy one sentences.

00:01:34.453 --> 00:01:37.903
So like yours, if I had bullet points,
I'd just make like nine slides for that.

00:01:38.348 --> 00:01:39.688
<v Amos Wenger>Yeah, yeah, exactly.

00:01:39.953 --> 00:01:43.183
<v Amanda Majorowicz>Well, and your, your
thread, it says "thread" in the title.

00:01:43.193 --> 00:01:44.253
So that's the thread.

00:01:45.428 --> 00:01:48.398
<v Amos Wenger>It's, well,
it's thread-locals.

00:01:48.418 --> 00:01:49.318
It's a single unit.

00:01:49.343 --> 00:01:50.703
<v Amanda Majorowicz>Do I
know what I'm talking about?

00:01:50.703 --> 00:01:51.123
No.

00:01:51.333 --> 00:01:51.723
The end.

00:01:51.823 --> 00:01:53.378
<v Amos Wenger>No, neither do I.

00:01:53.798 --> 00:01:56.748
Here's the other thing, is that
I'm always talking about something

00:01:56.748 --> 00:01:58.205
that I just found out about.

00:01:58.295 --> 00:02:02.855
So, most likely this podcast is going to
be a whole series of, "Never mind what I

00:02:02.875 --> 00:02:06.165
said last episode, I didn't know what I
was talking about, I have been corrected."

00:02:06.360 --> 00:02:07.460
<v James Munns>Engagement bait!

00:02:07.710 --> 00:02:08.960
Engagement bait!

00:02:09.030 --> 00:02:10.000
<v Amos Wenger>But not on purpose!

00:02:10.030 --> 00:02:11.140
I'm just learning things.

00:02:11.450 --> 00:02:13.910
I got something wrong in a
video and it's been haunting me.

00:02:13.910 --> 00:02:16.250
I haven't published anything
in months because I'm

00:02:16.250 --> 00:02:17.770
like, I got it really wrong.

00:02:17.800 --> 00:02:19.120
And what do you do?

00:02:19.130 --> 00:02:20.730
You're not going to delete the video.

00:02:20.870 --> 00:02:23.590
YouTube doesn't let you just add an erratum.

00:02:23.610 --> 00:02:25.873
I have a pinned comment,
but nobody reads those.

00:02:25.883 --> 00:02:27.253
So I just have to live with it.

00:02:27.273 --> 00:02:30.063
<v James Munns>Well, we can add a section
to the front of these eventually of just

00:02:30.063 --> 00:02:32.113
"Corrections From Previous Episodes."

00:02:32.153 --> 00:02:32.423
<v Amos Wenger>Yeah.

00:02:32.423 --> 00:02:33.323
Yeah, we can do that.

00:02:33.323 --> 00:02:34.793
<v Amanda Majorowicz>Yeah, just
like in Time magazine, they're

00:02:34.793 --> 00:02:36.463
like, "Yes, corrections..."

00:02:36.833 --> 00:02:39.573
Did somebody point it out to you
though, the mistake or whatever?

00:02:39.633 --> 00:02:40.953
<v Amos Wenger>Yes, a friend.

00:02:41.203 --> 00:02:41.703
<v Amanda Majorowicz>Oh!

00:02:41.703 --> 00:02:45.173
<v Amos Wenger>A friend who I hadn't
spoken to in weeks, uh, messages me

00:02:45.173 --> 00:02:49.153
on Signal saying, "Uh oh." I was like,
"What do you mean, 'Uh oh'?" It was like,

00:02:49.213 --> 00:02:52.053
"Well, this is not how TCP/IP works."

00:02:52.053 --> 00:02:52.933
I was like, "Oh, well..."

00:02:53.323 --> 00:02:56.423
<v James Munns>It's funny, I posted my
DMA slides in the chat and someone

00:02:56.423 --> 00:02:57.513
"Well actually'd" me on that.

00:02:57.513 --> 00:03:00.363
They're like, "Is the peripheral-"
So like, this is the whole thing

00:03:00.363 --> 00:03:02.963
that I didn't get into- You get into
these things called interconnect

00:03:02.963 --> 00:03:06.853
matrices, where there's actually
multiple buses and multiple accessors.

00:03:06.883 --> 00:03:10.593
So you have the accessors, like the CPU
and the DMA like this, and then there's

00:03:10.603 --> 00:03:12.173
the different buses that they can access.

00:03:12.203 --> 00:03:15.673
And there's actually- a big part
of this arbitration is just, which

00:03:16.063 --> 00:03:18.133
matrices are you going through?

00:03:18.133 --> 00:03:21.763
And in a lot of current
architectures, the peripheral bus is not

00:03:21.973 --> 00:03:23.653
something you directly connect to.

00:03:23.893 --> 00:03:27.243
It's actually there's an adapter- so
that memory bus that I talked about

00:03:27.248 --> 00:03:30.663
was something called the AHB or the
high-speed bus, and then there's

00:03:30.663 --> 00:03:32.253
the APB or peripheral- I dunno.

00:03:32.523 --> 00:03:32.853
Yes.

00:03:32.853 --> 00:03:33.903
I had people "well actually" me.

00:03:33.908 --> 00:03:37.063
But then it was funny 'cause the room
circled around to even more "well,

00:03:37.063 --> 00:03:40.703
actually-ing" where we started, "well
actually," and we realized that like, even

00:03:40.713 --> 00:03:45.035
on common microcontrollers, they're all
implemented totally radically differently.

00:03:45.236 --> 00:03:48.366
<v Amos Wenger>For the benefit of the
listener, if this is kept, I hope it is:

00:03:48.436 --> 00:03:51.366
uh, alt text for looking at James' hands.

00:03:51.476 --> 00:03:56.456
James' hands briefly became an
interconnected matrix of various buses.

00:03:56.746 --> 00:03:58.906
<v James Munns>Four fingers,
one vertical, one horizontal.

00:03:58.956 --> 00:03:59.156
Yeah.

00:03:59.156 --> 00:03:59.574
It's a matrix.

00:04:00.436 --> 00:04:01.256
<v Amos Wenger>Yes, of course.

00:04:01.996 --> 00:04:05.456
But it was not any clearer to me,
looking at the hands versus not having

00:04:05.456 --> 00:04:06.486
the hands, if that's any comfort.

00:04:06.696 --> 00:04:07.876
<v James Munns>I'll send you some diagrams.

00:04:07.896 --> 00:04:10.656
'Cause we went and looked up
reference manuals, like data

00:04:10.656 --> 00:04:11.706
sheets and stuff like that.

00:04:11.926 --> 00:04:14.646
'Cause some of them do explain
how those are wired up.

00:04:14.646 --> 00:04:17.206
But yeah, this is, I guess
the pre-talk for the...

00:04:18.186 --> 00:04:19.476
we have a post-talk, I guess.

00:04:19.496 --> 00:04:19.976
I don't know.

00:04:20.291 --> 00:04:20.881
<v Amos Wenger>That's fine.

00:04:20.881 --> 00:04:23.151
That encourages the people to
actually listen to every episode.

00:04:23.151 --> 00:04:24.681
'Cause I was like, "Oh, I missed
something in the previous one."

00:04:24.681 --> 00:04:25.791
So that's, that's smart as well.

00:04:26.021 --> 00:04:29.301
I should do the same and I kind
of have the community to do that

00:04:29.311 --> 00:04:33.021
now, even though I've pulled people
from all sorts of different places.

00:04:33.271 --> 00:04:36.611
So actually no, like maybe one
of them if they're awake at that

00:04:36.611 --> 00:04:38.531
time, but not everyone like you.

00:04:39.031 --> 00:04:42.301
And also I like the effect
of surprise, so even just the

00:04:42.301 --> 00:04:43.521
slide deadline that we've set.

00:04:43.521 --> 00:04:46.641
I'm like, "Ah, he's gonna read my slides
and then the surprise is gonna be gone."

00:04:47.341 --> 00:04:49.531
Like I want to go next
line and people go, "Whoa!"

00:04:49.611 --> 00:04:50.511
See, I want that.

00:04:50.591 --> 00:04:51.981
That's, that's what I
aim for in my videos.

00:04:52.071 --> 00:04:55.222
Also if I talk about something that, uh,
a project I haven't finished, I already

00:04:55.222 --> 00:04:58.442
get the reward for doing it, even though
it's not finished, and then I give it up.

00:04:59.192 --> 00:05:01.032
Which is not the case here,
because there's a lot of

00:05:01.032 --> 00:05:02.212
social pressure to show up.

00:05:02.327 --> 00:05:04.117
<v James Munns>My hope for this
is that it's a dopamine drip.

00:05:04.450 --> 00:05:07.687
It's been super, super nice to just
ping you, like, once a week and be like,

00:05:08.007 --> 00:05:09.427
"Alright, I got three things on my mind.

00:05:09.427 --> 00:05:10.147
Which one's interesting?"

00:05:10.147 --> 00:05:10.817
And you're like, "This one."

00:05:10.817 --> 00:05:11.687
And I'm like, "Dope!"

00:05:11.697 --> 00:05:12.487
That's all I needed.

00:05:12.487 --> 00:05:15.597
I just need someone to tell me
that they wanted to hear me talk

00:05:15.597 --> 00:05:17.957
about it and now all of a sudden
I'm fired up to talk about it.

00:05:18.178 --> 00:05:18.778
<v Amos Wenger>Yes.

00:05:18.778 --> 00:05:22.538
I think my reply was one word, so it's
proof that it really doesn't take much.

00:05:22.663 --> 00:05:22.913
<v James Munns>Yeah.

00:05:23.163 --> 00:05:24.033
Captive audience.

00:05:25.108 --> 00:05:25.658
<v Amos Wenger>All right.

00:05:26.118 --> 00:05:30.111
Well, mutually assured slides or whatever.

00:05:31.081 --> 00:05:34.601
Oh, this should have
been the podcast name!

00:05:34.601 --> 00:05:37.411
Mutually Assured
Presentation or something.

00:05:37.816 --> 00:05:38.276
<v James Munns>There you go.

00:05:39.061 --> 00:05:39.801
<v Amos Wenger>Oh, wow.

00:05:40.331 --> 00:05:44.651
All right, today I want to talk
about thread-locals, um, again.

00:05:46.276 --> 00:05:48.366
<v James Munns>The descent of
madness into thread-locals.

00:05:49.241 --> 00:05:51.935
<v Amos Wenger>So the title of my
presentation is "Thread-locals Galore,"

00:05:52.275 --> 00:05:55.815
and the subtitle is "the only thing
more evil than one singleton is

00:05:55.815 --> 00:05:57.615
multiple copies of that same singleton."

00:05:58.485 --> 00:05:58.895
Obviously.

00:05:59.175 --> 00:06:00.345
So let's talk about variables.

00:06:00.855 --> 00:06:03.325
Variables might not be variables,
some of them are const.

00:06:03.435 --> 00:06:04.345
Uh, don't pay attention.

00:06:04.345 --> 00:06:05.805
That's just how we name
things around here.

00:06:07.530 --> 00:06:11.200
There's such a thing as local variables,
which are usually stored on the stack,

00:06:11.410 --> 00:06:14.257
unless you're not looking and then the
optimizer might put them in registers.

00:06:14.487 --> 00:06:17.827
Unless you want to debug and then
it's using debug information to

00:06:17.827 --> 00:06:20.417
know which register to look in and
pretend it's actually on the stack.

00:06:20.467 --> 00:06:20.887
Whatever.

00:06:21.137 --> 00:06:22.257
Then there's heap allocation.

00:06:22.407 --> 00:06:26.107
The heap, as opposed to the stack,
is a large area where we put things,

00:06:26.157 --> 00:06:30.117
so kind of the same, but we don't put
them in order, as the name implies.

00:06:30.117 --> 00:06:32.667
It's kind of disorganized, that's why
it's called 'heap' but actually it's

00:06:32.667 --> 00:06:35.767
very very well organized because the
allocator knows where everything is.

00:06:35.767 --> 00:06:38.277
You can free things, you can allocate
things, even if you didn't know the

00:06:38.277 --> 00:06:41.607
size at compile time, you can decide the
size at runtime like: Oh, this time it's

00:06:41.607 --> 00:06:43.807
going to be an array of 128 elements.

00:06:43.807 --> 00:06:45.027
And this time it's going to be 64.

00:06:45.027 --> 00:06:46.867
And I didn't know that at
compile time, but it's okay.

00:06:46.867 --> 00:06:49.677
'Cause the allocator is here
to find me some free space.

00:06:49.677 --> 00:06:50.967
If there's not too much fragmentation.
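
NOTE
A minimal Rust sketch of the stack-versus-heap point above (the helper names are made up for illustration): a length that is only decided at runtime goes through the heap allocator, while a size known at compile time can sit on the stack.
```rust
// Hypothetical helpers, for illustration only.
fn make_buffer(len: usize) -> Vec<u8> {
    // `len` is a runtime value, so the bytes are requested from the heap allocator.
    vec![0u8; len]
}
fn fixed_array() -> [u8; 64] {
    // The size is a compile-time constant, so this value can live on the stack.
    [0u8; 64]
}
```
So `make_buffer(128)` one time and `make_buffer(64)` the next both work, without either size being known when the program was compiled.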

00:06:50.967 --> 00:06:54.237
<v James Munns>You could do that on
the stack too, in C, but not Rust.

00:06:54.247 --> 00:06:56.157
'Cause C has a cool
thing called `alloca()`.

00:06:56.437 --> 00:06:59.120
Which is basically like- it's using
your stack as a bump allocator.

00:06:59.360 --> 00:07:02.810
And the reason that Rust doesn't expose
it is because it's wildly dangerous.

00:07:03.130 --> 00:07:06.620
And uh, maybe a little less dangerous
in Rust where we have real lifetimes

00:07:06.620 --> 00:07:07.480
where you'd be able to tell.

00:07:07.480 --> 00:07:10.980
But a lot of people just avoid it
because it's very easy to accidentally

00:07:10.980 --> 00:07:14.820
hand back a pointer to your stack,
and: oops, stack corruption.

00:07:14.860 --> 00:07:18.353
But yeah, heap is usually where we put
the dynamically-sized stuff now, at least.

00:07:18.387 --> 00:07:18.677
<v Amos Wenger>Yeah.

00:07:18.697 --> 00:07:21.182
See, that's not even in my bullet
points, but yes, that is correct.

00:07:21.202 --> 00:07:22.925
I remember, uh, VLAs,

00:07:22.925 --> 00:07:24.835
variable-length arrays, from my C days.

00:07:25.115 --> 00:07:28.305
I used to make a language that compiled
down to C, so that was, that was a thing.

00:07:28.615 --> 00:07:31.195
Next up, so we talked about locals,
we talked about heap allocations.

00:07:31.550 --> 00:07:34.910
Third category, statics: stored in
memory-mapped sections of the executable.

00:07:35.060 --> 00:07:38.130
Surprise, executables are files,
but they're also mapped in memory.

00:07:38.400 --> 00:07:42.090
And some part of them is code,
which we call the text section,

00:07:42.090 --> 00:07:43.110
because of course we do.

00:07:43.400 --> 00:07:47.424
And then, data and constants and whatever
are stored in other sections, whose

00:07:47.424 --> 00:07:49.234
names I forgot because I didn't study.

00:07:49.704 --> 00:07:51.534
I don't want to call them statics.

00:07:51.534 --> 00:07:55.164
I want to call them process-local
because their lifetime is

00:07:55.164 --> 00:07:56.314
essentially that of the process.

00:07:56.324 --> 00:08:00.124
So the executable is mapped into
memory and then started, and then you

00:08:00.124 --> 00:08:04.374
can be sure that that memory area is
reserved from the beginning of the

00:08:04.374 --> 00:08:06.624
process to the end, to its death.
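
NOTE
A small Rust sketch of the "process-local" idea, using a hypothetical `HITS` counter: a `static` has exactly one copy for the whole process, so increments made from every thread land in the same storage.
```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
// One copy for the whole process, alive from start to exit.
static HITS: AtomicUsize = AtomicUsize::new(0);
fn hit_from_threads(n: usize) -> usize {
    // Every spawned thread increments the very same static.
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| { HITS.fetch_add(1, Ordering::SeqCst); }))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    HITS.load(Ordering::SeqCst)
}
```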

00:08:07.024 --> 00:08:07.664
And then.

00:08:08.244 --> 00:08:12.376
Fourth category, we have thread-local
storage, which is a lie, largely.

00:08:12.527 --> 00:08:18.432
It's essentially just like
process-local storage, except relative to some

00:08:18.432 --> 00:08:21.822
address and that address changes every
time we switch to a different thread.

00:08:22.262 --> 00:08:27.962
And that is arranged either by the
kernel or the kernel or the kernel

00:08:27.962 --> 00:08:30.352
using different registers depending
on the architecture.

00:08:30.465 --> 00:08:32.965
The last time I studied
this, I was on 64-bit Linux.

00:08:33.215 --> 00:08:34.205
That's a few years ago.

00:08:34.495 --> 00:08:37.675
And, uh we used the FS
segment register back then.

00:08:37.685 --> 00:08:39.565
I have no idea what's
happening on 32-bit Linux.

00:08:39.575 --> 00:08:40.505
It's probably different.

00:08:40.725 --> 00:08:44.465
I am now on 64-bit ARM macOS.

00:08:44.960 --> 00:08:51.504
And ChatGPT told me that the
TPIDR_EL0 register is being used.

00:08:51.784 --> 00:08:52.664
James has input?

00:08:52.664 --> 00:08:52.884
No?

00:08:53.476 --> 00:08:54.756
<v James Munns>Well, this
is all desktop stuff.

00:08:54.756 --> 00:08:57.236
I hang out in microcontrollers-
microcontrollers don't have thread-local

00:08:57.236 --> 00:08:58.516
storage because we don't have threads.

00:08:58.626 --> 00:09:00.496
So, makes sense to me.

00:09:00.636 --> 00:09:01.256
<v Amos Wenger>Lucky you.

00:09:01.877 --> 00:09:02.257
But basically yeah.

00:09:02.952 --> 00:09:05.787
Every thread-local has an offset.

00:09:05.787 --> 00:09:07.587
So some of it you can
determine statically.

00:09:07.587 --> 00:09:11.077
Let's say you have three variables
in your executable and they're

00:09:11.077 --> 00:09:12.317
all marked as thread-local.

00:09:12.337 --> 00:09:15.617
So the compiler annotates
those accordingly, the linker

00:09:15.617 --> 00:09:16.417
puts everything together.

00:09:16.417 --> 00:09:20.417
It adds up all the thread-locals that
everybody knows about from all the objects

00:09:20.427 --> 00:09:23.597
and then makes room for them in the
executable and then you have room for all

00:09:23.627 --> 00:09:27.927
these, but then you can also, of course,
allocate thread-locals dynamically,

00:09:28.067 --> 00:09:30.657
by asking the operating system nicely.

00:09:30.952 --> 00:09:33.742
Which is the libc, which is the
allocator, which is the threads runtime.

00:09:33.742 --> 00:09:34.592
It's all the same thing.

00:09:34.832 --> 00:09:38.012
You, you think there's such a thing
as like different libraries on your

00:09:38.012 --> 00:09:39.812
Linux system, but it's all libc, right?

00:09:39.822 --> 00:09:44.494
It's libc and whatever
UI framework you're using.

00:09:44.494 --> 00:09:44.814
Cool.

00:09:44.924 --> 00:09:46.874
So that's easy, right?

00:09:46.964 --> 00:09:47.824
Uh, pretty much.

00:09:47.924 --> 00:09:52.084
You have all those thread-locals
in the same block, and there's one

00:09:52.084 --> 00:09:55.164
of these blocks per thread, and
you just change the base address

00:09:55.184 --> 00:09:56.224
whenever you change the thread.

00:09:56.224 --> 00:09:57.194
You don't even have to worry about it.

00:09:57.194 --> 00:09:58.054
You don't actually do it.

00:09:58.054 --> 00:09:58.854
The kernel does it.
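
NOTE
For contrast with a `static`, a sketch using the standard `thread_local!` macro (the `COUNTER` and helper names are illustrative): each thread gets its own copy, so a freshly spawned thread starts from zero no matter what the current thread did.
```rust
use std::cell::Cell;
use std::thread;
thread_local! {
    // Each thread sees its own copy, initialized to 0.
    static COUNTER: Cell<u32> = const { Cell::new(0) };
}
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}
fn per_thread_counts() -> (u32, u32) {
    bump();
    bump();
    let here = bump(); // third bump on the current thread
    // A new thread bumps its own, fresh copy exactly once.
    let there = thread::spawn(bump).join().unwrap();
    (here, there)
}
```
Called on a thread that has not touched `COUNTER` yet, `per_thread_counts()` returns `(3, 1)`.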

00:09:59.204 --> 00:10:00.424
And that's it, right?

00:10:00.464 --> 00:10:00.814
Wrong.

00:10:00.824 --> 00:10:05.034
Because, we have to worry about
the case where the thread ends.

00:10:05.134 --> 00:10:06.094
Those things happen.

00:10:06.374 --> 00:10:09.834
Uh, you start a thread, and then it
ends, and in Rust, we have a thing

00:10:10.454 --> 00:10:14.232
that has been widely regarded as
a bad idea and made everyone angry.

00:10:14.622 --> 00:10:15.452
It's the Drop trait.

00:10:16.092 --> 00:10:17.022
No, it's a very good idea.

00:10:17.722 --> 00:10:18.372
James is frowning.

00:10:18.622 --> 00:10:21.252
We have the Drop trait, which
means it's a destructor.

00:10:21.262 --> 00:10:24.001
Whenever you drop a value,
nobody owns it anymore.

00:10:24.001 --> 00:10:25.271
Nobody has a reference to it anymore.

00:10:25.281 --> 00:10:26.261
It expires.

00:10:26.261 --> 00:10:27.341
It falls out of scope.

00:10:27.591 --> 00:10:30.221
Then you can run some code to
clean some things up, which is

00:10:30.221 --> 00:10:35.181
very useful if that value is your
view of some hardware resource.

00:10:35.191 --> 00:10:37.421
So like, I don't know, a network
connection, an open file.

00:10:37.700 --> 00:10:40.340
<v James Munns>So yeah, I guess the
difference between these and statics-

00:10:40.340 --> 00:10:44.160
so statics are interesting because they
live forever, so as far as you're

00:10:44.510 --> 00:10:49.070
concerned, like, the destructor will never
be run on a static, but thread-locals

00:10:49.090 --> 00:10:53.040
have to both live forever, at least
apparently to each thread they have to

00:10:53.040 --> 00:10:56.980
live forever, except for when the thread
dies, but like, so it has to live forever

00:10:56.980 --> 00:10:58.580
except for until it doesn't, I guess.

00:10:58.635 --> 00:10:59.765
<v Amos Wenger>Yes, exactly.

00:11:00.085 --> 00:11:04.375
In fact, inside the internals for
thread-locals in Rust, there's

00:11:04.375 --> 00:11:07.685
a thing that says: Well, we say
`'static`, but it's not actually true.

00:11:07.685 --> 00:11:10.985
It's actually slightly shorter than
the lifetime of even the thread, which

00:11:10.985 --> 00:11:13.595
is already less than the lifetime
of the process, because we need to

00:11:13.605 --> 00:11:16.505
run the destructor at some point,
and then that'll be the end of that.

00:11:16.895 --> 00:11:18.675
And so `'static` is a
double lie in this case.

00:11:19.305 --> 00:11:20.425
So that's the first complication.

00:11:20.435 --> 00:11:25.630
Some types implement Drop, so we can't
just like, free the underlying storage

00:11:25.820 --> 00:11:27.070
and just not refer to it anymore.

00:11:27.070 --> 00:11:30.820
We have to know the type
of every variable that's...

00:11:30.890 --> 00:11:33.540
every thread-local and run their
destructor if they have one.
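
NOTE
A sketch of the Drop complication, assuming a hypothetical `Noisy` type whose destructor flips a process-wide flag: the thread-local's destructor runs when the spawned thread exits. Rust's documentation only promises a best effort here (destructors may not run in some situations, such as on the main thread on some platforms), so treat this as typical behavior on common desktop targets.
```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
static DROPPED: AtomicBool = AtomicBool::new(false);
struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) {
        // Runs when the owning thread exits, long before the process does.
        DROPPED.store(true, Ordering::SeqCst);
    }
}
thread_local! {
    static GUARD: Noisy = Noisy;
}
fn drop_runs_on_thread_exit() -> bool {
    thread::spawn(|| {
        // The first access initializes this thread's slot and registers its destructor.
        GUARD.with(|_| {});
    })
    .join()
    .unwrap();
    DROPPED.load(Ordering::SeqCst)
}
```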

00:11:34.240 --> 00:11:37.750
Complication number two, related to
birth rather than death this time:

00:11:37.860 --> 00:11:42.260
sometimes you know exactly what the byte
pattern of some value is going to be.

00:11:42.260 --> 00:11:46.510
So if you initialize a signed
64-bit integer to 42, you know

00:11:46.510 --> 00:11:47.440
exactly what that looks like.

00:11:47.440 --> 00:11:49.340
You can bake that into the
executable, you can memory-map that.

00:11:50.661 --> 00:11:51.941
It's const, it's beautiful.

00:11:51.951 --> 00:11:54.181
You know exactly what it's going
to be, but sometimes you don't.

00:11:54.191 --> 00:11:57.621
Sometimes you have to
run some code at runtime.

00:11:57.641 --> 00:11:59.141
I'm going to say "run at runtime" a lot.

00:11:59.221 --> 00:11:59.781
That's okay.

00:12:00.131 --> 00:12:00.931
It's part of the process.

00:12:01.231 --> 00:12:03.591
You have to run some code at
runtime to figure out what the

00:12:03.591 --> 00:12:04.771
byte pattern is going to be.

00:12:05.071 --> 00:12:06.491
It's not const.

00:12:06.501 --> 00:12:08.831
It cannot be evaluated at compile-time.

00:12:08.851 --> 00:12:10.251
It has to be evaluated at runtime.

00:12:11.181 --> 00:12:14.631
In which case, what you do is
hopefully you know how much storage

00:12:14.631 --> 00:12:15.881
you need, so you reserve that.

00:12:16.071 --> 00:12:18.951
And then you have a thing that
every time someone tries to access

00:12:18.951 --> 00:12:22.251
that thread-local, it goes, "Wait
a minute, let's check the state.

00:12:22.251 --> 00:12:23.601
Has it already been initialized?

00:12:23.826 --> 00:12:26.396
If not, let's run the constructor."

00:12:26.476 --> 00:12:28.176
(I guess in that case, even
though we don't explicitly

00:12:28.176 --> 00:12:29.296
have constructors in Rust).

00:12:29.606 --> 00:12:32.816
"Let's run the initialization code,
which is going to put the initial byte

00:12:32.816 --> 00:12:34.186
pattern in place, doing whatever it needs to do.

00:12:34.466 --> 00:12:36.236
And then you get to actually borrow it."
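
NOTE
The two kinds of initialization, sketched with `thread_local!` (all names are illustrative): `const { 42 }` is known at compile time, while `expensive_default()` is an ordinary function that runs lazily, once per thread, on first access. The hypothetical `INIT_CALLS` counter makes that visible.
```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
fn expensive_default() -> String {
    // Not const: this has to run at runtime, once per thread.
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    String::from("initialized at runtime")
}
thread_local! {
    // Const-initialized: the byte pattern is known at compile time.
    static ANSWER: i64 = const { 42 };
    // Lazily initialized: `expensive_default` runs on first access in each thread.
    static NAME: String = expensive_default();
}
fn init_counts() -> (i64, usize, usize) {
    let answer = ANSWER.with(|a| *a);
    let before = INIT_CALLS.load(Ordering::SeqCst);
    NAME.with(|n| assert!(!n.is_empty())); // first access: runs the initializer
    NAME.with(|_| {}); // same thread again: already initialized
    thread::spawn(|| NAME.with(|_| {})).join().unwrap(); // new thread: runs it again
    (answer, before, INIT_CALLS.load(Ordering::SeqCst))
}
```
On a thread that has not touched `NAME` yet, `init_counts()` returns `(42, 0, 2)`.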

00:12:36.717 --> 00:12:41.287
And if you put those together, you
have a whole lifecycle of like,

00:12:41.287 --> 00:12:42.707
this thing is not initialized yet.

00:12:42.717 --> 00:12:44.377
This thing is initialized,
you can have it.

00:12:44.527 --> 00:12:46.717
We've already run the destructor, why
are you even trying to borrow this?
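
NOTE
A toy model of that lifecycle (uninitialized, alive, destroyed). This is not the standard library's actual code, and the `Slot` type is hypothetical. In real `std`, the destroyed case is what `LocalKey::try_with` reports as an `AccessError`.
```rust
// Hypothetical three-state slot; std's real implementation differs.
enum Slot<T> {
    Uninit,
    Alive(T),
    Destroyed,
}
impl<T> Slot<T> {
    /// Borrow the value, running `init` first if needed; `None` once destroyed.
    fn get_or_init(&mut self, init: impl FnOnce() -> T) -> Option<&T> {
        if matches!(self, Slot::Uninit) {
            *self = Slot::Alive(init());
        }
        match self {
            Slot::Alive(v) => Some(&*v),
            _ => None, // Destroyed: refuse to hand out a borrow
        }
    }
    /// What thread exit would do: drop the value and remember that we did.
    fn destroy(&mut self) {
        *self = Slot::Destroyed; // the previous `Alive(v)` is dropped here
    }
}
```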

00:12:48.025 --> 00:12:50.845
The way this is all implemented
in the Rust standard library

00:12:50.875 --> 00:12:52.395
is actually pretty smart.

00:12:52.745 --> 00:12:57.947
And, uh, I was actually pretty impressed
with it. I can't show you code

00:12:57.947 --> 00:13:01.007
because you're just listening to this,
of course, but I'm looking at an array,

00:13:01.007 --> 00:13:06.467
a 2x2 matrix of whether a type has a Drop
implementation or no Drop implementation.

00:13:08.707 --> 00:13:10.477
James is making finger motions.

00:13:10.547 --> 00:13:12.987
<v James Munns>Sorry, I was
throwing the 2x2 matrix gang sign.

00:13:13.017 --> 00:13:13.747
Yeah, sorry.

00:13:14.917 --> 00:13:18.798
As opposed to the 4x4 AHB
matrix axis gang sign.

00:13:19.028 --> 00:13:21.968
That'll only make sense if you listen
to the last episode or whatever, however

00:13:21.968 --> 00:13:24.588
it's oriented, but it's an easter egg.

00:13:26.233 --> 00:13:28.483
<v Amos Wenger>So I'm looking at
it- there's two dimensions, right?

00:13:28.483 --> 00:13:31.653
There's 'needs Drop' or 'doesn't need
Drop,' and then there's 'initialization is

00:13:31.653 --> 00:13:36.473
const' or 'initialization is lazy,' and
that gives us four different scenarios.

00:13:36.513 --> 00:13:40.753
And in the best scenario, which is
'doesn't need Drop' and 'initialization

00:13:40.753 --> 00:13:46.298
is const,' then we just use whatever
the compiler does, and you're going to

00:13:46.298 --> 00:13:47.408
ask, "But isn't the compiler rustc?"

00:13:47.408 --> 00:13:47.718
No, it's LLVM.

00:13:48.678 --> 00:13:50.758
So you just tell LLVM,
it's a thread-local.

00:13:51.078 --> 00:13:53.498
And then it does the thing I said,
which is like to statically reserve

00:13:53.498 --> 00:13:55.318
some storage in the executable.

00:13:55.658 --> 00:13:58.328
And the linker also knows about that.

00:13:58.328 --> 00:13:59.928
And the dynamic loader
also knows about that.

00:13:59.938 --> 00:14:02.788
The operating system, everybody
knows about what to do, except for

00:14:02.788 --> 00:14:05.648
humans, because there's like 13
people who have gone to the lengths

00:14:05.648 --> 00:14:06.778
of understanding how this works.
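
NOTE
The two axes of that 2x2 matrix can be written down as a toy classifier. `std::mem::needs_drop` is a real function; the `Strategy` enum and its variant names are made up and do not match the standard library's internal modules.
```rust
use std::mem::needs_drop;
// Hypothetical labels for the four cells of the matrix.
#[derive(Debug, PartialEq)]
enum Strategy {
    PlainNative,   // const init, no Drop: just hand it to LLVM as a thread-local
    ConstWithDtor, // const init, but a destructor must be registered
    LazyNoDtor,    // runtime init, nothing to clean up
    LazyWithDtor,  // runtime init and a destructor
}
fn strategy_for<T>(const_init: bool) -> Strategy {
    match (const_init, needs_drop::<T>()) {
        (true, false) => Strategy::PlainNative,
        (true, true) => Strategy::ConstWithDtor,
        (false, false) => Strategy::LazyNoDtor,
        (false, true) => Strategy::LazyWithDtor,
    }
}
```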

00:14:07.938 --> 00:14:10.898
Ultimately, it all comes down
to this one single interface,

00:14:10.918 --> 00:14:12.038
which is called `LocalKey`.

00:14:12.058 --> 00:14:15.158
And I have to zoom into my own slide here
because Jesus, that's a lot of comments.

00:14:15.538 --> 00:14:20.898
`LocalKey` is a struct that has a single
field and the field is not the value.

00:14:20.918 --> 00:14:25.828
The field is a function, not a
closure, a bare function that takes

00:14:25.898 --> 00:14:29.998
an option to a mutable reference
to an option to the value.

00:14:31.748 --> 00:14:36.418
I know it's hard to, if you pay for
the, for the 10 bucks a month tier,

00:14:36.438 --> 00:14:38.128
you could actually see the function.

00:14:38.128 --> 00:14:40.488
So just go, just go to the
standard library, type in

00:14:40.488 --> 00:14:42.458
`LocalKey` and, uh, brace yourself.

00:14:42.828 --> 00:14:46.638
And it returns a const
pointer, not a reference, to T.

00:14:46.828 --> 00:14:49.428
And there's a comment that's
basically saying, "Well,

00:14:49.428 --> 00:14:51.748
we say `'static`, but it's not actually
`'static`," even though there's no

00:14:51.778 --> 00:14:55.718
mention of `'static` here in the function
signature, it just returns a pointer.

00:14:55.828 --> 00:14:58.448
There is a `'static` up here,
and like, `T` is supposed to be

00:14:58.458 --> 00:14:59.418
`'static`, it's supposed to be owned.
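
NOTE
A toy mirror of the shape described above, with hypothetical names: the key holds only a bare `fn` pointer that returns the address of the value, and `with` turns that raw pointer into a borrow that lasts only as long as the closure. To keep it short, the storage here is an ordinary `static`, not real per-thread storage.
```rust
// Hypothetical toy: the key stores a function pointer, never the value itself.
struct ToyKey<T: 'static> {
    inner: fn(Option<&mut Option<T>>) -> *const T,
}
impl<T: 'static> ToyKey<T> {
    fn with<R>(&'static self, f: impl FnOnce(&T) -> R) -> R {
        // No initial value is supplied here, so pass `None`.
        let ptr = (self.inner)(None);
        // SAFETY: the toy accessor below always returns a valid pointer
        // to a value that outlives this call.
        let value = unsafe { &*ptr };
        f(value)
    }
}
// Toy storage: one process-wide static instead of per-thread storage.
static ANSWER: u64 = 42;
fn answer_addr(_init: Option<&mut Option<u64>>) -> *const u64 {
    &ANSWER as *const u64
}
static ANSWER_KEY: ToyKey<u64> = ToyKey { inner: answer_addr };
```
For example, `ANSWER_KEY.with(|v| *v + 1)` evaluates to `43`.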

00:14:59.878 --> 00:15:05.058
<v James Munns>So it's a function that
takes an optional pointer to a space

00:15:05.118 --> 00:15:09.318
that might hold the thing, so I
guess two layers of indirection here?

00:15:09.818 --> 00:15:13.508
I love the comment-to-code ratio because
there's like three lines of actual

00:15:13.508 --> 00:15:18.368
functional code here, two configuration
blocks on it, and then like 15

00:15:18.468 --> 00:15:21.498
lines of: "Okay, prepare yourself."

00:15:23.176 --> 00:15:23.606
<v Amos Wenger>Well...

00:15:23.678 --> 00:15:24.088
Yes.

00:15:24.108 --> 00:15:24.598
Because...

00:15:25.538 --> 00:15:31.187
essentially, all this does, like,
`LocalKey` is really just: Here's a

00:15:31.187 --> 00:15:34.237
function that gives us the address
of the thing you're looking for.

00:15:34.557 --> 00:15:35.747
This is all that it does.

00:15:36.137 --> 00:15:39.987
But in the best possible case, in the
simplest case, the compiler can see

00:15:39.987 --> 00:15:43.517
through that function — because it
sees all your code — and it's like:

00:15:43.517 --> 00:15:45.357
okay, so I see a lot of indirection.

00:15:45.377 --> 00:15:48.087
I see like we're calling a function
that returns the address of the thing,

00:15:48.437 --> 00:15:51.811
but we know statically that the thing
is there and you're using it, so

00:15:51.835 --> 00:15:54.385
let's just ignore the functions, just
inline everything, and this is just a

00:15:54.385 --> 00:15:57.335
regular access at this point, and then
you can inline some more, and then...

00:15:57.855 --> 00:16:01.365
so this is what the comment says as
well, which I don't love, because I

00:16:01.375 --> 00:16:05.005
don't love when comments say, "Well, the
optimizer is surely going to take care

00:16:05.005 --> 00:16:08.035
of this," and then, you know, the next
point release of Rust is like, "Well, it

00:16:08.035 --> 00:16:11.215
turns out it didn't," and now everything
is faster, or slower, or whatever.

00:16:12.295 --> 00:16:15.695
So, now that you perfectly
understand how thread-locals work,

00:16:15.835 --> 00:16:18.380
which is you know: Everything is
relative to some base address.

00:16:18.380 --> 00:16:20.430
The base address changes
when you switch threads.

00:16:20.660 --> 00:16:25.800
And in Rust, really, a
`LocalKey` is just the

00:16:25.800 --> 00:16:28.360
address of a function that gives you
the address of your thread-local.

00:16:28.970 --> 00:16:30.494
And that works for all possible scenarios.
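
NOTE
In everyday code the key is used through `LocalKey::with`, and because `with` takes `&'static self`, a key can be passed around as a `&'static LocalKey<T>`. A sketch with an illustrative `LOG` key and `push_line` helper:
```rust
use std::cell::RefCell;
use std::thread::LocalKey;
thread_local! {
    // The macro gives us a `LocalKey<RefCell<Vec<String>>>` named `LOG`.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}
// Any thread-local key of this type can be handed to this function.
fn push_line(key: &'static LocalKey<RefCell<Vec<String>>>, line: &str) -> usize {
    // The borrow only lives inside the closure; it is never `&'static`.
    key.with(|log| {
        log.borrow_mut().push(line.to_string());
        log.borrow().len()
    })
}
```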

00:16:30.504 --> 00:16:34.084
So now that we know all that, where
are thread-locals actually used?

00:16:34.107 --> 00:16:34.997
What are they useful for?

00:16:35.007 --> 00:16:37.201
Well, for data that's local to a thread.

00:16:37.661 --> 00:16:41.835
So asynchronous runtimes, some of
them- `tokio` specifically- does

00:16:41.835 --> 00:16:48.190
that, because sometimes you want to
sleep for a few seconds and to sleep

00:16:48.200 --> 00:16:51.087
for a few seconds, you don't block
for a few seconds in async code.

00:16:51.287 --> 00:16:53.907
You tell your executor?

00:16:53.967 --> 00:16:55.627
I'm looking at James for approval.

00:16:55.727 --> 00:16:58.157
You tell your executor, no, your reactor!

00:16:58.584 --> 00:17:01.060
<v James Munns>Well, you asked the
executor, which asks the reactor.

00:17:01.110 --> 00:17:03.660
<v Amos Wenger>Anyway, so sometimes you
have a future, you want to sleep, but you

00:17:03.660 --> 00:17:04.950
don't want it to block, so what do you do?

00:17:05.050 --> 00:17:08.290
You find out what the ambient- or
I like to call it ambient, I'm the

00:17:08.290 --> 00:17:10.000
only one- the current runtime is.

00:17:10.510 --> 00:17:12.410
Which is stored in a thread-local.

00:17:12.520 --> 00:17:16.600
And then you say: Okay, wake me up in
whatever, two seconds, and then you yield.

00:17:16.600 --> 00:17:19.096
And then in two seconds,
hopefully, it polls you again.

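NOTE
A toy, std-only sketch of the "ambient runtime" pattern, loosely modelled on what `tokio` does with a thread-local. `Runtime`, `enter`, `EnterGuard` and `try_current` are hypothetical names, not tokio's API. Entering a runtime stores a handle in a thread-local, and code deep in the call stack (a `sleep` future, say) looks it up instead of receiving it as a parameter.
```rust
use std::cell::RefCell;
use std::sync::Arc;
// Hypothetical runtime handle; a real one would own a timer, a reactor, etc.
pub struct Runtime {
    pub name: String,
}
thread_local! {
    // The "ambient" runtime for whatever code runs on this thread.
    static CURRENT: RefCell<Option<Arc<Runtime>>> = RefCell::new(None);
}
// Restores the previously ambient runtime when dropped.
pub struct EnterGuard {
    prev: Option<Arc<Runtime>>,
}
impl Drop for EnterGuard {
    fn drop(&mut self) {
        let prev = self.prev.take();
        CURRENT.with(|c| *c.borrow_mut() = prev);
    }
}
// Make `rt` the ambient runtime on this thread until the guard is dropped.
#[must_use = "the runtime is only ambient while the guard is alive"]
pub fn enter(rt: Arc<Runtime>) -> EnterGuard {
    let prev = CURRENT.with(|c| c.replace(Some(rt)));
    EnterGuard { prev }
}
// What a sleep future would do first: find the ambient runtime, if any.
pub fn try_current() -> Option<Arc<Runtime>> {
    CURRENT.with(|c| c.borrow().clone())
}
```
A plain `std::thread` spawned while a runtime is entered does not inherit it, since the lookup only ever consults the current thread's slot.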
00:17:19.894 --> 00:17:21.614
Anything with a register
also uses thread-locals.

00:17:21.614 --> 00:17:25.504
So for example, `tracing-subscriber`,
which is nice because it lets you

00:17:25.504 --> 00:17:29.744
have separate threads within a program
and separate subsystems, essentially.

00:17:29.744 --> 00:17:34.074
So you can have like threads A, B, and C,
and they all have their own subscriber...

00:17:35.139 --> 00:17:35.939
handler?

00:17:36.049 --> 00:17:36.519
James, yes.

00:17:37.574 --> 00:17:41.234
<v James Munns>So the reason that you
have a unique one of these per thread

00:17:41.814 --> 00:17:44.814
instead of just one static where you
just say, okay, the `tokio` runtime

00:17:44.824 --> 00:17:47.604
lives at this static or the tracing
subscriber lives at this static.

00:17:47.684 --> 00:17:50.074
Is that to keep the cache locality better?

00:17:50.074 --> 00:17:53.474
Like, is it better to have
one of those per thread versus

00:17:53.474 --> 00:17:55.055
one for the entire process?

00:17:55.319 --> 00:17:56.119
<v Amos Wenger>That's a good question.

00:17:56.199 --> 00:18:01.139
I think it's specifically to allow you
to have multiple unrelated runtimes

00:18:01.169 --> 00:18:03.079
with their own thread pools and whatnot.

00:18:03.479 --> 00:18:05.429
So I don't think it's a performance thing.

00:18:05.719 --> 00:18:07.099
<v James Munns>Needs more
research, looks like.

00:18:07.519 --> 00:18:10.339
Amos has a very confused look on
his face, but he's deep in thought.

00:18:10.717 --> 00:18:11.957
<v Amos Wenger>I'm 97% sure.

00:18:11.957 --> 00:18:16.097
So, in the normal case, I call this the
sane case, but in the normal case, you

00:18:16.117 --> 00:18:20.427
have a single binary, you're happy, you
have one crate that depends on 700 other

00:18:20.427 --> 00:18:22.497
crates because you're doing web things.

00:18:23.507 --> 00:18:24.947
That's speaking from experience.

00:18:25.157 --> 00:18:30.457
And then you have one copy of `tokio`'s
code baked into your binary, which

00:18:30.597 --> 00:18:35.367
includes the `CONTEXT` thread-local,
which stores the current runtime.

00:18:35.407 --> 00:18:37.967
So at the beginning of your program,
you create a runtime and then

00:18:37.967 --> 00:18:39.417
it sets that thread-local to it.

00:18:39.547 --> 00:18:42.457
And then it starts a bunch of
threads for its worker pool.

00:18:42.457 --> 00:18:44.877
And for each of those threads,
it also sets that same

00:18:44.877 --> 00:18:46.177
thread-local to the same runtime.

00:18:46.187 --> 00:18:50.102
Like any- anything you start doing
from that async runtime will happen

00:18:50.122 --> 00:18:54.582
on a thread that has this thread-local
set to that handle to that runtime.

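NOTE
A std-only sketch of the worker-pool step. `RuntimeHandle`, `CURRENT`, `current_id` and `start_workers` are hypothetical names, not tokio's internals. The same `Arc` handle is installed into the thread-local of every worker thread, so code running on any worker finds the same runtime.
```rust
use std::cell::RefCell;
use std::sync::Arc;
use std::thread;
// Hypothetical shared handle to "the" runtime.
pub struct RuntimeHandle {
    pub id: u32,
}
thread_local! {
    static CURRENT: RefCell<Option<Arc<RuntimeHandle>>> = RefCell::new(None);
}
// Ambient lookup: which runtime is this thread attached to?
pub fn current_id() -> Option<u32> {
    CURRENT.with(|c| c.borrow().as_ref().map(|h| h.id))
}
// Spawn `n` workers. Each one first points its own copy of CURRENT at the
// shared handle, then runs a "task" that looks the runtime up ambiently.
pub fn start_workers(handle: Arc<RuntimeHandle>, n: usize) -> Vec<Option<u32>> {
    let workers: Vec<_> = (0..n)
        .map(|_| {
            let handle = Arc::clone(&handle);
            thread::spawn(move || {
                CURRENT.with(|c| *c.borrow_mut() = Some(handle));
                current_id()
            })
        })
        .collect();
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```
Every worker reports the same runtime id, while the thread that called `start_workers` still has nothing set in its own slot.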
00:18:54.889 --> 00:18:58.709
That is the sane case because there's
only one copy of your singleton.

00:18:58.719 --> 00:19:01.809
A singleton is a variable that you're
only supposed to ever have one copy

00:19:01.809 --> 00:19:05.910
of, and it's, it's been deemed evil
in the past, by people who don't know

00:19:05.910 --> 00:19:07.160
about hardware, I guess, I don't know.

00:19:07.905 --> 00:19:12.105
And then you have a second normal,
completely normal, sane case, which

00:19:12.145 --> 00:19:15.715
is something not many people know, and
I didn't know I think two weeks ago,

00:19:16.245 --> 00:19:19.945
when we first started going down that
rabbit hole together on that podcast,

00:19:19.965 --> 00:19:24.075
which is that you have something called
crate type dylib, not cdylib, I knew

00:19:24.075 --> 00:19:27.745
about this one, I've been using Rust
to make crimes and load them into

00:19:27.815 --> 00:19:33.065
regular C programs for a long time
now, but dylib just means: compile this

00:19:33.065 --> 00:19:36.055
code as like a Rust dynamic library.

00:19:36.065 --> 00:19:37.865
Do not care about ABI stability.

00:19:37.865 --> 00:19:42.715
This will break across compiler
versions, but emit it as a shared object.

00:19:42.715 --> 00:19:47.205
So it's going to be a .so on Linux,
it's going to be a .dylib on macOS,

00:19:47.225 --> 00:19:52.285
it's going to be a DLL on Windows,
and then have the executable depend

00:19:52.285 --> 00:19:54.325
on this and load it at runtime.

00:19:54.678 --> 00:19:58.879
And in this case, if you had- so
your executable depends on the crate,

00:19:58.879 --> 00:20:02.364
which is of type dylib, which itself
depends on `tokio`- what cargo would

00:20:02.364 --> 00:20:07.084
do is that it would pull out `tokio`
into a dynamic library and have both

00:20:07.114 --> 00:20:11.304
your binary and your dependency,
like all the things would then start

00:20:11.304 --> 00:20:13.004
linking against this dynamic library.

00:20:13.027 --> 00:20:14.917
So that's, those are the
two normal cases so far.

00:20:14.917 --> 00:20:17.247
And then you have my case because...

00:20:17.427 --> 00:20:22.019
I don't know why I bring this upon
myself, but I have been splitting a

00:20:22.049 --> 00:20:25.769
big project of mine, the thing that
runs my website into separate modules.

00:20:25.779 --> 00:20:28.929
So I'm not using crate
type dylib.

00:20:29.149 --> 00:20:32.969
I'm using crate type cdylib, which
means that every module of my website-

00:20:32.969 --> 00:20:35.009
I have one to compile Markdown to HTML.

00:20:35.009 --> 00:20:38.405
I have one to render
LaTeX to math equations.

00:20:38.605 --> 00:20:41.405
I have several modules like
that, one for CSS, etc.

00:20:41.785 --> 00:20:46.455
These are all actually separate Rust
projects and I built them separately and

00:20:46.455 --> 00:20:48.005
they have nothing to do with each other.

00:20:48.005 --> 00:20:49.435
<v James Munns>But, okay.

00:20:49.905 --> 00:20:54.239
If you're still Rust talking to Rust, why
are you going for cdylib instead of dylib?

00:20:54.679 --> 00:20:55.739
<v Amos Wenger>That is
an excellent question.

00:20:55.879 --> 00:21:00.969
So, if I was doing dylib, I would still
need to have all the code of everything

00:21:00.969 --> 00:21:02.789
in one place checked out at the same time.

00:21:02.869 --> 00:21:05.999
So it's like one big repository, which
I do, but that's, that's one thing.

00:21:06.259 --> 00:21:08.549
The second thing is cargo
would parse everything.

00:21:08.959 --> 00:21:10.679
It would keep everything in memory.

00:21:10.679 --> 00:21:11.599
So would rust-analyzer.

00:21:11.619 --> 00:21:14.669
It would take gigabytes of memory
to do that, which it used to.

00:21:15.099 --> 00:21:18.929
And then whenever you change the
tiniest thing, everything recompiles.

00:21:19.079 --> 00:21:23.219
So for example, all my modules and my
main binary depend on one crate, which

00:21:23.229 --> 00:21:27.279
has the basic set of types and traits
that actually define the interfaces

00:21:27.279 --> 00:21:28.449
between the modules and binary.

00:21:28.699 --> 00:21:32.679
And whenever I change that, I don't always
need to recompile all the modules, right?

00:21:32.679 --> 00:21:35.039
Maybe I'm changing the effects
for another module or something.

00:21:35.279 --> 00:21:37.849
But if it was part of the
same crate graph, cargo would

00:21:37.849 --> 00:21:38.909
definitely go, "Oh, okay.

00:21:38.909 --> 00:21:41.359
The one dependency everyone
has in common changed, time to

00:21:41.359 --> 00:21:43.009
rebuild the entire universe."

00:21:43.819 --> 00:21:46.881
<v James Munns>Right, right, right, because
you're okay with saying, " I still promise

00:21:46.902 --> 00:21:48.582
that I'm only going to use one compiler."

00:21:48.922 --> 00:21:51.722
But you're getting rid of the promise
of, "I promise I'm going to do all

00:21:51.722 --> 00:21:53.232
of this compiling at one time."

00:21:53.542 --> 00:21:53.972
<v Amos Wenger>Exactly.

00:21:54.242 --> 00:21:54.752
<v James Munns>Makes sense.

00:21:55.062 --> 00:21:58.962
<v Amos Wenger>In fact, the way I built
it, none of the modules are built by the

00:21:58.962 --> 00:22:02.802
time you first run the debug binary and
it finds that out and it shells out to

00:22:02.802 --> 00:22:05.272
cargo and builds everything dynamically
and copies it to the right place.

00:22:05.292 --> 00:22:06.485
And does weird linker stuff.

00:22:06.495 --> 00:22:07.235
I- I'm weird.

00:22:07.235 --> 00:22:07.505
Okay.

00:22:07.505 --> 00:22:08.205
I like linkers.

00:22:08.325 --> 00:22:10.565
You, you learn about
something and then you use it.

00:22:10.595 --> 00:22:13.615
<v James Munns>That sounds like a whole new
episode and I am excited for that episode.

00:22:13.936 --> 00:22:18.271
<v Amos Wenger>But the problem with my case,
which is not normal, or whatever, is that

00:22:18.510 --> 00:22:20.950
now you have N copies of `tokio` code.

00:22:21.340 --> 00:22:24.990
You have one in the main executable,
the binary, you have one in

00:22:25.030 --> 00:22:28.660
every module, which has been
built independently by cargo.

00:22:29.300 --> 00:22:33.566
And you have not only N copies of
`tokio`'s code, which would be fine,

00:22:33.566 --> 00:22:37.676
it's wasteful, but like, whatever,
it's only a few hundred kilobytes,

00:22:37.686 --> 00:22:38.766
it's on a server, I don't care.

00:22:39.696 --> 00:22:42.786
But you also have N copies of
`tokio`'s `CONTEXT` thread-local,

00:22:42.896 --> 00:22:44.256
which is much, much worse!

00:22:44.856 --> 00:22:46.986
<v James Munns>Yeah, your
singleton is now a multiverse.

00:22:47.356 --> 00:22:51.776
<v Amos Wenger>Yes, depending on which
version of the `tokio` code is running.

00:22:51.926 --> 00:22:54.886
So of course the first problem- which
I haven't even put in the slides- is

00:22:54.886 --> 00:22:59.576
that you have to make sure that all the
same features of `tokio` are enabled.

00:22:59.786 --> 00:23:04.546
Otherwise the layout of the internal data
structures of `tokio` is going to differ

00:23:04.546 --> 00:23:07.856
because some fields are only present
if some cargo features are enabled.

00:23:08.418 --> 00:23:11.758
But assuming you get that correctly,
in my case, the solution was just

00:23:11.758 --> 00:23:14.588
enable all possible features for
all possible modules, even if we

00:23:14.588 --> 00:23:17.178
don't use them just to make sure
that the binary layout is the same.

00:23:17.418 --> 00:23:21.048
Then you still have the problem
that you can start a `tokio`

00:23:21.358 --> 00:23:25.358
runtime from the binary and then
load a module and then invoke an

00:23:25.358 --> 00:23:27.338
asynchronous function from that module.

00:23:27.608 --> 00:23:33.448
And then if you call
`Handle::current()` from

00:23:33.448 --> 00:23:35.128
the main binary, it's going to
be: yeah, we have a runtime.

00:23:35.148 --> 00:23:36.998
And then from the modules it's going to be:
no, we don't have a runtime.

00:23:37.008 --> 00:23:40.708
Because they're not checking the same
copy of the `CONTEXT` thread-local, because

00:23:40.708 --> 00:23:44.528
they're different slots, because there are
N copies of it, which should never happen.

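NOTE
A std-only simulation of the failure mode. The modules `binary` and `plugin` are hypothetical stand-ins for two separately compiled copies of the same runtime code: expanding the same source twice produces two independent thread-local slots. Entering a runtime through one copy leaves the other copy empty, which is why the lookup succeeds in the binary and fails inside a module.
```rust
// Stand-in for one compiled copy of the runtime crate: the same source,
// expanded twice, still yields two unrelated thread-local slots.
macro_rules! runtime_copy {
    ($name:ident) => {
        pub mod $name {
            use std::cell::Cell;
            thread_local! {
                // Hypothetical: `Some(id)` once a runtime has been entered.
                static CONTEXT: Cell<Option<u32>> = Cell::new(None);
            }
            pub fn enter(id: u32) {
                CONTEXT.with(|c| c.set(Some(id)));
            }
            pub fn try_current() -> Option<u32> {
                CONTEXT.with(|c| c.get())
            }
        }
    };
}
runtime_copy!(binary);
runtime_copy!(plugin);
```
After `binary::enter(1)`, `binary::try_current()` sees the runtime on that thread, while `plugin::try_current()` on the very same thread still returns `None`.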
00:23:45.538 --> 00:23:46.878
So what do you do?

00:23:47.308 --> 00:23:50.608
Well, you don't have to
rely on the current runtime.

00:23:50.628 --> 00:23:52.208
You don't have to rely
on the thread-local.

00:23:52.268 --> 00:23:57.876
Pretty much everything `tokio` lets
you do, it lets you do on a specific

00:23:58.206 --> 00:24:00.426
context or executor runtime or whatever.

00:24:00.556 --> 00:24:03.456
You have a top-level
`tokio::spawn` function, which does use

00:24:03.456 --> 00:24:05.686
the current runtime, but you can
also have a handle and then call

00:24:05.686 --> 00:24:07.336
`spawn` on that handle specifically.

00:24:07.336 --> 00:24:09.126
And then it doesn't rely
on the thread-local.

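NOTE
A std-only sketch of the explicit-handle alternative. `Handle`, `start_executor` and `module_entry_point` are hypothetical and only mirror the shape of calling `spawn` on a tokio `Handle`; they are not its implementation. The caller passes the handle in, so the module never has to consult its own (empty) copy of any thread-local.
```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
type Job = Box<dyn FnOnce() + Send + 'static>;
// Hypothetical executor handle: cheap to clone, and passable across a module
// boundary as an ordinary value.
#[derive(Clone)]
pub struct Handle {
    tx: Sender<Job>,
}
impl Handle {
    pub fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.send(Box::new(job)).expect("executor has shut down");
    }
}
// Start a single-threaded toy executor and return a handle to it.
pub fn start_executor() -> (Handle, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let worker = thread::spawn(move || {
        // Runs jobs until every Handle (and so every Sender) is dropped.
        for job in rx {
            job();
        }
    });
    (Handle { tx }, worker)
}
// "Module" code: it receives the handle explicitly instead of looking it up.
pub fn module_entry_point(handle: &Handle, results: Sender<u32>) {
    handle.spawn(move || results.send(42).unwrap());
}
```
The trade-off is the one described above: this only helps if every crate in the call chain accepts a handle instead of reaching for the ambient one.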
00:24:09.146 --> 00:24:12.756
The problem, of course, is
that you're one of the very few

00:24:12.756 --> 00:24:14.186
people who now cares about this.

00:24:14.296 --> 00:24:15.746
And all the crates are
like: no, it's fine.

00:24:16.286 --> 00:24:18.521
We can just use the ambient
runtime, so good luck trying

00:24:18.521 --> 00:24:19.631
to patch the entire world.

00:24:20.338 --> 00:24:24.448
So if you can't pass an explicit executor,
then you could just, because you control

00:24:24.448 --> 00:24:28.078
the boundary between the binary and the
module, you just say: before switching

00:24:28.078 --> 00:24:32.258
over to the module, let's set their
thread-local to the same value as our

00:24:32.258 --> 00:24:34.378
thread-local, which is a thing you can do.

00:24:34.378 --> 00:24:38.018
And then every time the future
they returned is polled, let's also

00:24:38.018 --> 00:24:40.058
set theirs to match our thread-local.

00:24:40.078 --> 00:24:43.298
So you're manually synchronizing
thread-local values, which works

00:24:43.498 --> 00:24:49.248
until one of your module's futures
spawns a task, that spawns a task,

00:24:49.368 --> 00:24:50.598
and then you've lost the chain.

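NOTE
A std-only sketch of the manual-synchronization workaround and where it breaks. `HOST_CONTEXT`, `MODULE_CONTEXT` and `call_into_module` are hypothetical, and a spawned `std::thread` stands in for a grandchild task that ends up running somewhere the boundary shim never touched. The module sees the value when called directly, but the grandchild does not.
```rust
use std::cell::Cell;
use std::thread;
thread_local! {
    // Two independent copies of "the same" thread-local, as in the
    // one-copy-per-shared-object situation.
    static HOST_CONTEXT: Cell<Option<u32>> = Cell::new(None);
    static MODULE_CONTEXT: Cell<Option<u32>> = Cell::new(None);
}
pub fn module_current() -> Option<u32> {
    MODULE_CONTEXT.with(|c| c.get())
}
// Boundary shim: copy the host's value into the module's copy, then call in.
// A real shim would also re-sync every time the returned future is polled.
pub fn call_into_module<R>(f: impl FnOnce() -> R) -> R {
    let host = HOST_CONTEXT.with(|c| c.get());
    MODULE_CONTEXT.with(|c| c.set(host));
    f()
}
// Returns (what the module sees directly, what its spawned work sees).
pub fn demo() -> (Option<u32>, Option<u32>) {
    HOST_CONTEXT.with(|c| c.set(Some(1)));
    call_into_module(|| {
        let direct = module_current();
        // The "grandchild" runs on a thread where nobody synced the module copy.
        let grandchild = thread::spawn(module_current).join().unwrap();
        (direct, grandchild)
    })
}
```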
00:24:51.768 --> 00:24:56.648
And that second, like, I don't know,
grandchild task does not actually

00:24:57.308 --> 00:24:58.698
run on the, on the right runtime.

00:24:59.918 --> 00:25:02.628
So, this is not an actual solution either.

00:25:02.628 --> 00:25:03.028
So,

00:25:03.068 --> 00:25:05.065
<v Amos Wenger>as much as it pains me
to say this, because I was really

00:25:05.065 --> 00:25:08.125
trying to get this all to work without
patching `tokio`, but the solution

00:25:08.125 --> 00:25:09.255
is, of course, to patch `tokio`.

00:25:09.255 --> 00:25:12.705
When I complained about this online,
on Twitter, or Mastodon, whatever,

00:25:12.705 --> 00:25:16.925
someone tokio-adjacent said, "Oh,
we should just really have a feature

00:25:16.935 --> 00:25:20.975
that you can enable, and it makes this
magically work across shared object

00:25:20.975 --> 00:25:22.445
boundaries," which is exactly what I want.

00:25:22.881 --> 00:25:27.146
But this is a bit more
complicated than it first appears.

00:25:27.146 --> 00:25:31.796
Luckily, for every thread-local in
`tokio`, they don't use the standard

00:25:31.796 --> 00:25:34.496
library `thread_local!` macro,
which is what you would usually do,

00:25:34.496 --> 00:25:36.726
and then you would end up with a
`LocalKey`, as we've seen before.

00:25:37.576 --> 00:25:41.392
They have their own `tokio::thread_local!`
macro, because they have a thing

00:25:41.392 --> 00:25:44.772
called `loom` that allows them to
debug async code, essentially, I

00:25:44.792 --> 00:25:45.782
don't actually know how it works.

00:25:45.782 --> 00:25:47.762
I just know they have their
own thing for tests only.

00:25:47.872 --> 00:25:51.172
So in testing, it uses the `loom`
version of thread-locals and in

00:25:51.172 --> 00:25:53.282
production, it uses the standard
library version of thread-locals.

00:25:53.282 --> 00:25:55.892
So this is great because we
get to just redefine that

00:25:55.892 --> 00:25:57.572
macro to do whatever we want.

00:25:58.012 --> 00:26:03.692
And in this case, I've redefined that
macro to: instead of initializing a

00:26:03.987 --> 00:26:08.427
`LocalKey` with something that returns
the address of an actual thread-local,

00:26:08.427 --> 00:26:11.387
don't reserve a thread-local slot
at all, if the feature is enabled,

00:26:11.527 --> 00:26:17.847
and just have that function
return the value of a static mut.

00:26:18.017 --> 00:26:19.837
This is why I was asking
about static muts-

00:26:20.417 --> 00:26:20.717
<v James Munns>Ah...

00:26:21.397 --> 00:26:23.927
<v Amos Wenger>Which is first set
up when the module is loaded.

00:26:23.927 --> 00:26:27.767
So only the binary has the thread-local,
and then you load the module and you

00:26:27.767 --> 00:26:32.197
call the function that says: okay, set
your `LocalKey` function getter to this

00:26:32.197 --> 00:26:33.767
address, which is a function I export.

00:26:34.307 --> 00:26:38.707
And then that way, when it's trying to
use the thread-local, it's not actually

00:26:38.707 --> 00:26:44.767
using a thread-local from its own
object, it's just calling a function

00:26:44.767 --> 00:26:49.237
that you control that returns the address
of the thread-local from the binary.

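NOTE
A single-crate sketch of the redirect idea, under loudly stated assumptions. The approach described above rewrites `tokio`'s internal `thread_local!` macro and later resolves an undefined symbol at load time. Here the host and the module live in one crate, and a `std::sync::OnceLock` holding a function pointer stands in for the `static mut` that gets set when the module is loaded. All names are hypothetical. The module owns no slot of its own and asks the host's getter for the address, which still differs per thread.
```rust
use std::cell::Cell;
use std::sync::OnceLock;
use std::thread;
type Slot = Cell<Option<u32>>;
type Getter = fn() -> *const Slot;
thread_local! {
    // The one real slot, owned by the host binary.
    static HOST_CONTEXT: Slot = Cell::new(None);
}
// "Exported" by the host: address of the calling thread's host slot.
fn host_context_addr() -> *const Slot {
    HOST_CONTEXT.with(|c| c as *const Slot)
}
// Module side: no thread-local of its own, just a getter installed at load
// time. (OnceLock swapped in for the `static mut` described above.)
static MODULE_GETTER: OnceLock<Getter> = OnceLock::new();
pub fn with_module_context<R>(f: impl FnOnce(&Slot) -> R) -> R {
    let getter: Getter = *MODULE_GETTER.get().expect("module not loaded yet");
    // SAFETY (sketch): the pointer refers to the calling thread's host slot,
    // which stays alive for the duration of this call.
    f(unsafe { &*getter() })
}
pub fn demo() -> (Option<u32>, Option<u32>) {
    // "Loading" the module installs the host's getter; repeat calls are ignored.
    let _ = MODULE_GETTER.set(host_context_addr);
    HOST_CONTEXT.with(|c| c.set(Some(5)));
    let same_thread = with_module_context(|c| c.get());
    let other_thread = thread::spawn(|| with_module_context(|c| c.get())).join().unwrap();
    (same_thread, other_thread)
}
```
The module now sees exactly what the host set on the same thread, and nothing on a fresh thread, so the value is still thread-local rather than a global.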
00:26:49.247 --> 00:26:51.227
This is very, very complicated to explain.

00:26:51.633 --> 00:26:54.253
<v James Munns>So you're making it
sort of like an extern definition.

00:26:54.253 --> 00:26:57.593
So like in the actual library, instead of
defining the static or the thread-local

00:26:57.613 --> 00:26:59.073
yourself, you're defining an extern.

00:26:59.373 --> 00:27:02.833
And then when you load it dynamically,
because if we were static compiling

00:27:02.843 --> 00:27:06.703
and we did an extern, the linker would
be the one that says, "Ah, this exists

00:27:06.733 --> 00:27:10.283
in another object file," but at the
end I go, "Okay, this is this one."

00:27:10.553 --> 00:27:13.603
But when you're doing it, you leave
it essentially externally defined when

00:27:13.603 --> 00:27:16.983
you're finished making the dynamic
library, and then it actually gets

00:27:16.983 --> 00:27:20.663
resolved at dynamic linking time, so
like when you're loading the library.

00:27:20.803 --> 00:27:24.643
So in the root binary, the actual
application at the bottom, you have to

00:27:24.643 --> 00:27:29.083
make sure that it's the only one that
ever defines this symbol and then all

00:27:29.093 --> 00:27:32.653
50 of your modules or whatever you're
bringing in, they all have extern

00:27:32.653 --> 00:27:36.623
definitions, but as soon as you load
them, they get mapped onto your...

00:27:37.053 --> 00:27:38.043
that's super cool.

00:27:38.043 --> 00:27:41.533
I've- I do a lot with static linking
because in embedded, again, we don't do

00:27:41.553 --> 00:27:47.062
dynamic linking very often, so dynamic
linking is magic for me, but that's

00:27:47.062 --> 00:27:51.382
something that makes sense to me from
static extern to like dynamic extern,

00:27:51.392 --> 00:27:52.702
which I didn't know you could do.

00:27:52.892 --> 00:27:54.202
<v Amos Wenger>So this is almost it.

00:27:54.202 --> 00:27:57.252
So the initial idea is the thing I
just described, which is: okay, you

00:27:57.252 --> 00:28:00.932
have a static mut somewhere and you
have to call some initialization

00:28:00.942 --> 00:28:03.852
function in the beginning to give it
the address it's supposed to look at.

00:28:04.342 --> 00:28:07.382
But then I figured, yeah, well, if you
forget to call, if you call it twice

00:28:07.382 --> 00:28:08.402
or whatever, it's not really good.

00:28:08.402 --> 00:28:10.702
So what I did end up doing is what
you see in the slides, which you

00:28:10.702 --> 00:28:15.042
described, is: you refer to a symbol
that is not defined in this library.

00:28:15.252 --> 00:28:18.372
And then it is going to
be present at load time.

00:28:18.722 --> 00:28:20.882
So when the library is loaded,
that symbol is going to exist.

00:28:20.882 --> 00:28:22.452
And then it's just going to call that symbol.

00:28:22.792 --> 00:28:25.002
So there's no initialization to miss.

00:28:25.012 --> 00:28:26.412
You don't have to call anything.

00:28:26.422 --> 00:28:29.562
If the dynamic loading happens
properly, then it's okay.

00:28:29.572 --> 00:28:29.992
It's fine.

00:28:30.542 --> 00:28:32.932
But there's still several
problems with that.

00:28:32.932 --> 00:28:35.942
First of all, you cannot apparently-
or I didn't find the way to do

00:28:35.942 --> 00:28:39.162
it, I tried a bunch of different
things- you cannot export dynamic

00:28:39.162 --> 00:28:41.472
symbols from a binary in Rust.

00:28:41.532 --> 00:28:45.012
I'm pretty sure it's possible because
we've made up that distinction

00:28:45.012 --> 00:28:47.382
between binaries and shared objects.

00:28:47.702 --> 00:28:54.042
They're both just objects and the, the
binary does export a bunch of symbols.

00:28:54.042 --> 00:28:55.112
It has an entry point.

00:28:55.112 --> 00:28:57.262
It has a bunch of different
symbols that are exported for

00:28:57.262 --> 00:28:58.652
reasons I don't quite understand.

00:28:59.022 --> 00:29:01.562
So you could totally do it, but I
couldn't get rustc and the linker

00:29:01.572 --> 00:29:02.952
to cooperate and make that happen.

00:29:03.662 --> 00:29:09.592
So I had to involve another crate called
`tls-slots` that only exports that symbol.

00:29:09.622 --> 00:29:12.372
And so that makes the
dynamic linker happy.

00:29:12.672 --> 00:29:15.382
And then the other problem is
that leaving a symbol undefined,

00:29:15.867 --> 00:29:18.967
makes the linker unhappy, even if
you're creating a shared object.

00:29:19.717 --> 00:29:24.537
So you have to, you have to pass a
specific linker thing saying, "Well,

00:29:24.587 --> 00:29:28.277
you're not going to find some symbols,
just look them up at runtime," which

00:29:28.277 --> 00:29:33.277
is a very dangerous, very global, like
nuclear solution, because it could be

00:29:33.277 --> 00:29:36.157
that half the symbols you need are missing
and you only find out at load time.

00:29:36.327 --> 00:29:37.687
So hopefully it's just the one.

00:29:37.697 --> 00:29:39.487
I'm sure there's a much
cleaner way to do this.

00:29:39.537 --> 00:29:41.667
It's just what I could hack last night.

00:29:41.668 --> 00:29:44.117
It was like 10 PM.

00:29:44.117 --> 00:29:45.547
I was like, I need to finish my slides!

00:29:45.793 --> 00:29:47.943
<v James Munns>If it's stupid
and it works, it's not stupid.

00:29:48.218 --> 00:29:50.417
<v Amos Wenger>So of course, now you
want to know, does it actually work?

00:29:50.427 --> 00:29:53.217
Can you like get several shared
objects to cooperate and all

00:29:53.217 --> 00:29:54.317
use the same `tokio` runtime?

00:29:55.142 --> 00:29:55.712
Not really.

00:29:56.882 --> 00:29:57.442
Still.

00:29:58.122 --> 00:30:00.372
Because thread-locals are
just half of the problem.

00:30:00.372 --> 00:30:02.022
You also have process-locals.

00:30:02.322 --> 00:30:05.122
And I used to think it didn't
matter, and actually if you're using

00:30:05.122 --> 00:30:07.992
the `current_thread` runtime, the
simpler of the two, it does not

00:30:07.992 --> 00:30:09.172
matter and it does actually work.

00:30:09.412 --> 00:30:14.042
But if you use a multi-thread runtime,
it has a bunch of atomic statics that

00:30:14.042 --> 00:30:17.672
it updates, and some of those are like
number of parked threads, or like number

00:30:17.672 --> 00:30:19.197
of things that are waiting for that.

00:30:19.217 --> 00:30:22.377
And I think it's checking them at
various places in the multi-threaded

00:30:22.697 --> 00:30:24.737
runtime going, "Oh, it's at zero.

00:30:24.737 --> 00:30:25.867
We don't need to do anything."

00:30:26.177 --> 00:30:30.427
So actually sometimes my code gets
stuck and it helps if from the main

00:30:30.427 --> 00:30:35.207
binary before calling into the module,
I spawn a task that's just busy looping.

00:30:35.717 --> 00:30:36.417
Not really.

00:30:36.457 --> 00:30:40.437
It like, sleeps for 10 milliseconds
in a loop, and then that has the

00:30:40.437 --> 00:30:44.557
runtime check on all the tasks again,
every 10 milliseconds, and that helps

00:30:44.917 --> 00:30:46.367
the program actually make progress.

00:30:46.507 --> 00:30:50.627
So no, the solution does not actually
fully work right now, but it's,

00:30:50.637 --> 00:30:51.977
you know, it's one step closer.

00:30:51.977 --> 00:30:56.647
I just have to take care of process
locals now, which are really globals.

00:30:57.073 --> 00:30:59.695
<v James Munns>Yeah, I was going to
say, like somewhat common thing- so

00:30:59.695 --> 00:31:01.255
again, this is static linking brain.

00:31:01.515 --> 00:31:04.495
Static linking might have something like,
they're usually called weak symbols.

00:31:04.605 --> 00:31:07.665
So weak symbols means either I can
provide it, but if someone else

00:31:07.665 --> 00:31:09.495
provides it, you like defer to them.

00:31:09.835 --> 00:31:12.675
But I have no idea how weak
symbol resolution for dynamic

00:31:12.675 --> 00:31:15.242
libraries would work, but it
sounds sort of like what you want-

00:31:15.285 --> 00:31:16.682
<v Amos Wenger>That is, that
is what I tried first.

00:31:16.702 --> 00:31:22.427
And I discovered that not only is it, of
course, perma-unstable in Rust, but also

00:31:22.677 --> 00:31:26.137
there's like three different variants
of it, which map just to what LLVM does.

00:31:26.137 --> 00:31:27.267
And half of it is broken.

00:31:27.277 --> 00:31:29.417
And even in the standard
library, they're misusing it.

00:31:29.528 --> 00:31:31.738
I can give you links, but
like the discussions are like:

00:31:31.738 --> 00:31:32.828
none of that makes any sense.

00:31:32.828 --> 00:31:34.398
Do not use weak linkage in Rust.

00:31:34.408 --> 00:31:35.048
That's a bad idea.

00:31:35.681 --> 00:31:37.791
<v James Munns>If you'd like to
know a fun trick: the linker

00:31:37.801 --> 00:31:39.211
can weaken symbols itself.

00:31:39.221 --> 00:31:42.201
So the compiler can produce non-weak
symbols and you can come back with the

00:31:42.201 --> 00:31:45.521
linker and re-mark the sections as weak.

00:31:45.741 --> 00:31:48.941
Although, I think the reason that it's
unstable is because if the optimizer

00:31:48.941 --> 00:31:52.261
came in and just like pretended that
static isn't there, or cached a value

00:31:52.281 --> 00:31:53.531
because it thinks it never changes.

00:31:53.851 --> 00:31:56.811
Then I could see it miscompiling or
causing undefined behavior, but when

00:31:56.811 --> 00:32:00.574
you start getting into binutils and
linkers, you can start doing extra

00:32:00.574 --> 00:32:02.444
spooky stuff, because linkers are like...

00:32:02.504 --> 00:32:03.094
oh, man.

00:32:03.365 --> 00:32:06.104
<v Amos Wenger>Yeah, I know a bunch
of Linux-specific tricks, but I

00:32:06.104 --> 00:32:09.144
also use macOS to develop this, so
it has to work on both platforms.

00:32:09.144 --> 00:32:11.254
So some things I just didn't even try.

00:32:11.414 --> 00:32:13.664
One thing I wanted to try
that seemed universal is just

00:32:13.664 --> 00:32:15.044
go ahead and patch the code.

00:32:15.094 --> 00:32:17.688
Just patch whatever
function gets the thread-local.

00:32:17.918 --> 00:32:20.688
The problem, of course, is inlining,
because you're patching a function

00:32:20.688 --> 00:32:25.858
that might not even get called from
the call sites, it's all inlined.

00:32:26.448 --> 00:32:28.982
Specifically if the thing is
all const, I was like, "I'm just

00:32:28.982 --> 00:32:30.032
going to override the memory."

00:32:30.032 --> 00:32:33.082
And it's like: no, because
nobody's reading from that.

00:32:33.092 --> 00:32:34.392
We inlined
everything a long time ago.

00:32:34.497 --> 00:32:38.387
<v James Munns>If you make it a pub
static, so if you export the static,

00:32:38.397 --> 00:32:41.887
then the optimizer can realize that
it's an exported symbol, and that

00:32:41.887 --> 00:32:43.547
it can't mess with it that much.

00:32:43.557 --> 00:32:46.417
So, if you do replace it
with, like, a `tokio::static!`

00:32:46.437 --> 00:32:46.957
instead?

00:32:47.222 --> 00:32:50.202
<v Amos Wenger>That's what the macro does
when you enable the external TLS feature.

00:32:50.202 --> 00:32:53.322
Yeah, yeah, but it was a lot of trial
and error because at first I was like:

00:32:53.322 --> 00:32:54.942
okay, so trying to override a const.

00:32:54.962 --> 00:32:59.032
It was like, no, and then exporting
a static, but a static function?

00:33:00.112 --> 00:33:03.302
Exporting a function and overriding it
was like: again, no, you can't do that.

00:33:03.302 --> 00:33:04.362
And then, okay, it was static-

00:33:05.145 --> 00:33:05.292
<v James Munns>Spooky.

00:33:05.382 --> 00:33:08.272
<v Amos Wenger>Mut, pub, whatever,
export all the things, `#[no_mangle]`,

00:33:08.632 --> 00:33:09.852
pretty please with a cherry on top?

00:33:09.892 --> 00:33:11.062
And then finally it started working.

00:33:11.908 --> 00:33:16.142
So yeah, future work: get it to
actually work and then apply the same

00:33:16.162 --> 00:33:20.552
treatment to process locals or actual
statics and see if everything works.

00:33:20.582 --> 00:33:22.972
I want to do the same technique for
`tracing-subscriber` because right

00:33:22.972 --> 00:33:26.562
now I have to do that same manually
synchronizing thread-locals which is

00:33:26.632 --> 00:33:28.612
really annoying and really error prone.

00:33:29.282 --> 00:33:34.816
And I'd like to get this into `tokio`,
but that's another cursed scenario for

00:33:34.816 --> 00:33:39.136
them to worry about and a bunch of
code and it seems unlikely right now.

00:33:39.456 --> 00:33:43.376
I might try because it's not fun to
maintain a patch set against, you

00:33:43.376 --> 00:33:48.176
know, the most popular Rust executor,
but, uh, yeah, maybe, I don't know.

00:33:48.506 --> 00:33:51.236
I'm going to keep trying to make this
cursed thing work because it's really,

00:33:51.236 --> 00:33:56.431
really nice for me to be able to build,
uhh, package up, and deploy my website in

00:33:56.461 --> 00:33:58.091
under two minutes all around the world.

00:33:58.647 --> 00:34:02.187
This used to take 15 minutes easy,
and it's just- you start thinking

00:34:02.187 --> 00:34:04.727
about whole different things when you
have that kind of iteration speed.

00:34:04.827 --> 00:34:06.967
So I'm going to keep trying
because I'm stubborn.

00:34:07.922 --> 00:34:10.272
<v James Munns>I'm super excited, because
this is one of those things where you

00:34:10.272 --> 00:34:13.282
figure out the cursed way, and then
you show the internet the cursed things

00:34:13.282 --> 00:34:16.362
that you do, and you find someone on
the internet who goes, "No, no, like

00:34:16.362 --> 00:34:18.722
this," and you find the better way to
do that, and then you figure out how

00:34:18.722 --> 00:34:22.382
to wrap that in a tool, and then all
of a sudden: Oh yeah, that thing we

00:34:22.382 --> 00:34:26.612
said that Rust could not do for a very,
very long time, of having dynamically

00:34:26.613 --> 00:34:27.792
loadable modules and stuff like that?

00:34:27.822 --> 00:34:28.822
Oh, now it's fixed.

00:34:28.892 --> 00:34:31.952
Like, I love that kind of tooling,
even when it starts as like,

00:34:32.172 --> 00:34:35.442
spooky, unsafe stuff, but if we can
figure out a way to, to ship that.

00:34:35.442 --> 00:34:37.162
That'd be super, super cool.

00:34:37.649 --> 00:34:37.989
<v Amos Wenger>That's it.

00:34:38.154 --> 00:34:39.154
<v James Munns>That's a podcast!

00:34:39.234 --> 00:34:39.854
<v Amanda Majorowicz>Doop, doop doop doop,

00:34:40.054 --> 00:34:42.274
<v James Munns>doo.

00:34:47.926 --> 00:34:50.266
<v Amos Wenger>This episode is
sponsored by Ladybird browser.

00:34:50.686 --> 00:34:54.986
Today, every major web browser is funded
or powered by Google's advertising empire.

00:34:55.186 --> 00:34:57.246
Choice is good, but your
only choice is Google.

00:34:57.586 --> 00:35:00.096
The Ladybird browser wants
to do something about this.

00:35:00.496 --> 00:35:03.576
Ladybird is a brand new browser and
web engine written from scratch and

00:35:03.576 --> 00:35:05.426
free of the influences of Big Tech.

00:35:05.986 --> 00:35:08.486
Driven by a web-standards-first
approach, Ladybird aims to

00:35:08.486 --> 00:35:12.196
render the modern web with good
performance, stability, and security.

00:35:12.437 --> 00:35:16.287
From its humble beginnings as an HTML
viewer for the SerenityOS hobby operating

00:35:16.287 --> 00:35:20.247
system project, Ladybird has since grown
into a cross-platform browser supporting

00:35:20.247 --> 00:35:22.817
Linux, macOS, and other Unix-like systems.

00:35:23.407 --> 00:35:27.077
In July, Ladybird launched a non-profit
to support development and announced a

00:35:27.077 --> 00:35:31.107
first Alpha for early adopters targeting
2026, but you can support the project

00:35:31.107 --> 00:35:33.237
on GitHub or via donations today.

00:35:33.737 --> 00:35:36.947
Visit ladybird.org for more information
and to join the mailing list.

00:35:37.587 --> 00:35:40.467
Thanks to Ladybird for
sponsoring today's episode.

