WEBVTT

NOTE
This file was generated by Descript <www.descript.com>

00:00:13.555 --> 00:00:15.295
<v Amanda Majorowicz>This
is Self-Directed Research.

00:00:15.355 --> 00:00:19.255
Our hosts, James and Amos get hyped about
different topics and take turns each week,

00:00:19.255 --> 00:00:20.905
presenting their ideas to each other.

00:00:21.295 --> 00:00:23.755
You can check out the website,
YouTube or Spotify to watch

00:00:23.755 --> 00:00:28.375
this episode's presentation and
visit sdr-podcast.com/episodes

00:00:28.375 --> 00:00:31.165
for presentations, videos,
show notes and transcripts.

00:00:31.585 --> 00:00:33.505
New episodes are
published every Wednesday.

00:00:33.887 --> 00:00:35.977
This episode is brought
to you by CodeCrafters.

00:00:36.047 --> 00:00:39.347
Check out the link in our show notes or
listen at the end for more information.

00:00:39.897 --> 00:00:44.277
And now in a rare direct followup to last
week's episode, Amos shares an even more

00:00:44.277 --> 00:00:47.017
different take with "Merde is not Serde."

00:00:53.103 --> 00:00:53.653
<v Amos Wenger>Exciting.

00:00:53.683 --> 00:00:55.868
I'm also on ethernet for this computer.

00:00:56.233 --> 00:00:56.593
<v James Munns>Nice.

00:00:56.853 --> 00:00:57.703
<v Amanda Majorowicz>I'm on Wi-Fi.

00:00:57.973 --> 00:01:01.233
Also, this is like a perfect
follow up to last week's episode.

00:01:01.373 --> 00:01:02.703
I mean, honestly...

00:01:02.758 --> 00:01:03.418
<v Amos Wenger>Yes!

00:01:03.448 --> 00:01:04.208
That's why I did it!

00:01:04.208 --> 00:01:07.168
That's why I threw away my
other decks and made one.

00:01:07.223 --> 00:01:08.493
<v Amanda Majorowicz>Like,
screw everything else!

00:01:08.523 --> 00:01:09.353
We're doing this.

00:01:10.033 --> 00:01:10.523
Perfect.

00:01:10.523 --> 00:01:14.053
<v James Munns>Did you see that
dtolnay commented on my post on the

00:01:14.133 --> 00:01:16.563
approach that postcard-forth used
and was like, "Yeah, I'm pretty sure

00:01:16.563 --> 00:01:17.917
that's the optimal way to do that.

00:01:18.456 --> 00:01:21.759
That's what we should be doing if
it stabilized in the compiler."

00:01:21.769 --> 00:01:22.529
And I was like, "Oh!"

00:01:22.809 --> 00:01:25.669
And he linked a post from like a year ago
where he's like, "Yeah, I'm pretty sure

00:01:25.669 --> 00:01:29.479
you want to turn this into a bytecode
format and then just run through that.

00:01:29.489 --> 00:01:30.979
Cause that's going to
end up being optimal."

00:01:30.989 --> 00:01:31.639
And I was like, "Oh!

00:01:31.759 --> 00:01:32.139
Okay..."

00:01:32.289 --> 00:01:34.728
Well, one: he beat me to the
punch and two: well, that's

00:01:34.728 --> 00:01:35.908
a little, uh, reassuring.

00:01:36.728 --> 00:01:38.338
<v Amos Wenger>I did so many
cool things with my website.

00:01:38.388 --> 00:01:41.176
You know the saying about
history, like, "Sometimes, uh...

00:01:41.235 --> 00:01:43.925
weeks happen in years and years
happen in weeks" or something?

00:01:44.129 --> 00:01:45.929
Well- years happened
this week, I don't know.

00:01:46.269 --> 00:01:48.909
I did a lot of things for my website
and I'm excited to talk about every

00:01:48.909 --> 00:01:51.656
single aspect of it, but this is
not all we are talking about today.

00:01:51.656 --> 00:01:52.638
Today we're talking about: merde.

00:01:54.411 --> 00:01:57.365
It's pronounced 'mer- day.'
Okay, this is a pretty fun joke.

00:01:57.539 --> 00:02:00.062
The title of today's presentation
is "merde is not serde."

00:02:00.543 --> 00:02:03.923
The subtitle is "another take on
(de)serialization in Rust," and I have

00:02:03.923 --> 00:02:07.283
been forced against my will to talk about
it now, even though it's not perfect yet.

00:02:07.303 --> 00:02:09.083
It's gonna be perfect
soon, but it's not yet.

00:02:09.428 --> 00:02:14.118
Because James did his own take on
another serde, another possible

00:02:14.118 --> 00:02:20.014
serde, and so I had to present mine,
which is actually in version 8.1.2.

00:02:20.354 --> 00:02:24.004
I've been doing some major iterating
on this thing, and people have

00:02:24.004 --> 00:02:27.904
asked me a lot, but "Are you aware
that 'merde' in French means poop?"

00:02:28.414 --> 00:02:29.644
And yes, yes, I am.

00:02:29.674 --> 00:02:31.104
This is the logo for the library.

00:02:31.424 --> 00:02:32.694
It's drawn by Misia.

00:02:32.714 --> 00:02:35.474
There's going to be a link to
her website in the show notes on

00:02:35.474 --> 00:02:41.054
sdr-podcast.com/episodes, where
you can find the slides as well.

00:02:41.282 --> 00:02:42.232
And, um...

00:02:42.292 --> 00:02:45.222
it's what I want serde to be, which
is very different from what you

00:02:45.222 --> 00:02:48.995
want serde to be, James, because
I want something that builds fast.

00:02:48.995 --> 00:02:51.765
I think actually we're aligned on
that part, but then I also want a

00:02:51.765 --> 00:02:55.215
lot of functionality, and I mostly
want to deserialize a bunch of

00:02:55.215 --> 00:02:57.852
JSON, because I have a website.

00:02:57.872 --> 00:03:02.146
So I deserialize a bunch of stuff, and
I hate the big Codegen, I hate the long

00:03:02.146 --> 00:03:04.516
compile times, I hate the proc macros.

00:03:04.516 --> 00:03:07.859
So I did like a quick hack for
myself and then things got out

00:03:07.859 --> 00:03:09.919
of control as they tend to do.

00:03:10.064 --> 00:03:12.004
So now I'm maintaining a whole ecosystem.

00:03:12.404 --> 00:03:16.874
It does support deserializing JSON,
YAML and MessagePack, just because

00:03:16.874 --> 00:03:18.904
that's what I was using on my website.

00:03:19.144 --> 00:03:22.534
And it does support serializing
JSON and not any of the other ones.

00:03:22.534 --> 00:03:25.510
There's no reason why it doesn't
just- I haven't gotten around to-

00:03:25.556 --> 00:03:26.786
<v James Munns>I haven't needed it yet...

00:03:26.844 --> 00:03:29.294
<v Amos Wenger>Someone said, "Hey,
can I contribute KDL support?"

00:03:29.294 --> 00:03:30.454
I was like, "Knock yourself out."

00:03:30.474 --> 00:03:34.394
I mean, things are still majorly shifting
in the crate, as you can tell by the

00:03:34.394 --> 00:03:36.504
major version number increasing rapidly.

00:03:36.944 --> 00:03:39.134
But yeah, it is in
production in my website.

00:03:39.444 --> 00:03:43.174
So if you can pwn my server somehow, cause
I forgot about something in merde, then...

00:03:43.184 --> 00:03:44.264
you know, more power to you.

00:03:44.340 --> 00:03:46.410
<v James Munns>I'm getting my
denial-of-service vectors ready.

00:03:47.590 --> 00:03:49.410
<v Amos Wenger>One big thing that's
been bothering me in the serde

00:03:49.510 --> 00:03:53.830
ecosystem is that everyone is
using the serde JSON value type.

00:03:53.830 --> 00:03:56.330
When you deserialize to something,
you don't know the shape of it,

00:03:56.330 --> 00:03:59.955
you're not sure, it's kind of the
Any type of the deserialization world.

00:03:59.955 --> 00:04:02.345
You use the serde JSON
value, which looks like this.

00:04:02.585 --> 00:04:07.080
It has null, it has bool, it has a
number, string, array, and object.

00:04:07.529 --> 00:04:09.669
And you don't really get to choose much.

00:04:09.689 --> 00:04:11.049
This is pretty much set in stone.

00:04:11.049 --> 00:04:12.569
I don't think they can
change it at this point.

00:04:12.609 --> 00:04:15.979
Like, adding a variant would
be breaking, changing the type

00:04:15.979 --> 00:04:17.179
of variant would be breaking.

00:04:17.179 --> 00:04:18.679
We're pretty much stuck with this.

00:04:18.939 --> 00:04:21.639
Even for formats that are not
JSON, you can take a binary

00:04:21.639 --> 00:04:23.309
format and deserialize it to that.

00:04:23.739 --> 00:04:27.397
But then, if you deserialize, if
there's like a byte slice in there,

00:04:27.472 --> 00:04:29.630
it's gonna be an array of U8.
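
NOTE
Editor's sketch, not from the episode: a minimal std-only enum shaped like the value type described above, assuming a simplified Number (a plain f64) and a BTreeMap for objects. The real serde_json::Value uses its own Number and Map types. The names JsonLike and bytes_to_value are hypothetical. It illustrates why a byte slice ends up as an array of numbers when the enum has no bytes variant.
```rust
use std::collections::BTreeMap;
// Six variants, mirroring the list read out in the episode.
#[derive(Debug, Clone, PartialEq)]
pub enum JsonLike {
    Null,
    Bool(bool),
    Number(f64), // assumption: the real type hides a richer number representation
    String(String),
    Array(Vec<JsonLike>),
    Object(BTreeMap<String, JsonLike>),
}
// With no dedicated bytes variant, each byte has to become its own Number.
pub fn bytes_to_value(bytes: &[u8]) -> JsonLike {
    JsonLike::Array(bytes.iter().map(|b| JsonLike::Number(f64::from(*b))).collect())
}
```
Calling bytes_to_value on a two-byte slice yields a two-element Array of Number values, which is far bulkier than the original bytes.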

00:04:29.852 --> 00:04:32.352
<v James Munns>Yeah, I actually have
another crate, postcard-dyn, which

00:04:32.372 --> 00:04:37.982
transcodes postcard data into JSON, and
I actually just used `serde_json::Value`

00:04:38.042 --> 00:04:41.246
for this, because when you don't
know the shape of it, it's an easy,

00:04:41.256 --> 00:04:45.349
if you have a heap, you can stack
items here, and it just works, but...

00:04:45.589 --> 00:04:48.400
I guess that's the trappings of
success, is once it's popular, it

00:04:48.400 --> 00:04:50.210
becomes very difficult to change.

00:04:50.340 --> 00:04:53.465
<v Amos Wenger>Yeah, I want to make
it very clear that part of the reason

00:04:53.465 --> 00:04:56.705
why I'm able to experiment with merde
is that Rust has changed and I don't

00:04:56.705 --> 00:04:58.325
have any compatibility guarantees.

00:04:58.325 --> 00:05:01.761
I'm sure that they would like to
change some things in serde if they

00:05:01.761 --> 00:05:05.661
could, but serde v2 is a hard sell
in the Rust community, the v1 is a

00:05:05.661 --> 00:05:08.791
big selling point so changing it has
to have some significant upsides.

00:05:08.811 --> 00:05:12.061
And this is not what I'm trying to do.
I'm just exploring the space in a very

00:05:12.111 --> 00:05:15.711
different part of space than you were
James, so that's why it was funny to me.

00:05:15.711 --> 00:05:20.926
One thing that I noticed in that
enum is that object has a map type?

00:05:20.926 --> 00:05:22.536
And I was like: why not just use HashMap?

00:05:22.536 --> 00:05:26.006
Clearly they are using the base type,
the string is just an owned string in Rust.

00:05:26.356 --> 00:05:29.236
Number, I don't know what number
is hiding, but the thing that

00:05:29.236 --> 00:05:33.926
is hidden behind map is whether
insertion order is preserved or

00:05:33.926 --> 00:05:36.186
not, so it's an alias, or like...

00:05:36.216 --> 00:05:39.431
It's a struct with a hidden impl
in there that forwards all the

00:05:39.431 --> 00:05:41.531
implementations to the underlying thing.

00:05:41.821 --> 00:05:46.981
And it's either a BTreeMap, which
is ordered, but not by insertion

00:05:46.981 --> 00:05:50.201
order, or it's an index map, which
is like a regular map, but it

00:05:50.211 --> 00:05:52.141
also keeps the insertion order.

00:05:52.141 --> 00:05:56.175
And so it iterates by the order by
which objects were inserted into it.

00:05:56.285 --> 00:05:56.963
I'm sure you know what I mean.

00:05:57.009 --> 00:06:00.128
<v James Munns>Yeah, I do think
BTreeMap preserves insertion order.

00:06:02.073 --> 00:06:02.563
<v Amos Wenger>No it doesn't!

00:06:02.563 --> 00:06:04.833
It orders keys, so keys must be
comparable, like they must implement

00:06:04.853 --> 00:06:08.895
Ord, and then when you iterate, it
iterates from smallest to largest.

00:06:08.929 --> 00:06:10.729
But it doesn't preserve insertion order.

00:06:10.899 --> 00:06:11.699
That's why I was confused.

00:06:11.699 --> 00:06:13.869
I was like, "Oh, BTreeMap,
if it's ordered- no, wait."

00:06:15.077 --> 00:06:15.647
You can look it up.

00:06:15.647 --> 00:06:16.337
Now we have time...

00:06:17.201 --> 00:06:17.711
<v James Munns>I'll trust you.

00:06:17.778 --> 00:06:20.597
I definitely know what you mean with index
map, cause we also have that on embedded.

00:06:20.737 --> 00:06:22.687
<v Amos Wenger>Because if BTreeMap
preserved insertion order,

00:06:22.687 --> 00:06:23.917
why would you need index map?

00:06:24.007 --> 00:06:24.337
Right?

00:06:24.550 --> 00:06:26.980
<v James Munns>But I thought that's
why I use BTreeMap instead of

00:06:26.980 --> 00:06:28.520
HashMap in a lot of places, but...

00:06:28.867 --> 00:06:30.917
maybe it's just a consistent
iteration or- well, yeah.

00:06:30.917 --> 00:06:31.177
Okay.

00:06:31.177 --> 00:06:32.657
I'm going to look it up,
cause we're talking about it.

00:06:32.787 --> 00:06:33.687
<v Amos Wenger>We are going to look it up.

00:06:33.687 --> 00:06:33.837
Yeah.

00:06:33.837 --> 00:06:34.887
That does make sense.

00:06:35.167 --> 00:06:36.037
<v James Munns>Tap, tap, tap, tap, tap.

00:06:36.427 --> 00:06:36.787
<v Amos Wenger>Mm-Hmm.

00:06:37.468 --> 00:06:39.568
<v James Munns>Actually, we should
go to the top of std collections

00:06:39.568 --> 00:06:40.898
cause they talk about this.

00:06:40.908 --> 00:06:44.208
"Use BTreeMap when you want
a map sorted by its keys."

00:06:44.218 --> 00:06:47.348
So yeah, I guess you're right- it
has a consistent iteration order,

00:06:47.348 --> 00:06:50.981
but it's going to be sorted by its
keys and not in the insertion order.
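
NOTE
Editor's sketch, not from the episode: a small std-only check of the point settled above, that BTreeMap iterates in sorted key order and not in insertion order. The helper name btree_key_order is hypothetical.
```rust
use std::collections::BTreeMap;
// Insert keys in the given order, then report the order BTreeMap iterates in.
pub fn btree_key_order(keys: &[&str]) -> Vec<String> {
    let mut map = BTreeMap::new();
    for (position, key) in keys.iter().enumerate() {
        map.insert(key.to_string(), position); // the value records the insertion position
    }
    map.keys().cloned().collect()
}
```
Inserting zebra, apple, mango in that order gives back apple, mango, zebra. Keeping insertion order needs a different structure, such as the index map mentioned in the episode.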

00:06:51.146 --> 00:06:51.356
<v Amos Wenger>Okay.

00:06:51.356 --> 00:06:55.411
Yeah, I was confused too 'cause I remember
looking at the code base for cargo-dist

00:06:55.431 --> 00:06:59.591
a lot and it has type aliases for
like ordered map versus unordered map.

00:06:59.811 --> 00:07:04.084
It's good for comparing things because
it's going to have a predictable order as

00:07:04.084 --> 00:07:07.274
opposed to HashMap which could resize in
the middle or like the different random

00:07:07.274 --> 00:07:09.764
seed that it uses to avoid denial of service.

00:07:10.354 --> 00:07:15.554
In typical SDR fashion, I have 32 slides
and we're spending 10 minutes on slide 4.

00:07:15.655 --> 00:07:19.451
So anyway, it's just using basic
Rust types, but it's serde JSON.

00:07:19.451 --> 00:07:20.771
People kind of standardize on that.

00:07:20.831 --> 00:07:23.801
You can serialize to a serde JSON
value, you can deserialize from a

00:07:23.801 --> 00:07:27.509
serde JSON value instead of like
some input in some markup format.

00:07:27.883 --> 00:07:33.093
I decided to make the value type a first
class citizen in merde and add things like

00:07:33.103 --> 00:07:35.423
bytes and add things like copy-on-write.

00:07:35.443 --> 00:07:38.663
You can see `CowStr`, you
can see `CowBytes` and

00:07:38.692 --> 00:07:40.177
different array and map types.

00:07:40.197 --> 00:07:42.072
You can see I64 and U64.

00:07:43.152 --> 00:07:48.162
Because, well, U64 can have larger values
than I64, so not everything fits in there.
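
NOTE
Editor's sketch, not from the episode: why a value type might carry both a signed and an unsigned 64-bit variant. The enum Int and the function parse_int are hypothetical and are not merde's API. The sketch assumes a policy of preferring i64 and falling back to u64.
```rust
// Hypothetical two-variant integer, mirroring the I64/U64 split described above.
#[derive(Debug, PartialEq)]
pub enum Int {
    I64(i64),
    U64(u64),
}
// Prefer i64; fall back to u64 for values above i64::MAX.
pub fn parse_int(text: &str) -> Option<Int> {
    if let Ok(value) = text.parse::<i64>() {
        return Some(Int::I64(value));
    }
    text.parse::<u64>().ok().map(Int::U64)
}
```
Negative numbers land in I64, while u64::MAX only fits in U64, so neither variant alone covers every input.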

00:07:48.282 --> 00:07:51.932
You can see float is an ordered
float, so the whole type is ordered.

00:07:52.392 --> 00:07:53.572
That's a compromise.

00:07:53.922 --> 00:07:57.676
But basically, F64 does not implement
Ord because you're not supposed

00:07:57.676 --> 00:08:00.456
to be able to order NaNs, I think?

00:08:00.551 --> 00:08:01.401
<v James Munns>Yeah, exactly.

00:08:01.543 --> 00:08:04.036
<v Amos Wenger>"NaN is smaller than NaN"
is always false, but "NaN is bigger

00:08:04.036 --> 00:08:07.766
than NaN" is always false as well,
and there's like 4 million NaNs?
and there's like 4 million NaNs?

00:08:07.911 --> 00:08:08.664
It's a lot of NaNs.

00:08:08.841 --> 00:08:10.191
<v James Munns>In F64 probably.
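
NOTE
Editor's sketch, not from the episode: the NaN behaviour discussed above, using only std. The function names are hypothetical. It also counts NaN bit patterns in f64 (all-ones exponent, non-zero 52-bit mantissa, either sign), which comes to 2^53 - 2, a good deal more than the number guessed in the conversation. An ordered-float wrapper has to pick some total order; f64::total_cmp is one such order in std and is used here as a stand-in.
```rust
// Every ordering comparison involving NaN is false, and NaN is not equal to itself.
pub fn nan_is_unordered() -> bool {
    let nan = f64::NAN;
    !(nan < nan) && !(nan > nan) && nan != nan && nan.partial_cmp(&nan).is_none()
}
// f64::total_cmp defines a total order, so sorting does not panic or misbehave on NaN.
pub fn sort_total(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}
// NaN bit patterns: (2^52 - 1) non-zero mantissas, times two signs.
pub fn f64_nan_count() -> u64 {
    2 * ((1u64 << 52) - 1)
}
```
Because partial_cmp returns None for NaN, f64 cannot implement Ord, which is why a value type that wants to be ordered wraps its floats.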

00:08:11.341 --> 00:08:13.441
<v Amos Wenger>A big thing I wanted
to do, cause I was like: I'm gonna

00:08:13.441 --> 00:08:14.601
compromise on a bunch of things.

00:08:14.601 --> 00:08:18.071
I'm gonna compromise on monomorphization,
cause I want better build speeds.

00:08:18.271 --> 00:08:22.095
I'm gonna compromise on- I don't
know, doing dynamic dispatch, which

00:08:22.095 --> 00:08:23.145
is the same kind of compromise.

00:08:23.335 --> 00:08:27.135
One thing we can have for free is just
borrow from the input instead of copying

00:08:27.135 --> 00:08:30.955
to the heap whenever we can, so we
have the `CowStr` and `CowBytes` types.

00:08:31.165 --> 00:08:35.065
Every Rust project has their versions
of `CowStr`, because there's a CoW

00:08:35.115 --> 00:08:40.045
type in the standard library, and
you can pass it a reference type...

00:08:40.690 --> 00:08:46.744
like str, and then if that type implements
ToOwned, then you have your pair of types.

00:08:46.744 --> 00:08:49.264
You have the borrowed type and the
owned type, and in this case the

00:08:49.264 --> 00:08:54.024
borrowed type would be, str slice,
so ampersand str, and then the owned

00:08:54.024 --> 00:08:58.481
type would be the, um, standard
library type String with a capital S

00:08:58.491 --> 00:09:00.041
which is the owned version of a string.
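
NOTE
Editor's sketch, not from the episode: the standard library Cow with the borrowed/owned pair described above (str and String). The helper unescape is hypothetical and handles only one escape sequence; it is here to show borrowing from the input when nothing needs rewriting and allocating only when something does.
```rust
use std::borrow::Cow;
// Hypothetical helper: turn the two-character sequence \n into a real newline.
// Borrows the input when there is no backslash, allocates a String otherwise.
pub fn unescape(input: &str) -> Cow<'_, str> {
    if input.contains('\\') {
        Cow::Owned(input.replace("\\n", "\n"))
    } else {
        Cow::Borrowed(input)
    }
}
```
A deserializer can use the same idea: strings without escapes are returned as slices of the input buffer, and only escaped strings cost an allocation.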

00:09:00.211 --> 00:09:03.921
But I don't want String to be the owned
type. I want compact string to be the

00:09:03.921 --> 00:09:08.161
owned type because if you have a lot
of short strings it's kind of silly

00:09:08.161 --> 00:09:12.381
to do a tiny allocation on the heap
and put like 8 bytes there if you have

00:09:12.381 --> 00:09:13.701
a lot of first names or something.

00:09:13.835 --> 00:09:16.895
In the space that it would take to
point somewhere in the heap, you can

00:09:16.895 --> 00:09:18.895
just store the data inline directly.
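
NOTE
Editor's sketch, not from the episode: a toy small-string type showing the inline-storage idea described above. This is not compact_str's real layout or API; SmallStr, INLINE_CAP and the 23-byte capacity are assumptions made for illustration, and a real crate packs its representation much more tightly.
```rust
// Toy small string: short strings live inline in a fixed buffer, longer ones go to the heap.
const INLINE_CAP: usize = 23;
pub enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(String),
}
impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_owned())
        }
    }
    pub fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
    pub fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so this conversion cannot fail.
            SmallStr::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize]).unwrap(),
            SmallStr::Heap(s) => s.as_str(),
        }
    }
}
```
A first name such as "Amos" stays inline with no heap allocation, while a long sentence falls back to a heap String.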

00:09:18.895 --> 00:09:21.565
There's a slew of small string crates.

00:09:21.660 --> 00:09:21.840
<v James Munns>Yeah.

00:09:21.840 --> 00:09:23.090
I was going to say,
which one are you using?

00:09:23.090 --> 00:09:25.970
Because I know there's like eight and
they all go back and forth on which one-

00:09:26.085 --> 00:09:30.506
<v Amos Wenger>I'm using the best one
obviously, which is compact_str.

00:09:30.755 --> 00:09:32.535
There's going to be again
links in the show notes.

00:09:32.875 --> 00:09:35.895
And the same thing for bytes, you
can do the same trick for bytes.

00:09:35.905 --> 00:09:36.945
You just need a different crate.

00:09:36.955 --> 00:09:39.725
It's a crate that was
actually based on compact_str.

00:09:39.915 --> 00:09:42.235
I did a review of all the
small string crates a few years

00:09:42.235 --> 00:09:43.195
ago, and it's out of date.

00:09:43.215 --> 00:09:44.925
And now the best one is compact_str.

00:09:44.945 --> 00:09:45.970
So I should update it.

00:09:46.020 --> 00:09:46.547
<v James Munns>Pro tip.

00:09:46.710 --> 00:09:50.980
<v Amos Wenger>As I mentioned, a priority
for merde is build speed, because I

00:09:50.980 --> 00:09:52.480
like to iterate on my website a lot.

00:09:52.610 --> 00:09:53.720
I interact with a lot of APIs.

00:09:53.720 --> 00:09:56.540
I interact with the Reddit API, the
Patreon API, the GitHub Sponsors

00:09:56.540 --> 00:09:58.530
API, my own APIs internally.

00:09:58.950 --> 00:10:02.002
And so I don't want to be spending
my whole time compiling serde

00:10:02.002 --> 00:10:05.318
generated code, which we've already
brought up a lot in this podcast.

00:10:05.338 --> 00:10:06.278
We know that it's an issue.

00:10:06.673 --> 00:10:09.333
One of the things that make
compiling projects with a

00:10:09.333 --> 00:10:10.633
lot of serde derived types...

00:10:11.828 --> 00:10:13.151
it's serde_derive just to be clear.

00:10:13.151 --> 00:10:14.111
It's not serde itself.

00:10:14.121 --> 00:10:17.322
It's serde_derive specifically,
which I think most people are using.

00:10:17.392 --> 00:10:21.807
I think very few people are doing manual
implementation of serde's Serialize and

00:10:21.807 --> 00:10:23.597
Deserialize traits if they can help it.

00:10:23.645 --> 00:10:26.285
<v James Munns>And even if you aren't using
it personally, if you have dependencies

00:10:26.285 --> 00:10:29.525
on types that have a serde feature, you're
still going to be pulling it in in your

00:10:29.525 --> 00:10:31.585
dependencies and paying that cost, really.

00:10:32.015 --> 00:10:32.435
<v Amos Wenger>Yes.

00:10:32.435 --> 00:10:35.107
And proc macros are really
hard to cache properly.

00:10:35.157 --> 00:10:39.307
There have been some experiments to
enable caching for proc macro output.

00:10:39.307 --> 00:10:42.620
If all the inputs are the same, then
there's no point in even compiling

00:10:42.620 --> 00:10:44.300
the proc macro code and running it.

00:10:44.350 --> 00:10:45.280
It's a long story.

00:10:45.280 --> 00:10:49.710
We could have a whole episode on
that, but there's gains, but it's

00:10:49.710 --> 00:10:53.190
hard to determine what the actual
inputs of the proc macros are.

00:10:53.190 --> 00:10:54.290
It could be arbitrary code.

00:10:54.520 --> 00:10:56.390
And even if you sandbox all the things,

00:10:56.710 --> 00:11:00.789
you still have to compute
cache keys and all that.

00:11:01.329 --> 00:11:04.449
And at the end of the day, it's really
hard to come up with a solution that

00:11:04.449 --> 00:11:07.329
works for everyone, that actually speeds
up builds rather than slowing them down.

00:11:07.469 --> 00:11:08.319
Caching is hard.

00:11:09.199 --> 00:11:11.149
<v James Munns>Especially when you
have non sandboxed items and the

00:11:11.149 --> 00:11:14.219
fact that proc macros can be side
effectful and are allowed to do things

00:11:14.219 --> 00:11:18.489
like write or read from disk or make
network connections like Diesel does

00:11:18.679 --> 00:11:19.959
<v Amos Wenger>But I have
used my superpower.

00:11:19.969 --> 00:11:22.839
I have complained online about it,
which let me know that people were

00:11:22.839 --> 00:11:24.719
already discussing, "What can we do?"

00:11:24.969 --> 00:11:29.249
And yeah, constraining what proc
macros can do, giving them a way to

00:11:29.259 --> 00:11:32.219
make up their own cache key, and if
they get it wrong, it's their fault.

00:11:32.229 --> 00:11:33.139
There's a lot of things to do.

00:11:33.169 --> 00:11:35.529
Again, different episode idea, for later.

00:11:35.809 --> 00:11:38.639
But in merde, no proc macros,
only declarative macros.

00:11:38.639 --> 00:11:41.008
So this is what serde would look
like: you use the Serialize and

00:11:41.008 --> 00:11:45.138
Deserialize traits and derive macros;
both kinds of symbol are named the same.

00:11:45.263 --> 00:11:47.433
Rust namespacing rules are fun.

00:11:47.528 --> 00:11:50.788
And then you have this attribute on top
of your struct, so if you have a struct

00:11:50.828 --> 00:11:56.498
Point with two fields x and y of type
i32, on top of that you just slap pound,

00:11:56.708 --> 00:12:01.374
hash, octothorpe, whatever, little
sharp symbol, and then square brackets,

00:12:01.654 --> 00:12:03.754
derive, serialize, deserialize, debug.

00:12:03.754 --> 00:12:08.023
So, nice thing about that, you can derive
serde's traits the same way you can derive

00:12:08.023 --> 00:12:10.637
debug, which is a built in derive macro.

00:12:10.997 --> 00:12:15.047
But in merde, you don't do that because
that would be slow, so instead you

00:12:15.047 --> 00:12:17.457
just have a normal declarative macro.

00:12:17.497 --> 00:12:21.711
So first you declare your struct, just
as usual, derive debug, struct point,

00:12:22.031 --> 00:12:27.161
two fields x and y of type i32, and then
separately, you call merde, colon, colon,

00:12:27.161 --> 00:12:30.161
derive, or you can import derive into
your namespace and then in there you

00:12:30.161 --> 00:12:35.096
have this kind of weird DSL, a domain
specific language, and you'd say which

00:12:35.096 --> 00:12:36.296
traits of serde you want to implement.

00:12:36.316 --> 00:12:39.446
Maybe you only need to deserialize, you
don't need to serialize, just like with

00:12:39.446 --> 00:12:40.743
serde you can implement one or the other.

00:12:40.743 --> 00:12:43.810
So impl, what looks like a tuple,
but it's just a list of traits.

00:12:43.870 --> 00:12:48.682
So impl, open deserialize, comma,
serialize, close parentheses,

00:12:48.892 --> 00:12:52.702
and then for, struct point, and
then a list of fields, so struct

00:12:52.732 --> 00:12:54.981
point, pointy brackets, x, y.

00:12:55.151 --> 00:12:56.461
You just have to list the fields again.

00:12:56.511 --> 00:12:59.901
Because it's a declarative macro,
not a proc macro, it doesn't see

00:12:59.901 --> 00:13:02.701
the body of the struct declaration,
so you have to list the fields.

00:13:02.931 --> 00:13:06.891
I'm very happy that you don't have to
also repeat the field types, but you

00:13:06.891 --> 00:13:08.750
do have to give it the field names.

00:13:09.117 --> 00:13:11.847
<v James Munns>I'm very interested in what
happens if these fall out of sync, and

00:13:11.847 --> 00:13:14.377
I'm sure there are errors that go on.

00:13:14.566 --> 00:13:15.896
<v Amos Wenger>It's actually not that bad.

00:13:15.946 --> 00:13:19.006
A thing that's great about declarative
macros is, sure, it's not DRY.

00:13:19.066 --> 00:13:23.310
Don't repeat yourself, dry in the
acronym, but the error reporting

00:13:23.340 --> 00:13:24.550
is actually pretty solid.

00:13:25.018 --> 00:13:27.719
Rust-Analyzer is able to see through
the invocation and everything.

00:13:27.719 --> 00:13:30.989
It's not as awkward to use
as I thought it might be.

00:13:31.108 --> 00:13:33.608
And, uh, yeah, if those get
out of sync, you do get errors.

00:13:33.608 --> 00:13:36.238
If you specify a field that doesn't
exist, it's going to be like, "Well, point

00:13:36.248 --> 00:13:38.174
doesn't have a field called 'blah'."

00:13:38.453 --> 00:13:40.177
And if it's missing one, it's
going to be missing field.

00:13:40.187 --> 00:13:42.803
Like when you do a struct literal
and you're missing a field, it's just

00:13:42.803 --> 00:13:43.946
going to say, you're missing field.

00:13:43.988 --> 00:13:45.511
It's actually not that bad in practice.

00:13:45.551 --> 00:13:47.821
I just don't like that you
need to repeat yourself.

00:13:48.191 --> 00:13:50.461
So that's why I was thinking
about something more like Codegen.

00:13:50.481 --> 00:13:53.621
That's actually my last slide, but I was
thinking about code generation because

00:13:53.621 --> 00:13:57.101
now that I have those declarative macros,
I could just have like a separate schema

00:13:57.151 --> 00:14:00.692
file that would generate both the struct
definition and also the macro invocation

00:14:00.712 --> 00:14:02.452
or even just the trait implementations.

00:14:02.744 --> 00:14:04.744
<v James Munns>Yeah, I've looked
at that for postcard as well.

00:14:04.883 --> 00:14:07.922
I think that's sort of the ultimate
aim, or not ultimate aim, but like...

00:14:07.972 --> 00:14:10.432
it's the last step you have to hit,
and it's one of those things where you

00:14:10.432 --> 00:14:14.462
end up with something like protoc from
protobufs, where you just have a schema

00:14:14.462 --> 00:14:16.952
file and you just do Codegen from them.

00:14:17.332 --> 00:14:22.011
And either for flexibility reasons or
for one of the things for postcard is

00:14:22.011 --> 00:14:23.541
to be able to support other languages...

00:14:23.541 --> 00:14:27.981
'cause serde is very nice, but it's
very Rust, which means if you want

00:14:27.981 --> 00:14:32.310
to generate decoding and encoding
libraries for another language, the

00:14:32.320 --> 00:14:35.340
proc macro is probably not going
to help you very specifically.

00:14:35.550 --> 00:14:35.790
<v Amos Wenger>Yeah.

00:14:35.800 --> 00:14:40.470
There's a crate called `schemars`
which supports the same annotation that

00:14:40.470 --> 00:14:45.258
serde does, and you can generate JSON
schema definition files so that you get

00:14:45.278 --> 00:14:49.568
autocomplete in editors and whatnot,
and that's great, but it's all a hack.

00:14:49.588 --> 00:14:52.578
It's all like serde was the first
big thing that took off, it happened

00:14:52.588 --> 00:14:55.468
after rustc_serialize, which
now you can see some traces of.

00:14:55.468 --> 00:14:56.388
They're like, "Don't use that.

00:14:56.398 --> 00:14:57.078
That was early on.

00:14:57.078 --> 00:14:58.109
We deprecated it.

00:14:58.119 --> 00:14:59.649
We removed it from the standard library.

00:14:59.649 --> 00:15:00.219
Don't- don't look at it."

00:15:00.564 --> 00:15:04.029
But yeah, serde is the standard,
everyone's adopted it. I

00:15:04.029 --> 00:15:06.106
think you don't see a lot of
experimentation outside of that.

00:15:06.166 --> 00:15:09.760
There's like only the zero copy frameworks
because they really don't have a choice.

00:15:09.852 --> 00:15:11.942
The zero copy serialization
and deserialization cannot

00:15:11.942 --> 00:15:13.292
just use the serde traits.

00:15:13.292 --> 00:15:15.832
But apart from that, everyone
else is just like kind of stuck

00:15:15.832 --> 00:15:17.642
with the serde interface, which
is a blessing and a curse.

00:15:17.733 --> 00:15:19.003
So back to merde.

00:15:19.003 --> 00:15:20.807
This- again, should be looking at slides.

00:15:20.807 --> 00:15:21.077
I'm sorry.

00:15:21.077 --> 00:15:22.277
This is going to be a slides-heavy
one: go to sdr-podcast.com/episodes

00:15:25.111 --> 00:15:26.121
to look at the slides.

00:15:26.661 --> 00:15:30.961
You can derive, deserialize and serialize
for structs that are fully owned.

00:15:30.961 --> 00:15:33.211
So this struct doesn't have a
lifetime parameter, but even if

00:15:33.231 --> 00:15:35.981
you do, that's kind of the thing,
copy-on-write all the things.

00:15:36.541 --> 00:15:40.780
If you have some `CowStr` fields, if
you have regular `CowStr` lifetime

00:15:40.814 --> 00:15:44.223
fields, if you have whatever
things might borrow from the input.

00:15:44.223 --> 00:15:47.963
There's only one lifetime
allowed as opposed to serde,

00:15:47.963 --> 00:15:49.013
which has more flexibility.

00:15:49.023 --> 00:15:51.083
You can have different lifetimes
and specify which one's actually

00:15:51.083 --> 00:15:52.228
borrowing from the input here.

00:15:52.338 --> 00:15:53.778
You can only have zero or one.

00:15:53.778 --> 00:15:58.551
And if you have one, then in the
invocation of the derive macro, you

00:15:58.551 --> 00:16:00.101
just add the lifetime parameter.

00:16:00.101 --> 00:16:03.957
So it's struct name, angle
brackets, single quote s

00:16:04.117 --> 00:16:05.084
and then the list of fields.

00:16:05.192 --> 00:16:06.682
<v James Munns>Do you support generics too?

00:16:06.752 --> 00:16:08.332
If that's the next slide,
just go to the next slide.

00:16:08.332 --> 00:16:12.373
But I ran into this with Postcard-RPC, which
has a macro where I wanted to

00:16:12.373 --> 00:16:14.388
accept lifetimes for borrowed types.

00:16:14.398 --> 00:16:19.088
It's not so far from this, but trying
to support both lifetimes and generics

00:16:19.088 --> 00:16:24.708
between the angle brackets in a macro by
example is challenging because I couldn't

00:16:24.708 --> 00:16:26.506
figure out how to get them separate.

00:16:26.506 --> 00:16:29.276
Because for what I was doing specifically
I needed to have the lifetimes separate,

00:16:29.386 --> 00:16:34.046
I couldn't just have a token tree of all
of the characters, because when I used

00:16:34.046 --> 00:16:36.206
them in different positions I needed
to put the generics in one place and I

00:16:36.206 --> 00:16:37.436
needed to put the lifetimes in another.

00:16:37.436 --> 00:16:40.248
So I'm wondering if you solved that, or
if you just said, "Not my problem yet."

00:16:40.264 --> 00:16:41.094
<v Amos Wenger>I have not.

00:16:41.647 --> 00:16:43.577
No, don't have any generic types.

00:16:43.577 --> 00:16:46.582
Like I said, it's in production
on my website because I was tired

00:16:46.582 --> 00:16:47.642
of waiting for things to build.

00:16:47.642 --> 00:16:50.952
So I moved everything to merde and
a lot of the iterations are like me

00:16:50.952 --> 00:16:54.772
running into the next step of: Oh,
for this scenario, it doesn't work.

00:16:54.772 --> 00:16:55.982
So I need to change the design.

00:16:56.232 --> 00:16:59.192
But no, I haven't done
generic type parameters yet.

00:16:59.242 --> 00:17:01.392
Only lifetime parameters
and only one of them.

00:17:01.461 --> 00:17:05.568
So what do you do: you want to turn that
into name static and how do you do that?

00:17:05.578 --> 00:17:08.408
Usually I don't know what people
do, they just do to owned, they

00:17:08.408 --> 00:17:09.098
implement to owned manually?

00:17:09.098 --> 00:17:09.198
Well

00:17:10.544 --> 00:17:15.217
...
you can derive, kind of, it's not
really a derive macro, but it implements

00:17:15.227 --> 00:17:21.443
IntoStatic for you, which has a type
parameter called Output, which is

00:17:21.443 --> 00:17:22.863
constrained to have the static lifetime.

00:17:22.893 --> 00:17:25.503
And it's a very weird trait
because Output is supposed to

00:17:25.503 --> 00:17:28.024
be the same as self, but static.
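
NOTE
Editor's sketch, not from the recording. The trait described here could
look roughly like the following: an Output type constrained to 'static,
and a method that consumes the borrowed value. This is a simplified
stand-in written from the description; merde's real definition may differ,
and the `Name` struct and `make_owned_name` function are hypothetical.
```rust
use std::borrow::Cow;
// Simplified stand-in for the trait described in the episode.
trait IntoStatic {
    type Output: 'static;
    fn into_static(self) -> Self::Output;
}
struct Name<'s> {
    first: Cow<'s, str>,
}
impl<'s> IntoStatic for Name<'s> {
    // Same shape as Self, but with the lifetime replaced by 'static.
    type Output = Name<'static>;
    fn into_static(self) -> Name<'static> {
        Name { first: Cow::Owned(self.first.into_owned()) }
    }
}
fn make_owned_name() -> Name<'static> {
    let source = String::from("Amos");
    let borrowed = Name { first: Cow::Borrowed(source.as_str()) };
    // `source` is dropped when this function returns; the result outlives it.
    borrowed.into_static()
}
```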

00:17:28.072 --> 00:17:31.842
This is the first merde presentation
on this podcast, and maybe not the

00:17:31.842 --> 00:17:35.462
last, because there's a whole bunch-
there's a WithLifetime trait so that I

00:17:35.462 --> 00:17:40.842
can define a DeserializeOwned trait,
and like map from any lifetime to that

00:17:40.852 --> 00:17:42.272
type, but with the static lifetime.

00:17:42.542 --> 00:17:43.392
I didn't figure it out.

00:17:43.392 --> 00:17:45.252
Someone figured it out
for me on social media.

00:17:45.252 --> 00:17:48.212
I asked the question, "Hey, is
that even possible in Rust?"

00:17:48.232 --> 00:17:50.732
And I got 80 percent questions like,
"Why are you trying to do that?

00:17:51.202 --> 00:17:52.142
Why don't you know Rust?

00:17:52.189 --> 00:17:54.479
You clearly, you should know
that it's not possible in Rust."

00:17:54.719 --> 00:17:57.833
And then there's like, some
person from somewhere, they've

00:17:57.988 --> 00:18:00.208
been following me forever and they're
like, "I think I found something."

00:18:00.208 --> 00:18:01.798
And you're like, "Oh wow,
that's dirty, but it works."

00:18:01.833 --> 00:18:03.707
This is a teaser for a future episode.

00:18:04.372 --> 00:18:07.212
<v James Munns>Have you ever seen
Manish's three part blog post

00:18:07.212 --> 00:18:11.202
series on zero-copy, yoke, and I
forget what the third one is, but-

00:18:11.257 --> 00:18:14.307
<v Amos Wenger>I have an open issue
for yoke support, because it's fun.

00:18:14.337 --> 00:18:14.737
Yeah.

00:18:14.842 --> 00:18:16.362
<v James Munns>Yeah, I was gonna
say, it's the exact same thing

00:18:16.362 --> 00:18:20.790
where you're trying to support
copy-on-write types or really...

00:18:20.840 --> 00:18:25.288
instead of just taking the slice
from a cow input, kind of taking

00:18:25.288 --> 00:18:27.658
the cows, or doing a clone of
the cow and things like that.

00:18:27.668 --> 00:18:31.561
It's a tricky lifetime problem
because all of this is geared towards

00:18:31.561 --> 00:18:34.617
like what the input is, but if
the input's a borrow of the cow...

00:18:34.617 --> 00:18:35.204
<v Amos Wenger>Yeah, yeah...

00:18:35.289 --> 00:18:36.869
<v James Munns>Sub lifetimes
and stuff like that.

00:18:36.962 --> 00:18:39.919
<v Amos Wenger>Okay, yoke is super
interesting because it's a middle ground

00:18:39.919 --> 00:18:44.089
between: We're borrowing from an input
and so we can only exist as long as the

00:18:44.099 --> 00:18:48.879
input exists, and you can only call deeper:
you can call into a lot of sub-functions.

00:18:48.879 --> 00:18:52.109
We can never return anything
tied to that input or...

00:18:52.126 --> 00:18:52.886
it's complicated.

00:18:52.966 --> 00:18:57.346
Or you copy everything to the heap,
or to a compact type like compact string,

00:18:57.686 --> 00:19:01.156
and yoke is like: no, if you move
the source along with everything that

00:19:01.206 --> 00:19:03.356
borrows from it, then that's fine.

00:19:03.736 --> 00:19:06.216
But the Rust type system doesn't
really let you encode that.

00:19:06.216 --> 00:19:09.426
So we need to use a bunch of
unsafe code and like expose a crate

00:19:09.436 --> 00:19:11.976
that lets you do crimes, but in
a sort of controlled environment.

00:19:11.986 --> 00:19:12.846
It's like rubicon.

00:19:12.846 --> 00:19:13.711
It's, it's really-

00:19:13.711 --> 00:19:14.610
<v James Munns>Bounded crimes.

00:19:14.610 --> 00:19:15.050
Yeah.

00:19:15.074 --> 00:19:18.704
<v Amos Wenger>I want to bring yoke
support into merde but it's not done yet.

00:19:18.763 --> 00:19:23.605
The hazard here, I guess, is imagine
deserializing a two gigabyte document, and

00:19:23.605 --> 00:19:25.135
you're borrowing a tiny string from it.

00:19:25.165 --> 00:19:27.825
Yeah, you're lugging around
the entire source document.

00:19:28.023 --> 00:19:29.185
JavaScript has the same problem.

00:19:29.395 --> 00:19:35.938
In browsers, strings are transparently
borrows or copies of things, and sometimes

00:19:35.938 --> 00:19:41.872
they can retain, like, act as garbage
collector roots for very large datasets.

00:19:41.992 --> 00:19:45.887
And that's what high memory usage
is caused by sometimes.

00:19:45.997 --> 00:19:46.167
<v James Munns>Yeah.

00:19:46.167 --> 00:19:50.427
The Bytes crate in tokio as well has this
problem where it tries to do that kind

00:19:50.427 --> 00:19:52.197
of like copy-on-write sort of behavior.

00:19:52.197 --> 00:19:55.087
But if you have a whole one megabyte
buffer and you're borrowing three

00:19:55.087 --> 00:19:58.941
characters from it: surprise, you get
to keep the whole buffer live for as

00:19:58.941 --> 00:20:00.871
long as that little borrow is alive for.
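
NOTE
Editor's sketch, not from the recording. The retention hazard can be
reproduced with the standard library alone: a three-byte view that shares
ownership of a one-megabyte buffer keeps the whole allocation alive. The
`Slice` type and `tiny_view_of_big_buffer` are hypothetical and only imitate
the idea; they are not how the bytes crate is implemented.
```rust
use std::sync::Arc;
// Hypothetical type: a small view that shares ownership of a large buffer.
struct Slice {
    buf: Arc<Vec<u8>>,
    start: usize,
    len: usize,
}
impl Slice {
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.start..self.start + self.len]
    }
}
fn tiny_view_of_big_buffer() -> (Slice, usize) {
    let big = Arc::new(vec![b'x'; 1_000_000]);
    let view = Slice { buf: Arc::clone(&big), start: 0, len: 3 };
    drop(big); // the original handle is gone...
    // ...but the whole allocation is still alive, because `view` holds a reference.
    let retained = view.buf.len();
    (view, retained)
}
```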

00:20:00.981 --> 00:20:04.371
<v Amos Wenger>And that's really hard
to find out because when you're

00:20:04.391 --> 00:20:06.761
designing, when you're writing the
code, you don't know what the input's

00:20:06.761 --> 00:20:08.151
going to look like necessarily.

00:20:08.311 --> 00:20:10.951
So it could be that the design
was sound, but then later on the

00:20:10.961 --> 00:20:13.961
shape of the input changed and now
suddenly it's using a lot of memory.

00:20:14.311 --> 00:20:17.941
So it's a good reminder that our
instincts are usually wrong, and

00:20:17.941 --> 00:20:20.311
it's better to just go and measure
things with the right tooling.

00:20:20.311 --> 00:20:23.481
Just like serde, merde
has a Deserialize trait.

00:20:23.641 --> 00:20:26.071
It looks a little bit funny,
but James, don't say anything.

00:20:26.127 --> 00:20:30.190
It takes a lifetime parameter
called s for source.

00:20:30.480 --> 00:20:33.990
It is sized for unclear
reasons, I forget why.

00:20:34.317 --> 00:20:40.936
It takes a mutable reference to a
deserializer because what the deserializer

00:20:41.001 --> 00:20:44.651
does is simply yield a bunch of events.

00:20:44.760 --> 00:20:46.610
So that's the big difference from serde:

00:20:46.610 --> 00:20:52.791
serde has methods per data types in the
visitor interface or something, in the

00:20:52.791 --> 00:20:53.901
deserializer interface, I don't know.

00:20:54.011 --> 00:20:57.827
It has one method per type, which is
great because it's static dispatch.

00:20:58.056 --> 00:21:01.078
The compiler knows exactly the path
of the code it's going to take, it can

00:21:01.078 --> 00:21:02.688
inline everything, it's very, very fast.

00:21:02.978 --> 00:21:06.988
I just like enums for some reason,
so we have a big event enum

00:21:07.028 --> 00:21:08.428
that has one lifetime parameter.

00:21:08.448 --> 00:21:14.215
It kind of mirrors the value enum,
but it also has map start and map

00:21:14.265 --> 00:21:16.123
end, array start and array end.

00:21:16.143 --> 00:21:20.237
Some formats are self descriptive,
and so you get a hint as to the

00:21:20.237 --> 00:21:23.287
size, the number of elements in the
map, or the number of elements in

00:21:23.287 --> 00:21:25.004
the array, and that's your cue.

00:21:25.134 --> 00:21:27.398
Some are also self descriptive?

00:21:27.398 --> 00:21:30.488
I always think JSON is technically
self descriptive, but you don't know

00:21:30.488 --> 00:21:31.468
how long an array is going to be.

00:21:31.478 --> 00:21:34.102
You just know when it starts and
when it ends, and you have the

00:21:34.102 --> 00:21:36.722
difference between i64, u64, f64.

00:21:37.232 --> 00:21:40.822
I haven't shown an implementation
of deserialize, but basically,

00:21:40.822 --> 00:21:43.062
yeah, you ask for the next
event and you look at what it is.

00:21:43.062 --> 00:21:46.205
And if it's not what you expected, then
you return an error. Another thing, and

00:21:46.205 --> 00:21:49.435
it's not really covered in the slides is:
what do you do if you get the next event?

00:21:49.805 --> 00:21:52.205
And it turns out you shouldn't have?

00:21:52.205 --> 00:21:53.385
There's no such thing as peek.

00:21:53.385 --> 00:21:54.435
There is only next.

00:21:54.655 --> 00:21:59.095
So you can't put back an event in there,
but then what happens instead is that

00:21:59.095 --> 00:22:02.415
there's a "deserialize starting with",
and you can pass the first event.

00:22:02.605 --> 00:22:05.055
So you can kind of inject an
event back into the stream, which

00:22:05.055 --> 00:22:06.702
is needed in some scenarios.

00:22:06.702 --> 00:22:07.409
It's kind of dirty.

00:22:07.409 --> 00:22:09.245
I don't really like it, but it works.
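
NOTE
Editor's sketch, not from the recording. The pull-based design described
above (ask for the next event, check it, bail out if it is not what you
expected) can be illustrated like this. Everything here is a simplified,
synchronous stand-in: the `Event` variants, the `Deserializer` trait shape,
the `Replay` type and the String error are assumptions, not merde's API.
```rust
use std::borrow::Cow;
// One enum for everything the input can yield, borrowing from the source.
#[derive(Debug, PartialEq)]
enum Event<'s> {
    I64(i64),
    Str(Cow<'s, str>),
    ArrayStart,
    ArrayEnd,
}
// There is only `next`, no `peek`.
trait Deserializer<'s> {
    fn next(&mut self) -> Result<Event<'s>, String>;
}
// A toy deserializer that replays a prepared list of events.
struct Replay<'s> {
    events: std::vec::IntoIter<Event<'s>>,
}
impl<'s> Deserializer<'s> for Replay<'s> {
    fn next(&mut self) -> Result<Event<'s>, String> {
        self.events.next().ok_or_else(|| "unexpected end of input".to_string())
    }
}
fn replay<'s>(events: Vec<Event<'s>>) -> Replay<'s> {
    Replay { events: events.into_iter() }
}
// Pull events one by one and return an error on anything unexpected.
fn deserialize_i64_array<'s, D: Deserializer<'s>>(de: &mut D) -> Result<Vec<i64>, String> {
    match de.next()? {
        Event::ArrayStart => {}
        other => return Err(format!("expected array start, got {other:?}")),
    }
    let mut out = Vec::new();
    loop {
        match de.next()? {
            Event::I64(n) => out.push(n),
            Event::ArrayEnd => return Ok(out),
            other => return Err(format!("expected integer, got {other:?}")),
        }
    }
}
```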

00:22:09.245 --> 00:22:10.475
<v James Munns>If you're taking
a stream of events, how do you

00:22:10.475 --> 00:22:13.155
handle out of order struct fields?

00:22:13.155 --> 00:22:17.235
So if Struct says that it's A, B, and
C, but because JavaScript or JSON or

00:22:17.235 --> 00:22:21.465
whatever has reordered the fields,
they might send CAB on the wire.

00:22:21.555 --> 00:22:23.660
Does that still work within the map start?

00:22:23.660 --> 00:22:25.875
Is it just like collecting
all of the items and you go,

00:22:25.875 --> 00:22:27.075
"Ah, that's still in the list.

00:22:27.075 --> 00:22:27.555
It's fine."

00:22:27.800 --> 00:22:31.240
<v Amos Wenger>Basically what it does
is it just asks for events and then

00:22:31.240 --> 00:22:36.860
you look at the key names and it has
in scope a bunch of bindings that

00:22:36.860 --> 00:22:38.620
are Option of the type of the field.

00:22:38.780 --> 00:22:41.720
And that's an interesting part that I
didn't really show, but there's a way

00:22:41.720 --> 00:22:44.621
to- without knowing the type of a field.

00:22:44.631 --> 00:22:48.671
If you just have the struct name and
the field name, you can declare a local

00:22:48.711 --> 00:22:51.101
of type Option of the type of the field.

00:22:51.266 --> 00:22:53.417
And that's what the
declarative macro relies on.

00:22:53.557 --> 00:22:55.487
So you just have a bunch of
fields and you just assign them.

00:22:55.687 --> 00:22:58.392
And then at the end, if they're
none, then you're like, "Oh,

00:22:58.392 --> 00:22:59.412
I guess we missed that field."

00:22:59.578 --> 00:23:01.248
And you also know if there's
duplicate fields, you can

00:23:01.248 --> 00:23:02.208
decide what to do about that.

00:23:02.208 --> 00:23:03.308
There's a bunch of things you can do.
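
NOTE
Editor's sketch, not from the recording. The "one optional local per
field" approach handles any key order, and also shows where missing and
duplicate fields are detected. This is a hand-written illustration with
explicit types; the `Point` struct and `point_from_pairs` are hypothetical,
and it does not show the trick for naming a field's type from a macro.
```rust
#[derive(Debug, PartialEq)]
struct Point {
    a: i64,
    b: i64,
    c: i64,
}
fn point_from_pairs(pairs: &[(&str, i64)]) -> Result<Point, String> {
    // One optional local per field; keys may arrive in any order.
    let mut a: Option<i64> = None;
    let mut b: Option<i64> = None;
    let mut c: Option<i64> = None;
    for &(key, value) in pairs {
        let slot = match key {
            "a" => &mut a,
            "b" => &mut b,
            "c" => &mut c,
            other => return Err(format!("unknown field {other}")),
        };
        // Policy choice for this sketch: duplicates are rejected.
        if slot.replace(value).is_some() {
            return Err(format!("duplicate field {key}"));
        }
    }
    // Anything still None at the end was missing from the input.
    Ok(Point {
        a: a.ok_or("missing field a")?,
        b: b.ok_or("missing field b")?,
        c: c.ok_or("missing field c")?,
    })
}
```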

00:23:03.585 --> 00:23:07.265
This brings us to our next slide,
which is how do you specify without

00:23:07.285 --> 00:23:10.645
the flexibility of field-level
annotations like you have in serde,

00:23:10.755 --> 00:23:12.375
how do you specify if you should...

00:23:12.655 --> 00:23:16.025
well, deny unknown fields, but that's a
container-level annotation, or whether

00:23:16.025 --> 00:23:19.948
you should allow- like, just fall back
to a default value for some field if

00:23:19.948 --> 00:23:21.608
it's absent or something like that.

00:23:21.923 --> 00:23:24.773
<v James Munns>You've made up a crate on the
internet, just so it can have opinions.

00:23:25.613 --> 00:23:26.853
<v Amos Wenger>I truly did.

00:23:27.551 --> 00:23:33.197
There's a DeserOpinions trait that
has a default implementation in merde.

00:23:33.647 --> 00:23:36.047
And I gave the trait definition here.

00:23:36.077 --> 00:23:40.227
It has a deny unknown fields
function, which returns a Boolean,

00:23:40.397 --> 00:23:41.567
pretty straightforward one.

00:23:41.617 --> 00:23:43.007
There's a map key name function.

00:23:43.037 --> 00:23:46.517
It gives you the name of the key
for a map, and then you can either

00:23:46.517 --> 00:23:49.827
return it itself because it takes
a `CowStr` and returns a `CowStr`,

00:23:49.847 --> 00:23:51.057
so you don't need to copy anything.

00:23:51.175 --> 00:23:52.575
Or you can map it to something else.

00:23:52.602 --> 00:23:53.862
And I have a little example here.

00:23:54.062 --> 00:23:57.658
On my website, I tend to deploy before
an article is ready, so it's in draft

00:23:57.658 --> 00:24:00.758
mode, and sometimes I want to get others
to proofread it, and so it has a draft

00:24:00.758 --> 00:24:04.088
code in the front matter, the YAML front
matter at the beginning of the markdown.

00:24:04.608 --> 00:24:05.928
And it used to be called...

00:24:06.168 --> 00:24:09.878
well, actually, the Rust field is called
draft underscore code in snake case.

00:24:09.952 --> 00:24:12.757
But in Markdown, I want to write it
in kebab case, so it's draft-code.

00:24:14.395 --> 00:24:18.415
The third method currently in
DeserOpinions is "default field value".

00:24:18.759 --> 00:24:23.541
That one made people angry on the
internet, because it takes a key, this

00:24:23.562 --> 00:24:25.122
time it's borrowed, you can't mess with it.

00:24:25.242 --> 00:24:27.831
It happens after "map key name",
so first you get to translate the

00:24:27.831 --> 00:24:30.666
key name to something else, and
this is less efficient than serde.

00:24:30.841 --> 00:24:34.041
It's also more flexible in a way?

00:24:34.475 --> 00:24:37.485
You could look up that key name
somewhere if you wanted to.

00:24:37.485 --> 00:24:41.655
You could have a generic change
all snake case things to kebab case

00:24:41.665 --> 00:24:43.745
things or to camel case things.

00:24:44.092 --> 00:24:45.275
So it's a compromise.

00:24:45.275 --> 00:24:48.235
Something costs at runtime, but
it's less code also in the binary.

00:24:48.465 --> 00:24:51.575
Honestly, in the context of web
application servers, it's totally fine

00:24:51.585 --> 00:24:53.195
unless you're Amazon, but I'm not.

00:24:53.255 --> 00:24:53.875
So I'm fine.
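
NOTE
Editor's sketch, not from the recording. Two of the three methods described
above, written as a simplified stand-in trait. The signatures are assumptions
based on the description, std's Cow is used in place of CowStr, the third
method (default field value) is left out, and `KebabKeys` is a hypothetical
opinions type. The key rewriting happens at run time, as discussed, and the
borrowed key is returned untouched when nothing needs to change.
```rust
use std::borrow::Cow;
// Simplified stand-in for the trait described in the episode.
trait DeserOpinions {
    fn deny_unknown_fields(&self) -> bool {
        false
    }
    fn map_key_name<'s>(&self, key: Cow<'s, str>) -> Cow<'s, str> {
        key
    }
}
// Hypothetical opinions type: accept kebab-case keys for snake_case fields.
struct KebabKeys;
impl DeserOpinions for KebabKeys {
    fn deny_unknown_fields(&self) -> bool {
        true
    }
    fn map_key_name<'s>(&self, key: Cow<'s, str>) -> Cow<'s, str> {
        if key.contains('-') {
            Cow::Owned(key.replace('-', "_"))
        } else {
            key // nothing to rewrite, so no copy is made
        }
    }
}
```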

00:24:54.209 --> 00:24:57.159
And then the default field value takes
the key as a borrowed string, and then

00:24:57.159 --> 00:24:59.977
it takes a slot, and that's awkward?

00:25:00.127 --> 00:25:05.151
Because, again, we're not in the serde
universe where we can generate things

00:25:05.311 --> 00:25:07.411
precisely based on the type of fields.

00:25:07.451 --> 00:25:12.951
Here, that single function has to
work for every field of any type.

00:25:13.351 --> 00:25:18.508
So the field slots type cannot be
generic over the type of the field.

00:25:18.797 --> 00:25:23.577
Which brings us to a problem:
what happens if you pass like

00:25:23.587 --> 00:25:27.517
the address of an Option of i64 and
someone tries to put a string there?

00:25:27.676 --> 00:25:28.456
<v James Munns>What does happen?

00:25:28.773 --> 00:25:29.453
<v Amos Wenger>What does happen?

00:25:29.453 --> 00:25:34.364
Well, what happens is: I have a
vendored version of mini type ID, and

00:25:34.367 --> 00:25:36.041
it just says, it's not the right type.

00:25:36.246 --> 00:25:37.306
But it's not very safe.

00:25:38.092 --> 00:25:38.552
<v James Munns>Okay.

00:25:38.892 --> 00:25:39.362
Okay.

00:25:40.307 --> 00:25:42.854
<v Amos Wenger>But basically
in Rust, there's a type ID

00:25:42.854 --> 00:25:44.575
function in the standard library.

00:25:44.695 --> 00:25:46.315
You can pass it any type.

00:25:46.439 --> 00:25:50.345
Well, as a type parameter,
there's also type ID of.

00:25:50.395 --> 00:25:53.295
I forget the exact name of the function,
but yeah, there's a type ID type.

00:25:53.295 --> 00:25:58.301
You can get some value that is unique
to that type, and you can compare

00:25:58.301 --> 00:26:01.888
those so you can tell if two things
are the same type or not, but that

00:26:01.888 --> 00:26:03.548
doesn't work if you have lifetimes.

00:26:03.853 --> 00:26:07.573
Which we do because we borrow things so
that's why you need a separate thing.
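
NOTE
Editor's sketch, not from the recording. A type-erased slot with a run-time
type check can be built on std's Any and TypeId, as below. This version only
works for 'static types, which is exactly the limitation mentioned above;
it does not reproduce the lifetime-aware type ID that merde vendors. The
`FieldSlot` shape, `fill` and `same_type` are hypothetical.
```rust
use std::any::{Any, TypeId};
// Hypothetical type-erased slot pointing at some Option of an unknown type.
struct FieldSlot<'a> {
    target: &'a mut (dyn Any + 'static),
}
impl<'a> FieldSlot<'a> {
    fn new<T: 'static>(target: &'a mut Option<T>) -> Self {
        FieldSlot { target }
    }
    // Store `value` only if the slot really is an Option of the same type.
    fn fill<T: 'static>(self, value: T) -> Result<(), String> {
        match self.target.downcast_mut::<Option<T>>() {
            Some(slot) => {
                *slot = Some(value);
                Ok(())
            }
            None => Err("value type does not match the slot".to_string()),
        }
    }
}
// Comparing type IDs directly, as described in the episode.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```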

00:26:08.113 --> 00:26:11.273
And that's why it doesn't use the standard
library one; it's the thing I stole from

00:26:11.283 --> 00:26:14.283
somewhere, but it's in the comments where
I stole it from so I think it's okay

00:26:14.443 --> 00:26:16.033
license-wise, I don't know, don't sue me.

00:26:16.161 --> 00:26:17.571
Just ChatGPTed that shit, bro.

00:26:18.066 --> 00:26:18.506
<v James Munns>Uh oh.

00:26:19.726 --> 00:26:20.416
Hate that...

00:26:22.468 --> 00:26:25.798
<v Amos Wenger>Oh, and then how do you
use opinions in the derive, like DSL

00:26:25.818 --> 00:26:30.488
for merde, after impl Deserialize
for struct blah, with a list of

00:26:30.488 --> 00:26:34.358
fields, you can do via, and then
the name or the opinion- the type

00:26:34.378 --> 00:26:36.218
that implements DeserOpinions.

00:26:36.254 --> 00:26:40.704
So I thought that was interesting:
all of that could be done via a more

00:26:40.749 --> 00:26:45.279
complicated DSL inside the declarative
macro, but the declarative macros

00:26:45.279 --> 00:26:47.099
in merde are already pretty bad.

00:26:47.099 --> 00:26:50.294
They're already at the
limit of what I can read.

00:26:50.644 --> 00:26:53.454
I got LLMs to generate bits of it,
and I had to step back like, "Wait,

00:26:53.494 --> 00:26:54.784
wait, I'm not sure what happens here."

00:26:54.784 --> 00:26:57.404
There's several levels of expansion.

00:26:57.694 --> 00:26:59.869
There's a lot of
different syntax variants.

00:27:00.007 --> 00:27:02.741
I figured out how to simplify them
in the last release, but it's really

00:27:02.747 --> 00:27:06.127
kind of pushing the boundaries of what
you should be doing with declarative

00:27:06.137 --> 00:27:09.087
macros, which is why I was excited to
see you also doing the declarative macro

00:27:09.087 --> 00:27:11.377
crimes, James, with the whole um...

00:27:11.427 --> 00:27:16.547
generating schemas at compile time and
like mixing const and declarative macros.

00:27:16.547 --> 00:27:16.847
That was fun.

00:27:16.870 --> 00:27:18.438
<v James Munns>I don't think I
picked up on this originally,

00:27:18.438 --> 00:27:19.228
but I think you mentioned it.

00:27:19.238 --> 00:27:21.858
So you have your own Deserialize
and Serialize traits.

00:27:21.868 --> 00:27:25.489
These aren't using the serde
Deserialize, Serialize traits.

00:27:25.986 --> 00:27:27.194
You have your own version of it.

00:27:27.474 --> 00:27:27.844
<v Amos Wenger>That's correct.

00:27:27.844 --> 00:27:30.784
Merde has its own Serialize and
Deserialize traits for a very good reason.

00:27:30.784 --> 00:27:31.364
We'll get to that.

00:27:31.364 --> 00:27:35.064
So let's actually look
at the serialize trait.

00:27:35.484 --> 00:27:37.104
The serialize trait is kind of boring.

00:27:37.344 --> 00:27:42.743
It takes a reference to self and
it takes a mutable reference to a

00:27:42.913 --> 00:27:44.655
serializer and it returns a result.

00:27:44.873 --> 00:27:47.910
The result is the happy path is
the empty tuple and the error path

00:27:47.930 --> 00:27:49.690
is the error from the serializer.

00:27:49.846 --> 00:27:50.576
Pretty boring.

00:27:50.746 --> 00:27:53.026
Again, there's only one
thing that's kind of weird.

00:27:53.316 --> 00:27:54.236
Uh, it's async.

00:27:54.309 --> 00:27:58.329
It's an async fn in trait, which is
something we've gotten since Rust 1.75.

00:27:58.349 --> 00:28:01.509
I know because I'm working on a draft
of an article that I started months

00:28:01.509 --> 00:28:03.949
ago and I'm now updating for release.

00:28:04.052 --> 00:28:04.272
<v James Munns>Yeah.

00:28:04.272 --> 00:28:04.742
Hell yeah.

00:28:04.844 --> 00:28:06.704
<v Amos Wenger>This is the serializer trait.

00:28:06.754 --> 00:28:10.974
Just like in serde, there's serialize
and serializer- one letter difference.

00:28:11.464 --> 00:28:15.380
Serializer has an error associated
type, so that each serializer can

00:28:15.380 --> 00:28:16.480
have different types of errors.

00:28:16.578 --> 00:28:19.999
And then the important function,
I guess there's a bunch of other

00:28:19.999 --> 00:28:23.145
ones that I've hidden from you,
but the important one is "write".

00:28:23.365 --> 00:28:27.059
And it takes a mutable
reference to self and an event.

00:28:27.289 --> 00:28:31.759
So events are used both for deserializing
and for serializing, which means

00:28:31.919 --> 00:28:35.859
you can pipe a deserializer straight
into a serializer and it should work.
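
NOTE
Editor's sketch, not from the recording. The two traits described above,
with async fn in traits (Rust 1.75 or later) and a single `write` method
that takes events. These are simplified stand-ins: the `Event` variants,
the generic parameter on `serialize`, the `Recorder` type and the tiny
`run_to_completion` helper are assumptions for illustration only.
```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
#[derive(Debug, PartialEq)]
enum Event {
    ArrayStart,
    I64(i64),
    ArrayEnd,
}
trait Serializer {
    type Error;
    async fn write(&mut self, ev: Event) -> Result<(), Self::Error>;
}
trait Serialize {
    async fn serialize<S: Serializer>(&self, s: &mut S) -> Result<(), S::Error>;
}
impl Serialize for Vec<i64> {
    async fn serialize<S: Serializer>(&self, s: &mut S) -> Result<(), S::Error> {
        s.write(Event::ArrayStart).await?;
        for n in self {
            s.write(Event::I64(*n)).await?;
        }
        s.write(Event::ArrayEnd).await
    }
}
// Toy serializer that just records the events it receives.
struct Recorder(Vec<Event>);
impl Serializer for Recorder {
    type Error = std::convert::Infallible;
    async fn write(&mut self, ev: Event) -> Result<(), Self::Error> {
        self.0.push(ev);
        Ok(())
    }
}
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
// Poll once with a do-nothing waker; enough here because nothing ever waits.
fn run_to_completion<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}
fn record(values: &Vec<i64>) -> Vec<Event> {
    let mut rec = Recorder(Vec::new());
    run_to_completion(values.serialize(&mut rec)).expect("never pending").expect("infallible");
    rec.0
}
```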

00:28:36.242 --> 00:28:39.722
Now we're getting into the interesting
part, which is why is that function async?

00:28:39.876 --> 00:28:43.390
And the answer is, well, sometimes...

00:28:43.620 --> 00:28:44.180
<v James Munns>Oh no...

00:28:44.180 --> 00:28:45.820
that's not the reason I was expecting.

00:28:45.830 --> 00:28:47.390
That was not what I was hoping for!

00:28:48.408 --> 00:28:51.631
<v Amos Wenger>Okay, so sometimes
you serialize deeply nested

00:28:51.721 --> 00:28:53.761
data structures like this one.

00:28:53.941 --> 00:29:00.083
This is example code that generates
100,000 nested JSON arrays.

00:29:00.273 --> 00:29:02.468
So it's an array that contains an array
that contains an array that contains

00:29:02.468 --> 00:29:05.933
an array that contains blah blah
blah- 100,000 layers, an empty array.

00:29:06.683 --> 00:29:09.423
And if you deserialize that
to a serde JSON value, you're

00:29:09.423 --> 00:29:10.293
going to blow the stack.

00:29:10.823 --> 00:29:13.163
Because it's a function that calls a
function that calls a function, it's

00:29:13.173 --> 00:29:18.693
recursive, so it's piling up those stack
frames, and the stack is a space reserved

00:29:18.713 --> 00:29:23.330
for return addresses, arguments, some
locals, and eventually you're going to

00:29:23.340 --> 00:29:28.210
run out of space because every thread
has some fixed amount dedicated to the

00:29:28.220 --> 00:29:32.723
stack, which can range from, I don't
know, on desktops maybe one megabyte

00:29:32.773 --> 00:29:34.243
to eight megabytes or something?

00:29:34.703 --> 00:29:35.503
<v James Munns>There's
a couple of megabytes.

00:29:35.705 --> 00:29:37.335
<v Amos Wenger>If you have a lot of
threads and you know you're not going

00:29:37.335 --> 00:29:40.365
to go over, you can make threads with
less, you can make threads with more.

00:29:40.365 --> 00:29:43.065
Okay, so why is this a big problem
and not just a small problem?

00:29:43.065 --> 00:29:47.630
It's a big problem because if you're
accepting user input, people can send

00:29:47.630 --> 00:29:51.580
you that weird payload, which is just
a hundred thousand nested arrays.

00:29:51.870 --> 00:29:54.060
And if they can do that and
crash your application, then

00:29:54.060 --> 00:29:54.830
that's a problem for you.

00:29:55.000 --> 00:29:56.400
So you have to do either of two things.

00:29:56.400 --> 00:29:59.460
You have to stop them and decide:
okay, I'm not deserializing that.

00:29:59.700 --> 00:30:01.070
Clearly this is a bomb.

00:30:01.070 --> 00:30:01.900
This is like a zip bomb.

00:30:01.900 --> 00:30:02.660
This is malicious.

00:30:02.910 --> 00:30:04.090
I don't want to run out of memory.

00:30:04.090 --> 00:30:05.330
So I'm just protecting that.

00:30:05.530 --> 00:30:08.752
Or you deserialize it properly
without actually blowing the stack.
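
NOTE
Editor's sketch, not from the recording. The first option (refuse input
that nests too deeply) can be illustrated with a payload generator and an
iterative depth check. Both function names are hypothetical, and the check
is deliberately naive: it does not skip over string contents, so it is not
a real JSON scanner.
```rust
// Build `depth` nested empty JSON arrays, e.g. depth 3 gives "[[[]]]".
fn nested_arrays(depth: usize) -> String {
    let mut s = "[".repeat(depth);
    s.push_str(&"]".repeat(depth));
    s
}
// Track nesting with a counter instead of recursion, and refuse anything
// deeper than `limit`.
fn max_depth_within(input: &str, limit: usize) -> bool {
    let mut depth = 0usize;
    for b in input.bytes() {
        match b {
            b'[' | b'{' => {
                depth += 1;
                if depth > limit {
                    return false;
                }
            }
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    true
}
```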

00:30:08.885 --> 00:30:11.944
<v James Munns>So you also mentioned that
you support deserializing YAML, I believe?

00:30:12.054 --> 00:30:13.144
Or just serializing.

00:30:13.555 --> 00:30:14.645
<v Amos Wenger>Just deserializing.

00:30:14.912 --> 00:30:18.120
<v James Munns>Because YAML has the other
fun thing where it has referential

00:30:18.175 --> 00:30:20.976
support, so you can reference other
things, which I think is also a

00:30:20.976 --> 00:30:22.810
pretty big denial of service vector.

00:30:22.839 --> 00:30:24.003
<v Amos Wenger>I have not tried that.

00:30:24.515 --> 00:30:27.525
I'm using like an event based
YAML parser and then just kind

00:30:27.525 --> 00:30:29.485
of translating it to merde types.

00:30:29.535 --> 00:30:30.375
I have not tried that.

00:30:30.375 --> 00:30:31.025
That's a good point.

00:30:31.250 --> 00:30:33.510
<v James Munns>Because those are the two
attacks, yeah, you get the super nested

00:30:33.520 --> 00:30:37.060
stack overflow and then you get like the
zip bomb where you reference something

00:30:37.060 --> 00:30:39.970
that references 10 other things which
references 10 other things which refere-

00:30:39.970 --> 00:30:42.886
and in some parsers just explodes.

00:30:42.916 --> 00:30:44.701
<v Amos Wenger>So, merde doesn't
actually solve that because it

00:30:44.701 --> 00:30:46.121
solves one half of the problem...

00:30:46.441 --> 00:30:49.229
I used to call it infinite stack,
but that's not really true.

00:30:49.259 --> 00:30:50.854
So I'm calling it metastack now.

00:30:51.184 --> 00:30:52.394
The prior art for this is stacker.

00:30:52.544 --> 00:30:54.227
So stacker is a crate.

00:30:54.228 --> 00:30:58.245
You can inject calls to it at several
points in your program where you think

00:30:58.255 --> 00:31:01.360
you might run out of stack and what it's
going to do is check how much stack you're

00:31:01.360 --> 00:31:05.390
using and if you're using too much it
just grows the stack and by 'just'- this

00:31:05.390 --> 00:31:08.960
is a load bearing 'just'- I should really
have looked into that before today's

00:31:08.960 --> 00:31:11.901
presentation but what I assume it does
is it allocates an entirely different

00:31:11.901 --> 00:31:16.501
stack, copies everything to it and then
changes the stack pointer essentially.

00:31:16.641 --> 00:31:18.909
But now that I'm saying it, it
cannot work that way because there's

00:31:18.959 --> 00:31:20.289
things pointing to the stack.

00:31:20.560 --> 00:31:21.660
So, I'm not sure.

00:31:21.810 --> 00:31:23.130
Maybe it just chains stack?

00:31:23.130 --> 00:31:26.272
Like it allocates another chunk
of the stack and moves there

00:31:26.272 --> 00:31:27.612
and changes the stack pointer...

00:31:27.632 --> 00:31:30.822
but then when you return, it
restores the old stack pointer.

00:31:30.822 --> 00:31:32.442
That must be how it works, right?

00:31:32.442 --> 00:31:33.172
There's no other way.

00:31:33.422 --> 00:31:35.372
<v James Munns>This is one of those
cursed things where I'm used

00:31:35.382 --> 00:31:38.890
to the embedded level of cursed
where you can't extend the stack-

00:31:38.896 --> 00:31:39.221
<v Amos Wenger>Yeah.

00:31:39.253 --> 00:31:40.588
<v James Munns>The stack
is statically known.

00:31:40.598 --> 00:31:41.978
And when you run out, you're out.

00:31:42.076 --> 00:31:43.631
<v Amos Wenger>You barely
even have a heap allocator.

00:31:43.631 --> 00:31:43.851
Yeah.

00:31:43.851 --> 00:31:44.141
Yeah.

00:31:44.189 --> 00:31:44.939
<v James Munns>Yeah, yeah.

00:31:45.129 --> 00:31:48.874
So it's one of those things where I know
how it works on like a very cursed bare

00:31:48.874 --> 00:31:53.654
metal kernel or operating system level,
but no idea how user space handles this

00:31:53.654 --> 00:31:56.694
kind of thing or chaining stacks or
resizing the stack and stuff like that.

00:31:56.694 --> 00:31:56.834
<v Amos Wenger>I don't know.

00:31:56.835 --> 00:32:00.054
Stacker worries me because now that I
think about it- yeah, it cannot move

00:32:00.064 --> 00:32:01.544
things that are already on the stack.

00:32:01.984 --> 00:32:04.174
I'm pretty sure, because that would
invalidate a bunch of pointers.

00:32:04.274 --> 00:32:05.724
In C# you can do things like that.

00:32:05.764 --> 00:32:08.284
It has a moving, compacting
garbage collector.

00:32:08.553 --> 00:32:11.375
It's aware of all the
things that point anywhere.

00:32:11.545 --> 00:32:14.585
And so it's able to update
pointers, or actually references,

00:32:14.815 --> 00:32:16.115
when it moves objects in memory.

00:32:16.115 --> 00:32:21.005
But there's no such thing in the
C, C++, Rust cinematic universe.

00:32:21.017 --> 00:32:22.587
If something is there,
it's not going to move.

00:32:22.587 --> 00:32:25.958
That's the whole reason we
have pin and friends, another

00:32:26.088 --> 00:32:27.228
complicated topic in Rust.

00:32:27.668 --> 00:32:29.528
So I'm pretty sure this is how it works.

00:32:29.528 --> 00:32:32.178
It's like: okay, this stack is
full, we'll get another and then

00:32:32.178 --> 00:32:33.418
just stack things onto there.

00:32:33.508 --> 00:32:36.598
And this is kind of what I do in
merde except instead of changing

00:32:36.598 --> 00:32:40.860
the stack pointer and things which
is scary, it's terrifying because

00:32:40.860 --> 00:32:43.000
what if some other language unwinds?

00:32:43.272 --> 00:32:44.652
I don't know how that works.

00:32:44.712 --> 00:32:47.312
Maybe it works well because you
have return address in the stack?

00:32:48.172 --> 00:32:48.922
I don't know!

00:32:48.992 --> 00:32:50.512
I'm not sure, now I'm interested.

00:32:50.849 --> 00:32:54.039
But basically, instead of doing all
that, you can just have an async

00:32:54.049 --> 00:32:55.769
function, which is a state machine.

00:32:55.841 --> 00:32:59.434
And what you get to do with an
async function is return all

00:32:59.434 --> 00:33:00.604
the way back to the runtime.

00:33:00.614 --> 00:33:03.923
You've been stacking these calls to
async functions and you get to yield,

00:33:03.923 --> 00:33:07.453
you get to return poll pending, which
doesn't mean you're done, but it just

00:33:07.453 --> 00:33:09.573
means you're waiting for something.

00:33:09.653 --> 00:33:12.543
And in this case, you're waiting for more
stack, which is never going to happen.

00:33:12.653 --> 00:33:15.889
It's not like you go all the way back to
the runtime, I mean, I guess it could...

00:33:16.079 --> 00:33:19.369
what I did instead is that: when
you're about to run out of stack, it

00:33:19.369 --> 00:33:23.309
creates a next future, it stores that
in a global, and then it yields all

00:33:23.309 --> 00:33:26.072
the way to the runtime, and then the
runtime is like: oh, it wasn't ready.

00:33:26.273 --> 00:33:30.133
I polled the deserialize function,
the future, and it didn't return

00:33:30.133 --> 00:33:31.643
poll ready, it returned poll pending.

00:33:31.883 --> 00:33:32.953
Let me check the global.

00:33:32.993 --> 00:33:34.773
Ah, sure, okay, there's
more work to be done.

00:33:34.913 --> 00:33:36.003
Let's do that again.

00:33:36.503 --> 00:33:38.783
Let's do that next work on
the stack we already have.

00:33:38.919 --> 00:33:39.829
And so on and so forth.

00:33:39.859 --> 00:33:44.299
If that one returns poll pending, it's
pushed onto a queue and then we'll run the

00:33:44.299 --> 00:33:48.409
next future and so on and so forth until
we're done deserializing and then we pop

00:33:48.419 --> 00:33:51.070
from the queue and then we resolve back.

00:33:51.120 --> 00:33:53.700
You should see my hands go around,
which aren't going to be in the edit.

00:33:55.055 --> 00:33:57.905
<v James Munns>So, you're heap
allocating new futures cause

00:33:58.185 --> 00:34:02.096
essentially futures are just an
enum of all the state of essentially

00:34:02.096 --> 00:34:04.286
each of the await points at that.

00:34:04.526 --> 00:34:06.716
So what you're doing is you're
creating a whole new future.

00:34:07.091 --> 00:34:11.341
And then sort of like, daisy chaining all
the futures together to get it there...

00:34:11.391 --> 00:34:12.411
okay, that's interesting.

00:34:12.461 --> 00:34:15.131
<v Amos Wenger>The weird part is that
initially none of this is async.

00:34:15.141 --> 00:34:16.981
We're not actually doing
async I/O at this point.

00:34:16.991 --> 00:34:19.531
We're just trying to deserialize
something synchronously.

00:34:19.791 --> 00:34:22.141
So on the outside,
everything is synchronous.

00:34:22.161 --> 00:34:24.391
The public API for all
this is synchronous.

00:34:24.411 --> 00:34:26.041
You don't have tokio going
on, you have nothing.

00:34:26.842 --> 00:34:30.396
In there at some point in the
internals of merde, it sets up a

00:34:30.406 --> 00:34:35.236
dummy waker that doesn't actually
allow registering for being woken up.

00:34:35.416 --> 00:34:36.886
It doesn't actually have timers.

00:34:36.886 --> 00:34:38.556
It's not an actual reactor.

00:34:38.556 --> 00:34:39.466
It's entirely made up.

00:34:39.556 --> 00:34:43.646
The only purpose is so that we have
something to pass to the poll function.

00:34:43.646 --> 00:34:45.965
That's part of the future
trait in the standard library.

00:34:46.262 --> 00:34:48.572
So that we can yeah, call all that.

00:34:48.592 --> 00:34:51.452
And if it returns pending, we
know that we needed more stack.

00:34:52.002 --> 00:34:55.329
And we just run the next
future and so on and so forth.
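
NOTE
Editor's sketch of the "metastack" idea described above. This is an assumption-laden
illustration, not merde's actual code: the names NEXT, Call, call, run and depth are all
hypothetical. A nested call parks its child future in a thread local and returns Pending;
a driver with a do-nothing waker keeps the futures on a heap-allocated Vec and always
polls the innermost one, so the native stack stays shallow. Real async I/O is not handled.
```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
type Task = Pin<Box<dyn Future<Output = ()>>>;
thread_local! {
    // The "global": a child future waiting to be run by the driver.
    static NEXT: RefCell<Option<Task>> = RefCell::new(None);
}
// Dummy waker: nothing ever registers with it, it only satisfies `poll`.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}
// Future that hands its child to the driver, then waits for the result.
pub struct Call<T> {
    child: Option<Task>,
    slot: Rc<RefCell<Option<T>>>,
}
impl<T> Future for Call<T> {
    type Output = T;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<T> {
        if let Some(child) = self.child.take() {
            NEXT.with(|next| *next.borrow_mut() = Some(child));
            return Poll::Pending; // signals "I need more stack"
        }
        let done = self.slot.borrow_mut().take();
        match done {
            Some(value) => Poll::Ready(value),
            None => Poll::Pending,
        }
    }
}
// Wrap a nested computation so it runs on the driver, not on our stack.
pub fn call<T: 'static>(fut: impl Future<Output = T> + 'static) -> Call<T> {
    let slot: Rc<RefCell<Option<T>>> = Rc::new(RefCell::new(None));
    let out = slot.clone();
    let child: Task = Box::pin(async move {
        let value = fut.await;
        *out.borrow_mut() = Some(value);
    });
    Call { child: Some(child), slot }
}
// Synchronous driver: polls the top of a heap-allocated stack of futures.
pub fn run<T: 'static>(fut: impl Future<Output = T> + 'static) -> T {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut root = call(fut);
    let first = root.child.take().expect("call() always sets a child");
    let mut stack: Vec<Task> = vec![first];
    while let Some(top) = stack.last_mut() {
        match top.as_mut().poll(&mut cx) {
            Poll::Ready(()) => {
                stack.pop();
            }
            Poll::Pending => {
                let child = NEXT.with(|next| next.borrow_mut().take());
                stack.push(child.expect("sketch handles stack requests only, not real I/O"));
            }
        }
    }
    let result = root.slot.borrow_mut().take();
    result.expect("root future completed")
}
// Example user code: every level of nesting goes through `call`.
pub fn depth(n: u32) -> Call<u32> {
    call(async move { if n == 0 { 0 } else { 1 + depth(n - 1).await } })
}
```
Calling run(depth(100_000)) from ordinary synchronous code should return without deep
native recursion, since each level lives in its own boxed future on the heap.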

00:34:55.616 --> 00:34:56.006
<v James Munns>Interesting.

00:34:56.006 --> 00:34:59.506
So it's sort of like the gen await
thing, where you're really wanting

00:34:59.506 --> 00:35:05.168
generators, you're wanting like pausable
iteration, but you're wrapping the stable

00:35:05.178 --> 00:35:09.198
interface, which is async, to sort of
get something generator ish out of it.

00:35:09.638 --> 00:35:14.109
<v Amos Wenger>And that's also why the
Deserialize trait is taking a mutable

00:35:14.129 --> 00:35:17.059
reference to a deserializer and not
like taking ownership of it because

00:35:17.199 --> 00:35:19.329
you're lending it to the next future.

00:35:19.894 --> 00:35:23.714
And then when it finishes, it completes,
it's going back to the previous future.

00:35:23.874 --> 00:35:25.434
It really needs to be able to do that.

00:35:25.444 --> 00:35:27.874
Like you said, daisy chaining, I
think, is a good term for that.

00:35:28.226 --> 00:35:32.268
The result is that you can actually
deserialize something like a hundred

00:35:32.268 --> 00:35:37.518
thousand nested arrays to the merde
value type, the dynamic- we don't

00:35:37.518 --> 00:35:40.505
know what it is, so some of it's going
to have to be heap allocated because

00:35:40.505 --> 00:35:43.845
you cannot have infinitely recursive
types that would be infinitely large.

00:35:44.495 --> 00:35:47.935
So, if you look at the value enum,
the map and array, they are heap

00:35:47.945 --> 00:35:49.635
allocated, because you can't do otherwise.

00:35:49.705 --> 00:35:51.705
How do we explain that to
someone who's never thought

00:35:51.705 --> 00:35:53.235
about infinitely recursive types?

00:35:53.865 --> 00:35:56.045
Like a type that contains
itself is infinitely large.

00:35:56.109 --> 00:35:56.879
<v James Munns>How do you explain it?

00:35:57.139 --> 00:36:01.360
When you have that level of indirection,
like you can have a vec that contains vecs

00:36:01.360 --> 00:36:02.920
that contains vecs that contains vecs.

00:36:02.993 --> 00:36:06.685
You run into this where the compiler will
actually warn you if it realizes this

00:36:06.685 --> 00:36:10.195
has happened and it'll tell you: you need
some level of indirection, either through

00:36:10.195 --> 00:36:13.105
references or boxing or things like that.

00:36:13.267 --> 00:36:15.757
<v Amos Wenger>I guess imagine
an enum with like two variants.

00:36:16.147 --> 00:36:19.347
One of them doesn't have any
data associated to it, and

00:36:19.347 --> 00:36:21.302
the other one has itself.

00:36:21.777 --> 00:36:26.367
Then the size of the entire thing is
either, it has to be at least one byte

00:36:26.397 --> 00:36:30.067
because you need to have the discriminant
between one variant and the other, and

00:36:30.067 --> 00:36:34.187
then it also has to have the size of
the largest of the variants, which is

00:36:34.187 --> 00:36:36.567
itself, so it's itself plus one byte...

00:36:37.307 --> 00:36:39.477
plus one byte plus one byte plus
one- because it keeps going.

00:36:39.477 --> 00:36:41.597
If you keep computing the
size, it just recurses and

00:36:41.697 --> 00:36:42.877
you get an infinitely sized type.

00:36:42.897 --> 00:36:43.467
So that doesn't work.

00:36:43.517 --> 00:36:46.300
But in this case, it does work
because it goes through the heap.

00:36:46.450 --> 00:36:48.730
So you can recurse, as
long as you have memory.
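
NOTE
Editor's sketch of the two-variant enum described above (the type and function names are
hypothetical). The direct version is rejected by the compiler; putting the recursive
variant behind a Box gives it a fixed, pointer-sized payload, so nesting is limited by
heap memory instead of by the size of the type.
```rust
// Rejected by rustc (E0072, "recursive type has infinite size"):
// enum Bad { Empty, Inner(Bad) }
// One pointer of indirection gives the recursive variant a fixed size.
pub enum Nested {
    Empty,
    Inner(Box<Nested>),
}
// Count nesting levels with a loop, following the heap pointers.
pub fn levels(mut value: &Nested) -> usize {
    let mut n = 0;
    while let Nested::Inner(inner) = value {
        n += 1;
        value = &**inner;
    }
    n
}
```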

00:36:48.776 --> 00:36:52.671
Which is why I renamed it from
infinite stack to metastack, cause you

00:36:52.671 --> 00:36:53.991
don't actually have infinite memory.

00:36:54.201 --> 00:36:54.951
However!

00:36:55.255 --> 00:36:57.425
James, can you foresee
any problems with this?

00:36:57.478 --> 00:36:59.558
<v James Munns>You're not running out
of stack, but if you're putting an

00:36:59.568 --> 00:37:02.608
infinite number of things on the heap,
eventually you'll have heap exhaustion.

00:37:02.743 --> 00:37:03.543
<v Amos Wenger>That is an issue.

00:37:03.623 --> 00:37:05.333
Yes, that's a problem with current merde.

00:37:05.333 --> 00:37:08.703
This is more dangerous than serde
actually, because running out of

00:37:08.713 --> 00:37:10.683
stack on a desktop OS is fine.

00:37:11.323 --> 00:37:12.153
I think it's just going to restart.

00:37:12.153 --> 00:37:14.974
But running out of memory, it's going
to make the whole machine start

00:37:14.974 --> 00:37:16.684
swapping and become very slow.

00:37:16.873 --> 00:37:18.749
<v James Munns>Depends on how you
set your ops things like that.

00:37:19.132 --> 00:37:19.430
<v Amos Wenger>Yeah.

00:37:19.430 --> 00:37:20.708
You better set your quotas, right?

00:37:20.738 --> 00:37:21.128
Because-

00:37:21.128 --> 00:37:22.508
<v James Munns>What your memory limit is.

00:37:22.518 --> 00:37:25.116
If you do force the whole system
into swap because it's unbounded,

00:37:25.126 --> 00:37:29.366
then yeah, you'll start swapping
into whatever your cheap host is.

00:37:29.416 --> 00:37:31.416
<v Amos Wenger>But there's
another issue with that.

00:37:31.481 --> 00:37:34.181
I'm really curious if you can guess
what the next slide is because

00:37:34.181 --> 00:37:35.371
you can't see the presenter view.

00:37:35.391 --> 00:37:36.991
I see what the next
slide is, but you can't.

00:37:37.001 --> 00:37:37.841
Can you guess?

00:37:38.321 --> 00:37:40.271
So you deserialize, you build that value

00:37:40.271 --> 00:37:44.651
that's like very, very deeply
nested and then at some point you

00:37:44.651 --> 00:37:45.791
don't need that value anymore.

00:37:46.101 --> 00:37:48.062
<v James Munns>Oh, does
drop become incredibly...

00:37:48.126 --> 00:37:49.626
<v Amos Wenger>Yes, you
just made a drop bomb.

00:37:50.016 --> 00:37:50.580
<v James Munns>Yeah.

00:37:51.956 --> 00:37:53.806
<v Amos Wenger>Because drop is synchronous.

00:37:54.056 --> 00:37:57.316
So you drop the outer array, which
drops the array inside of it, which

00:37:57.316 --> 00:38:01.376
drops the array inside of it, and all
those drop calls pile up on the stack

00:38:01.626 --> 00:38:03.176
and eventually blow up the stack.

00:38:03.176 --> 00:38:04.866
So you made a type you can never drop.

00:38:04.866 --> 00:38:05.876
You made a drop bomb.

00:38:06.186 --> 00:38:07.126
<v James Munns>Oh, interesting.

00:38:07.126 --> 00:38:07.546
Yeah.

00:38:07.645 --> 00:38:09.895
I do think I've seen this in
the serde JSON code before.

00:38:09.998 --> 00:38:11.718
<v Amos Wenger>Other people
have made drop bombs before.

00:38:11.871 --> 00:38:13.211
You have bomb disposal code.

00:38:13.501 --> 00:38:15.921
You can like go deep in there
and start dropping things

00:38:15.921 --> 00:38:18.910
manually, shift things around so
that the stack never gets as deep.
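
NOTE
Editor's sketch of one way to defuse a "drop bomb" (a hypothetical Value type, not
merde's or serde_json's actual code). A manual Drop impl moves the children onto a
heap-allocated worklist and flattens them in a loop, so each individual drop is shallow.
```rust
pub enum Value {
    Null,
    Array(Vec<Value>),
}
impl Drop for Value {
    fn drop(&mut self) {
        // Steal the children so the compiler-generated drop finds an empty Vec.
        let mut pending = match self {
            Value::Array(items) => std::mem::take(items),
            Value::Null => return,
        };
        // Flatten iteratively: the worklist lives on the heap.
        while let Some(mut value) = pending.pop() {
            if let Value::Array(items) = &mut value {
                pending.append(items);
            }
            // `value` has no children left, so dropping it here does not recurse further.
        }
    }
}
// Build `depth` levels of nested single-element arrays without recursion.
pub fn nest(depth: usize) -> Value {
    let mut value = Value::Null;
    for _ in 0..depth {
        value = Value::Array(vec![value]);
    }
    value
}
```
With this impl, drop(nest(200_000)) should complete using a constant amount of native stack.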

00:38:19.121 --> 00:38:24.072
Or I guess in my case, I would just make
another trait to dispose of them safely.

00:38:24.072 --> 00:38:28.054
Like use the same metastack
async function things so that

00:38:28.054 --> 00:38:29.114
you could write natural code.

00:38:29.164 --> 00:38:32.194
So the whole point is, it's weird
machinery with a fake runtime

00:38:32.194 --> 00:38:35.044
and everything, but then you
just get to write natural code.

00:38:35.044 --> 00:38:38.154
You just get to write an async function and
you get to recurse as much as you want.

00:38:38.154 --> 00:38:39.474
You don't need to think about any of it.

00:38:39.744 --> 00:38:43.154
All deserializers can be protected
against too much memory usage.

00:38:43.184 --> 00:38:44.954
'Cause you can measure
how much memory you use.

00:38:45.412 --> 00:38:47.594
It has to go through this weird
runtime thing, which decides

00:38:47.594 --> 00:38:48.794
if you get more stack or not.

00:38:48.794 --> 00:38:50.654
So actually, I haven't implemented
the protection, but there's

00:38:50.654 --> 00:38:51.704
a single point to do it.

00:38:52.119 --> 00:38:56.099
Whereas in the serde ecosystem, serde JSON
specifically has protections against that,

00:38:56.099 --> 00:38:59.269
I think, but all the other deserializers
would have to do their own thing.

00:38:59.269 --> 00:39:01.459
There's no standard mechanism for
that because it's not baked into the

00:39:01.459 --> 00:39:03.908
Deserializer or Deserialize interfaces.

00:39:04.104 --> 00:39:07.224
I have the upper hand because I
have zero compatibility guarantees.

00:39:07.244 --> 00:39:08.274
I get to use modern Rust.

00:39:08.324 --> 00:39:09.884
I get to break things whenever I want.

00:39:10.124 --> 00:39:13.694
Some people have started porting things to
merde v3 and I'm like, sorry, it's v8 now.

00:39:13.929 --> 00:39:18.129
Like, you're five breaking changes
behind, so don't use it, but

00:39:18.129 --> 00:39:19.538
do come with me and experiment.

00:39:19.777 --> 00:39:23.545
You know, the other thing is that
now that you have those functions as

00:39:23.565 --> 00:39:26.155
async, you can actually do async I/O.

00:39:26.795 --> 00:39:26.965
<v James Munns>Yeah.

00:39:26.965 --> 00:39:28.805
That's where I was hoping
you were going with this-

00:39:28.865 --> 00:39:28.935
<v Amos Wenger>Yeah!

00:39:28.935 --> 00:39:30.325
<v James Munns>That is a very useful thing.

00:39:30.415 --> 00:39:32.205
<v Amos Wenger>Because if you're
going to change everything, you're

00:39:32.205 --> 00:39:34.215
going to change all the traits.

00:39:34.215 --> 00:39:37.325
You're not going to completely go
off course, off the serde ecosystem

00:39:37.365 --> 00:39:38.825
and everything's async anyway.

00:39:39.065 --> 00:39:41.735
Initially, I assumed you would
read everything and your source

00:39:41.735 --> 00:39:43.172
is always like a byte slice...

00:39:43.593 --> 00:39:44.963
but actually you can have a reader.

00:39:44.963 --> 00:39:46.998
You can have a network socket.

00:39:46.998 --> 00:39:49.488
You can have any type of I/O underneath it.

00:39:49.620 --> 00:39:52.770
And you can call all the merde
methods synchronously with that fake

00:39:52.770 --> 00:39:54.420
runtime thing or asynchronously.

00:39:54.510 --> 00:39:58.784
And initially I was confused about how it
was gonna work 'cause there's real async

00:39:58.814 --> 00:40:01.882
where you have tokio and you have weird
fake async just for the stack thing.

00:40:02.142 --> 00:40:03.192
But actually it's pretty easy!

00:40:03.342 --> 00:40:06.557
If the next future global is set to
something, you know you need more stack.

00:40:06.687 --> 00:40:08.127
If not, you're going back to tokio.

00:40:08.407 --> 00:40:11.177
So passing through and saying like:
okay, is it just a stack thing

00:40:11.177 --> 00:40:14.097
or is it actually waiting for
a read or a timer or something?

00:40:14.117 --> 00:40:15.417
So it does work.

00:40:15.458 --> 00:40:19.568
You can do streaming deserialization
from like an HTTP response coming

00:40:19.568 --> 00:40:21.318
in and streaming deserialization.

00:40:21.328 --> 00:40:22.588
You don't need to keep
everything in memory.

00:40:22.732 --> 00:40:25.445
<v James Munns>So you said global
a couple of times, and that makes

00:40:25.445 --> 00:40:29.355
me nervous because what if you're
decoding on eight threads and you

00:40:29.355 --> 00:40:30.875
have that, is that a real global?

00:40:30.919 --> 00:40:33.485
<v Amos Wenger>No, it's a
thread local, but now that I'm

00:40:33.565 --> 00:40:35.535
thinking that it's real async...

00:40:36.225 --> 00:40:38.625
<v James Munns>If it's real async,
you're going to be moving around

00:40:38.665 --> 00:40:42.325
different worker threads and you
could technically have two threads.

00:40:42.325 --> 00:40:44.905
Like if you yield waiting for
more data, you could end up-

00:40:44.921 --> 00:40:45.571
<v Amos Wenger>No, it's fine!

00:40:45.675 --> 00:40:47.423
It's fine, I just thought
about it, you're right.

00:40:47.439 --> 00:40:50.027
Okay, first of all, it's a thread local,
so as long as things were actually

00:40:50.027 --> 00:40:51.677
synchronous from the outside, it was fine.

00:40:51.857 --> 00:40:54.707
But now that we're actually doing
async I/O, is it still fine?

00:40:54.707 --> 00:40:56.877
And the answer is yes, because
whenever you yield, you pass

00:40:56.887 --> 00:40:58.389
through our fake runtime.

00:40:58.576 --> 00:41:01.626
<v James Munns>Wait, but do you use
the fake runtime for real async or

00:41:01.626 --> 00:41:04.426
do you only use the fake runtime
when you use the blocking code?

00:41:04.672 --> 00:41:06.477
<v Amos Wenger>Yeah, because
it's the thing that calls poll.

00:41:06.565 --> 00:41:10.855
It calls poll, and then whenever
anything downstream of that, whenever

00:41:10.895 --> 00:41:13.145
they return poll pending, we see it.

00:41:13.385 --> 00:41:14.395
So we intercept it.

00:41:14.682 --> 00:41:19.086
<v James Munns>The interception's not in the
executor, it's in the future, like your

00:41:19.086 --> 00:41:21.543
manual future impl that wraps the thing.

00:41:21.593 --> 00:41:26.468
When the actual child future returns
pending, the parent future- okay.

00:41:26.468 --> 00:41:28.704
That's what I missed is that
it's the future catching that

00:41:28.704 --> 00:41:30.496
and not the runtime or something.

00:41:30.578 --> 00:41:30.858
<v Amos Wenger>Yes.

00:41:30.979 --> 00:41:35.019
Outside of the deserialization,
tokio would never get to observe

00:41:35.259 --> 00:41:38.629
that thread local being anything other
than None, because it's synchronously

00:41:38.639 --> 00:41:42.023
being checked and then immediately
starting work on a separate future,

00:41:42.023 --> 00:41:43.543
but that all happens synchronously.

00:41:43.623 --> 00:41:46.223
It never yields in the middle of
deserialization, which might be an issue

00:41:46.223 --> 00:41:50.470
actually, but I guess it would because
tokio has a budget thing, but yeah, I

00:41:50.470 --> 00:41:52.584
think the thread local thing is sound.

00:41:52.644 --> 00:41:54.626
<v James Munns>The thread will
yield, but the task won't.

00:41:54.676 --> 00:41:56.636
You can't force a task to yield.

00:41:56.759 --> 00:41:58.559
<v Amos Wenger>I mean, you
just return poll pending.

00:41:58.621 --> 00:41:59.359
Yeah, yeah, well.

00:41:59.388 --> 00:42:00.098
<v James Munns>You can't force it.

00:42:00.098 --> 00:42:00.898
Yeah, it has to be...

00:42:01.001 --> 00:42:01.831
it's cooperative.

00:42:01.831 --> 00:42:03.924
<v Amos Wenger>It's not preemptive,
yeah, it's cooperative, exactly.

00:42:04.032 --> 00:42:05.797
Okay, we're also almost out of slides.

00:42:05.987 --> 00:42:11.133
One last thing, except for codegen, that
I'm excited about, is that currently,

00:42:11.213 --> 00:42:16.406
the deserialize implementation has a
deserialize function that is generic

00:42:16.586 --> 00:42:18.326
over the type of the deserializer.

00:42:18.546 --> 00:42:20.346
But it shouldn't need to be.

00:42:20.796 --> 00:42:22.796
Because it's only taking
a mutable reference.

00:42:22.926 --> 00:42:25.596
So we don't need to know the
size of the deserializer.

00:42:25.868 --> 00:42:30.358
We're taking a reference to a trait,
so there's two ways to do that in Rust.

00:42:30.458 --> 00:42:35.114
You either do ampersand mut, and then
some generic type, or impl trait.

00:42:35.322 --> 00:42:40.025
Or you take a mutable reference to a trait
object, but that wording I just learned

00:42:40.025 --> 00:42:42.755
today, this morning, has been renamed.

00:42:42.997 --> 00:42:46.657
We used to say that a trait is either
object safe or not object safe, but

00:42:46.667 --> 00:42:51.487
they renamed that to dyn compatibility,
which actually means what it says.

00:42:51.847 --> 00:42:53.437
It's like: can you make a dyn out of it.
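
NOTE
Editor's sketch of the two ways to borrow "something that implements a trait", using a
hypothetical synchronous Deserializer trait (not merde's real trait). The generic version
is compiled once per concrete type; the dyn version is a single function that dispatches
through a vtable, which is only allowed when the trait is dyn compatible.
```rust
pub trait Deserializer {
    fn next_byte(&mut self) -> Option<u8>;
}
pub struct SliceDe<'s>(pub &'s [u8]);
impl Deserializer for SliceDe<'_> {
    fn next_byte(&mut self) -> Option<u8> {
        let (first, rest) = self.0.split_first()?;
        self.0 = rest;
        Some(*first)
    }
}
// Generic: the compiler emits one copy of this function per concrete `D`.
pub fn sum_generic<D: Deserializer>(de: &mut D) -> u32 {
    let mut total = 0;
    while let Some(byte) = de.next_byte() {
        total += u32::from(byte);
    }
    total
}
// Trait object: one compiled copy, calls go through a vtable.
pub fn sum_dyn(de: &mut dyn Deserializer) -> u32 {
    let mut total = 0;
    while let Some(byte) = de.next_byte() {
        total += u32::from(byte);
    }
    total
}
```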

00:42:53.537 --> 00:42:56.457
And in this case, because we have an
async function in the trait, which is

00:42:56.477 --> 00:43:00.937
something you can do starting from
Rust 1.75, it is not currently dyn

00:43:00.957 --> 00:43:07.647
compatible, but there is a crate called
dynosaur with a y: d-y-n-o-saur.

00:43:07.817 --> 00:43:09.457
<v James Munns>I was really
wondering whether you were going

00:43:09.457 --> 00:43:11.007
to say dinosaur or dynosaur.

00:43:11.577 --> 00:43:12.987
<v Amos Wenger>Obviously dinosaur!

00:43:13.217 --> 00:43:14.877
But yeah, I say dyn, but yeah, okay.

00:43:15.147 --> 00:43:17.917
Do you say 'dine' like
'dine' compatibility?

00:43:18.165 --> 00:43:18.607
<v James Munns>No...

00:43:19.047 --> 00:43:20.057
I would say 'dyn trait'.

00:43:20.077 --> 00:43:20.467
Yeah.

00:43:20.572 --> 00:43:22.512
<v Amos Wenger>It's the
dining philosophers problem.

00:43:23.577 --> 00:43:27.103
So what I want the trait to look
like is to actually take a reference

00:43:27.163 --> 00:43:30.623
to a dyn deserializer, but I can't
because that's an async function in

00:43:30.623 --> 00:43:32.883
trait, but I could if I use dynosaur.
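
NOTE
Editor's sketch of why an async fn in a trait blocks dyn, and of one hand-written
workaround. This does not show dynosaur's API; Source, DynSource, SliceSource,
count_bytes and now are hypothetical names. The async method returns an opaque future
type per impl, so a twin trait that returns a boxed future is used for dynamic dispatch.
```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
// Static-dispatch trait with an async method (stable since Rust 1.75).
#[allow(async_fn_in_trait)]
pub trait Source {
    async fn next_byte(&mut self) -> Option<u8>;
}
// Hand-written twin: boxing the future gives the method a concrete return type.
pub trait DynSource {
    fn next_byte_boxed<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = Option<u8>> + 'a>>;
}
impl<S: Source> DynSource for S {
    fn next_byte_boxed<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = Option<u8>> + 'a>> {
        Box::pin(self.next_byte())
    }
}
pub struct SliceSource<'s> {
    pub bytes: &'s [u8],
}
impl Source for SliceSource<'_> {
    async fn next_byte(&mut self) -> Option<u8> {
        let (first, rest) = self.bytes.split_first()?;
        self.bytes = rest;
        Some(*first)
    }
}
// One compiled copy serves every source type (dynamic dispatch).
pub async fn count_bytes(src: &mut dyn DynSource) -> usize {
    let mut n = 0;
    while src.next_byte_boxed().await.is_some() {
        n += 1;
    }
    n
}
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}
// Poll once with a do-nothing waker; enough here because nothing really waits.
pub fn now<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("this sketch never awaits real I/O"),
    }
}
```
The cost of this approach is one heap allocation per call, which matches the trade-off
discussed in the episode: dynamic dispatch and allocations in exchange for a single copy
of the code.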

00:43:32.903 --> 00:43:35.463
So this is why I was forced to
give the presentation now, which

00:43:35.463 --> 00:43:37.683
is good because it's already
long, but this is my next step.

00:43:37.716 --> 00:43:41.636
One of the big problems with serde is
that you derive serialize and deserialize,

00:43:41.866 --> 00:43:45.166
and then they get instantiated when
you actually compile the thing.

00:43:45.166 --> 00:43:47.996
So if you have a library with a
lot of types, the thing that gets

00:43:48.016 --> 00:43:50.096
cached is kind of just the templates.

00:43:50.196 --> 00:43:52.506
And then when you actually use them
in your application, everything

00:43:52.516 --> 00:43:55.036
gets instantiated and it takes
a long time to build everything.

00:43:55.216 --> 00:43:58.566
What I want is a single implementation
of deserialize and serialize,

00:43:58.566 --> 00:44:01.366
a single copy of the code that
works with any deserializer.

00:44:01.556 --> 00:44:02.926
I want to fully use dynamic dispatch.

00:44:03.146 --> 00:44:03.876
Because we don't care.

00:44:03.896 --> 00:44:05.896
Because if it's an application
server anyway, we're parsing

00:44:05.936 --> 00:44:07.292
JSON, for crying out loud.

00:44:07.292 --> 00:44:09.325
We would use another format
if we cared about performance.

00:44:09.325 --> 00:44:09.755
But we don't.

00:44:09.811 --> 00:44:10.841
Dynamic dispatch is fine.

00:44:10.931 --> 00:44:12.101
Heap allocations are fine.

00:44:12.207 --> 00:44:13.227
We want rapid iteration.

00:44:13.237 --> 00:44:15.447
We want to be able to deploy a
change in a few seconds, like

00:44:15.457 --> 00:44:16.427
rebuild the entire website.

00:44:16.792 --> 00:44:19.612
So I want to actually use dynamic
dispatch and that's something that

00:44:19.612 --> 00:44:21.292
dynosaur is playing with for now.

00:44:21.292 --> 00:44:23.342
It's kind of like the async trait crate.

00:44:23.362 --> 00:44:26.782
It's like allowing you to do what the
language will eventually permit you to do

00:44:27.172 --> 00:44:28.552
and you can experiment with the design.

00:44:28.622 --> 00:44:30.492
Just the fact that you were
able to use async fn in trait.

00:44:30.602 --> 00:44:33.242
Again, it's Rust 1.75, which
came out a few months ago.

00:44:33.242 --> 00:44:34.532
I don't remember when exactly.

00:44:34.532 --> 00:44:36.382
It's a six-week release train.

00:44:37.032 --> 00:44:38.756
<v James Munns>What are we
on, 84 or something now?

00:44:38.756 --> 00:44:41.226
So it's like 10- 60 weeks or something?

00:44:41.276 --> 00:44:43.902
<v Amos Wenger>1.83 just released
this morning when recording.

00:44:43.952 --> 00:44:44.192
Yeah.

00:44:44.471 --> 00:44:45.141
<v James Munns>Oh, 1.83.

00:44:45.161 --> 00:44:45.461
Okay.

00:44:45.481 --> 00:44:46.928
So just about a year then.

00:44:46.989 --> 00:44:50.089
<v Amos Wenger>As I've been adding features
to merde, this is where I confess:

00:44:50.549 --> 00:44:54.201
initially when it was very simple,
of course, it was faster than serde.

00:44:54.351 --> 00:44:56.827
With serde, if you want a type
to conditionally support serde,

00:44:56.857 --> 00:44:58.946
you have to do cfg attributes.

00:44:59.282 --> 00:45:00.561
It's kind of annoying to do.
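
NOTE
Editor's sketch of the cfg-attribute pattern mentioned above, for a hypothetical Point
type in a library that exposes an optional cargo feature named "serde" (the feature name
is an assumption). The serde derives only exist when that feature is enabled.
```rust
// With the feature off, this compiles without serde in the dependency graph.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}
```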

00:45:00.721 --> 00:45:03.701
Whereas with merde derive, if
you don't enable the flags,

00:45:03.753 --> 00:45:05.153
the macro just expands to nothing.

00:45:05.503 --> 00:45:07.716
So you can have it on all
the time, which is great.

00:45:07.735 --> 00:45:08.397
It's very convenient.

00:45:09.072 --> 00:45:12.222
But yeah, over time, as I added
Deserialize and Serialize impls

00:45:12.452 --> 00:45:17.562
for a lot of types, like all tuples
up to size 20: it's a bunch of

00:45:17.562 --> 00:45:20.002
code and, you know, rustc churns.

00:45:20.222 --> 00:45:23.572
So I would like to have this dynamic
dispatch thing and see if it fixes anything.

00:45:23.942 --> 00:45:27.423
I also thought about, if I did all
this for nothing and you can just,

00:45:27.423 --> 00:45:32.518
like, instantiate types in crates
so that they're cached and reused.

00:45:32.838 --> 00:45:34.691
And I don't know if I told
you about this idea, James.

00:45:34.691 --> 00:45:37.534
<v James Munns>You had the aha moment when
I was talking about postcard-forth, you

00:45:37.534 --> 00:45:39.094
went, " *gasp* You could do- nevermind!

00:45:39.094 --> 00:45:40.104
We'll talk about that later..."

00:45:40.449 --> 00:45:41.992
<v Amos Wenger>Yeah, it's about
instantiating those types.

00:45:41.992 --> 00:45:44.322
Like I said, you have those
generic types in the crates.

00:45:44.322 --> 00:45:45.992
So they don't really get compiled.

00:45:46.002 --> 00:45:48.496
They're ready for being
monomorphized later.

00:45:48.766 --> 00:45:51.573
But if you force the crate to
monomorphize them by like having a

00:45:51.573 --> 00:45:55.423
JSON features, then anything that
depends on that could just use those.

00:45:55.463 --> 00:45:57.173
But that's depending
on the compiler flag...

00:45:57.330 --> 00:45:58.480
More research is needed.

00:45:58.715 --> 00:46:00.795
That's gonna be in another episode.

00:46:00.805 --> 00:46:02.055
Thanks for coming to my show.

00:46:02.055 --> 00:46:03.945
You can use merde right now,
but you really shouldn't.

00:46:04.066 --> 00:46:07.023
I know it's version 8, but
even I feel funny about it.

00:46:07.312 --> 00:46:07.982
It's usable.

00:46:08.102 --> 00:46:11.062
It's just less flexible than serde,
and it's questionable whether

00:46:11.062 --> 00:46:13.682
it actually builds faster, but
I've had a lot of fun with it,

00:46:13.752 --> 00:46:17.082
and I'm looking forward to like adding
stuff like yoke support, doing actual

00:46:17.082 --> 00:46:18.422
dynamic dispatch, maybe doing codegen.

00:46:18.422 --> 00:46:21.226
It's aggravating to have
to re-list all the fields.

00:46:21.596 --> 00:46:24.946
I got LLMs to automate that for me, but
still, I don't know, I don't like it.

00:46:25.018 --> 00:46:25.788
<v James Munns>That's good to prototype.

00:46:26.160 --> 00:46:30.081
It's good to mess around because you want
to be able to figure out what is and

00:46:30.081 --> 00:46:32.611
isn't good before you really commit.

00:46:32.611 --> 00:46:35.032
It's the opposite of, well, you're
testing in production, but it's

00:46:35.053 --> 00:46:38.444
testing in a more scoped production,
which means you get actual feedback

00:46:38.444 --> 00:46:39.544
from it and you get runtime data.

00:46:39.544 --> 00:46:42.684
And you can see if like you have asserts
on and they hit and things like that.

00:46:42.843 --> 00:46:45.513
<v Amos Wenger>Another thing I put
the emphasis on when I was designing

00:46:45.513 --> 00:46:46.230
this is to get good diagnostics.

00:46:46.230 --> 00:46:52.763
When deserialization fails, unless
you opt into like some other crates,

00:46:52.953 --> 00:46:54.383
you get very little information.

00:46:54.383 --> 00:46:55.809
It's like missing field.

00:46:56.329 --> 00:46:56.659
That's it.

00:46:56.669 --> 00:46:58.769
Somewhere deep in your document
that you don't know where.

00:46:58.929 --> 00:47:03.867
So the merde JSON implementation has
this nice syntax highlighting thing

00:47:03.877 --> 00:47:05.587
and it points exactly to the path.

00:47:05.587 --> 00:47:10.427
It costs memory to keep track of those
things but to me it's worth it because

00:47:10.427 --> 00:47:13.447
in an application server you absolutely
want to know exactly where it failed.

00:47:13.557 --> 00:47:17.657
Some third party server returned a funny
response once in a blue moon, and you

00:47:17.687 --> 00:47:18.797
absolutely want to know what happened.

00:47:18.797 --> 00:47:21.407
So yeah, it's fun to make
different trade offs from serde.

00:47:21.737 --> 00:47:25.270
I still think most people should be
using serde, but I'm excited that

00:47:25.270 --> 00:47:26.339
we get to experiment with uh...

00:47:26.396 --> 00:47:28.019
other parts of the design space.

00:47:28.241 --> 00:47:28.671
<v James Munns>Hell yeah!

00:47:28.781 --> 00:47:29.585
<v Amos Wenger>Of course, now
I want to trash everything

00:47:30.055 --> 00:47:31.157
and replace it with bytecode.

00:47:31.248 --> 00:47:32.328
No thanks to you, James.

00:47:33.044 --> 00:47:33.511
<v James Munns>You're welcome.

00:47:38.661 --> 00:47:40.941
This episode is sponsored by CodeCrafters.

00:47:41.128 --> 00:47:44.398
CodeCrafters is a service for
learning programming skills by doing.

00:47:45.068 --> 00:47:48.358
CodeCrafters offers a curated list
of exercises for learning programming

00:47:48.358 --> 00:47:51.368
languages like Rust or learning
skills like building an interpreter.

00:47:51.843 --> 00:47:55.383
Instead of just following a tutorial, you
can instead clone a repo that contains

00:47:55.393 --> 00:47:59.123
all of the boilerplate already, and make
progress by running tests and pushing

00:47:59.123 --> 00:48:02.463
commits that are checked by the server,
allowing you to move on to the next step.

00:48:03.103 --> 00:48:06.073
If you enjoy learning by doing,
sign up today using the link at

00:48:06.073 --> 00:48:10.193
sdr-podcast.com/codecrafters,
or use the link in the show

00:48:10.193 --> 00:48:11.423
notes to start your free trial.

00:48:11.843 --> 00:48:14.363
If you decide to upgrade, you'll
get a discount and a portion of

00:48:14.363 --> 00:48:15.903
the sale will support this podcast.

00:48:16.348 --> 00:48:20.038
That's sdr-podcast.com/codecrafters.