WEBVTT

NOTE
This file was generated by Descript <www.descript.com>

00:00:13.816 --> 00:00:15.256
<v Amanda Majorowicz>This
is Self-Directed Research!

00:00:15.286 --> 00:00:18.946
Our hosts, James and Amos, get hyped
about different topics and take turns each

00:00:18.946 --> 00:00:20.566
week presenting their ideas to each other.

00:00:20.946 --> 00:00:23.346
You can check out the website,
YouTube or Spotify to watch the

00:00:23.346 --> 00:00:27.726
episode's presentation and visit
sdr-podcast.com/episodes for

00:00:27.726 --> 00:00:31.416
previous episodes, presentations and
videos, show notes and transcripts.

00:00:31.688 --> 00:00:33.518
New episodes are
published every Wednesday.

00:00:33.840 --> 00:00:38.040
This episode, and all previous episodes
honestly, is brought to you by Descript.

00:00:38.400 --> 00:00:41.250
Check out the link in our description,
or listen at the end of the episode

00:00:41.250 --> 00:00:42.510
to hear what it's all about.

00:00:42.857 --> 00:00:46.337
This week James presents, "What
good is partial understanding?"

00:00:53.899 --> 00:00:57.679
<v James Munns>This week I want to talk
about: what good is partial understanding.

00:00:57.749 --> 00:01:03.262
And this is actually sort of a redux of
a post I wrote on cohost, RIP cohost...

00:01:03.342 --> 00:01:04.112
<v Amos Wenger>RIP cohost.

00:01:04.191 --> 00:01:05.941
<v James Munns>When I was figuring
out a lot of things that ended

00:01:05.941 --> 00:01:10.041
up becoming Postcard-RPC and some
other stuff that I'm working on.

00:01:10.051 --> 00:01:14.511
But when it comes to two machines,
two programs communicating with each

00:01:14.511 --> 00:01:19.604
other I was trying to figure out what
benefit partially understanding messages

00:01:19.787 --> 00:01:23.787
actually got you, and whether it was
a reasonable thing to try and do.

00:01:24.277 --> 00:01:26.897
<v Amos Wenger>So it's, it's
still a technical presentation?

00:01:26.927 --> 00:01:27.657
<v James Munns>Yeah...

00:01:27.707 --> 00:01:29.867
<v Amos Wenger>Because we talked
about doing like talking about

00:01:29.907 --> 00:01:31.567
other topics at some point.

00:01:31.607 --> 00:01:34.697
I, that's something I'd like to
do, personally in my career, but

00:01:34.697 --> 00:01:35.987
it's like- partial understanding...

00:01:35.997 --> 00:01:38.057
I was like, "Okay, we're going
to the human sciences now?

00:01:38.057 --> 00:01:38.817
What are we, what are we doing?"

00:01:39.177 --> 00:01:42.367
<v James Munns>No, I'm extremely stuck
on machine to machine communication-

00:01:42.367 --> 00:01:43.140
<v Amos Wenger>I can see!

00:01:43.153 --> 00:01:45.438
<v James Munns>It absorbs
all of my idle thoughts.

00:01:45.468 --> 00:01:47.158
<v Amos Wenger>Yes, alright, I'm listening.

00:01:47.175 --> 00:01:49.625
<v James Munns>But this actually has no
code and you don't have to know any

00:01:49.645 --> 00:01:51.295
encoding formats and we're going to use...

00:01:51.555 --> 00:01:52.475
well, we'll get there.

00:01:52.815 --> 00:01:55.725
But let's say you ask me
what time it is right now.

00:01:55.725 --> 00:01:57.625
We are two computers-

00:01:57.665 --> 00:01:58.555
<v Amos Wenger>James, what time is it?

00:01:59.183 --> 00:01:59.673
<v James Munns>We are two..

00:01:59.673 --> 00:02:02.468
uh, out of order, reset, reset.

00:02:02.628 --> 00:02:06.088
We're two computers, we're programs
written around the same time with

00:02:06.088 --> 00:02:07.598
a common understanding of things.

00:02:07.598 --> 00:02:09.538
And you asked me what
time it is right now.

00:02:09.837 --> 00:02:16.107
Today, I might just say, "11 04
27." That's my whole response.

00:02:16.107 --> 00:02:17.097
You asked me what time it is.

00:02:17.097 --> 00:02:18.937
This is how we've agreed
to talk to each other.

00:02:19.547 --> 00:02:23.087
We've agreed to say: hour hour,
minute minute, second second.

00:02:23.597 --> 00:02:28.237
This is our common understanding of how
we will uh, communicate with each other.

00:02:28.617 --> 00:02:33.057
We know generally that our
hours are zero to 24, not inclusive.

00:02:33.057 --> 00:02:35.627
So we're using a 24 hour time cycle.

00:02:35.627 --> 00:02:39.067
We're using 60 minutes in
an hour, 60 seconds in a

00:02:39.067 --> 00:02:40.204
minute, those kinds of things.

00:02:40.204 --> 00:02:43.793
We know generally what's
expected to be in these messages.

00:02:43.793 --> 00:02:46.331
It's going to be those
three sets of digits.

00:02:47.063 --> 00:02:49.323
We're both programs, we are
written around the same time.

00:02:49.553 --> 00:02:53.471
You're off running in production,
and I've decided to improve

00:02:53.471 --> 00:02:57.901
myself, or my programmer has
decided to improve me unilaterally.

00:02:57.971 --> 00:03:02.001
You're off running, you don't get a
chance to be recompiled or reprogrammed

00:03:02.001 --> 00:03:05.621
or whatever, but someone decides
that hour hour, minute minute, second

00:03:05.621 --> 00:03:08.791
second is insufficient for what we
would like to be doing, so I'm going

00:03:08.791 --> 00:03:13.731
to decide as the sender of this what
I would like the format to look like.

00:03:13.826 --> 00:03:19.296
And I've decided, hey, let's get closer
to a good standard, RFC 3339, and I'm

00:03:19.296 --> 00:03:25.944
going to send year, month, date, hour,
minute, second, sub second, millisecond.

00:03:25.994 --> 00:03:32.744
So 2024, 10, 17, 11, 04, 27, 014.

00:03:33.514 --> 00:03:34.894
A very reasonable thing to do.

00:03:35.194 --> 00:03:37.044
I've decided to give you more information.

00:03:37.044 --> 00:03:40.734
I go, "How could anyone be upset at me
for sending this additional information?

00:03:40.734 --> 00:03:42.814
I'm just giving you more options!"

00:03:43.145 --> 00:03:46.605
But if you were a program that was
written, assuming the only thing you

00:03:46.605 --> 00:03:50.215
would ever receive is three integers
in the range format that you expect,

00:03:50.535 --> 00:03:54.285
those kind of things, you would be very
confused because you are not a human,

00:03:54.655 --> 00:03:58.105
you are a computer, and computers really
can only do what we've told them to

00:03:58.105 --> 00:04:02.095
do, and if we decide to say, this is
what you're going to get, then you're

00:04:02.115 --> 00:04:05.255
going to be very confused when you go
to read an hour that is four digits

00:04:05.555 --> 00:04:10.915
and starts with 2024, because you go:
that's, that's not a very good hour.

00:04:11.235 --> 00:04:13.315
There's various failure
modes that could happen here.

00:04:13.315 --> 00:04:15.445
You could just say, "I
got a bad response."

00:04:15.785 --> 00:04:20.487
You could parse that
first 20 as: it's 8 PM.

00:04:20.877 --> 00:04:23.807
And then you could try and parse
14 as the minutes- you know,

00:04:23.837 --> 00:04:26.687
there's a lot of failure modes we
could have, from totally rejecting

00:04:26.687 --> 00:04:28.587
it, which you should probably do,

00:04:29.057 --> 00:04:30.487
to totally misinterpreting it.

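NOTE
A minimal sketch of the "totally reject it" option, assuming the reply is
three space-separated two-digit fields as read out above. The name parse_hms
and the Option return type are illustrative assumptions, not from the talk.
```rust
/// Parse the original fixed-layout reply: exactly three two-digit fields, "HH MM SS".
/// Anything else is rejected rather than guessed at.
pub fn parse_hms(msg: &str) -> Option<(u8, u8, u8)> {
    let fields: Vec<&str> = msg.split_whitespace().collect();
    // The layout is positional, so a different field count means a different format.
    if fields.len() != 3 {
        return None;
    }
    // Each field must be exactly two ASCII digits.
    if fields.iter().any(|f| f.len() != 2 || !f.bytes().all(|b| b.is_ascii_digit())) {
        return None;
    }
    let h: u8 = fields[0].parse().ok()?;
    let m: u8 = fields[1].parse().ok()?;
    let s: u8 = fields[2].parse().ok()?;
    // Hours run 0..24, minutes and seconds 0..60, upper bounds excluded.
    if h < 24 && m < 60 && s < 60 {
        Some((h, m, s))
    } else {
        None
    }
}
```
Under these assumptions, parse_hms("11 04 27") accepts the old reply, while
the longer date-and-time reply is rejected because it has seven fields.
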
00:04:31.141 --> 00:04:33.071
<v Amos Wenger>I was going to say
the only reasonable option here

00:04:33.071 --> 00:04:34.431
is to completely reject it.

00:04:34.661 --> 00:04:39.691
But then I remembered about parseInt in
browsers, which would definitely take like

00:04:39.701 --> 00:04:41.751
2024 and be like: okay, that's a number.

00:04:41.751 --> 00:04:44.401
And then encounter a space and be
like: that's no longer a number.

00:04:44.411 --> 00:04:45.681
Let's just return 2024.

00:04:45.961 --> 00:04:46.811
I think that's what it does.

00:04:46.901 --> 00:04:47.501
I'm pretty sure.

00:04:47.786 --> 00:04:51.576
<v James Munns>Yep, and then once you
mask that down, you either overflow

00:04:51.586 --> 00:04:54.522
or you just get a random number that
you don't expect, but things are not

00:04:54.532 --> 00:04:56.996
going well in our communication today.

00:04:57.153 --> 00:04:59.473
And that's because this format
that I've described- you know,

00:04:59.493 --> 00:05:02.603
hour hour, minute minute, second
second- is not self describing.

00:05:02.693 --> 00:05:04.963
The message doesn't tell us what it is.

00:05:05.703 --> 00:05:11.122
It just assumes that we've pre-negotiated
what a reasonable set of things are,

00:05:11.532 --> 00:05:15.104
and if you know, you know, and if
you don't, then you're out of luck.

00:05:15.858 --> 00:05:17.208
So we say, okay, you know what?

00:05:17.248 --> 00:05:18.778
We've broken someone's code.

00:05:18.788 --> 00:05:20.508
We updated our time server.

00:05:20.508 --> 00:05:21.738
We thought it was going to be lovely.

00:05:21.748 --> 00:05:24.878
Our users are now incredibly upset
because all of a sudden their

00:05:24.878 --> 00:05:27.728
things are breaking because they
didn't know how to understand that.

00:05:27.958 --> 00:05:28.888
We go, you know what?

00:05:29.338 --> 00:05:32.188
We're going to fix this by
using a self describing format.

00:05:32.188 --> 00:05:36.678
We're going to include enough information
into the message that you can figure it

00:05:36.678 --> 00:05:39.303
out, even if we change things over time.

00:05:39.303 --> 00:05:43.143
We make it forward compatible or, or
whatever you want to describe it as.

00:05:43.803 --> 00:05:47.173
So we take our 11 04 27

00:05:48.303 --> 00:05:50.923
and we add a suffix on
each of those numbers.

00:05:50.923 --> 00:05:55.098
So we say 11h 04m 27s.

00:05:55.878 --> 00:05:58.508
And as a human, you read
this and you go: okay, cool.

00:05:58.508 --> 00:05:59.988
That's hours, minutes, and seconds.

00:06:00.318 --> 00:06:04.418
I can look at that and figure out which
one's marked hour, which one's marked

00:06:04.418 --> 00:06:06.068
minutes, which one's marked seconds.

00:06:06.338 --> 00:06:08.748
And now all of a sudden our
machines can figure it out too.

00:06:08.998 --> 00:06:12.528
It's readable to a human still, and
it's still readable to a machine.

00:06:13.208 --> 00:06:16.228
So we now semantically
are thinking about this.

00:06:16.268 --> 00:06:19.737
I've colored each of these so we can
see what the machine maybe parses

00:06:19.757 --> 00:06:21.307
this as, as when we look at it.

00:06:21.327 --> 00:06:25.317
We've got hours minutes and
seconds as a separate concept

00:06:25.337 --> 00:06:27.017
that our computer is thinking in.

00:06:27.647 --> 00:06:29.737
The first thing we notice is
that our messages are now bigger.

00:06:29.797 --> 00:06:34.107
I had to add a delimiter or I had to
add some extra information that allows

00:06:34.107 --> 00:06:37.102
you to recover the shape of the message.

00:06:37.312 --> 00:06:41.352
So instead of just sending my three
integers, I'm sending a character or some

00:06:41.352 --> 00:06:44.242
kind of field delimiter in my messages.

00:06:44.282 --> 00:06:45.482
But you know what?

00:06:45.622 --> 00:06:46.902
We understand things.

00:06:46.902 --> 00:06:48.672
And maybe that's the
cost of doing business.

00:06:48.702 --> 00:06:52.392
We were paying a little overhead,
but what we gain in flexibility

00:06:52.483 --> 00:06:53.643
surely makes it worth it.

00:06:53.837 --> 00:06:56.859
So then I go back to making the
upgrade that I always planned to do.

00:06:56.909 --> 00:07:07.369
And now I send "2024y 10M
17d 11h 04m 27s 014i."

00:07:07.859 --> 00:07:12.769
And this is a lot more than the system was
asking for when it just wanted the time.

00:07:13.109 --> 00:07:19.629
But because we have this self describing
format ability, we can ignore all the

00:07:19.629 --> 00:07:21.769
extra suffixes that we don't understand.

00:07:21.989 --> 00:07:23.949
And we find the three
things that we care about.

00:07:23.949 --> 00:07:25.769
We care about the H, the M and the S.

00:07:26.239 --> 00:07:30.349
We've successfully received this message,
even though- like your high school word

00:07:30.349 --> 00:07:33.139
problems, you've got a lot of extra
information and you had to figure out

00:07:33.139 --> 00:07:36.999
which ones you didn't care about, but
we have solved this word problem even

00:07:36.999 --> 00:07:39.019
as a computer and not just as a human.

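NOTE
A rough sketch of how a receiver might pull out only the tags it knows,
assuming each field is an unsigned number followed by a single suffix
character and fields are separated by spaces. The names parse_tagged and
query_hms are hypothetical.
```rust
use std::collections::HashMap;
/// Split "2024y 10M 17d 11h 04m 27s 014i" into a suffix-to-number map.
/// Returns None if any token is not an unsigned number followed by one suffix character.
pub fn parse_tagged(msg: &str) -> Option<HashMap<char, u32>> {
    let mut fields = HashMap::new();
    for token in msg.split_whitespace() {
        // The last character is the tag; everything before it is the value.
        let tag = token.chars().last()?;
        let digits = &token[..token.len() - tag.len_utf8()];
        // If a tag repeats, the later value replaces the earlier one.
        fields.insert(tag, digits.parse::<u32>().ok()?);
    }
    Some(fields)
}
/// Query only the tags this receiver knows about; unknown tags are never asked for.
pub fn query_hms(msg: &str) -> Option<(u32, u32, u32)> {
    let fields = parse_tagged(msg)?;
    Some((*fields.get(&'h')?, *fields.get(&'m')?, *fields.get(&'s')?))
}
```
Here the y, M, d and i fields are parsed and then simply never queried.
Tags are case-sensitive in this sketch, so M (month) and m (minute) stay distinct.
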
00:07:40.229 --> 00:07:42.959
<v Amos Wenger>Is it bad that
looking at this I'm immediately

00:07:42.979 --> 00:07:45.839
thinking of: Oh, this should be
like length prefixed or something.

00:07:45.849 --> 00:07:49.912
Also: what if something's
not... Or like, do you first split on

00:07:49.912 --> 00:07:52.652
space and then that's how you know?

00:07:52.652 --> 00:07:55.662
Like you get the last character and
so- what if you have more than

00:07:55.662 --> 00:08:00.412
256 or whatever number of ASCII
characters, different field types, like

00:08:00.432 --> 00:08:03.855
all of those things are immediately
springing into my mind and that's why

00:08:03.865 --> 00:08:05.335
I charge the hourly rate that I do.

00:08:05.525 --> 00:08:07.265
It's because I'm a senior engineer, baby.

00:08:07.265 --> 00:08:09.325
<v James Munns>I was going to say:
you are a person who has been burned

00:08:09.335 --> 00:08:13.155
by protocol design before, which
I think is how we- our collective

00:08:13.155 --> 00:08:14.815
trauma is how we've ended up here.

00:08:14.815 --> 00:08:16.115
<v Amos Wenger>But I'm assuming
you're building up to that.

00:08:16.158 --> 00:08:17.048
<v James Munns>Oh, we'll get there.

00:08:18.005 --> 00:08:18.655
So this is neat.

00:08:18.925 --> 00:08:20.405
This is, this is what we wanted, right?

00:08:20.405 --> 00:08:21.585
We want to be able to change stuff.

00:08:21.585 --> 00:08:24.725
We have flexibility, we admit that
our servers and clients are not

00:08:24.725 --> 00:08:27.375
written by the same people, and they have
different wants and different needs,

00:08:27.375 --> 00:08:28.935
but we need to make something work.

00:08:29.225 --> 00:08:29.825
So cool.

00:08:29.845 --> 00:08:31.005
We've got our extra overhead.

00:08:31.195 --> 00:08:34.495
We're processing all these extra
fields that we don't care about, but

00:08:34.905 --> 00:08:37.225
whatever, overhead is overhead, it works.

00:08:37.405 --> 00:08:37.935
Who cares?

00:08:37.955 --> 00:08:38.795
Computers are cheap.

00:08:39.713 --> 00:08:40.593
So something has changed.

00:08:40.593 --> 00:08:42.113
I'm having, you know, a reckless day.

00:08:42.113 --> 00:08:46.414
I've decided to switch from the ordered
messages that I was sending before.

00:08:46.414 --> 00:08:49.894
And now I represent these internally
as a hash map, which means all my

00:08:49.894 --> 00:08:54.424
fields get shuffled depending on
whatever my hash seed is for the day.

00:08:54.634 --> 00:08:58.724
And I send you 27s 11h 04m.

00:08:59.294 --> 00:09:02.014
This is not hours, minutes,
seconds, like we talked about,

00:09:02.344 --> 00:09:04.054
but the suffixes are still there.

00:09:04.314 --> 00:09:08.354
So when we go to parse this, we say
the S is the seconds, the H is the

00:09:08.354 --> 00:09:09.824
hours, and the M is the minutes.

00:09:10.544 --> 00:09:15.795
Even though this was out of order,
this self describing message format got

00:09:15.795 --> 00:09:17.205
us the ability to recover from that.

00:09:17.265 --> 00:09:20.275
We say we know which field is
which, even if they didn't show up

00:09:20.505 --> 00:09:22.775
exactly how we expected them to.

00:09:24.070 --> 00:09:26.980
<v Amos Wenger>I get a chance to talk
about, I think it's Hyrum's rule?

00:09:27.140 --> 00:09:31.090
Which is that everything that's
noticeable, even if it's not documented,

00:09:31.350 --> 00:09:34.400
ends up being part of a public
interface, whether you like it or not.

00:09:34.750 --> 00:09:38.970
So I am also immediately thinking of
the case where people assumed: Oh, if

00:09:38.970 --> 00:09:43.265
we get like "number number h" then it's
going to be followed by m and then s.

00:09:43.605 --> 00:09:46.795
So they didn't actually check the
suffixes. That is, like, if they

00:09:46.795 --> 00:09:50.775
encounter h, they just keep reading
that exact number of characters and

00:09:50.785 --> 00:09:52.185
interpret them as minutes and seconds.

00:09:52.185 --> 00:09:55.315
And then this would completely break
if you actually changed the order.

00:09:55.405 --> 00:09:57.962
But yeah, that's, I think, I think
it's Hyrum's rule: you have not

00:09:57.972 --> 00:10:01.652
documented this anywhere, but people
have been using your API for a while.

00:10:01.672 --> 00:10:04.722
They've noticed some patterns and
they've started relying on them.

00:10:05.232 --> 00:10:08.227
And now you have to uphold that even
if that wasn't part of your plan.

00:10:08.837 --> 00:10:12.056
<v James Munns>Yeah, this is
the spacebar heating xkcd.

00:10:12.602 --> 00:10:16.451
But yeah, this is also "pains of
why you might not want to design

00:10:16.451 --> 00:10:22.122
your own encoding format and home
roll your own parsers or serializers

00:10:22.152 --> 00:10:25.092
because edge cases are a thing."

00:10:25.792 --> 00:10:29.172
This is wonderful because, even
though you shuffled things for me,

00:10:29.172 --> 00:10:32.452
I was still able to understand it
because there was enough information

00:10:32.552 --> 00:10:36.642
in the message itself that told me
how to think about this message.

00:10:36.684 --> 00:10:39.824
A bit of the schema of the
message was within itself.

00:10:40.014 --> 00:10:44.464
I know what fields are what, even if
they don't show up in the same order.

00:10:45.048 --> 00:10:48.718
The downside is, I can't understand
this message in one pass. Like you were

00:10:48.718 --> 00:10:52.078
saying with the home-rolled parser,
if I wanted to be very clever and

00:10:52.078 --> 00:10:55.338
efficient, I might go: take an integer,
take an integer, take an integer.

00:10:55.388 --> 00:10:57.778
You know, I can do that
linearly in one pass.

00:10:57.938 --> 00:11:01.078
I've convinced myself that it is very
optimized and a good thing to do...

00:11:01.528 --> 00:11:02.598
but you can't do that

00:11:02.708 --> 00:11:05.828
if we admit that our format is
allowed to have out of order

00:11:05.828 --> 00:11:09.576
identifiers, because we didn't say
they always appear in this order.

00:11:09.576 --> 00:11:12.728
We just said: if there are
hours, they will appear with an H.

00:11:12.738 --> 00:11:14.158
If there are minutes, they
will appear with this.

00:11:14.398 --> 00:11:17.238
And so when we get bonus data,
great, we can skip those.

00:11:17.238 --> 00:11:19.798
And when things are out of
order, we can recover from that.

00:11:20.533 --> 00:11:24.153
<v Amos Wenger>Bonus data is forever
seared in my brain as like...

00:11:24.363 --> 00:11:25.703
memory corruption in C.

00:11:25.733 --> 00:11:26.473
<v James Munns>Yeah, yeah, yeah.

00:11:26.640 --> 00:11:29.340
<v Amos Wenger>It's the thing you don't get
in Rust, but in C, if you accidentally

00:11:29.340 --> 00:11:32.850
read past the end of a buffer, which
happens a lot, then you get bonus data.

00:11:33.065 --> 00:11:34.235
<v James Munns>So much bonus data.

00:11:34.765 --> 00:11:37.065
We can see that we can't parse this
in one thing, because if we were

00:11:37.065 --> 00:11:38.515
going: hey, what are the hours?

00:11:38.585 --> 00:11:40.185
We have to go to the
middle of the message.

00:11:40.235 --> 00:11:41.725
If we go: what are the minutes?

00:11:41.725 --> 00:11:43.995
We have to go to the end of the
message, or what are the seconds?

00:11:44.495 --> 00:11:46.015
We have to go to the
beginning of the message.

00:11:46.015 --> 00:11:47.325
So we couldn't do this in one pass.

00:11:47.355 --> 00:11:51.565
Maybe we could turn it into a hash map
that we can query, but again, we have to

00:11:51.565 --> 00:11:54.416
do sort of out of order grabs from this.

00:11:54.416 --> 00:11:57.023
There's no more just, grab the
next integer, grab the next

00:11:57.023 --> 00:11:58.213
integer, grab the next integer.

00:11:58.821 --> 00:12:01.761
And this is an important step
because we've gone from decoding that

00:12:01.761 --> 00:12:04.421
message to querying that message.

00:12:04.461 --> 00:12:08.043
This has gone from something that
we are just slurping the bytes and

00:12:08.063 --> 00:12:13.477
transforming them into our internal
format, to: we now have this object

00:12:13.577 --> 00:12:15.317
that we have to ask questions.

00:12:15.337 --> 00:12:16.227
Do you have this?

00:12:16.227 --> 00:12:17.127
Do you have this?

00:12:17.417 --> 00:12:18.387
Do you have this?

00:12:18.417 --> 00:12:19.770
And that's not bad.

00:12:20.070 --> 00:12:23.320
This is a common thing, but it's
important to note that even though

00:12:23.320 --> 00:12:27.760
these messages seem very similar,
our mode of interacting with them has

00:12:27.760 --> 00:12:32.782
changed very, very dramatically to
go to this self describing format.

00:12:32.979 --> 00:12:35.969
And then what if I just send 11h 04m?

00:12:37.079 --> 00:12:40.203
Maybe I have a weird thing in my
code, where when the seconds are

00:12:40.203 --> 00:12:43.693
zero, I just null them out for
some reason because I'm trying to

00:12:43.693 --> 00:12:45.133
be clever or something like that.

00:12:45.133 --> 00:12:48.333
But the receiver of this message
doesn't know that convention.

00:12:48.633 --> 00:12:52.093
They go to query hours, minutes, and
seconds out of this message they've

00:12:52.093 --> 00:12:53.873
received, and they get an error.

00:12:53.963 --> 00:12:57.203
They say, "There are no seconds
field," and then you have to decide.

00:12:58.033 --> 00:12:59.633
What do I do in that case?

00:12:59.643 --> 00:13:03.183
Is that just: Oh, well, I just return
the error because my query failed.

00:13:03.423 --> 00:13:06.493
Is that: Oh, when there's no
seconds, I default to zero...

00:13:06.823 --> 00:13:11.153
but then how do I know that zero is more
reasonable than 30 seconds or 59 seconds?

00:13:11.413 --> 00:13:15.443
Maybe as a human, you could go:
whatever, if I just say 11 04, I'll

00:13:15.443 --> 00:13:18.883
assume that it's a rounded number or
zero seconds or something like that.

00:13:18.883 --> 00:13:22.863
But computers don't guess
unless we've told them to guess.

00:13:23.160 --> 00:13:26.220
<v Amos Wenger>Or, if the format
you're using is based on a language

00:13:26.240 --> 00:13:30.525
that insists that zero values are
always meaningful and good, and

00:13:30.555 --> 00:13:32.545
every field is actually optional.

00:13:32.816 --> 00:13:33.476
If you know, you know.

00:13:34.241 --> 00:13:36.321
<v James Munns>Now we have to
admit that queries can fail.

00:13:36.731 --> 00:13:40.061
It's not just the deserialization
failed because we received a well

00:13:40.071 --> 00:13:44.829
formed message, but our information
that we wanted to pull from it failed.

00:13:45.129 --> 00:13:48.079
So it was a good message,
but not what we needed.

00:13:48.229 --> 00:13:49.606
You mentioned this about types before.

00:13:49.606 --> 00:13:53.066
What if I send the word "eleven" in
quotes, the word "four" in quotes

00:13:53.066 --> 00:13:57.276
and the word "twelve" in quotes, and
I still suffix them with H M and S.

00:13:57.466 --> 00:13:59.956
This is a reasonable
way of formatting this.

00:13:59.956 --> 00:14:04.146
And if whatever wire format
you're using has the ability to have both

00:14:04.166 --> 00:14:06.716
numbers and strings in the messages...

00:14:07.013 --> 00:14:08.013
what's wrong with this?

00:14:08.013 --> 00:14:10.803
I said I was going to send hours,
minutes, and seconds with a suffix.

00:14:10.803 --> 00:14:14.257
We maybe never agreed that
numbers were the only way

00:14:14.257 --> 00:14:15.397
that we were going to do this.

00:14:16.257 --> 00:14:17.947
<v Amos Wenger>Just the large air
quotes while you were saying, "It's

00:14:17.947 --> 00:14:19.917
a 'reasonable' way to format this."

00:14:20.347 --> 00:14:20.587
<v James Munns>Yeah.

00:14:20.587 --> 00:14:22.547
I'm making a bit of a
straw man here, but...

00:14:22.547 --> 00:14:26.797
<v Amos Wenger>Yes, well, it's so wasteful
to just actually use, well, JSON...

00:14:27.067 --> 00:14:30.137
uh, actually is double quotes, but
also the hilarious part is that

00:14:30.137 --> 00:14:33.307
I believe on your slides, they're
not even actually double quotes.

00:14:33.307 --> 00:14:34.487
They're smart quotes.

00:14:34.987 --> 00:14:36.897
So that makes it even less reasonable.

00:14:37.022 --> 00:14:37.822
<v James Munns>Typing on a Mac.

00:14:38.037 --> 00:14:38.647
<v Amos Wenger>Yeah.

00:14:39.262 --> 00:14:42.122
<v James Munns>But now we have to admit
that types are part of our queries too.

00:14:42.122 --> 00:14:44.892
We're not just saying the H, the
M, and the S are important to us.

00:14:44.892 --> 00:14:48.582
We are saying we are specifically
querying: I want an integer with

00:14:48.582 --> 00:14:52.000
the H suffix, I want an integer
with the M suffix, and I want

00:14:52.000 --> 00:14:53.310
an integer with the S suffix.

00:14:53.320 --> 00:14:58.183
So now not only are our queries based on
the key or the specific tag that we're

00:14:58.183 --> 00:15:00.523
using, but also the type of the message.

00:15:00.543 --> 00:15:03.973
If we're using a format that has
more than one type, not just numbers.

00:15:04.217 --> 00:15:07.884
And this is a tricky thing to realize
when we're writing our client code

00:15:07.904 --> 00:15:12.854
is that messages can be well formed,
parsable, and queryable, and still

00:15:12.854 --> 00:15:14.604
be insufficient for our needs.

00:15:14.924 --> 00:15:19.124
And this means that we need to handle
errors at every single one of those steps.

00:15:19.604 --> 00:15:20.954
Did we receive a message?

00:15:21.194 --> 00:15:23.882
Did we receive a message
that can be parsed in this

00:15:23.891 --> 00:15:25.401
format that we've decided on?

00:15:25.841 --> 00:15:30.135
Is it a message that has
fields that we expect in there?

00:15:30.165 --> 00:15:32.785
Are the fields of a form
that we expect them in?

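NOTE
One way to picture those checks as code: a sketch assuming a made-up wire
format where a value is either a bare number or a double-quoted word without
spaces, each followed by a one-character tag. Value, QueryError, decode and
get_int are hypothetical names.
```rust
use std::collections::HashMap;
/// A decoded value: this hypothetical format carries integers and quoted strings.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(u32),
    Text(String),
}
/// The separate ways a reply can be unusable.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    Malformed,
    Missing(char),
    WrongType(char),
}
/// Decode tokens such as `11h` or `"eleven"h` into a tag-to-value map.
pub fn decode(msg: &str) -> Result<HashMap<char, Value>, QueryError> {
    let mut fields = HashMap::new();
    for token in msg.split_whitespace() {
        let tag = token.chars().last().ok_or(QueryError::Malformed)?;
        let body = &token[..token.len() - tag.len_utf8()];
        let value = if body.len() >= 2 && body.starts_with('"') && body.ends_with('"') {
            Value::Text(body[1..body.len() - 1].to_string())
        } else {
            Value::Int(body.parse::<u32>().map_err(|_| QueryError::Malformed)?)
        };
        fields.insert(tag, value);
    }
    Ok(fields)
}
/// Ask for an integer under `tag`; the query names both the key and the type.
pub fn get_int(fields: &HashMap<char, Value>, tag: char) -> Result<u32, QueryError> {
    match fields.get(&tag) {
        Some(Value::Int(n)) => Ok(*n),
        Some(Value::Text(_)) => Err(QueryError::WrongType(tag)),
        None => Err(QueryError::Missing(tag)),
    }
}
```
In this sketch a reply of 11h 04m decodes fine but asking for 's' gives
Missing('s'), and a reply with "eleven"h decodes fine but asking for 'h' as an
integer gives WrongType('h'). Each failure is a separate case to handle.
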
00:15:33.145 --> 00:15:35.925
And we just said: well, we'll make this
self describing, you know, the overhead

00:15:35.925 --> 00:15:40.455
is just the three characters that we're
adding as tags, or sometimes we have

00:15:40.455 --> 00:15:42.465
to add bonus data and things like that.

00:15:42.465 --> 00:15:47.385
But when you switch to a format that
has this level of flexibility, you have

00:15:47.385 --> 00:15:52.286
to figure out what your response is when
you hit all of those flexibility cases.

00:15:52.626 --> 00:15:55.545
And this is one of those things
that I think people don't always

00:15:55.545 --> 00:15:57.235
realize the full extent of the cost.

00:15:57.235 --> 00:16:01.809
I think Rust makes it a little bit
easier because when you access something

00:16:02.029 --> 00:16:06.159
like a JSON message, you have to call
the get APIs on it and it might return

00:16:06.159 --> 00:16:08.169
None or the null representation.

00:16:08.169 --> 00:16:11.529
So in Rust, you're probably having a match
statement that's exhaustive and you're

00:16:11.529 --> 00:16:14.629
going to handle this, or you're going
to have to unwrap an option or whatever.

00:16:14.629 --> 00:16:17.379
So Rust, I think at least
surfaces this concern.

00:16:18.539 --> 00:16:24.498
But if you just slap a question mark
after everything, then: oops, no seconds?

00:16:24.508 --> 00:16:25.268
We're done here.

00:16:25.358 --> 00:16:30.798
And one second out of the 60 in every
minute, we just fail to retrieve the time.

00:16:31.753 --> 00:16:33.113
Maybe your program's cool with that.

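NOTE
A small sketch of the two policies, assuming the reply has already been
decoded into a map from tag to number. time_strict and time_default_seconds
are hypothetical names, and reading missing seconds as zero is a guess made
by the receiver, not something the sender promised.
```rust
use std::collections::HashMap;
/// Strict policy: `?` turns any missing field into "no time at all".
pub fn time_strict(fields: &HashMap<char, u32>) -> Option<(u32, u32, u32)> {
    Some((*fields.get(&'h')?, *fields.get(&'m')?, *fields.get(&'s')?))
}
/// Lenient policy: hours and minutes are still required,
/// but missing seconds are read as zero.
pub fn time_default_seconds(fields: &HashMap<char, u32>) -> Option<(u32, u32, u32)> {
    let h = *fields.get(&'h')?;
    let m = *fields.get(&'m')?;
    let s = fields.get(&'s').copied().unwrap_or(0);
    Some((h, m, s))
}
```
For a reply holding only h = 11 and m = 4, time_strict gives None while
time_default_seconds gives Some((11, 4, 0)). Neither is wrong on its own;
the receiver has to pick one on purpose.
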
00:16:33.983 --> 00:16:35.993
<v Amos Wenger>I guess this is the
whole point of your presentation

00:16:36.003 --> 00:16:40.033
is that on the one hand, you can
now have partial understanding.

00:16:40.043 --> 00:16:43.073
You can, like, have some fields you don't
know how to decode, but at least

00:16:43.073 --> 00:16:44.503
you got the timestamp or something.

00:16:45.033 --> 00:16:47.823
But on the other hand, now you
have an explosion of combinations

00:16:47.823 --> 00:16:49.013
of cases to deal with.

00:16:49.483 --> 00:16:52.983
And it's true that there's- I see
the parallel between languages like

00:16:52.983 --> 00:16:56.923
Rust that force you to deal with
like error, no error, none or some.

00:16:57.413 --> 00:17:02.185
As opposed to languages that will
somewhat work up to a point, but just

00:17:02.185 --> 00:17:05.995
like propagate null or NaN or whatever,
well, NaN is a problem anywhere.

00:17:06.435 --> 00:17:08.115
But that partial functionality thing...

00:17:08.125 --> 00:17:12.615
I remember people being upset coming
from PHP to a compiled language because

00:17:12.625 --> 00:17:16.025
they were like, "Well, but I liked it
when half my website was broken because

00:17:16.025 --> 00:17:17.685
at least the other half still worked!

00:17:17.895 --> 00:17:20.655
Now everything's either broken
or everything compiles..."

00:17:20.785 --> 00:17:22.660
which is a really different
way to approach things.

00:17:22.868 --> 00:17:23.088
<v James Munns>Yeah.

00:17:23.088 --> 00:17:26.858
I have an extremely strongly
typed programming language brain

00:17:26.928 --> 00:17:28.058
when it comes to these things.

00:17:28.068 --> 00:17:31.351
So I think you can definitely pick up
on my biases when I'm talking about- I

00:17:31.361 --> 00:17:34.091
mean, I'm kind of picking on JSON here
because all of these things that I'm

00:17:34.101 --> 00:17:37.521
mentioning are really just like a tiny
version of the kind of problems you

00:17:37.521 --> 00:17:39.121
can run into with JSON and things like that.

00:17:39.121 --> 00:17:39.231
But...

00:17:39.231 --> 00:17:42.244
JSON is by far not the only offender here.

00:17:42.244 --> 00:17:45.804
And as you mentioned, different languages
and different protocols run into this.

00:17:45.804 --> 00:17:50.334
Like "the zero value of a field means
something" is directly picking on

00:17:50.483 --> 00:17:54.043
ProtoBuf, which is another one that
I'm going to throw some stones at.

00:17:54.053 --> 00:17:56.463
But the point is, it's
not necessarily bad.

00:17:56.503 --> 00:17:57.583
Like, don't get me wrong.

00:17:57.803 --> 00:18:00.383
It is not a purely
good, bad decision here.

00:18:00.403 --> 00:18:04.385
It is just: you should know what
costs you're signing up for when you

00:18:04.385 --> 00:18:07.615
say, "Ah, I will choose this because
it gives me this flexibility," of

00:18:07.855 --> 00:18:10.115
what that flexibility really costs.

00:18:10.740 --> 00:18:13.500
And the thing is that when you switch
to these self describing formats,

00:18:13.500 --> 00:18:17.220
because they have all this flexibility,
essentially every single self

00:18:17.220 --> 00:18:20.120
describing format is a key value store.

00:18:20.457 --> 00:18:24.757
At least the common ones that I'm aware of
today, essentially they all boil down to

00:18:24.757 --> 00:18:30.167
a hash map or dictionary or key value
store sort of interface, in that you're not

00:18:30.303 --> 00:18:34.863
getting information, you're getting a mini
database that you have to query and deal

00:18:34.863 --> 00:18:37.023
with what happens if your queries fail.

00:18:37.023 --> 00:18:38.953
Even binary formats like ProtoBuf.

00:18:39.418 --> 00:18:40.728
It's still a key value store.

00:18:40.728 --> 00:18:44.618
The keys are integers instead of
strings, but it is still a key value

00:18:44.618 --> 00:18:48.398
store, which is how you're allowed
to have bonus fields and fields that

00:18:48.398 --> 00:18:50.098
you move around and things like that.

00:18:50.214 --> 00:18:53.474
<v Amos Wenger>I think the main exception
here would be columnar formats?

00:18:53.474 --> 00:18:57.560
Like Arrow, Parquet, and there's a new
one I keep forgetting the name of...

00:18:58.150 --> 00:19:02.143
where they actually, they don't describe
each record, they describe the entire set.

00:19:02.143 --> 00:19:04.473
And then there's like: here's
all the timestamps for all the

00:19:04.473 --> 00:19:07.763
records, and then here's all the
names and all the descriptions.

00:19:07.907 --> 00:19:08.357
<v James Munns>That's fair...

00:19:08.837 --> 00:19:10.954
<v Amos Wenger>Those might be different,
but they also have different use

00:19:10.954 --> 00:19:14.624
cases and I think neither you
nor I have use cases for them.

00:19:14.784 --> 00:19:17.474
So, yes, in our little corner
of the universe, I agree.

00:19:17.514 --> 00:19:20.474
<v James Munns>Yeah, my big data
is a couple megabytes instead

00:19:20.484 --> 00:19:21.928
of, you know, Apache Arrow.

00:19:22.769 --> 00:19:25.959
Like I said, this is not just
throwing stones at JSON or ProtoBuf.

00:19:25.959 --> 00:19:29.289
I mean, it's JSON, TOML,
YAML, ProtoBuf, CBOR, ASN.1.

00:19:29.299 --> 00:19:32.319
Like, doesn't matter whether it's
a binary format or an ASCII format

00:19:32.319 --> 00:19:35.549
or a UTF-8 format or if it's
human readable or if it's not.

00:19:35.859 --> 00:19:39.859
If it is a self describing format,
you've chosen that for the flexibility

00:19:39.869 --> 00:19:43.049
and that flexibility comes with
all of these costs, regardless of how fast

00:19:43.049 --> 00:19:45.199
it is to serialize or deserialize.

00:19:45.209 --> 00:19:49.498
You are serializing a view of
an object basically and when you

00:19:49.508 --> 00:19:52.940
deserialize that you have to deal
with the externalities of that.

00:19:53.480 --> 00:19:55.860
<v Amos Wenger>Can't help but notice
you misspelled "MessagePack"

00:19:55.860 --> 00:19:58.300
in there, you wrote it "CBOR,"
I'm sure that's a mistake.

00:19:58.920 --> 00:19:59.670
<v James Munns>MessagePack...

00:19:59.880 --> 00:20:02.910
I have to remember, I'm not sure
if MessagePack is self describing.

00:20:02.910 --> 00:20:07.660
CBOR is like the binary version of JSON
and it has all that same flexibility that

00:20:07.670 --> 00:20:09.630
JSON has, just in a much smaller form.

00:20:09.766 --> 00:20:12.476
<v Amos Wenger>I think the main
difference is that CBOR has

00:20:12.486 --> 00:20:15.466
like an actual RFC or something?

00:20:15.476 --> 00:20:17.756
But it's otherwise very,
very close to MessagePack.

00:20:18.001 --> 00:20:18.421
<v James Munns>Gotcha.

00:20:18.741 --> 00:20:21.001
Yeah, I know Cap'n Proto and MessagePack.

00:20:21.001 --> 00:20:22.551
I always forget the details between them.

00:20:22.551 --> 00:20:24.808
I think Cap'n Proto is not
self describing, but...

00:20:24.881 --> 00:20:26.371
I included the ones that I knew were.

00:20:26.696 --> 00:20:27.858
<v Amos Wenger>I forget what it does.

00:20:27.921 --> 00:20:30.361
All I remember for Cap'n Proto is
that they have a picture of like

00:20:30.541 --> 00:20:34.371
the captain on a cereal box and
then they're like, "Infinite speed!

00:20:34.381 --> 00:20:36.721
Because you don't have
to decode anything..."

00:20:36.891 --> 00:20:38.711
and I'm like, "That's not really...

00:20:38.761 --> 00:20:39.281
but sure..."

00:20:39.711 --> 00:20:41.321
<v James Munns>It's the same
trick that rkyv pulls.

00:20:41.371 --> 00:20:41.851
<v Amos Wenger>Yeah yeah.

00:20:42.069 --> 00:20:44.709
And abomonation and a bunch
of other zero copy things.

00:20:44.769 --> 00:20:46.899
<v James Munns>Yeah,
abomonation's extra special.

00:20:46.899 --> 00:20:50.299
That's where the whole extra thing, but-
we're sticking with self describing

00:20:50.299 --> 00:20:51.789
formats when I'm throwing stones.

00:20:51.809 --> 00:20:54.149
So we're talking about these
formats: it's all of them.

00:20:54.339 --> 00:20:56.619
And for better or worse, this
is what we like about them.

00:20:56.619 --> 00:20:59.599
We want to have config files where we
can leave out the ones that are default.

00:20:59.599 --> 00:21:03.039
And we just admit that we have
to define what default values

00:21:03.039 --> 00:21:04.239
of all of these are and...

00:21:04.402 --> 00:21:07.162
we have to admit that there's certain
fields that we care about and certain

00:21:07.162 --> 00:21:10.972
fields we don't, and we just deal with
that because it makes the user experience

00:21:10.972 --> 00:21:15.835
of one frame, whether that's a config file
on the disk or a message we receive back

00:21:15.835 --> 00:21:19.115
from GitHub's API, what we care about.

00:21:19.165 --> 00:21:21.425
For better or worse, they all
allow failable queries and

00:21:21.425 --> 00:21:24.405
bonus data and missing data and
things like that which means

00:21:24.705 --> 00:21:26.475
Did this switch actually help us?

00:21:27.175 --> 00:21:29.945
And the answer I would
probably say is maybe.

00:21:29.995 --> 00:21:34.016
It helped us if we realized
what we were signing up for.

00:21:34.556 --> 00:21:38.660
But if we just switched our code over
from saying, "Grab integer, grab

00:21:38.660 --> 00:21:45.337
integer, grab integer," to, "Parse
message, grab H M S," are we better off?

00:21:45.677 --> 00:21:48.967
Well, we could handle some cases
better, like with bonus data, but

00:21:48.967 --> 00:21:52.047
if something's missing or the wrong
type, we're back at square one.

00:21:52.047 --> 00:21:55.277
We just go: I have no idea
what this message means.

00:21:55.437 --> 00:21:57.674
I failed to obtain this timestamp.

00:21:58.058 --> 00:21:59.508
And there's going to be a lot of
people who are going to be like,

00:21:59.508 --> 00:22:01.798
"Well, you could just-" and I'm sure
the Go people would say, "Well, you

00:22:01.798 --> 00:22:03.838
just trust zero as a magic value.

00:22:03.838 --> 00:22:05.098
And you would just do this."

00:22:05.231 --> 00:22:08.271
It just gets into "No
true Scotsman" of: yes.

00:22:08.361 --> 00:22:12.646
If you were just git gud enough
and you handled all these edge

00:22:12.646 --> 00:22:14.486
cases, you would be well sorted.

00:22:14.496 --> 00:22:17.066
And I think that's a
reasonable position to take.

00:22:17.516 --> 00:22:22.648
But I think that most people don't
necessarily take all of these things.

00:22:22.658 --> 00:22:26.218
If people were already such pros that
they have the discipline to keep messages

00:22:26.498 --> 00:22:30.608
semver compatible or whatever you want
to call it, like: we only make non

00:22:30.608 --> 00:22:32.468
breaking forwards compatible changes.

00:22:32.648 --> 00:22:34.748
If you already have that
level of discipline,

00:22:34.963 --> 00:22:39.253
then couldn't we just admit that we have
more than one kind of message and just

00:22:39.253 --> 00:22:42.703
admit that this isn't the same message as
the one that we were sending before and

00:22:43.233 --> 00:22:48.423
put that on a different API or a different
version ID of this or something like that.

00:22:48.423 --> 00:22:51.373
Where we just say, "There
are two different messages,"

00:22:51.532 --> 00:22:53.872
instead of just saying, "Well,
there's only one kind: JSON."

00:22:54.812 --> 00:22:59.592
But then the semantical shape of the
message that I'm sending becomes load

00:22:59.602 --> 00:23:04.022
bearing versus just: oh, I'm sending
you JSON versus something else.

00:23:04.642 --> 00:23:07.832
<v Amos Wenger>I see what you're doing
here: you're just saying all that because

00:23:07.832 --> 00:23:09.692
Postcard is not self describing.

00:23:09.742 --> 00:23:11.332
I see what you're selling.

00:23:11.614 --> 00:23:12.524
<v James Munns>It's
certainly what I'm doing.

00:23:12.574 --> 00:23:13.254
Yeah, yeah.

00:23:13.693 --> 00:23:17.003
And I don't want to get it wrong:
self describing formats have benefits,

00:23:17.206 --> 00:23:21.356
but it's important for me to realize
the costs are a bit more nuanced.

00:23:21.806 --> 00:23:25.146
And yeah, this does exactly come from
Postcard because Postcard's non self

00:23:25.166 --> 00:23:30.185
describing and furthermore, it's very easy
if you were to send a message of one kind

00:23:30.185 --> 00:23:34.022
and to receive it in another way, there's
a good chance that it would succeed in

00:23:34.022 --> 00:23:36.682
deserialization, but it would be wrong.

00:23:36.902 --> 00:23:40.856
Like, if you swap two fields, and I
was just going integer integer, and

00:23:40.856 --> 00:23:43.566
you read those and those are both
integers, I might say: cool, the

00:23:43.576 --> 00:23:47.606
minutes are 57, and the seconds are 4,
when it's really the other way around.

00:23:48.066 --> 00:23:50.906
And this post is exactly me trying
to work that out, because at

00:23:50.906 --> 00:23:52.046
some point I went: you know what?

00:23:52.046 --> 00:23:54.746
Maybe as a non self describing
format, that isn't what I want.

00:23:54.806 --> 00:23:57.725
Maybe I've tried to push
performance too far, and the

00:23:57.725 --> 00:23:59.645
user experience is just not good.

00:23:59.875 --> 00:24:02.444
And I was actually trying to
figure out how I could make a self

00:24:02.444 --> 00:24:06.504
describing format version of Postcard
and what that would get you, and...

00:24:06.644 --> 00:24:10.504
It was then that I realized that
the amount of changes you'd have

00:24:10.504 --> 00:24:14.848
to write in your code to handle
all of these edge cases was a lot.

00:24:15.008 --> 00:24:17.598
Where instead of just saying:
did I get the message or not?

00:24:17.608 --> 00:24:18.778
Did it decode or not?

00:24:18.835 --> 00:24:23.455
I have to start handling failable queries
and my accessors get much larger in

00:24:23.455 --> 00:24:27.390
terms of code and cost and things like
that, because I'm not running linearly

00:24:27.390 --> 00:24:28.790
through the encoding and decoding.

00:24:28.790 --> 00:24:31.828
I am storing it in some kind
of collection and querying it.

00:24:32.189 --> 00:24:34.209
A lot of this ended up to...

00:24:34.308 --> 00:24:37.418
I don't know if it's an invention or just
something that I didn't know about before,

00:24:37.418 --> 00:24:39.448
but actually sort of a middle ground.

00:24:39.458 --> 00:24:43.548
So not a non self describing
format or a self describing format.

00:24:43.973 --> 00:24:46.973
But what I've sort of been
calling a self identifying format.

00:24:47.273 --> 00:24:52.573
So you don't necessarily have all of
those field items in there, or you don't

00:24:52.573 --> 00:24:57.223
have the flexibility to skip fields
or remove fields or things like that.

00:24:57.603 --> 00:25:02.153
But instead, you at least have a unique
tag on a message so that our two senders

00:25:02.183 --> 00:25:05.723
can cross check with each other of
going, "Are you sending me an hour

00:25:05.733 --> 00:25:07.653
hour, minute minute, second second?"

00:25:07.763 --> 00:25:12.213
And you can say, "Yes, I am looking for an
hour hour, minute minute, second second."

00:25:12.433 --> 00:25:17.397
And the interesting part that I'm doing is
instead making the self describing schema

00:25:17.397 --> 00:25:19.647
part of these things a side channel.

00:25:20.097 --> 00:25:23.957
Where if you and I know that we agree with
each other, we just put that unique tag in

00:25:23.957 --> 00:25:25.907
every message so that we can cross check.

00:25:25.937 --> 00:25:28.247
But if all of a sudden I send
you something that you don't

00:25:28.247 --> 00:25:32.312
understand, you can go, "Wait a
minute, you just sent me tag FAB.

00:25:32.442 --> 00:25:33.732
What is the schema for FAB?

00:25:33.932 --> 00:25:35.282
I don't know what you're talking about."

00:25:35.472 --> 00:25:38.452
And maybe you could have some slower
failure path where you do fall back

00:25:38.452 --> 00:25:43.002
to a more self describing format where
you use the schema and the blob of

00:25:43.002 --> 00:25:46.472
message and get back something that
looks like a serde-json Value that

00:25:46.472 --> 00:25:49.069
you can query and maybe recover from.

00:25:49.079 --> 00:25:52.959
Or you just at least know up front: I'm
not going to misunderstand this message.

00:25:53.139 --> 00:25:56.899
I will just immediately reject it and
know that it's not what I'm looking for.

00:25:57.259 --> 00:25:59.769
That's sort of the whole research
arc that I went through and when

00:25:59.769 --> 00:26:03.809
I wrote this post a year ago, I
didn't know where I would end up.

00:26:03.839 --> 00:26:06.899
I went back and looked at it and
it's very funny to look at all these

00:26:06.948 --> 00:26:10.133
conclusions that I drew and went,
"Well, I don't know what you do in

00:26:10.133 --> 00:26:14.813
that case," and realizing sort of the:
what checks the boxes for what problems

00:26:14.833 --> 00:26:18.433
people with Postcard actually have,
which is detecting when things change,

00:26:18.603 --> 00:26:22.713
or realizing whether you need to use
a different format or at compile time,

00:26:22.713 --> 00:26:26.723
even having a CI check that says, "Did
I accidentally change the message?

00:26:26.963 --> 00:26:29.443
Is this going to break all
of my devices in the field?"

00:26:29.543 --> 00:26:31.698
<v Amos Wenger>A la
cargo-semver-checks, yep.

00:26:32.118 --> 00:26:33.048
<v James Munns>Yeah, exactly.

00:26:33.398 --> 00:26:35.574
And that was sort of the
middle ground that I reached.

00:26:35.711 --> 00:26:39.851
I don't know if I want to pay for
self describing all of the time, but

00:26:39.851 --> 00:26:43.971
I want people to have the option to
detect it, because they should be,

00:26:44.391 --> 00:26:47.211
and then maybe a slow failure path...

00:26:47.244 --> 00:26:50.254
because you might have sort of
asymmetrical systems where our tiny

00:26:50.254 --> 00:26:53.064
embedded device, we don't want to
burden it with the ability to send

00:26:53.064 --> 00:26:56.914
15 different kinds of messages, but
our desktop server doesn't care if

00:26:56.914 --> 00:26:59.794
it's got 57 flavors of decoding.

00:26:59.994 --> 00:27:02.659
That's trivial for the application
that's running on your desktop.

00:27:02.897 --> 00:27:07.547
That allows your microcontroller to fly
while still your desktop is flying, but

00:27:07.547 --> 00:27:11.186
that's just because it's a thousand times
bigger and faster and you can afford the

00:27:11.186 --> 00:27:13.246
checks in one place, but not the other.

00:27:13.821 --> 00:27:15.821
<v Amos Wenger>So your conclusion is
that you don't have a conclusion yet.

00:27:15.831 --> 00:27:17.681
It's an open research problem.

00:27:18.031 --> 00:27:19.321
And this is mostly...

00:27:19.361 --> 00:27:21.051
you've been thinking
about all these things.

00:27:21.101 --> 00:27:21.951
Am I wrong?

00:27:22.396 --> 00:27:24.086
<v James Munns>It's, it's, it's, it's...

00:27:25.206 --> 00:27:27.026
I have something that works.

00:27:27.163 --> 00:27:27.613
<v Amos Wenger>Oh, you do?

00:27:28.151 --> 00:27:29.831
<v James Munns>So I've talked
about Postcard-RPC in the past.

00:27:29.831 --> 00:27:33.321
This is the trick that Postcard-RPC pulls:
to generate those unique tags for each

00:27:33.331 --> 00:27:38.601
kind of message I hash the schema and
use that as a small tag in every message

00:27:38.991 --> 00:27:42.911
and I have the ability to serialize
my schemas and so the system that I'm

00:27:42.911 --> 00:27:46.451
building now on top of Postcard-RPC
allows you to say, "Please give me all

00:27:46.451 --> 00:27:52.176
of your schemas," And I can get the full,
like, OpenAPI description type thing from

00:27:52.176 --> 00:27:58.026
the device and that allows my server to
handle messages it doesn't understand.
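
NOTE
Editor's sketch of the "hash the schema and use it as a tag" idea above. The schema text, the choice of SHA-256, and the 8-byte tag length are all assumptions made for illustration; the episode does not say which hash or tag size Postcard-RPC uses.
```python
import hashlib
import struct
from typing import Optional
# Assumed textual schema and hash choice; the source does not specify either.
HMS_SCHEMA = "struct Time { hours: u8, minutes: u8, seconds: u8 }"
def schema_tag(schema: str) -> bytes:
    """Derive a short tag from a schema (SHA-256 truncated to 8 bytes here)."""
    return hashlib.sha256(schema.encode("utf-8")).digest()[:8]
def send(schema: str, payload: bytes) -> bytes:
    return schema_tag(schema) + payload
def receive(expected_schema: str, frame: bytes) -> Optional[bytes]:
    """Return the payload only if the tag matches the schema we expect."""
    tag, payload = frame[:8], frame[8:]
    return payload if tag == schema_tag(expected_schema) else None
frame = send(HMS_SCHEMA, struct.pack("BBB", 12, 4, 57))
swapped = "struct Time { hours: u8, seconds: u8, minutes: u8 }"
```
A receiver expecting HMS_SCHEMA gets the three payload bytes back, while a receiver built against the swapped schema gets None and can reject the frame up front instead of misreading it.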

00:27:58.766 --> 00:28:00.356
You're still limited in what
you can do with that.

00:28:00.356 --> 00:28:03.996
You're either querying specific things
or you just store it, or you forward it.

00:28:04.026 --> 00:28:08.667
Where I still can't tell the difference
of the key "temp" versus "temperature"

00:28:08.937 --> 00:28:11.567
but if I'm just sticking the message
in the database to pass it on to

00:28:11.567 --> 00:28:15.457
someone else later down the road that's
probably fine because they'll know.

00:28:15.902 --> 00:28:18.602
As an intermediary, I don't
necessarily have to understand

00:28:18.602 --> 00:28:20.184
everything that transits through me.

00:28:20.202 --> 00:28:21.582
So I have something that works.

00:28:21.622 --> 00:28:21.862
<v Amos Wenger>Right.

00:28:21.862 --> 00:28:24.782
But you could do that even if you
didn't have the schema, right?

00:28:24.782 --> 00:28:27.772
You could just like pass on the
bytes unchanged, but what the

00:28:27.772 --> 00:28:31.513
schema would let you do is, for
example, log it in a structured way.

00:28:31.637 --> 00:28:32.527
We got that message.

00:28:32.527 --> 00:28:34.557
We don't know what it's about,
but we know it has these fields.

00:28:34.577 --> 00:28:35.247
This one is text.

00:28:35.247 --> 00:28:38.177
So maybe if a human looks at it,
they can tell what's going on.

00:28:38.207 --> 00:28:40.907
<v James Munns>Yeah, either dumping
it to logs is great because you

00:28:40.907 --> 00:28:43.827
don't have to understand things
just to convert them to strings.

00:28:44.107 --> 00:28:46.737
You can validate that
messages are still good.

00:28:46.867 --> 00:28:49.247
Like: Oh, is this a poorly formed message?

00:28:49.277 --> 00:28:53.107
I'm not going to proxy this message
because I know it's poorly formed and

00:28:53.107 --> 00:28:56.327
I'm not going to waste the embedded
device's time with a bad message.

00:28:56.397 --> 00:29:01.660
Or you can transpile or transcode the
message to JSON because if I have the

00:29:01.660 --> 00:29:06.010
schema I can actually convert this
non self describing message into a

00:29:06.010 --> 00:29:09.500
self describing message because if
someone's getting this message from

00:29:09.500 --> 00:29:13.000
Python, maybe they would like JSON
more than they would like Postcard.
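
NOTE
Editor's sketch of schema-driven validation and transcoding to JSON, as described above. The schema shape (ordered pairs of field name and struct format code) is hypothetical and much simpler than a real Postcard schema; it only shows that positional bytes plus a schema are enough to produce a self describing message.
```python
import json
import struct
# Hypothetical schema: ordered (field name, struct format code) pairs.
SCHEMA = [("hours", "B"), ("minutes", "B"), ("seconds", "B")]
def to_json(schema: list, payload: bytes) -> str:
    """Use the schema to turn positional bytes into a self-describing JSON object."""
    fmt = "<" + "".join(code for _, code in schema)
    if len(payload) != struct.calcsize(fmt):
        raise ValueError("payload does not match schema")  # poorly formed message
    values = struct.unpack(fmt, payload)
    return json.dumps({name: value for (name, _), value in zip(schema, values)})
text = to_json(SCHEMA, bytes([12, 4, 57]))
```
The length check is the proxy's chance to drop a poorly formed message before it ever reaches the embedded device.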

00:29:13.360 --> 00:29:15.060
And so those are all
sort of the capabilities.

00:29:15.320 --> 00:29:17.090
I need to do the follow through still.

00:29:17.250 --> 00:29:19.990
So I have all of this and it
does work and it's very neat

00:29:20.000 --> 00:29:21.380
and I'm very excited about it.

00:29:21.660 --> 00:29:26.380
We'll see if it ends up having edge cases
that I just haven't run into yet, but I

00:29:26.390 --> 00:29:28.560
think it's an interesting middle ground.

00:29:28.773 --> 00:29:32.073
Not all the way in one camp and not in
the other, but still checking the boxes

00:29:32.073 --> 00:29:36.552
we want to check while also not signing
us up for every message can be queryable

00:29:36.581 --> 00:29:38.472
because the message is still the message.

00:29:38.682 --> 00:29:41.502
You just know whether you have a
good one or a bad one at this point.

00:29:41.807 --> 00:29:44.889
<v Amos Wenger>I think it's
just another case or a similar

00:29:45.669 --> 00:29:48.257
concept to denormalization.

00:29:48.967 --> 00:29:49.847
No, normalization.

00:29:50.083 --> 00:29:51.373
I forget which direction goes where.

00:29:51.473 --> 00:29:56.747
So the basic idea is that: the
thing I have in mind is JSON API,

00:29:56.748 --> 00:30:01.957
because a bunch of APIs will return,
I don't know, a list of articles,

00:30:02.128 --> 00:30:03.417
and then there's your user field.

00:30:03.603 --> 00:30:05.523
And then they're all
articles from the same user.

00:30:05.573 --> 00:30:09.673
So they just duplicate the information
about that user for every article.

00:30:09.763 --> 00:30:12.093
They have 10 articles, they have
10 copies of the user object.

00:30:12.103 --> 00:30:16.633
And then what JSON API does is it
says: okay, that field is of type user.

00:30:16.913 --> 00:30:18.682
So actually, we're
going to give you an ID.

00:30:18.692 --> 00:30:20.383
It doesn't need to be
globally unique or anything.

00:30:20.393 --> 00:30:22.583
It just needs to be unique to
the document they're sending you.

00:30:23.063 --> 00:30:25.323
So you're going to have all the articles
and then separately, you're going to

00:30:25.333 --> 00:30:27.773
have an array, or I guess a map of users.

00:30:27.958 --> 00:30:30.058
from that ID to the actual user data.

00:30:30.448 --> 00:30:33.958
And then it's very, very annoying
to deserialize that or decode or

00:30:33.958 --> 00:30:37.468
like destructure it, especially in
Rust, which is very strongly typed,

00:30:37.508 --> 00:30:38.848
but in other languages, it's fine.

00:30:38.848 --> 00:30:40.168
You just cast everything to an object.

00:30:41.168 --> 00:30:42.378
That's, uh, that's how it works.

00:30:42.917 --> 00:30:46.137
That's, that's the thing
that the Patreon API...

00:30:46.618 --> 00:30:48.437
RIP, it's not, not been
maintained for a while.

00:30:48.747 --> 00:30:49.507
That's what they used.

00:30:49.508 --> 00:30:50.718
It makes me think of the same thing.

00:30:50.718 --> 00:30:56.378
I have the parallel in my head because in
JSON, yes, every object in an array where

00:30:56.378 --> 00:30:59.818
every object is the same type, they all
describe themselves exactly the same way.

00:31:00.138 --> 00:31:03.978
And it only is useful if, I guess, some
fields are missing in some of them.

00:31:04.398 --> 00:31:07.118
But I would much rather have one
description at the beginning of the

00:31:07.118 --> 00:31:09.518
array, and then just all the data at once.

00:31:09.618 --> 00:31:10.998
Which is not even what you're doing.

00:31:10.998 --> 00:31:12.524
You're doing yet a third option.

00:31:12.565 --> 00:31:14.753
<v James Munns>Yeah, but I think
there's a lot of value to having

00:31:14.763 --> 00:31:18.763
these schemas because you can start
doing transforms and things like that.

00:31:18.763 --> 00:31:21.553
One of the things that I also in
the future want to research is the

00:31:21.553 --> 00:31:25.209
ability to have automatic migrations
or whatever you want to call it.

00:31:25.209 --> 00:31:28.689
Where if someone sends you a message
of a different format that's missing a

00:31:28.689 --> 00:31:32.459
field, and I know the embedded device
expects that field, and I've got some

00:31:32.459 --> 00:31:36.689
metadata of going: well, we need to
make sure that we insert this extra

00:31:36.699 --> 00:31:39.089
field, at least with like a none in it.

00:31:39.179 --> 00:31:41.779
So it might just be: we don't
have the data, but I know that I'm

00:31:41.779 --> 00:31:44.349
always allowed to upgrade a message.

00:31:44.349 --> 00:31:47.559
If the only difference between
these schemas is whether this has

00:31:47.559 --> 00:31:48.559
an option field or not.

00:31:48.569 --> 00:31:52.449
And I know that option is defaultable
or nullable or whatever you want

00:31:52.449 --> 00:31:56.169
to call it, and I can just actually
transcode your binary message to a

00:31:56.169 --> 00:32:00.547
different binary message at the cost
of having to decode and reencode it.

00:32:00.927 --> 00:32:05.997
But like I said on my proxy that might not
be a significant cost, but allows me to

00:32:05.997 --> 00:32:10.107
then not have to upgrade all my firmware
devices to speak some new protocol or

00:32:10.107 --> 00:32:11.667
update all of my clients to change.

00:32:11.897 --> 00:32:16.327
You get sort of this interesting abstract
transform that you're able to do.

00:32:16.327 --> 00:32:19.687
But this is a little beside the
self identifying point and just the

00:32:19.687 --> 00:32:24.117
value of having schemas and being
able to have like a reflection.

00:32:24.277 --> 00:32:27.317
So by building all of this, just to
have the schemas so that I could hash

00:32:27.317 --> 00:32:32.552
them, I essentially had to invent
reflection for Postcard messages

00:32:32.712 --> 00:32:34.172
so that you could get the schemas.

00:32:34.232 --> 00:32:38.562
And now that I have those, I've started
figuring out all the interesting things

00:32:38.562 --> 00:32:44.050
you can do with reflection and thinking
of messages as transcodable formats

00:32:44.060 --> 00:32:48.040
instead of just saying: the message
is open ended and can change at any

00:32:48.040 --> 00:32:52.410
time, I have essentially reintroduced
strong types into my wire format.

00:32:52.808 --> 00:32:54.358
<v Amos Wenger>I'm very excited
to actually play with that.

00:32:54.643 --> 00:32:55.423
Because, yeah...

00:32:55.615 --> 00:32:59.755
we've gone from XML: which had some of
these things baked in, it has schemas and

00:32:59.755 --> 00:33:05.235
everything, to JSON: which is anything
goes, fully self describing, to: okay,

00:33:05.235 --> 00:33:09.795
maybe we need schemas for JSON too, but
now they're out of band, to like: oh,

00:33:09.795 --> 00:33:12.845
we need a way to discover API endpoints,
so let's make a standard for that.

00:33:13.250 --> 00:33:15.150
And then that's also
just a bunch of JSONs.

00:33:15.170 --> 00:33:17.540
We also need schema for
those and it's all meta.

00:33:17.550 --> 00:33:18.370
Everything is JSON.

00:33:18.390 --> 00:33:20.890
It's all a big soup and it doesn't feel...

00:33:21.074 --> 00:33:21.564
I don't know.

00:33:21.887 --> 00:33:22.657
It doesn't feel good.

00:33:22.771 --> 00:33:23.241
<v James Munns>For sure.

00:33:23.301 --> 00:33:26.031
It's still an area of active research,
but I'm gonna be launching something

00:33:26.031 --> 00:33:28.991
pretty cool with it soon, so I'm
sure I'll talk about it more then.

00:33:35.258 --> 00:33:37.088
<v Amanda Majorowicz>Hey, it's the
end of the episode, and I'm here

00:33:37.088 --> 00:33:38.468
to tell you a bit about Descript.

00:33:39.038 --> 00:33:41.408
I'm Amanda, the producer
of Self-Directed Research.

00:33:41.558 --> 00:33:44.866
Descript is a tool we've been using to do
most of the production of each episode:

00:33:45.346 --> 00:33:49.396
editing audio and video has been a breeze
simply by editing the transcribed text,

00:33:49.426 --> 00:33:52.456
like in a document, inserting slides
exactly where I want them is super

00:33:52.486 --> 00:33:56.086
easy by dragging and dropping images
or videos and creating templates and

00:33:56.086 --> 00:33:59.536
layouts as well as cutting or combining
compositions makes the production

00:33:59.536 --> 00:34:01.426
of each episode smooth and simple.

00:34:01.532 --> 00:34:03.182
There are many more features to explore.

00:34:03.182 --> 00:34:06.362
So check it out for yourself for free
by clicking the link in our show notes.

00:34:06.482 --> 00:34:09.092
And if you decide to upgrade to
a paid plan, a portion of the

00:34:09.092 --> 00:34:10.878
purchase will support this podcast.

00:34:11.118 --> 00:34:12.288
Thanks for your support.

