WEBVTT

NOTE
This file was generated by Descript <www.descript.com>

00:00:13.425 --> 00:00:15.185
<v Amanda Majorowicz>This
is Self-Directed Research.

00:00:15.445 --> 00:00:18.225
James and Amos, our hosts, get
really excited about different

00:00:18.225 --> 00:00:21.635
topics and each week they take turns
presenting their ideas to each other.

00:00:22.205 --> 00:00:25.745
Like the most recent episodes, you can
check out the website, YouTube, or Spotify

00:00:25.755 --> 00:00:31.685
to see the video presentation in action
and visit sdr-podcast.com/episodes for

00:00:31.685 --> 00:00:35.625
previous episodes, more presentations
and videos, show notes, and transcripts.

00:00:36.025 --> 00:00:37.985
New episodes are
published every Wednesday.

00:00:38.398 --> 00:00:41.878
This week, James presents "Frame
Synchronization: Just simple

00:00:41.878 --> 00:00:43.178
enough to cause problems."

00:00:43.378 --> 00:00:44.938
But first, a quick word from him.

00:00:45.898 --> 00:00:48.528
<v James Munns>The Self-Directed Research
Podcast is looking for sponsors.

00:00:48.568 --> 00:00:51.958
If you would like to promote your
company, project, conference, or open

00:00:51.958 --> 00:00:55.678
job positions, stay tuned to the end
of the episode and send us an email to

00:00:55.678 --> 00:00:59.308
contact@sdr-podcast.com for more info.

00:01:05.555 --> 00:01:09.482
This is one of those things that
like- that valley of it seems simple

00:01:09.482 --> 00:01:13.152
enough that everyone does it and
everyone gets all the little subtle

00:01:13.162 --> 00:01:16.562
things wrong and sometimes don't even
know the right name of terms, so...

00:01:17.312 --> 00:01:19.652
today I'm talking about
frame synchronization.

00:01:20.162 --> 00:01:25.472
Which is a big fancy word that is step
one of make computers talk to computers.

00:01:25.982 --> 00:01:30.171
And it actually is a fairly
straightforward thing, but the

00:01:30.171 --> 00:01:31.771
problem really boils down to:

00:01:32.241 --> 00:01:36.851
How do we decide where one message
ends and the next message starts?

00:01:37.631 --> 00:01:41.831
Which if you live in a world of TCP
or UDP sometimes you feel like this

00:01:41.831 --> 00:01:46.336
is very simple, but even in TCP
and when I'm talking about embedded

00:01:46.336 --> 00:01:50.735
stuff, usually things like serial
ports, it gets a little trickier.

00:01:51.402 --> 00:01:52.129
<v Amos Wenger>I already have a story.

00:01:52.349 --> 00:01:53.139
I'm sorry to interrupt.

00:01:53.149 --> 00:01:58.529
But I remember I did some work regarding
diffing and patching when I worked at

00:01:58.529 --> 00:02:01.736
itch.io, the game marketplace for indies.

00:02:01.798 --> 00:02:06.208
And it's so fun because if you
read papers from people working on

00:02:06.348 --> 00:02:10.584
string sorting algorithms, which are
mostly used in biocomputer science-

00:02:10.624 --> 00:02:12.984
<v James Munns>Like DNA
sequencing patterns and stuff?

00:02:13.444 --> 00:02:15.484
<v Amos Wenger>So what they do is they're
like, "Well, you see, you have an

00:02:15.494 --> 00:02:21.214
alphabet and then you just make up,
like, maybe your alphabet is like 256

00:02:21.214 --> 00:02:26.764
values and then you make up a 257th
symbol and that's your, your stopper.

00:02:26.824 --> 00:02:27.684
That's the end value.

00:02:27.684 --> 00:02:31.871
If you encounter 257th symbol,
then you know the sequence is over.

00:02:31.871 --> 00:02:33.161
And you're like, "But that's...

00:02:33.701 --> 00:02:34.921
that's not how bytes work.

00:02:35.981 --> 00:02:37.841
That works in the paper,
but not in my code!"

00:02:37.865 --> 00:02:39.688
<v James Munns>This is exactly
what we're gonna get into.

00:02:41.668 --> 00:02:44.911
So, we're talking about usually
a sender and a receiver.

00:02:45.061 --> 00:02:47.321
And a lot of times it ends up
being bidirectional, but we can

00:02:47.321 --> 00:02:48.761
look at just one of those first.

00:02:48.801 --> 00:02:50.921
And the metaphor, which is totally...

00:02:51.321 --> 00:02:53.621
I wanted to make it equally
out of date for everyone.

00:02:53.871 --> 00:02:55.781
So we're gonna be talking about telegrams.

00:02:56.141 --> 00:02:57.711
Or telegraphs, actually.

00:02:57.721 --> 00:03:02.071
So you have, you know, an office
in one city, and then a wire that

00:03:02.071 --> 00:03:05.751
goes to another city, and you've
got someone with Morse code on that.

00:03:06.071 --> 00:03:08.631
But these are people who don't have
cell phones, and they don't have a

00:03:08.631 --> 00:03:10.561
landline that they can call each other.

00:03:10.801 --> 00:03:14.231
They literally only have the wire
that goes from the one office

00:03:14.231 --> 00:03:18.561
in Austin to the other office in
Chicago, or whatever cities you

00:03:18.561 --> 00:03:20.021
would like to use for the metaphor.

00:03:20.451 --> 00:03:23.711
But they only have the one wire
in between them, and they've got

00:03:23.801 --> 00:03:26.991
Morse code or something so they
can decode symbols from each other.

00:03:27.441 --> 00:03:30.791
But they want to act like,
sort of like a post office.

00:03:30.841 --> 00:03:33.401
People want to walk in off the
street and say, "I want to send

00:03:33.401 --> 00:03:35.281
a message to my buddy in Chicago.

00:03:35.736 --> 00:03:37.056
Here it is on a piece of paper.

00:03:37.056 --> 00:03:38.246
Please get it there."

00:03:38.766 --> 00:03:42.596
And we have to figure out how these
two people operate with each other

00:03:42.596 --> 00:03:43.966
and negotiate with each other.

00:03:44.436 --> 00:03:47.436
And when something goes wrong, if
one of them gets up for coffee and

00:03:47.436 --> 00:03:51.426
they come back and they sit down and
they missed the first half of the

00:03:51.436 --> 00:03:56.176
message, how do they figure out how
to get unconfused with hopefully the

00:03:56.176 --> 00:03:58.036
least collateral damage possible?

00:03:58.036 --> 00:04:02.932
Like, okay, at least I only lost one
message with my unplanned coffee break.

00:04:03.223 --> 00:04:04.963
And when we're trying to come up
with this, like you're saying,

00:04:04.963 --> 00:04:08.233
like in those papers, how do we
decide which of these solutions are

00:04:08.243 --> 00:04:10.152
good and which ones aren't good?

00:04:10.162 --> 00:04:14.132
Like which ones sound good, but in the
real world are practically- like in your

00:04:14.132 --> 00:04:16.682
example, where I have a 257th value.

00:04:16.882 --> 00:04:17.162
Okay.

00:04:17.162 --> 00:04:20.732
Now, well, every symbol has to be
two bytes now, which means we've just

00:04:20.882 --> 00:04:23.482
reduced our total efficiency by half.

00:04:23.522 --> 00:04:24.512
Is that good?

00:04:24.752 --> 00:04:26.322
Or is that maybe not so good?

00:04:26.322 --> 00:04:31.192
So when we talk about good,
there's efficiency- so like, how

00:04:31.192 --> 00:04:34.742
efficient are we at using all the
values we can send over the wire?

00:04:36.182 --> 00:04:40.142
There's our robustness, so what
happens if a message gets lost?

00:04:40.562 --> 00:04:43.982
Like we walk out of the room or someone
cuts the wire or the connection's bad.

00:04:44.392 --> 00:04:45.922
What if the data is corrupted?

00:04:46.162 --> 00:04:49.582
So like, what if we get some weird
radio noise and it makes it sound

00:04:49.592 --> 00:04:51.182
like a message is something else.

00:04:51.432 --> 00:04:52.652
Or desynchronization.

00:04:52.752 --> 00:04:55.352
This is sort of that, like: you
walk out of the room and you walk

00:04:55.352 --> 00:04:58.292
back in halfway through and you
go, "Wait, I don't know if...

00:04:58.502 --> 00:04:59.442
how many did I lose?

00:04:59.442 --> 00:05:00.582
Am I halfway through a message?

00:05:00.602 --> 00:05:01.852
When's the end of this one?"

00:05:02.042 --> 00:05:03.602
How do I get back on track?

00:05:04.392 --> 00:05:05.592
And simplicity.

00:05:05.622 --> 00:05:10.462
Like the least complexity we can get away
with is usually great because there's

00:05:10.462 --> 00:05:12.402
less places for things to go wrong.

00:05:12.642 --> 00:05:15.462
You can have like a very complex
setup but it means that now you have

00:05:15.462 --> 00:05:19.032
to think about all these failure modes
and this tree of failures that I could

00:05:19.032 --> 00:05:21.132
possibly have, and deal with that.

00:05:21.932 --> 00:05:23.202
So we're back on our telegraph.

00:05:23.202 --> 00:05:25.012
We have one sender and one receiver.

00:05:25.012 --> 00:05:28.642
And most importantly, they
don't control the content of

00:05:28.642 --> 00:05:30.002
the messages that they send.

00:05:30.272 --> 00:05:32.492
People are going to walk in
off the street and want to send

00:05:32.492 --> 00:05:34.522
whatever they want to send.

00:05:34.822 --> 00:05:37.947
And you can't really trust the
people walking in off the street

00:05:38.197 --> 00:05:41.717
aren't trying to mess with you just
because that's what they find fun.

00:05:42.007 --> 00:05:43.167
Because people do that for...

00:05:43.517 --> 00:05:43.807
fun.

00:05:44.403 --> 00:05:48.699
The only thing we can control is how
we as telegraph operators have

00:05:48.699 --> 00:05:52.369
agreed to send these messages back
and forth- the framing, how do we

00:05:52.369 --> 00:05:56.869
package these messages and encode
them and decode them over the wire.

00:05:57.079 --> 00:06:00.679
You and I can decide on that because
we went to a convention somewhere at

00:06:00.679 --> 00:06:04.609
some point and decided and wrote down
these rules on a piece of paper but we

00:06:04.609 --> 00:06:09.379
can't make changes to people's messages
to just fit how we're feeling today.

00:06:10.909 --> 00:06:14.299
So, normally there's two main
ways that you can do this.

00:06:14.299 --> 00:06:18.109
You can either do what's called "in band"
signaling or "out of band" signaling.

00:06:18.574 --> 00:06:22.504
Out of band signaling could be something
like: we have two wires, and only you and

00:06:22.504 --> 00:06:23.954
I are allowed to use that second wire.

00:06:23.954 --> 00:06:28.154
We use that wire to just beep
every single time we're done.

00:06:28.754 --> 00:06:30.714
Or, we do something that's illegal.

00:06:30.884 --> 00:06:34.674
Like, we hold down the button
for way longer than we should.

00:06:34.694 --> 00:06:37.884
Like, someone can't walk in and tell me
to hold down the button for ten seconds.

00:06:38.131 --> 00:06:42.731
But I could decide to hold down
the button in a way that people off

00:06:42.731 --> 00:06:44.291
the street aren't allowed to do.

00:06:44.821 --> 00:06:47.761
And these go back to those efficiencies
of like: well, now I have to run

00:06:47.781 --> 00:06:52.291
two wires or now I have to spend
a bunch of time holding that down.

00:06:52.291 --> 00:06:53.581
And that gets into our trade offs.

00:06:53.581 --> 00:06:58.571
But these are things that are
illegal for the message to have

00:06:58.581 --> 00:07:03.271
in them, but are legal for you and
I, as the operators, to decide on.

00:07:03.281 --> 00:07:07.471
This is like the ethernet frame does
something that you're not allowed to

00:07:07.481 --> 00:07:10.717
have in the payload of the ethernet
value or something like that.

00:07:12.297 --> 00:07:16.732
So these are like, common
solutions that I see and how

00:07:16.742 --> 00:07:18.222
they're actually not very good.

00:07:18.252 --> 00:07:21.212
But if you haven't thought of
all of the ways that they can fail,

00:07:21.472 --> 00:07:24.302
you might think they are good,
but they are perhaps not so good.

00:07:25.182 --> 00:07:29.922
So one of them is just line
silence or holding down the button

00:07:29.922 --> 00:07:30.862
for a certain amount of time.

00:07:30.862 --> 00:07:33.782
You say, "Look, between every
message, we're going to sit

00:07:33.782 --> 00:07:35.292
quietly for five seconds.

00:07:35.322 --> 00:07:40.097
And I promise that while I'm tapping out
that message to you in Morse code, I'm

00:07:40.097 --> 00:07:43.527
good enough at Morse code that I'm never
gonna pause for more than five seconds."

00:07:44.007 --> 00:07:48.137
And if you hear a silence of five seconds,
you go, "Rip off that piece of paper,

00:07:48.137 --> 00:07:51.827
that message is done, get the next piece
of paper, I am ready for the next one."

00:07:52.447 --> 00:07:54.047
And this is really great
because it's simple.

00:07:54.257 --> 00:07:55.177
You don't have to think that much.

00:07:55.177 --> 00:07:58.347
You go: if it's quiet for five seconds,
I have a little stopwatch or something.

00:07:58.637 --> 00:08:00.967
If it's quiet for five
seconds, we know it's done.

00:08:01.762 --> 00:08:03.682
And it's not something
that someone can affect.

00:08:03.682 --> 00:08:06.302
You can't write in your message,
"Please pause for five seconds."

00:08:06.302 --> 00:08:07.852
Cause I go, "No, you can't ask me that.

00:08:07.852 --> 00:08:09.362
You can give me the data
that I'm going to send.

00:08:09.362 --> 00:08:10.922
You can't tell me how to send it."

00:08:12.302 --> 00:08:15.522
Now it's, you know, somewhat efficient?

00:08:15.592 --> 00:08:19.372
You know, it's not that long, depending
on how long you need it to be, it could

00:08:19.382 --> 00:08:22.149
be less efficient, but it's fairly robust.

00:08:22.269 --> 00:08:26.780
Like it's something that we don't have
to worry about: as long as you promise

00:08:26.810 --> 00:08:28.760
that the sender can never get distracted.

00:08:29.000 --> 00:08:33.370
If someone walks in and talks to me while
I'm typing out this message, can I ever

00:08:33.370 --> 00:08:35.350
be distracted for more than five seconds?

00:08:35.590 --> 00:08:39.830
Because if I can, then we have to
increase that time for it to be suitable.

00:08:40.420 --> 00:08:43.250
This sounds weird for a telegraph
operator, but if we're talking about a

00:08:43.250 --> 00:08:47.380
microcontroller or an embedded system:
if we have hardware interrupts or someone

00:08:47.380 --> 00:08:51.910
presses a button or it's time to draw
the screen, how do I guarantee that I'm

00:08:51.910 --> 00:08:56.260
never distracted from receiving messages
for more than that amount of time?

00:08:56.670 --> 00:09:01.470
And do I actually have a stopwatch that
I can accurately measure that time so

00:09:01.470 --> 00:09:05.520
that if I'm counting to five and I only
count to 4.9 because I got a little

00:09:05.520 --> 00:09:09.440
distracted and you think you've ended a
message and I don't think you've ended

00:09:09.440 --> 00:09:11.350
a message, we can get out of sync.

00:09:11.410 --> 00:09:16.120
So we do need maybe a longer amount of
time than you might think, just so that if

00:09:16.120 --> 00:09:21.250
something does go wrong or weird, or I get
distracted, it's not too little time.

00:09:21.360 --> 00:09:25.127
So this one's maybe okay, but also
sometimes it requires some hardware

00:09:25.127 --> 00:09:29.913
capability that is more challenging
than you might think to do accurately.
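
NOTE
Editor's sketch, not from the episode: a minimal Python model of the line-silence
scheme described above, assuming the receiver can timestamp each byte in
milliseconds. The function name and the event format are hypothetical.
```python
def split_on_idle(events, gap_ms):
    """Split (time_ms, byte) events into frames; silence of gap_ms or more ends a frame."""
    frames, current, last_ms = [], bytearray(), None
    for time_ms, byte in events:
        if current and time_ms - last_ms >= gap_ms:
            frames.append(bytes(current))
            current = bytearray()
        current.append(byte)
        last_ms = time_ms
    if current:
        frames.append(bytes(current))
    return frames
# Sender keeps pace: two messages separated by a 5 second gap.
steady = [(0, 0x68), (10, 0x69), (5010, 0x79), (5020, 0x6F)]
# Sender stalls for 6 seconds in the middle of the first message.
stalled = [(0, 0x68), (6000, 0x69), (11000, 0x79), (11010, 0x6F)]
print(split_on_idle(steady, 5000))   # two frames: b"hi" and b"yo"
print(split_on_idle(stalled, 5000))  # three frames: b"h", b"i", b"yo"
```
The stalled case shows the trade-off: one distraction longer than the agreed
gap splits a single message in two, so the gap has to be longer than the worst stall.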

00:09:30.218 --> 00:09:30.458
<v Amos Wenger>Right.

00:09:30.458 --> 00:09:33.583
And to make it more reliable,
you have to sacrifice efficiency.

00:09:34.315 --> 00:09:37.658
<v James Munns>You make it longer
so that you go, "Well, I promise

00:09:37.758 --> 00:09:41.198
I will never be distracted for
more than 200 milliseconds."

00:09:41.488 --> 00:09:41.658
<v Amos Wenger>Yeah.

00:09:41.658 --> 00:09:41.818
Yeah.

00:09:41.818 --> 00:09:41.988
It's

00:09:41.988 --> 00:09:42.788
<v James Munns>easier to imagine

00:09:42.808 --> 00:09:44.728
<v Amos Wenger>with smaller
values 'cause like yeah.

00:09:44.728 --> 00:09:47.998
The telegraph example
shows its limits here.

00:09:48.208 --> 00:09:50.248
But if you're thinking of
like, it's, I dunno, it's four

00:09:50.248 --> 00:09:51.208
milliseconds or something.

00:09:51.258 --> 00:09:51.378
Yeah.

00:09:51.378 --> 00:09:52.972
I can see them getting
stuck for that long.

00:09:53.135 --> 00:09:53.405
<v James Munns>Yeah.

00:09:53.723 --> 00:09:57.063
So the other one, and this is the one
that I see most common when people roll

00:09:57.063 --> 00:09:59.223
their own is they use a length header.

00:09:59.553 --> 00:10:04.094
So I might say, "Look, I'm going to count
the number of words in this message.

00:10:04.634 --> 00:10:08.704
And before I send the message, I'm
going to send the number of words,"

00:10:08.914 --> 00:10:13.604
and that way you can count how many
words you receive and you go: cool!

00:10:13.694 --> 00:10:17.694
He said 200, and then I count my
messages, and when I get to the 200th

00:10:17.704 --> 00:10:21.071
word I rip the piece of paper off the
sheet and I know that I am done and

00:10:21.071 --> 00:10:22.821
now I start listening for the next one.

00:10:23.245 --> 00:10:26.465
But let's say I went to the bathroom
and I come back and we're in the

00:10:26.475 --> 00:10:27.923
middle of receiving a message.

00:10:27.923 --> 00:10:31.373
You are sending me a message about
how many horses are being sent on the

00:10:31.373 --> 00:10:34.293
train to Chicago today or whatever.

00:10:34.703 --> 00:10:36.393
And I go "Shoot, shoot, shoot, shoot.

00:10:36.723 --> 00:10:37.523
Uh, cool.

00:10:37.543 --> 00:10:38.233
I need to get back.

00:10:38.233 --> 00:10:38.833
I need to get back...

00:10:38.853 --> 00:10:40.413
uh, let me just listen for a number.

00:10:40.413 --> 00:10:43.393
I'm- I'm back, I was distracted, but
I'm listening for a number," and you're

00:10:43.393 --> 00:10:45.243
talking about sending 200 horses.

00:10:45.773 --> 00:10:48.733
And it's the last sentence
of the message and I go, "Oo!

00:10:48.973 --> 00:10:50.263
Oh, they said 200!"

00:10:50.403 --> 00:10:52.293
So I write down 200 and
I start counting 200.

00:10:52.343 --> 00:10:54.293
There's four words left in the message.

00:10:54.753 --> 00:11:00.258
So now, I've started counting 200, we
get to the four words in your message.

00:11:00.508 --> 00:11:02.908
Then there's some silence while we're
waiting for you to send the next

00:11:02.908 --> 00:11:04.478
message, you start counting the next one.

00:11:05.108 --> 00:11:08.518
And I get to the end of the 200, and
you're still talking about someone

00:11:08.518 --> 00:11:13.198
else's message, and I've totally lost
this one, I've become desynchronized,

00:11:13.748 --> 00:11:15.681
and everything is terrible.

00:11:16.251 --> 00:11:19.651
So this is one of those things that
I see people do when they have TCP

00:11:19.651 --> 00:11:23.451
with TLS, where you can guarantee
that all the messages will be there.

00:11:23.451 --> 00:11:26.941
You've got an operating system, buffering
messages for you and things like that.

00:11:27.581 --> 00:11:30.871
And if you have TCP and
TLS, it's probably fair.

00:11:30.921 --> 00:11:34.981
If you lost some messages, your TCP
connection is just going to fail.

00:11:35.031 --> 00:11:37.731
And TLS is going to go,
"Oops, we're out of sync."

00:11:38.071 --> 00:11:40.801
And you'll either recover
with TCP or you won't.

00:11:40.821 --> 00:11:43.181
And the connection will
be reset or whatever.

00:11:43.916 --> 00:11:45.896
But even with TCP with no TLS...

00:11:46.576 --> 00:11:50.306
there's not that much- like
TCP is a reliable protocol.

00:11:50.896 --> 00:11:54.596
But TCP itself only
has a 16 bit checksum on it.

00:11:54.876 --> 00:11:57.966
The chances of something
going wrong is not impossible.

00:11:58.256 --> 00:12:03.946
And if that 200 becomes 201 because
you have a bit error, all of a sudden

00:12:03.956 --> 00:12:05.916
now you are totally desynchronized.

00:12:05.916 --> 00:12:09.306
And I am listening into your next
message and all of a sudden it's

00:12:09.306 --> 00:12:11.316
not a number and I am very confused.

00:12:11.726 --> 00:12:14.856
Or I could lose a byte or- you know,
all these ways that things could go

00:12:14.856 --> 00:12:19.376
wrong, if you don't have a really
rock solid medium like TCP and TLS.

00:12:19.376 --> 00:12:22.656
On a serial port, if you get a
little glitch and all of a sudden

00:12:22.656 --> 00:12:25.126
you lose a byte: what do you do?

00:12:25.146 --> 00:12:25.796
How do you recover?

00:12:25.796 --> 00:12:29.996
So this length header is very efficient
because it only takes, you know, us

00:12:30.026 --> 00:12:32.666
putting the length that we're going
to put at the front of the message.

00:12:33.166 --> 00:12:35.526
But once we get out of sync, we're done.

00:12:35.616 --> 00:12:40.806
And trying to figure out what is a data
number, and what is a header number

00:12:41.046 --> 00:12:46.326
is tremendously hard to figure out if
you only have a length header prefix.
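
NOTE
Editor's sketch, not from the episode: a minimal Python model of a length-header
framing with a one-byte length, to show how a receiver that joins late mistakes
a data byte for a header. The function names and the one-byte length are assumptions.
```python
def encode(msg):
    """One length byte, then the payload (payloads under 256 bytes in this sketch)."""
    assert len(msg) < 256
    return bytes([len(msg)]) + msg
def decode_stream(data):
    """Trust every length byte; a frame that runs past the end comes back short."""
    frames, i = [], 0
    while i < len(data):
        n = data[i]
        frames.append(data[i + 1 : i + 1 + n])
        i += 1 + n
    return frames
stream = encode(b"send 200 horses") + encode(b"ok") + encode(b"bye")
clean = decode_stream(stream)     # the three original messages
late = decode_stream(stream[5:])  # receiver missed the first 5 bytes
print(clean)
print(late)
```
In the late case the first byte seen is the space character, 0x20, so the
receiver counts 32 bytes and swallows the rest of this message and both of the
next ones as a single frame. Nothing in the stream tells it that it is wrong.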

00:12:46.536 --> 00:12:51.026
<v Amos Wenger>Which is why you always add
a four byte magic number in your header.

00:12:51.716 --> 00:12:52.216
<v James Munns>It's true!

00:12:52.286 --> 00:12:53.266
And you can reduce the false positives...

00:12:53.955 --> 00:12:58.311
but there's always a chance that someone
could come in and go, "Okay, I'm going

00:12:58.311 --> 00:13:04.281
to send a message with just a number
followed by the magic word over and over

00:13:04.281 --> 00:13:07.991
and over and over and over and over again,
and see how much I can ruin your day."

00:13:09.921 --> 00:13:11.361
<v Amos Wenger>Some people
do that on purpose for fun.

00:13:11.361 --> 00:13:13.511
I think it's called a queen, a quine.

00:13:13.521 --> 00:13:14.431
What, how's it pronounced?

00:13:14.451 --> 00:13:14.803
Like a-

00:13:15.021 --> 00:13:18.201
<v James Munns>Oh, well, quines are when you
make a program that encodes a different

00:13:18.201 --> 00:13:19.761
program that eventually wraps itself.

00:13:19.771 --> 00:13:23.841
I was thinking of phreaking, like,
P H phreaking, like phone phreaking.

00:13:23.861 --> 00:13:25.861
The people who had, you know,
they found a whistle...

00:13:26.391 --> 00:13:29.851
This is in band signaling, like when
you have a phone and you play a certain

00:13:29.871 --> 00:13:34.531
tone that signals the end of the call,
or 'stop billing,' and you can just

00:13:34.531 --> 00:13:37.251
play that little tone in there and
all of a sudden the hardware in the

00:13:37.251 --> 00:13:38.911
central office goes, "Oh, stop billing.

00:13:38.941 --> 00:13:39.441
Okay!"

00:13:40.201 --> 00:13:42.641
That's in band signaling,
which we don't love.

00:13:42.826 --> 00:13:45.446
<v Amos Wenger>Some people could
do it by hand, like with no

00:13:45.456 --> 00:13:46.986
hardware, no physical whistle.

00:13:47.016 --> 00:13:51.566
I was thinking of the files that are
like valid PDF, but also a valid PNG

00:13:51.566 --> 00:13:54.776
and also a Linux executable and also
something like, that's very fun.

00:13:54.786 --> 00:13:59.026
You can- yeah, you could
just accidentally, sort of on

00:13:59.026 --> 00:14:01.856
purpose, use other people's
magic numbers because it's fun.

00:14:03.021 --> 00:14:05.141
<v James Munns>Trusting-
trusting user-provided data

00:14:05.281 --> 00:14:07.241
is always spicy and fun.

00:14:07.656 --> 00:14:10.936
<v Amos Wenger>Yes, Amanda, please,
please edit in the honk sound

00:14:10.986 --> 00:14:13.276
from, uh, Untitled Goose Game here.

00:14:14.811 --> 00:14:15.551
<v James Munns>Exactly.

00:14:17.332 --> 00:14:20.482
So, the other one that I see people
use, and this is like common back to

00:14:20.482 --> 00:14:23.592
the dial up days, is using a flag word.

00:14:23.842 --> 00:14:27.832
So we decide a certain word means
'end of message,' but like in your

00:14:27.842 --> 00:14:33.222
DNA example, if we're sending bytes,
we only have 256 different values.

00:14:33.827 --> 00:14:36.657
And someone might want
to include that value.

00:14:36.667 --> 00:14:40.077
Like, if we're sending ASCII,
there's probably some ASCII control

00:14:40.077 --> 00:14:42.297
character we can go: Okay, well,
there's literally a character

00:14:42.337 --> 00:14:44.787
called ETX for end of text.

00:14:45.087 --> 00:14:48.877
Which, if you're just sending ASCII
characters and you agree on the rules

00:14:48.877 --> 00:14:52.387
that someone walking in off the street
can only send ASCII characters: cool,

00:14:52.387 --> 00:14:56.867
you've got all 128 reserved bytes
that you're allowed to use- or, excuse

00:14:56.867 --> 00:15:00.919
me- 128 characters or whatever that
you're allowed to use and then half

00:15:00.949 --> 00:15:03.869
that you're not allowed to use because
they have special semantical meaning.

00:15:04.409 --> 00:15:09.009
It does mean that our efficiency takes
a hit because we're very rarely sending

00:15:09.019 --> 00:15:13.499
something in that back half, you know, we
lose a bit of efficiency, maybe it's fine.

00:15:13.889 --> 00:15:18.759
But this all breaks down when we leave
the world of plain text and we want to

00:15:18.759 --> 00:15:23.509
send a photo or a video where the color
values that we're actually sending across

00:15:23.509 --> 00:15:26.359
the network here can be the full range.

00:15:26.359 --> 00:15:28.349
Like it can be any value of a byte.

00:15:28.629 --> 00:15:32.999
And so you might accidentally send a
picture that's all brat green, and it

00:15:32.999 --> 00:15:37.539
turns out that brat green has exactly the
same value as ETX or something like that,

00:15:37.779 --> 00:15:42.409
and you're sending a bitmap across the
network that is "Oops, all ETX values."

00:15:42.879 --> 00:15:43.079
<v Amos Wenger>I'm sorry.

00:15:43.099 --> 00:15:44.709
Is this, is this a technical term?

00:15:45.179 --> 00:15:45.769
Like brat??

00:15:45.781 --> 00:15:45.911
Okay.

00:15:46.181 --> 00:15:46.991
<v James Munns>I don't know!

00:15:47.084 --> 00:15:47.424
<v Amos Wenger>I'm sorry.

00:15:47.614 --> 00:15:48.764
I didn't keep myself up to date.

00:15:48.784 --> 00:15:49.674
I just, I'm sorry.

00:15:49.759 --> 00:15:51.419
<v James Munns>Do I look
like Pantone to you?

00:15:52.165 --> 00:15:54.755
But yeah, so, I mean, the normal way
that you say it is you go: Okay, well,

00:15:54.785 --> 00:15:57.955
ETX is our end of transmission word.

00:15:58.195 --> 00:16:01.495
And if it happens to show up in
your message, tell you what, you're

00:16:01.495 --> 00:16:03.035
going to escape that message.

00:16:03.035 --> 00:16:04.935
You're going to send the character DLE.

00:16:05.210 --> 00:16:08.460
Or data link escape, before that.

00:16:08.830 --> 00:16:11.967
And this is kind of like putting
backslashes before new lines in

00:16:11.967 --> 00:16:13.127
a text or something like that.

00:16:13.127 --> 00:16:16.207
You say like, "Ignore the next
thing that I'm going to send.

00:16:16.387 --> 00:16:20.227
It's a normal data byte, not a special
meta character," or something like that.

00:16:21.077 --> 00:16:24.217
But then the problem becomes: well,
okay, well, if you're sending that

00:16:24.227 --> 00:16:29.967
message that's all that escape character,
you now have half efficiency because

00:16:29.977 --> 00:16:32.797
every time you send it you have to
send the escape character before it.

00:16:33.257 --> 00:16:37.647
And then what do I do if the two
data bytes are "Ignore escape

00:16:37.667 --> 00:16:39.567
character, end of message."

00:16:39.567 --> 00:16:42.167
And then you have to put more- it's
like that problem where you just keep

00:16:42.167 --> 00:16:43.947
adding more backslashes until it works.

00:16:44.457 --> 00:16:48.592
And if you have a malicious
attacker, they can blow up, "Okay.

00:16:48.592 --> 00:16:52.744
Well, you said that I can send
a hundred data characters for a

00:16:52.744 --> 00:16:54.004
dollar," or something like that.

00:16:54.214 --> 00:16:54.664
"Cool.

00:16:54.774 --> 00:16:57.894
The hundred that I pick are the
worst nightmare ones for you."

00:16:57.894 --> 00:17:04.074
And all of a sudden you're sending
five, 600 bytes of transfer for 100

00:17:04.094 --> 00:17:06.594
bytes of actual data across the wire.

00:17:06.989 --> 00:17:07.179
<v Amos Wenger>Mm hmm.

00:17:07.594 --> 00:17:10.414
<v James Munns>And so this generally
works as long as you have, you know,

00:17:11.034 --> 00:17:14.574
scrambled data or whatever, if it's
encrypted or whatever, and you're

00:17:14.614 --> 00:17:18.764
unlikely to get a run of exactly the
same characters all over the place.

00:17:19.114 --> 00:17:23.664
Sure, you can maybe make it work, but it's
either unbounded or an incredibly high

00:17:23.664 --> 00:17:26.924
bound of potential worst case going on.

00:17:27.354 --> 00:17:31.214
And again, if you're on an embedded system
and you need enough room to buffer all of

00:17:31.214 --> 00:17:34.577
the bytes that you're going to receive,
you got to think about that because either

00:17:34.577 --> 00:17:37.987
you have to decode everything one byte
at a time so you can do the decoding,

00:17:38.487 --> 00:17:41.517
or you need to DMA a huge chunk of data.

00:17:41.537 --> 00:17:46.007
But if your worst case is eight times
bigger than your potential data transfer,

00:17:46.277 --> 00:17:50.127
then that's a lot of wasted space for
something that almost never happens.
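
NOTE
Editor's sketch, not from the episode: a minimal Python model of flag-byte
framing with an escape byte, assuming ETX (0x03) ends a frame and DLE (0x10) is
sent before any payload byte that collides with ETX or DLE. Function names are
hypothetical, and other escaping schemes can have a different worst case.
```python
ETX = 0x03  # ASCII End of Text, used here as the end-of-frame flag
DLE = 0x10  # ASCII Data Link Escape, used here as the escape byte
def stuff(payload):
    """Escape colliding bytes, then terminate the frame with ETX."""
    out = bytearray()
    for b in payload:
        if b in (ETX, DLE):
            out.append(DLE)
        out.append(b)
    out.append(ETX)
    return bytes(out)
def unstuff(frame):
    """Undo stuff(): stop at the first unescaped ETX."""
    out, escaped = bytearray(), False
    for b in frame:
        if escaped:
            out.append(b)
            escaped = False
        elif b == DLE:
            escaped = True
        elif b == ETX:
            break
        else:
            out.append(b)
    return bytes(out)
friendly = b"hello"
hostile = bytes([ETX]) * 100  # every payload byte needs an escape
print(len(stuff(friendly)))   # 6: five payload bytes plus the flag
print(len(stuff(hostile)))    # 201: the payload doubles, plus the flag
```
With this particular scheme the receive buffer has to be sized for twice the
payload, even though ordinary traffic almost never needs it.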

00:17:52.027 --> 00:17:55.037
So these are all the ones that
I see people commonly use.

00:17:55.077 --> 00:17:55.917
These are the, "Ah!

00:17:55.937 --> 00:17:57.567
I know a trick to solve this."

00:17:57.797 --> 00:18:00.437
And then you end up with people where
you go: well, what happens if you reboot

00:18:00.467 --> 00:18:03.657
one device cause you did a firmware
update halfway through a conversation?

00:18:03.987 --> 00:18:06.977
Now they're out of sync and these
two devices can never get to each

00:18:06.977 --> 00:18:12.100
other or they get accidental buffer
overflows because: oops, my decoding

00:18:12.110 --> 00:18:15.340
used eight times more of the
space that I thought I could use.

00:18:16.470 --> 00:18:18.600
<v Amos Wenger>One problem I didn't,
I thought about, but I forgot to

00:18:18.610 --> 00:18:21.250
bring up with the length header
is that you don't always know the

00:18:21.250 --> 00:18:22.380
length of the message you're sending.

00:18:22.400 --> 00:18:25.750
I imagine in the embedded world,
most of the time you do, but

00:18:25.900 --> 00:18:28.590
I, I'm, I am now thinking about
HTTP cause you forced me to.

00:18:28.920 --> 00:18:32.330
Also, cause I've been working on
this implementation and we have,

00:18:32.340 --> 00:18:35.660
you know, we have chunked transfer
encoding and you don't know,

00:18:35.660 --> 00:18:36.950
you're just sending bits at a time.

00:18:36.950 --> 00:18:38.500
So you know the length of
the chunk, but you don't know the

00:18:38.500 --> 00:18:39.480
length of the total message.

00:18:39.480 --> 00:18:41.480
And then you have to do
something even fancier.

00:18:42.070 --> 00:18:42.250
<v James Munns>Yeah.

00:18:42.250 --> 00:18:45.774
And these frames are chunks, but
it's still a problem of like, you

00:18:45.774 --> 00:18:49.654
can't opportunistically send things
sometimes, if you don't know the

00:18:49.654 --> 00:18:52.164
size of the chunk, which might
be variable and things like that.

00:18:52.164 --> 00:18:56.859
So it also goes back to efficiency of
like: you have to be able to buffer

00:18:56.859 --> 00:19:00.519
up some amount so that you can count
how much you buffered so that you can

00:19:00.519 --> 00:19:05.399
send it versus a totally asynchronous,
like, stream of bytes, for example.

00:19:06.963 --> 00:19:10.213
So we have enough time to get into my
favorite one, which is still from the

00:19:10.213 --> 00:19:15.193
days of dial up, but more recent dial up,
like early 90s or something like that.

00:19:15.443 --> 00:19:19.713
And it's called COBS, or
Consistent Overhead Byte Stuffing.

00:19:20.193 --> 00:19:26.133
Which is actually a funny combination
of most of the previous ones, but it

00:19:26.133 --> 00:19:29.323
allows you to do things in a fairly-
like it's a little more complex

00:19:29.323 --> 00:19:35.303
than any of them, but has almost
no totally awful, pessimal case.

00:19:35.726 --> 00:19:38.976
So I'm just going to run down
some of these rules and we'll

00:19:38.976 --> 00:19:41.826
get to the point where we have a
total rule between the two of us.

00:19:42.036 --> 00:19:46.036
And now we're back to talking about
real-world 8-bit bytes instead

00:19:46.036 --> 00:19:47.756
of, you know, telegraph words.

00:19:48.351 --> 00:19:50.161
<v Amos Wenger>Do I get a
guess as to how it works?

00:19:50.177 --> 00:19:53.553
Knowing that I've never-
the acronym is familiar.

00:19:53.947 --> 00:19:56.477
I'm pretty sure I read about this before,
but I've already forgotten all about it.

00:19:56.487 --> 00:19:59.967
Do they just change- like, does the-
is there an escape character, but it

00:19:59.977 --> 00:20:01.927
depends on the position in the stream?

00:20:02.057 --> 00:20:05.697
So that you can't, like, the
pessimal case is never really hit?

00:20:06.097 --> 00:20:06.837
Cause it changes around?

00:20:07.577 --> 00:20:08.027
<v James Munns>Sort of...

00:20:08.327 --> 00:20:09.657
but okay, let me explain it.

00:20:09.657 --> 00:20:14.607
So, first rule: a null byte
always means end of message.

00:20:14.867 --> 00:20:18.387
If you send a zero on
the wire, that's the end.

00:20:18.417 --> 00:20:21.957
That's the, you know, perforated
edges, we are now done, tear off the

00:20:21.957 --> 00:20:24.227
old message, go with the new one.

00:20:24.787 --> 00:20:28.117
So then you say, well, what if I want
to send a lot of zero data bytes?

00:20:28.117 --> 00:20:30.417
So the first rule is a zero
always means the end of message.

00:20:30.447 --> 00:20:32.957
And whenever you give me a user
message, I'm going to stick a

00:20:32.957 --> 00:20:34.947
zero at the end of your message.

00:20:34.947 --> 00:20:38.677
I'm going to append a zero to
the end so that I know that

00:20:38.687 --> 00:20:40.297
that is the end of the message.

00:20:41.067 --> 00:20:45.077
But now I need to figure out: what
do I do with all these zeros inside

00:20:45.077 --> 00:20:48.287
of the message, or potentially
the zeros inside of the message?

00:20:48.977 --> 00:20:52.717
Well, what I'm going to do is I'm
going to take my zero at the end

00:20:52.727 --> 00:20:54.597
and I'm going to work backwards.

00:20:55.157 --> 00:21:00.642
And anytime I run into a zero, I'm
going to count how many bytes it's been

00:21:00.642 --> 00:21:04.992
since the zero, like working backwards,
and I'm going to replace the zero with

00:21:05.002 --> 00:21:08.452
how many bytes until the next zero.

00:21:09.422 --> 00:21:13.392
And then when I go back to the next
user zero, I'm going to be counting

00:21:13.392 --> 00:21:18.422
again from that new position and I'm
going to count how many bytes it would

00:21:18.432 --> 00:21:20.892
be to the next zero that got replaced.

00:21:21.162 --> 00:21:24.622
So essentially what I'm doing
is I'm making a linked list of

00:21:24.692 --> 00:21:26.732
all the zeros in my message.

00:21:27.412 --> 00:21:30.002
And I work all the way until I
get to the front of the message

00:21:30.342 --> 00:21:31.712
and I put the number there.
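
NOTE
A minimal Python sketch of the backwards walk described above. The name
cobs_encode_short is hypothetical, and the sketch assumes a short message in
which no gap between zeros exceeds 254 bytes, so every distance fits in one byte.
```python
def cobs_encode_short(data: bytes) -> bytes:
    """Encode one short message by linking its zeros together, back to front."""
    # Layout: placeholder header byte, the user's bytes, then the terminating zero.
    buf = bytearray(b"\x00" + data + b"\x00")
    next_zero = len(buf) - 1  # index of the zero we most recently passed
    for i in range(len(buf) - 2, -1, -1):
        if buf[i] == 0:
            distance = next_zero - i  # how many bytes until the next zero
            if distance > 254:
                raise ValueError("run too long for this simplified sketch")
            buf[i] = distance  # replace the zero with a link to the next one
            next_zero = i
    return bytes(buf)
# The header slot at index 0 starts as a zero, so it is filled in by the same loop.
print(cobs_encode_short(b"\x11\x22\x00\x33").hex(" "))  # prints: 03 11 22 02 33 00
```
The only zero left in the output is the one appended at the end; every other
zero has become the distance to the next link.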

00:21:32.162 --> 00:21:33.892
So let's say I put 14 there.

00:21:34.582 --> 00:21:36.232
So I will start with 14.

00:21:36.392 --> 00:21:40.032
I know that's a header byte. I'm going to take the next 13
I'm going to take the next 13

00:21:40.042 --> 00:21:41.722
bytes as normal data bytes.

00:21:42.392 --> 00:21:46.412
And when I get to that 14th byte, it's
going to be a number that's not zero.

00:21:46.722 --> 00:21:47.662
Let's say it's eight.

00:21:47.922 --> 00:21:49.912
That's where a zero should have been.

00:21:50.562 --> 00:21:54.462
So the data I'm going to replace there
with a zero, and I'm going to start

00:21:54.462 --> 00:21:56.622
counting again until I get to eight.

00:21:57.172 --> 00:22:00.502
And if you ever are counting down
like this and you hit a zero,

00:22:00.682 --> 00:22:03.502
you know that you've reached
the real end of the message.
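
NOTE
The decoding walk above can be sketched in Python as follows. The name
cobs_decode_short is hypothetical, and the sketch assumes a short frame that
never needs the long-run placeholder rule. The example frame mirrors the
numbers used above: a header of 14, then 13 data bytes, then a link of 8.
```python
def cobs_decode_short(frame: bytes) -> bytes:
    """Decode one frame, including its trailing zero, by following the links."""
    if not frame:
        raise ValueError("empty frame")
    out = bytearray()
    pos = 0  # index of the current link byte; the first one is the header
    while frame[pos] != 0:
        link = frame[pos]
        if pos + link >= len(frame):
            raise ValueError("link points past the end of the frame")
        chunk = frame[pos + 1:pos + link]  # plain data bytes before the next link
        if 0 in chunk:
            raise ValueError("hit a zero earlier than the link promised")
        out += chunk
        pos += link
        if frame[pos] != 0:
            out.append(0)  # this link byte stood in for a data zero
    return bytes(out)
frame = bytes([14]) + b"A" * 13 + bytes([8]) + b"B" * 7 + b"\x00"
print(cobs_decode_short(frame))
```
Landing on the real zero ends the loop, so no data zero is emitted for the
terminator itself.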

00:22:03.820 --> 00:22:06.415
<v Amos Wenger>And it also means you,
you can tell the boundaries of the

00:22:06.415 --> 00:22:09.175
messages without knowing any of
this, just by looking for zeros.

00:22:09.235 --> 00:22:09.505
Yeah.

00:22:09.555 --> 00:22:12.205
<v James Munns>Yep, before you
actually start decoding this, you

00:22:12.205 --> 00:22:15.655
can just keep buffering until you
get a zero and then you go, "Ah,

00:22:15.675 --> 00:22:17.415
now it is time to do something."

00:22:17.835 --> 00:22:21.795
And so you go look at the first byte,
you walk forward, and essentially this is

00:22:21.795 --> 00:22:23.765
also a little bit of an integrity check.

00:22:24.065 --> 00:22:28.105
If you hit the zero when you were counting
but that's not where you expected,

00:22:28.375 --> 00:22:30.995
you know that you've lost something.

00:22:31.435 --> 00:22:34.995
And it means that you go, "Welp, nope,
that was a poorly framed piece of data."

00:22:36.225 --> 00:22:41.065
And the nice thing is, in this case,
you have at least two bytes of overhead.

00:22:41.315 --> 00:22:44.675
You've got the zero you're sticking
at the end, and you've got that number

00:22:44.675 --> 00:22:45.865
that you're sticking at the front.

00:22:46.995 --> 00:22:51.195
If you sent all zeros, if you were trying
to be an ass and sent all zeros here,

00:22:51.385 --> 00:22:55.245
that's actually perfect for me because
then every data byte just becomes a one.

00:22:55.325 --> 00:22:57.615
It just goes link, link, link, link,
link, link, link, link, link, link.

00:22:57.615 --> 00:23:01.145
The worst case is actually
you send no zeros.

00:23:01.590 --> 00:23:03.160
And this is the sort of final rule.

00:23:03.380 --> 00:23:08.020
If you ever get to- I forget the number,
it's like 254, you don't go all the way

00:23:08.020 --> 00:23:13.530
to 255- but if you get a 254 and there
hasn't been a data zero, you insert

00:23:13.640 --> 00:23:17.910
one more header byte there that you
know that you're going to throw away.

00:23:17.990 --> 00:23:21.360
It's basically kind of like
sticking an extra header byte

00:23:21.490 --> 00:23:22.810
in the middle of the message.

00:23:23.115 --> 00:23:23.895
<v Amos Wenger>Right, yeah.

00:23:24.240 --> 00:23:28.295
<v James Munns>So you know that if
you ever see a 254, you go: ah, I'm

00:23:28.295 --> 00:23:31.935
going to be counting towards the
next one, but there's no actual zero

00:23:31.935 --> 00:23:33.705
data byte that I'm replacing there.

00:23:33.715 --> 00:23:36.075
It's just a placeholder.
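
NOTE
A hedged sketch of a forward, single-pass encoder that includes the long-run
rule. In COBS as it is usually specified, the placeholder link has the value
0xFF (255) and stands for 254 non-zero data bytes with no zero after them,
which is probably the number being reached for above. The name cobs_encode is
hypothetical.
```python
def cobs_encode(data: bytes) -> bytes:
    """Encode one message as a COBS frame, terminating zero included."""
    out = bytearray([0])  # slot 0 is the header; its value is filled in later
    code_idx = 0  # where the current link byte lives
    code = 1  # distance from that link byte to the next one
    for byte in data:
        if byte == 0:
            out[code_idx] = code  # finish the link that points at this zero
            code_idx = len(out)
            out.append(0)  # a new link byte takes the zero's place
            code = 1
        else:
            out.append(byte)
            code += 1
            if code == 0xFF:  # 254 non-zero bytes in a row
                out[code_idx] = code
                code_idx = len(out)
                out.append(0)  # placeholder link; no data zero is hidden here
                code = 1
    out[code_idx] = code
    out.append(0)  # the only real zero: end of message
    return bytes(out)
frame = cobs_encode(bytes([7]) * 300)
print(len(frame), frame[0], frame[255])  # prints: 303 255 47
```
A decoder that sees a link of 0xFF copies the 254 bytes and moves on without
emitting a zero for the next link.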

00:23:36.245 --> 00:23:42.663
Which means your worst-case overhead is
bounded to be two bytes

00:23:42.753 --> 00:23:46.543
plus one for every 254 data bytes.

00:23:46.753 --> 00:23:47.523
<v Amos Wenger>Right, yeah.

00:23:47.523 --> 00:23:51.558
<v James Munns>Which means, effectively,
if your max message

00:23:51.558 --> 00:23:56.658
size is like a kilobyte, your worst
case overhead is something like

00:23:56.668 --> 00:23:58.638
six bytes or something like that.

00:23:58.878 --> 00:24:02.098
You can see it scales really
proportionally to the length.

00:24:02.118 --> 00:24:05.858
And most of the time you do just
randomly have zeros in there.

00:24:06.068 --> 00:24:08.058
So you don't even hit the
worst case very often.
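
NOTE
The arithmetic behind that bound, as a small Python sketch. The function name
is hypothetical; the formula is the one stated above (two bytes, plus one for
every 254 data bytes).
```python
def cobs_worst_case_overhead(n: int) -> int:
    """Upper bound on extra bytes for an n-byte message."""
    # One header byte, one terminating zero, one placeholder per full 254-byte run.
    return 2 + n // 254
for n in (10, 254, 1024, 65536):
    print(n, cobs_worst_case_overhead(n))
```
For a 1024-byte message this gives 2 + 4 = 6 bytes, which is under 0.6 percent.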

00:24:08.058 --> 00:24:09.523
<v Amos Wenger>Yeah, yeah, that's...

00:24:09.613 --> 00:24:12.043
that's so- I like that
because it's a trick.

00:24:12.043 --> 00:24:15.863
It's just someone figured out a
nice design and it's as old

00:24:15.863 --> 00:24:17.833
as the world itself almost...

00:24:17.883 --> 00:24:19.833
like the 70s probably?

00:24:20.288 --> 00:24:21.958
<v James Munns>And it's one of those
things where this is something that

00:24:21.978 --> 00:24:25.668
is used in telecoms, like when you're
streaming data between telecoms and

00:24:25.668 --> 00:24:28.978
things like that, and you're trying to
do message framing of where does one

00:24:28.978 --> 00:24:30.883
message end and when does one start.

00:24:31.113 --> 00:24:33.353
This is something that's fairly
simple, which means you could do

00:24:33.353 --> 00:24:35.973
it in electrical logic or whatever.

00:24:36.453 --> 00:24:42.053
And there's been a lot of papers on this,
of like: how do we do message framing?

00:24:42.053 --> 00:24:45.413
Where people come up with these
incredibly complex encoding schemes

00:24:45.413 --> 00:24:48.613
and things like that and the baseline
usually is COBS because you go "Well,

00:24:48.613 --> 00:24:53.093
it's not perfect," but to be honest,
it's dumb as hell and it's really easy

00:24:53.113 --> 00:24:55.193
to do in either hardware or software.

00:24:55.593 --> 00:24:58.373
And you get some of these
robustness checks for free.

00:24:58.373 --> 00:25:02.073
Like, you know, I can tell if I
have a properly terminated message,

00:25:02.233 --> 00:25:05.583
I have a very clear way to say
if I get a zero, we're done here.

00:25:05.923 --> 00:25:06.473
<v Amos Wenger>Yeah, yeah.

00:25:06.703 --> 00:25:09.873
I was thinking like, I now better
understand the slide where you

00:25:09.873 --> 00:25:11.433
said, "How do we get unconfused?"

00:25:11.473 --> 00:25:14.483
Because the nice thing in COBS
is, if you don't know what the

00:25:14.653 --> 00:25:17.373
heck is going on, just wait for a zero
and try again with the next frame.

00:25:17.573 --> 00:25:17.933
<v James Munns>Exactly!

00:25:17.973 --> 00:25:19.333
Yeah, and if you ever get confused if
you're halfway through a message, and

00:25:19.333 --> 00:25:23.452
you go, "... That didn't line up," like
something just went wrong or I had an

00:25:23.462 --> 00:25:27.952
error locally, or I walked in from the
bathroom, you go, "That message is gone.

00:25:28.082 --> 00:25:28.622
Sorry.

00:25:28.762 --> 00:25:29.252
Sucks.

00:25:29.272 --> 00:25:32.392
But now I know my job is to sit
here and wait for the zero."

00:25:32.572 --> 00:25:34.052
And then we can at least
move on from there.

00:25:34.052 --> 00:25:34.432
<v Amos Wenger>Exactly.

00:25:34.451 --> 00:25:37.331
But you don't have to like,
tell the peer: hey, can you...

00:25:37.351 --> 00:25:38.961
I don't know, like something went wrong.

00:25:38.971 --> 00:25:42.981
You just need to keep listening and ignore
until you reach another zero.

00:25:43.254 --> 00:25:44.959
<v James Munns>And that's your
resynchronization point.

00:25:44.999 --> 00:25:48.439
It means that you have this
perfect resynchronization point,

00:25:48.509 --> 00:25:50.249
which is a zero on the data.
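
NOTE
A sketch of resynchronizing on zeros, assuming a receiver that joined
mid-message. The names decode_frame and receive are hypothetical. The stream is
cut at every zero, each piece is decoded on its own, and a piece whose links do
not line up is dropped without affecting the frame after it.
```python
def decode_frame(frame: bytes):
    """Return the decoded payload, or None if the links do not line up."""
    out = bytearray()
    pos = 0
    while pos < len(frame) and frame[pos] != 0:
        link = frame[pos]
        chunk = frame[pos + 1:pos + link]
        if pos + link >= len(frame) or 0 in chunk:
            return None  # poorly framed: the zero is not where we expected
        out += chunk
        pos += link
        if link != 0xFF and frame[pos] != 0:
            out.append(0)  # an ordinary link replaces a data zero
    return bytes(out) if pos < len(frame) else None
#
def receive(stream: bytes):
    """Split a raw byte stream at zeros and keep only frames that decode cleanly."""
    good = []
    for piece in stream.split(b"\x00")[:-1]:  # the last piece has no zero yet
        if not piece:
            continue  # back-to-back zeros carry no frame
        payload = decode_frame(piece + b"\x00")
        if payload is not None:
            good.append(payload)
    return good
# Tail of a frame we only half heard, then a good frame, then an unfinished one.
stream = b"\x22\x09\x33\x00" + b"\x03\x11\x22\x02\x33\x00" + b"\x05\x01"
print(receive(stream))
```
The link check is only a weak integrity test, since garbage can line up by
chance, so a real protocol would likely add a checksum inside the payload.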

00:25:50.774 --> 00:25:53.624
This is in fact so useful, I
use it in Postcard and Postcard

00:25:53.644 --> 00:25:55.154
has a built-in flavor for this.

00:25:55.214 --> 00:25:59.924
So as you're serializing, you can do
the COBS encoding at the same time,

00:26:00.224 --> 00:26:04.994
or as you're deserializing you can do
the COBS decoding at the same time.

00:26:05.234 --> 00:26:09.264
Where essentially as you're pulling
bytes from the wire or pushing bytes to

00:26:09.264 --> 00:26:13.744
the wire, you can just do this encoding
or decoding because it requires like...

00:26:14.344 --> 00:26:19.434
a couple of lines of logic
and like one or two bytes of

00:26:19.774 --> 00:26:22.244
intermediate context to store.

00:26:22.544 --> 00:26:26.894
But it's basically stupid simple and you
can do it for almost free as you're going.

00:26:27.272 --> 00:26:28.892
<v Amos Wenger>God, I
like this trick so much.

00:26:29.991 --> 00:26:33.381
<v James Munns>So if people are building
it themselves and you've got a serial

00:26:33.381 --> 00:26:36.951
port from here to there and you have
to do it and you don't have a reliable

00:26:36.951 --> 00:26:42.391
line, like TCP and TLS: probably use COBS
with something like Postcard so that you

00:26:42.391 --> 00:26:46.761
can just get the encoding and decoding
of frames for free and try and think

00:26:46.761 --> 00:26:50.571
of what your hardware can support to
figure out how to hardware accelerate it.

00:26:50.591 --> 00:26:53.451
And we have talked about DMA.

00:26:53.741 --> 00:26:55.771
And so this is one of those fun
things where you can just say:

00:26:55.771 --> 00:26:58.911
hardware, give me messages until the
line goes quiet for a little bit.

00:26:59.151 --> 00:26:59.501
Okay.

00:26:59.501 --> 00:27:01.761
Now I'm going to look through
that, see if I got a zero.

00:27:02.056 --> 00:27:06.126
Pull that out, save the remainder off, and
then wait until the next time my hardware

00:27:06.126 --> 00:27:07.566
pings me for something to be ready.
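
NOTE
The receive loop described above, sketched in Python. FrameAccumulator and
feed are hypothetical names, and each call to feed stands in for one buffer
handed over by the hardware. Complete frames are returned still COBS-encoded,
and whatever follows the last zero is saved for the next call.
```python
class FrameAccumulator:
    """Collects arbitrary-sized chunks and hands back zero-terminated frames."""
    def __init__(self):
        self.pending = bytearray()  # bytes received since the last zero
    def feed(self, chunk: bytes):
        """Add one chunk and return any frames it completed."""
        self.pending += chunk
        *frames, rest = bytes(self.pending).split(b"\x00")
        self.pending = bytearray(rest)  # save the remainder off for next time
        return [f + b"\x00" for f in frames if f]
acc = FrameAccumulator()
print(acc.feed(b"\x03\x11"))  # no zero yet, so nothing is ready
print(acc.feed(b"\x22\x02\x33\x00\x02"))  # one frame completes, one byte is left over
```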

00:27:07.616 --> 00:27:11.299
So, it is one of those things that it's
very easy to automate at this point.

00:27:11.329 --> 00:27:12.359
Which is why it's one of my favorites.

00:27:12.379 --> 00:27:14.959
And I have to say thanks to
Whitequark, who showed me, because

00:27:14.959 --> 00:27:16.939
I was trying to come up with
something dumb and complicated, and

00:27:16.939 --> 00:27:20.609
Whitequark went, "Have you heard of
this thing from the dial up days?"

00:27:21.303 --> 00:27:22.483
and I went, "No..."

00:27:22.483 --> 00:27:24.693
and it became my new favorite
thing, and it's just

00:27:24.693 --> 00:27:26.403
become one of my go-to tools now.

00:27:31.918 --> 00:27:35.218
The Self-Directed Research podcast
is made possible by our sponsors.

00:27:35.458 --> 00:27:39.508
We offer 30 second host-read ads- like
this one- at the end of every episode.

00:27:39.928 --> 00:27:41.998
Not sure how to get your
message out, or what to say?

00:27:42.328 --> 00:27:42.838
Let us help!

00:27:43.194 --> 00:27:47.104
If you'd like to promote your company,
project, conference, or open job positions

00:27:47.104 --> 00:27:51.134
to an audience interested in programming
and technical deep dives, send us an

00:27:51.134 --> 00:27:56.774
email to contact@sdr-podcast.com for
more information about sponsorship.

00:27:57.424 --> 00:27:59.244
Thanks to all of our
sponsors for their support.

