And now for something completely different
A bit about how serde works well, a bit about how serde doesn't work well, and a bit about a different, questionable way of doing things
Video
Audio
Show Notes
Episode Sponsor: CodeCrafters
- serde
- Diagnostic and Statistical Manual of Mental Disorders (DSM), delayed sleep phase disorder
- David Tolnay (@dtolnay)
- James' Bluesky post
- tokei - a tool for counting lines of code, cargo-expand
- Return value optimization (RVO) and named return value optimization (NRVO)
- postcard-forth
- Atkinson Hyperlegible
- type erasure, the void type in C and void* or 'void star' pointer, opaque pointers
- offset_of! macro, offset_of! for enums
- Virtual Method Tables, or vtables
- monomorphizing generic functions to create a vtable in tokio
- "splatting" in Python
- repr(Rust), repr(C), experimental repr(crabi)
- Foreign function interface (FFI)
- Forth programming language
- the stacker crate, setjmp/longjmp
- rkyv, Rust serialization benchmark
- chrono, nalgebra
- miri
- rust-lang RFC: Unsafe Set Enum Discriminants
- Blog post: Building Plain Old Data from Scratch
Transcript
Amos Wenger: James, what do you have for us today?
A different serde
James Munns: Today's a different serde. We've talked a lot about serde things, and now it's time to look a little deeper than Postcard and, uh, look at serde itself.
A surprising number of people that I talk to don't realize that serde comes from the names serialize and deserialize. And also everyone says it differently: I've heard 'sayrd', I've heard 'sir-dee', I say 'sair-day'.
But that's one of those things that I'm sure people will let us know in the comments that they have opinions about.
Amos Wenger: I say 'saird' because I'm French and also because that way my joke serde alternative works, but I don't know if I'm allowed to bring it up because it's your episode James.
James Munns: Serde, as a quick recap: to understand why I'm proposing something different, it's good to understand what I'm talking about. So serde has two main parts.
It's got a front end that thinks about Rust types, so it's firmly in the world of Rust data structures.
And it's got a back end that thinks about the wire format. This is your JSON, your TOML, your whatever collection of bytes you'd like to represent your stuff when you write it to a file, or send it over the wire, or anything like that.
And everyone uses serde, that's a really good thing, because serde is much better than the alternative of casting your structs to arrays of bytes, and then memcopying those to the wire, which is a serious thing that unserious people do.
Postcard uses serde! So postcard's built on top of serde. Because serde has this flexibility between the front end and the back end, I can define a very compact binary format. And serde's written in a very good way that works with embedded systems, so even little microcontrollers can use serde the same way that big servers use serde. So it's wonderful.
Despite the whole thing that I'm going to say in this presentation, you should keep using serde. Serde is a very good tool that I like. Despite my having messed around with other things today, you should absolutely keep using serde...
But... serde has some problems, and we're going to talk about those a little bit.
Amos Wenger: Oh boy.
Serde has some problems - can I do it better?
James Munns: Because I am who I am, sometimes I wonder whether I could do it better.
Maybe not for the general case, but at least for what postcard needs. Because postcard is a very small, simple format, and it doesn't necessarily need all the power that every single possible format could need. Like, uh, serde has to support all the formats that it has to support.
Amos Wenger: Okay, I'm gonna make this harder on you by interrupting you, because it's my job as a co-host. You said "Because I am who I am," and I recently had a DSM party at my house. I don't know if you know what it is. It's where you read the DSM-III, IV and 5, the diagnostic manual for mental disorders...
It's very interesting to see that you only have a thing if it's a problem for others. So as long as it only bothers you, you're not clinically anything.
James Munns: Hehehehehehe. Yeah, there's some sleep phase thing, delayed sleep phase disorder or something like that, and it was weird to read Wikipedia and go, "They're describing my life." But yeah, I think I kinda know what you're getting at.
How serde works
James Munns: But, how serde works: so I mentioned that it's got a front end and a back end.
Serde has a data model, so it's got essentially a limited palette of types that it thinks in. The palette's not so limited, there's 29 different types.
This covers everything from primitive integer types, or other primitive numeric types, array-like types such as string slices, arrays of T, byte slices, things like that, composite types like structs and tuples, so a type that's made up of other types, and discriminated types like enums and their variants, all the different flavors of enums that you can have. So these all collectively make up the 29 data types that serde thinks in.
I mentioned that there's two parts. The front end's job is to turn Rust types into data model types, or for deserialization the other way around, data model types into Rust types.
And the back end takes those data model types, say an array or a string, and turns them into bytes on the wire. For a string in JSON, that's text. For postcard, that's a length prefix and then just the raw UTF-8 bytes on the wire.
But that's the two-step process. We kind of have that middle meeting point from whatever type you want to whatever format you want. There's sort of always that middle step of picking which data model type you're choosing to represent that data as.
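To make that two-step concrete, here's a minimal sketch using the serde_json and postcard crates (postcard with its alloc feature; the Point type is made up for illustration):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Point {
    x: u8,
    y: u8,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Same front end (the derived Serialize impl), two different back ends:
    // the JSON back end renders data model types as text...
    assert_eq!(serde_json::to_string(&p).unwrap(), r#"{"x":1,"y":2}"#);
    // ...while postcard renders the same two u8s as two raw bytes.
    assert_eq!(postcard::to_allocvec(&p).unwrap(), vec![1, 2]);
}
```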
The visitor pattern
James Munns: The other way that serde works is by using what's called the visitor pattern. Because there are sort of these two moving parts, you kind of have to figure out: well, if I was making it totally generic, would I have to be generic over the type and the serializer, or what?
And serde chooses a programming pattern that's popular in other languages, I know I've seen it in C++ and Java, I think: the visitor pattern, which is essentially that you hand the type the serializer, and the type drives the process. These types are given a serializer,
and each of those data model types has a serializing method. So the type can say:
okay, I've got this field that's an integer- or let's say specifically a u8- I'm going to call .serialize_u8()
on it, and I'm going to give it the u8, and then it will turn that into the bytes on the wire. And so the type is the one that drives the visitor, and then the data format is the one that says: this is how I turn a u8 into whatever representation makes sense in JSON or TOML or anything like that. So you would call your serialize u8, serialize str, and you're handing it the instance of that type that serde natively thinks in.
So this drives the backend. Postcard says, "Ah, I'm being told to serialize a u8. I will put that on the wire. I'm being told to serialize a str. I'm going to put that on the wire." It doesn't really think about the context of what's going on. It just says: I've been told to do this, "bleh". Data goes out on the wire or: I've been told to retrieve a u8 or a str from the wire. I will try and consume up some bytes and turn it into a str or a u8 or whatever.
Now this visitor driver code is usually derived. You usually don't have to write this yourself, even for your own types.
You just throw a derive serialize on there and a procedural macro goes in and generates: okay, I have seven fields. The first field is this. Do this, do this, do this, do this. And it expands out into essentially driving that visitor at all points.
So you can take a type like this, a struct that has two fields, put derive Serialize on it, and you get a lot more code, which runs this process.
We're implementing Serialize for PwmError. It starts a struct, it does each field of the struct, and then it ends a struct.
Postcard doesn't care about the start and the end, but if you were in JSON, you might want to, like, indent and start a new curly brace, and then do the fields. But it's got the ability to handle all of those cases.
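For readers following along, here's roughly the shape of what that derive expands to, hand-written against serde's real Serializer API. The PwmError field names are guesses (the slide isn't reproduced here), and real derive output is more defensive:

```rust
use serde::ser::{Serialize, SerializeStruct, Serializer};

struct PwmError {
    channel: u8,
    message: String,
}

impl Serialize for PwmError {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // The type drives the visitor: start the struct, visit each field in
        // order, end the struct. The back end decides what each call emits.
        let mut s = serializer.serialize_struct("PwmError", 2)?;
        s.serialize_field("channel", &self.channel)?; // bottoms out in serialize_u8
        s.serialize_field("message", &self.message)?; // bottoms out in serialize_str
        s.end()
    }
}
```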
Amos Wenger: I, um, cannot help but notice on this slide, the third argument to serialize_struct is false as usize plus one plus one? I'm sure there's a reason for it, but it's a proc macro, why can't it just say two?
James Munns: I have no idea. It's probably one of those, like, "Who am I to question dtolnay when it comes to writing proc macros?" These are arcana that I do not want to know about.
Amos Wenger: I will look into this. I want to look into this!
James Munns: Yeah. The other thing is serde also, I think, supports all the way back to like Rust 2015 or whatever. So sometimes it does some weird anachronistic things because it has to work for all versions of Rust and things like that.
Amos Wenger: Granted, but I'm pretty sure we've had integer literals for a while. So, like, zero has been there the whole time, so I'm pretty curious what it's there for.
James Munns: That's a whole other episode, I guess.
And there are some problems with deriving all of that code, other than just false as an integer for some reason.
Problem: this generates a LOT of code
James Munns: So the main problem is that it generates a lot of code. It can generate a sometimes double digit multiple of the input source that you are giving it.
This is a picture of a Bluesky post. This is from a customer project where it's a little bit silly and I think they've fixed it since then. But they have 50,000 lines of code in one crate of raw source code, like if you just run tokei on the project, you get about 50,000 lines.
If you do the expansion of that, so you run it through cargo-expand and then count the expanded lines, it's over 200,000 lines of code. And one of the files, which is basically just type definitions, was a little over a thousand lines of code, and after expansion just that one file became 22,000 lines of code. There's a lot going on there, but a lot of code is a lot of code.
Amos Wenger: Yeah, and the thing is, it's all generic code as well, so it's going to be instantiated using a serializer or a deserializer implementation. So that's just templates in C++ parlance, I guess.
James Munns: Yeah. And this has measurable impact. I mean, the reason I was looking at this is because it took 40 seconds to do a build of that one crate and this was their main binary crate. So every time they touched it, it would take like 40 seconds to rebuild, which was painful.
And like you were saying, there's a lot of monomorphization and inlining. This isn't necessarily serde's fault, but Rust, like you said: generics are templates, basically, so it will stamp out a version of this. If you have multiple serializers, it will stamp out a version of each of these that are used. And when it comes to the optimizer, we will inline a lot of this, which means some of those instantiated copies will end up being one very large function, and then you'll have multiple versions of that function because they get inlined in different places.
And so it can end up taking a fair amount of compilation and optimization time, but then end up as a lot of code in your binary as well.
Amos Wenger: Yeah, that's what I was thinking when you were presenting the visitor pattern and how we have separate methods for separate types. I was thinking, because in my serde alternative- which it's not the topic of today's presentation, believe it or not- I just have an enum. But then if you have an enum, you have dynamic dispatch. You have to match on the discriminant of the enum and then know if it's a string, if it's a number, if it's whatever.
Also, it's inefficient for a bunch of other reasons, but the reason serde has methods per type is that it can inline everything. There's no dynamic dispatch at all. It all gets compiled down to just compact code, and you pay for that in compile times. And I think a lot of the times when people complain about Rust compile times, they're actually complaining about serde generic instantiations.
James Munns: It's one of those things: it's a useful tool. And that's one of those hard problems: in isolation, with just a couple of types, it would be fine. But that customer project that I was talking about has hundreds of types that each have many dependencies that all generate code. And to be fair, on one hand it's a problem of success: it has gotten so successful and it's so widely used that it becomes the easiest thing to point at. I don't want to point too hard, because it is a good tool, but it gets creaky at a certain size.
Amos Wenger: Of course, now that I think about all this: if you were to, like, take the types and split a hundred of them across different crates and then export non-generic functions, then you could actually cache some of it. The problem is that you have all the types in one crate, in 50,000 lines of code. I'm just realizing that I've been working for six months on an alternative when actually there was a compiler hack that could have done everyth- Don't mind me. I'll be over there.
Problem: the visitor pattern is recursive
James Munns: Okay. Another problem is that the visitor pattern is recursive. When you call serialize on a field that is being serialized, you're then calling that serialize method, which then if it has children, calls those, which calls those, which calls those, which is not always necessarily a problem.
It's a nice way of writing the code, because every struct is essentially isolated to its own world, and when you have to go into another world, like for serializing one of your children, you just call a single serialize method on it. So it makes it locally very simple, but when you have very deeply nested types or things like that, the call chain can get somewhat deep. And especially the more functions you call, you're eating up stack frames and usually returning things back down those stack frames, which can add up to a little bit of cost.
Amos Wenger: Did you actually run into that in embedded or did you run into the opposite problem where like, the structure is not that deep but the stack that you have is smaller than you're used to on desktop, for example?
James Munns: It can. Especially for deserialization, data is returned by value. And sometimes, because we don't have a heap vec, we will use stack-allocated vectors, where you say: okay, here's a bounded vector with room for 32 items. And when you return that deserialization step by value, even if you only deserialized one item, if it's a fixed-size collection you essentially have the capacity for all 32, and you're going to return that by value down the stack frame. Which means you've created that whole 32-item vec in one function, and then returning it by value means you memcpy it back to the caller's frame, essentially.
And this is one of those things that LLVM can optimize sometimes, but RVO is not a guaranteed optimization.
Amos Wenger: RVO is Return Value Optimization?
James Munns: Yeah, so there's RVO, which is Return Value Optimization, and NRVO, which is Named Return Value Optimization, and the idea is that essentially you make space in the caller, and use that in the callee, so that when it's populating that, it's actually populating it to the destination. And when you return, you actually end up just returning nothing, and the data is already there, is the transform you're kinda going for there.
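As a sketch of the by-value problem James describes, assuming a hypothetical bounded-vec type (in the spirit of heapless::Vec, not its actual API):

```rust
use core::mem::MaybeUninit;

// A fixed-capacity, stack-allocated vec: the buffer is always N slots big.
struct BoundedVec<T, const N: usize> {
    len: usize,
    buf: [MaybeUninit<T>; N],
}

fn deserialize_items() -> BoundedVec<u32, 32> {
    let mut v = BoundedVec {
        len: 0,
        buf: [MaybeUninit::uninit(); 32],
    };
    v.buf[0].write(42); // suppose the input only contained one item
    v.len = 1;
    // Returning by value can memcpy all 32 slots back into the caller's
    // frame unless the optimizer elides the copy; RVO isn't guaranteed.
    v
}
```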
Amos Wenger: This might be where you're driving, but as far as I know, serde has a deserialize_in_place; like a lot of Rust APIs, it includes an in-place version because of the lack of guaranteed RVO and NRVO, is that...?
James Munns: I haven't run into it.
Amos Wenger: It's doc(hidden), is the fun thing. So I only found out about it because I've been snooping around the source code.
James Munns: I haven't run into that. I might have to poke into that.
Amos Wenger: Yeah, you should.
Postcard: more flexibility than we need
James Munns: And for postcard, it has more flexibility than we need. The visitor drivers here have the capability: so, for example, in JSON, the fields don't always have to be in the same order. You might order them differently. Like, JavaScript might order the fields in one way and Rust orders them in another way, which means for certain types, you have to be able to visit each thing out of order, fill them in, and still end up with the ones that you need.
Postcard does not do that. It says they are always in the lexical order. Tuples are always, you know, zero to whatever. Structs are always top to bottom. And then when you get into attributes like rename or skip_serializing_if or skip_deserializing_if, there are hooks at certain points where the visitor can change behavior or even flatten things. Which, one, makes postcard very upset, because it doesn't like that. And two, it's just flexibility that you don't necessarily need in all cases. I'm sure there are people who love those functionalities, but at least for me and for postcard... I don't want it.
Amos Wenger: Wait, so are you paying for code that is able to handle fields out of order even though that never happens with postcard?
James Munns: So you get visited with key-value pairs, I would imagine, and then you can do whatever you want, and you could, in the implementation, assume that they're in the expected order? I don't know.
I think it's one of those things that, if it doesn't exist, it's almost certainly optimized out. I believe that serde has the latitude for that, but I don't necessarily think that we're paying extra for that in postcard by not using it. There are certain attributes that do trigger that to happen, and those are the ones that confuse postcard, because postcard doesn't keep track of what you have and haven't visited at each step. So if you try and do things out of order, it just grabs the wrong fields in the wrong order. I'd have to check. The answer is: I don't know.
Amos Wenger: Got it.
James Munns: This is a fairly reasonable way to approach the problem of serialization. You break it down to each type, you locally figure out how to handle each type, and then you handle it, and you do it sort of recursively. It's a very straightforward, well thought out way to approach this problem. So how do we even think about other ways of approaching it? Well, uh, I spent some time looking into a weird, archaic programming language called Forth, and it broke my brain a little bit.
Postcard-forth
James Munns: Now I'm going to talk a little bit about postcard-forth, which is an experiment I did a while back that had some interesting results. It's not ready to use by any stretch of the imagination.
I'm going to slap a big old experimental sticker on that because it's- it's gross, but it's an interesting research project.
How postcard-forth works is a little different.
We do keep mostly the same data model. We say, look, there's a limited palette of types that you can convert to and from. That's sort of our bridge from Rust world into the wire world. And postcard really already has this because I only define the 29 different things you can put on the wire in postcard format. So we're kind of already there. Mostly keep the same data model.
Skipping ahead a little bit: we would have a much simpler derive macro. So rather than generating that visitor pattern, which is going to generate functions for us that are going to drive that serialization or deserialization process,
we only generate a list of field offsets and function pointers for functions that know how to serialize or deserialize that data.
So for a type that looks like this:
an Outer that has three fields, one of them being an Inner. A struct Inner that has two fields. Some integers, some strings, some Vecs, things like that.
Instead of something vaguely like this from serde,
where we'd have outer, which is going to call str, then serialize on the inner struct, which is going to call ser- you know what I mean. Instead of implementing the trait like this,
instead we generate something sort of like this:
which is a constant array. And each element of the array is a tuple of a usize and a function pointer. The function pointer takes a const pointer to the unit type, so basically a type-erased pointer, and a mutable reference to an output stream of bytes.
So what this is trying to say is we have an array of things. We have the offset from the base of the struct to this specific field, and then the function that's going to handle that.
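Reconstructing the slide's shape as a sketch (field names and serializer bodies are made up; real postcard uses varint encoding and fallible writes):

```rust
use core::mem::offset_of;

struct Inner {
    count: u32,
    name: String,
}

// Each function casts the type-erased pointer back to the field's real type.
unsafe fn ser_u32(base: *const (), out: &mut Vec<u8>) {
    let v = unsafe { *(base as *const u32) };
    out.extend_from_slice(&v.to_le_bytes());
}

unsafe fn ser_string(base: *const (), out: &mut Vec<u8>) {
    let s = unsafe { &*(base as *const String) };
    out.push(s.len() as u8); // simplified length prefix
    out.extend_from_slice(s.as_bytes());
}

// The whole "impl": a const array of (field offset, serializer fn pointer).
// This is data, not code; nothing here is monomorphized per serializer.
const INNER_FIELDS: &[(usize, unsafe fn(*const (), &mut Vec<u8>))] = &[
    (offset_of!(Inner, count), ser_u32),
    (offset_of!(Inner, name), ser_string),
];
```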
Amos Wenger: I have several comments. First: if you're listening to this, every time James says "this," he is actually pointing at a screen. You could be having this on your screen: you can find the slides at sdr-podcast.com/episodes, find the episode we're talking about, and you can have the slide deck right there.
Second: I think it's hilarious, because I'm looking at the slides, that James went the exact opposite direction that I went with the slides, where he's using a pixel font. And I went with a pretty- what did I get- Atkinson Hyperlegible? So these are, like, polar opposite typographic choices.
Thirdly: James, isn't it FFI-unsafe to have a const pointer to the unit tuple?
James Munns: What do you mean, FFI-unsafe?
Amos Wenger: It probably doesn't matter actually, because this is staying inside Rust. This is a C pattern. In C, when you have an API to do something, there's always a void pointer.
James Munns: Yeah, this is void*. This is Rust void*. Yeah...
Amos Wenger: Yeah, yeah, this is void* pointer, context, whatever. You'll see that in all C-style APIs. So seeing it here brings me back. The trauma- it's all coming back to me.
James Munns: Oh yeah, this is explicitly type erased pointers to data.
Amos Wenger: But it's not exactly the same thing. Like, I think if you put them in an extern "C" function, rustc will yell at you, saying like, "It's not guaranteed that this is actually the same size as a void pointer" or whatever...
James Munns: You can't pass a tuple, because C doesn't have a concept of zero-size types, and C++ doesn't either. But a pointer to a zero-size type is still allowed. That's how I believe opaque types work in C: it's a size that the compiler does not know about. And that's essentially how unsized works in C, because it doesn't have a proper concept of unsized. But, uh, I could be wrong. I think you get the point that I'm going for, like, a void* pointer here.
Amos Wenger: Sure. Sure. Sure. Sure. My other question was: isn't offset_of something that was just stabilized?
James Munns: offset_of for enums is stabilizing. offset_of has been there for a while, but probably only since this year or last year or something like that.
Amos Wenger: Something like this was in a recent changelog. I forget which...
James Munns: They're still working on it, because offset_of for enums is not totally stable yet, and we'll get to that, 'cause there's an RFC I'm going to mention, but... offset_of itself is stable at this point.
Amos Wenger: Also I'm looking at the slide, which again, go to sdr-podcast.com/episodes to find: but that's ser_deref_slice_u8 turbofish Vec<u8>, which gives you the instantiated version of that, the monomorphized version of that function, and that's a thin function pointer?
James Munns: Yes, exactly. You can monomorphize generic functions and not put the parentheses at the end of them. So the turbofish actually does the instantiation of that specific version. So it is no longer a generic function. It is a specifically typed function pointer at this point. You use this a lot in Waker dispatch tables or what are they called... the vtable that you have to create for async functions.
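As a standalone illustration of that trick (the function here is hypothetical, not postcard-forth's actual code):

```rust
fn ser_deref_slice_u8<T: AsRef<[u8]>>(val: &T, out: &mut Vec<u8>) {
    out.extend_from_slice(val.as_ref());
}

fn main() {
    // The turbofish without trailing parentheses *names* the monomorphized
    // instance instead of calling it, yielding a plain, thin function pointer.
    let f: fn(&Vec<u8>, &mut Vec<u8>) = ser_deref_slice_u8::<Vec<u8>>;
    let mut out = Vec::new();
    f(&vec![1, 2, 3], &mut out);
    assert_eq!(out, [1, 2, 3]);
}
```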
Amos Wenger: Right, for futures, yeah, yeah. This makes a terrifying amount of sense, but I don't think I would have thought about it. You just blew my mind, James.
James Munns: So we ended up with this array, and even though we're still generating something, you know, the length isn't so different, especially for these simple ones. The neat part about this is the offset is a usize, so it's one word, four or eight bytes depending on what platform you're on. A thin function pointer is one word.
So, for those two fields, that is 8, 16, 32 bytes for that, plus it's a slice, so maybe another length and a pointer or something in there. Something on the order of a handful of words, and same for Outer. So, code is usually pretty dense, but I'd probably bet that this is maybe a little shorter than even the generated optimized code, particularly before you've inlined it and stuff. And we're just working with data. It is a constant value that doesn't get used unless it gets used. It's not even code that has been generated at this point.
I've already talked about doing tricky things with macros and const functions to merge arrays of things so that you end up with one array. So maybe for that outer one, we don't even have that call to the offsets of the parent fields and then calling into some function that handles the inner. We just splat the entire list into one list and all of a sudden we have the entire makeup of this. And even though these are repr(Rust) types, offset of works with them. So you can take the base of one struct and add the offset to that field and then add offsets to the subfield and splat it and get an entirely correct offset of that subfield.
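Continuing the earlier sketch (Inner, ser_u32, and ser_string as above), splatting a hypothetical Outer might look like this; note the child offsets get rebased by offset_of!(Outer, inner):

```rust
use core::mem::offset_of;

struct Outer {
    a: u32,
    inner: Inner, // Inner { count: u32, name: String } from the earlier sketch
    b: String,
}

type SerFn = unsafe fn(*const (), &mut Vec<u8>);

// One flat list for the whole tree: parent offset + child offset is still a
// correct offset from Outer's base, even though these are repr(Rust) types.
const OUTER_FIELDS: &[(usize, SerFn)] = &[
    (offset_of!(Outer, a), ser_u32),
    (offset_of!(Outer, inner) + offset_of!(Inner, count), ser_u32),
    (offset_of!(Outer, inner) + offset_of!(Inner, name), ser_string),
    (offset_of!(Outer, b), ser_string),
];
```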
Amos Wenger: James, token listener here. First of all: where- what- wha- which context did you learn the word 'splat' in? Because I know what you mean. I'm thinking of JavaScript.
James Munns: That's a Python thing. Yeah. Splatting is when you have an array of three things: if you were to splat it in Python, that would then be passing the function three arguments instead of an array of three elements.
Amos Wenger: Because this is a presentation about Rust and serde, it's the equivalent of serde flatten, right?
James Munns: Sort of, except we're not flattening the types necessarily. We've just merged the child list with the parent list, so that the parent list just- it's a flattening operation.
Same kind of thing; we are flattening here, for sure.
Amos Wenger: Cool. And then the other question: you offhandedly mentioned repr(Rust). So that's r-e-p-r, parenthesis, Rust with a capital R for some reason. And we have repr(C) as well. And we have other ones that I forget. What that does is it defines the layout of structs, if I'm correct. It defines the ABI.
So if you have something that's repr(C), that is well defined, that is not going to change; that is what we use for FFI, foreign function interface. But repr(Rust) can change even across two compiles; there are no guarantees that it will stay the same.
James Munns: Yeah, it means you're not allowed to know. At least those are the rules of repr(Rust), so you can't rely on that. But using the offset_of macro says: whatever the compiler thinks this layout is, use that offset; it's out of my hands.
Turn ser/de into a "stack machine"
James Munns: What we're trying to do here is, instead of using the visitor pattern, we're turning serde into a stack machine. This is where we get back to forth. Forth is an interpreted programming language that is very simple. It has very few primitives. It has no local data. The only local data is the stack. And, the way functions are defined is essentially as arrays of other functions.
So, you name an array of function calls, and that becomes a new function. And you don't have to worry about passing data to functions, because there is no way to pass data to functions. There is only the stack. And so it ends up looking a lot like recursion, but it's sort of managed by the VM itself, rather than the compiler creating this machine for you.
So we turn this into essentially a stack machine, which is where the name postcard-forth came from. And for our stack machine, or our interpreter if you want to call it that, the input is the list of offsets and functions. So it's got essentially a to-do list of: okay, for each of these offsets, for serialization, I need to take my base pointer, increment it by this much, and then call this function with that base pointer.
And that function is going to know that it's expecting a pointer to a u8. So it's going to cast that void* or *const () type back to the correct type and go: okay, that's a pointer to a u8, read that and encode it onto the wire.
And our output is just some kind of stream of serialized bytes. It could be to a vec, it could be filling up a slice and keeping track of how many bytes you've written to it, or however you'd like to do it, but you get some kind of stream output primitive that you can use there.
As I mentioned, this is basically an interpreter. We're interpreting the definition that we generated at compile time of all these fields and walking through that as if we were a Python interpreter or something like that. And our output just goes to the output stream.
And Forth as an interpreter says that "data is code." One of the LISPs said "code is data" and Forth says the opposite: "data is code." You can interpret any chunk of data as if it was code, if you write code that can handle that.
And that's exactly what we're doing here is we're generating data instead of code at compile time, and then we write one interpreter that knows how to turn that data into data on the wire.
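The interpreter itself can then be a single loop, sketched here against the arrays from the earlier sketches (error handling omitted):

```rust
type SerFn = unsafe fn(*const (), &mut Vec<u8>);

// One serialization "VM" for every type: walk the to-do list, offset the
// base pointer, and dispatch. No per-type monomorphization; nesting lives
// in the data rather than in generated, recursive code.
unsafe fn run_ser(fields: &[(usize, SerFn)], base: *const (), out: &mut Vec<u8>) {
    for &(offset, func) in fields {
        let field = unsafe { (base as *const u8).add(offset) as *const () };
        unsafe { func(field, out) };
    }
}
```

Calling it would look something like `unsafe { run_ser(OUTER_FIELDS, &outer as *const Outer as *const (), &mut buf) }`.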
Amos Wenger: I'm gonna stop you there. Again, it's interesting because you went the exact opposite direction that I did. I also was like, "Well, I like serde, but..." But then the direction I was in, instead of doing the stack manually, I took the one stable Rust feature that generates a state machine for you, which is async functions.
And I just did that, and that also solves the infinite recursion problem. But I'll present my take on it in a different episode. It's so interesting! You're like, "Let's do everything manually." I bet you don't even have CodeGen for your thing. I bet you've been writing those arrays manually, haven't you?
James Munns: I did at first. Now I have a macro that does that. So I actually do have a version of this that works. Not everything works perfectly, and only some subset of it works, but-
Amos Wenger: Is that against the rules of the podcast? Like, we can only present far out ideas that don't actually work?
James Munns: I've been talking about postcard RPC and postcard RPC exists. It's just something I keep changing.
Amos Wenger: So you keep claiming! All I see is, like, screenshots on Bluesky. I don't, I don't know. I'll believe it when I see it.
James Munns: It counts.
So why is this good?
Good things first
James Munns: One good thing is that there's only ever one serialization/deserialization machine. It doesn't get monomorphized for every single version of that. And serde generates a very simple, easy-to-optimize chunk of code, but it generates one of those for everything. And in this, there is just one: the VM that handles whatever data you throw at it.
And it is an explicit stack machine. You can have a stack in there, and as you traverse down the stack, you can say, "Ope, my max stack is 16," and if you ever try to push a 17th item onto the stack, the stack machine just goes, "Oops, out of stack! Serialization failed." Which is a challenge sometimes, particularly when you're deserializing unbounded types. If you're deserializing a vec of enums that can contain a vec of enums, that can- you can get sort of like that zip bomb problem when it comes to deserialization, and serde_json has some workarounds for that, and serde itself has some workarounds for that, where it'll try not to allocate too much.
It's a hard problem when you don't have bounded types and you're doing recursion, because the compiler has no idea what your stack limit is at runtime, and it will just go until the operating system says, "Fail! Dead."
Amos Wenger: Yeah.
James Munns: And then your program ends, which is not a very good denial of service protection.
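A sketch of what that graceful failure can look like when the recursion budget is ordinary data rather than the real call stack (names hypothetical):

```rust
#[derive(Debug)]
enum DeError {
    OutOfStack,
}

struct Machine {
    depth: usize,
    max_depth: usize, // e.g. 16
}

impl Machine {
    // Called each time the interpreter descends into a nested type; hitting
    // the limit is an ordinary error, not a process-killing stack overflow.
    fn push_frame(&mut self) -> Result<(), DeError> {
        self.depth += 1;
        if self.depth > self.max_depth {
            return Err(DeError::OutOfStack);
        }
        Ok(())
    }

    fn pop_frame(&mut self) {
        self.depth -= 1;
    }
}
```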
Amos Wenger: If you just have, like, one very, very large type, they have to be cautious, because they don't know how much stack one function will take. If it has big locals, then it's a fat frame. So they have to leave sort of a red zone of: okay, we assume that we can call at least one more level of recursion. Everything related to stack bounding in serde is a huge hack, including the one that lets you technically recurse infinitely, called stacker.
I don't know if you were going to mention that. Yeah.
James Munns: Oh yeah, yeah.
Amos Wenger: It just, like, resizes the stack and then swaps to it. As far as I know.
James Munns: Yeah. I've seen that. I have no idea how it works.
Amos Wenger: It's terrifying! It's just straight up changing the stack pointer.
James Munns: And in embedded, if you run out of stack, you just run into your global variables and start corrupting them.
Amos Wenger: Yeah, no, it's like detecting you're going to run out of stack and making a bigger stack, switching to it, and then I guess switching back before you unwind? I'm just terrified what happens if you switch stacks and then you recurse back- I don't know, you, you, something, setjmp/longjmp? Like something's-
James Munns: No idea.
Amos Wenger: Somehow GOTO into C code? I don't know how that would work, but that's terrifying.
James Munns: So, like I said, I have a version of this that, at least for some values of "works," works. And the interesting thing is that there's a really good serialization and deserialization benchmark written by the author of rkyv. It's called the Rust serialization benchmark or something like that.
It's something that I've used a lot, and it was very helpful for tuning postcard, in that it has a couple of fairly large datasets: log files, structured save files from Minecraft, and then tessellation things. It exercises a pretty good set of what serde can do. And it's used as sort of a semi-objective benchmark between different formats, both in size as well as speed.
But I managed to get just enough of postcard-forth working so that I could implement the same test sets from this benchmark on this, and then compare it against postcard with serde versus postcard with forth.
Serialization benchmarks
James Munns: And so these are the four different benchmarks, and postcard-forth wins on all but one of the serialize benchmarks. These are the relative numbers, but if you go to the repo, there are also the objective ones. In the first run, regular postcard is 67 percent of the speed of postcard-forth. In the second test, postcard is faster, and so postcard-forth is only 84 percent of the fastest, but in the other two, postcard-forth is 100 percent and postcard is 67.
Basically, it beats it by like 25 to 50 percent on serialization, although it does lose by 15 percent in one of the four tests. And then on deserialization, it's between 10 and 50 percent of the runtime of the other one. On this very limited test set, on seven of eight tests, it's faster than postcard with serde. And in the one where it's slower, it's only a little bit slower.
Amos Wenger: That is impressive. I've had this question in mind: you had pointers to functions- are those only for postcard? How would you make that work for more formats? Would you just derive different tables?
James Munns: Yeah, that's a good question. I wouldn't.
Amos Wenger: Okay, cool. Just checking. You'd know I had to ask.
James Munns: Yeah, yeah, no, it's very reasonable. And this is that, like, cheating by doing less. I'm cheating by only supporting one format whose qualities I know in and out. So I know postcard can run through this.
You maybe could make it work with JSON and things like that. And that would be interesting, but that's not something I've tried to do. Maybe it could be possible by just replacing, essentially, the set of standard functions, but it would be much more invasive than it would be in serde, for sure. 'Cause it's not built with that degree of freedom in mind.
Amos Wenger: I feel like it's a fair comparison though, because serde is uncompromising in terms of performance. It's like, "We will monomorphize and inline all the things. We're trying to be as fast as possible, even if you're paying for it in compile times." There's just no way to opt out of that.
James Munns: And the other thing that's very relevant for me, especially for embedded is that initial testing shows that it's usually smaller, both in the amount of code that you generate and the actual binary size.
Amos Wenger: Usually? What do you mean usually?
James Munns: I say usually... Actually, in all my tests, it was smaller and faster, but you know-
Amos Wenger: How could it be?
James Munns: It's not an exhaustive test, you know, lies, damn lies and benchmarks.
To control for this, I wrote a code generation tool that would generate types that were arbitrarily deeply nested, so they would reference each other and stuff like that. And then I would build an embedded binary, because it's no_std, so I can control what goes into it much more. Which means I can have a baseline where I do no serialization and deserialization.
And then I just have a firmware that reads data in from the PC, deserializes it, and then reserializes it. I didn't actually have to exercise it: I can see what the real code size is, because I know nothing's being optimized out. It's reading in from the PC, so the compiler can't assume that anything can be optimized out.
My baseline firmware image that does nothing but, I think, say "Hello World!" and read data from the PC is something like 10 kilobytes, and the whole project is 53 lines of code when I expand it. I've tweaked postcard-forth because I was trying to figure out, like, is it better to do this or this?
But in most of the cases- for example, just comparing baseline postcard serde and postcard-forth for 512 of these nested data types that might call each other: the compiled code size for serde was 640,000 bytes, and the compiled code size for postcard-forth was 395,000 bytes, so like two thirds of the size.
The actual source code in the file, including those types that I generated, was about 10,000 lines of code. Serde turned that into 181,000 lines of code, and postcard-forth turned that into 60,000 lines of code. So again, about a third of the size.
And then when it comes to compile time, I measured both the total compilation time and just this crate, using cargo timings. And the difference in that compilation time is about 12 seconds for postcard-forth and about 26 seconds for postcard serde. And this is like a mean test, generating 512 types where most of them include seven or eight different ones. It's about as bad as it can get. But in this stress test, the sizes are typically about a third of the generated code, about two thirds of the actual compiled binary size after optimizations. And it seems to be fairly consistent when I run it against a bunch of different generated types.
Amos Wenger: Did you go back to your customer and have them try postcard-forth?
James Munns: No! No, no, no, no. They do not need this. They're doing this on a desktop.
Amos Wenger: The public wants to know, James!
James Munns: Yeah, yeah. I don't experiment on my customers unless it's really good.
We're still on the positives. For deserialization, it'd be very easy to do this in place. Because if we wanted to deserialize, we could have that base pointer just be a MaybeUninit in the return location, and then use that as the pointer that we walk through for deserialization. It is built to be done in place. So there are never stack copies, because everything is either done from the source or to the destination.
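A sketch of that in-place shape, with a made-up u32 field deserializer; the base pointer is the MaybeUninit return slot itself:

```rust
use core::mem::MaybeUninit;

type DeFn = unsafe fn(*mut (), &mut &[u8]) -> Result<(), ()>;

// Hypothetical field deserializer: writes straight into the destination.
unsafe fn de_u32(dst: *mut (), input: &mut &[u8]) -> Result<(), ()> {
    let (head, rest) = input.split_first_chunk::<4>().ok_or(())?;
    unsafe { (dst as *mut u32).write(u32::from_le_bytes(*head)) };
    *input = rest;
    Ok(())
}

// Fields land directly in the caller's storage: no stack temporary, no copy.
unsafe fn run_de<T>(fields: &[(usize, DeFn)], input: &mut &[u8]) -> Result<T, ()> {
    let mut out = MaybeUninit::<T>::uninit();
    let base = out.as_mut_ptr() as *mut u8;
    for &(offset, func) in fields {
        unsafe { func(base.add(offset) as *mut (), input)? };
    }
    Ok(unsafe { out.assume_init() }) // sound only if every field was written
}
```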
So why is this not good? Now, we're getting to the red words on the slide.
It's another crate that requires a derived trait for every type. Unless we get reflection in Rust, there's no way to get these offsets and map them to a pointer for every type. If we get reflection, maybe this won't be a downside, but we don't, which means this is another trait that you have to PR into all of your favorite types, like chrono or nalgebra or whatever, where you want them to conditionally implement yet another derived trait.
Serde is great because the whole ecosystem uses it. So if it doesn't implement serde already, pretty much everyone knows what it is and they're going to just hit merge on that PR anyway.
Not to mention all of the type-erased void* pointers: the engine itself for doing this serialization or deserialization is wildly unsafe. I think it could be done correctly, and you could test it with Miri, you could do that, but it's essentially like a raw scripting interpreter that you're plugging into your program, and if I get something wrong, then it'll do out-of-bounds writes and do terrible things.
Serde has the benefit of being, I think, almost entirely safe code. It might be entirely safe code, because it's just generating very simple Rust code that optimizes very well. This is generating not that. It is doing pointer casting, it is doing additions to pointers and reinterpreting them.
It's doing all the nasty tricks that are fun to look at, but when I say it's wildly unsafe, it is wildly unsafe.
And the other thing is that manually implementing these serialization and deserialization traits is wildly unsafe. If you have the derive macro do it, and it gets the offset_of right and things like that: totally fine. It will work as long as there are no bugs in the derive macro. If someone tries to manually implement it, like they can manually implement serde today, they could just put an offset of a million, and then you're walking off into some other heap thing, or you segfault. It is like a designed exploit in a box. I don't know how I could design it to be more exploitable. You know what I mean?
Amos Wenger: I mean, that's why we have unsafe traits, right?
James Munns: Yeah, it would be an unsafe trait for sure.
We also still need some code to get to data model types. You need some monomorphization. Like I mentioned way earlier, a hash set gets represented as an array on the wire, but there's no, like, slice inside of the hash set.
We would need to do something to go to an iterator to then each element and things like that. So there might still be some recursion or function calls or some kind of monomorphization that's required to get you to that data type.
Amos Wenger: Right, what about arrays? How do arrays work with postcard-forth?
James Munns: Right now, I think I just have a function that takes them as a slice or something like that. Like, there's still some cheating in there for sure, but in my tests I only use, like, strings and vecs of u8s and things like that, so they end up becoming that.
Yeah, especially for iterators- basically things that you can't deref to a data model type.
And for that in-place deserialization, we need a whole new RFC, because today there's no way to unsafely set the discriminant for an enum. So right now the only way to deserialize an enum would be to deserialize it onto the stack and then copy it to the destination. I have an RFC up to change that, but it's an RFC. It hasn't even started to be implemented yet.
Amos Wenger: Yeah, you're the one who wrote it up. When did you? Last week!
James Munns: So it comes out of this. I wrote a blog post a while back, while I was doing this, when I realized there was no way to do that for enums. I could do all the spooky stuff for structs and raw data types, but enums, you have to make them on the stack and then copy them. And that's why that perf table had, like, a "no enums" field, because I was trying to figure out how bad that was compared to other types.
And it's not quite a linear program either. So when I say that it's a program and you just run top to bottom: Uh, turns out no, you need control flow too, because even just data types are complex programs.
For enums, you need branching. You are only serializing one of the possible choices, or only deserializing one of the possible choices.
And we need loops, because you have slices and iters, and you have to go through them, and you're not going to inline every possible length of every possible type. It really is a whole interpreter that needs not just control flow-
Amos Wenger: You're gonna have to design your own bytecode, and then the array is gonna be like type of instruction, and then a few operands, and it could be a function address, it could be something else...
James Munns: Yeah.
Amos Wenger: Is that even Rust at this point? Does that count as Rust?
James Munns: Forth has some ways that you handle that, but that's probably a whole talk for another time.
So then the question is: is it worth it? It is wildly unsafe. It is questionably implementable today. And you'll still end up with, in some metrics, a much worse version of serde, despite it being faster and smaller and things like that.
And really the answer is: I have no idea yet. But it was something that I spent a lot of time on and I thought you might find interesting. So, uh...
Amos Wenger: I sure do.
James Munns: We'll see if some calculus changes with something that makes it really worth it.
Amos Wenger: I mean, this is a perfect setup for my take on another serde, which we don't have time for today, but this is so fun. You took all the other choices, I'm so glad we're doing this together, James.
James Munns: I'm excited to see yours now, 'cause you've talked about yours, I think, a bit more publicly...
Amos Wenger: Yeah, but I wanted to make sure it was useful before I actually talked about it. I never knew when the time was right.
James Munns: Oh, I went the opposite.
Episode Sponsor
CodeCrafters is a service for learning programming skills by doing.
CodeCrafters offers a curated list of exercises for learning programming languages like Rust or learning skills like building an interpreter. Instead of just following a tutorial, you can instead clone a repo that contains all of the boilerplate already, and make progress by running tests and pushing commits that are checked by the server, allowing you to move on to the next step.
If you enjoy learning by doing, sign up today, or use the link in the show notes to start your free trial. If you decide to upgrade, you'll get a discount and a portion of the sale will support this podcast.